And now it’s all this

I just said what I said and it was wrong. Or was taken wrong.

# Timers, reminders, alarms—oh, my!

Permalink - Posted on 2018-02-18 04:19

I was shocked—shocked!—to see people disagree with my last post. I was even more shocked to learn about a bizarre omission in the HomePod software. I decided to dig into the many ways you can set timed alerts on your Apple devices and how the alert systems vary from device to device. It is, you will not be surprised to learn, a mess.

Let’s start with the summary. In the table below, I’m comparing the features of the three alert types on iOS: Timers, Alarms, and Reminders. Included in the comparison is how certain features work (or don’t work) on the iPhone, iPad, Watch, Mac,1 and HomePod. Most of the entries for the HomePod are empty because I don’t have one to test, but I’ve included it because it was the device that got me started down this path. Also, there’s that software omission I want to talk about.

|                  | Timer | Alarm | Reminder |
|------------------|-------|-------|----------|
| Number           | 1     | ∞︎    | ∞︎       |
| Name/Description | No    | Yes   | Yes      |
| Autodelete       | Yes   | No    | Yes      |
| **Shared**       |       |       |          |
| iPhone           | Yes   | Yes   | Yes      |
| Watch            | Yes   | Yes   | Yes      |
| Mac              | No    | No    | Yes      |
| HomePod          | ?     | ?     | No       |
| **Time left**    |       |       |          |
| iPhone           | Yes   | No    | No       |
| Watch            | Yes   | No    | No       |
| Mac              | No    | No    | No       |
| HomePod          | ?     | ?     | ?        |
| **Time of**      |       |       |          |

[If the formatting looks odd in your feed reader, visit the original article]

# Friendly reminders

Permalink - Posted on 2018-02-16 01:02

My vision of myself as a powerful thinkfluencer in the Apple world took a real beating this week. It seemed as if everyone who got a HomePod was complaining that it couldn’t set multiple timers. This is something I’ve written about a couple of times, going back four years. And I’ve explained the solution. Is this thing on?

Of course, four years ago, I wasn’t talking about the HomePod, I was talking about the iPhone, but the principle is the same. In iOS, the timer function is in the Clock app, and there’s only one. There’s no way to have two timers running simultaneously and no way to give your timer a name that lets you know what it’s for.

But you do have Reminders. They have names and can be set to alarm not only at an absolute time, but also at a relative time:

“Hey Siri, remind me to check the casserole in 20 minutes.”

This works on my iPhone, iPad, and Watch, and I assume—based on this article—that it would work on my HomePod if I had one. This is clearly Apple’s preferred solution to setting multiple timers, each with a distinct name.

So I was frustrated to hear John Gruber and Paul Kafasis in the latest episode of The Talk Show complain about the multiple timer problem. They should both know how to use Reminders to solve this problem. So should Myke Hurley, who made the same complaint in the most recent Upgrade.

I understand where they’re coming from. If you’re an Amazon Echo user, you’re probably in the habit of saying something like

“Alexa, set a 20-minute timer for the casserole.”

Habits like that are hard to break, especially as you get older.1 But Apple users should be used to the idea that Apple has strong opinions about the right way to use its products and you’re usually better off not bucking the system.

You don’t like cluttering up your Reminders with hundreds of “check the casserole” and “check the tea” items? Even though you typically don’t see completed reminders? There is a solution.

In the past couple of days, the HomePod complaint industry has moved on from multiple timers to white rings. Cheaply made leather circles are already coming onto the market, but I’m going to suggest that high end furniture protection should come from lace doilies with tatting that complements the HomePod’s fabric pattern.

1. Myke is 30 now, so his brain has lost much of its former plasticity.


# LaTeX contact info through Workflow

Permalink - Posted on 2018-02-10 21:00

I’ve been writing more on my iPad recently; not just blog posts, but reports for work, too. Because I have a lot of helper scripts and macros built up over many years of working on a Mac, writing on the iPad is still slower. But I’m gradually building up a set of iOS tools and techniques to make the process go faster. Today’s post is about a Workflow I built yesterday with advice from iOS automation experts conveyed over Twitter.

For several years, I wrote reports for work using a Markdown→LaTeX→PDF workflow. For most of those years, it was rare for me to have to edit the LaTeX before turning it into a PDF. Recently, though, that rarity has disappeared, mainly because my reports have more tables and figures of varying size that need to be carefully positioned, something that can’t be done in Markdown. A few months ago I decided it would be more efficient to just write in LaTeX from the start. This wasn’t as big a change as you might think. I used to write in LaTeX directly, and the combination of TextExpander and a few old scripts I resurrected got me back up to speed relatively quickly—on the Mac, anyway.

On iOS, most of the TextExpander snippets I built for writing in LaTeX work fine, but the helper scripts, which tend to rely on AppleScript, don’t. One of the scripts I definitely wanted an iOS counterpart for was one that extracts a client’s contact information in a particular format. In my reports, the title page usually includes a section for the name, company, and address of the client. This is added in the LaTeX source code like this:

tex:
\client{John Cheatham\\
Dewey, Cheatham \& Howe\\
1515 Loquitor Lane\\
Amicus OH 44100}


where \client is a LaTeX command I created long ago, and its argument needs the usual LaTeX double backslashes to designate line breaks. Also, ampersands, which are special characters in LaTeX, need to be escaped.
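As a sketch of those two escaping rules, a hypothetical Python helper (not one of my actual scripts) might look like this:

```python
# Hypothetical helper showing the two escaping rules: ampersands get a
# backslash, and lines are joined with LaTeX's double-backslash breaks.
def latex_address(block):
    lines = [line.replace('&', r'\&') for line in block.splitlines()]
    return '\\\\\n'.join(lines)

plain = """John Cheatham
Dewey, Cheatham & Howe
1515 Loquitor Lane
Amicus OH 44100"""

print('\\client{%s}' % latex_address(plain))
```

Running this prints exactly the `\client` block shown above.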

I thought I could whip something up in Workflow, but my limited understanding of Workflow isn’t conducive to whipping. When I first tried to put something together a couple of weeks ago, it looked to me as if I was going to have to painstakingly extract every piece of information from the selected contact, create variables to store them in, and then put those variables together into a new string of text. So I gave up.

Yesterday I decided to ask for help.

I would like to extract from a selected contact a standard name/address block as plain text:

Full Name
Company
City, ST Zip

I don’t think Contacts or Interact do this. Does anything?
— Dr. Drang (@drdrang) Fri Feb 9 2018 9:37 PM

As you can see, I asked for something a bit simpler than what I really wanted, and I was kind of expecting suggestions for an app that would do the trick. But I soon got a response from Ari Weinstein with a Workflow solution:

Since Ari is a co-developer of Workflow, I kind of figured he knew what he was talking about. But I didn’t, and it’s because I didn’t appreciate Workflow’s magic variables. I’ve always thought of Workflow as being almost like a functional language, where each action transforms the data passed to it and sends the result along to the next action in turn. That, at least, is what I thought happened when the actions are connected by lines.

Which is why I didn’t understand Ari’s workflow at first. I figured that if it was extracting the Street Address in the second step, there’d be no way for it to get ahold of the Name and Company in the fourth step. What I didn’t appreciate was that there can be side effects the usual view of a workflow doesn’t show you. In this case, the Contact that’s selected in the first step is saved to a magic variable (called “Contact”) that remains available for use in later steps. So the third and fourth steps have access to all the Contact information even after the extraction of the Street Address in the second step.

Ari’s sample is a standard workflow that would have to be run from within Workflow itself or from a launcher app like Launch Center Pro. I was thinking about how to turn it into an Action Extension that could be called from within Contacts when I noticed I had a Twitter reply from Federico Viticci:

His suggestion is set up as an Action Extension that accepts only Contacts and extracts the info from the Workflow Input magic variable. Just what I was going to do.

“My” final workflow, called LaTeX Address, combines what I learned from Ari and Federico and adds some search-and-replace stuff to handle the LaTeX-specific parts:

The first two steps create a text variable named Ret that consists of a single line break. We’ll see why I needed it in a bit.

Steps 3–5 are the Ari/Federico mashup. I couldn’t use Federico’s suggestion to just add Workflow input:Street Address to the end of the block because my contacts usually include the country, even though the country is almost always the US, and I didn’t want that at the end of the block. At some point, I’ll improve this by writing up a filter that deletes the country line only if it’s the US, but this will do until I get another job with a non-US client.

Step 6 escapes the ampersands, and Step 7 adds the double backslashes to the ends of each line. You need four backslashes to get two in the output because regexes need two to produce one. I thought I could use \n at the end of the replacement string to get a line break, but I couldn’t get that to work. Thus, the Ret variable defined at the beginning of the workflow.
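The backslash doubling is easier to see in code. Here’s a sketch of Steps 6 and 7 using Python’s re module (the workflow does this in Workflow’s text actions, of course; the Python is just for illustration):

```python
import re

# A sketch only; the workflow does this with Workflow's text actions.
text = "John Cheatham\nDewey, Cheatham & Howe\n1515 Loquitor Lane"

# Step 6: escape the ampersands.
escaped = text.replace('&', r'\&')

# Step 7: append LaTeX line breaks. The replacement needs four source
# backslashes to yield the two literal backslashes LaTeX wants, because
# the replacement string consumes one level of escaping itself.
result = re.sub(r'(.+)\n', r'\1\\\\' + '\n', escaped)
print(result)
```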

Finally, Step 8 puts the text on the clipboard, ready for pasting into a LaTeX document.

My plan is to use this extension in Split View, with my text editor, currently Textastic, on one side and Contacts on the other. When I need to insert the client info, I find it in Contacts, tap Share Contact to bring up the Sharing Sheet, and select the Run Workflow action.

This brings up the list of Workflow Action Extensions that can accept Contacts. I choose LaTeX Address from the list, switch focus back to Textastic, and paste the text block where it belongs. Boom.

I’ll try to remember to look for magic variables the next time I make a workflow. There is a trick to making them visible. When you’re editing a workflow and can insert a variable (magic or otherwise), a button with a magic wand will appear in the special keyboard row.

Tapping it will give you a new view of your workflow, with the magic variables appearing where the workflow creates them.

You don’t need to do this, as all of these variables should appear in the special keyboard row if you keep scrolling it to the right. But I find it easier to understand what they are and where they come from in this view.

Thanks to everyone who had suggestions for me, especially Ari and Federico.


Permalink - Posted on 2018-02-04 18:52

As promised, or threatened, here’s my setup for RSS feed reading. It consists of a few scripts that run periodically throughout the day on a server I control, one that’s accessible to me from any browser on any device. The idea is to have a system that fits the way I read and doesn’t rely on any particular service or company. If my current web host went out of business tomorrow, I could move this system to another and be back up and running in an hour or so—less time than it would take to research and decide on a new feed reading service.

The linchpin of the system is the getfeeds script:

python:
1:  #!/usr/bin/env python
2:  # coding=utf8
3:
4:  import feedparser as fp
5:  import time
6:  from datetime import datetime, timedelta
7:  import pytz
8:  from collections import defaultdict
9:  import sys
10:  import dateutil.parser as dp
11:  import urllib2
12:  import json
13:  import sqlite3
14:  import urllib
15:
17:    add = 'insert into items (blog, id) values (?, ?)'
19:    db.commit()
20:
21:  jsonsubscriptions = [
22:    'http://leancrew.com/all-this/feed.json',
23:    'https://daringfireball.net/feeds/json',
24:    'https://sixcolors.com/feed.json',
25:    'https://www.robjwells.com/feed.json',
26:    'http://inessential.com/feed.json',
27:    'https://macstories.net/feed/json']
28:
29:  xmlsubscriptions = [
30:    'http://feedpress.me/512pixels',
31:    'http://alicublog.blogspot.com/feeds/posts/default',
32:    'http://blog.ashleynh.me/feed',
33:    'http://www.betalogue.com/feed/',
34:    'http://bitsplitting.org/feed/',
35:    'https://kieranhealy.org/blog/index.xml',
37:    'http://brett.trpstra.net/brettterpstra',
38:    'http://feeds.feedburner.com/NerdGap',
40:    'http://feeds.feedburner.com/CommonplaceCartography',
41:    'http://kk.org/cooltools/feed',
42:    'https://david-smith.org/atom.xml',
43:    'http://feeds.feedburner.com/drbunsenblog',
44:    'http://stratechery.com/feed/',
45:    'http://feeds.feedburner.com/IgnoreTheCode',
46:    'http://indiestack.com/feed/',
47:    'http://feeds.feedburner.com/theendeavour',
48:    'http://feed.katiefloyd.me/',
49:    'http://feeds.feedburner.com/KevinDrum',
52:    'http://www.macdrifter.com/feeds/all.atom.xml',
53:    'http://mackenab.com/feed',
56:    'http://themindfulbit.com/feed.xml',
57:    'http://merrillmarkoe.com/feed',
58:    'http://mjtsai.com/blog/feed/',
62:    'http://www.practicallyefficient.com/feed.xml',
63:    'http://www.red-sweater.com/blog/feed/',
64:    'http://blog.rtwilson.com/feed/',
65:    'http://feedpress.me/candlerblog',
66:    'http://inversesquare.wordpress.com/feed/',
67:    'http://joe-steel.com/feed',
68:    'http://feeds.veritrope.com/',
69:    'https://with.thegra.in/feed',
70:    'http://xkcd.com/atom.xml',
72:
73:  # Feedparser filters out certain tags and eliminates them from the
74:  # parsed version of a feed. This is particularly troublesome with
75:  # embedded videos. This can be fixed by changing how the filter
76:  # works. The following is based on these tips:
77:  #
78:  # http://rumproarious.com/2010/05/07/\
79:  #  universal-feed-parser-is-awesome-except-for-embedded-videos/
80:  #
81:  # http://stackoverflow.com/questions/30353531/\
83:  #
84:  # There is some danger here, as the included elements may contain
85:  # malicious code.
86:  fp._HTMLSanitizer.acceptable_elements |= {'object', 'embed', 'iframe'}
87:
88:  # Connect to the database of read posts.
90:  query = 'select * from items where blog=? and id=?'
91:
92:  # Collect all unread posts and put them in a list of tuples. The items
93:  # in each tuple are when, blog, title, link, body, n, and author.
94:  posts = []
95:  n = 0
96:
97:  # We're not going to accept items that are more than 3 days old, even
98:  # if they aren't in the database of read items. These typically come up
99:  # when someone does a reset of some sort on their blog and regenerates
100:  # a feed with old posts that aren't in the database or posts that are
101:  # in the database but have different IDs.
102:  utc = pytz.utc
103:  homeTZ = pytz.timezone('US/Central')
104:  daysago = datetime.today() - timedelta(days=3)
105:  daysago = utc.localize(daysago)
106:
108:  for s in jsonsubscriptions:
109:    try:
112:      blog = jfeed['title']
113:      for i in jfeed['items']:
114:        try:
115:          id = i['id']
116:        except KeyError:
117:          id = i['url']
118:
120:        match = db.execute(query, (blog, id)).fetchone()
121:        if not match:
122:          try:
123:            when = i['date_published']
124:          except KeyError:
125:            when = i['date_modified']
126:          when = dp.parse(when)
127:          when = utc.localize(when)
128:
129:          try:
130:            author = ' ({})'.format(i['author']['name'])
131:          except KeyError:
132:            author = ''
133:          try:
134:            title = i['title']
135:          except KeyError:
136:            title = blog
138:          body = i['content_html']
139:
140:          # Include only posts that are less than 3 days old. Add older posts
141:          # to the read database.
142:          if when > daysago:
143:            posts.append((when, blog, title, link, body, "{:04d}".format(n), author, id))
144:            n += 1
145:          else:
147:    except:
148:      pass
149:
151:  for s in xmlsubscriptions:
152:    try:
153:      f = fp.parse(s)
154:      try:
155:        blog = f['feed']['title']
156:      except KeyError:
157:        blog = "---"
158:      for e in f['entries']:
159:        try:
160:          id = e['id']
161:          if id == '':
163:        except KeyError:
165:
167:        match = db.execute(query, (blog, id)).fetchone()
168:        if not match:
169:
170:          try:
171:            when = e['published_parsed']
172:          except KeyError:
173:            when = e['updated_parsed']
174:          when =  datetime(*when[:6])
175:          when = utc.localize(when)
176:
177:          try:
178:            title = e['title']
179:          except KeyError:
180:            title = blog
181:          try:
182:            author = " ({})".format(e['authors'][0]['name'])
183:          except KeyError:
184:            author = ""
185:          try:
186:            body = e['content'][0]['value']
187:          except KeyError:
188:            body = e['summary']
190:
191:          # Include only posts that are less than 3 days old. Add older posts
192:          # to the read database.
193:          if when > daysago:
194:            posts.append((when, blog, title, link, body, "{:04d}".format(n), author, id))
195:            n += 1
196:          else:
198:    except:
199:      pass
200:
201:  # Sort the posts in reverse chronological order.
202:  posts.sort()
203:  posts.reverse()
205:  for p in posts:
207:
208:  # Create an HTML list of the posts.
209:  listTemplate = '''<li>
210:    <p class="title" id="{5}"><a href="{3}">{2}</a></p>
211:    <p class="info">{1}{6}<br />{0}</p>
212:    <p>{4}</p>
214:      <input type="hidden" name="blog" value="{8}" />
215:      <input type="hidden" name="id" value="{9}" />
217:    </form>
218:    <br />
220:      <input type="hidden" name="url" value="{11}" />
221:      <input type="hidden" name="title" value="{10}" />
222:      <input class="pinboard-field" type="text" name="tags" size="30" /><br />
223:      <input class="pinboard-button" type="submit" value="Pinboard" name="pbbutton{5}" />
224:    </form>
225:    </li>'''
226:  litems = []
227:  for p in posts:
228:    q = [ x.encode('utf8') for x in p[1:] ]
229:    timestamp = p[0].astimezone(homeTZ)
230:    q.insert(0, timestamp.strftime('%b %d, %Y %I:%M %p'))
231:    q += [urllib.quote_plus(q[1]),
232:          urllib.quote_plus(q[7]),
233:          urllib.quote_plus(q[2]),
234:          urllib.quote_plus(q[3])]
235:    litems.append(listTemplate.format(*q))
236:  body = '\n<hr />\n'.join(litems)
237:
239:  tocTemplate = '''<li class="toctitle"><a href="#{1}">{0}</a></li>\n'''
240:  toc = ''
242:  blogs.sort()
243:  for b in blogs:
244:    toc += '''<p class="tocblog">{0}</p>
246:    '''.format(b.encode('utf8'))
248:      q = [ x.encode('utf8') for x in p ]
249:      toc += tocTemplate.format(*q)
250:    toc += '</ul>\n'
251:
252:  # Print the HTML.
253:  print '''<html>
254:  <meta charset="UTF-8" />
255:  <meta name="viewport" content="width=device-width" />
257:  <style>
258:  body {{
259:    background-color: #555;
260:    width: 750px;
261:    margin-top: 0;
262:    margin-left: auto;
263:    margin-right: auto;
265:    font-family: Georgia, Serif;
266:  }}
267:  h1, h2, h3, h4, h5, h6 {{
268:    font-family: Helvetica, Sans-serif;
269:  }}
270:  h1 {{
271:    font-size: 110%;
272:  }}
273:  h2 {{
274:    font-size: 105%;
275:  }}
276:  h3, h4, h5, h6 {{
277:    font-size: 100%;
278:  }}
279:  .content {{
281:    background-color: white;
282:  }}
284:    list-style-type: none;
285:    margin: 0;
286:    padding: .5em 1em 1em 1.5em;
287:    background-color: white;
288:  }}
290:    margin-left: -.5em;
291:    line-height: 1.4;
292:  }}
294:    overflow: auto;
295:  }}
297:    overflow-wrap: break-word;
298:    word-wrap: break-word;
299:    word-break: break-word;
300:    -webkit-hyphens: auto;
301:    hyphens: auto;
302:  }}
304:    -webkit-margin-before: 0;
305:    -webkit-margin-after: 0;
306:    -webkit-margin-start: 0;
307:    -webkit-margin-end: 0;
308:  }}
309:  .title {{
310:    font-weight: bold;
311:    font-family: Helvetica, Sans-serif;
312:    font-size: 120%;
313:    margin-bottom: .25em;
314:  }}
315:  .title a {{
316:    text-decoration: none;
317:    color: black;
318:  }}
319:  .info {{
320:    font-size: 85%;
321:    margin-top: 0;
322:    margin-left: .5em;
323:  }}
324:  .tocblog {{
325:    font-weight: bold;
326:    font-family: Helvetica, Sans-serif;
327:    font-size: 100%;
328:    margin-top: .25em;
329:    margin-bottom: 0;
330:  }}
331:  .toctitle {{
332:    font-weight: medium;
333:    font-family: Helvetica, Sans-serif;
334:    font-size: 100%;
336:    text-indent: -.75em;
337:    margin-bottom: 0;
338:  }}
339:  .toctitle a {{
340:    text-decoration: none;
341:    color: black;
342:  }}
343:  .tocinfo {{
344:    font-size: 75%;
345:    margin-top: 0;
346:    margin-left: .5em;
347:  }}
348:  img, embed, iframe, object {{
349:    max-width: 700px;
350:  }}
351:  .mark-button {{
352:    width: 15em;
353:    border: none;
355:    color: black;
356:    background-color: #B3FFB2;
357:    text-align: center;
358:    padding: .25em 0 .25em 0;
359:    font-weight: bold;
360:    font-size: 1em;
361:  }}
362:  .pinboard-button {{
363:    width: 7em;
364:    border: none;
366:    color: black;
367:    background-color: #B3FFB2;
368:    text-align: center;
369:    padding: .25em 0 .25em 0;
370:    font-weight: bold;
371:    font-size: 1em;
372:    margin-left: 11em;
373:  }}
374:  .pinboard-field {{
375:    font-size: 1em;
376:    font-family: Helvetica, Sans-serif;
377:  }}
378:
379:  @media only screen
380:    and (max-width: 667px)
381:    and (-webkit-device-pixel-ratio: 2)
382:    and (orientation: portrait) {{
383:    body {{
384:      font-size: 200%;
385:      width: 640px;
386:      background-color: white;
387:    }}
389:      line-height: normal;
390:    }}
391:    img, embed, iframe, object {{
392:      max-width: 550px;
393:    }}
394:  }}
395:  @media only screen
396:    and (min-width: 668px)
397:    and (-webkit-device-pixel-ratio: 2) {{
398:    body {{
399:      font-size: 150%;
400:      width: 800px;
401:      background-color: #555;
402:    }}
404:      line-height: normal;
405:    }}
406:    img, embed, iframe, object {{
407:      max-width: 700px;
408:    }}
409:  }}
410:  </style>
411:
412:  <script language=javascript type="text/javascript">
414:    var mark = new XMLHttpRequest();
415:    mark.open(theForm.method, theForm.action, true);
416:    mark.send(new FormData(theForm));
418:      if (mark.readyState == 4 && mark.status == 200) {{
420:        var theButton = document.getElementsByName(buttonName)[0];
421:        theButton.value = "Marked!";
422:        theButton.style.backgroundColor = "#FFB2B2";
423:      }}
424:    }}
425:    return false;
426:  }}
427:
429:    var mark = new XMLHttpRequest();
430:    mark.open(theForm.method, theForm.action, true);
431:    mark.send(new FormData(theForm));
433:      if (mark.readyState == 4 && mark.status == 200) {{
434:        var buttonName = theForm.name.replace("pbform", "pbbutton");
435:        var theButton = document.getElementsByName(buttonName)[0];
436:        theButton.value = "Saved!";
437:        theButton.style.backgroundColor = "#FFB2B2";
438:      }}
439:    }}
440:    return false;
441:  }}
442:
443:  </script>
444:
447:  <body>
448:  <div class="content">
450:  {}
451:  </ul>
452:  <hr />
453:  <a name="start" />
455:  {}
456:  </ul>
457:  </div>
458:  </body>
459:  </html>
460:  '''.format(toc, body)


For me, this is a very long script, but most of it is just the HTML template. What getfeeds does is go through my subscription list, gather all the articles from those feeds that I haven’t already read, and generate a static HTML file with the unread articles laid out in reverse chronological order. At the end of each article, it puts a button to mark the article as read and a form for adding a link to the article to my account at Pinboard.

Start by noticing that this is a Python 2 script, so Line 2 is a comment that tells Python that UTF-8 characters will be in the source code. We’ll also run into decode/encode invocations that wouldn’t be necessary if I’d written this in Python 3. I suppose I’ll translate it at some point.

Lines 16–19 are a function for adding an article to the database of read items. This is an SQLite database that’s also kept on the server. The database has a single table whose schema consists of just two fields: the blog name and the article GUID. Each article that I’ve marked as read gets entered as a new record in the database. The addItem function runs a simple SQL insertion command via Python’s sqlite3 library.
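As a sketch (the table and column names are inferred from the SQL strings in the script, and I’m using an in-memory database rather than the file on the server), the database and addItem work like this:

```python
import sqlite3

# Schema inferred from the insert/select strings in getfeeds:
# one table, two text columns for the blog name and article GUID.
db = sqlite3.connect(':memory:')   # the real script opens a file on the server
db.execute('create table items (blog text, id text)')

def addItem(db, blog, id):
    # The addItem function described above: a parameterized insert.
    add = 'insert into items (blog, id) values (?, ?)'
    db.execute(add, (blog, id))
    db.commit()

addItem(db, 'Example Blog', 'https://example.com/post-1')

# The membership query used later to skip already-read articles.
query = 'select * from items where blog=? and id=?'
match = db.execute(query, ('Example Blog', 'https://example.com/post-1')).fetchone()
print(match)
```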

Lines 21–27 and 29–71 define my subscriptions: two lists of feed URLs, one for JSON feeds and the other for traditional RSS/Atom feeds. A lot of these feeds have gone silent over the past year, but I remain subscribed to them in the hope that they’ll come back to life.

Line 86 sets a parameter in the feedparser library that relaxes some of the filtering that library does by default. There is some danger to this, but I’ve found that some blogs are essentially worthless if I don’t do this. The comments above Line 86 contain links to discussions of feedparser’s filtering.

Lines 89–90 connect to the database of read items (note the fake path to the database file) and create a query string that we’ll use later to determine whether an article is in the database.

Lines 94–95 initialize the list of posts that will ultimately be turned into the HTML page and the n variable that keeps track of the post count.

Lines 102–105 initialize a set of variables used to handle timezone information and the filtering of older articles that aren’t in the database of read items. As discussed in the comments above Line 102 and in my previous post, old articles that aren’t in the database can sometimes appear in a blog’s RSS feed when the blog gets updated.
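A stdlib-only sketch of the cutoff (the script itself uses pytz to the same effect):

```python
from datetime import datetime, timedelta, timezone

# Stdlib-only sketch; getfeeds uses pytz for the same effect.
daysago = datetime.now(timezone.utc) - timedelta(days=3)

def is_fresh(when):
    # True if a (timezone-aware) article date is under three days old.
    return when > daysago

recent = datetime.now(timezone.utc) - timedelta(hours=12)
stale = datetime.now(timezone.utc) - timedelta(days=10)
print(is_fresh(recent), is_fresh(stale))  # True False
```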

Lines 108–148 assemble the unread articles from the JSON feeds. For each subscription, the feed is downloaded, converted into a dictionary, and looped through to extract information on each article. Articles that are in the database of read items are ignored (Lines 120–121). Articles that aren’t in the database are appended to the posts list, unless they’re more than three days old, in which case they are added to the database of read items instead of to posts (Lines 142–146).

Much of Lines 108–148 is devoted to error handling and the normalization of disparate input into a uniform output. Each item of the posts list is a tuple with

• the article date,
• the blog name,
• the article title,
• the article URL,
• the article content,
• the running count of posts,
• the article author, and
• the article GUID.
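The fallback logic for a single JSON feed item can be sketched with a plain dictionary standing in for a downloaded item (the keys are the JSON Feed fields the script reads):

```python
# A plain dict stands in for one downloaded JSON feed item. This one is
# missing 'id', 'date_published', and 'title', so every fallback fires.
item = {'url': 'https://example.com/post',
        'date_modified': '2018-02-04T18:52:00',
        'content_html': '<p>Hello</p>'}
blog = 'Example Blog'

try:
    id = item['id']
except KeyError:
    id = item['url']            # fall back to the URL as the GUID

try:
    when = item['date_published']
except KeyError:
    when = item['date_modified']

try:
    title = item['title']
except KeyError:
    title = blog                # untitled posts get the blog's name

print(id, when, title)
```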

Lines 151–199 do for RSS/Atom feeds what Lines 108–148 do for JSON feeds. The main difference is that the feedparser library is used to download and convert the feed into a dictionary.

Lines 202–203 sort the posts in reverse chronological order. This is made easy by my choice to put the article date as the first item in the tuple described above.
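Tuple comparison is what makes this work; a toy sketch:

```python
from datetime import datetime

# Toy three-field tuples (the real ones carry eight fields). Python
# compares tuples element by element, so putting the date first sorts
# the whole list chronologically; reversing makes it newest-first.
posts = [(datetime(2018, 2, 4), 'blog A', 'older post'),
         (datetime(2018, 2, 16), 'blog B', 'newer post')]
posts.sort()
posts.reverse()
print(posts[0][2])  # newer post
```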

Lines 204–206 generate a dictionary of lists of tuples, toclinks, for the HTML page’s table of contents, which appears at the top of the page. A table of contents isn’t really necessary, but I like seeing an overview of what’s available before I start reading. The keys of the dictionary are the blog names, and each tuple in the list consists of the article’s title and its number, as given in the running post count, n. The number will be used to create internal links in the HTML page.
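A sketch of building toclinks with collections.defaultdict, which the script imports at the top (toy values here):

```python
from collections import defaultdict

# Sketch of building toclinks: blog name -> list of (title, number)
# tuples, as described above. Toy values, not real feed data.
posts = [('Blog A', 'First post', '0000'),
         ('Blog B', 'Another post', '0001'),
         ('Blog A', 'Second post', '0002')]

toclinks = defaultdict(list)
for blog, title, n in posts:
    toclinks[blog].append((title, n))

print(dict(toclinks))
```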

From this point on, it’s all HTML templating. I suppose I could’ve used one of the myriad Python libraries for this, but I didn’t feel like doing the research to figure out which would be best for my needs. The ol’ format command works pretty well.

Lines 209–225 define the template for each article. It starts with the title (which links to the original article), the date, and the author. The id attribute in the title provides the internal target for the link in the table of contents. After the post contents come two forms. The first has two hidden fields with the blog name and the article GUID and a visible button that marks the article as read. The second form has the same hidden fields, a visible text field for Pinboard tags, and a button to add a link to the original article to my Pinboard list. We’ll see later how these buttons work.

Lines 227–236 run all of the posts through their template and concatenate them into one long stretch of HTML that will make up the bulk of the body of the page.
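Here’s a cut-down sketch of the templating with made-up values (the real template has twelve numbered fields plus the two forms):

```python
# A cut-down version of the article template: one format string with
# numbered fields, filled from a list with format(*q). Made-up values.
listTemplate = '''<li>
  <p class="title" id="{5}"><a href="{3}">{2}</a></p>
  <p class="info">{1}{6}<br />{0}</p>
  <p>{4}</p>
  </li>'''

q = ['Feb 04, 2018 06:52 PM',        # 0: formatted date
     'Example Blog',                 # 1: blog name
     'A post',                       # 2: article title
     'https://example.com/post',     # 3: article URL
     '<p>Body</p>',                  # 4: article content
     '0000',                         # 5: running count (internal anchor)
     ' (An Author)']                 # 6: author
html = listTemplate.format(*q)
print(html)
```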

Line 239 defines a template for a table of contents entry (note the internal link), and Lines 240–250 then use that template to assemble the toclinks dictionary into the HTML for the table of contents.

The last piece, Lines 253–460, assembles and outputs the final, full HTML file. It’s as long as it is because I wanted a single, self-contained file with all the CSS and JavaScript in it. I’m sure this doesn’t comport with best practices, but I’ve noticed that best practices in web programming and design change more often than I have time to keep track of. Whenever I need to change something, I know it’ll be here in getfeeds.

The CSS is in Lines 257–410 and is set up to look decent (to me) on my computer, iPad, and iPhone. There’s a lot I don’t know about responsive web design, and I’m sure it shows here.

Lines 412–426 and Lines 428–441 define the markAsRead and addToPinboard JavaScript functions, which are activated by the buttons described above. These are basic AJAX functions that do not rely on any outside library. They’re based on what I read in David Flanagan’s JavaScript: The Definitive Guide and, I suspect, a Stack Overflow page or two that I forgot to preserve the links to. There’s a decent chance they don’t work in Internet Explorer, which I will worry about in the next life.

The markAsRead function triggers this addreaditem.py script on the server:

python:
1:  #!/usr/bin/python
2:  # coding=utf8
3:
4:  import sqlite3
5:  import cgi
6:  import sys
7:  import urllib
8:  import cgitb
9:
11:    add = 'insert into items (blog, id) values (?, ?)'
13:    db.commit()
14:
15:  def markedItem(db, blog, id):
16:    check = 'select * from items where blog=? and id=?'
17:    return db.execute(check, (blog, id)).fetchone()
18:
19:  # Connect to database of read items
21:
22:  # Get the item from the request and add it to the database
23:  form = cgi.FieldStorage()
24:  blog = urllib.unquote_plus(form.getvalue('blog')).decode('utf8')
25:  id = urllib.unquote_plus(form.getvalue('id')).decode('utf8')
26:  if markedItem(db, blog, id):
28:  else:
31:
32:  minimal='''Content-Type: text/html
33:
34:  <html>
37:  <body>
38:    <h1>{}</h1>
39:  </body>
41:
42:  print(minimal)


There’s not much to this script. It uses the same addItem function we saw before and a markedItem function that uses the same query we saw earlier to check whether an item is in the database. Lines 23–30 get the input from the form that called it, check whether that item is already in the database, and add it if it isn’t. There’s some minimal HTML for output, but that’s of no importance. What matters is that if the script returns a success, the markAsRead function changes the color of the button from green to red and the text of the button from “Mark as read” to “Marked!”
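The check-then-insert can be sketched like this (a Python 3 sketch with an in-memory database, just for illustration):

```python
import sqlite3

# Python 3 sketch with an in-memory database; the point is that marking
# the same article read twice doesn't create duplicate rows.
db = sqlite3.connect(':memory:')
db.execute('create table items (blog text, id text)')

def markedItem(db, blog, id):
    check = 'select * from items where blog=? and id=?'
    return db.execute(check, (blog, id)).fetchone()

def addItem(db, blog, id):
    add = 'insert into items (blog, id) values (?, ?)'
    db.execute(add, (blog, id))
    db.commit()

if not markedItem(db, 'Example Blog', 'post-1'):
    addItem(db, 'Example Blog', 'post-1')
if not markedItem(db, 'Example Blog', 'post-1'):   # second time: a no-op
    addItem(db, 'Example Blog', 'post-1')

count = db.execute('select count(*) from items').fetchone()[0]
print(count)  # 1
```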

[Screenshots: the green “Mark as read” button before, and the red “Marked!” button after.]

The addToPinboard JavaScript function does essentially the same thing, except it triggers this addpinboarditem.py script on the server:

python:
1:  #!/usr/bin/python
2:  # coding=utf8
3:
4:  import cgi
5:  import pinboard
6:  import urllib
7:
8:  # Pinboard token
9:  token = 'myPinboardName:myPinboardToken'
10:
11:  # Get the page info from the request
12:  form = cgi.FieldStorage()
13:  url = urllib.unquote_plus(form.getvalue('url')).decode('utf8')
14:  title = urllib.unquote_plus(form.getvalue('title')).decode('utf8')
15:  tagstr = urllib.unquote_plus(form.getvalue('tags')).decode('utf8')
16:  tags = tagstr.split()
17:
18:  # Add the item to Pinboard
19:  pb = pinboard.Pinboard(token)
20:  result = pb.posts.add(url=url, description=title, tags=tags)
21:  if result:
22:    report = 'Saved'
23:  else:
24:    report = 'Failed'
25:
26:  minimal='''Content-Type: text/html
27:
28:  <html>
29:  <head>
30:  </head>
31:  <body>
32:    <h1>{}</h1>
33:  </body>
34:  </html>'''.format(report)
35:
36:  print(minimal)


This script uses the Pinboard API to add a link to the original article. Line 9 defines my Pinboard credentials. Lines 12–16 extract the article and tag information from the form. Lines 19–24 connect to Pinboard and add the item to my list. If the script returns a success, the addToPinboard function changes the color of the button from green to red and the text of the button from “Pinboard” to “Saved!”

Before:

After:
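A side note on the unquote_plus/decode pairs in both scripts: they just reverse the encoding the JavaScript side applied before sending the request. The scripts above are Python 2; in Python 3 terms the round trip looks something like this (the sample title is made up):

```python
from urllib.parse import quote_plus, unquote_plus

# What the browser-side JavaScript would send as a form value...
title = 'A fine blog post, with commas & spaces'
encoded = quote_plus(title)

# ...and what the CGI script recovers. In Python 3, unquote_plus
# returns a str directly, so there's no separate .decode('utf8') step.
decoded = unquote_plus(encoded)
```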

The overall system is controlled by this short shell script, runrss.sh:

bash:
1:  #!/bin/bash
2:
3:  /path/to/getfeeds > /other/path/to/temp-rss.html
4:  cd /other/path/to
5:  mv temp-rss.html rsspage.html


Line 3 runs the getfeeds script, sending the HTML output to a temporary file. Line 4 then changes to the directory that contains the temporary file, and Line 5 renames it. The file I direct my browser to is rsspage.html. This seeming extra step with the temporary file is there because the getfeeds script takes several seconds to run, and if I sent its output directly to rsspage.html, that file would be in a weird state during that run time. I don’t want to browse the page when it isn’t finished.

Finally, runrss.sh is executed periodically throughout the day by cron. The crontab entry is

*/20 0,6-23 * * * /path/to/runrss.sh


This runs the script every 20 minutes from 6:00 am until just past midnight every day.

So that’s it. Three Python scripts, one of which is long but mostly HTML templating, a short shell script, and a crontab entry. Was it easier to do this than set up a Feedbin (or whatever) account? Of course not. But I won’t have to worry if I see that Feedbin’s owners have written a Medium post.

[If the formatting looks odd in your feed reader, visit the original article]

Permalink - Posted on 2018-02-02 02:28

I had a bit of shock this afternoon when I opened my RSS feed reader to see if anything was new.

Not much new, but a lot that’s old. Over 1400 posts from Kieran Healy, holder of the Krzyzewski Chair in Sociological R at the second best basketball university in North Carolina and author of a much-anticipated forthcoming book on how to make good graphs.

What happened? I don’t know for sure, but something in Kieran’s site generation software decided to include every post he’s written in his blog’s RSS feed. It’s an impressive body of work, going back to 2002, but I didn’t have time during my lunch hour to read it all.

My homemade feed reader works like this. For every site I subscribe to, it

1. fetches and parses the site’s feed;

2. checks each article against a SQLite database of articles I’ve already read; and

3. collects the articles that aren’t in the database.

After going through all the subscriptions, the script sorts the unread articles in alphabetical order and arranges them in a static HTML page on my server, adding a table of contents to the top of the page. The script runs via a cron job a few times an hour from 6:00 am until midnight.

So many of Kieran’s posts appeared today because my database of read posts is relatively young and only the last dozen or so of his articles are in it. It was all the earlier ones that were on my feed reader page.

This is my fault, not Kieran’s. I knew perfectly well when I wrote my script that blogging software will sometimes regenerate its feed with all new GUIDs for each article. When this happens, it makes the articles look new to the feed reader. I’d seen this happen even back when I was using professionally written feed reading apps. What made this especially troublesome for my definitely-not-professionally-written feed reading system was that it’s not equipped with a “Mark all as read” button. Which gave me four choices:

1. Do the programming to add a “Mark all as read” button, something I will almost never use.
2. Go through and individually mark all 1400 old posts as read so they get entered into the database and don’t appear again. Fat chance.
3. Figure out another way to add all these posts to the database.
4. Change my feed reading script to just ignore articles that are more than a few days old, regardless of whether they’re in the database.

I chose #4 because it was the quickest to implement and should protect me against this kind of thing happening again. Kieran’s older posts disappeared from my feed reading page, and my blog reading went back to normal. Afterward, though, I realized that I could have implemented #3 in combination with #4, ignoring the older articles for the purposes of assembling the feed reading page but adding them to the database of read articles to give me added protection against seeing them pop up again.
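That #3-plus-#4 combination is simple to sketch. Assuming each parsed article carries a publication date, the feed script can skip anything older than a cutoff while still recording it as read. The function and field names here are hypothetical, not lifted from my actual script:

```python
import sqlite3
from datetime import datetime, timedelta

def fresh_unread(articles, db, days=4):
    'Return recent unread articles; silently mark older ones as read.'
    cutoff = datetime.now() - timedelta(days=days)
    keep = []
    for a in articles:
        seen = db.execute('select 1 from items where blog=? and id=?',
                          (a['blog'], a['id'])).fetchone()
        if seen:
            continue
        if a['date'] < cutoff:
            # Choice #3: too old to show, but record it so a future
            # GUID reshuffle can't make it look new again (choice #4)
            db.execute('insert into items (blog, id) values (?, ?)',
                       (a['blog'], a['id']))
        else:
            keep.append(a)
    db.commit()
    return keep

db = sqlite3.connect(':memory:')
db.execute('create table items (blog text, id text)')
articles = [
    {'blog': 'kieran', 'id': 'from-2002', 'date': datetime(2002, 5, 1)},
    {'blog': 'kieran', 'id': 'today', 'date': datetime.now()},
]
shown = fresh_unread(articles, db)
```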

I’ll try to get that working in the next day or two and then post the script in its final form. I doubt that many people really want to set up their own feed reading system, but you never know.

[If the formatting looks odd in your feed reader, visit the original article]

# Subplots, axes, Matplotlib, OmniGraffle, and LaTeXiT

Permalink - Posted on 2018-01-30 03:36

When I learn something new in Matplotlib, I usually write a short post about it to reinforce what I’ve learned and to give me a place to look it up when I need to do it again. In my section properties post from last week, I had a 2×2 set of plots that helped explain which arctangent result I wanted to choose under different circumstances.

Here’s the plot:

And here’s the code that made most of it:

python:
1:  #!/usr/bin/env python
2:
3:  import matplotlib.pyplot as plt
4:  import numpy as np
5:
6:  x = np.linspace(-3, 3, 101)
7:  y1 = (10+6)/2 - (10-6)/2*np.cos(2*x) - 3*np.sin(2*x)
8:  y2 = (10+6)/2 - (6-10)/2*np.cos(2*x) - 3*np.sin(2*x)
9:  y3 = (10+6)/2 - (6-10)/2*np.cos(2*x) + 3*np.sin(2*x)
10:  y4 = (10+6)/2 - (10-6)/2*np.cos(2*x) + 3*np.sin(2*x)
11:
12:  f, axarr = plt.subplots(2, 2, figsize=(8, 8))
13:  axarr[0, 0].plot(x, y2, lw=2)
14:  axarr[0, 0].axhline(y=2, color='k', lw=1)
15:  axarr[0, 0].axvline(x=0, color='k')
16:  axarr[0, 0].set_ylim(0, 12)
17:  axarr[0, 0].set_xticks([])
18:  axarr[0, 0].set_yticks([])
19:  axarr[0, 0].set_frame_on(False)
20:
21:  axarr[0, 1].plot(x, y1, lw=2)
22:  axarr[0, 1].axhline(y=2, color='k', lw=1)
23:  axarr[0, 1].axvline(x=0, color='k')
24:  axarr[0, 1].set_ylim(0, 12)
25:  axarr[0, 1].set_xticks([])
26:  axarr[0, 1].set_yticks([])
27:  axarr[0, 1].set_frame_on(False)
28:
29:  axarr[1, 0].plot(x, y3, lw=2)
30:  axarr[1, 0].axhline(y=2, color='k', lw=1)
31:  axarr[1, 0].axvline(x=0, color='k')
32:  axarr[1, 0].set_ylim(0, 12)
33:  axarr[1, 0].set_xticks([])
34:  axarr[1, 0].set_yticks([])
35:  axarr[1, 0].set_frame_on(False)
36:
37:  axarr[1, 1].plot(x, y4, lw=2)
38:  axarr[1, 1].axhline(y=2, color='k', lw=1)
39:  axarr[1, 1].axvline(x=0, color='k')
40:  axarr[1, 1].set_ylim(0, 12)
41:  axarr[1, 1].set_xticks([])
42:  axarr[1, 1].set_yticks([])
43:  axarr[1, 1].set_frame_on(False)
44:
45:  f.savefig('quadrants.pdf', format='pdf')


What was new to me was the use of the pyplot.subplots function to generate both the overall figure and the grid of subplots in one fell swoop. It’s possible that this technique was new to me because the documentation for Matplotlib’s Pyplot API doesn’t contain an entry for subplots.1 I don’t remember where I first learned about it—Stack Overflow would be a good guess—but I’ve since learned that pyplot.subplots is basically a combination of pyplot.figure and Figure.subplots.

Lines 6–10 define the four functions to be plotted. The x values are the same for each and the y values are named according to the quadrant they’re going to appear in. The y values are defined so the moments and product of inertia match the annotations shown in the graph. The actual numbers used in these definitions are less important than their signs and their relative magnitudes, as the plots are intended to be generic.

Line 12 then defines the figure and the array of “axes,” where you have to remember that Matplotlib unfortunately uses that word in a way that doesn’t fit the rest of the world’s usage. In Matplotlib, “axes” is usually treated as a singular noun and refers to the area of an individual plot. After Line 12, the axarr variable is a 2×2 array of Matplotlib axes.

Lines 13–19 then define the subplot in the upper left quadrant (what you learned as Quadrant II in analytic geometry class). Line 19 turns off the usual plot frame, and Lines 17–18 ensure there are no tick marks or labels. Lines 14–15 draw the $x$ and $y$ axes (here I’m using the normal definition of the word). You’ll notice that I’ve drawn the $x$ axis at $y = 2$ instead of $y = 0$. I didn’t like the way the graphs looked with the $x$ axis lower, so I moved it up. Again, this doesn’t change the meaning behind the graph because it’s generic.

The rest of the lines down through 43 are just repetitions for the other quadrants. Finally, Line 45 saves the figure to a PDF file that looks like this:

Now it’s time to annotate the figure. In theory, I could do this in Matplotlib, but that’s a lot of programming for something that’s more visual than algorithmic. If I were making dozens of these figures, I’d probably invest the time in annotating them in Matplotlib, but for a one-off it’s much faster to do it in OmniGraffle.

I can open the PDF directly in OmniGraffle and start editing. First, I select the white background rectangle that’s usually included in files like this and delete it. It doesn’t add anything, and it’s too easy to select by mistake. Then I select all the axes (again, the usual definition) and add the arrowheads.

The command is very helpful in selecting repeated elements like this.

After placing red circles at the maxima, it was time to label the axes (yes, usual definition; we’re out of Matplotlib now) and add the annotations. I made the annotations in LaTeXiT, a very nice little program for generating equations to be pasted into graphics programs. I’ve been using it for ages.

LaTeXiT cleverly ties into your existing LaTeX installation, so you can take advantage of all the packages you’re used to having available. I usually have LaTeXiT use the Arev package because I like its sans-serif look in figures.

After adding all the annotations, I export the figure from OmniGraffle as a PNG, run it through OptiPNG to save a little bandwidth, and upload it to the server. If this were a figure for a report instead of the blog, I’d export it as a PDF.

1. I’ve complained about Matplotlib’s documentation before, so I’ll spare you the rant this time.

[If the formatting looks odd in your feed reader, visit the original article]

# Canvas and my remote iPad

Permalink - Posted on 2018-01-27 15:48

My older son’s notebook computer, an Asus bought a couple of years ago, has developed a hinge problem that’s reached the point where he doesn’t want to take it to class for fear of it falling apart. After talking over his needs, we decided he could get through the semester with my old MacBook Air. So I set it up for a new user, moved all of my files to an external disk, and delivered it to him yesterday. Coincidentally, on the drive back up through central Illinois, I listened to an episode of Canvas that gave a decent explanation of why I could give up my notebook computer.

You could, of course, argue that every episode of Canvas is an explanation of how you can give up your notebook computer. It’s the podcast in which Federico Viticci and Fraser Speirs cover the software and work habits that allow you to use your iOS devices (especially the iPad) to accomplish things you might otherwise think you need a “real computer” to do. But Episode 52 was especially apropos because it covered SSH clients for iOS, which are the reason I feel comfortable in my current state, without a laptop computer for the first time in maybe 25 years or more.

I held off getting an iPad for several years, not because I thought it was a toy or a “consumption only” device, but because my work habits—lots of scripting and command-line use in a multi-window environment—weren’t aligned with the iPad’s strengths. I like to think I wasn’t an anti-iPad zealot during this time. I saw it as the perfect computer for many people, including my wife. I got her an iPad 2 back in 2011; she hasn’t touched a “real computer” since.

So when Split Screen and the iPad Pro were introduced, my ears pricked up. I got the 9.7″ model in late 2016 and have been slowly figuring out how to work with it. Panic’s Prompt and, more recently, the mosh1 client Blink Shell are my key apps. My typical setup is to have one of them on the right in Split Screen, connected to my iMac, while I edit in Textastic on the left. This edit/test system on my iPad is very similar to the BBEdit/Terminal window arrangement I use when working on a Mac.

The irony of using a modern, highly graphical device like the iPad to handle a remote, command-line connection to another computer is not lost on me. I often think back to using the Hazeltine terminal that was in a room around the corner from my graduate school office to connect to a Cyber 175 mainframe. And when my iPad is tethered to my iPhone, it’s not unlike using the Hazeltine’s acoustic coupler.

1. Mosh, the mobile shell, is a secure connection like SSH, but is designed to handle more gracefully the interrupted communications common to mobile connections.

[If the formatting looks odd in your feed reader, visit the original article]

# Transforming section properties and principal directions

Permalink - Posted on 2018-01-26 02:26

The Python section module has one last function we haven’t covered: the determination of the principal moments of inertia and the axes associated with them. We’ll start by looking at how the moments and products of inertia change with our choice of axes.

The formula for area,

$$A = \int_A dA,$$

will give the same answer regardless of where we put the origin of the $x\text{-}y$ coordinate system or how we orient the axes. You can see this if you think of the area as being the sum of all the little $dx\, dy$ squares in the cross-section.

The formulas for the location of the centroid,

$$x_c = \frac{1}{A}\int_A x\, dA, \qquad y_c = \frac{1}{A}\int_A y\, dA,$$

will give different answers for different positions and orientations of the $x$ and $y$ axes, but those answers will all correspond to the same physical point of the cross-section.

The moments and product of inertia as we defined them, relative to the centroid,

$$I_{xx} = \int_A (y - y_c)^2\, dA, \qquad I_{yy} = \int_A (x - x_c)^2\, dA, \qquad I_{xy} = \int_A (x - x_c)(y - y_c)\, dA,$$

do not depend on the position of the $x\text{-}y$ origin (because $x - x_c$ and $y - y_c$ measure the horizontal and vertical distances away from the centroid, which is the same for any origin), but do depend on the orientation of the axes. We’ll show how this works by putting the origin at the centroid (which simplifies the math but does not make the results any less general) and comparing the moments and product of inertia for two coordinate systems, one of which is rotated relative to the other.

Note that $\theta$ is the angle from the $x$ axis to the $\xi$ axis and is positive in the counterclockwise direction.1

Because our origin is at the centroid, $x_c = y_c = \xi_c = \eta_c = 0$, and we can write the equations for the moments and products of inertia in a more compact form:

In the $x\text{-}y$ system,

$$I_{xx} = \int_A y^2\, dA, \qquad I_{yy} = \int_A x^2\, dA, \qquad I_{xy} = \int_A xy\, dA,$$

and in the $\xi\text{-}\eta$ system,

$$I_{\xi\xi} = \int_A \eta^2\, dA, \qquad I_{\eta\eta} = \int_A \xi^2\, dA, \qquad I_{\xi\eta} = \int_A \xi\eta\, dA.$$

We can go back and forth between the two coordinate systems by noting that

$$\xi = x\cos\theta + y\sin\theta, \qquad \eta = y\cos\theta - x\sin\theta.$$

Thus,

$$I_{\xi\xi} = \int_A \left(y\cos\theta - x\sin\theta\right)^2 dA.$$

The $\theta$ terms can come out of the integrals, leaving us with

$$I_{\xi\xi} = \cos^2\theta \int_A y^2\, dA - 2\sin\theta\cos\theta \int_A xy\, dA + \sin^2\theta \int_A x^2\, dA,$$

or

$$I_{\xi\xi} = I_{xx}\cos^2\theta - 2 I_{xy}\sin\theta\cos\theta + I_{yy}\sin^2\theta.$$

Similarly,

$$I_{\eta\eta} = I_{xx}\sin^2\theta + 2 I_{xy}\sin\theta\cos\theta + I_{yy}\cos^2\theta$$

and

$$I_{\xi\eta} = \left(I_{xx} - I_{yy}\right)\sin\theta\cos\theta + I_{xy}\left(\cos^2\theta - \sin^2\theta\right).$$
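In code, those transformation formulas are just trig combinations. Here’s a sketch (the rotated function is my name for it, not part of the section module); one easy check is that $I_{\xi\xi} + I_{\eta\eta} = I_{xx} + I_{yy}$ for any $\theta$:

```python
from math import sin, cos, radians

def rotated(Ixx, Iyy, Ixy, theta):
    'Moments and product of inertia about axes rotated theta radians CCW.'
    c, s = cos(theta), sin(theta)
    Ixixi = Ixx*c**2 + Iyy*s**2 - 2*Ixy*s*c
    Ietaeta = Ixx*s**2 + Iyy*c**2 + 2*Ixy*s*c
    Ixieta = (Ixx - Iyy)*s*c + Ixy*(c**2 - s**2)
    return Ixixi, Ietaeta, Ixieta

# Arbitrary test values; the sum of the two moments is invariant
a, b, ab = rotated(10, 6, -3, radians(30))
```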

So far, this is just a bunch of algebra that could’ve been done quickly in SymPy. Now it’s time to start thinking.

Looking at the expression for $I_{\xi\eta}$, you might notice that each term includes parts from the double angle formulas. So we can rewrite it this way:

$$I_{\xi\eta} = \frac{I_{xx} - I_{yy}}{2}\sin 2\theta + I_{xy}\cos 2\theta.$$

Note that $I_{\xi\eta} = 0$ when

$$\frac{I_{xx} - I_{yy}}{2}\sin 2\theta = -I_{xy}\cos 2\theta,$$

or

$$\tan 2\theta = \frac{2 I_{xy}}{I_{yy} - I_{xx}}.$$

Because the tangent function repeats itself every 180°, this expression can be solved with an infinite number of values of $\theta$ that are 90° apart from one another. These orientations all look basically the same, except the $\xi$ and $\eta$ axes swap positions and flip around. For each of them, $I_{\xi\eta} = 0$.
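You can verify the 90° spacing numerically. Using the double-angle form of $I_{\xi\eta}$ with some arbitrary inertia values, a solution $\theta$ of the tangent equation and its neighbor at $\theta + 90°$ both zero out the product of inertia:

```python
from math import sin, cos, atan2, pi

def Ixieta(Ixx, Iyy, Ixy, theta):
    'Product of inertia about axes rotated theta radians CCW.'
    return (Ixx - Iyy)/2*sin(2*theta) + Ixy*cos(2*theta)

Ixx, Iyy, Ixy = 10.0, 6.0, -3.0
theta = atan2(2*Ixy, Iyy - Ixx)/2   # one solution of the tangent equation
zero1 = Ixieta(Ixx, Iyy, Ixy, theta)
zero2 = Ixieta(Ixx, Iyy, Ixy, theta + pi/2)
```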

Since we’ve written the expression for $I_{\xi\eta}$ in terms of $2\theta$, let’s do the same for $I_{\xi\xi}$ and $I_{\eta\eta}$. We start by recognizing the double angle formula for sine in each equation:

Then we use the thoroughly non-obvious identity,2

and use the usual trig identities to get

Therefore,3

$$I_{\xi\xi} = \frac{I_{xx} + I_{yy}}{2} + \frac{I_{xx} - I_{yy}}{2}\cos 2\theta - I_{xy}\sin 2\theta$$

and

$$I_{\eta\eta} = \frac{I_{xx} + I_{yy}}{2} - \frac{I_{xx} - I_{yy}}{2}\cos 2\theta + I_{xy}\sin 2\theta.$$

Now let’s look at how these moments of inertia change with $\theta$. Suppose we wanted to find the $\theta$ that maximized (or minimized) the value of $I_{\xi\xi}$. We’d take the derivative of the expression for $I_{\xi\xi}$ and set it to zero:

$$\frac{d I_{\xi\xi}}{d\theta} = -\left(I_{xx} - I_{yy}\right)\sin 2\theta - 2 I_{xy}\cos 2\theta = 0.$$

This should look familiar. Its solution is

$$\tan 2\theta = \frac{2 I_{xy}}{I_{yy} - I_{xx}},$$

the same thing we got setting $I_{\xi\eta} = 0$. And if we took the derivative of $I_{\eta\eta}$ with respect to $\theta$ and set it to zero to find the maxima and minima of $I_{\eta\eta}$, we’d get the same thing.

These orientations of the axes are known as the principal directions of the cross-section. They give us both a product of inertia of zero and the largest and smallest values of the moments of inertia. (If $I_{\xi\xi}$ is at a maximum, then $I_{\eta\eta}$ is at a minimum, and vice versa.)

The largest and smallest moments of inertia are commonly called $I_1$ and $I_2$, respectively. They can be calculated by substituting our solution for $\theta$ back into the expressions for $I_{\xi\xi}$ and $I_{\eta\eta}$, but there’s some messy math along the way. It’s easier to recognize that the maximum and minimum moments of inertia are determined entirely by the second and third terms of $I_{\xi\xi}$ and $I_{\eta\eta}$, which are in the form

$$A\cos\alpha + B\sin\alpha.$$

This expression can be thought of as the horizontal projection of a pair of vectors, one of length $A$ at an angle $\alpha$ to the horizontal and the other of length $B$ at right angles to $A$.

The largest value of this expression will come when the hypotenuse of the triangle, $\sqrt{A^2 + B^2}$, is itself horizontal and pointing to the right. The algebraically smallest value will come when the hypotenuse is horizontal and pointing to the left.
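If you don’t trust the vector picture, a brute-force sweep makes the same point: the largest value of $A\cos\alpha + B\sin\alpha$ is $\sqrt{A^2 + B^2}$, whatever the signs of $A$ and $B$:

```python
from math import sin, cos, sqrt, pi

A, B = 2.0, -5.1429    # arbitrary values for the two vector lengths

# Sample the projection finely over a full revolution of alpha
samples = (A*cos(2*pi*k/100000) + B*sin(2*pi*k/100000) for k in range(100000))
best = max(samples)
hyp = sqrt(A**2 + B**2)
```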

Applying this idea to our expressions for $I_{\xi\xi}$ and $I_{\eta\eta}$, the larger principal moment of inertia will be

$$I_1 = \frac{I_{xx} + I_{yy}}{2} + \sqrt{\left(\frac{I_{xx} - I_{yy}}{2}\right)^2 + I_{xy}^2}$$

and the smaller will be

$$I_2 = \frac{I_{xx} + I_{yy}}{2} - \sqrt{\left(\frac{I_{xx} - I_{yy}}{2}\right)^2 + I_{xy}^2}.$$

The axis associated with the larger principal moment of inertia is called the major principal axis and the axis associated with the smaller principal moment of inertia is called the minor principal axis. These are sometimes called the strong and weak axes, respectively. Whatever you call them, they’ll be 90° apart.

Now let’s look at the principal function from the section module and see how these formulas were used.

python:
def principal(Ixx, Iyy, Ixy):
    'Principal moments of inertia and orientation.'

    avg = (Ixx + Iyy)/2
    diff = (Ixx - Iyy)/2      # signed
    I1 = avg + sqrt(diff**2 + Ixy**2)
    I2 = avg - sqrt(diff**2 + Ixy**2)
    theta = atan2(-Ixy, diff)/2
    return I1, I2, theta


Looks like I was careless with the $(I_{yy} - I_{xx})$ term and got it backward in the expression for diff, doesn’t it? Also, there seems to be a stray negative sign in the expression for theta. But the principal function does work despite these apparent errors. What we’re running into is the sometimes vexing difference between math and computation.

First, in the formulas for $I_1$ and $I_2$, the diff term gets squared, so flipping its sign doesn’t matter. Second, the numerical calculation of the arctangent isn’t as straightforward as you might think.

There are two arctangent functions in Python’s math library (and in the libraries of many languages):

• atan takes a single argument and returns a result between $-\pi/2$ and $\pi/2$ (-90° and 90°, but in radians instead of degrees).
• atan2 takes two arguments, the $y$ and $x$ components of a vector directed out from the origin at the angle of interest, and returns a result between $-\pi$ and $\pi$ (-180° and 180°), depending on which quadrant the vector points toward.

We can’t use atan in our code because it isn’t robust for some inputs. If we tried

python:
theta = atan(2*Ixy/(Iyy - Ixx))/2


as our formula suggests, we’d get divide-by-zero errors whenever $I_{xx} = I_{yy}$. We can’t have that because there are real cross sections of practical importance for which that’s the case. Any equal-legged angle, for example.

But atan2 can be a problem, too, because we need to distinguish between the major and minor principal axes. In particular, I decided that theta should be the angle between the $x$ axis and the major principal axis. Using atan2 directly from the formula like this

python:
theta = atan2(2*Ixy, Iyy - Ixx)/2


can return an angle 90° away from what we want.

Using the inertia function developed earlier, the moments and product of inertia of the equal-legged angle we just looked at are

Ixx = 9.4405
Iyy = 9.4405
Ixy = -5.1429


Plopping these numbers into the naive formula above, we get

theta = -0.7854


or -45°. This is the angle from the $x$ axis to the weak axis, not the strong axis. The correct answer is 45°, just like the blue line in the figure.

To figure out a way around this problem, let’s plot $I_{\xi\xi}$ for the four cases of interest:

1. $I_{xy} > 0 \quad \text{and} \quad I_{yy} > I_{xx}$
2. $I_{xy} > 0 \quad \text{and} \quad I_{yy} < I_{xx}$
3. $I_{xy} < 0 \quad \text{and} \quad I_{yy} < I_{xx}$
4. $I_{xy} < 0 \quad \text{and} \quad I_{yy} > I_{xx}$

This will let us see what we need for all four quadrants of the atan2 function.

In each of the subplots, successive peaks and valleys of $I_{\xi\xi}$ are 90° apart.

We’re looking for the maximum values of $I_{\xi\xi}$ that are closest to $\theta = 0$, which I’ve marked with the red dots. That means

1. When $I_{xy} > 0 \quad \text{and} \quad I_{yy} > I_{xx}$ (upper right), we want the negative $\theta$ with an absolute value greater than 45°.
2. When $I_{xy} > 0 \quad \text{and} \quad I_{yy} < I_{xx}$ (upper left), we want the negative $\theta$ with an absolute value less than 45°.
3. When $I_{xy} < 0 \quad \text{and} \quad I_{yy} < I_{xx}$ (lower left), we want the positive $\theta$ with an absolute value less than 45°.
4. When $I_{xy} < 0 \quad \text{and} \quad I_{yy} > I_{xx}$ (lower right), we want the positive $\theta$ with an absolute value greater than 45°.

The invocation of atan2 that gives us all of these is

python:
theta = atan2(-2*Ixy, Ixx - Iyy)/2


which we can visualize this way, where the curved arrows represent the angle $2\theta$ for each type of result:

By flipping the signs of both arguments of atan2, we get the sign and magnitude of theta we’re looking for. Note that the expression used in the principal function,

python:
theta = atan2(-Ixy, diff)/2


is equivalent to

python:
theta = atan2(-2*Ixy, Ixx - Iyy)/2


because of the way we defined diff. And by flipping the signs of both the numerator and denominator, we’re not changing the quotient or the definition of $\theta$. We’re just choosing which solution of

$$\tan 2\theta = \frac{2 I_{xy}}{I_{yy} - I_{xx}}$$

is the most useful.
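Here’s the whole sign issue compressed into a few lines, using the equal-legged angle’s numbers from above. The naive translation of the tangent formula points at the weak axis; flipping the signs of both atan2 arguments picks out the strong one:

```python
from math import atan2, degrees

Ixx, Iyy, Ixy = 9.4405, 9.4405, -5.1429

# Direct translation of the tangent formula: points at the weak axis
naive = degrees(atan2(2*Ixy, Iyy - Ixx)/2)

# Same quotient, both signs flipped: the strong (major) axis
fixed = degrees(atan2(-2*Ixy, Ixx - Iyy)/2)
```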

If you’re mathematically inclined, you may recognize the rotation of axes as a tensor transformation and the determination of principal moments of inertia and principal directions as an eigenvalue/eigenvector problem. But writing principal in those terms would have required me to use more libraries than just math. The formulas in principal are simple, even if their derivation can take us all over the map.
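For a 2×2 symmetric matrix, though, the eigenvalues come straight from the quadratic formula, so the connection is easy to check without any extra libraries: the eigenvalues of the matrix [[Ixx, Ixy], [Ixy, Iyy]] are exactly the principal moments. A sketch (function names are mine):

```python
from math import sqrt

def principal_moments(Ixx, Iyy, Ixy):
    'Principal moments of inertia, same formulas as the section module.'
    avg = (Ixx + Iyy)/2
    r = sqrt(((Ixx - Iyy)/2)**2 + Ixy**2)
    return avg + r, avg - r

def eig2x2(a, b, c):
    'Eigenvalues of the symmetric matrix [[a, c], [c, b]].'
    # Characteristic equation: lam**2 - (a + b)*lam + (a*b - c*c) = 0
    disc = sqrt((a + b)**2 - 4*(a*b - c*c))
    return (a + b + disc)/2, (a + b - disc)/2

I1, I2 = principal_moments(9.4405, 9.4405, -5.1429)
lam1, lam2 = eig2x2(9.4405, 9.4405, -5.1429)
```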

Now that we’ve figured out how principal works, what good is it? It can be shown4 that when the loads on a beam are aligned with one of the principal directions, the beam will bend in that direction only. If the loading is not aligned with a principal direction, the beam will bend both in the direction of the load and in a direction perpendicular to it.

For example, if we were using the equal-legged angle above as a beam and hung a vertical downward load off of it, it would bend both downward and to the left. Not the most intuitively obvious result, but true nonetheless.

Everyone who takes an advanced strength of materials class learns the formulas for the principal moments of inertia and their directions, but there’s usually a bit of hand waving to make the math go faster. And, because the strong and weak directions are typically easy to determine by inspection, the details of picking out the correct arctangent value aren’t discussed. But there’s a richness to even the simplest mechanics, and I enjoy exploring it. And since computers can’t figure things out by inspection, you can’t gloss over the details when writing a program.

1. In case you don’t recognize them, the Greek letters $\xi$ and $\eta$ are xi and eta, respectively. They’re often used for coordinate directions when $x$ and $y$ are already taken. You’re probably more familiar with theta, $\theta$, usually the first choice to represent an angle.

2. Don’t believe it? Well, I told you it was non-obvious. But go ahead and multiply out the right hand side and see for yourself.

3. Pay close attention to the negative signs.

4. Don’t worry, I’m not going to show it (not here, anyway). We’re almost done.

[If the formatting looks odd in your feed reader, visit the original article]

# Section properties and SymPy

Permalink - Posted on 2018-01-18 23:28

Yes, the product of inertia integral is definitely more complicated if you’re going to do the derivation by hand. So don’t do it by hand. Learn SymPy and you’ll be able to zip through it.

This is entirely too much like those “it can be easily shown” tricks that math textbook writers use to avoid complicated and unintuitive manipulations. If I’m going to claim you can zip through the product of inertia, I should be able to prove it. So let’s do it.

SymPy comes with the Anaconda Python distribution, and that’s how I installed it. I believe you can get it working with Apple’s system-supplied Python, but Anaconda is so helpful in getting and maintaining a numerical/scientific Python installation, I don’t see why you’d try anything else.

If you’ve ever used a symbolic math program, like Mathematica or Maple, SymPy will seem reasonably familiar to you. My main hangup is the need in SymPy to declare certain variables as symbols before doing any other work. I understand the reason for it—SymPy needs to protect symbols from being evaluated the way regular Python variables are—but I tend to forget to declare all the symbols I need and don’t realize it until an error message appears.

That one personal quirk aside, I find SymPy easy to use for the elementary math I tend to do. The functions I use most often, like diff, integrate, expand, and factor, are easy to remember, so I don’t have to continually look things up in the documentation. And the docs are well-organized when I do have to use them.

The problem we’re going to look at is the solution of this integral for a polygonal area:

$$I_{xy} = \int_A xy\, dA.$$

We’ll use Green’s theorem to turn this area integral into a path integral around the polygon’s perimeter:

$$I_{xy} = \oint_C \frac{x^2 y}{2}\, dy.$$

For each side of the polygon, from point $(x_i, y_i)$ to point $(x_{i+1}, y_{i+1})$, the line segment defining the perimeter can be expressed in parametric form,

$$x = x_i + (x_{i+1} - x_i)\, t, \qquad y = y_i + (y_{i+1} - y_i)\, t, \qquad 0 \le t \le 1,$$

which means

$$dy = (y_{i+1} - y_i)\, dt.$$

Now we’re ready to use SymPy to evaluate and simplify the integral for a single line segment. To make the typing go faster as I used SymPy, which I ran interactively in a Jupyter console session, I decided to use 0 for subscript $i$ and 1 for subscript $i+1$. Here’s a transcript of the session, where I’ve broken up long lines to make it easier to read:

In [1]: from sympy import *

In [2]: x, y, x_0, x_1, y_0, y_1, t = symbols('x y x_0 x_1 y_0 y_1 t')

In [3]: x = x_0 + (x_1 - x_0)*t

In [4]: y = y_0 + (y_1 - y_0)*t

In [5]: full = integrate(x**2*y/2*diff(y, t), (t, 0, 1))

In [6]: full
Out[6]: -x_0**2*y_0**2/8 + x_0**2*y_0*y_1/12 + x_0**2*y_1**2/24
- x_0*x_1*y_0**2/12 + x_0*x_1*y_1**2/12 - x_1**2*y_0**2/24
- x_1**2*y_0*y_1/12 + x_1**2*y_1**2/8

In [7]: part = x_0**2*y_0*y_1/12 + x_0**2*y_1**2/24 - x_0*x_1*y_0**2/12
+ x_0*x_1*y_1**2/12 - x_1**2*y_0**2/24 - x_1**2*y_0*y_1/12

In [8]: factor(part)
Out[8]: (x_0*y_1 - x_1*y_0)*(2*x_0*y_0 + x_0*y_1 + x_1*y_0 + 2*x_1*y_1)/24

In [9]: print(latex(_))
\frac{1}{24} \left(x_{0} y_{1} - x_{1} y_{0}\right) \left(2 x_{0} y_{0}
+ x_{0} y_{1} + x_{1} y_{0} + 2 x_{1} y_{1}\right)


We start by importing everything from SymPy and defining all the symbols needed. Then we define the parametric equations of the line segment in In[3] and In[4].

In[5] does a lot of work. We define the integrand inside the integrate function and tell it to integrate that expression over $t$ from 0 to 1 (i.e., from $(x_0, y_0)$ to $(x_1, y_1)$). Note that we didn’t need to explicitly enter the expressions for $x$, $y$, or $dy$; SymPy did all the substitution for us, including the differentiation.

I called the result of the integration full because it contains every term of the integration. But we learned in the last post that the leading and trailing terms get cancelled out when we sum over all the segments of the polygon. So I copied just the inner terms from full and pasted them into In[7] to define a new expression, called part.

In[8] then factors part to get a more compact expression, and In[9] converts it to a LaTeX expression, so I can render it nicely here:

$$\frac{1}{24}\left(x_0 y_1 - x_1 y_0\right)\left(2 x_0 y_0 + x_0 y_1 + x_1 y_0 + 2 x_1 y_1\right)$$

With a quick search-and-replace to convert the subscripts to their more general forms, we get the expression presented in the last post (albeit with the terms in a different order):

$$\frac{1}{24}\left(x_i y_{i+1} - x_{i+1} y_i\right)\left(2 x_i y_i + x_i y_{i+1} + x_{i+1} y_i + 2 x_{i+1} y_{i+1}\right)$$

SymPy didn’t do everything for us. We had to figure out the Green’s function transformation and recognize the cancellation of the leading and trailing terms of full. But it did all the boring stuff, which is its real value.
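As a sanity check on the factoring, here are the inner terms from In[7] and SymPy’s factored version from Out[8], evaluated in plain Python at some arbitrary endpoint coordinates; they agree to rounding error:

```python
# Arbitrary segment endpoints
x0, y0, x1, y1 = 2.0, 3.0, -1.0, 4.0

# The inner terms of the integral (In[7])...
part = (x0**2*y0*y1/12 + x0**2*y1**2/24 - x0*x1*y0**2/12
        + x0*x1*y1**2/12 - x1**2*y0**2/24 - x1**2*y0*y1/12)

# ...and SymPy's factored version of them (Out[8])
factored = (x0*y1 - x1*y0)*(2*x0*y0 + x0*y1 + x1*y0 + 2*x1*y1)/24
```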

[If the formatting looks odd in your feed reader, visit the original article]

# Green’s theorem and section properties

Permalink - Posted on 2018-01-17 17:07

In the last post, I presented a simple Python module with functions for calculating section properties of polygons. Now we’ll go through the derivations of the formulas used in those functions.

The basis for all the formulas is Green’s theorem, which is usually presented something like this:

$$\int_A \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA = \oint_C \left(P\, dx + Q\, dy\right)$$

where $P$ and $Q$ are functions of $x$ and $y$, $A$ is the region over which the left integral is being evaluated, and $C$ is the boundary of that region. The integral on the right is evaluated in accordance with the right-hand rule, i.e., counterclockwise for the usual orientation of the $x$ and $y$ axes.

The section properties of interest are all area integrals. We’ll use Green’s theorem to turn them into boundary integrals and then evaluate those integrals using the coordinates of the polygon’s vertices.

## Area

This is the easiest one, but instead of going through the full derivation here, I’ll refer you to this excellent StackExchange page by apnorton and just hit the highlights.

1. The area is defined

$$A = \int_A dA,$$

and we’ll choose $P = 0$ and $Q = x$ as our Green’s theorem functions. This gives us

$$A = \oint_C x\, dy.$$

2. We break the polygonal boundary into a series of straight-line segments, each of which can be parameterized this way:

$$x = x_i + (x_{i+1} - x_i)\, t, \qquad y = y_i + (y_{i+1} - y_i)\, t, \qquad 0 \le t \le 1,$$

where the $(x_i, y_i)$ are the coordinates of the vertices.

3. Plugging these equations into the integral, we get

$$A = \frac{1}{2}\sum_{i=0}^{n-1}\left(x_i + x_{i+1}\right)\left(y_{i+1} - y_i\right).$$

A note on the indexing: The polygon has $n$ vertices, which we’ll number from 0 to $n-1$. The last segment of the boundary goes from $(x_{n-1}, y_{n-1})$ to $(x_0, y_0)$. To make this work with the equation, we’ll define $(x_n, y_n) = (x_0, y_0)$.

Let’s compare this with the area function in the module:

python:
def area(pts):
    'Area of cross-section.'

    if pts[0] != pts[-1]:
        pts = pts + pts[:1]
    x = [ c[0] for c in pts ]
    y = [ c[1] for c in pts ]
    s = 0
    for i in range(len(pts) - 1):
        s += x[i]*y[i+1] - x[i+1]*y[i]
    return s/2


We start by checking the pts list to see if the starting and ending items match. If they don’t, we copy the starting item to the end to fit the indexing convention discussed above. We then initialize some variables and execute a loop, summing terms along the way. Rewriting the loop in mathematical terms, we get

$$A = \frac{1}{2}\sum_{i=0}^{n-1}\left(x_i y_{i+1} - x_{i+1} y_i\right).$$

This doesn’t look like the equation derived from Green’s theorem, does it? But it’s not too hard to see that they are equivalent. Expanding out the binomial product in the earlier equation gives

$$A = \frac{1}{2}\sum_{i=0}^{n-1}\left(-x_i y_i + x_i y_{i+1} - x_{i+1} y_i + x_{i+1} y_{i+1}\right).$$

As we loop through all the values of $i$ from 0 to $n-1$, the leading term of one trip through the loop will cancel the trailing term of the next trip through the loop. Here’s an example for a triangle:

After the cancellations, all that’s left are the inner terms, and that’s the formula used in the area function.

The cancellation doesn’t do much for us here, changing from two additions and one multiplication per loop to two multiplications and one addition per loop. But we’ll see this same sort of cancellation in the other section properties, and it will provide greater simplification in those.
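Here’s a quick numerical confirmation that the pre- and post-cancellation forms of the sum agree (a standalone sketch, not part of the module):

```python
# Both forms of the area sum, applied to polygons whose vertices
# are listed counterclockwise.

def area_green(pts):
    'Area from the sum of (x[i] + x[i+1])*(y[i+1] - y[i]) terms.'
    pts = pts + pts[:1]
    return sum((pts[i][0] + pts[i+1][0])*(pts[i+1][1] - pts[i][1])
               for i in range(len(pts) - 1))/2

def area_shoelace(pts):
    'Area from the sum of x[i]*y[i+1] - x[i+1]*y[i] terms.'
    pts = pts + pts[:1]
    return sum(pts[i][0]*pts[i+1][1] - pts[i+1][0]*pts[i][1]
               for i in range(len(pts) - 1))/2

triangle = [(0, 0), (4, 0), (0, 3)]        # legs 4 and 3, area 6
square = [(1, 1), (3, 1), (3, 3), (1, 3)]  # 2×2 square, area 4
print(area_green(triangle), area_shoelace(triangle))  # → 6.0 6.0
print(area_green(square), area_shoelace(square))      # → 4.0 4.0
```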

## Centroid

The centroid is essentially the average position of the area. If a sheet of material of uniform thickness and density were cut into a shape, the centroid would be the center of gravity, the balance point, of that shape. The coordinates of the centroid are defined this way:

$$x_c = \frac{1}{A} \iint_A x\, dA \qquad y_c = \frac{1}{A} \iint_A y\, dA$$
Let’s derive the formula for $x_c$ for a polygon; the derivation of the formula for $y_c$ will be similar.

In applying Green’s theorem, we’ll take $P = 0$ and $Q = \frac{1}{2} x^2$. Therefore,

$$x_c = \frac{1}{2A} \oint_C x^2\, dy$$
Breaking the polygonal boundary into straight-line segments and using the same parametric equations as before, we get an integral that looks like this

$$\frac{1}{2} \int_0^1 \left[ x_i + (x_{i+1} - x_i)\,t \right]^2 (y_{i+1} - y_i)\, dt$$

for each segment. This integral evaluates to

$$\frac{1}{6} \left( x_i^2 + x_i x_{i+1} + x_{i+1}^2 \right) (y_{i+1} - y_i)$$

so our formula for the centroid is

$$x_c = \frac{1}{6A} \sum_{i=0}^{n-1} \left( x_i^2 + x_i x_{i+1} + x_{i+1}^2 \right) (y_{i+1} - y_i)$$

As we found in the formula for area, the leading and trailing terms in the expansion of this product cancel out as we loop through the sum, leaving us with

$$x_c = \frac{1}{6A} \sum_{i=0}^{n-1} \left( x_i^2\, y_{i+1} + x_i x_{i+1}\, y_{i+1} - x_i x_{i+1}\, y_i - x_{i+1}^2\, y_i \right)$$

This looks like a mess, but it can be factored into a more compact form:

$$x_c = \frac{1}{6A} \sum_{i=0}^{n-1} (x_i + x_{i+1})(x_i\, y_{i+1} - x_{i+1}\, y_i)$$

The expression for the other centroidal coordinate is as you’d expect:

$$y_c = \frac{1}{6A} \sum_{i=0}^{n-1} (y_i + y_{i+1})(x_i\, y_{i+1} - x_{i+1}\, y_i)$$

These are the formulas used in the centroid function.

python:
def centroid(pts):
  'Location of centroid.'

  if pts[0] != pts[-1]:
    pts = pts + pts[:1]
  x = [ c[0] for c in pts ]
  y = [ c[1] for c in pts ]
  sx = sy = 0
  a = area(pts)
  for i in range(len(pts) - 1):
    sx += (x[i] + x[i+1])*(x[i]*y[i+1] - x[i+1]*y[i])
    sy += (y[i] + y[i+1])*(x[i]*y[i+1] - x[i+1]*y[i])
  return sx/(6*a), sy/(6*a)
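As a sanity check on those sums (a standalone sketch, not part of the module), the centroid of a right triangle with legs $b$ and $h$ along the axes should land at $(b/3, h/3)$:

```python
def poly_centroid(pts):
    'Centroid from the factored vertex sums; counterclockwise vertices.'
    pts = pts + pts[:1]
    x = [ c[0] for c in pts ]
    y = [ c[1] for c in pts ]
    n = len(pts) - 1
    a = sum(x[i]*y[i+1] - x[i+1]*y[i] for i in range(n))/2
    sx = sum((x[i] + x[i+1])*(x[i]*y[i+1] - x[i+1]*y[i]) for i in range(n))
    sy = sum((y[i] + y[i+1])*(x[i]*y[i+1] - x[i+1]*y[i]) for i in range(n))
    return sx/(6*a), sy/(6*a)

# Right triangle with b = 6 and h = 3.
print(poly_centroid([(0, 0), (6, 0), (0, 3)]))   # → (2.0, 1.0)
```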


## Moments and product of inertia

You may be familiar with moments and products of inertia from dynamics, where the terms are related to the distribution of mass in a body. The moments and product of inertia we’ll be talking about here—more properly called the second moments of area—are mathematically similar and refer to the distribution of area across a planar shape.

The moments and product of inertia that matter in beam bending are taken about the centroidal axes (i.e., a set of $x$ and $y$ axes with the origin at the centroid of the shape). Since we don’t know where the centroid is when we set up our coordinate system, our list of vertex points isn’t expressed in those axes. But we can still calculate the centroidal moments and product of inertia by using these formulas:

$$I_{xx} = \iint_A (y - y_c)^2\, dA \qquad I_{yy} = \iint_A (x - x_c)^2\, dA \qquad I_{xy} = \iint_A (x - x_c)(y - y_c)\, dA$$
We’ll concentrate on $I_{yy}$; the other two will be similarly derived.

First, let’s expand the square inside the integral and see what we get:

$$I_{yy} = \iint_A x^2\, dA - 2 x_c \iint_A x\, dA + x_c^2 \iint_A dA$$

The integral in the second term is $A x_c$ and the integral in the third term is just $A$. Putting this together, we get1

$$I_{yy} = \iint_A x^2\, dA - A x_c^2$$
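That relation, the parallel axis theorem, is easy to check numerically against a shape with known properties (a standalone sketch with made-up dimensions):

```python
# For a b×h rectangle with one corner at the origin and sides along
# the axes: ∫x²dA about the y-axis is b³h/3, the centroid is at
# x_c = b/2, and the known centroidal moment is b³h/12.
b, h = 4.0, 2.0
A = b*h
x2_origin = b**3*h/3       # second moment about the y-axis
xc = b/2
Iyy = x2_origin - A*xc**2  # shift to the centroidal axis
print(Iyy, b**3*h/12)      # the two values agree
```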

Since we already have formulas for $x$ and $x_c$, we can concentrate on the integral in the first term on the right.

Returning to Green’s theorem, we’ll use $P = 0$ and $Q = \frac{1}{3}x^3$, giving us

$$\iint_A x^2\, dA = \frac{1}{3} \oint_C x^3\, dy$$
Once again, we break the polygonal boundary into straight-line segments and use parametric equations to define the segments. For each segment, we’ll get the following integral:

$$\frac{1}{3} \int_0^1 \left[ x_i + (x_{i+1} - x_i)\,t \right]^3 (y_{i+1} - y_i)\, dt$$

This integral evaluates to

$$\frac{1}{12} \left( x_i^3 + x_i^2 x_{i+1} + x_i x_{i+1}^2 + x_{i+1}^3 \right)(y_{i+1} - y_i)$$

giving us

$$\iint_A x^2\, dA = \frac{1}{12} \sum_{i=0}^{n-1} \left( x_i^3 + x_i^2 x_{i+1} + x_i x_{i+1}^2 + x_{i+1}^3 \right)(y_{i+1} - y_i)$$

Once again, if we expand out the product inside the sum, we’ll find that the leading and trailing terms cancel as we work through the loop. That gives us

$$\iint_A x^2\, dA = \frac{1}{12} \sum_{i=0}^{n-1} \left( x_i^3\, y_{i+1} + x_i^2 x_{i+1}\, y_{i+1} - x_i^2 x_{i+1}\, y_i + x_i x_{i+1}^2\, y_{i+1} - x_i x_{i+1}^2\, y_i - x_{i+1}^3\, y_i \right)$$

And that long expression can be factored, leaving

$$\iint_A x^2\, dA = \frac{1}{12} \sum_{i=0}^{n-1} \left( x_i^2 + x_i x_{i+1} + x_{i+1}^2 \right)(x_i\, y_{i+1} - x_{i+1}\, y_i)$$

Similar2 derivations give us

$$\iint_A y^2\, dA = \frac{1}{12} \sum_{i=0}^{n-1} \left( y_i^2 + y_i y_{i+1} + y_{i+1}^2 \right)(x_i\, y_{i+1} - x_{i+1}\, y_i)$$

$$\iint_A x y\, dA = \frac{1}{24} \sum_{i=0}^{n-1} \left( x_i\, y_{i+1} + 2 x_i\, y_i + 2 x_{i+1}\, y_{i+1} + x_{i+1}\, y_i \right)(x_i\, y_{i+1} - x_{i+1}\, y_i)$$

These formulas, and the terms accounting for the location of the centroid, are in the function inertia.

python:
def inertia(pts):
  'Moments and product of inertia about centroid.'

  if pts[0] != pts[-1]:
    pts = pts + pts[:1]
  x = [ c[0] for c in pts ]
  y = [ c[1] for c in pts ]
  sxx = syy = sxy = 0
  a = area(pts)
  cx, cy = centroid(pts)
  for i in range(len(pts) - 1):
    sxx += (y[i]**2 + y[i]*y[i+1] + y[i+1]**2)*(x[i]*y[i+1] - x[i+1]*y[i])
    syy += (x[i]**2 + x[i]*x[i+1] + x[i+1]**2)*(x[i]*y[i+1] - x[i+1]*y[i])
    sxy += (x[i]*y[i+1] + 2*x[i]*y[i] + 2*x[i+1]*y[i+1] + x[i+1]*y[i])*(x[i]*y[i+1] - x[i+1]*y[i])
  return sxx/12 - a*cy**2, syy/12 - a*cx**2, sxy/24 - a*cx*cy
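Running those sums on a shape with handbook values (a standalone sketch, not part of the module) confirms the formulas: a 4×2 rectangle should give $I_{xx} = bh^3/12$, $I_{yy} = b^3h/12$, and $I_{xy} = 0$ no matter where it sits in the coordinate system:

```python
# 4×2 rectangle with its lower-left corner at (1, 1),
# vertices counterclockwise and the list closed.
pts = [(1, 1), (5, 1), (5, 3), (1, 3), (1, 1)]
x = [ c[0] for c in pts ]
y = [ c[1] for c in pts ]
n = len(pts) - 1
d = [ x[i]*y[i+1] - x[i+1]*y[i] for i in range(n) ]   # cross terms
a = sum(d)/2
cx = sum((x[i] + x[i+1])*d[i] for i in range(n))/(6*a)
cy = sum((y[i] + y[i+1])*d[i] for i in range(n))/(6*a)
sxx = sum((y[i]**2 + y[i]*y[i+1] + y[i+1]**2)*d[i] for i in range(n))
syy = sum((x[i]**2 + x[i]*x[i+1] + x[i+1]**2)*d[i] for i in range(n))
sxy = sum((x[i]*y[i+1] + 2*x[i]*y[i] + 2*x[i+1]*y[i+1] + x[i+1]*y[i])*d[i]
          for i in range(n))
Ixx, Iyy, Ixy = sxx/12 - a*cy**2, syy/12 - a*cx**2, sxy/24 - a*cx*cy
print(Ixx, Iyy, Ixy)   # b*h³/12 = 8/3, b³*h/12 = 32/3, and 0
```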


This older post explains the use of the moment of inertia in beam bending, but I avoided the trickier bits associated with the product of inertia and principal axes. We’ll cover them in the next post.

Update Jan 23, 2018 12:42 PM  Thanks to Glenn Walker for finding an error in one of the formulas. Mistakes in formulas are more annoying to me than mistakes in the text.

1. Yes, this is the parallel axis theorem.

2. Yes, the product of inertia integral is definitely more complicated if you’re going to do the derivation by hand. So don’t do it by hand. Learn SymPy and you’ll be able to zip through it.


# Python module for section properties

Permalink - Posted on 2018-01-16 03:34

A lot of what I do at work involves analyzing the bending of beams, and that means using properties of the beams’ cross sections. The properties of greatest importance are the area, the location of the centroid, and the moments of inertia. Most of the time, I can just look these properties up in a handbook, as I did in this post, or combine the properties of a few well-known shapes. Recently, though, I needed the section properties of an oddball shape, and my handbooks failed me.

In the past, I would open a commercial program that had a section properties module, draw in the shape, and copy out the results. But my partners and I stopped paying the license for that program several years ago, so that wasn’t an option anymore. I decided to write a Python module to do the calculations and draw the cross-section.

If the cross section is a polygon, there are formulas for calculating the section properties from the coordinates of the vertices. Most of the formulas are on the aforelinked Wikipedia pages and on this very nice page from Paul Bourke of the University of Western Australia. I’ll explain how and why the formulas work in a later post; for now, we’ll just accept them. For cross sections that aren’t polygons, we can create a close approximation by fitting a series of short straight lines to any boundary curve.
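To get a sense of how well that approximation works (a standalone sketch, not part of the module), apply the vertex-based area formula to points spaced around a circle; the result converges quickly to $\pi r^2$:

```python
from math import cos, sin, pi

def polygon_area(pts):
    'Shoelace area; vertices counterclockwise.'
    pts = pts + pts[:1]
    return sum(pts[i][0]*pts[i+1][1] - pts[i+1][0]*pts[i][1]
               for i in range(len(pts) - 1))/2

# Unit circle approximated by a regular n-gon.
n = 360
circle = [ (cos(2*pi*i/n), sin(2*pi*i/n)) for i in range(n) ]
print(polygon_area(circle))   # within about 0.005% of pi
```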

Here’s the source code of the module, which I call section.py:

python:
1:  import matplotlib.pyplot as plt
2:  from math import atan2, sin, cos, sqrt, pi, degrees
3:
4:  def area(pts):
5:    'Area of cross-section.'
6:
7:    if pts[0] != pts[-1]:
8:      pts = pts + pts[:1]
9:    x = [ c[0] for c in pts ]
10:    y = [ c[1] for c in pts ]
11:    s = 0
12:    for i in range(len(pts) - 1):
13:      s += x[i]*y[i+1] - x[i+1]*y[i]
14:    return s/2
15:
16:
17:  def centroid(pts):
18:    'Location of centroid.'
19:
20:    if pts[0] != pts[-1]:
21:      pts = pts + pts[:1]
22:    x = [ c[0] for c in pts ]
23:    y = [ c[1] for c in pts ]
24:    sx = sy = 0
25:    a = area(pts)
26:    for i in range(len(pts) - 1):
27:      sx += (x[i] + x[i+1])*(x[i]*y[i+1] - x[i+1]*y[i])
28:      sy += (y[i] + y[i+1])*(x[i]*y[i+1] - x[i+1]*y[i])
29:    return sx/(6*a), sy/(6*a)
30:
31:
32:  def inertia(pts):
33:    'Moments and product of inertia about centroid.'
34:
35:    if pts[0] != pts[-1]:
36:      pts = pts + pts[:1]
37:    x = [ c[0] for c in pts ]
38:    y = [ c[1] for c in pts ]
39:    sxx = syy = sxy = 0
40:    a = area(pts)
41:    cx, cy = centroid(pts)
42:    for i in range(len(pts) - 1):
43:      sxx += (y[i]**2 + y[i]*y[i+1] + y[i+1]**2)*(x[i]*y[i+1] - x[i+1]*y[i])
44:      syy += (x[i]**2 + x[i]*x[i+1] + x[i+1]**2)*(x[i]*y[i+1] - x[i+1]*y[i])
45:      sxy += (x[i]*y[i+1] + 2*x[i]*y[i] + 2*x[i+1]*y[i+1] + x[i+1]*y[i])*(x[i]*y[i+1] - x[i+1]*y[i])
46:    return sxx/12 - a*cy**2, syy/12 - a*cx**2, sxy/24 - a*cx*cy
47:
48:
49:  def principal(Ixx, Iyy, Ixy):
50:    'Principal moments of inertia and orientation.'
51:
52:    avg = (Ixx + Iyy)/2
53:    diff = (Ixx - Iyy)/2      # signed
54:    I1 = avg + sqrt(diff**2 + Ixy**2)
55:    I2 = avg - sqrt(diff**2 + Ixy**2)
56:    theta = atan2(-Ixy, diff)/2
57:    return I1, I2, theta
58:
59:
60:  def summary(pts):
61:    'Text summary of cross-sectional properties.'
62:
63:    a = area(pts)
64:    cx, cy = centroid(pts)
65:    Ixx, Iyy, Ixy = inertia(pts)
66:    I1, I2, theta = principal(Ixx, Iyy, Ixy)
67:    summ = """Area
68:    A = {}
69:  Centroid
70:    cx = {}
71:    cy = {}
72:  Moments and product of inertia
73:    Ixx = {}
74:    Iyy = {}
75:    Ixy = {}
76:  Principal moments of inertia and direction
77:    I1 = {}
78:    I2 = {}
79:    θ︎ = {}°""".format(a, cx, cy, Ixx, Iyy, Ixy, I1, I2, degrees(theta))
80:    return summ
81:
82:
83:  def outline(pts, basename='section', format='pdf', size=(8, 8), dpi=100):
84:    'Draw an outline of the cross-section with centroid and principal axes.'
85:
86:    if pts[0] != pts[-1]:
87:      pts = pts + pts[:1]
88:    x = [ c[0] for c in pts ]
89:    y = [ c[1] for c in pts ]
90:
91:    # Get the bounds of the cross-section
92:    minx = min(x)
93:    maxx = max(x)
94:    miny = min(y)
95:    maxy = max(y)
96:
97:    # Whitespace border is 5% of the larger dimension
98:    b = .05*max(maxx - minx, maxy - miny)
99:
100:    # Get the properties needed for the centroid and principal axes
101:    cx, cy = centroid(pts)
102:    i = inertia(pts)
103:    p = principal(*i)
104:
105:    # Principal axes extend 10% of the minimum dimension from the centroid
106:    length = min(maxx-minx, maxy-miny)/10
107:    a1x = [cx - length*cos(p[2]), cx + length*cos(p[2])]
108:    a1y = [cy - length*sin(p[2]), cy + length*sin(p[2])]
109:    a2x = [cx - length*cos(p[2] + pi/2), cx + length*cos(p[2] + pi/2)]
110:    a2y = [cy - length*sin(p[2] + pi/2), cy + length*sin(p[2] + pi/2)]
111:
112:    # Plot and save
113:    # Axis colors chosen from http://mkweb.bcgsc.ca/colorblind/
114:    fig, ax = plt.subplots(figsize=size)
115:    ax.plot(x, y, 'k*-', lw=2)
116:    ax.plot(a1x, a1y, '-', color='#0072B2', lw=2)     # blue
117:    ax.plot(a2x, a2y, '-', color='#D55E00')           # vermillion
118:    ax.plot(cx, cy, 'ko', mec='k')
119:    ax.set_aspect('equal')
120:    plt.xlim(xmin=minx-b, xmax=maxx+b)
121:    plt.ylim(ymin=miny-b, ymax=maxy+b)
122:    filename = basename + '.' + format
123:    plt.savefig(filename, format=format, dpi=dpi)
124:    plt.close()


The key data structure is a list of tuples,1 which represent all of the vertices of the polygon. Each tuple is a pair of (x, y) coordinates for a vertex, and the list must be arranged so the vertices are in consecutive counterclockwise order. This ordering is the result of Green’s theorem, which is the source of the formulas.2

Here’s a brief example of using the module:

python:
1:  #!/usr/bin/env python
2:
3:  from section import summary, outline
4:
5:  shape = [(0, 0), (5, 0), (5, 1), (3.125, 1), (2.125, 3), (0.875, 3), (1.875, 1), (0, 1)]
6:  print(summary(shape))
7:  outline(shape, 'skewed', format='png', size=(8, 6))


Line 5 defines the vertices of the shape. The printed output from Line 6 is

Area
A = 7.5
Centroid
cx = 2.3333333333333335
cy = 1.0
Moments and product of inertia
Ixx = 5.0
Iyy = 11.367187499999993
Ixy = -1.666666666666666
Principal moments of inertia and direction
I1 = 11.77706657483349
I2 = 4.590120925166502
θ︎ = 76.18358042418826°


and the PNG file created from Line 7, named skewed.png, looks like this

As you might expect, the x-axis is horizontal and the y-axis is vertical. In addition to the shape itself, the outline function also plots the centroid as a black dot and the orientation of the principal axes. The major axis is the thicker bluish line and the minor axis is the thinner reddish line.
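The principal values in the summary can be checked by hand from $I_{xx}$, $I_{yy}$, and $I_{xy}$ with the usual Mohr’s circle relations, which is all the principal function does:

```python
from math import atan2, sqrt, degrees

# Values from the summary printed above.
Ixx, Iyy, Ixy = 5.0, 11.3671875, -5/3
avg = (Ixx + Iyy)/2
diff = (Ixx - Iyy)/2
r = sqrt(diff**2 + Ixy**2)        # radius of Mohr's circle
I1, I2 = avg + r, avg - r
theta = degrees(atan2(-Ixy, diff)/2)
print(I1, I2, theta)   # I1 ≈ 11.777, I2 ≈ 4.590, θ ≈ 76.18°
```

Note that $I_1 + I_2 = I_{xx} + I_{yy}$, a handy invariant for catching arithmetic mistakes.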

The outline function is the most interesting, in that it isn’t just the transliteration of a formula into Python. Lines 92–95 extract the extreme x and y values, and Line 98 calculates the size of a whitespace border (5% of the larger dimension) to keep the frame of the plot a reasonable distance away from the shape. This also makes it easy to crop the drawing to omit the frame. The ends of the principal axes are calculated in Lines 106–110 to make their lengths 20% of the smaller dimension; the idea is to make them long enough to see but not so long as to be distracting.

As noted in the comments, I chose the axis colors from a colorblind-safe palette given by Martin Krzywinski on this page. He got the palette from a paper by Bang Wong that I didn’t feel like paying \$59 for (my scholarship has its limits). To better emphasize which is the major principal axis, I made it thicker.

Mechanical and civil engineers learn how to calculate section properties early on in their undergraduate curriculum, so it’s not a particularly difficult topic, but there is a surprising depth to it. Enough depth that I plan to milk it for three more posts, which I’ll link to here when they’re done.

1. Strictly speaking, any data structure that indexes like a list of tuples—a list of lists, for example—would work just as well, but because coordinates are usually given as parenthesized pairs, a list of tuples seemed the most natural.

2. As promised, we’ll get to the derivation of the formulas in a later post, but if you want a taste, here’s a good derivation of the area formula by apnorton.


# A small hint of big data

Permalink - Posted on 2018-01-06 16:59

Shortly before Christmas, I got a few gigabytes of test data from a client and had to make sense of it. The first step was being able to read it.

The data came from a series of sensors installed in some equipment manufactured by the client but owned by one of its customers. It was the customer who had collected the data, and precise information about it was limited at best. Basically, all I knew going in was that I had a handful of very large files, most of them about half a gigabyte, and that they were almost certainly text files of some sort.

One of the files was much smaller than the others, only about 50 MB. I decided to start there and opened it in BBEdit, which took a little time to suck it all in but handled it flawlessly. Scrolling through it, I learned that the first several dozen lines described the data that was being collected and the units of that data. At the end of the header section was a line with just the string

[data]


and after that came line after line of numbers. Each line was about 250 characters long and used DOS-style CRLF line endings. All the fields were numeric and were separated by single spaces. The timestamp field for each data record looked like a floating point number, but after some review, I came to understand that it was an encoding of the clock time in hhmmss.ssss format. This also explained why the files were so big: the records were 0.002 seconds apart, meaning the data had been collected at 500 Hz, much faster than was necessary for the type of information being gathered.
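Converting that encoding to something useful is simple arithmetic. Here’s a sketch of the idea (the function is my own illustration, not something from the data files):

```python
def hhmmss_to_seconds(t):
    'Convert an hhmmss.ssss clock-time float to seconds past midnight.'
    hh = int(t) // 10000
    mm = (int(t) // 100) % 100
    ss = t - 10000*hh - 100*mm
    return 3600*hh + 60*mm + ss

# 1:22:45.126 PM is 48,165.126 seconds past midnight.
print(hhmmss_to_seconds(132245.126))
```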

Anyway, despite its excessive volume, the data seemed pretty straightforward, a simple format that, with a little editing, I could get into shape for importing into Pandas. So I confidently right-clicked one of the larger files to open it in BBEdit, figuring I’d see the same thing. But BBEdit wouldn’t open it.

As the computer I was using has 32 GB of RAM, physical memory didn’t seem like the cause of this error. I had never before run into a text file that BBEdit couldn’t handle, but then I’d never tried to open a 500+ MB file before. I don’t blame BBEdit for the failure—data files like this aren’t what it was designed to edit—but it was surprising. I had to come up with Plan B.

Plan B started with running head -100 on the files to make sure they were all formatted the same way. I learned that although the lengths of the header sections were different, the files were all collecting the same type of data and using the same space-separated format for the data itself. Also, in each file the header and data were separated by a [data] line.

The next step was stripping out the header lines and transforming the data into CSV format. Pandas can certainly read space-separated data, but I figured that as long as I had to do some editing of the files, I might as well put them into a form that lots of software can read. I considered using a pipeline of standard Unix utilities and maybe Perl to do the transformation, but settled on writing a Python script. Even though such a script was likely to be longer than the equivalent pipeline, my familiarity with Python would make it easier to write.
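For the record, a pipeline version (a hypothetical sketch of the road not taken; sample.dat stands in for one of the real files) could have been a single awk command:

```shell
# A tiny stand-in for one of the data files: header lines, a [data]
# marker, then space-separated records, all with CRLF line endings.
printf 'Sensor A (mm)\r\nSensor B (mm)\r\n[data]\r\n1.0 2.0 3.0\r\n4.0 5.0 6.0\r\n' > sample.dat

# Strip the CR, skip everything through the [data] line, and turn
# the space-separated fields of the remaining lines into CSV.
awk '{ sub(/\r$/, "") }
     found { gsub(/ /, ","); print }
     $0 == "[data]" { found = 1 }' sample.dat > sample.csv

cat sample.csv
# 1.0,2.0,3.0
# 4.0,5.0,6.0
```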

Here’s the script:

python:
1: #!/usr/bin/env python
2:
3: import sys
4:
5: f = open(sys.argv[1], 'r')
6: for line in f:
7:   if line.rstrip() == '[data]':
8:     break
9:
11:
12: for line in f:
13:   print line.rstrip().replace(' ', ',')


(You can see from the print commands that this was done back before I switched to Python 3.)

The script, data2csv, was run from the command line like this for each data file in turn:

data2csv file01.dat > file01.csv


The script takes advantage of the way Python iterates through an open file line-by-line, keeping track of where it left off. The first loop, Lines 6–8, runs through the header lines, doing nothing and then breaking out of the loop when the [data] line is encountered.

Line 10 prints a CSV header line of my own devising. This information was in the original file, but its field names weren’t useful, so it made more sense for me to create my own.

Finally, the loop in Lines 12–13 picks up the file iteration where the previous loop left off and runs through to the end of the file, stripping off the DOS-style line endings and replacing the spaces with commas before printing each line in turn.

Even on my old 2012 iMac, this script took less than five seconds to process the large files, generating CSV files with over two million lines.

I realize my paltry half-gigabyte files don’t really qualify as big data, but they were big to me. I’m usually not foolish enough to run high frequency data collection processes on low frequency equipment for long periods of time. Since the usual definition of big data is something like “too voluminous for traditional software to handle,” and my traditional software is BBEdit, this data set fit the definition for me.
