Implementing PubSubHubbub

by Peter Bijkerk

I mentioned in last night’s post that I wanted to implement Nathan Griggs’s system for instant updates to the site’s feed at Google Reader. I managed to get it done, but I ran into a couple of problems along the way. One I was able to solve cleanly; the other required an underhanded trick.

Even if you never visit the Google Reader site itself, there’s a good chance the feed readers you do use—NetNewsWire, Reeder, Vienna, whatever—use Google Reader to sync the status of feed subscriptions across your devices. And if you’re a blogger, the same is true of most of the subscribers to your site’s feed—Google Reader has become almost everyone’s master subscription list.

In the first paragraph of his post, Nathan lays out why bloggers might want to exercise some control over when this master subscription list gets updated:

“Google Reader fetches my RSS feed about once every hour. So if I publish a new post, it will be 30 minutes, on average, before the post appears there. If I notice a typo but Google Reader already cached the feed, then I have to wait patiently until the Feed Fetcher returns. In the mean time, everyone reads and makes fun of my mistake.”

As someone who’s always finding (or being told about) typos in his just-published posts, I’m mortified to know that subscribers may be seeing my stupid mistakes for as long as an hour after I fix them. I found the ability to control when Google Reader’s cache for ANIAT was updated very appealing. So I followed Nathan’s instructions to implement the PubSubHubbub protocol here.

PubSubHubbub is an intermediary between a publisher’s site and Google Reader.[1] Instead of Google’s Feed Fetcher checking the site periodically to see if the feed has changed, the publisher tells PubSubHubbub when the feed has changed and PubSubHubbub then pushes those changes to Google Reader. Reader updates its cache of the site’s feed almost instantly and no longer needs to poll the site periodically.

The two tasks a publisher must complete to implement PubSubHubbub are:

  1. Tell Google Reader to look for updates to come from PubSubHubbub.
  2. Ping PubSubHubbub whenever the feed changes.

Task 1 requires a line or two to be added to the site’s feed. Because ANIAT is a WordPress site and most of my readers subscribe to the RSS2 feed, I added a line to wp-includes/feed-rss2.php:[2]

23:  <channel>
24:    <atom:link href="" rel="hub" />
25:    <title><?php bloginfo_rss('name'); wp_title_rss(); ?></title>

Line 24 is the new line. After making this change, the next time Google Reader polled the site, it learned that PubSubHubbub was now the intermediate hub from which it would get future updates.

In my first attempt to get PubSubHubbub working, I misinterpreted Nathan’s instructions and put Line 24 in the wrong place. I thought the line was supposed to go after the entire channel element, and therefore after the </channel> end tag. But as you can see, the proper place for the line is as a child of the channel element—putting it immediately after the opening <channel> tag does the trick. This was the clean solution I described in the opening paragraph.
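
A mistake like that is easy to catch by parsing the feed and looking for the hub link among the channel’s direct children. Here’s a minimal Python sketch of such a check. The find_hub helper and the hub.example.com address are made up for illustration; neither is part of my publishing script or of Nathan’s instructions.

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'

def find_hub(feed_xml):
    """Return the href of a hub link that is a direct child of <channel>, or None."""
    root = ET.fromstring(feed_xml)
    channel = root.find('channel')          # RSS elements carry no namespace
    if channel is None:
        return None
    # findall with a plain tag name searches direct children only.
    for link in channel.findall('{%s}link' % ATOM_NS):
        if link.get('rel') == 'hub':
            return link.get('href')
    return None

# Hub link inside the channel element: the correct placement.
GOOD = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
  <atom:link href="http://hub.example.com/" rel="hub" />
  <title>Example</title>
</channel>
</rss>"""

# Hub link after the </channel> end tag: the mistake described above.
BAD = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
  <title>Example</title>
</channel>
<atom:link href="http://hub.example.com/" rel="hub" />
</rss>"""

print(find_hub(GOOD))   # http://hub.example.com/
print(find_hub(BAD))    # None
```

The check only confirms that the link sits where a subscriber would look for it; it says nothing about whether the hub itself is reachable.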

Task 2 can be accomplished in several ways. Because I use a Python script to publish posts (and to republish them after editing), I simply added these lines to the end of the script to ping the PubSubHubbub server:

104: # Ping PubSubHubbub so Google Reader knows to update its feed cache.
105: data = urllib.urlencode({'hub.mode': 'publish',
106:                          'hub.url': ''})
107: psh = httplib.HTTPConnection('')
108: psh.request('POST', '', data)

Lines 105-106 define the data that needs to be POSTed to the hub server. Lines 107-108 make the connection to the server and POST the data. There are, of course, import httplib and import urllib lines at the top of the script.

Nathan does his pinging through the curl command. I could have done that, too, by calling curl from within my script. But I thought using an httplib request was more Pythonic.
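
In Python 3, httplib became http.client and urlencode moved to urllib.parse, so the same ping would need to be written differently there. What follows is a sketch of one way it might look, not the code in my script. The build_ping and ping helpers are names I’ve invented for the example, and the hub and feed addresses are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_ping(hub_url, feed_url):
    """Build (but don't send) the POST request that tells a hub a feed changed."""
    data = urlencode({'hub.mode': 'publish', 'hub.url': feed_url}).encode('ascii')
    return Request(hub_url, data=data, method='POST',
                   headers={'Content-Type': 'application/x-www-form-urlencoded'})

def ping(hub_url, feed_url):
    """Send the ping and return the HTTP status code from the hub."""
    # urlopen raises urllib.error.HTTPError if the hub answers with a 4xx or 5xx.
    with urlopen(build_ping(hub_url, feed_url), timeout=10) as resp:
        return resp.status

# Placeholder addresses, not real endpoints. Nothing is sent here.
req = build_ping('http://hub.example.com/', 'http://blog.example.com/feed/')
print(req.get_method(), req.data.decode('ascii'))
# POST hub.mode=publish&hub.url=http%3A%2F%2Fblog.example.com%2Ffeed%2F
```

Splitting the request construction from the sending makes it possible to inspect the POST body without touching the network. Calling ping with real addresses is what would actually notify the hub.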

With these two tasks complete, the Google Reader cache of the RSS2 feed now updates within seconds of my publishing or republishing a post.

To accomplish the same thing with the Atom feed, I had to resort to dirty tricks. There’s a simple line to add to wp-includes/feed-atom.php that should work the same as Line 24 above, but for reasons I can’t explain, it never did. Despite many attempts and rereadings of the Discovery section of the PubSubHubbub spec, I just couldn’t get Google Reader to update its cache of the Atom feed.

Luckily, there are only a handful of readers who subscribe via the Atom feed, and I don’t think any of them really care whether they get Atom or RSS2. So I cheated, redirecting requests for the Atom feed to the RSS2 feed by adding this line to the blog’s .htaccess file:

RewriteRule ^feed/atom/$ feed/ [R,L]

Already present at the beginning of the file were the lines:

RewriteEngine On
RewriteBase /all-this

which allowed me to do the rewriting without having to write out long URLs.
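
To see what those three lines do together, here’s a simplified Python model of the rule. It’s only a sketch of the matching logic, not how Apache implements it. It assumes the .htaccess file sits in the directory served at /all-this, so the pattern is tested against the part of the path after that prefix, and the relative substitution gets the RewriteBase put back in front of it. The redirect_target function is a name invented for the example.

```python
import re

REWRITE_BASE = '/all-this'               # from the RewriteBase line
PATTERN = re.compile(r'^feed/atom/$')    # pattern from the RewriteRule
SUBSTITUTION = 'feed/'                   # substitution from the RewriteRule

def redirect_target(request_path):
    """Return the path the rule redirects to, or None if the rule doesn't apply."""
    prefix = REWRITE_BASE + '/'
    if not request_path.startswith(prefix):
        return None
    relative = request_path[len(prefix):]   # the part the pattern is matched against
    if PATTERN.search(relative) is None:
        return None
    # The [R] flag turns the rewrite into an external redirect (a 302 by default),
    # and [L] stops any later rules from being applied.
    return prefix + SUBSTITUTION

print(redirect_target('/all-this/feed/atom/'))   # /all-this/feed/
print(redirect_target('/all-this/feed/'))        # None
```

Because the pattern is anchored at both ends, only the exact Atom feed address is redirected; the RSS2 feed and everything else pass through untouched.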

Now I have a blog publishing system that’s more tolerant of my errors and doesn’t keep broadcasting them after I make the fixes.

  1. Or other subscription services, but we’re focusing on Google Reader. 

  2. There’s a similar addition to be made to the Atom feed, but I’ll discuss that later in the post. 

via And now it’s all this