2014-10-04

Pelican, RSS, and the Machine Readable Web

So I've switched the tool I use to generate this blog from Jekyll to Pelican.

Pelican is written in Python, whereas Jekyll is written in Ruby.

The main benefit of Pelican is that I know Python well, because I use Django every day for work. I've used a little Ruby here and there, but my knowledge of the language and (perhaps more importantly) its ecosystem is pretty minimal.

I've already looked at the code for some simple Pelican plugins and feel confident I could write one up in a jiffy.

Another benefit of Pelican is the built-in RSS feed generator. With Jekyll, you have to use a plugin.
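For the curious, turning the feeds on is just a matter of settings in Pelican's config file. Here's a minimal sketch of the relevant bits of a `pelicanconf.py`. The setting names are Pelican's own, but the `SITEURL` value is a placeholder I made up, not this blog's real address.

```python
# pelicanconf.py (minimal sketch; SITEURL is a placeholder)
SITEURL = "https://example.com"

# Feed links should be absolute, so prefix them with the site's domain.
FEED_DOMAIN = SITEURL

# Atom feed of every post (Pelican's default path for it).
FEED_ALL_ATOM = "feeds/all.atom.xml"

# RSS 2.0 feed of every post (off unless you give it a path).
FEED_ALL_RSS = "feeds/all.rss.xml"
```

Pelican then writes the feed files into the output directory alongside the HTML when it builds the site.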

Subscribe here to this blog's shiny new RSS feed.

I realize I'm about 10 years late to this party, but I'm really, really excited about RSS. I've been using Feedly but didn't realize you could add arbitrary URLs to it. And the process for "publishers" to request a Feedly hashtag for their site that shows up in Feedly searches seems simple enough.
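Part of what makes "add any URL" work is RSS autodiscovery: a page can advertise its feed with a `<link rel="alternate">` tag in its `<head>`. I'm assuming Feedly does something along these lines, but I don't know its internals. Here's a minimal sketch using only Python's standard library; the page markup and the example.com URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that mark a <link> tag as pointing at a feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects hrefs of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if "alternate" in rel and attrs.get("type") in FEED_TYPES and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def find_feeds(html, base_url):
    """Return absolute URLs of every feed advertised by a page."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    # Feed hrefs are often relative, so resolve them against the page URL.
    return [urljoin(base_url, href) for href in finder.hrefs]


page = """<html><head>
<title>Some blog</title>
<link rel="alternate" type="application/rss+xml" href="/feeds/all.rss.xml">
<link rel="stylesheet" href="/theme/style.css">
</head><body>...</body></html>"""

print(find_feeds(page, "https://example.com/blog/"))
# ['https://example.com/feeds/all.rss.xml']
```

The stylesheet `<link>` is skipped because it isn't `rel="alternate"` with a feed type, which is the whole trick: the page tells the reader where the machine readable version lives.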

But there is something even more exciting about RSS: a machine readable web isn't a thing of the future. It's already here.

Technically, all of the web is already machine readable. But in practice, extracting and processing the content on webpages is pretty hard. When I first started programming, I spent a while trying to figure out how to parse certain websites and knit them into a "book" format suitable for reading on my Kindle. I was trying to use the raw HTML. It was a slow process and the results weren't the best.

(I was a total noob programmer. So that also made it harder.)

When I finally succeeded in parsing one website reasonably well, I basically had to start from scratch on the next site. The sites themselves were similar: they were both blogs and I wanted the posts. But on each site, the nesting and format of all the HTML tags (h1, div, p) were completely different.
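To make the problem concrete, here's a minimal sketch with Python's standard `html.parser`. The two snippets of markup are invented stand-ins for two blogs, and the (tag, class) "rule" is my own simplification of what you end up hand-tuning per site. The rule that works on one site finds nothing on the other.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collects the text inside every <tag class="css_class"> element.

    The (tag, css_class) pair is the hypothetical per-site rule that
    has to be worked out by hand for each site's markup.
    """

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0   # > 0 while inside a matching element
        self.posts = []  # one list of text pieces per matched element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested tags of the same name so we close at the right one.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.posts.append([])

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.posts[-1].append(data.strip())


def extract_posts(html, tag, css_class):
    parser = PostExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return [" ".join(pieces) for pieces in parser.posts]


site_a = '<div class="post"><h1>Hello</h1><p>First post.</p></div>'
site_b = '<article class="entry"><h2>Hi</h2><div class="body"><p>Another post.</p></div></article>'

print(extract_posts(site_a, "div", "post"))         # ['Hello First post.']
print(extract_posts(site_b, "div", "post"))         # []
print(extract_posts(site_b, "article", "entry"))    # ['Hi Another post.']
```

And that's the easy case. Real pages also wrap posts in sidebars, comment threads, and ads, each of which needs its own rule to skip.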

Using sophisticated algorithms, a big search engine may be able to extract ALL the content from webpages and figure out which pages are relevant to whom. But it's a very different task to extract just the specific content you are interested in and put that into a nice longform format.

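With RSS, it becomes so much easier! Every feed presents its posts the same way, as a list of items, each with a title, a link, a date, and the content (or a summary of it), so one parser works for every site.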
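Here's roughly what the Kindle "book" project looks like when you start from a feed instead of raw HTML. It's a minimal sketch for RSS 2.0 using the standard library's `xml.etree.ElementTree`; the two-item feed is made up, and `to_book` is a hypothetical helper that assumes the feed lists newest posts first (which is common but not guaranteed). Atom feeds use different element names and an XML namespace, so for real-world use the third-party `feedparser` library, which handles both formats, is the safer bet.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Return a dict per <item> in an RSS 2.0 feed. Same code for every site."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
            # Often a summary, and often HTML; a real tool would clean it up.
            "content": item.findtext("description", default=""),
        })
    return posts


def to_book(posts):
    """Knit posts into one plain-text 'book', oldest first (assumes newest-first feed order)."""
    chapters = [f"{p['title']}\n{p['date']}\n\n{p['content']}" for p in reversed(posts)]
    return "\n\n* * *\n\n".join(chapters)


feed = """<rss version="2.0"><channel>
<title>Example blog</title>
<item><title>Second post</title><link>https://example.com/second.html</link>
<pubDate>Tue, 02 Sep 2014 00:00:00 +0000</pubDate><description>More words.</description></item>
<item><title>First post</title><link>https://example.com/first.html</link>
<pubDate>Mon, 01 Sep 2014 00:00:00 +0000</pubDate><description>Hello, world.</description></item>
</channel></rss>"""

posts = parse_rss(feed)
print([p["title"] for p in posts])  # ['Second post', 'First post']
print(to_book(posts))
```

Compare this to the per-site rules needed for raw HTML: there's nothing here that knows anything about a particular blog.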

RSS (and SEO for that matter) are just the first baby steps toward the semantic web. Imagine the things that would be possible with a true semantic web!
