Over the last few years, I have gathered a list of blogs and websites I read regularly. Some of them I subscribed to via email, but many I simply visited whenever I remembered to. That led to a mess of open tabs and missed posts I would’ve liked to read. One of the reasons it took me this long to switch everything over to an RSS reader is that some of the websites I really wanted to follow don’t provide a feed. My solution is simple: I’ll just generate my own feeds for websites that don’t provide them.
The setup is straightforward. I have a Docker container on my server that runs a Python script every hour, which checks each predefined feed. For each feed, I have written a more-or-less custom script that extracts RSS feed entries from the website. The feeds are saved and served on a custom subdomain, which I can then add to my RSS reader. I use CapyReader on mobile at the moment and am quite happy with it.
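To make the feed-writing step concrete, here is a minimal sketch of how scraped entries could be turned into an RSS 2.0 document using only Python’s standard library. It is an illustration under assumptions, not the code from the actual project: the names `Entry` and `build_feed` and the example.com URLs are made up for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime


@dataclass
class Entry:
    """One feed item (hypothetical structure for this sketch)."""
    title: str
    link: str
    description: str
    published: datetime  # should be timezone-aware


def build_feed(title: str, link: str, description: str, entries: list[Entry]) -> str:
    """Serialize the given entries into an RSS 2.0 document string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    # Newest entries first, which is the order most readers expect.
    for entry in sorted(entries, key=lambda e: e.published, reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry.title
        ET.SubElement(item, "link").text = entry.link
        ET.SubElement(item, "description").text = entry.description
        # The link doubles as a stable identifier so readers can deduplicate.
        ET.SubElement(item, "guid").text = entry.link
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(entry.published)

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    demo = [
        Entry(
            title="Riddle for 2024-01-02",
            link="https://example.com/2024-01-02",
            description="Today's riddle",
            published=datetime(2024, 1, 2, tzinfo=timezone.utc),
        )
    ]
    print(build_feed("Example feed", "https://example.com", "Demo", demo))
```

The hourly run would then only need to write the returned string to a file that the web server exposes on the subdomain.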
Some feeds are easy to define, like the daily game raddle, which has a new URL for each daily riddle, labeled by the date. Humble Bundle’s bundles each come with plenty of information, both on the overview and on the bundles page, to define useful feed entries for myself. Others are less obvious: zamonien.de, the website of one of my favourite book series, has a news page, but provides no dates and no indication of how new posts will show up – I’ll have to see how well my implementation does once they do update it. The Poetry Foundation’s poem of the day lives on a surprisingly difficult-to-parse HTML page with less-than-helpfully labeled elements. On the other hand, pages on fanfiction.net are kind enough to keep the text of chapters in an element labeled “storycontent”, but make it surprisingly difficult to extract the chapter’s title. Some, like the Epic Games Store’s free game of the week, obfuscate their page’s code so thoroughly that I instead found a secondary source: an article about the topic that gets updated weekly.
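For the friendlier case of a clearly labeled element, extraction can be sketched with Python’s built-in `html.parser`. This sketch assumes the label is an `id` attribute, which may not match the real page, and it assumes reasonably well-nested HTML, since an unclosed tag would throw off the depth counter. `ElementTextExtractor` and `extract_text` are hypothetical names for this illustration; a dedicated parsing library would likely be more forgiving on messy pages.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class ElementTextExtractor(HTMLParser):
    """Collect the text inside the first element with a given id."""

    def __init__(self, target_id: str) -> None:
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.depth = 0  # 0 means we are outside the target element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # Keep line breaks that occur inside the target element.
            if self.depth and tag == "br":
                self.chunks.append("\n")
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html: str, target_id: str) -> str:
    """Return the text content of the element with the given id, or ''."""
    parser = ElementTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


if __name__ == "__main__":
    page = '<div id="nav">Menu</div><div id="storycontent"><p>Once upon a time.</p></div>'
    print(extract_text(page, "storycontent"))
```

The harder cases described above, such as a chapter title with no usable label, are exactly where a generic helper like this stops being enough and a page-specific rule is needed.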
Are these feeds going to work reliably and consistently? Probably not. I imagine as websites change, I will have to change my implementations, and not all the websites I have written feeds for lend themselves to it.
Would I prefer it if the websites all published their own feeds? Definitely. So far, I have been very glad to use the existing feed for any website I want to follow that has one, even if I don’t like some of their decisions (e.g., serving only part of the content in the feed and requiring you to click through to the source anyway).
The code is on my GitLab as rss-generator and was written by me with partial support from coding agents. Depending on how clean a page’s source code is, finding the pattern of data to extract can be a fun puzzle or a really painful affair, so I’ve been deciding page by page how much of the work to do myself.
It’s been a fun project and I’m glad I’ve built it. It’s been a joy to have a single app where I can find all the content I want to see, and to have it be content I decided on myself, in chronological order.
Some of my custom feeds:
- Feed for the daily game raddle
- Feed for new humble bundles
- Feed for the poem of the day
- Feed for the epic game of the week
- Feed for darebee’s exercise of the day
Part of what inspired me to finally write about this project, which I’ve been building and using for going on two months now, is following other people’s blogs and reading their articles on the matter. Here are three recent ones that spoke to me:
- Cory Doctorow’s post “The web is bearable with RSS”
- Terry Godier’s post on his new RSS Reader, Current
- Justin Jackson’s “Don’t kill my pretty RSS feed”
I’ll try to add a blogroll or list of RSS feeds I recommend somewhere on my page, because I am always glad to find new blogs that way. Thanks to all of you out there who still have RSS feeds available on your websites. You keep the internet usable and alive.