Benjamin Hays

Building Custom RSS Feeds for LWN.net

Introduction

If you’ve spent a bit of time in the Linux community, especially anywhere kernel-related, you may be familiar with Linux Weekly News. It’s an invaluable resource for kernel development news, security updates, and general open source coverage. While they offer RSS feeds for syndication, I found myself wanting more control over the feeds themselves - specifically, the ability to filter out subscriber-only articles.

Disclaimer: I completely support the efforts of LWN.net, and you should strongly consider purchasing a membership there if you are able to. As a student, it’s unfortunately a bit outside of the realm of possibility for me right now. All of the original RSS feeds used are publicly available for free on their website.

If you want to skip my code/solution altogether, you can find my filtered feeds here.

My Solution

LWN.net operates on a subscription model: some articles are available only to subscribers and become freely available after a week. Others, like security patch updates, are available instantly and remain free forever. The official RSS feeds include all articles, which can be a bit frustrating when scrolling through my e-reader only to find things that won’t be available until next week, conveniently just enough time for me to forget about them entirely.

import requests
from lxml import etree as ET

def download_feed(s, url, file, remove_premium=False):
    r = s.get(url) # where s is a requests.Session object
    # Parse the raw bytes; lxml rejects str input that carries an XML encoding declaration
    root = ET.fromstring(r.content)
    channel = root.find('channel')
    # findall() snapshots the items up front, so removing during the loop is safe
    for post in channel.findall('item'):
        if remove_premium and "[$]" in post.find('title').text:
            channel.remove(post)

    ET.ElementTree(root).write(file)

The code is quite straightforward and minimal - it downloads the RSS feed, parses it using lxml, and removes any item whose title carries the “[$]” marker that indicates subscriber-only content. I chose the lxml package over alternatives like feedparser because it gives me more direct control over the XML structure and lets me easily write the modified tree back out to a file.

s = requests.Session()
download_feed(s, "https://lwn.net/headlines/Features", "lwn-features.xml", remove_premium=True)
download_feed(s, "https://lwn.net/headlines/rss", "lwn-all.xml", remove_premium=True)

This code snippet creates the feeds that I eventually upload and use in my news app. Pretty convenient for a dozen or so lines of code, I’d say.

Automating Updates

Of course, a static, out-of-date feed isn’t very useful. I needed a way to regularly update the feeds to catch new articles as they’re published. This is where Gitea Actions comes in. I set up a workflow that runs every 4 hours:

name: Update RSS Feeds
on:
  push:
    branches:
      - 'main'
  schedule:
    - cron: '0 */4 * * *'
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Copy SSH Key
        run: |
          mkdir -p ~/.ssh/
          echo "Host *" > ~/.ssh/config
          echo "    StrictHostKeyChecking no" >> ~/.ssh/config
          echo '${{secrets.SSH_PRIVATE_KEY}}' > ~/.ssh/id_rsa
          chmod 600 ~/.ssh/id_rsa          

      - name: Install Prereqs
        run: |
          apt update -y
          apt install python3-requests python3-lxml -y          

      - uses: actions/checkout@v3
        with:
          submodules: recursive

      - name: Generate Feeds
        run: |
          python3 generate_feeds.py          

      - name: Deploy to Server
        run: |
          scp -i ~/.ssh/id_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -r lwn-*.xml [email protected]:/var/www/html/          

The workflow is also fairly small - it just installs the required Python packages, runs the feed generator script, and copies the generated feeds to my web server over SSH, making them available at predictable URLs. It’s a bit of a hacky solution, given that CI/CD jobs were never quite “made” for this, but it has been working fine without any downsides so far.

Looking Forward

There’s still plenty of room for improvement. I’d like to add full-text parsing for the articles that are freely available - you might notice the TODO comment in the original code repository. This would make the feeds more useful in feed readers that don’t automatically fetch the full article content. Personally, I’m content with the other services that perform this task, but it’s definitely an idea that could be worked upon.
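As a rough sketch of what that TODO could look like: the helper below takes an already-fetched article page (e.g. from session.get(item.findtext('link')).text) and embeds its body in the item’s description. The “ArticleText” container class is an assumption about LWN’s page markup, not something confirmed here.

```python
from lxml import etree, html

def embed_full_text(item, page_html):
    """Replace an RSS <item>'s description with the full article body.

    page_html is the fetched article page; the 'ArticleText' container
    class is an assumption about LWN's markup.
    """
    page = html.fromstring(page_html)
    body = page.find_class('ArticleText')  # assumed article container
    if not body:
        return False  # unfamiliar page layout: leave the item untouched
    desc = item.find('description')
    if desc is None:
        desc = etree.SubElement(item, 'description')
    desc.text = html.tostring(body[0], encoding='unicode')
    return True
```

Returning False on an unrecognized layout keeps the feed usable even if the site’s markup changes.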

I’m also considering adding more specialized feeds. For example, a feed that only includes security-related articles, or one that focuses on kernel development discussions. The nice thing about having the basic infrastructure in place is that adding new feeds is just a matter of writing the appropriate filters.
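A minimal sketch of such a filter, reusing the same lxml approach as above (the function name and keyword list are illustrative, not part of the existing code):

```python
from lxml import etree as ET

def filter_feed(in_file, out_file, keywords):
    # Keep only items whose title mentions at least one keyword,
    # e.g. keywords=["security"] for a security-only feed
    tree = ET.parse(in_file)
    channel = tree.getroot().find('channel')
    for post in channel.findall('item'):
        title = (post.findtext('title') or '').lower()
        if not any(k in title for k in keywords):
            channel.remove(post)
    tree.write(out_file)
```

Running it over the already-generated lwn-all.xml would produce a specialized feed without touching the download logic.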

Running Your Own Instance

If you want to set up your own custom LWN feeds, the code is available on my Gitea instance. You’ll need:

  • Python 3 with the requests and lxml packages
  • A web server to host the generated feeds (or you could find a way to publish them using GitHub Pages)
  • Basic understanding of Gitea Actions (or GitHub Actions if you prefer)

Conclusion

Just remember that while we’re filtering the feeds, all the content still belongs to LWN.net. If you find value in their reporting, consider supporting them with a subscription. I certainly do.

The feeds I generate are available at:

As always, thanks for reading!