A lightweight alternative to SingleFile

July 10, 2025

I’ve written a few blog drafts on the subject, but I thought I would focus on the end result instead of all the side tangents one can become lost in. I must give credit to Gwern for prompting me to take a closer look at how I archive links on the internet. His paraphrased quote from the Buddha sets the tone for how we as users need to think about what we find on the internet.

Decay is inherent in all compound things. Work out your own salvation with diligence.

In other words, every single link you encounter will die someday, or become difficult to access. Maybe you will be able to find the text, but not the images. The Internet Archive, while useful, is too fragile to serve as humanity’s only backup of our collective knowledge. Gwern highlights a number of issues, but they mostly come down to speed and reliability.

You need to invest in creating an offline backup of the links you care about if you want to read them in the future.

The story of Pinboard is a cautionary tale of degraded service. Although I continue to use Pinboard, a number of its features no longer work as intended. As of the time of this blog post, I cannot access Pinboard on Firefox, full text search is not working, and bookmark archiving is not reliable. The last one is not entirely the fault of Pinboard, as many sites are making it harder for servers to scrape webpages.

The easiest way to create an offline backup of a webpage is to use SingleFile. All you have to do is install the extension and press the button. Then back up the resulting files by some suitable means: rehost them on your own website or put them in some sort of cloud storage. I’m not going to consider how you should go about making reliable backups of these HTML files, but you should.

The second way is what I am showing here: create a shell script to generate your own self-contained HTML files from markdown. Again, there are a number of ways to go about generating the markdown from an article you find. The focus here is how I go from markdown to an HTML file.

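#!/bin/bash

# Pandoc template, input markdown, and stylesheet.
TEMPLATE="$HOME/Desktop/text/template.html"
NEW="$HOME/Desktop/text/new.md"
# Current time from GNU date (commonly installed as gdate on macOS); not used below.
TIME=$(gdate +"%H%M%S")
STYLE="$HOME/Desktop/text/style.css"

# Build one self-contained HTML file with the CSS and other resources embedded.
# --embed-resources requires pandoc 2.19 or later.
pandoc -f markdown -t html "$NEW" -o "$HOME/Desktop/output.html" --css="$STYLE" --template "$TEMPLATE" --embed-resources --standalone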

While the pandoc command is simple, it relies on specific YAML metadata in the input markdown file. That metadata is used to produce output similar to SingleFile’s, which puts an HTML comment with additional metadata at the top of the file.

<!--
 Page saved with SingleFile
 url: https://www.theguardian.com/books/2022/dec/10/fran-lebowitz-new-york-writer-essayist-interview
 saved date: Wed Jul 09 2025 11:31:35 GMT-0700 (Pacific Daylight Time)
-->

The most reliable way to produce a comment like that is to generate it from YAML metadata in the markdown file. I add the following attributes. In this context, date is the date the article was published and date-generated is when I created the HTML file.

---
pagetitle: "Fran Lebowitz on life without the internet: ‘If I’m cancelled, don’t tell me!’"
date: 20221210
date-generated: "Wed Jul 09 2025 10:04:16 GMT-0700"
url: https://www.theguardian.com/books/2022/dec/10/fran-lebowitz-new-york-writer-essayist-interview
...
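Writing that block by hand is easy to get wrong, so here is a minimal sketch of how it could be generated. The function name make_front_matter and its argument order are my own assumptions, not part of the script above; the date format string is chosen to imitate the date-generated value shown in the example.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): print a pandoc YAML
# metadata block to stdout.
# Arguments: page title, publication date (YYYYMMDD), source URL.
make_front_matter() {
  local title=$1 published=$2 url=$3
  local generated

  # LC_ALL=C keeps the day and month names in English.
  generated=$(LC_ALL=C date +"%a %b %d %Y %H:%M:%S GMT%z")

  # Escape backslashes and ASCII double quotes so the title remains a valid
  # double-quoted YAML string.
  title=${title//\\/\\\\}
  title=${title//\"/\\\"}

  printf -- '---\n'
  printf 'pagetitle: "%s"\n' "$title"
  printf 'date: %s\n' "$published"
  printf 'date-generated: "%s"\n' "$generated"
  printf 'url: %s\n' "$url"
  # Pandoc accepts "..." as the end of a metadata block.
  printf -- '...\n\n'
}
```

One way to use it, assuming the article text lives in a hypothetical body.md, is to run `{ make_front_matter "Some title" 20221210 "https://example.com/article"; cat body.md; } > new.md` and then feed new.md to pandoc as before.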

Finally, I use a custom template, modified slightly from the default pandoc HTML5 template. I added the following just before the head tag.

<!--
 Page saved with pandoc
 url: $url$
 saved date: $date-generated$
-->
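If you would rather not edit the template by hand, the change can be scripted. This is only a sketch: the function name add_saved_comment is hypothetical, and it assumes the opening head tag sits on a line of its own as a plain <head>, which is how I expect the default template to look.

```shell
#!/bin/bash

# Hypothetical helper: copy a pandoc HTML template from stdin to stdout,
# adding the metadata comment just before the first line containing <head>.
add_saved_comment() {
  awk '
    !seen && /<head>/ {
      # $url$ and $date-generated$ are pandoc template variables; inside
      # awk string literals they are printed as-is.
      print "<!--"
      print " Page saved with pandoc"
      print " url: $url$"
      print " saved date: $date-generated$"
      print "-->"
      seen = 1
    }
    { print }
  '
}
```

Pandoc can print its default template with `pandoc -D html5`, so something like `pandoc -D html5 | add_saved_comment > template.html` should give a starting point that you can then point --template at.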

I’m using the same CSS as I use on my website. While this process is automated, there is still a fair amount of manual effort involved. I don’t do this for every single link I come across, only the ones I feel are worth the effort.