Journal tags: related

2

sparkline

Indie webbing

The past weekend’s Indie Web Camp Brighton was wonderful! Many thanks to Mark and Paul for all their work putting it together.

There was a great turn-out. It felt like the perfect time for an Indie Web Camp. There’s a real appetite for getting away from ever more extractive silos and staking claim to our own corners of the web. Most of the attendees were at their first ever Indie Web Camp.

Paul asked me to oversee the schedule planning on day one, which I was happy to do. We made sure that first-timers got first dibs on proposing sessions. In the end, every single session was proposed by new attendees.

Day two was all about putting ideas into practice: coding, designing, and writing on our own website. I’m always blown away by how much gets done in just one short day. Best of all is when there’s someone who starts the weekend without their own website but finishes with a live site. That happened again this time.

I spent the second day tinkering with something I started at Indie Web Camp Nuremberg in October. Back then, I got related posts working here on my journal; a list of suggested follow-up posts to read based on the tags of the current post.

I wanted to do the same for my links; show links related to the one I’m currently linking to. It didn’t take too long to get that up and running.

But then I thought about it some more and realised it would be good to also show blog posts related to the link. So I did that. Then I realised it would be really good to show related links under blog posts too.

So now, if everything’s working correctly, then at the end of this post you will not only see related blog posts I’ve previously written, but also links related to the content of this post.

It was a very inspiring weekend. There’s something about being in a room with other people working on their websites that makes me super productive.

While we were hacking away on day two, somebody mentioned that they still find hard to explain the indie web to people.

“It’s having your own website”, I said.

But surely there’s more to it than that, they wondered.

Nope. If someone has their own website, then they’re part of the indie web. It doesn’t matter if that website is made with a complicated home-rolled tech stack or if it’s a Squarespace site.

What you do with your own website is entirely up to you. The technologies are just plumbing wether it’s webmentions, RSS, or anything else. None of it is a requirement. Heck, even HTML is optional. If you want to put plain text files on your website, go for it. It’s your website.

Indie Web Camp Nuremberg

After two days at border:none in Nuremberg, it was time for two days at Indie Web Camp, also in Nuremberg.

I hadn’t been to an Indie Web Camp since before The Situation. It felt very good to be back. I had almost forgotten how inspiring and productive they can be.

This one had a good turnout of around twenty people. We had ourselves an excellent first day of thought-provoking sessions. Then on day two it was time to put some of those ideas into action.

A little trick I like to do on the practical day is to have two tasks to attempt: one of them quite simple, and the other more ambitious. That way, as long as I get the simpler task done, I’ll always have at least something to demo at the end of the day.

This time I attempted three bits of home improvement on my website.

Autolinking Mastodon usernames

The first problem I set myself was ostensibly the simple one. But it involved regular expressions, so then I had two problems.

I wanted to automatically link up Mastodon usernames if I mentioned one in my notes. For example, during border:none I mentioned Brian’s mastodon username in a note: @briansuda@loðfíll.is.

That turned out to be an excellent test case. Those Icelandic characters made sure I wasn’t making unwarranted assumptions about character sets.

Here’s the regular expression I came up with. It’s not foolproof by any means. Basically it looks for @[email protected].

Good enough. Ship it.

Related posts

My next task was a bit more ambitious. It involved SQL queries, something I’m slightly better at than regular expressions but that’s a very low bar.

I wanted to show related posts when you get to the end of one of my blog posts.

I’ve been tagging all my blog posts for years so that’s the mechanism I used for finding similar posts. There’s probably a clever SQL statement that could do this, but I ended up brute-forcing it a bit.

I don’t feel too bad about the hacky clunky nature of my solution, because I cache blog post pages. That means only the first person to view the blog post (usually me) will suffer any performance impacts from my clunky database queries. After that everything’s available straight from a cached file.

Let’s say you’re reading a blog post of mine that I’ve tagged with ten different keywords. I make a separate SQL query for each keyword to get all the other posts that use that tag. Then it’s a matter of sorting through all the results.

I loop through the results of each tag and apply a score to the tagged post. If the post shares one tag with the post you’re looking at, it has a score of one. If it shares two tags, it has a score of two, and so on.

I decided that for a post to be considered related, it had to share at least three tags. I also decided to limit the list of related posts to a maximum of five.

It worked out pretty well. If you scroll down on my recent post about JavaScript, you’ll see links to related posts about JavaScript. If you read through a post on accessibility testing, you’ll find other posts about accessibility testing. If you make it to the end of this post about Mars colonisation you’ll see links to more posts about exploring our solar system.

Right now I’m just doing this for my blog but I’d like to do it for my links too. A job for a future Indie Web Camp.

Link rot

I was very inspired by Remy’s recent post on how he’s tackling link rot on his site. I wanted to do the same for mine.

On the first day at Indie Web Camp I led a session on link rot to gather ideas and alternative approaches. We had a really good discussion, though it’s always worth bearing in mind that there’ll never be a perfect solution. There’ll always be some false positives and some false negatives.

The other Jeremy at Indie Camp Nuremberg blogged about the session. Sebastian Greger was attending remotely and the session inspired him to spend the second day also tackling linkrot.

In the end I decided to stick with Remy’s two-pronged approach:

  1. a client-side script that—as a progressive enhancement—intercepts outbound links and re-routes them to
  2. a server-side script that redirects to the Internet Archive if the link is broken.

Here’s the JavaScript I wrote for the first part.

It’s very similar to Remy’s but with one little addition. I check to see if the clicked link is inside an h-entry and if it is, I pass on the date from the post’s dt-published value.

Here’s the PHP I wrote for the server-side redirector. The comments tell the story of what the code is doing:

  • Check that the request is coming from my site.
  • There also has to be a URL provided in the query string.
  • Make a very quick curl request to get the response headers from the URL. The time limit is set to 1 second.
  • If there was any error (like a time out), give up and go to the URL.
  • Pick the response headers apart to get the HTTP status code.
  • If the response is OK, go to the URL.
  • If the response is a redirect, go around again but this time use the redirect URL.
  • Construct the archive.org search endpoint.
  • If we have a date, provide it. Otherwise ask for the latest snapshot.
  • Ping that archive.org URL. This time there’s no time limit; this might take a while.
  • If there’s an archived copy, redirect to that.
  • There’s no archived copy. Give up and go the URL anyway.

Not perfect by any means, but it works for the most common cases of link rot.

For the demo at the end of the day I went back into my archive of over 10,000 links and plucked out some old posts, like this one from December 2005. It takes a little while to do the rerouting but eventually you get to see the archived version from the same time period as when I linked to it.

Here’s another link from 2005. Here’s another. Those links are broken now, but with a little patience, you’ll still get to read them on the Internet Archive.

The Internet Archive’s wayback machine really is a gift. I can’t imagine how would it be even remotely possible to try to address link rot on my site without archive.org.

I will continue to donate money to the Internet Archive and I encourage you to do the same.