Blocking bots

The other day I was emailing with Matthew “Starbreaker” Graybosch about his recent post titled “robots.txt: the Nuclear Option”. If you’re a regular reader of this site you know I love this kind of stuff and I especially love nuclear options when it comes to fighting silly tech.

I experimented with blocking everything in the past but this recent exchange made me want to revisit this idea. With perfect timing, Robb Knight posted “Perplexity AI Is Lying about Their User Agent” and that was all the extra motivation I needed to join the fun.

I already had a 403 in place for Mastodon because I don’t want to get a shit ton of traffic coming my way every time someone posts a link of mine, but I loved Matthew’s idea of returning a 402.

So I grabbed 180 or so entries from the Dark Visitors agents list and set up an NGINX rule that returns a 402 to those user agents. It’s gonna be interesting to see if this has any effect on the server, so I’ll write a follow-up.
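
For the curious, a setup like this can be sketched in a few lines of NGINX config. This is a minimal sketch and not the exact config running on this site: it assumes a `map` on `$http_user_agent`, the variable name `$blocked_bot` is made up for the example, and the three agents listed are placeholders standing in for the full list.

```nginx
# Goes inside the http {} block.
# $blocked_bot is a hypothetical variable name; the agents below are
# placeholders, not the real ~180-entry list.
map $http_user_agent $blocked_bot {
    default           0;
    # "~*" means case-insensitive regex match against the User-Agent header.
    ~*GPTBot          1;
    ~*CCBot           1;
    ~*PerplexityBot   1;
}

server {
    # ... existing listen / server_name / root directives stay as they are ...

    # Answer matching user agents with 402 Payment Required.
    if ($blocked_bot) {
        return 402;
    }
}
```

Keeping the list in a `map` means adding or removing an agent is a one-line change, and anything that doesn’t match falls through to `default` and gets served normally. Worth remembering that this only catches bots that are honest about their user agent.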

I tried to leave out all the RSS fetchers because I love RSS, RSS is great, and if you’re using RSS in 2024 you’re an awesome person, BUT I might have inadvertently broken some feed readers out there with this move. If you notice something not working properly, let me know and I’ll fix it.