Tags: useragent

19

sparkline

Thursday, August 29th, 2024

What RSS Needs

I love my feed reader:

Feed readers are an example of user agents: they act on behalf of you when they interact with publishers, representing your interests and preserving your privacy and security. The most well-known user agents these days are Web browsers, but in many ways feed readers do it better – they don’t give nearly as much control to sites about presentation and they don’t allow privacy-invasive technologies like cookies or JavaScript.

Also:

Feed support should be built into browsers, and the user experience should be excellent.

Agreed!

However, convincing the browser vendors that this is in their interest is going to be challenging – especially when some of them have vested interests in keeping users on the non-feed Web.

Wednesday, July 3rd, 2024

Declare your AIndependence: block AI bots, scrapers and crawlers with a single click

This is a great move from Cloudflare. I may start using their service.

Monday, June 17th, 2024

AI Pollution – David Bushell – Freelance Web Design (UK)

AI is steeped in marketing drivel, built upon theft, and intent on replacing our creative output with a depressingly shallow imitation.

Blocking bots – Manu

Blocking the bots is step one.

Sunday, June 16th, 2024

Monday, October 2nd, 2023

Crawlers

A few months back, I wrote about how Google is breaking its social contract with the web, harvesting our content not in order to send search traffic to relevant results, but to feed a large language model that will spew auto-completed sentences instead.

I still think Chris put it best:

I just think it’s fuckin’ rude.

When it comes to the crawlers that are ingesting our words to feed large language models, Neil Clarke describes the situtation:

It should be strictly opt-in. No one should be required to provide their work for free to any person or organization. The online community is under no responsibility to help them create their products. Some will declare that I am “Anti-AI” for saying such things, but that would be a misrepresentation. I am not declaring that these systems should be torn down, simply that their developers aren’t entitled to our work. They can still build those systems with purchased or donated data.

Alas, the current situation is opt-out. The onus is on us to update our robots.txt file.

Neil handily provides the current list to add to your file. Pass it on:

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

In theory you should be able to group those user agents together, but citation needed on whether that’s honoured everywhere:

User-agent: CCBot
User-agent: ChatGPT-User
User-agent: GPTBot
User-agent: Google-Extended
User-agent: Omgilibot
User-agent: FacebookBot
Disallow: /

There’s a bigger issue with robots.txt though. It too is a social contract. And as we’ve seen, when it comes to large language models, social contracts are being ripped up by the companies looking to feed their beasts.

As Jim says:

I realized why I hadn’t yet added any rules to my robots.txt: I have zero faith in it.

That realisation was prompted in part by Manuel Moreale’s experiment with blocking crawlers:

So, what’s the takeaway here? I guess that the vast majority of crawlers don’t give a shit about your robots.txt.

Time to up the ante. Neil’s post offers an option if you’re running Apache. Either in .htaccess or in a .conf file, you can block user agents using mod_rewrite:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (CCBot|ChatGPT|GPTBot|Omgilibot| FacebookBot) [NC]
RewriteRule ^ – [F]

You’ll see that Google-Extended isn’t that list. It isn’t a crawler. Rather it’s the permissions model that Google have implemented for using your site’s content to train large language models: unless you opt out via robots.txt, it’s assumed that you’re totally fine with your content being used to feed their stochastic parrots.

Thursday, July 14th, 2022

Body Margin 8px | Miriam Eric Suzanne

I love this kind of spelunking into the history of why things are they way they are on the web!

Here, Detective Chief Inspector Suzanne tries to get to the bottom of why every browser has eight pixels of margin applied to the body element in the user-agent stylesheet.

Monday, April 4th, 2022

UA gotta be kidding

Brian recounts the sordid messy history of user-agent strings.

I remember somebody once describing a user-agent string as “a reverse-chronological history of web browsers.”

Thursday, November 4th, 2021

A Web Browser Built for Me • Robin Rendle

What I want instead is an anarchist web browser.

What I’d really like to see is a browser that cuts things out, that takes things away from the web. Colors, fonts, confusion. Do you need an enormous JavaScript engine under the hood to power a modern web browser? I don’t think you do. Do you need all the extensions? All the latest CSS features? Nah, mate.

Throw away everything and start again and focus intensely about what people care about when it comes to the web.

Friday, July 9th, 2021

Bruce Lawson’s personal site  : prefers-reduced-motion and browser defaults

I think Bruce is onto something here:

It seems to me that browsers could do more to protect their users. Browsers are, after all, user agents that protect the visitor from pop-ups, malicious sites, autoplaying videos and other denizens of the underworld. They should also protect users against nausea and migraines, regardless of whether the developer thought to (or had the tools available to).

So, I propose that browsers should never respect scroll-behavior: smooth; if a user prefers reduced motion, regardless of whether a developer has set the media query.

Monday, May 24th, 2021

Principles of User Privacy (PUP)

This looks like an excellent proposal for agreement around discussing privacy on the web.

The section on user agents resonates with what I wrote recently about not considering Google Chrome a user agent any more:

Its fiduciary duties include:

  • Duty of Protection
  • Duty of Discretion
  • Duty of Honesty
  • Duty of Loyalty

Monday, March 8th, 2021

Progressive enhancement and accessibility redux - QuirksBlog

This is a really interesting take on the intersection between accessibility and progressive enhancement (which I always felt was there, but this expresses it well):

Accessibility aims to optimize an experience across a spectrum of user capabilities. Progressive enhancement aims to optimize an experience across a spectrum of user agent capabilities.

Indeed, if you broaden the definition of “user agent” to include a user’s physiology, I think the concepts become nearly identical.

Thursday, January 16th, 2020

Intent to Deprecate and Freeze: The User-Agent string - Google Groups

Excellent news! All the major browsers have agreed to freeze their user-agent strings, effectively making them a relic (which they kinda always were).

For many (most?) uses of UA sniffing today, a better tool for the job would be to use feature detection.

Monday, June 18th, 2018

User agent as a runtime | Blog | Decade City

Ultimately you can’t control when and how things go wrong but you do have some control over what happens. This is why progressive enhancement exists.

Tuesday, January 27th, 2015

Windows 10 Technical Preview IE UA String

I love Lyza’s comment on the par-for-the-course user-agent string of Microsoft’s brand new Spartan browser:

There must be an entire field emerging: UA archaeologist and lore historian. It’s starting to read like the “begats” in the bible. All browsers much connect their lineage to Konqueror or face a lack-of-legitimacy crisis!

Monday, August 4th, 2014

The Mobile Web should just work for everyone - IEBlog

One more reason why you should never sniff user-agent strings: Internet Explorer is going to lie some more. Can’t really blame them though—if developers didn’t insist on making spurious conclusions based on information in the user-agent string, then browsers wouldn’t have to lie.

Oh, and Internet Explorer is going to parse -webkit prefixed styles. Again, if developers hadn’t abused vendor prefixes, we wouldn’t be in this mess.

Monday, December 9th, 2013

Type Rendering Mix

I got excited when Tim Brown announced this at An Event Apart today: a small JavaScript tool for detecting what kind of rasterising and anti-aliasing a browser is using, and adding the appropriate classes to the root element (in much the same way that Web Font Loader does).

Alas, it turns out that it’s reliant on user-agent string sniffing. I guess that’s to be expected: this isn’t something that can be detected directly. Still, it feels a little fragile: whenever you use any user-agent sniffing tool you are entering an arms race that requires you to keep your code constantly updated.

Monday, February 25th, 2013

gaia/build/ua-override-prefs.js at master · mozilla-b2g/gaia · GitHub

And this is why user-agent sniffing not a future-friendly technique. A new mobile browser comes along, and it has to spoof a fake UA string to all of these sites.

It’s a Red Queen arms race.