Tags: mission

55

sparkline

Tuesday, June 25th, 2024

An origin trial for a new HTML <permission> element  |  Blog  |  Chrome for Developers

This looks interesting. On the hand, it’s yet another proprietary creation by one browser vendor (boo!), but on the other hand it’s a declarative API with no JavaScript required (yay!).

Even if this particular feature doesn’t work out, I hope that this is the start of a trend for declarative access to browser features.

Wednesday, February 28th, 2024

The Subversive Hyperlink - Jim Nielsen’s Blog

Subvert the status quo. Own a website. Make and share links.

Monday, October 2nd, 2023

Crawlers

A few months back, I wrote about how Google is breaking its social contract with the web, harvesting our content not in order to send search traffic to relevant results, but to feed a large language model that will spew auto-completed sentences instead.

I still think Chris put it best:

I just think it’s fuckin’ rude.

When it comes to the crawlers that are ingesting our words to feed large language models, Neil Clarke describes the situtation:

It should be strictly opt-in. No one should be required to provide their work for free to any person or organization. The online community is under no responsibility to help them create their products. Some will declare that I am “Anti-AI” for saying such things, but that would be a misrepresentation. I am not declaring that these systems should be torn down, simply that their developers aren’t entitled to our work. They can still build those systems with purchased or donated data.

Alas, the current situation is opt-out. The onus is on us to update our robots.txt file.

Neil handily provides the current list to add to your file. Pass it on:

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

In theory you should be able to group those user agents together, but citation needed on whether that’s honoured everywhere:

User-agent: CCBot
User-agent: ChatGPT-User
User-agent: GPTBot
User-agent: Google-Extended
User-agent: Omgilibot
User-agent: FacebookBot
Disallow: /

There’s a bigger issue with robots.txt though. It too is a social contract. And as we’ve seen, when it comes to large language models, social contracts are being ripped up by the companies looking to feed their beasts.

As Jim says:

I realized why I hadn’t yet added any rules to my robots.txt: I have zero faith in it.

That realisation was prompted in part by Manuel Moreale’s experiment with blocking crawlers:

So, what’s the takeaway here? I guess that the vast majority of crawlers don’t give a shit about your robots.txt.

Time to up the ante. Neil’s post offers an option if you’re running Apache. Either in .htaccess or in a .conf file, you can block user agents using mod_rewrite:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (CCBot|ChatGPT|GPTBot|Omgilibot| FacebookBot) [NC]
RewriteRule ^ – [F]

You’ll see that Google-Extended isn’t that list. It isn’t a crawler. Rather it’s the permissions model that Google have implemented for using your site’s content to train large language models: unless you opt out via robots.txt, it’s assumed that you’re totally fine with your content being used to feed their stochastic parrots.

Wednesday, July 12th, 2023

Pulling my site from Google over AI training – Tracy Durnell

I’m not down with Google swallowing everything posted on the internet to train their generative AI models.

A principled approach to evolving choice and control for web content

This would mean a lot more if it happened before the wholesale harvesting of everyone’s work.

But I’m sure Google will put a mighty fine lock on that stable door that the horse bolted from.

Tuesday, July 11th, 2023

Permission

Back when the web was young, it wasn’t yet clear what the rules were. Like, could you really just link to something without asking permission?

Then came some legal rulings to establish that, yes, on the web you can just link to anything without checking if it’s okay first.

What about search engines and directories? Technically they’re rifling through all the stuff we publish and reposting snippets of it. Is that okay?

Again, through some legal precedents—but mostly common agreement—everyone decided that on balance it was fine. After all, those snippets they publish are helping your site get traffic.

In short order, search came to rule the web. And Google came to rule search.

The mutually beneficial arrangement persisted uneasily. Despite Google’s search results pages getting worse and worse in recent years, the company’s huge market share of search means you generally want to be in their good books.

Google’s business model relies on us publishing web pages so that they can put ads around the search results linking to that content, and we rely on Google to send people to our websites by responding smartly to search queries.

That has now changed. Instead of responding to search queries by linking to the web pages we’ve made, Google is instead generating dodgy summaries rife with hallucina… lies (a psychic hotline, basically).

Google still benefits from us publishing web pages. We no longer benefit from Google slurping up those web pages.

With AI, tech has broken the web’s social contract:

Google has steadily been manoeuvring their search engine results to more and more replace the pages in the results.

As Chris puts it:

Me, I just think it’s fuckin’ rude.

Google is a portal to the web. Google is an amazing tool for finding relevant websites to go to. That was useful when it was made, and it’s nothing but grown in usefulness. Google should be encouraging and fighting for the open web. But now they’re like, actually we’re just going to suck up your website, put it in a blender with all other websites, and spit out word smoothies for people instead of sending them to your website. Instead.

Ben proposes an update to robots.txt that would allow us to specify licensing information:

Robots.txt needs an update for the 2020s. Instead of just saying what content can be indexed, it should also grant rights.

Like crawl my site only to provide search results not train your LLM.

It’s a solid proposal. But Google has absolutely no incentive to implement it. They hold all the power.

Or do they?

There is still the nuclear option in robots.txt:

User-agent: Googlebot
Disallow: /

That’s what Vasilis is doing:

I have been looking for ways to not allow companies to use my stuff without asking, and so far I coulnd’t find any. But since this policy change I realised that there is a simple one: block google’s bots from visiting your website.

The general consensus is that this is nuts. “If you don’t appear in Google’s results, you might as well not be on the web!” is the common cry.

I’m not so sure. At least when it comes to personal websites, search isn’t how people get to your site. They get to your site from RSS, newsletters, links shared on social media or on Slack.

And isn’t it an uncomfortable feeling to think that there’s a third party service that you absolutely must appease? It’s the same kind of justification used by people who are still on Twitter even though it’s now a right-wing transphobic cesspit. “If I’m not on Twitter, I might as well not be on the web!”

The situation with Google reminds me of what Robin said about Twitter:

The speed with which Twitter recedes in your mind will shock you. Like a demon from a folktale, the kind that only gains power when you invite it into your home, the platform melts like mist when that invitation is rescinded.

We can rescind our invitation to Google.

Wednesday, March 15th, 2023

How slimmed-down websites can cut their carbon emissions - BBC News

Interesting to see an article on web performance on the BBC. Perhaps we should be emphasising green over speed?

Behind the scenes, animation and interaction effects were added using HTML and CSS, two fundamental web languages. That meant there was no need to download large JavaScript files often used to do this on other sites.

Tuesday, January 24th, 2023

Inside the Globus INK: a mechanical navigation computer for Soviet spaceflight

The positively steampunk piece of hardware used for tracking Alexei Leonov’s Apollo-Soyuz mission.

Monday, January 16th, 2023

Mars distracts

A few years ago, I wrote about how much I enjoyed the book Aurora by Kim Stanley Robinson.

Not everyone liked that book. A lot of people were put off by its structure, in which the dream of interstellar colonisation meets the harsh truth of reality and the book follows where that leads. It pours cold water over the very idea of humanity becoming interplanetary.

But our own solar system is doable, right? I mean, Kim Stanley Robinson is the guy who wrote the Mars trilogy and 2312, both of which depict solar system colonisation in just a few centuries.

I wonder if the author might regret the way that some have taken his Mars trilogy as a sort of manual, Torment Nexus style. Kim Stanley Robinson is very much concerned with this planet in this time period, but others use his work to do the opposite.

But the backlash to Mars has begun.

Maciej wrote Why Not Mars:

The goal of this essay is to persuade you that we shouldn’t send human beings to Mars, at least not anytime soon. Landing on Mars with existing technology would be a destructive, wasteful stunt whose only legacy would be to ruin the greatest natural history experiment in the Solar System. It would no more open a new era of spaceflight than a Phoenician sailor crossing the Atlantic in 500 B.C. would have opened up the New World. And it wouldn’t even be that much fun.

Manu Saadia is writing a book about humanity in space, and he has a corresponding newsletter called Against Mars: Space Colonization and its Discontents:

What if space colonization was merely science-fiction, a narrative, or rather a meta-narrative, a myth, an ideology like any other? And therefore, how and why did it catch on? What is so special and so urgent about space colonization that countless scientists, engineers, government officials, billionaire oligarchs and indeed, entire nations, have committed work, ingenuity and treasure to make it a reality.

What if, and hear me out, space colonization was all bullshit?

I mean that quite literally. No hyperbole. Once you peer under the hood, or the nose, of the rocket ship, you encounter a seemingly inexhaustible supply of ghoulish garbage.

Two years ago, Shannon Stirone went into the details of why Mars Is a Hellhole

The central thing about Mars is that it is not Earth, not even close. In fact, the only things our planet and Mars really have in common is that both are rocky planets with some water ice and both have robots (and Mars doesn’t even have that many).

Perhaps the most damning indictment of the case for Mars colonisation is that its most ardent advocate turns out to be an idiotic small-minded eugenicist who can’t even run a social media company, much less a crewed expedition to another planet.

But let’s be clear: we’re talking here about the proposition of sending humans to Mars—ugly bags of mostly water that probably wouldn’t survive. Robots and other uncrewed missions in our solar system …more of that, please!

Monday, January 2nd, 2023

Why Not Mars (Idle Words)

I’ve come to believe the best way to look at our Mars program is as a faith-based initiative. There is a small cohort of people who really believe in going to Mars, the way some people believe in ghosts or cryptocurrency, and this group has an outsize effect on our space program.

Maciej lays out the case against a crewed mission to Mars.

Like George Lucas preparing to release another awful prequel, NASA is hoping that cool spaceships and nostalgia will be enough to keep everyone from noticing that their story makes no sense. But you can’t lie your way to Mars, no matter how sincerely you believe in what you’re doing.

And don’t skip the footnotes:

Fourth graders writing to Santa make a stronger case for an X-Box than NASA has been able to put together for a Mars landing.

Sunday, December 11th, 2022

Artemis I | Flickr

NASA is posting some lovely pictures on Flickr from the first Artemis mission.

Flight Day 20: Orion and Our Moon

Wednesday, March 30th, 2022

Make Beautifully Resilient Apps With Progressive Enhancement

You had me at “beautifully resilient apps with progressive enhancement”.

This is a great clear walkthrough of enhancing a form submission. A lot of this seems like first principles to me, but if you’ve only ever built single page apps, then thinking about a server-submission process first might well be revelatory.

Saturday, February 5th, 2022

Is Momentum Shifting Toward a Ban on Behavioral Advertising? – The Markup

I really hope that Betteridge’s Law doesn’t apply to this headline.

Monday, November 29th, 2021

Google, Facebook hiding behind skirts of small business

While the dream of “personalized” ads has turned out to be mostly a nightmare, adtech has built some of the wealthiest companies in the world based on tracking us. It’s no surprise to me that as Members of the European Parliament contemplate tackling these many harms, Big Tech is throwing millions of Euros behind a “necessary evil” PR defense for its business model.

But tracking is an unnecessary evil.

Yes! This!

Even in today’s tracking-obsessed digital ecosystem it’s perfectly possible to target ads successfully without placing people under surveillance. In fact right now, some of the most effective and highly valued online advertising is contextual — based on search terms, other non-tracking based data, and the context of websites rather than intrusive, dangerous surveillance.

Let’s be clear. Advertising is essential for small and medium size businesses, but tracking is not.

Rather than creating advertising that is more relevant, more timely and more likable we are creating advertising that is more annoying, more disliked, and more avoided.

I promise you, the minute tracking is outlawed, Facebook, Google and the rest of the adtech giants will claim that their new targeting mechanisms (whatever they turn out to be) are superior to tracking.

UK ICO: surveillance advertising is dead

Behavioral ads are only more profitable than context ads if all the costs of surveillance – the emotional burden of being watched; the risk of breach, identity-theft and fraud; the potential for government seizure of surveillance data – is pushed onto internet users. If companies have to bear those costs, behavioral ads are a total failure, because no one in the history of the human race would actually grant consent to all the things that gets done with our data.

Tuesday, November 10th, 2020

HTML Forms: How (and Why) to Prevent Double Form Submissions – Bram.us

I can see how this would be good to have fixed at the browser level.

Sunday, May 17th, 2020

Why is this interesting? - The Transmission Edition

Looking at COVID-19 through the lens of pace layers.

…a citizen could actually play a part that was as important as a vaccine, but instead of preventing transmission of the virus into another cell at the ACE receptor level, it’s preventing transmission of the virus at the social network level. So we’re actually adopting a kind of behavioral vaccine policy, by voluntarily or otherwise self-isolating.

Friday, April 10th, 2020

Apollo 13 in Real Time

It was fifty years ago this weekend. Follow along here, timeshifted by half a century.

Friday, April 3rd, 2020

The Worm is Back! | NASA

The return of NASA’s iconic “worm” logo (for some missions).