Journal tags: generative

Unsaid

I went to the UX Brighton conference yesterday.

The quality of the presentations was really good this year, probably the best yet. Usually there are one or two stand-out speakers (like Tom Kerwin last year), but this year, the standard felt very high to me.

But…

The theme of the conference was UX and “AI”, and I’ve never been more disappointed by what wasn’t said at a conference.

Not a single speaker addressed where the training data for current large language models comes from (it comes from scraping other people’s copyrighted creative works).

Not a single speaker addressed the energy requirements for current large language models (the requirements are absolutely mahoosive—not just for the training, but for each and every query).

My charitable reading of the situation yesterday was that every speaker assumed that someone else would cover those issues.

The less charitable reading is that this was a deliberate decision.

Whenever the issue of ethics came up, it was only ever in relation to how we might use these tools: considering user needs, being transparent, all that good stuff. But never once did the question arise of whether it’s ethical to even use these tools.

In fact, the message was often the opposite: words like “responsibility” and “duty” came up, but only in the admonition that UX designers have a responsibility and duty to use these tools! And if that carrot didn’t work, there’s always the stick of scaring you into using these tools for fear of being left behind and having a machine replace you.

I was left feeling somewhat depressed about the deliberately narrow focus. Maggie’s talk was the only one that dealt with any externalities, looking at how the firehose of slop is blasting away at society. But again, the focus was only ever on how these tools are used or abused; nobody addressed the possibility of deliberately choosing not to use them.

If audience members weren’t yet using generative tools in their daily work, the assumption was that they were lagging behind and it was only a matter of time before they’d get on board the hype train. There was no room for the idea that someone might examine the roots of these tools and make a conscious choice not to fund their development.

There’s a quote by Finnish architect Eliel Saarinen that UX designers like repeating:

Always design a thing by considering it in its next larger context. A chair in a room, a room in a house, a house in an environment, an environment in a city plan.

But none of the speakers at UX Brighton chose to examine the larger context of the tools they were encouraging us to use.

One speaker told us “Be curious!”, but clearly that curiosity should not extend to the foundations of the tools themselves. Ignore what’s behind the curtain. Instead look at all the cool stuff we can do now. Don’t worry about the fact that everything you do with these tools is built on a bedrock of exploitation and environmental harm. We should instead blithely build a new generation of user interfaces on the burial ground of human culture.

Whenever I get into a discussion about these issues, it always seems to come back ’round to whether these tools are actually any good or not. People point to the genuinely useful tasks they can accomplish. But that’s not my issue. There are absolutely smart and efficient ways to use large language models—in some situations, it’s like suddenly having a superpower. But as Molly White puts it:

The benefits, though extant, seem to pale in comparison to the costs.

There are no ethical uses of current large language models.

And if you believe that the ethical issues will somehow be ironed out in future iterations, then that’s all the more reason to stop using the current crop of exploitative large language models.

Anyway, like I said, all the talks at UX Brighton were very good. But I wish just one of them had addressed the underlying questions that any good UX designer should ask: “Where did this data come from? What are the second-order effects of deploying this technology?”

Having a talk on those topics would’ve been nice, but I would’ve settled for having five minutes of one talk, or even one minute. But there was nothing.

There’s one possible explanation for this glaring absence that’s quite depressing to consider. It may be that these topics weren’t covered because there’s an assumption that everybody already knows about them, and frankly, doesn’t care.

To use an outdated movie reference, imagine a raving Charlton Heston shouting that “Soylent Green is people!”, only to be met with indifference. “Everyone knows Soylent Green is people. So what?”

Mismatch

This seems to be the attitude of many of my fellow nerds—designers and developers—when presented with tools based on large language models that produce dubious outputs based on the unethical harvesting of other people’s work and requiring staggering amounts of energy to run:

This is the future! I need to start using these tools now, even if they’re flawed, because otherwise I’ll be left behind. They’ll only get better. It’s inevitable.

Whereas this seems to be the attitude of those same designers and developers when faced with stable browser features that can be safely used today without frameworks or libraries:

I’m sceptical.

What price?

I’ve noticed a really strange justification from people when I ask them about their use of generative tools that use large language models (colloquially and inaccurately labelled as artificial intelligence).

I’ll point out that the training data requires the wholesale harvesting of creative works without compensation. I’ll also point out the ludicrously profligate energy use required not just for the training, but for the subsequent queries.

And here’s the thing: people will acknowledge those harms but they will justify their actions by saying “these things will get better!”

First of all, there’s no evidence to back that up.

If anything, as the well gets poisoned by their own outputs, large language models may well end up eating their own slop and getting their own version of mad cow disease. So this might be as good as they’re ever going to get.

And when it comes to energy usage, all the signals from NVIDIA, OpenAI, and others are that power usage is going to increase, not decrease.

But secondly, what the hell kind of logic is that?

It’s like saying “It’s okay for me to drive my gas-guzzling SUV now, because in the future I’ll be driving an electric vehicle.”

The logic is completely backwards! If large language models are going to improve their ethical shortcomings (which is debatable, but let’s be generous), then that’s all the more reason to avoid using the current crop of egregiously damaging tools.

You don’t get companies to change their behaviour by rewarding them for it. If you really want better behaviour from the purveyors of generative tools, you should be boycotting the current offerings.

I suspect that most people know full well that the “they’ll get better!” defence doesn’t hold water. But you can convince yourself of anything when everyone around you is telling you that this is the future, baby, and you’d better get on board or you’ll be left behind.

Baldur reminds us that this is how people talked about asbestos:

Every time you had an industry campaign against an asbestos ban, they used the same rhetoric. They focused on the potential benefits – cheaper spare parts for cars, cheaper water purification – and doing so implicitly assumed that deaths and destroyed lives, were a low price to pay.

This is the same strategy that’s being used by those who today talk about finding productive uses for generative models without even so much as gesturing towards mitigating or preventing the societal or environmental harms.

It reminds me of the classic Ursula Le Guin short story, The Ones Who Walk Away from Omelas, which depicts:

…the utopian city of Omelas, whose prosperity depends on the perpetual misery of a single child.

Once citizens are old enough to know the truth, most, though initially shocked and disgusted, ultimately acquiesce to this one injustice that secures the happiness of the rest of the city.

It turns out that most people will blithely accept injustice and suffering not for a utopia, but just for some bland hallucinated slop.

Don’t get me wrong: I’m not saying large language models are without their uses. I love seeing what Simon and Matt are doing when it comes to coding. And large language models can be great for transforming content from one format to another, like transcribing speech into text. But the balance sheet just doesn’t add up.

As Molly White put it in “AI isn’t useless. But is it worth it?”:

Even as someone who has used them and found them helpful, it’s remarkable to see the gap between what they can do and what their promoters promise they will someday be able to do. The benefits, though extant, seem to pale in comparison to the costs.

Trust

In their rush to cram in “AI” “features”, it seems to me that many companies don’t actually understand why people use their products.

Google is acting as though its greatest asset is its search engine. Same with Bing.

Mozilla Developer Network is acting as though its greatest asset is its documentation. Same with Stack Overflow.

But their greatest asset is actually trust.

If I use a search engine I need to be able to trust that the filtering is good. If I look up documentation I need to trust that the information is good. I don’t expect perfection, but I also don’t expect to have to constantly be thinking “was this generated by a large language model, and if so, how can I know it’s not hallucinating?”

“But”, the apologists will respond, “the results are mostly correct! The documentation is mostly true!”

Sure, but as Terence puts it:

The intern who files most things perfectly but has, more than once, tipped an entire cup of coffee into the filing cabinet is going to be remembered as “that klutzy intern we had to fire.”

Trust is a precious commodity. It takes a long time to build trust. It takes a short time to destroy it.

I am honestly astonished that so many companies don’t seem to realise what they’re destroying.

InstAI

If you use Instagram, there may be a message buried in your notifications. It begins:

We’re getting ready to expand our AI at Meta experiences to your region.

Fuck that. Here’s the important bit:

To help bring these experiences to you, we’ll now rely on the legal basis called legitimate interests for using your information to develop and improve AI at Meta. This means that you have the right to object to how your information is used for these purposes. If your objection is honoured, it will be applied going forwards.

Follow that link and fill in the form. For the field labelled “Please tell us how this processing impacts you” I wrote:

It’s fucking rude.

That did the trick. I got an email saying:

We’ve reviewed your request and will honor your objection.

Mind you, there’s still this:

We may still process information about you to develop and improve AI at Meta, even if you object or don’t use our products and services.

Continuous partial ick

The output of generative tools based on large language models gives me the ick.

This isn’t a measured logical response. It’s more of an involuntary emotional reaction.

I could try to justify my reaction by saying I’m concerned about the exploitation involved in the training data, or the huge energy costs involved, or the disenfranchisement of people who create art. But those would be post-facto rationalisations.

I just find myself wrinkling my nose and mentally going “Ew!” whenever somebody posts the output of some prompt they gave to ChatGPT or Midjourney.

Again, I’m not saying this is rational. It’s more instinctual.

You could well say that this is my problem. You may be right. But I wonder what it is that’s so unheimlich about these outputs that triggers my response.

Just to clarify, I am talking about direct outputs, shared verbatim. If someone were to use one of these tools in the process of creating something I’d be none the wiser. I probably couldn’t even tell that a large language model was involved at some point. I’m fine with that. It’s when someone takes something directly from one of these tools and then shares it online, that’s what raises my bile.

I was at a conference a few months back where your badge featured a hallucinated picture of you. Now, this probably sounded like a fun idea. It probably is a fun idea. I can’t tell. All I know is that it made me feel a little queasy.

Perhaps it’s a question of taste. In which case, I’m being a snob. I’m literally turning my nose up at something I deem to be tacky.

But isn’t it tacky, though? It’s not something I can describe, but there’s just something about the vibe of these images—and words—that feels off. It’s sort of creepy, but it’s mostly just the mediocrity that sits so uneasily with me.

These tools do an amazing job of solving the quantity problem—how to produce an image or piece of text quickly. And by most measurements, you could say that they also solve the quality problem. These outputs are good enough to pass for “the real thing.” The outputs are, like, 90% to 95% there. And the gap is closing.

And yet. There’s something in that gap. Something that I feel in my gut. Something that makes me go “nope.”

Automation

I just described prototype code as code to be thrown away. On that topic…

I’ve been observing how people are programming with large language models and I’ve seen a few trends.

The first thing that just about everyone agrees on is that the code produced by a generative tool is not fit for public consumption. At least not straight away. It definitely needs to be checked and tested. If you enjoy debugging and doing code reviews, this might be right up your street.

The other option is to not use these tools for production code at all. Instead use them for throwaway code. That could be prototyping. But it could also be the code for those annoying admin tasks that you don’t do very often.

Take content migration. Say you need to grab a data dump, do some operations on the data to transform it in some way, and then pipe the results into a new content management system.

That’s almost certainly something you’d want to automate with bespoke code. Once the content migration is done, the code can be thrown away.
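As a rough sketch of what that kind of throwaway script might look like, here is a small Python example. It assumes the data dump is JSON, and the field names on both sides (`headline`, `text`, `keywords` in the old system; `title`, `body`, `tags` in the new one) are made up purely for illustration. A real migration would depend entirely on the two systems involved.

```python
import json


def transform(entry):
    """Map one record from the old (hypothetical) schema to the new one."""
    return {
        "title": entry["headline"].strip(),
        "body": entry["text"],
        # Normalise tags: lowercase, de-duplicated, in a stable order.
        "tags": sorted({tag.lower() for tag in entry.get("keywords", [])}),
    }


def migrate(dump_text):
    """Turn a JSON data dump into a list of records for the new system."""
    return [transform(entry) for entry in json.loads(dump_text)]


# A tiny inline dump standing in for the real export file.
dump = '[{"headline": " Hello ", "text": "First post", "keywords": ["Web", "web", "CSS"]}]'
records = migrate(dump)

# In practice this output would be piped into the new system's import step.
print(json.dumps(records, indent=2))
```

Nothing here needs to be elegant. It runs once, on your own machine, and then it can be deleted.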

Read Matt’s account of coding up his Braggoscope. The code needed to spider a thousand web pages, extract data from those pages, find similarities, and output the newly-structured data in a different format.

I’ve noticed that these are just the kind of tasks that large language models are pretty good at. In effect you’re training the tool on your own very specific data and getting it to do your drudge work for you.

To me, it feels right that the usefulness happens on your own machine. You don’t put the machine-generated code in front of other humans.

Nailspotting

I’m sure you’ve heard the law of the instrument: when all you have is a hammer, everything looks like a nail.

There’s another side to it. If you’re selling hammers, you’ll depict a world full of nails.

Recent hammers include cryptobollocks and virtual reality. It wasn’t enough for blockchains and the metaverse to be potentially useful for some situations; they staked their reputations on being utterly transformative, disrupting absolutely every facet of life.

This kind of hype is a terrible strategy in the long term. But if you can convince enough people in the short term, you can make a killing on the stock market. In truth, the technology itself is superfluous. It’s the hype that matters. And if the hype is over-inflated enough, you can even get your critics to do your work for you, broadcasting their fears about these supposedly world-changing technologies.

You’d think we’d learn. If an industry cries wolf enough times, surely we’d become less trusting of extraordinary claims. But the tech industry continues to cry wolf—or rather, “hammer!”—at regular intervals.

The latest hammer is machine learning, usually—incorrectly—referred to as Artificial Intelligence. What makes this hype cycle particularly infuriating is that there are genuine use cases. There are some nails for this hammer. They’re just not as plentiful as the breathless hype—both positive and negative—would have you believe.

When I was hosting the DiBi conference last week, there was a little section on generative “AI” tools. Matt Garbutt covered the visual side, demoing tools like Midjourney. Scott Salisbury covered the text side, showing how you can generate code. Afterwards we had a panel discussion.

During the panel I asked some fairly straightforward questions that nobody could answer. Who owns the input (the data used by these generative tools)? Who owns the output?

On the whole, it stayed quite grounded and mercifully free of hyperbole. Both speakers were treating the current crop of technologies as tools. Everyone agreed we were on the hype cycle, probably at the peak of inflated expectations, looking forward to reaching the plateau of productivity.

Scott explicitly warned people off using generative tools for production code. His advice was to stick to side projects for now.

Matt took a closer look at where these tools could fit into your day-to-day design work. Mostly it was pretty sensible, except when he suggested that there could be any merit to using these tools as a replacement for user testing. That’s a terrible idea. A classic hammer/nail mismatch.

I think I moderated the panel reasonably well, but I have one regret. I wish I had first read Baldur Bjarnason’s new book, The Intelligence Illusion. I started reading it on the train journey back from Edinburgh but it would have been perfect for the panel.

The Intelligence Illusion is very level-headed. It is neither pro- nor anti-AI. Instead it takes a pragmatic look at both the benefits and the risks of using these tools in your business.

It has excellent advice for spotting genuine nails. For example:

Generative AI has impressive capabilities for converting and modifying seemingly unstructured data, such as prose, images, and audio. Using these tools for this purpose has less copyright risk, fewer legal risks, and is less error prone than using it to generate original output.

Think about transcripts of videos or podcasts—an excellent use of this technology. As Baldur puts it:

The safest and, probably, the most productive way to use generative AI is to not use it as generative AI. Instead, use it to explain, convert, or modify.

He also says:

Prefer internal tools over externally-facing chatbots.

That chimes with what I’ve been seeing. The most interesting uses of this technology that I’ve seen involve a constrained dataset. Like the way Luke trained a language model on his own content to create a useful chat interface.

Anyway, The Intelligence Illusion is full of practical down-to-earth advice based on plenty of research backed up with copious citations. I’m only halfway through it and it’s already helped me separate the hype from the reality.

You can call me AI

I’ve mentioned before that I’m not a fan of initialisms and acronyms. They can be exclusionary.

It bothers me doubly when everyone is talking about AI.

First of all, the term is so vague as to be meaningless. Sometimes—though rarely—AI refers to general artificial intelligence. Sometimes AI refers to machine learning. Sometimes AI refers to large language models. Sometimes AI refers to a series of if/else statements. That’s quite a spectrum of meaning.

Secondly, there’s the assumption that everyone understands the abbreviation. I guess that’s generally a safe assumption, but sometimes AI could refer to something other than artificial intelligence.

In countries with plenty of pastoral agriculture, if someone works in AI, it usually means they’re going from farm to farm either extracting or injecting animal semen. AI stands for artificial insemination.

I think that abbreviation might work better for the kind of things currently described as using AI.

We were discussing this hot topic at work recently. Is AI coming for our jobs? The consensus was maybe, but only the parts of our jobs that we’re more than happy to have automated. Like summarising some findings. Or perhaps as a kind of lorem ipsum generator. Or for just getting the ball rolling with a design direction. As Terence puts it:

Midjourney is great for a first draft. If, like me, you struggle to give shape to your ideas then it is nothing short of magic. It gets you through the first 90% of the hard work. It’s then up to you to refine things.

That’s pretty much the conclusion we came to in our discussion at Clearleft. There’s no way that we’d use this technology to generate outputs for clients, but we certainly might use it to generate inputs. It’s like how we’d do a quick round of sketching to get a bunch of different ideas out into the open. Terence is spot on when he says:

Midjourney lets me quickly be wrong in an interesting direction.

To put it another way, using a large language model could be a way of artificially injecting some seeds of ideas. Artificial insemination.

So now when I hear people talk about using AI to create images or articles, I don’t get frustrated. Instead I think, “Using artificial insemination to create images or articles? Yes, that sounds about right.”

Design research on the Clearleft podcast

We’re halfway through the third season of the Clearleft podcast already!

Episode three is all about design research. I like the narrative structure of this. It’s a bit like a whodunnit, but it’s more like a whydunnit. The “why” question is “why aren’t companies hiring more researchers?”

The scene of the crime is this year’s UX Fest, where talks by both Teresa Torres and Gregg Bernstein uncovered the shocking lack of researchers. From there, I take up the investigation with Maite Otondo and Stephanie Troeth.

I won’t spoil it but by the end there’s an answer to the mystery.

I learned a lot along the way too. I realised how many axes of research there are. There’s qualitative research (stories, emotion, and context) and then there’s quantitative research (volume and data). But there’s also evaluative research (testing a hypothesis) and generative research (exploring a problem space before creating a solution). By my count that gives four possible combos: qualitative evaluative research, quantitative evaluative research, qualitative generative research, and quantitative generative research. Phew!

Steph was a terrific guest. Only a fraction of our conversation made it into the episode, but we chatted for ages.

And Maite kind of blew my mind too, especially when she was talking about the relationship between research and design and she said:

Research is about the present and design is about the future.

🤯

I’m going to use that quote again in a future episode. In fact, this episode on design research leads directly into the next two episodes. You won’t want to miss them. So if you’re not already subscribed to the Clearleft podcast, you should get on that, whether it’s via the RSS feed, Apple, Google, Spotify, Overcast, or wherever you get your podcasts from.

Have a listen to this episode on design research and if you’re a researcher yourself, remember that unlike most companies we value research at Clearleft and that’s why we’re hiring another researcher right now. Come and work with us!