screen time

Why AI Search Blew Up in Google’s Face


Last week, if you asked Google how to keep cheese from sliding off pizza, it might have told you to add some glue to your sauce — nontoxic, of course. It had some other surprising information to share with users, too. How many rocks should you eat? At least one per day (small). Barack Obama was America’s first and only Muslim president; also, 13 other presidents attended the University of Wisconsin-Madison. You can pass kidney stones by drinking liters of urine. Staring at the sun for five to 15 minutes is “generally safe.” Some say, according to Google, “that if you run off a cliff, you can stay in the air as long as you keep running and don’t look down.”

This was the wide release of Google’s AI Overviews, formerly known as Search Generative Experience, which promises to answer questions for searchers in plain language instead of the expected series of links. It was the second viral PR disaster of the year for Google’s AI products, after its Gemini chatbot and image generator drew ridicule for absurdist historical revisionism. These examples were, according to Google, rare and unrepresentative. The “vast majority of AI Overviews provide high-quality information,” the company told reporters, while the bizarre examples above represent “uncommon queries.” Which, sure, maybe, but as the company well knows, people use Google a lot — often dozens of times a day and in some cases hundreds. Strange, funny, or alarming errors delivered confidently at the top of a search page don’t have to be that common to create the impression that Google is profoundly broken.

The company is manually scrubbing the most egregious examples from its platform, but wildly wrong answers still show up every once in a while, and slightly wrong answers show up most days I use it (this has been true for the year I’ve been testing it as well). In other words, the problem Google has created for itself isn’t a simple one to fix. Google, which probably has as much institutional knowledge about AI as any company on earth, seems prone to overestimating the capabilities of its technology and underestimating the complexity of the tasks to which it’s assigned. While Google’s nonsense AI answers are mostly funny, they’re also a preview of things to come. As other tech companies follow Google’s lead — and as corporate America turns millions of vague meetings about AI into concrete plans — we can all expect to eat a little bit of glue.

AI Overviews are easy enough to describe in general terms: They’re Google’s answers to your queries. This is helpful for Google in that the company can roll out a huge new feature to billions of customers without explaining very much; it’s also incredibly unhelpful to Google in that it sets very high expectations. Google’s AI answers are often basically correct, in my experience, but too vague to be helpful. More specific and helpful answers are undermined by user uncertainty, so you end up double-checking them, discovering in the process that Google has done something close to plagiarism. When the feature really works and synthesizes something specific and germane from a variety of sources, which is in many ways a remarkable technological feat, the user’s minimum expectation has merely been met. It’s a fairly unforgiving scenario for Google but one the company has very much created for itself.

What AI Overviews are actually doing, though, is a bit less obvious and more specific than “answering questions.” In Google’s language, the feature is intended to “take the legwork out of searching.” It does this, according to the company, by combining Gemini’s “multi-step reasoning, planning, and multimodality” with Google’s “best-in-class Search systems.” Like most AI initiatives, this is an attempt at automation. But what Google is automating here, mostly, is the work previously done by its users. It’s automating us.

Google gets close to framing it as such: By synthesizing search results into answers, the company removes steps, reduces friction, improves productivity, etc. It “does the Googling for you.” But this helps explain why such a feature is so hard to get right. Google as most people know it is a complicated tool with which users have decades of experience. We have habits, routines, tricks, and expectations — we are, in our highly personal ways, practiced at using a complicated machine. It’s a bit like driving a car. It’s something lots of people do, with a wide range of skill and attentiveness, to a vast range of ends, with real risks that individual drivers mostly find tolerable.

Similarly, although most people probably don’t think of it this way, using Google is also work. In exchange for considerable utility, the tool asks a lot of its users. It needs them to vet information, wade through ads, and consider sources and context. It gives them lots of options to sort through, in part because it’s good for Google’s business to keep people searching and clicking and tapping and in part because in many cases the best Google can do is present a bunch of possible results that, together, get the user where they need to go.

Before it tried to synthesize its own results, what Google had been automating was link sorting and retrieval. It was, and is, a machine for surfacing material from the web in response to user requests. As any user can attest, this was an automation problem that Google never fully solved because it’s unsolvable on some level — just as a lot of questions don’t have simple agreed-upon answers, a lot of queries can’t be satisfied by a single link or even many. Search results going into the AI era were already full of noise, errors, and irrelevant content. This was the result, in part, of the perverse incentives created by having a single advertising-supported company situated at the center of the web and a marketplace full of publishers who tried to manipulate Google to their own ends. But it was also the result of the web being a fundamentally human place full of unpredictable human behavior, desires, and needs, a lot like the open road, or a retail store, or a warehouse floor.

Google is a search engine, not a social network, but using Google is fundamentally about interacting with other people — albeit at a distance, through multiple layers of obfuscation, each with its own distorting effects. Google draws from commercial websites, scam websites, online communities, and reference resources. There is evidence of unpredictable humanity everywhere. The content Google serves to its users is created (mostly) by people, who are unreliable, disagree with one another, and publish things online for millions of different reasons: to inform; to persuade; to mislead; to make money; to be funny; to be cruel; to pass time. Regular Google users become fairly fluent in the strange language of a search-results page, which they peruse, parse, reset, and refine as needed. Understanding that you’ll encounter some nonsense, scams, jokes, and ads on the way to finding what you’re looking for, or realizing that you won’t, is part of the job of using Google. By attempting to automate this job, Google has revealed — and maybe discovered — just how hard it is and how alien its understanding of its own users has become.

It’s easy to see, from Google’s perspective, how automating more of the search process is incredibly obvious: Initially, it could scrape, gather, and sort; now, with its powerful artificial-intelligence tools, it can read, interpret, and summarize. But Google seems manifestly unable to see its products from the perspective of its users, who actually deal with the mess of using them and for whom ignoring a link to The Onion (the “eat rocks” result) or to an old joke Reddit thread (the “eat glue” result) is obvious, done without thought, and also the result of years of living in, learning from, and adapting to the actual web. AI Overviews suggest a level of confidence by Google not just in its new software for extracting and summarizing information but in its ability to suss out what its users are asking in the first place and in the ability of its much older underlying search product to contain and surface the right information at all.

For the many sorts of questions that can’t be answered to every user’s satisfaction with a couple of paragraphs of summary text, Google’s best option will be not to try (already, many search categories don’t trigger AI results). That AI Overviews seem quite good at producing plausible answers to simple questions is complicated by the fact that large-language-model-powered AI isn’t great at, for example, basic math.

There are echoes, here, of the run-up to self-driving cars, in which legitimately massive leaps forward in software and hardware capability made it possible to automate a wide range of driving tasks, leading to an industry consensus by the mid-2010s that autonomous vehicles were just around the corner and would be ubiquitous by the early 2020s. Today, autonomous vehicles exist but with more caveats than promised. There are driverless taxis that operate cautiously in limited markets with mixed results. Tesla claims — then denies that it claims — to offer “full self-driving” features in its popular cars, which are genuinely shocking to use on first encounter and remain prone to catastrophic failure, largely because conditions on real roads, including other drivers, are unpredictable. Countless cars now have driver-assist features derived from similar technologies, and there remains a plausible path forward to a world where most cars do most driving most of the time, but not for a while and not before the industry discovers the many ways that automating the incredibly complicated task of driving on public roads is harder than the early pace of progress suggested. The automotive industry also learned some early lessons about the punishing expectations of customers who’ve been promised full automation. Drivers tolerate a lot of mess, danger, and mortal risk when they’re behind the wheel and feel responsible for their fates. When a machine is in charge — when it’s attempting to synthesize answers to questions like “Is that a train?” — passenger tolerance for errors is somewhere near zero.

Likewise, while Google users are accustomed to dealing with a fair amount of bullshit in search results, they’ll probably be less tolerant of a Google that occasionally scoops up some of that bullshit and insists it’s something else.

Elsewhere on our devices, online, or at work, we should expect to see this pattern emerge at scale. Surveys about AI suggest that people are more bullish about its prospects outside their own areas of expertise. In some cases, this could just be a bias toward self-preservation or a mild delusion in the face of inevitability. In others, people may simply understand better than anyone outside their industries that their jobs are more complicated than they appear — not necessarily harder but messier. The intoxicating promise of efficiency combined with a poorly understood technology is going to result in some truly ill-advised attempts at automation by companies that, in some cases, won’t be entirely sure what they’re automating. If Google couldn’t avoid this trap, others will follow.
