TikTok Will Never Die
∞ Jan 18, 2025
Damon Beres in the Atlantic Intelligence newsletter:
“Although it was not the first app to offer an endless feed, and it was certainly not the first to use algorithms to better understand and target its users, TikTok put these ingredients together like nothing else before it.” The app was so effective—so sticky—that every meaningful competitor tried to copy its formula. Now TikTok-like feeds have been integrated into Instagram, Facebook, Snapchat, YouTube, X, even LinkedIn.
Today, AI is frequently conflated with generative AI because of the way ChatGPT has captured the world’s imagination. But generative AI is still a largely speculative endeavor. The most widespread and influential AI programs are the less flashy ones quietly whirring away in your pocket, influencing culture, business, and (in this case) matters of national security in very real ways.
MS Copilot Flying Straight Into the Mountain
∞ Jan 10, 2025
AI agents have captured the industry’s imagination (and marketing communications) in recent months. Agents work on their own; they set and pursue goals, make decisions about how to achieve them, and take action across multiple systems until they decide the goal is complete. Vaclav Vincalek ponders what happens when anyone can create these agents and set them loose.
Now imagine that anyone in the organization will be able to create, connect, interact with a ‘constellation of agents.’
Perhaps you don’t see this as a problem.
That only means that you were never responsible for technology within your organization.
Maybe you had a glimpse in the news about all the latest threats from viruses, phishing or other various forms of hacking. Every IT department is trying to stay above water just to safely run what they have now.
These departments are managing networks, firewalls, desktops, laptops, people working remotely, integrating applications, running backups and updates.
The list is longer than you can imagine.
Thanks to Microsoft, you will add to the mix an ability for anyone in the company to automate any task to ‘orchestrate business processes ranging from lead generation, to sales order processing, to confirming order deliveries.’
What could possibly go wrong?
Look at the person sitting in the cubicle next to you (or in the next square on your Zoom call).
Would you trust the person with any work automation, or do you still question that person’s ability to differentiate between a left and right mouse click?
When Combinations of Humans and A.I. are Useful
∞ Nov 10, 2024
This study from MIT researchers raises some challenging questions about collaborative AI interfaces, “human in the loop” supervision, and the value of explaining AI logic and confidence.
Their meta-study looked at over 100 experiments of humans and AI working both separately and together to accomplish tasks. They found that some tasks benefited a ton from human-AI teamwork, while others got worse from the pairing.
Poor Performers Make Poor Supervisors
For tasks where humans working solo do worse than AI, the study found that putting humans in the loop to make final decisions actually delivers worse results. For example, in a task to detect fake reviews, AI working alone achieved 73% accuracy, while humans hit 55%—but the combined human-AI system landed at 69%, watering down what AI could do alone.
In these scenarios, people oscillate between over-reliance (“using suggestions as strong guidelines without seeking and processing more information”) and under-reliance (“ignoring suggestions because of adverse attitudes towards automation”).
Since the people were less accurate, in general, than the AI algorithms, they were also not good at deciding when to trust the algorithms and when to trust their own judgement, so their participation resulted in lower overall performance than for the AI algorithm alone.
Takeaway: “Human in the loop” may be an anti-pattern for certain tasks where AI is more high-performing. Measure results; don’t assume that human judgment always makes things better.
Explanations Didn’t Help
The study found that common design patterns like AI explanations and confidence scores showed no significant impact on performance for human-AI collaborative systems. “These factors have received much attention in recent years [but] do not impact the effectiveness of human-AI collaboration,” the study found.
Given our result that, on average across our 300+ effect sizes, they do not impact the effectiveness of human-AI collaboration, we think researchers may wish to de-emphasize this line of inquiry and instead shift focus to the significant and less researched moderators we identified: the baseline performance of the human and AI alone, the type of task they perform, and the division of labour between them.
Takeaway: Transparency doesn’t always engage the best human judgment; explanations and confidence scores need refinement—or an entirely new alternative. I suspect that changing the form, manner, or tone of these explanations could improve outcomes, but also: Are there different ways to better engage critical thinking and productive skepticism?
Creative Tasks FTW
The study found that human-AI collaboration was most effective for open-ended creative and generative tasks—but less effective for decision-making tasks that involve choosing between defined options. For those decision-making tasks, either humans or AI did better working alone.
We hypothesize that this advantage for creation tasks occurs because even when creation tasks require the use of creativity, knowledge or insight for which humans perform better, they often also involve substantial amounts of somewhat routine generation of additional content that AI can perform as well as or better than humans.
This is a great example of “let humans do what they do best, and let machines do what they do best.” They’re rarely the same thing. And creative/generative tasks tend to have elements of each, where humans excel at creative judgment, and the machines excel at production/execution.
Takeaway: Focus human-machine collaboration on creative and generative tasks; humans and AI may handle decision-making tasks better solo.
Divide and Conquer
A very small number of experiments in the study split tasks between human and machine intelligence based on respective strengths. While only three of the 100+ experiments explored this approach, the researchers hypothesized that “better results might have been obtained if the experimenters had designed processes in which the AI systems did only the parts of the task for which they were clearly better than humans.” This suggests an opportunity for designers to explore more intentional division of labor in human-AI interfaces. Break out your journey maps, friends.
Takeaway: Divvy up and define tasks narrowly around the demonstrated strengths of both humans and machines, and make responsibilities clear for each.
This AI Pioneer Thinks AI Is Dumber Than a Cat
∞ Oct 14, 2024
Christopher Mims of the Wall Street Journal profiles Yann LeCun, AI pioneer and senior researcher at Meta. As you’d expect, LeCun is a big believer in machine intelligence—but has no illusions about the limitations of the current crop of generative AI models. Their talent for language distracts us from their shortcomings:
Today’s models are really just predicting the next word in a text, he says. But they’re so good at this that they fool us. And because of their enormous memory capacity, they can seem to be reasoning, when in fact they’re merely regurgitating information they’ve already been trained on.
“We are used to the idea that people or entities that can express themselves, or manipulate language, are smart—but that’s not true,” says LeCun. “You can manipulate language and not be smart, and that’s basically what LLMs are demonstrating.”
As I’m fond of saying, these are not answer machines, they’re dream machines: “When you ask generative AI for an answer, it’s not giving you the answer; it knows only how to give you something that looks like an answer.”
LLMs are fact-challenged and reasoning-incapable. But they are fantastic at language and communication. Instead of relying on them to give answers, the best bet is to rely on them to drive interfaces and interactions. Treat machine-generated results as signals, not facts. Communicate with them as interpreters, not truth-tellers.
Beware of Botshit
∞ Oct 13, 2024
botshit noun: hallucinated chatbot content that is uncritically used by a human for communication and decision-making tasks. “The company withdrew the whitepaper due to excessive botshit, after the authors relied on unverified machine-generated research summaries.”
From this academic paper on managing the risks of using generated content to perform tasks:
Generative chatbots do this work by ‘predicting’ responses rather than ‘knowing’ the meaning of their responses. This means chatbots can produce coherent sounding but inaccurate or fabricated content, referred to as ‘hallucinations’. When humans use this untruthful content for tasks, it becomes what we call ‘botshit’.
See also: slop.
A Radically Adaptive World Model
∞ Oct 13, 2024
Ethan Mollick posted this nifty little demo of a research project that generates a world based on Counter-Strike, frame by frame in response to your actions. What’s around that corner at the end of the street? Nothing, that portion of the world hasn’t been created yet—until you turn in that direction, and the world is created just for you in that moment.
This is not a post proposing that this is the future of gaming, or that the tech will replace well-crafted game worlds and the people who make them. This proof of concept is nowhere near ready or good enough for that, except perhaps as a tool to assist and support game authors.
Instead, it’s interesting as a remarkable example of a radically adaptive interface, a core aspect of Sentient Design experiences. The demo and the research paper behind it show a whole world being conceived, compiled, and delivered in real time. What happens when you apply this thinking to a web experience? To a data dashboard? To a chat interface? To a calculator app that lets you turn a blank canvas into a one-of-a-kind on-demand interface?
The risk of radically adaptive interfaces is that they turn into robot fever dreams without shape or destination. That’s where design comes in: to conceive and apply thoughtful constraints and guardrails. It’s weird and hairy and different from what’s come before.
Far from replacing designers (or game creators), these experiences require designers more than ever. But we have to learn some new skills and point them in new directions.
Exploring the AI Solution Space
∞ Oct 13, 2024
Jorge Arango explores what it means for machine intelligence to be “used well” and, in particular, questions the current fascination with general-purpose, open-ended chat interfaces.
There are obvious challenges here. For one, this is the first time we’ve interacted with systems that match our linguistic abilities while lacking other attributes of intelligence: consciousness, theory of mind, pride, shame, common sense, etc. AIs’ eloquence tricks us into accepting their output when we have no competence to do so.
The AI-written contract may be better than a human-written one. But can you trust it? After all, if you’re not a lawyer, you don’t know what you don’t know. And the fact that the AI contract looks so similar to a human one makes it easy for you to take its provenance for granted. That is, the better the outcome looks to your non-specialist eyes, the more likely you are to give up your agency.
Another challenge is that ChatGPT’s success has driven many people to equate AIs with chatbots. As a result, the current default approach to adding AI to products entails awkwardly grafting chat onto existing experiences, either for augmenting search (possibly good) or replacing human service agents (generally bad).
But these “chatbot” scenarios only cover a portion of the possibility space — and not even the most interesting one.
I’m grateful for the call to action to think beyond chat and general-purpose, open-ended interfaces. Those have their place, but there’s so much more to explore here.
The popular imagination has equated intelligence with convincing conversation since Alan Turing proposed his “imitation game” in 1950. The concept is simple: if a system can fool you into thinking you’re talking to a human, it can be considered intelligent. For the better part of a century, the Turing Test has shaped popular expectations of machine intelligence from science fiction to Silicon Valley. Chat is an interaction cliché for AI that we have to escape (or at least question), but it has a powerful gravitational force. “Speaks well = thinks well” is a hard perception to break. We fall for it with people, too.
Given the outsized trust we have in systems that speak so confidently, designers have a big challenge when crafting intelligent interfaces: how can you engage the user’s agency and judgment when the answer is not actually as confident as the LLM delivers it? Communicating the accuracy/confidence of results is a design job. The “AI can make mistakes” labels don’t cut it.
This isn’t a new challenge. I’ve been writing about systems smart enough to know they’re not smart enough for years. But the problem gets steeper as the systems appear outwardly smarter and lull us into false confidence.
Jorge’s 2x2 matrix of AI control vs AI accuracy is a helpful tool to at least consider the risks as you explore solutions.
This is a tricky time. It’s natural to seek grounding in times of change, which can cause us to cling too tightly to assumptions or established patterns. Loyalty to the long-held idea that conflates conversation with intelligence does us a disservice. Conversation between human and machine doesn’t have to mean literal dialogue. Let’s be far more expansive in what we consider “chat” and unpack the broad forms these interactions can take.
Introducing Generative Canvas
∞ Oct 8, 2024
On-demand UI! Salesforce announced its pilot of “generative canvas,” a radically adaptive interface for CRM users. It’s a dynamically generated dashboard that uses AI to assemble the right content and UI elements based on your specific context or request. Look out, enterprise, here comes Sentient Design.
I love to see big players doing this. Here at Big Medium, we’re building on similar foundations to help our clients build their own AI-powered interfaces. It’s exciting stuff! Sentient Design is about creating AI-mediated experiences that are aware of context and intent so that they can adapt in real time to specific needs. Veronika Kindred and I call these radically adaptive interfaces, and they show that machine-intelligent experiences can be so much more than chat. This new Salesforce experience offers a good example.
For Salesforce, generative canvas is an intelligent interface that animates traditional UI in new and effective ways. It’s a perfect example of a first-stage radically adaptive interface—and one that’s well suited to the sturdy reliability of enterprise software. Generative canvas uses all of the same familiar data sources as a traditional Salesforce experience might, but it assembles and presents that data on the fly. Instead of relying on static templates built through a painstaking manual process, generative canvas is conceived and compiled in real time. That presentation is tailored to context: it pulls data from the user’s calendar to give suggested prompts and relevant information tailored to their needs. Every new prompt or new context gives you a new layout. (In Sentient Design’s triangle framework, we call this the Bespoke UI experience posture.)
So the benefits are: 1) content and presentation tailored to deliver the most relevant information in the most relevant format (better experience), and 2) elimination or reduction of manual configuration processes (efficiency).
Never fear: you’re not turning your dashboard into a hallucinating robot fever dream. The UI stays on the rails by selecting from a collection of vetted components from Salesforce’s Lightning design system: tables, charts, trends, etc. AI provides radical adaptivity; the design system provides grounded consistency. The concept promises a stable set of data sources and design patterns—remixed into an experience that matches your needs in the moment.
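Salesforce hasn’t shared implementation details, but the general pattern is easy to sketch: let the model propose a layout, then admit only components drawn from the vetted set. Here’s a minimal, hypothetical sketch in Python; the component names and layout format are my invention, not actual Lightning components:

```python
# Hypothetical sketch: constrain a generated dashboard layout to an
# allow-list of vetted design-system components. Component names and the
# layout format are illustrative, not actual Lightning components.
from typing import Any

ALLOWED_COMPONENTS = {"data_table", "bar_chart", "trend_card", "record_list"}

def sanitize_layout(proposed: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only blocks whose type appears in the vetted component set."""
    return [block for block in proposed if block.get("type") in ALLOWED_COMPONENTS]

# The model proposes a layout for "show me this quarter's pipeline";
# anything off the allow-list gets dropped before rendering.
proposed_layout = [
    {"type": "trend_card", "metric": "pipeline_value"},
    {"type": "hologram", "subject": "Q3 revenue"},  # not a vetted component
    {"type": "data_table", "source": "open_opportunities"},
]

print(sanitize_layout(proposed_layout))
```

In practice you’d likely also constrain the generation itself (for example, with a schema whose component type is an enum of vetted values), but the division of responsibility is the point: the AI proposes, the design system keeps it honest.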
This is a tidy example of what happens when you sprinkle machine intelligence onto a familiar traditional UI. It starts to dance and move. And this is just the beginning. Adding AI to the UX/UI layer lets you generate experiences, not just artifacts (images, text, etc.). And that can go beyond traditional UI to yield entirely new UX and interaction paradigms. That’s a big focus of Big Medium’s product work with clients these days—and of course of the Sentient Design book. Stay tuned, lots more to come.
Change Blindness
∞ Aug 13, 2024
A great reminder from Ethan Mollick of how quickly AI generation quality has improved in the last 18 months. AI right now is the worst it will ever be; it’s only getting better from here. Good inspiration to keep cranking!
When I started this blog there were no AI chatbot assistants. Now, all indications are that they are likely the fastest-adopted technology in recent history.
Plus, super cute otters.
Introducing Structured Outputs in the API
∞ Aug 7, 2024
OpenAI introduced a bit of discipline to ensure that its GPT models are precise about the data format of their responses. Specifically, the new feature ensures that, when asked, the model’s responses conform exactly to JSON schemas provided by developers.
Generating structured data from unstructured inputs is one of the core use cases for AI in today’s applications. Developers use the OpenAI API to build powerful assistants that have the ability to fetch data and answer questions via function calling, extract structured data for data entry, and build multi-step agentic workflows that allow LLMs to take actions. Developers have long been working around the limitations of LLMs in this area via open source tooling, prompting, and retrying requests repeatedly to ensure that model outputs match the formats needed to interoperate with their systems. Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training our models to better understand complicated schemas.
Most of us experience OpenAI’s GPT models as a chat interface, and that’s certainly the interaction of the moment. But LLMs are fluent in lots of languages—not just English or Chinese or Spanish, but JSON, SVG, Python, etc. One of their underappreciated talents is to move fluidly between different representations of ideas and concepts. Here specifically, they can translate messy English into structured JSON. This is what allows these systems to be interoperable with other systems, one of the three core attributes that define the form of AI-mediated experiences, as I describe in The Shape of Sentient Design.
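For a concrete sense of what that translation looks like from the developer’s side, here’s a minimal sketch using the openai Python SDK. The schema, field names, and prompt are mine for illustration, and exact parameter shapes may differ between SDK versions:

```python
# A minimal sketch of calling Structured Outputs with the openai Python SDK.
# The schema, field names, and prompt are illustrative only; exact SDK
# parameter shapes may differ between versions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A developer-supplied JSON Schema the model is constrained to match.
event_schema = {
    "name": "calendar_event",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "date": {"type": "string"},
            "attendees": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "date", "attendees"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event details from the user's message."},
        {"role": "user", "content": "Lunch with Dana and Sam next Friday at noon."},
    ],
    # Constrains the model's output to the JSON Schema above.
    response_format={"type": "json_schema", "json_schema": event_schema},
)

print(response.choices[0].message.content)  # a JSON string matching the schema
```

The point isn’t the specific schema; it’s that the response is guaranteed to be machine-readable, so downstream systems can consume it without the retry-and-pray workarounds the announcement describes.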
What this means for product designers: As I shared in my Sentient Design talk, moving nimbly between structured and unstructured data is what enables LLMs to help drive radically adaptive interfaces. (This part of the talk offers an example.) This is the stuff that will animate the next generation of interaction design.
Alas, as in all things LLM, the models sometimes drift a bit from the specific ask—the JSON they come back with isn’t always what we asked for. This latest update is a promising direction for helping us get disciplined responses when we need it—so that Sentient Design experiences can reliably communicate with other systems.
Why I Finally Quit Spotify
∞ Aug 3, 2024
In The New Yorker, Kyle Chayka bemoans the creeping blandness that settled into his Spotify listening experience as the company leaned into algorithmic personalization and playlists.
Issues with the listening technology create issues with the music itself; bombarded by generic suggestions and repeats of recent listening, listeners are being conditioned to rely on what Spotify feeds them rather than on what they seek out for themselves. “You’re giving them everything they think they love and it’s all homogenized,” Ford said, pointing to the algorithmic playlists that reorder tracklists, automatically play on shuffle, and add in new, similar songs. Listeners become alienated from their own tastes; when you never encounter things you don’t like, it’s harder to know what you really do.
This observation that the automation of your tastes can alienate you from them feels powerful. There’s obviously a useful and meaningful role for “more like this” recommendation and prediction engines. Still, there’s a risk when we overfit those models and eliminate personal agency and/or discovery in the experience. Surely there’s an opportunity to add more texture—a push and pull between lean-back personalization and more effortful exploration.
Let’s dial up the temperature on these models, or at least some of them. Instead of always presenting “more like this” recommendations, we could benefit from “more not like this,” too.
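To make that concrete, here’s a toy sketch of what “more not like this” could mean in a ranked feed: reserve a slice of each batch for low-similarity picks rather than filling it entirely from the top of the similarity ranking. This is purely illustrative, not how Spotify or any real recommender works:

```python
# Toy sketch of "more not like this": reserve part of each batch for
# low-similarity picks instead of ranking purely by similarity.
# Purely illustrative; not anyone's production recommender.
import random

def recommend(candidates: list[str], similarity: dict[str, float],
              k: int = 10, explore_ratio: float = 0.3) -> list[str]:
    ranked = sorted(candidates, key=lambda item: similarity[item], reverse=True)
    n_explore = int(k * explore_ratio)
    exploit = ranked[: k - n_explore]        # the usual "more like this"
    rest = ranked[k - n_explore:]            # everything not already picked
    pool = rest[len(rest) // 2:] or rest     # bias toward the least similar items
    explore = random.sample(pool, min(n_explore, len(pool)))
    return exploit + explore
```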
AI Is Confusing — Here’s Your Cheat Sheet
∞ Jul 28, 2024
Scratching your head about diffusion models versus frontier models versus foundation models? Don’t know a token from a transformer? Jay Peters assembled a helpful glossary of AI terms for The Verge:
To help you better understand what’s going on, we’ve put together a list of some of the most common AI terms. We’ll do our best to explain what they mean and why they’re important.
Great, accessible resource for literacy in fundamental AI lingo.