Join us today for Ars Live: Our first encounter with manipulative AI

In the short-term, the most dangerous thing about AI language models may be their ability to emotionally manipulate humans if not carefully conditioned. The world saw its first taste of that potential danger in February 2023 with the launch of Bing Chat, now called Microsoft Copilot.

During its early testing period, the temperamental chatbot gave the world a preview of an "unhinged" version of OpenAI's GPT-4 prior to its official release. Sydney's sometimes uncensored and "emotional" nature (including use of emojis) arguably gave the world its first large-scale encounter with a truly manipulative AI system. The launch set off alarm bells in the AI alignment community and served as fuel for prominent warning letters about AI dangers.

On November 19 at 4 pm Eastern (1 pm Pacific), Ars Technica Senior AI Reporter Benj Edwards will host a livestream conversation on YouTube with independent AI researcher Simon Willison that will explore the impact and fallout of the 2023 fiasco. We're calling it "Bing Chat: Our First Encounter with Manipulative AI."

A promo graphic for "Bing Chat: Our First Encounter with Manipulative AI," an Ars Live conversation between Benj Edwards and Simon Willison. Credit: Ars Technica

Willison, who co-invented the Django web framework and has served as an expert reference on the subject of AI for Ars Technica for years, regularly writes about AI on his personal blog and coined the term "prompt injection" in 2022 after pranksters discovered how to subvert the instructions and alter the behavior of a GPT-3-based automated bot that posted on Twitter at the time.

The “culprit and the enemy” speaks out

Each input fed into a large language model (LLM) like the one that powered Bing Chat is called a "prompt." The key to a prompt injection is to manipulate the model’s responses by embedding new instructions within the input text, effectively redirecting or altering the AI’s intended behavior. By crafting cleverly phrased prompts, users can bypass the AI's original instructions (often defined in something called a "system prompt"), causing it to perform tasks or respond in ways that were not part of its initial programming or expected behavior.

While Bing Chat's unhinged nature was caused in part by how Microsoft defined the "personality" of Sydney in the system prompt (and unintended side-effects of its architecture with regard to conversation length), Ars Technica's saga with the chatbot began when someone discovered how to reveal Sydney's instructions via prompt injection, which Ars Technica then published. Since Sydney could browse the web and see real-time results—which was novel at the time—the bot could react to news when prompted, feeding back into its unhinged personality every time it browsed the web and found articles written about itself.

When asked about the prompt-injection episode by other users, Sydney reacted offensively and disparaged the character of those who found the exploit, even attacking the Ars reporter himself. Sydney called Benj Edwards "the culprit and the enemy" in one instance, bringing the odd AI behavior sponsored by a trillion-dollar tech giant a little too close for comfort.

During the Ars Live discussion, Benj and Simon will talk about what happened during that intense week in February 2023, why Sydney went off the rails, what covering Bing Chat was like during the time, how Microsoft reacted, the crisis in the AI alignment community it inspired, and what lessons everyone learned from the episode. We hope you'll tune in, because it should be a great conversation.

To watch, tune in to YouTube on November 19, 2024, at 4 pm Eastern / 3 pm Central / 1 pm Pacific.

Add to Google Calendar

Add to calendar (.ics download)