Claude 3: ChatGPT finally has a serious rival
Claude achieves parity with GPT-4 but doesn't unlock any major new abilities.
On Monday, Anthropic released Claude 3, a family of large language models the company says “sets new industry benchmarks across a wide range of cognitive tasks.” I decided to test those claims using the same sequence of challenges I used to test Google’s Gemini language model a couple of weeks ago.
Gemini 1.0 Ultra didn’t quite live up to Google’s hype; it lagged behind GPT-4 Turbo in a number of ways. But Claude 3 seems to be the real deal.
Like Gemini, Claude 3 comes in three versions—Opus, Sonnet, and Haiku. Opus is the most capable but also the most expensive to use. Anthropic published figures showing Claude 3 Opus edging out its leading rivals, GPT-4 Turbo and Gemini 1.0 Ultra, on a variety of performance benchmarks.
For example, Claude 3 Opus achieved 86.8 percent accuracy on the Massive Multitask Language Understanding benchmark. That compares to 86.4 percent for GPT-4 and 83.7 percent for Gemini 1.0 Ultra.
Claude did better than GPT-4 on some of my tests and worse on others. Overall, the models’ performance was too similar to pick a winner. But that in itself is notable. Until now, OpenAI has indisputably had the world’s best-performing foundation model. Now, GPT-4 finally has some serious competition.