
The Big Lie AI Vendors Keep Telling You

November 27, 2024

Written By Steve Sewell

OpenAI and other big AI vendors keep pumping one huge lie, even though all the data suggests they're wrong. Sam Altman loves to preach that Artificial General Intelligence (AGI) is right around the corner.

"We're about to give you AGI," he claims, even as key leaders keep leaving the company. In my opinion, if AGI really were right around the corner, those people wouldn't be leaving.

[Image: OpenAI's recent key leadership team, with Xs over the faces of those no longer there]

At the same time, their latest model looks like GPT-4 with a chain-of-thought prompt (already a common technique) and a really long output window, not to mention how incredibly slow it is.
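To see why chain-of-thought isn't some exotic breakthrough: it's largely a matter of how you phrase the prompt. A minimal sketch of the idea (the exact wording here is illustrative, not any vendor's actual system prompt):

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a chain-of-thought instruction.

    Chain-of-thought prompting just asks the model to reason step by
    step before answering -- a technique anyone could apply to GPT-4
    long before o1-style models shipped.
    """
    return (
        "Think through this problem step by step, showing your reasoning.\n"
        "Only after your reasoning, state the final answer on its own line.\n\n"
        f"Question: {question}"
    )


print(build_cot_prompt("What is 17 * 24?"))
```

The prompt string is all there is to it; the rest is sending it to the model and giving it room for a long output.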

The truth about AI model progress

It seems like OpenAI is noticing the truth here, which is that new models are plateauing in terms of how effective they are compared to older ones.

[Image: Graph showing successive models plateauing]

So instead of shipping a GPT-5 that's probably only marginally better and probably larger and more costly, they're trying to make GPT-4 work better for different use cases.

At the same time, Anthropic consistently produces better models. But even those are only incrementally better each time they launch one. This is a big gap from the GPT-2 to 3 to 3.5 days, where we saw huge leaps each time. We're just not seeing that anymore.

[Image: Business Insider article highlighting the text "OpenAI cofounder Ilya Sutskever told Reuters that results from scaling up AI models like this have plateaued"]

This leads to three really interesting implications of where the AI ecosystem is going, and one really big unknown that I also think people are really, really lying about.

The first implication: OpenAI's Sam Altman used to say "good luck" to anyone trying to compete with their models, but they simply don't have the best models anymore, and haven't for a while. It seems they're acknowledging this and are instead focusing their time on consumer products built around the models.

[Image: Graph showing the revenue share of chatbot subscriptions vs. API for OpenAI and Anthropic]

They make three-quarters of their revenue from ChatGPT, so they're investing in better native apps, better experiences, and better search: all the consumer experience around the model. They're conceding that they just don't build the best models themselves anymore.

In fact, if you're a vendor using OpenAI's APIs, in almost all cases you should probably use Anthropic instead. That's reflected in Anthropic's revenue: 85% of it comes from the API.

Anthropic invests far less in its consumer products, even though its models are better. The user experience of those products just isn't as good; they're clearly not investing there. They're putting everything into having a model that's a few percent better.

[Image: Model benchmarks of Claude vs. GPT-4o]

Source: Anthropic

The second implication: more and more vendors are building specialized hardware for LLMs, which can drastically increase the speed of inference.

New chip companies like Groq and Cerebras are able to do LLM inference three to five times faster than typical GPUs.

[Image: Inference performance of different hardware]

Source: Cerebras (evaluation of Llama 3.1 8B)

The interesting key here is that they're not just three to five times faster. They can also run inference three to five times cheaper, because they're simply much more efficient.

So when we can make AI products that are equally as smart but massively faster and cheaper, that could unlock a lot of new use cases that were previously just prohibitively slow and expensive.

The third implication is that vendors are going to get a lot better at figuring out how to use LLMs, even if they don't get that much smarter.

Take Perfect Dark Zero, a launch title for the Xbox 360. It came out when the console was brand new, and the graphics are… fine. Representative of that era of the console.

But what's fascinating is comparing launch titles with games released late in a console's lifecycle. Halo 4 shipped on the same exact hardware, with no change to the console itself, yet it looks like a whole different generation of console.

What happened over those years? Developers got way better at optimizing how they use this technology. Game engines found more and more clever approaches to get the most out of what they had.

This is what I think will happen with LLMs. Those who make products on top of LLMs will get better and better at getting more of the smart parts out of them and avoiding the suboptimal, problematic parts.

I don't think we're anywhere close to AGI, but I do think we'll see major leaps in product quality, just by using the LLM technology better and better.

What about agents? What about Devin, the AI software engineer that made all these crazy claims, raised a ton of VC money, and just kind of disappeared off the radar? Or Anthropic's Claude with computer use that will navigate your computer and use it like you?

[Image: Screenshot of the Devin launch]

Realistically, I think the models are just not smart enough yet for this to make sense. Even in Anthropic's own announcement post, they cited that Claude with computer use only has a 14.9% success rate. I don't see any world where we give AI agents the ability to act on our behalf without them having something more like a 99%, if not 100%, success rate.

[Image: Anthropic's Claude computer use announcement, with "Claude 3.5 Sonnet scored 14.9%" highlighted]

Source: Anthropic

In all of my experience and testing, the models are not smart enough to be able to accomplish autonomous tasks. They need a human in the loop. Otherwise, they derail quickly and those small errors compound very rapidly.
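The compounding is simple probability: if each autonomous step succeeds independently with probability p, an n-step task succeeds with probability p raised to the n. A quick sketch (the 95%/20-step numbers are illustrative, not a measured benchmark):

```python
def chance_all_steps_succeed(per_step_success: float, steps: int) -> float:
    # If each step succeeds independently with probability p,
    # the whole n-step task succeeds with probability p**n --
    # small per-step error rates compound very rapidly.
    return per_step_success ** steps


# Even an agent that is right 95% of the time per step finishes a
# 20-step task correctly only about 36% of the time.
print(round(chance_all_steps_succeed(0.95, 20), 2))  # → 0.36
```

This is why a human in the loop matters: catching a derailment at step 3 resets the compounding instead of letting it snowball.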

But that's not to say that some form of agents won't exist in the short term. I think mini or micro agents do show a lot of practical value. For example, the open-source Micro Agent project can take a set of tests (or generate them for you), then write and iterate on code until the tests pass.
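The pattern behind that kind of micro agent is a tight generate-test-retry loop. A minimal sketch of the idea (the `generate_code` and `run_tests` callables stand in for an LLM call and a test runner; this is not the Micro Agent project's actual API):

```python
from typing import Callable, Optional, Tuple


def micro_agent(
    generate_code: Callable[[str, str], str],      # (task, feedback) -> code
    run_tests: Callable[[str], Tuple[bool, str]],  # code -> (passed, output)
    task: str,
    max_iterations: int = 5,
) -> Optional[str]:
    """Generate code, run the tests, and feed failures back until green."""
    feedback = ""
    for _ in range(max_iterations):
        code = generate_code(task, feedback)
        passed, output = run_tests(code)
        if passed:
            return code
        feedback = output  # failing test output guides the next attempt
    return None  # give up and hand off to a human in the loop
```

The tests act as the guardrail: the agent's scope is small enough that failure is cheap and detectable, which is exactly what full autonomy lacks.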

Or take Builder.io and our product that uses AI to turn Figma designs into code: we've found that rather than making one big call to a traditional LLM, we can use a series of smaller agents that communicate with each other, yielding much higher quality output when converting a Figma design into code.

By using a series of smaller agents, in combination with self-trained models, we found that you can increase reliability, speed, and accuracy compared to just plugging in something off the shelf.

[Image: Diagram comparing a traditional single-LLM approach with a "Gen agents" approach using multiple connected agents]
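As a rough illustration of the multi-agent shape (hypothetical stage names and data, not Builder.io's actual architecture): each agent is a small, specialized step whose output feeds the next, so every stage can be validated or retried independently instead of betting everything on one giant LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Each "agent" takes the working document and returns an enriched copy.
Agent = Callable[[dict], dict]


def run_pipeline(stages: List[Tuple[str, Agent]], design: Dict) -> Dict:
    result = dict(design)
    for name, agent in stages:
        result = agent(result)
        result.setdefault("trace", []).append(name)  # record each handoff
    return result


# Hypothetical stages for a Figma-to-code flow:
stages = [
    ("layout", lambda d: {**d, "layout": "flex"}),
    ("styles", lambda d: {**d, "css": f"display: {d['layout']}"}),
    ("codegen", lambda d: {**d, "jsx": "<div className='root' />"}),
]

output = run_pipeline(stages, {"source": "figma"})
```

Because each stage's output is inspectable, a bad layout pass can be caught and rerun before it ever corrupts the generated code, which is where the reliability gain comes from.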

These are the types of techniques that I think will bring us to the next generation of AI products. Even if the LLMs don't get smarter and we don't have personal coding assistants doing all our work tomorrow, we sure as heck will keep getting better and better products that are more reliable.

We'll automate tedious work, like converting designs to code, using generative AI to produce designs, and adding functionality to those designs to make them real-world applications. And we'll develop other interesting use cases around content generation, code generation, copilots, and the other kinds of products Builder.io keeps producing.

Just be sure to pay keen attention to which products people are actually using, versus which are gathering hype but don't actually work. Both will keep happening; I'm certain of that. I don't think we're anywhere near past the hype curve of BS products intermixed with real, effective products. So while I think there'll be a lot of noise in the market, I think we have a lot to look forward to as well.
