Stratechery by Ben Thompson

YouTube TV, Wiz, and Why Monopolies Buy Innovation

Monday, March 24, 2025

Listen to Podcast
Listen to this post:

Log in to listen

While “March Madness” refers to the NCAA basketball tournaments, the maddest weekend of all is the first one, when fields of 64¹ are trimmed down to the Sweet 16; this means there are 16 games a day the first two days, and 8 games a day for the next two. Inevitably this means that multiple games are on at the same time, and Max has a solution for you; from The Streamable:

The 2025 NCAA Men’s Basketball Tournament starts today, and just in time, Warner Bros. Discovery has announced the addition of some very modern features for games that stream on its on-demand service Max. Fans can use Max to stream all March Madness games on TNT, TBS, and truTV, and that viewing experience is about to improve in a big way.

The new Max feature that fans will likely appreciate most while watching NCAA Men’s Basketball Tournament games is a multiview. This will allow fans to watch up to three games at once, ensuring they never miss a single bucket, block, or steal from the tournament.

Except that’s not correct; Warner Bros. Discovery shares the rights to the NCAA Men’s Basketball Tournament with CBS, and there were times over the weekend when there were games on CBS and a Warner Bros. Discovery property — sometimes four at once. That means that Max multiview watchers were in fact missing buckets, blocks, and steals, and likely from the highest profile games, which were more likely to be on the broadcast network.

Notice, however, that I specified Max multiview watchers; YouTube TV has offered multiview for the NCAA Tournament since last year. Critically, YouTube TV’s offering includes CBS, and, starting this upcoming weekend, will also let you watch the women’s tournament as well; from Sportico:

Generally, events from the same leagues are kept together. On Friday, for instance, men’s and women’s multiviews will be offered separately. If you truly want to watch all of March Madness live, it’ll be time to break out that second screen again. However, in part due to user demand, YouTube TV says mixed gender multiviews will be available starting with the Sweet 16.

The job of prioritizing selections has only gotten more complicated as interest in women’s hoops has boomed. Through the first two rounds in 2024, viewership of the women’s tourney was up 108% over the year prior. Though the “March Madness” brand is now used for both men’s and women’s competitions, separate media deals dictate their distribution. CBS and TNT Sports networks split the men’s games, including streaming on March Madness Live apps, while ESPN’s channels host women’s action. Disney+ will also carry the Final Four. Cable providers, then, are required for fans hoping to seamlessly hop back and forth between the two brackets, even as fans shift to a streaming-first future.

That last sentence is the key: Warner Bros. Discovery only has access to the games it owns rights to; YouTube TV, by virtue of being a virtual Multichannel Video Programming Distributor (vMVPD), has access to every game that is on cable, which is all of them. That lets the service offer an objectively better multiview experience.

YouTube TV’s Virtual Advantage

Multiview isn’t a new idea; in 1983 George Schnurle III invented the MultiVision:

Mrmazda, CC-SA

This image is of the MultiVision 1.1, which took in four composite inputs; the 3.1 model included two built-in tuners — you provided the antenna. The Multivision didn’t provide multiview a la YouTube TV, but rather picture-in-picture, support for which was eventually built into TVs directly.

Picture-in-picture, however, assumed that consumers had easy access to TV signals; this was a reasonable assumption when signals came in over-the-air or via basic cable. That changed in the late 1990s with the shift to digital cable, which required a set-top box to decrypt; most TVs only had one, and the picture-in-picture feature faded away. This loss was made up in part by the addition of DVR functionality to most of those set-top boxes; with time-shifting you couldn’t watch two things at once, but you could watch two things that aired at the same time.

Cable companies offered DVR functionality in response to the popularity of TiVo; when the first model launched in 1999 it too relied on the relative openness of TV signals. Later models needed cable cards, which were mandated by the FCC in 2007; that mandate was repealed in 2020, as the good-enough nature of cable set-top boxes effectively killed the market for TiVo and other 3rd-party tuners.

The first vMVPD, meanwhile, was Sling TV, which launched in 2015.² YouTube TV launched two years later, with an old Google trick: unlimited storage for your cloud DVR, which you could watch anywhere in the U.S. on any device. That was possible because the point of integration for YouTube TV, unlike traditional cable, was on Google’s servers, not a set-top box (which itself was a manifestation of traditional MVPD’s point of integration being the cable into your house).

This point of integration also explains why it was YouTube TV that came up with the modern implementation of multiview: Google could create this new feature centrally and make it available to everyone without needing to install high-powered set-top boxes in people’s homes. Indeed, this explains one of the shortcomings of multiview: because Google can not rely on viewers having high powered devices capable of showing four independent video streams, Google actually pre-mixes the streams into a single video feed on their servers.

YouTube TV + NFL Sunday Ticket

I mentioned above that YouTube TV offered multiview for March Madness starting last year, but that’s not quite right: a subset of the consumer base actually got access for March Madness in 2023; that was a beta test for the real launch, which was the 2023 NFL season. That was the first year that Google had the rights to NFL Sunday Ticket, which lets subscribers view out-of-market games. NFL Sunday Ticket was a prerequisite for multiview, because without it you would have access to at most two football games at a time; once you could watch all of the games, the utility was obvious.

The point of this Article is not multiview; it’s a niche use case for events like March Madness or football fanatics on Sunday afternoons. What is notable about the latter example, however, is that Google needed to first secure the rights to NFL Sunday Ticket. This, unlike March Madness, wasn’t a situation where every game was already on cable, and thus accessible to YouTube TV; Google needed to pay $2 billion/year to secure the necessary rights to make multiview work.

That’s a high price, even if multiview is cool; it seems unlikely that Google will ever make its money back directly. That, though, is often the case with the NFL. Back in 1993 Rupert Murdoch shocked the world by buying NFL broadcasting rights for a then-unprecedented $395 million/year, $100 million/year more than CBS was offering for the same package. Sports Illustrated explained his reasoning:

There are skeptics who think that Murdoch will lose his custom-made shirt over the NFL deal; one estimate has him losing $500 million over the next four years. Says Murdoch, “I’ve seen those outrageous numbers. We’ll lose a few million in the first year, but even if it was 40 or 50 million, it would be tax deductible. It was a cheap way of buying a network.”

What Murdoch meant was that demand for the NFL — which had already built ESPN — would get Fox into the cities where it didn’t yet exist, and improve its affiliate station’s standing (many of which Murdoch owned) in cities where they were weak and buried on the inferior UHF band. And, of course, that is exactly what happened.

NFL Sunday Ticket is not, to be sure, the same as regular NFL rights; it is much more of a niche product with a subscription business model. That, though, is actually a good thing from Google’s perspective: the company’s opportunity is not to build a TV station, but rather a TV Aggregator.

YouTube TV’s Aggregation Potential

Google announced the NFL deal a month after it launched Primetime Channels, a marketplace for streaming services along the lines of Amazon’s Prime Video Channels or Apple TV Channels; I wrote in early 2023:

The missing piece has been — in contrast to Apple and Amazon in particular — other streaming services. Primetime Channels, though, is clearly an attempt to build up YouTube’s own alternative to the Apple TV App Store or Amazon Prime Video Marketplace. This, as I noted last month, is why I think YouTube’s extravagant investment in NFL Sunday Ticket makes sense: it is a statement of intent and commitment that the service wants to use to convince other streaming services to come on board. The idealized future is one where YouTube is the front-door of all video period, whether that be streaming, linear, or user-generated.

YouTube’s big advantage, as I noted in that Update, is that it has exclusive access to YouTube content; it is the only service that can offer basically anything you might want to watch on TV:
- YouTube TV has linear television, which remains important for sports
- YouTube proper dominates user-generated content
- Primetime Channels is a way to bring other streaming services on board
The real potential with streaming channels, however, is to go beyond selling subscriptions on an ad-hoc basis and actually integrating them into a single interface to drive discoverability and on-demand conversions. How useful would it be to see everything that is on in one place, and be able to either watch with one click, or subscribe with two?

This is going to be an increasingly pressing need as sports in particular move to streaming. It used to be that all of the sports you might watch were in a centralized place: the channel guide on your set-top box. Today, however, many sports are buried in apps. Prominent examples include Amazon Thursday Night Football and Peacock’s exclusive NFL playoff games, but as a Wisconsin fan I’ve already experienced the challenge of an increasing number of college basketball games being exclusively streamed on Peacock; the problem is only going to get worse next season when an increasing number of NBA games are on Amazon and Peacock, and when ESPN releases a standalone streaming app with all of its games.

The challenge for any one of these services is the same one seen with Max’s multiview offering: any particular streaming service is limited to its own content. Sure, any one of these services could try and build this offering anyways — ESPN is reportedly considering it — but then they run into the problem of not being a platform or marketplace with a massive audience already in place.

The reason why that is an essential prerequisite is that executing on this vision will require forming partnerships with all of the various streamers — or at least those with live events like sports. On one hand, of course each individual streamer wants to own the customer relationship; on the other hand, sports rights both cost a lot of money and also lose their value the moment an event happens. That means they are motivated to trade away customer control and a commission for more subscribers, which works to the benefit of whoever can marshal the most demand, and YouTube, thanks primarily to its user-generated content, has the largest audience of all, and thanks to YouTube TV, is the only service that can actually offer everything.

Google’s Product Problem

Two quick questions for the audience:
1. Did you know that Primetime Channels existed?
2. How do you subscribe to Primetime Channels?
The answer to number 2 is convoluted, to say the least; on a PC, you click the hamburger button in the upper left, then click “Your movies & TV”, then click the “Browse” tab, and there you will finally find Primetime channels; on mobile the “Your movies & TV” is found by clicking your profile photo on the bottom right.

And, once you finally figure this out, you see a pretty pathetic list:

As the arrow indicates, there are more options, but the only one of prominence is Paramount+; there is no Disney+, Peacock, Amazon Prime Video, Apple TV+, or Netflix.
- Netflix’s resistance to being aggregated is long-running; they were the stick in the mud when Apple tried to aggregate streaming a decade ago. The company gets away with it — and are right to resist — because they have the largest user base amongst subscription platforms. The biggest bull case for Netflix is that many of the other streamers throw in the towel and realize they are better off just selling content to Netflix.
- Disney+ actually could pull off a fair bit of what YouTube is primed to do: no, Disney doesn’t have YouTube’s user-generated content, but the company does have Hulu Live, which gives a potential Aggregation offering access to content still on linear TV.
- Amazon and Apple are Google’s most obvious competitors when it comes to building an Aggregator for streaming services, and they have the advantage of owning hardware to facilitate transactions.
That leaves Peacock, and this is where I hold Google responsible. Peacock has large bills and a relatively small userbase; there is also a Peacock app for both Amazon devices (although you have to subscribe to Peacock directly) and Apple devices (where Apple enforces an in-app subscription offering). If Google is serious about Primetime Channels specifically, and being a streaming and sports Aggregator generally, then it should have Peacock available as an offering.

That’s the thing, though: it’s not clear that Google has made any sort of progress in achieving the vision I perceived two years ago in the wake of the launch of Primetime Channels and the NFL Sunday Ticket deal. Yes, YouTube continues to grow, particularly on TVs, and yes, multivision is slowly getting better, but both of those are products of inertia; is Google so arthritic that it can’t make a play to dominate an entertainment industry that is getting religion about the need to acquire and keep customers profitably? That’s exactly why Aggregators gain power over suppliers: they solve their demand problem. And yet Primetime Channels might as well not even exist, given how buried it is, and it might as well be, given that Google hasn’t signed a meaningful new deal since launch.

Google’s Wiz Acquisition

This is all convoluted way to explain why I approve of Google’s decision to pay $32 billion in cash for Wiz, a cybersecurity firm that has absolutely nothing to do with the future of TV. From Bloomberg:

Google parent Alphabet Inc. agreed to acquire cybersecurity firm Wiz Inc. for $32 billion in cash, reaching a deal less than a year after initial negotiations fell apart because the cloud-computing startup wanted to stay independent. Wiz will join the Google Cloud business once the deal closes, the companies said in a statement on Tuesday. The takeover is subject to regulatory approvals and is likely to close next year, they said.

The deal, which would be Alphabet’s largest to date, comes after Wiz turned down a $23 billion bid from the internet search leader last year after several months of discussions. At the time, Wiz walked away after deciding it could ultimately be worth more by pursuing an initial public offering company. Concerns about regulatory challenges also influenced the decision. The companies have agreed to a breakup fee of about 10% of the deal value, or $3.2 billion, if the deal doesn’t close, according to a person familiar with the matter. Shares of Alphabet fell nearly 3% in New York on Tuesday.

Wiz provides cybersecurity solutions for multi-cloud environments, and is growing fast. This makes it a natural fit for Google Cloud, which is a distant third place to AWS and Microsoft Azure. Google Cloud’s biggest opportunity for growth is to be a service that is used in addition to a large corporation’s existing cloud infrastructure, and Wiz provides both a beachhead into those organizations and also a solution to managing a multi-cloud setup.

Google Cloud’s selling point — the reason it might expand beyond a Wiz beachhead — are Google’s AI offerings. Google continues to have excellent AI research and the best AI infrastructure; where the company is struggling is product, particularly in the consumer space, thanks to some combination of fear of disruption and, well, the fact that product capability seems to be the first casualty of a monopoly (Apple’s declining product chops, particularly in software and obviously AI, is another example).

The company’s tortoise-like approach to TV lends credence to the latter explanation: Google is in an amazing position in TV, thanks to the long-ago acquisition of YouTube and the launch of YouTube TV, but it has accomplished little since then beyond agreeing to pay the NFL a lot of money. Arguably the ideal solution to this sort of malaise, at least from a shareholder perspective, would be to simply collect monopoly rents and return the money to shareholders at a much higher rate than Google has to date; absent that, buying product innovation seems like the best way to actually accomplish anything.

In other words, while I understand the theory of people who think that Google ought to just build Wiz’s functionality instead of paying a huge revenue multiple for a still-unprofitable startup, I think the reality of a company like Google is that said theory would run into the morass that is product development in a monopoly. It simply would not ship, and would suck if it did. Might as well pay up for momentum in a market that has some hope of leveraging the still considerable strengths that exist beneath the flab.
1. Technically 68; there are four games on Tuesday that trim the field to 64 ↩
2. The original Sling TV was a cable card device that allowed you to watch your TV from anywhere in the world; it was massively popular amongst expats here in Taiwan ↩
Get notified about new Articles

Sign Up

Please verify your email address to proceed.

Apple AI’s Platform Pivot Potential

Monday, March 10, 2025

Listen to this post:

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way — in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.
— Charles Dickens, A Tale of Two Cities

Apple’s Bad Week

Apple has had the worst of weeks when it comes to AI. Consider this commercial which the company was running incessantly last fall:

In case you missed the fine print in the commercial, it reads:

Apple Intelligence coming fall 2024 with Siri and device language set to U.S. English. Some features and languages will be coming over the next year.

“Next year” is doing a lot of work, now that the specific feature detailed in this commercial — Siri’s ability to glean information from sources like your calendar — is officially delayed. Here is the statement Apple gave to John Gruber at Daring Fireball:

Siri helps our users find what they need and get things done quickly, and in just the past six months, we’ve made Siri more conversational, introduced new features like type to Siri and product knowledge, and added an integration with ChatGPT. We’ve also been working on a more personalized Siri, giving it more awareness of your personal context, as well as the ability to take action for you within and across your apps. It’s going to take us longer than we thought to deliver on these features and we anticipate rolling them out in the coming year.

It was a pretty big surprise, even at the time, that Apple, a company renowned for its secrecy, was so heavily advertising features that did not yet exist; I also, in full disclosure, thought it was all an excellent idea. From my post-WWDC Update:

The key part here is the “understanding personal context” bit: Apple Intelligence will know more about you than any other AI, because your phone knows more about you than any other device (and knows what you are looking at whenever you invoke Apple Intelligence); this, by extension, explains why the infrastructure and privacy parts are so important.

What this means is that Apple Intelligence is by-and-large focused on specific use cases where that knowledge is useful; that means the problem space that Apple Intelligence is trying to solve is constrained and grounded — both figuratively and literally — in areas where it is much less likely that the AI screws up. In other words, Apple is addressing a space that is very useful, that only they can address, and which also happens to be “safe” in terms of reputation risk. Honestly, it almost seems unfair — or, to put it another way, it speaks to what a massive advantage there is for a trusted platform. Apple gets to solve real problems in meaningful ways with low risk, and that’s exactly what they are doing.

Contrast this to what OpenAI is trying to accomplish with its GPT models, or Google with Gemini, or Anthropic with Claude: those large language models are trying to incorporate all of the available public knowledge to know everything; it’s a dramatically larger and more difficult problem space, which is why they get stuff wrong. There is also a lot of stuff that they don’t know because that information is locked away — like all of the information on an iPhone. That’s not to say these models aren’t useful: they are far more capable and knowledgable than what Apple is trying to build for anything that does not rely on personal context; they are also all trying to achieve the same things.

So is Apple more incompetent than these companies, or was my evaluation of the problem space incorrect? Much of the commentary this week assumes point one, but as Simon Willison notes, you shouldn’t discount point two:

I have a hunch that this delay might relate to security. These new Apple Intelligence features involve Siri responding to requests to access information in applications and then performing actions on the user’s behalf. This is the worst possible combination for prompt injection attacks! Any time an LLM-based system has access to private data, tools it can call, and exposure to potentially malicious instructions (like emails and text messages from untrusted strangers) there’s a significant risk that an attacker might subvert those tools and use them to damage or exfiltrating a user’s data.

Willison links to a previous piece of his on the risk of prompt injections; to summarize the problem, if your on-device LLM is parsing your emails, what happens if one of those emails contains malicious text perfectly tuned to make your on-device AI do something you don’t want it to? We intuitively get why code injections are bad news; LLMs expand the attack surface to text generally; Apple Intelligence, by being deeply interwoven into the system, expands the attack surface to your entire device, and all of that precious content it has unique access to.

Needless to say, I regret not raising this point last June, but I’m sure my regret pales in comparison to Apple executives and whoever had to go on YouTube to pull that commercial over the weekend.

Apple’s Great Week

Apple has had the best of weeks when it comes to AI. Consider their new hardware announcements, particularly the Mac Studio and its available M3 Ultra; from the company’s press release:

Apple today announced M3 Ultra, the highest-performing chip it has ever created, offering the most powerful CPU and GPU in a Mac, double the Neural Engine cores, and the most unified memory ever in a personal computer. M3 Ultra also features Thunderbolt 5 with more than 2x the bandwidth per port for faster connectivity and robust expansion. M3 Ultra is built using Apple’s innovative UltraFusion packaging architecture, which links two M3 Max dies over 10,000 high-speed connections that offer low latency and high bandwidth. This allows the system to treat the combined dies as a single, unified chip for massive performance while maintaining Apple’s industry-leading power efficiency. UltraFusion brings together a total of 184 billion transistors to take the industry-leading capabilities of the new Mac Studio to new heights.

“M3 Ultra is the pinnacle of our scalable system-on-a-chip architecture, aimed specifically at users who run the most heavily threaded and bandwidth-intensive applications,” said Johny Srouji, Apple’s senior vice president of Hardware Technologies. “Thanks to its 32-core CPU, massive GPU, support for the most unified memory ever in a personal computer, Thunderbolt 5 connectivity, and industry-leading power efficiency, there’s no other chip like M3 Ultra.”

That Apple released a new Ultra chip wasn’t a shock, given there was an M1 Ultra and M2 Ultra; almost everything about this specific announcement, however, was a surprise.

Start with the naming. Apple chip names have two components: M_ refers to the core type, and the suffix to the configuration of those cores. Therefore, to use the M1 series of chips as an example:

	Perf Cores	Efficiency Cores	GPU Cores	Max RAM	Bandwidth
M1	4	4	8	16GB	70 GB/s
M1 Pro	8	4	16	32GB	200 GB/s
M1 Max	8	2	32	64GB	400 GB/s
M1 Ultra	16	4	64	128GB	800 GB/s

The “M1” cores in question were the “Firestorm” high-performance core, “Icestorm” energy-efficient core, and a not-publicly-named GPU core; all three of these cores debuted first on the A14 Bionic chip, which shipped in the iPhone 12.

The suffix, meanwhile, referred to some combination of increased core count (both CPU and GPU), as well as an increased number of memory controllers and associated bandwidth (and, in the case of the M1 series, faster RAM). The Ultra, notably, was simply two Max chips fused together; that’s why all of the numbers simply double.

The M2 was broadly similar to the M1, at least in terms of the relative performance of the different suffixes. The M2 Ultra, for example, simply doubled up the M2 Max. The M3 Ultra, however, is unique when it comes to max RAM:

	Perf Cores	Efficiency Cores	GPU Cores	Controllers	Max RAM	Bandwidth
M3	4	4	10	8	32GB	100 GB/S
M3 Pro	6	6	18	12	48GB	150 GB/s
M3 Max	12	4	40	32	128GB	400 GB/s
M3 Ultra	24	8	80	64	512GB	800 GB/s

I can’t completely vouch for every number on this table (which was sourced from Wikipedia), as Apple hasn’t yet released the full technical details of the M3 Ultra, and it’s not yet available for testing. What seems likely, however, is that instead of simply doubling up the M3 Max, Apple also reworked the memory controllers to address double the memory. That also explains why the M3 Ultra came out so much later than the rest of the family — indeed, the Mac Studio base chip is actually the M4 Max.

The wait was worth it, however: what makes Apple’s chip architecture unique is that that RAM is shared by the CPU and GPU, and not in the carve-out way like integrated graphics of old; rather, every part of the chip — including the Neural Processing Units, which I didn’t include on these tables — has full access to (almost¹) all of the memory all of the time.

What that means in practical terms is that Apple just shipped the best consumer-grade AI computer ever. A Mac Studio with an M3 Ultra chip and 512GB RAM can run a 4-bit quantized version of DeepSeek R1 — a state-of-the-art open-source reasoning model — right on your desktop. It’s not perfect — quantization reduces precision, and the memory bandwidth is a bottleneck that limits performance — but this is something you simply can’t do with a standalone Nvidia chip, pro or consumer. The former can, of course, be interconnected, giving you superior performance, but that costs hundreds of thousands of dollars all-in; the only real alternative for home use would be a server CPU and gobs of RAM, but that’s even slower, and you have to put it together yourself.

Apple didn’t, of course, explicitly design the M3 Ultra for R1; the architectural decisions undergirding this chip were surely made years ago. In fact, if you want to include the critical decision to pursue a unified memory architecture, then your timeline has to extend back to the late 2000s, whenever the key architectural decisions were made for Apple’s first A4 chip, which debuted in the original iPad in 2010.

Regardless, the fact of the matter is that you can make a strong case that Apple is the best consumer hardware company in AI, and this week affirmed that reality.

Apple Intelligence vs. Apple Silicon

It’s probably a coincidence that the delay in Apple Intelligence and the release of the M3 Ultra happened in the same week, but it’s worth comparing and contrasting why one looks foolish and one looks wise.

Apple Silicon

Start with the latter: Tony Fadell told me the origin story of Apple Silicon in a 2022 Stratechery Interview; the context of the following quote was his effusive praise for Samsung, which made the chips for the iPod and the first several models of the iPhone:

Samsung was an incredible partner. Even though they got sued, they were an incredible partner, they had to exist for the iPod to be as successful and for the iPhone to even exist. That happened. During that time, obviously Samsung was rising up in terms of its smartphones and Android and all that stuff, and that’s where things fell apart.

At the same time, there was the strategic thing going on with Intel versus ARM in the iPad, and then ultimately iPhone where there’s that fractious showdown that I had with various people at Apple, including Steve, which was Steve wanted to go Intel for the iPad and ultimately the iPhone because that’s the way we went with the Mac and that was successful. And I was saying, “No, no, no, no! Absolutely not!” And I was screaming about it and that’s when Steve was, well after Intel lost the challenge, that’s when Steve was like, “Well, we’re going to go do our own ARM.” And that’s where we bought P.A. Semi.

So there was the Samsung thing happening, the Intel thing happening, and then it’s like we need to be the master of our own destiny. We can’t just have Samsung supplying our processors because they’re going to end up in their products. Intel can’t deliver low power embedded the way we would need it and have the culture of quick turns, they were much more standard product and non custom products and then we also have this, “We got to have our own strategy to best everyone”. So all of those things came together to make what happened happen to then ultimately say we need somebody like TSMC to build more and more of our chips. I just want to say, never any of these things are independently decisions, they were all these things tied together for that to pop out of the oven, so to speak.

This is such a humbling story for me as a strategy analyst; I’d like to spin up this marvelous narrative about Apple’s foresight with Apple Silicon, but like so many things in business, it turns out the best consumer AI chips were born out of pragmatic realities like Intel not being competitive in mobile, and Samsung becoming a smartphone competitor.

Ultimately, though, the effort is characterized by four critical qualities:

Time: Apple has been working on Apple Silicon for 17 years.

Motivation: Apple was motivated to build Apple Silicon because having competitive and differentiated mobile chips was deemed essential to their business.

Differentiation: Apple’s differentiation has always been rooted in the integration of hardware and software, and controlling their own chips let them do exactly that, wringing out unprecedented efficiency in particular.

Iteration: The M3 Ultra isn’t Apple’s first chip; it’s not even the first M chip; heck, it’s not even the first M3! It’s the result of 17 years of iteration and experimentation.

Apple Intelligence

Notice how these qualities differ when it comes to Apple Intelligence:

Time: The number one phrase that has been used to characterize Apple’s response to the ChatGPT moment in November 2022 is flat-footed, and that matches what I have heard anecdotally. That, by extension, means that Apple has been working on Apple Intelligence for at most 28 months, and that is almost certainly generous, given that the company likely took a good amount of time to figure out what its approach would be. That not nothing — xAI went from company formation to Grok 3 in 19 months — but it’s certainly not 17 years!

Motivation: If you look at Apple’s earnings calls in the wake of ChatGPT, February 2023, May 2023, and August 2023, all contain some variation of “AI and machine learning have been integrated into our products for years, and we’ll continue to be thoughtful about how we implement them”; finally in November 2023 CEO Tim Cook said the company was working on something new:

In terms of generative AI, we have — obviously, we have work going on. I’m not going to get into details about what it is, because, as you know, we don’t — we really don’t do that. But you can bet that we’re investing, we’re investing quite a bit, we’re going to do it responsibly and it will — you will see product advancements over time that where the — those technologies are at the heart of them.

First, this obviously has bearing on the “time” point above; secondly, one certainly gets the sense that Apple, after tons of industry hype and incessant questions from analysts, very much representing the concerns of shareholders, felt like they had no choice but to be doing something with generative AI. In other words — and yes, this is very much driving with the rearview mirror — Apple didn’t seem to be working on generative AI because they felt it was essential to their product vision, but rather because they had to keep up with what everyone else was doing.

Differentiation: This is the most alluring part of the Apple Intelligence vision, which I myself hyped up from the beginning: Apple’s exclusive access to its users’ private information. What is interesting to consider, however, beyond the security implications, is the difference between “exclusivity” and “integration”.

Consider your address book: the iOS SDK included the Contacts API, which gave any app on the system full access to your contacts without requiring explicit user permission. This was essential to the early success of services like WhatsApp, which cleverly bootstrapped your network by using phone numbers as unique IDs; this meant that pre-existing username-based networks like Skype and AIM were actually at a disadvantage on iOS. iMessage did the same thing when it launched in 2011, and then Apple started requiring user permission to access your contacts in 2012.

Even this amount of access, however, paled in comparison to the Mac, where developers could access information from anywhere on the system. iOS, on the other hand, put apps in sandboxes, cut off from other apps and system information outside of APIs like the Contacts API, all of which have become more and more restricted over time. Apple made these decisions for very good reasons, to be clear: iOS is a much safer and secure environment than macOS; increased restrictions generally mean increased privacy, albeit at the cost of decreased competition.

Still, it’s worth pointing out that exclusive access to data is downstream of a policy choice to exclude third parties; this is distinct from the sort of hardware and software integration that Apple can exclusively deliver in the pursuit of superior performance. This distinction is subtle, to be sure, but I think it’s notable that Apple Silicon’s differentiation was in the service of building a competitive moat, while Apple Intelligence’s differentiation was about maintaining one.

Iteration: From one perspective, Apple Intelligence is the opposite of an evolved system: Apple put together an entire suite of generative AI capabilities, and aimed to launch them all in iOS 18. Some of these, like text manipulation and message summaries, were straightforward and made it out the door without a problem; others, particularly the reimagined Siri and its integration with 3rd party apps and your personal data, are now delayed. It appears Apple tried to do too much all at once.

The Incumbent Advantage

At the same time, it’s not as if Siri is new; the voice assistant launched in 2011, alongside iMessage. In fact, though, Siri has always tried to do too much too soon; I wrote last week about the differences between Siri and Alexa, and how Amazon was wise to focus their product development on the basics — speed and accuracy — while making Alexa “dumber” than Siri tried to be, particularly in its insistence on precise wording instead of attempting to figure out what you meant.

To that end, this speaks to how Apple could have been more conservative in its generative AI approach (and, I fear, Amazon too, given my skepticism of Alexa+): simply make a Siri that works. The fact of the matter is that Siri has always struggled with delivering on its promised functionality, but a lot of its shortcomings could have been solved by generative AI. Apple, however, promised much more than this at last year’s WWDC: Siri wasn’t simply going to work better, it was actually going to understand and integrate your personal data and 3rd-party apps in a way that had never been done before.

Again, I applauded this at the time, so this is very much Monday-morning quarterbacking. I increasingly suspect, however, we are seeing a symptom of big-company disease that I hadn’t previously considered: while one failure state in the face of new technology is moving too slowly, the opposite failure state is assuming you can do too much too quickly, when simply delivering the basics would be more than good enough.

Consider home automation: the big three players in the space are Siri and Alexa and Google Assistant. What makes these companies important is not simply that they have devices you can put in your home and talk to, but also that there is an entire ecosystem of products which work with them. Given that, consider two possible products in the space:

OpenAI releases a ChatGPT speaker that you can talk to and interact with; it works brilliantly and controls, well, it doesn’t control anything, because the ecosystem hasn’t adopted it. OpenAI would need to work diligently to build out partnerships with everyone from curtain makers to smart light to locks and more; that’s hard enough in its own right, and even more difficult when you consider that many of these objects are only installed once and updated rarely.
Apple or Amazon or Google update their voice assistants with basic LLMs. Now, instead of needing to use precise language, you can just say whatever you want, and the assistant can figure it out, along with all of the other LLM niceties like asking about random factoids.

In this scenario the Apple/Amazon/Google assistants are superior, even if their underlying LLMs are worse, or less capable than OpenAI’s offering, because what the companies are selling is not a standalone product but an ecosystem. That’s the benefit of being a big incumbent company: you have other advantages you can draw on beyond your product chops.

What is striking about new Siri — and, I worry, Alexa+ — is the extent to which they are focused on being compelling products in their own right. It’s very clever for Siri to remember who I had coffee with; it’s very useful — and probably much more doable — to reliably turn my lights on and off. Apple (and I suspect Amazon) should have absolutely nailed the latter before promising to deliver the former.

If you want to be generous to Apple you could make the case that this was what they were trying to deliver with the Siri Intents expansion: developers could already expose parts of their apps to Siri for things like music playback, and new Siri was to build on that framework to enhance its knowledge about a user’s context to provide useful answers. This, though, put Apple firmly in control of the interaction layer, diminishing and commoditizing apps; that’s what an Aggregator does, but what if Apple went in a different direction?

An AI Platform

While my clearest delineation of the difference between Aggregators and Platforms is probably in A Framework for Regulating Competition on the Internet, perhaps the most romantic was in Tech’s Two Philosophies:

There is certainly an argument to be made that these two philosophies arise out of their historical context; it is no accident that Apple and Microsoft, the two “bicycle of the mind” companies, were founded only a year apart, and for decades had broadly similar business models: sure, Microsoft licensed software, while Apple sold software-differentiated hardware, but both were and are at their core personal computer companies and, by extension, platforms.

Google and Facebook, on the other hand, are products of the Internet, and the Internet leads not to platforms but to Aggregators. While platforms need 3rd parties to make them useful and build their moat through the creation of ecosystems, Aggregators attract end users by virtue of their inherent usefulness and, over time, leave suppliers no choice but to follow the Aggregators’ dictates if they wish to reach end users.

The business model follows from these fundamental differences: a platform provider has no room for ads, because the primary function of a platform is to provide a stage for the applications that users actually need to shine. Aggregators, on the other hand, particularly Google and Facebook, deal in information, and ads are simply another type of information. Moreover, because the critical point of differentiation for Aggregators is the number of users on their platform, advertising is the only possible business model; there is no more important feature when it comes to widespread adoption than being “free.”

Still, that doesn’t make the two philosophies any less real: Google and Facebook have always been predicated on doing things for the user, just as Microsoft and Apple have been built on enabling users and developers to make things completely unforeseen.

I said this was romantic, but the reality of Apple’s relationship with developers, particularly over the last few years as the growth of the iPhone has slowed, has been considerably more antagonistic. Apple gives lip service to the role developers played in making the iPhone a compelling platform — and in collectively forming a moat for iOS and Android — but its actions suggest that Apple views developers as a commodity: necessary in aggregate, but mostly a pain in the ass individually.

This is all very unfortunate, because Apple — in conjunction with its developers — is being presented with an incredible opportunity by AI, and it’s one that takes them back to their roots: to be a platform.

Start with the hardware: while the M3 Ultra is the biggest beast on the block, all of Apple’s M chips are highly capable, particularly if you have plenty of RAM. I happen to have an M2 MacBook Pro with 96GB of memory (I maxed out for this specific use case), which lets me run Mixtral 8x22B, an open-source model from Mistral with 141 billion parameters, at 4-bit quantization; I asked it a few questions:

You don’t need to actually try and read the screen-clipping; the output is pretty good, albeit not nearly as detailed and compelling as what you might expect from a frontier model. What’s amazing is that it exists at all: that answer was produced on my computer with my M2 chip, not in the cloud on an Nvidia datacenter GPU. I didn’t need to pay a subscription, or worry about rate limits. It’s my model on my device.

What’s arguably even more impressive is seeing models run on your iPhone:

This is a much smaller model, and correspondingly less capable, but the fact it is running locally on a phone is amazing!

Apple is doing the same thing with the models that undergird Apple Intelligence — some models run on your device, and others on Apple’s Private Cloud Compute — but those models aren’t directly accessible by developers; Apple only exposes writing tools, image playground, and Genmoji. And, of course, they ask for your app’s data for Siri, so they can be the AI Aggregator. If a developer wants to do something unique, they need to bring their own model, which is not only very large, but hard to optimize for a specific device.

What Apple should do instead is make its models — both local and in Private Cloud Compute — fully accessible to developers to make whatever they want. Don’t limit them to cutesy-yet-annoying frameworks like Genmoji or sanitized-yet-buggy image generators, and don’t assume that the only entity that can create something compelling using developer data is the developer of Siri; instead return to the romanticism of platforms: enabling users and developers to make things completely unforeseen. This is something only Apple could do, and, frankly, it’s something the entire AI industry needs.

When the M1 chip was released I wrote an Article called Apple’s Shifting Differentiation. It explained that while Apple had always been about the integration of hardware and software, the company’s locus of differentiation had shifted over time:

When OS X first came out, Apple’s differentiation was software: Apple hardware was stuck on PowerPC chips, woefully behind Intel’s best offerings, but developers in particular were lured by OS X’s beautiful UI and Unix underpinnings.
When Apple moved to Intel chips, its hardware was just as fast as Windows hardware, allowing its software differentiation to truly shine.
Over time, as more and more applications moved to the web, the software differences came to matter less and less; that’s why the M1 chip was important for the Mac’s future.

Apple has the opportunity with AI to press its hardware advantage: because Apple controls the entire device, they can guarantee to developers the presence of particular models at a particular level of performance, backed by Private Cloud Compute; this, by extension, would encourage developers to experiment and build new kinds of applications that only run on Apple devices.

This doesn’t necessarily preclude finally getting new Siri to work; the opportunity Apple is pursuing continues to make sense. At the same time, the implication of the company’s differentiation shifting to hardware is that the most important job for Apple’s software is to get out of the way; to use Apple’s history as analogy, Siri is the PowerPC of Apple’s AI efforts, but this is a self-imposed shortcoming. Apple is uniquely positioned to not do everything itself; instead of seeing developers as the enemy, Apple should deputize them and equip them in a way no one else in technology can.

I wrote a follow-up to this Article in this Daily Update.

Apple reserves some memory for the CPU at all times, so that the computer can actually run ↩

Get notified about new Articles

AI Promise and Chip Precariousness

Tuesday, February 25, 2025

Listen to Podcast

Watch on YouTube
Listen to this post:

Log in to listen

Yesterday Anthropic released Claude Sonnet 3.7; Dylan Patel had the joke of the day about Anthropic’s seeming aversion to the number “4”, which means “die” in Chinese:

Anthropic is also a chinese ai company because of their aversion to the number four
— Dylan Patel (@dylan522p) February 24, 2025

Jokes aside, the correction on this post by Ethan Mollick suggests that Anthropic did not increment the main version number because Sonnet 3.7 is still in the GPT-4 class of models as far as compute is concerned.

After publishing this piece, I was contacted by Anthropic who told me that Sonnet 3.7 would not be considered a 10^26 FLOP model and cost a few tens of millions of dollars to train, though future models will be much bigger. I updated the post with that information. The only significant change is that Claude 3 is now referred to as an advanced model but not a Gen3 model.

I love Mollick’s work, but reject his neutral naming scheme: whoever gets to a generation first deserves the honor of the name. In other words, if Gen2 models are GPT-4 class, then Gen3 models are Grok 3 class.

And, whereas Sonnet 3.7 is an evolution of Sonnet 3.5’s fascinating mixture of personality and coding prowess, likely a result of some Anthropic special sauce in post-training, Grok 3 feels like a model that is the result of a step-order increase in compute capacity, with a much lighter layer of reinforcement learning with human feedback (RLHF). Its answers are far more in-depth and detailed (model good!), but frequently becomes too verbose (RLHF lacking); it gets math problems right (model good!), but its explanations are harder to follow (RLHF lacking). It is also much more willing to generate forbidden content, from erotica to bomb recipes, while having on the surface the political sensibilities of Tumblr, with something more akin to 4chan under the surface if you prod.¹ Grok 3, more than any model yet, feels like the distilled Internet; it’s my favorite so far.

Grok 3 is also a reminder of how much speed matters, and, by extension, why base models are still important in a world of AI’s that reason. Grok 3 is tangibly faster than the competition, which is a better user experience; more generally, conversation is the realm of quick wits, not deep thinkers. The latter is who I want doing research or other agentic-type tasks; the former makes for a better consumer user experience in a chatbot or voice interface.

ChatGPT, meanwhile, still has the best product experience — its Mac app in particular is dramatically better than Claude’s² — and it handles more consumer-y use cases like math homework in a much more user-friendly way. Deep Research, meanwhile, is significantly better than all of its competitors (including Grok’s “Deep Search”), and, for me anyways, the closest experience yet to AGI.

OpenAI’s biggest asset, however, is the ChatGPT brand and associated mindshare; COO Brad Lightcap just told CNBC that the service had surpassed 400 million weekly active users, a 33% increase in less than 3 months. OpenAI is, as I declared four months after the release of ChatGPT, the accidental consumer tech company. Consumer tech companies are the hardest to build and have the potential to be the most valuable; they also require a completely different culture and value chain than a research organization with an API on the side. That is the fundamental reality that I suspect has driven much of the OpenAI upheaval over the last two-and-a-half years: long-time OpenAI employees didn’t sign up to be the next Google Search or Meta, nor is Microsoft interested in being a mere component supplier to a company that must own the consumer relationship to succeed.

In fact, though, OpenAI has moved too slowly: the company should absolutely have an ad-supported version by now, no matter how much the very idea might make AI researchers’ skin crawl; one of the takeaways from the DeepSeek phenomenon was how many consumers didn’t understand how good OpenAI’s best models were because they were not paying customers. It is very much in OpenAI’s competitive interest to make it cost-effective to give free users the best models, and that means advertising. More importantly, the only way for a consumer tech company to truly scale to the entire world is by having an ad model, which maximizes the addressable market while still making it possible to continually increase the average revenue per user (this doesn’t foreclose a subscription model of course; indeed, ads + subscriptions is the ultimate destination for a consumer content business).

DeepSeek, meanwhile, has both been the biggest story of the year, in part because it is the yin to Grok 3’s yang. DeepSeek’s V3 and R1 models are excellent and worthy competitors in the GPT-4 class, and they achieved this excellence through extremely impressive engineering on both the infrastructure and model layers; Grok 3, on the other hand, simply bought the most top-of-the-line Nvidia chips, leveraging the company’s networking to build the biggest computing cluster yet, and came out with a model that is better, but not astronomically so.

The fact that DeepSeek is Chinese is critically important, for reasons I will get to below, but it is just as important that it is an open lab, regularly publishing papers, full model weights, and underlying source code. DeepSeek’s models — which are both better than Meta’s Llama models and more open (and unencumbered by an “openish” license) — set the bar for “minimum open capability”; any model at or below DeepSeek’s models has no real excuse to not be open. Safety concerns are moot when you can just run DeepSeek, while competitive concerns are dwarfed by the sacrifice in uptake and interest entailed in having a model that is both worse and closed.

Both DeepSeek and Llama, meanwhile, provide significant pressure on pricing; API costs in both the U.S. and China have come down in response to the Chinese research lab’s releases, and the only way to have a sustainable margin in the long run is to either have a cost advantage in infrastructure (i.e. Google), have a sustainable model capability advantage (potentially Claude and coding), or be an Aggregator (which is what OpenAI ought to pursue with ChatGPT).

The State of AI Chips

All of this is — but for those with high p-doom concerns — great news. AI at the moment seems to be a goldilocks position: there is sufficient incentive for the leading research labs to raise money and continue investing in new foundation models (in the hope of building an AI that improves itself), even as competition drives API prices down relentlessly, further incentivizing model makers to come up with differentiated products and capabilities.

The biggest winner, of course, continues to be Nvidia, whose chips are fabbed by TSMC: DeepSeek’s success is causing Chinese demand for the H20, Nvidia’s reduced-compute-and-reduced-bandwidth-to-abide-by-export-controls version of the H200, to skyrocket, even as xAI just demonstrated that the fastest way to compete is to pay for the best chips. DeepSeek’s innovations will make other models more efficient, but it’s reasonable to argue that those efficiences are downstream from the chip ban, and that it’s understandable why companies who can just buy the best chips haven’t pursued — but will certainly borrow! — similar gains.

That latter point is a problem for AMD in particular: SemiAnalysis published a brutal breakdown late last year demonstrating just how poor the Nvidia competitor’s software is relative to its hardware; AMD promises to do better, but, frankly, great chips limited by poor software has been the story of AMD for its entire five decades of existence. Some companies, like Meta or Microsoft, might put in the work to write better software, but leading labs don’t have the time nor expertise.

The story is different for Huawei and its Ascend line of AI chips. Those chips are fabbed on China’s Semiconductor Manufactoring International Corporation’s (SMIC) 7nm process, using western-built deep ultraviolet lighography (DUV) and quad-patterning; that this is possible isn’t a surprise, but it’s reasonable to assume that the fab won’t progress further without a Chinese supplier developing extreme ultraviolet lithography (EUV) (and no, calling an evolution of the 7nm process 5.5nm doesn’t count).

Still, the primary limitation for AI chips — particularly when it comes to inference — isn’t necessarily chip speed, but rather memory bandwidth, and that can be improved at the current process level. Moreover, one way to (somewhat) overcome the necessity of using less efficient chips is to simply build more data centers with more power, something that China is much better at than the U.S. Most importantly, however, is that China’s tech companies have the motivation — and the software chops — to make the Ascend a viable contender, particularly for inference.

There is one more player who should be mentioned alongside Nvidia/TSMC and Huawei/SMIC, and that is the hyperscalers who design their own chips, either on their own (AWS with Trainium and Microsoft with Maia) or in collaboration with Broadcom (Google with TPUs and Meta with MTIA). The capabilities and importance of these efforts varies — Google has been investing in TPUs for a decade now, and trains its own models on them, while the next-generation Anthropic model is being trained on Trainium; Meta’s MTIA is about recommendations and not generative AI, while Microsoft’s Maia is a much more nascent effort — but what they all have in common is that their chips are fabbed by TSMC.

TSMC and Intel

That TSMC is dominant isn’t necessarily a surprise. Yes, much has been written, including on this site, about Intel’s stumbles and TSMC’s rise, but even if Intel had managed to stay on the leading edge — and 18A is looking promising — there is still the matter of the company needing to transform itself from an integrated device manufacturer (IDM) who designs and makes its own chips, to a foundry that has the customer service, IP library, and experience to make chips for 3rd parties like all of the entities I just discussed.

Nvidia, to take a pertinent example, was making its chips at TSMC (and Samsung) even when Intel had the leading process; indeed, it was the creation of TSMC and its pure-play foundry model that even made Nvidia possible.³ This also means that TSMC doesn’t just have leading edge capacity, but trailing edge capacity as well. There are a lot of chips in the world — both on AI servers and also in everything from cars to stereos to refrigerators — that don’t need to be on the cutting edge and which benefit from the low costs afforded by the fully depreciated foundries TSMC still maintains, mostly in Taiwan. And TSMC, in turn, can take that cash flow — along with increasing prices for the leading edge — and invest in new fabs on the cutting edge.

Those leading edge fabs continue to skyrocket in price, which means volume is critical. That is why it was clear to me back when this site started in 2013 that Intel needed to become a foundry; unfortunately the company didn’t follow my advice, preferring to see their stock price soar on the back of cloud server demand. Fast forward to 2021 and Intel — now no longer on the leading edge, and with its cloud server business bleeding share to a resurgent AMD on TSMC’s superior process — tried, under the leadership of Pat Gelsinger, to become a foundry; unfortunately the company’s diminishing cash position is larger than its foundry customer base, which is mostly experimental chips or x86 variants.

Intel’s core problem goes back to the observation above: becoming a foundry is about more than having the leading edge process; Intel might have been able to develop those skills in conjunction with customers eager to be on the best process in the world, but once Intel didn’t even have that, it had nothing to offer. There simply is no reason for an Apple or AMD or Nvidia to take the massive risk entailed in working with Intel when TSMC is an option.

China and a Changing World

TSMC is, of course, headquartered in Taiwan; that is where the company’s R&D and leading edge fabs are located, along with most of its trailing edge capacity. SMIC, obviously, is in China; another foundry is Samsung, in South Korea. I told the story as to why so much of this industry ended up in Asia last fall in A Chance to Build:

Semiconductors are so integral to the history of Silicon Valley that they give the region its name, and, more importantly, its culture: chips require huge amounts of up-front investment, but they have, relative to most other manufactured goods, minimal marginal costs; this economic reality helped drive the development of the venture capital model, which provided unencumbered startup capital to companies who could earn theoretically unlimited returns at scale. This model worked even better with software, which was perfectly replicable.

That history starts in 1956, when William Shockley founded the Shockley Semiconductor Laboratory to commercialize the transistor that he had helped invent at Bell Labs; he chose Mountain View to be close to his ailing mother. A year later the so-called “Traitorous Eight”, led by Robert Noyce, left and founded Fairchild Semiconductor down the road. Six years after that Fairchild Semiconductor opened a facility in Hong Kong to assemble and test semiconductors. Assembly required manually attaching wires to a semiconductor chip, a labor-intensive and monotonous task that was difficult to do economically with American wages, which ran about $2.50/hour; Hong Kong wages were a tenth of that. Four years later Texas Instruments opened a facility in Taiwan, where wages were $0.19/hour; two years after that Fairchild Semiconductor opened another facility in Singapore, where wages were $0.11/hour.

In other words, you can make the case that the classic story of Silicon Valley isn’t completely honest. Chips did have marginal costs, but that marginal cost was, within single digit years of the founding of Silicon Valley, exported to Asia.

I recounted in that Article about how this outsourcing was an intentional policy of the U.S. government, and launched into a broader discussion about the post-War Pax Americana global order that placed the U.S. consumer market at the center of global trade, denominated by the dollar, and why that led to an inevitable decline in American manufacturing and the rise of a country in China that, in retrospect, was simply too big, and thus too expensive, for America to bear.

That, anyways, is how one might frame many of the signals coming out of the 2nd Trump administration, including what appears to be a Monroe 2.0 Doctrine approach to North America, an attempt to extricate the U.S. from the Ukraine conflict specifically and Europe broadly, and, well, a perhaps tamer approach to China to start, at least compared to Trump’s rhetoric on the campaign trail.

One possibility is that Trump is actually following through on the “pivot to Asia” that U.S. Presidents have been talking about but failing to execute on for years; in this view the U.S. is girding itself up to defend Taiwan and other entities in Asia, and hopefully break up the burgeoning China-Russia relationship in the process.

The other explanation is more depressing, but perhaps more realistic: President Trump may believe that the unipolar U.S.-dominated world that has been the norm since the fall of the Soviet Union is drawing to a close, and it’s better for the U.S. to proactively shift to a new norm than to have it forced upon them.

The important takeaway that is relevant to this Article is that Taiwan is the flashpoint in both scenarios. A pivot to Asia is about gearing up to defend Taiwan from a potential Chinese invasion or embargo; a retrenchment to the Americas is about potentially granting — or acknowledging — China as the hegemon of Asia, which would inevitably lead to Taiwan’s envelopment by China.

This is, needless to say, a discussion where I tread gingerly, not least because I have lived in Taipei off and on for over two decades. And, of course, there is the moral component entailed in Taiwan being a vibrant democracy with a population that has no interest in reunification with China. To that end, the status quo has been simultaneously absurd and yet surprisingly sustainable: Taiwan is an independent country in nearly every respect, with its own border, military, currency, passports, and — pertinent to tech — economy, increasingly dominated by TSMC; at the same time, Taiwan has not declared independence, and the official position of the United States is to acknowledge that China believes Taiwan is theirs, without endorsing either that position or Taiwanese independence.

Chinese and Taiwanese do, in my experience, handle this sort of ambiguity much more easily than do Americans; still, gray zones only go so far. What has been just as important are realist factors like military strength (once in favor of Taiwan, now decidedly in favor of China), economic ties (extremely deep between Taiwan and China, and China and the U.S.), and war-waging credibility. Here the Ukraine conflict and the resultant China-Russia relationship looms large, thanks to the sharing of military technology and overland supply chains for oil and food that have resulted, even as the U.S. has depleted itself. That, by extension, gets at another changing factor: the hollowing out of American manufacturing under Pax Americana has been directly correlated with China’s dominance of the business of making things, the most essential war-fighting capability.

Still, there is — or rather was — a critical factor that might give China pause: the importance of TSMC. Chips undergird every aspect of the modern economy; the rise of AI, and the promise of the massive gains that might result, only make this need even more pressing. And, as long as China needs TSMC chips, they have a powerful incentive to leave Taiwan alone.

Trump, Taiwan, and TSMC

Anyone who has been following the news for the last few years, however, can surely see the problem: the various iterations of the chip ban, going back to the initial action against ZTE in 2018, have the perhaps-unintended effect of making China less dependent on TSMC. I wrote at the time of the ZTE ban:

What seems likely to happen in the long run is a separation at the hardware layer as well; China is already investing heavily in chips, and this action will certainly spur the country to focus on the sort of relatively low-volume high-precision components that other countries like the U.S., Taiwan, and Japan specialize in (to date it has always made more sense for Chinese companies to focus on higher-volume lower-precision components). To catch up will certainly take time, but if this action harms ZTE as much as it seems it will I suspect the commitment will be even more significant than it already is.

I added two years later, after President Trump barred Huawei from TSMC chips in 2020:

I am, needless to say, not going to get into the finer details of the relationship between China and Taiwan (and the United States, which plays a prominent role); it is less that reasonable people may disagree and more that expecting reasonableness is probably naive. It is sufficient to note that should the United States and China ever actually go to war, it would likely be because of Taiwan.

In this TSMC specifically, and the Taiwan manufacturing base generally, are a significant deterrent: both China and the U.S. need access to the best chip maker in the world, along with a host of other high-precision pieces of the global electronics supply chain. That means that a hot war, which would almost certainly result in some amount of destruction to these capabilities, would be devastating…one of the risks of cutting China off from TSMC is that the deterrent value of TSMC’s operations is diminished.

Now you can see the fly in Goldilocks’ porridge! China would certainly like the best chips from TSMC, but they are figuring out how to manage with SMIC and the Ascend and surprisingly efficient state-of-the-art models; the entire AI economy in the U.S., on the other hand — the one that is developing so nicely, with private funding pursuing the frontier, and competition and innovation up-and-down the stack — is completely dependent on TSMC and Taiwan. We have created a situation where China is less dependent on Taiwan, even while we are more dependent on the island.

This is the necessary context for two more will-he-or-won’t-he ideas floated by President Trump; both are summarized in this Foreign Policy article:

U.S. President Donald Trump has vowed to impose tariffs on Taiwan’s semiconductor industry and has previously accused Taiwan of stealing the U.S. chip industry…The primary strategic goal for the administration is to revitalize advanced semiconductor manufacturing in the United States…As the negotiations between TSMC and the White House unfold, several options are emerging.

The most discussed option is a deal between TSMC, Intel, the U.S. government, and U.S. chip designers such as Broadcom and Qualcomm. Multiple reports indicate that the White House has proposed a deal that would have TSMC acquire a stake in Intel Foundry Services and take a leading role in its operations after IFS separated from Intel. Other reports suggest a potential joint venture involving TSMC, Intel, the U.S. government, and industry partners, with technology transfer and technical support from TSMC.

The motivation for such a proposal is clear: Intel’s board, who fired Gelsinger late last year, seems to want out of the foundry business, and Broadcom or Qualcomm are natural landing places for the design division; the U.S., however, is the entity that needs a leading edge foundry in the U.S., and the Trump administration is trying to compel TSMC to make it happen.

Unfortunately, I don’t think this plan is a good one. It’s simply not possible for one foundry to “take over” another: while the final output is the same — a microprocessor — nearly every step of the process is different in a multitude of ways. Transistors — even ones of the same class — can have different dimensions, with different layouts (TSMC, for example, packs its transistors more densely); production lines can be organized differently, to serve different approaches to lithography; chemicals are tuned to individual processes, and can’t be shared; equipment is tailored to a specific line, and can’t be switched out; materials can differ, throughout the chip, along with how exactly they are prepared and applied. Sure, most of the equipment could be repurposed, but one doesn’t simply layer a TSMC process onto an Intel fab! The best you could hope for is that TSMC could rebuild the fabs using the existing equipment according to their specifications.

That, though, doesn’t actually solve the Taiwan problem: TSMC is still headquartered in Taiwan, still has its R&D division there, and is still beholden to a Taiwanese government directive to not export its most cutting edge processes (and yes, there is truth to Trump’s complaints that Taiwan sees TSMC as leverage to guarantee that the U.S. defends Taiwan in the event of a Chinese invasion). Moreover, the U.S. chip problem isn’t just about the leading edge, but also the trailing edge. I wrote in Chips and China:

It’s worth pointing out, though, that this is producing a new kind of liability for the U.S., and potentially more danger for Taiwan…these aren’t difficult chips to make, but that is precisely why it makes little sense to build new trailing edge foundries in the U.S.: Taiwan already has it covered (with the largest marketshare in both categories), and China has the motivation to build more just so it can learn.

What, though, if TSMC were taken off the board?

Much of the discussion around a potential invasion of Taiwan — which would destroy TSMC (foundries don’t do well in wars) — centers around TSMC’s lead in high end chips. That lead is real, but Intel, for all of its struggles, is only a few years behind. That is a meaningful difference in terms of the processors used in smartphones, high performance computing, and AI, but the U.S. is still in the game. What would be much more difficult to replace are, paradoxically, trailing node chips, made in fabs that Intel long ago abandoned…

The more that China builds up its chip capabilities — even if that is only at trailing nodes — the more motivation there is to make TSMC a target, not only to deny the U.S. its advanced capabilities, but also the basic chips that are more integral to everyday life than we ever realized.

It’s good that the administration is focused on the issue of TSMC and Taiwan: what I’m not sure anyone realizes is just how deep the dependency goes, and just how vulnerable the U.S. — and our future in AI — really is.

What To Do

Everything that I’ve written until now has been, in some respects, trivial: it’s easy to identify problems and criticize proposed solutions; it’s much more difficult to come up with solutions of one’s own. The problem is less the need for creative thinking and more the courage to make trade-offs: the fact of the matter is that there are no good solutions to the situation the U.S. has got itself into with regards to Taiwan and chips. That is a long-winded way to say that the following proposal includes several ideas that, in isolation, I find some combination of distasteful, against my principles, and even downright dangerous. So here goes.

End the China Chip Ban

The first thing the U.S. should do — and, by all means, make this a negotiating plank in a broader agreement with China — is let Chinese companies, including Huawei, make chips at TSMC, and further, let Chinese companies buy top-of-the-line Nvidia chips.

The Huawei one is straightforward: Huawei’s founder may have told Chinese President Xi Jinping that Huawei doesn’t need external chip makers, but I think that the reality of having access to cutting edge TSMC fabrication would show that the company’s revealed preference would be for better chips than Huawei can get from SMIC — and the delta is only going to grow. Sure, Huawei would still work with SMIC, but the volume would go down; critically, so would the urgency of having no other choice. This, by extension, would restart China’s dependency on TSMC, thereby increasing the cost of making a move on Taiwan.

At the same time, giving Huawei access to cutting edge chips would be a significant threat to Nvidia’s dominance; the reason the company is so up-in-arms about the chip ban isn’t simply foregone revenue but the forced development of an alternative to their CUDA ecosystem. The best way to neuter that challenge — and it is in the U.S.’s interest to have Nvidia in control, not Huawei — is to give companies like Bytedance, Alibaba, and DeepSeek the opportunity to buy the best.

This does, without question, unleash China in terms of AI; preventing that has been the entire point of the various flavors of chip bans that came down from the Biden administration. DeepSeek’s success, however, should force a re-evaluation about just how viable it is to completely cut China off from AI.

It’s also worth noting that success in stopping China’s AI efforts has its own risks: another reason why China has held off from moving against Taiwan is the knowledge that every year they wait increases their relative advantages in all the real world realities I listed above; that makes it more prudent to wait. The prospect of the U.S. developing the sort of AI that matters in a military context, however, even as China is cut off, changes that calculus: now the prudent course is to move sooner rather than later, particularly if the U.S. is dependent on Taiwan for the chips that make that AI possible.

Double Down on the Semiconductor Equipment Ban

While I’ve continually made references to “chip bans”, that’s actually incomplete: the U.S. has also made moves to limit China’s access to semiconductor equipment necessary for making leading edge chips (SMIC’s 7nm process, for example, is almost completely dependent on western semiconductor equipment). Unfortunately, this effort has mostly been a failure, thanks to generous loopholes that are downstream from China being a large market for U.S. semiconductor equipment manufacturers.

It’s time for those loopholes to go away; remember, the overriding goal is for China to increase its dependence on Taiwan, and that means cutting SMIC and China’s other foundries off at the knees. Yes, this increases the risk that China will develop its own alternatives to western semiconductor manufacturers, leading to long-term competition and diminished money for R&D, but this is a time for hard choices and increasing Taiwan’s importance to China is more important.

Build Trailing Edge Fabs in the U.S.

The U.S.’s dependency on TSMC for trailing edge chip capacity remains a massive problem; if you think the COVID chip shortages were bad, then a scenario where the U.S. is stuck with GlobalFoundries and no one else is a disaster so great it is hard to contemplate. However, as long as TSMC exists, there is zero economic rationale for anyone to build more trailing edge fabs.

This, then, is a textbook example of where government subsidies are the answer: there is a national security need for trailing edge capacity, and no economic incentive to build it. And, as an added bonus, this helps fill in some of the revenue for semiconductor manufacturers who are now fully cut off from China. TSMC takes a blow, of course, but they are also being buttressed by orders from Huawei and other Chinese chip makers.

Intel and the Leading Edge

That leaves Intel and the need for native leading edge capacity, and this is in some respects the hardest problem to solve.

First, the U.S. should engineer a spin-off of Intel’s x86 chip business to Broadcom or Qualcomm at a nominal price; the real cost for the recipient company will be guaranteed orders for not just Intel chips but also a large portion of their existing chips for Intel Foundry. This will provide the foundational customer to get Intel Foundry off the ground.

Second, the U.S. should offer to subsidize Nvidia chips made at Intel Foundry. Yes, this is an offer worth billions of dollars, but it is the shortest, fastest route to ground the U.S. AI industry in U.S. fabs.

Third, if Nvidia declines — and they probably will, given the risks entailed in a foundry change — then the U.S. should make a massive order for Intel Gaudi AI accelerators, build data centers to house them, and make them freely available to companies and startups who want to build their own AI models, with the caveat that everything is open source.

Fourth, the U.S. should heavily subsidize chip startups to build at Intel Foundry, with the caveat that all of the resultant IP that is developed to actually build chips — the basic building blocks, that are separate from the “secret sauce” of the chip itself — is open-sourced.

Fifth, the U.S. should indemnify every model created on U.S.-manufactured chips against any copyright violations, with the caveat that the data used to train the model must be made freely available.

Here is the future state the U.S. wants to get to: a strong AI industry running on U.S.-made chips, along with trailing edge capacity that is beyond the reaches of China. Getting there, however, will take significant interventions into the market to undo the overwhelming incentives for U.S. companies to simply rely on TSMC; even then, such a shift will take time, which is why making Taiwan indispensable to China’s technology industry is the price that needs to be paid in the meantime.

AI is in an exciting place; it’s also a very precarious one. I believe this plan, with all of the risks and sacrifices it entails, is the best way to ensure that all of the trees that are sprouting have time to actually take root and change the world.

I wrote a follow-up to this Article in this Daily Update.
1. This suggests a surprising takeaway: it’s possible that while RLHF on ChatGPT and especially Claude block off the 4chan elements, they also tamp down the Tumblr elements, which is to say the politics don’t come from the post-training, but from the dataset — i.e. the Internet. In other words, if I’m right about Grok 3 having a much lighter layer of RLHF, then that explains both the surface politics, and what is available under the surface. ↩
2. Grok doesn’t yet have a Mac app, but its iPhone app is very good ↩
3. Although Nvidia’s first chip was made by SGS-Thomson Microelectronics ↩
Get notified about new Articles

Sign Up

Please verify your email address to proceed.
Deep Research and Knowledge Value

Monday, February 10, 2025

Listen to Podcast

Watch on YouTube
Listen to this post:

Log in to listen

“When did you feel the AGI?”

This is a question that has been floating around AI circles for a while, and it’s a hard one to answer for two reasons. First, what is AGI, and second, “feel” is a bit like obscenity: as Supreme Court Justice Potter Stewart famously said in Jacobellis v. Ohio, “I know it when I see it.”

I gave my definition of AGI in AI’s Uneven Arrival:

What o3 and inference-time scaling point to is something different: AI’s that can actually be given tasks and trusted to complete them. This, by extension, looks a lot more like an independent worker than an assistant — ammunition, rather than a rifle sight. That may seem an odd analogy, but it comes from a talk Keith Rabois gave at Stanford…My definition of AGI is that it can be ammunition, i.e. it can be given a task and trusted to complete it at a good-enough rate (my definition of Artificial Super Intelligence (ASI) is the ability to come up with the tasks in the first place).

The “feel” part of that question is a more recent discovery: DeepResearch from OpenAI feels like AGI; I just got a new employee for the shockingly low price of $200/month.

Deep Research Bullets

OpenAI announced Deep Research in a February 2 blog post:

Today we’re launching deep research in ChatGPT, a new agentic capability that conducts multi-step research on the internet for complex tasks. It accomplishes in tens of minutes what would take a human many hours.

Deep research is OpenAI’s next agent that can do work for you independently — you give it a prompt, and ChatGPT will find, analyze, and synthesize hundreds of online sources to create a comprehensive report at the level of a research analyst. Powered by a version of the upcoming OpenAI o3 model that’s optimized for web browsing and data analysis, it leverages reasoning to search, interpret, and analyze massive amounts of text, images, and PDFs on the internet, pivoting as needed in reaction to information it encounters.

The ability to synthesize knowledge is a prerequisite for creating new knowledge. For this reason, deep research marks a significant step toward our broader goal of developing AGI, which we have long envisioned as capable of producing novel scientific research.

It’s honestly hard to keep track of OpenAI’s AGI definitions these days — CEO Sam Altman, just yesterday, defined it as “a system that can tackle increasingly complex problems, at human level, in many fields” — but in my rather more modest definition Deep Research sits right in the middle of that excerpt: it synthesizes research in an economically valuable way, but doesn’t create new knowledge.

I already published two examples of Deep Research in last Tuesday’s Stratechery Update. While I suggest reading the whole thing, to summarize:
- First, I published my (brief) review of Apples recent earnings, including three observations:
  - It was notable that Apple earned record revenue even though iPhone sales were down year-over-year, in the latest datapoint about the company’s transformation into a Services juggernaut.
  - China sales were down again, but this wasn’t a new trend: it actually goes back nearly a decade, but you can only see that if you realize how the Huawei chip ban gave Apple a temporary boost in the country.
  - While Apple executives claimed that Apple Intelligence drove iPhone sales, there really wasn’t any evidence in the geographic sales numbers supporting that assertion.
- Second, I published a Deep Research report using a generic prompt:
  
  I am Ben Thompson, the author of Stratechery. This is important information because I want you to understand my previous analysis of Apple, and the voice in which I write on Stratechery. I want a research report about Apple's latest earnings in the style and voice of Stratechery that is in line with my previous analysis.
- Third, I published a Deep Research report using a prompt that incorporated my takeaways from the earnings:
  
  I am Ben Thompson, the author of Stratechery. This is important information because I want you to understand my previous analysis of Apple, and the voice in which I write on Stratechery. I want a research report about Apple's latest earnings for fiscal year 2025 q1 (calendar year 2024 q4). There are a couple of angles I am particularly interested in:
  
  - First, there is the overall trend of services revenue carrying the companies earnings. How has that trend continued, what does it mean for margins, etc.
  
  - Second, I am interested in the China angle. My theory is that Apple's recent decline in China is not new, but is actually part of a longer trend going back nearly a decade. I believe that trend was arrested by the chip ban on Huawei, but that that was only a temporary bump in terms of a long-term decline. In addition, I would like to marry this to deeper analysis of the Chinese phone market, the distinction between first tier cities and the rest of China, and what that says about Apple's prospects in the country.
  
  - Third, what takeaways are there about Apple's AI prospects? The company claims that Apple Intelligence is helping sales in markets where it has launched, but isn't this a function of not being available in China?
  
  Please deliver this report in a format and style that is suitable for Stratechery.
You can read the Update for the output, but this was my evaluation:

The first answer was decent given the paucity of instruction; it’s really more of a summary than anything, but there are a few insightful points. The second answer was considerably more impressive. This question relied much more heavily on my previous posts, and weaved points I’ve made in the past into the answer. I don’t, to be honest, think I learned anything new, but I think that anyone encountering this topic for the first time would have. Or, to put it another way, were I looking for a research assistant, I would consider hiring whoever wrote the second answer.

In other words, Deep Research isn’t a rifle barrel, but for this question at least, it was a pretty decent piece of ammunition.

DeepResearch Examples

Still, that ammunition wasn’t that valuable to me; I read the transcript of Apple’s earnings call before my 8am Dithering recording and came up with my three points immediately; that’s the luxury of having thought about and covered Apple for going on twelve years. And, as I noted above, the entire reason that the second Deep Research report was interesting was because I came up with the ideas and Deep Research substantiated them; the substantiation, however, wasn’t nearly to the standard (in my very biased subjective opinion!) of a Stratechery Update.

I found a much more beneficial use case the next day. Before I conduct a Stratechery Interview I do several hours of research on the person I am interviewing, their professional background, the company they work for, etc.; in this case I was talking to Bill McDermott, the Chairman and CEO of ServiceNow, a company I am somewhat familiar with but not intimately so. So, I asked Deep Research for help:

I am going to conduct an interview with Bill McDermott, the CEO of ServiceNow, and I need to do research about both McDermott and ServiceNow to prepare my questions.

First, I want to know more about McDermott and his background. Ideally there are some good profiles of him I can read. I know he used to work at SAP and I would like to know what is relevant about his experience there. Also, how and why did he take the ServiceNow job?

Then, what is the background of ServiceNow? How did it get started? What was its initial product-market fit, and how has it expanded over time? What kind of companies use ServiceNow?

What is the ServiceNow business model? What is its go-to-market strategy?

McDermott wants to talk about ServiceNow's opportunities in AI. What are those opportunities, and how are they meaningfully unique, or different from simple automation?

What do users think of ServiceNow? Is it very ugly and hard to use? Why is it very sticky? What attracts companies to it?

What competitors does ServiceNow have? Can it be a platform for other companies? Or is there an opportunity to disrupt ServiceNow?

What other questions do you have that would be useful for me to ask?

You can use previous Stratechery Interviews as a resource to understand the kinds of questions I typically ask.

I found the results eminently useful, although the questions were pretty mid; I did spend some time doing some additional reading of things like earnings reports before conducting the Interview with my own questions. In short, it saved me a fair bit of time and gave me a place to start from, and that alone more than paid for my monthly subscription.

Another compelling example came in researching a friend’s complicated medical issue; I’m not going to share my prompt and results for obvious reasons. What I will note is that this friend has been struggling with this issue for over a year, and has seen multiple doctors and tried several different remedies. Deep Research identified a possible issue in ten minutes that my friend has only just learned about from a specialist last week; while it is still to be determined if this is the answer he is looking for, it is notable that Deep Research may have accomplished in ten minutes what has taken my friend many hours over many months with many medical professionals.

It is the final example, however, that is the most interesting, precisely because it is the question on which Deep Research most egregiously failed. I generated a report about another friend’s industry, asking for the major players, supply chain analysis, customer segments, etc. It was by far my most comprehensive and detailed prompt. And, sure enough, Deep Research came back with a fully fleshed out report answering all of my questions.

It was also completely wrong, but in a really surprising way. The best way to characterize the issue is to go back to that famous Donald Rumsfeld quote:

There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns — the ones we don’t know we don’t know.

The issue with the report I generated — and once again, I’m not going to share the results, but this time for reasons that are non-obvious — is that it completely missed a major entity in the industry in question. This particular entity is not a well-known brand, but is a major player in the supply chain. It is a significant enough entity that any report about the industry that did not include them is, if you want to be generous, incomplete.

It is, in fact, the fourth categorization that Rumsfeld didn’t mention: “the unknown known.” Anyone who read the report that Deep Research generated would be given the illusion of knowledge, but would not know what they think they know.

Knowledge Value

One of the most painful lessons of the Internet was the realization by publishers that news was worthless. I’m not speaking about societal value, but rather economic value: something everyone knows is both important and also non-monetizable, which is to say that the act of publishing is economically destructive. I wrote in Publishers and the Pursuit of the Past:

Too many newspaper advocates utterly and completely fail to understand this; the truth is that newspapers made money in the past not by providing societal value, but by having quasi-monopolistic control of print advertising in their geographic area; the societal value was a bonus. Thus, when Chavern complains that “today’s internet distribution systems distort the flow of economic value derived from good reporting”, he is in fact conflating societal value with economic value; the latter does not exist and has never existed.

This failure to understand the past leads to a misdiagnosis of the present: Google and Facebook are not profitable because they took newspapers’ reporting, they are profitable because they took their advertising. Moreover, the utility of both platforms is so great that even if all newspaper content were magically removed — which has been tried in Europe — the only thing that would change is that said newspapers would lose even more revenue as they lost traffic.

This is why this solution is so misplaced: newspapers no longer have a monopoly on advertising, can never compete with the Internet when it comes to bundling content, and news remains both valuable to society and, for the same reasons, worthless economically (reaching lots of people is inversely correlated to extracting value, and facts — both real and fake ones — spread for free).

It is maybe a bit extreme to say has always been such; in truth it is very hard to draw direct lines from the analog era, defined as it was by friction and scarcity, to the Internet era’s transparency and abundance. It may have technically been the case that those of us old enough to remember newsstands bought the morning paper because a local light manufacturing company owned printing presses, delivery trucks, and an advertising sales team, but we too believed we simply wanted to know what was happening. Now we get that need fulfilled for free, and probably by social media (for better or worse); I sometimes wish I knew less!

Still, what Deep Research reveals is how much more could be known. I read a lot of things on the Internet, but it’s not as if I will ever come close to reading everything. Moreover, as the amount of slop increases — whether human or AI generated — the difficulty in finding the right stuff to read is only increasing. This is also one problem with Deep Research that is worth pointing out: the worst results are often, paradoxically, for the most popular topics, precisely because those are the topics that are the most likely to be contaminated by slop. The more precise and obscure the topic, the more likely it is that Deep Research will have to find papers and articles that actually cover the topic well:

This graph, however, is only half complete, as the example of my friend’s industry shows:

There is a good chance that Deep Research, particularly as it evolves, will become the most effective search engine there has ever been; it will find whatever information there is to find about a particular topic and present it in a relevant way. It is the death, in other words, of security through obscurity. Previously we shifted from a world where you had to pay for the news to the news being fed to you; now we will shift from a world where you had to spend hours researching a topic to having a topic reported to you on command.

Unless, of course, the information that matters is not on the Internet. This is why I am not sharing the Deep Research report that provoked this insight: I happen to know some things about the industry in question — which is not related to tech, to be clear — because I have a friend who works in it, and it is suddenly clear to me how much future economic value is wrapped up in information not being public. In this case the entity in question is privately held, so there aren’t stock market filings, public reports, barely even a webpage! And so AI is blind.

There is another example, this time in tech, of just how valuable secrecy can be. Amazon launched S3, the first primitive offered by AWS, in 2006, followed by EC2 later that year, and soon transformed startups and venture capital. What wasn’t clear was to what extent AWS was transforming Amazon; the company slowly transitioned Amazon.com to AWS, and that was reason enough to list AWS’s financials under Amazon.com until 2012, and then under “Other” — along with things like credit card and (then small amounts of) advertising revenue — after that.

The grand revelation would come in 2015, when Amazon announced in January that it would break AWS out into a separate division for reporting purposes. From a Reuters report at the time:

After years of giving investors the cold shoulder, Amazon.com Inc is starting to warm up to Wall Street. The No. 1 U.S. online retailer was unusually forthcoming during its fourth-quarter earnings call on Thursday, saying it will break out results this year, for the first time, for its fast-growing cloud computing unit, Amazon Web Services

The additional information shared during Amazon’s fourth-quarter results as well as its emphasis on becoming more efficient signaled a new willingness by Amazon executives to listen to investors as well. “This quarter, Amazon flexed its muscles and said this is what we can do when we focus on profits,” said Rob Plaza, senior equity analyst for Key Private Bank. “If they could deliver that upper teens, low 20s revenue growth and be able to deliver profits on top of that, the stock is going to respond.” The change is unlikely to be dramatic. When asked whether this quarter marked a permanent shift in Amazon’s relationship with Wall Street, Plaza laughed: “I wouldn’t be chasing the stock here based on that.”

Still, the shift is a good sign for investors, who have been clamoring for Amazon to disclose more about its fastest-growing and likely most profitable division that some analysts say accounts for 4 percent of total sales.

In fact, AWS accounted for nearly 7 percent of total sales, and it was dramatically more profitable than anyone expected. The revelation caused such a massive uptick in the stock price that I called it The AWS IPO:

One of the technology industry’s biggest and most important IPOs occurred late last month, with a valuation of $25.6 billion dollars. That’s more than Google, which IPO’d at a valuation of $24.6 billion, and certainly a lot more than Amazon, which finished its first day on the public markets with a valuation of $438 million. Don’t feel too bad for the latter, though: the “IPO” I’m talking about was Amazon Web Services, and it just so happens to still be owned by the same e-commerce company that went public nearly 20 years ago.

I’m obviously being facetious; there was no actual IPO for AWS, just an additional line item on Amazon’s financial reports finally breaking out the cloud computing service Amazon pioneered nine years ago. That line item, though, was almost certainly the primary factor in driving an overnight increase in Amazon’s market capitalization from $182 billion on April 23 to $207 billion on April 24. It’s not only that AWS is a strong offering in a growing market with impressive economics, it also may, in the end, be the key to realizing the potential of Amazon.com itself.

That $25.6 billion increase in market cap, however, came with its own costs: both Microsoft and Google doubled down on their own cloud businesses in response, and while AWS is still the market leader, it faces stiff competition. That’s a win for consumers and customers, but also a reminder that known unknowns have a value all their own.

Surfacing Data

I wouldn’t go so far as to say that Amazon was wrong to disclose AWS’s financials. In fact, SEC rules would have required as much once AWS revenue became 10% of the company’s overall business (today it is 15%, which might seem low until you remember that Amazon’s top-line revenue includes first-party e-commerce sales). Moreover, releasing AWS’s financials gave investors renewed confidence in the company, giving management freedom to continue investing heavily in capital expenditures for both AWS and the e-commerce business, fueling Amazon’s transformation into a logistics company. The point, rather, is to note that secrets are valuable.

What is interesting to consider is what this means for AI tools like Deep Research. Hedge funds have long known the value of proprietary data, paying for everything from satellite images to traffic observers and everything in between in order to get a market edge. My suspicion is that work like this is going to become even more valuable as security by obscurity disappears; it’s going to be more difficult to harvest alpha from reading endless financial filings when an AI can do that research in a fraction of the time.¹

The problem with those hedge fund reports, however, is that they themselves are proprietary; however, they are not a complete secret. After all, the way to monetize that research is through making trades on the open market, which is to say those reports have an impact on prices. Pricing is a signal that is available to everyone, and it’s going to become an increasingly important one.

That, by extension, is why AIs like Deep Research are one of the most powerful arguments yet for prediction markets. Prediction markets had their moment in the sun last fall during the U.S. presidential election, when they were far more optimistic about a Trump victory than polls. However, the potential — in fact, the necessity — of prediction markets is only going to increase with AI. AI’s capability of knowing everything that is public is going to increase the incentive to keep things secret; prediction markets in everything will provide a profit incentive for knowledge to be disseminated, by price if nothing else.

It is also interesting that prediction markets have become associated with crypto, another technology that is poised to come into its own in an AI-dominated world; infinite content generation increases the value of digital scarcity and verification, just as infinite transparency increases the value of secrecy. AI is likely to be the key to tying all of this together: a combination of verifiable information and understandable price movements may the only way to derive any meaning from the slop that is slowly drowning the Internet.

This is the other reality of AI, and why it is inescapable. Just as the Internet’s transparency and freedom to publish has devolved into torrents of information of questionable veracity, requiring ever more heroic efforts to parse, and undeniable opportunities to thrive by building independent brands — like this site — AI will both be the cause of further pollution of the information ecosystem and, simultaneously, the only way out.

Deep Research Impacts

Much of this is in the (not-so-distant) future; for now Deep Research is one of the best bargains in technology. Yes, $200/month is a lot, and yes, Deep Research is limited by the quality of information on the Internet and is highly dependent on the quality of the prompt. I can’t say that I’ve encountered any particular sparks of creativity, at least in arenas that I know well, but at the same time, there is a lot of work that isn’t creative in nature, but necessary all the same. I personally feel much more productive, and, truth be told, I was never going to hire a researcher anyways.

That, though, speaks to the peril in two distinct ways. First, one reason I’ve never hired a researcher is that I see tremendous value in the search for and sifting of information. There is so much you learn on the way to a destination, and I value that learning; will serendipity be an unwelcome casualty to reports on demand? Moreover, what of those who haven’t — to take the above example — been reading Apple earnings reports for 12 years, or thinking and reading about technology for three decades? What will be lost for the next generation of analysts?

And, of course, there is the job question: lots of other entities employ researchers, in all sorts of fields, and those salaries are going to be increasingly hard to justify. I’ve known intellectually that AI would replace wide swathes of knowledge work; it is another thing to feel it viscerally.

At the same time, that is why the value of secrecy is worth calling out. Secrecy is its own form of friction, the purposeful imposition of scarcity on valuable knowledge. It speaks to what will be valuable in an AI-denominated future: yes, the real world and human-denominated industries will rise in economic value, but so will the tools and infrastructure that both drive original research and discoveries, and the mechanisms to price it. The power of AI, at least on our current trajectory, comes from knowing everything; the (perhaps doomed) response of many will be to build walls, toll gates, and marketplaces to protect and harvest the fruits of their human expeditions.
1. I don’t think Deep Research is good at something like this, at least not yet. For example, I generated a report about what happened in 2015 surrounding Amazon’s disclosure, and the results were pretty poor; this is, however, the worst the tool will ever be. ↩
Get notified about new Articles

Sign Up

Please verify your email address to proceed.
DeepSeek FAQ

Monday, January 27, 2025

Listen to Podcast

Watch on YouTube
Listen to this post:

Log in to listen

It’s Monday, January 27. Why haven’t you written about DeepSeek yet?

I did! I wrote about R1 last Tuesday.

I totally forgot about that.

I take responsibility. I stand by the post, including the two biggest takeaways that I highlighted (emergent chain-of-thought via pure reinforcement learning, and the power of distillation), and I mentioned the low cost (which I expanded on in Sharp Tech) and chip ban implications, but those observations were too localized to the current state of the art in AI. What I totally failed to anticipate were the broader implications this news would have to the overall meta-discussion, particularly in terms of the U.S. and China.

Is there precedent for such a miss?

There is. In September 2023 Huawei announced the Mate 60 Pro with a SMIC-manufactured 7nm chip. The existence of this chip wasn’t a surprise for those paying close attention: SMIC had made a 7nm chip a year earlier (the existence of which I had noted even earlier than that), and TSMC had shipped 7nm chips in volume using nothing but DUV lithography (later iterations of 7nm were the first to use EUV). Intel had also made 10nm (TSMC 7nm equivalent) chips years earlier using nothing but DUV, but couldn’t do so with profitable yields; the idea that SMIC could ship 7nm chips using their existing equipment, particularly if they didn’t care about yields, wasn’t remotely surprising — to me, anyways.

What I totally failed to anticipate was the overwrought reaction in Washington D.C. The dramatic expansion in the chip ban that culminated in the Biden administration transforming chip sales to a permission-based structure was downstream from people not understanding the intricacies of chip production, and being totally blindsided by the Huawei Mate 60 Pro. I get the sense that something similar has happened over the last 72 hours: the details of what DeepSeek has accomplished — and what they have not — are less important than the reaction and what that reaction says about people’s pre-existing assumptions.

So what did DeepSeek announce?

The most proximate announcement to this weekend’s meltdown was R1, a reasoning model that is similar to OpenAI’s o1. However, many of the revelations that contributed to the meltdown — including DeepSeek’s training costs — actually accompanied the V3 announcement over Christmas. Moreover, many of the breakthroughs that undergirded V3 were actually revealed with the release of the V2 model last January.

Is this model naming convention the greatest crime that OpenAI has committed?

Second greatest; we’ll get to the greatest momentarily.

Let’s work backwards: what was the V2 model, and why was it important?

The DeepSeek-V2 model introduced two important breakthroughs: DeepSeekMoE and DeepSeekMLA. The “MoE” in DeepSeekMoE refers to “mixture of experts”. Some models, like GPT-3.5, activate the entire model during both training and inference; it turns out, however, that not every part of the model is necessary for the topic at hand. MoE splits the model into multiple “experts” and only activates the ones that are necessary; GPT-4 was a MoE model that was believed to have 16 experts with approximately 110 billion parameters each.

DeepSeekMoE, as implemented in V2, introduced important innovations on this concept, including differentiating between more finely-grained specialized experts, and shared experts with more generalized capabilities. Critically, DeepSeekMoE also introduced new approaches to load-balancing and routing during training; traditionally MoE increased communications overhead in training in exchange for efficient inference, but DeepSeek’s approach made training more efficient as well.

DeepSeekMLA was an even bigger breakthrough. One of the biggest limitations on inference is the sheer amount of memory required: you both need to load the model into memory and also load the entire context window. Context windows are particularly expensive in terms of memory, as every token requires both a key and corresponding value; DeepSeekMLA, or multi-head latent attention, makes it possible to compress the key-value store, dramatically decreasing memory usage during inference.

I’m not sure I understood any of that.

The key implications of these breakthroughs — and the part you need to understand — only became apparent with V3, which added a new approach to load balancing (further reducing communications overhead) and multi-token prediction in training (further densifying each training step, again reducing overhead): V3 was shockingly cheap to train. DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a mere $5.576 million.

That seems impossibly low.

DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:

Lastly, we emphasize again the economical training costs of DeepSeek-V3, summarized in Table 1, achieved through our optimized co-design of algorithms, frameworks, and hardware. During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pre- training stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.

So no, you can’t replicate DeepSeek the company for $5.576 million.

I still don’t believe that number.

Actually, the burden of proof is on the doubters, at least once you understand the V3 architecture. Remember that bit about DeepSeekMoE: V3 has 671 billion parameters, but only 37 billion parameters in the active expert are computed per token; this equates to 333.3 billion FLOPs of compute per token. Here I should mention another DeepSeek innovation: while parameters were stored with BF16 or FP32 precision, they were reduced to FP8 precision for calculations; 2048 H800 GPUs have a capacity of 3.97 exoflops, i.e. 3.97 billion billion FLOPS. The training set, meanwhile, consisted of 14.8 trillion tokens; once you do all of the math it becomes apparent that 2.8 million H800 hours is sufficient for training V3. Again, this was just the final run, not the total cost, but it’s a plausible number.

Scale AI CEO Alexandr Wang said they have 50,000 H100s.

I don’t know where Wang got his information; I’m guessing he’s referring to this November 2024 tweet from Dylan Patel, which says that DeepSeek had “over 50k Hopper GPUs”. H800s, however, are Hopper GPUs, they just have much more constrained memory bandwidth than H100s because of U.S. sanctions.

Here’s the thing: a huge number of the innovations I explained above are about overcoming the lack of memory bandwidth implied in using H800s instead of H100s. Moreover, if you actually did the math on the previous question, you would realize that DeepSeek actually had an excess of computing; that’s because DeepSeek actually programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA. DeepSeek engineers had to drop down to PTX, a low-level instruction set for Nvidia GPUs that is basically like assembly language. This is an insane level of optimization that only makes sense if you are using H800s.

Meanwhile, DeepSeek also makes their models available for inference: that requires a whole bunch of GPUs above-and-beyond whatever was used for training.

So was this a violation of the chip ban?

Nope. H100s were prohibited by the chip ban, but not H800s. Everyone assumed that training leading edge models required more interchip memory bandwidth, but that is exactly what DeepSeek optimized both their model structure and infrastructure around.

Again, just to emphasize this point, all of the decisions DeepSeek made in the design of this model only make sense if you are constrained to the H800; if DeepSeek had access to H100s, they probably would have used a larger training cluster with much fewer optimizations specifically focused on overcoming the lack of bandwidth.

So V3 is a leading edge model?

It’s definitely competitive with OpenAI’s 4o and Anthropic’s Sonnet-3.5, and appears to be better than Llama’s biggest model. What does seem likely is that DeepSeek was able to distill those models to give V3 high quality tokens to train on.

What is distillation?

Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.

Distillation obviously violates the terms of service of various models, but the only way to stop it is to actually cut off access, via IP banning, rate limiting, etc. It’s assumed to be widespread in terms of model training, and is why there are an ever-increasing number of models converging on GPT-4o quality. This doesn’t mean that we know for a fact that DeepSeek distilled 4o or Claude, but frankly, it would be odd if they didn’t.

Distillation seems terrible for leading edge models.

It is! On the positive side, OpenAI and Anthropic and Google are almost certainly using distillation to optimize the models they use for inference for their consumer-facing apps; on the negative side, they are effectively bearing the entire cost of training the leading edge, while everyone else is free-riding on their investment.

Indeed, this is probably the core economic factor undergirding the slow divorce of Microsoft and OpenAI. Microsoft is interested in providing inference to its customers, but much less enthused about funding $100 billion data centers to train leading edge models that are likely to be commoditized long before that $100 billion is depreciated.

Is this why all of the Big Tech stock prices are down?

In the long run, model commoditization and cheaper inference — which DeepSeek has also demonstrated — is great for Big Tech. A world where Microsoft gets to provide inference to its customers for a fraction of the cost means that Microsoft has to spend less on data centers and GPUs, or, just as likely, sees dramatically higher usage given that inference is so much cheaper. Another big winner is Amazon: AWS has by-and-large failed to make their own quality model, but that doesn’t matter if there are very high quality open source models that they can serve at far lower costs than expected.

Apple is also a big winner. Dramatically decreased memory requirements for inference make edge inference much more viable, and Apple has the best hardware for exactly that. Apple Silicon uses unified memory, which means that the CPU, GPU, and NPU (neural processing unit) have access to a shared pool of memory; this means that Apple’s high-end hardware actually has the best consumer chip for inference (Nvidia gaming GPUs max out at 32GB of VRAM, while Apple’s chips go up to 192 GB of RAM).

Meta, meanwhile, is the biggest winner of all. I already laid out last fall how every aspect of Meta’s business benefits from AI; a big barrier to realizing that vision is the cost of inference, which means that dramatically cheaper inference — and dramatically cheaper training, given the need for Meta to stay on the cutting edge — makes that vision much more achievable.

Google, meanwhile, is probably in worse shape: a world of decreased hardware requirements lessens the relative advantage they have from TPUs. More importantly, a world of zero-cost inference increases the viability and likelihood of products that displace search; granted, Google gets lower costs as well, but any change from the status quo is probably a net negative.

I asked why the stock prices are down; you just painted a positive picture!

My picture is of the long run; today is the short run, and it seems likely the market is working through the shock of R1’s existence.

Wait, you haven’t even talked about R1 yet.

R1 is a reasoning model like OpenAI’s o1. It has the ability to think through a problem, producing much higher quality results, particularly in areas like coding, math, and logic (but I repeat myself).

Is this more impressive than V3?

Actually, the reason why I spent so much time on V3 is that that was the model that actually demonstrated a lot of the dynamics that seem to be generating so much surprise and controversy. R1 is notable, however, because o1 stood alone as the only reasoning model on the market, and the clearest sign that OpenAI was the market leader.

R1 undoes the o1 mythology in a couple of important ways. First, there is the fact that it exists. OpenAI does not have some sort of special sauce that can’t be replicated. Second, R1 — like all of DeepSeek’s models — has open weights (the problem with saying “open source” is that we don’t have the data that went into creating it). This means that instead of paying OpenAI to get reasoning, you can run R1 on the server of your choice, or even locally, at dramatically lower cost.

How did DeepSeek make R1?

DeepSeek actually made two models: R1 and R1-Zero. I actually think that R1-Zero is the bigger deal; as I noted above, it was my biggest focus in last Tuesday’s Update:

R1-Zero, though, is the bigger deal in my mind. From the paper:

In this paper, we take the first step toward improving language model reasoning capabilities using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ GRPO as the RL framework to improve model performance in reasoning. During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. After thousands of RL steps, DeepSeek-R1-Zero exhibits super performance on reasoning benchmarks. For instance, the pass@1 score on AIME 2024 increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912.

Reinforcement learning is a technique where a machine learning model is given a bunch of data and a reward function. The classic example is AlphaGo, where DeepMind gave the model the rules of Go with the reward function of winning the game, and then let the model figure everything else on its own. This famously ended up working better than other more human-guided techniques.

LLMs to date, however, have relied on reinforcement learning with human feedback; humans are in the loop to help guide the model, navigate difficult choices where rewards aren’t obvious, etc. RLHF was the key innovation in transforming GPT-3 into ChatGPT, with well-formed paragraphs, answers that were concise and didn’t trail off into gibberish, etc.

R1-Zero, however, drops the HF part — it’s just reinforcement learning. DeepSeek gave the model a set of math, code, and logic questions, and set two reward functions: one for the right answer, and one for the right format that utilized a thinking process. Moreover, the technique was a simple one: instead of trying to evaluate step-by-step (process supervision), or doing a search of all possible answers (a la AlphaGo), DeepSeek encouraged the model to try several different answers at a time and then graded them according to the two reward functions.

What emerged is a model that developed reasoning and chains-of-thought on its own, including what DeepSeek called “Aha Moments”:

A particularly intriguing phenomenon observed during the training of DeepSeek-R1-Zero is the occurrence of an “aha moment”. This moment, as illustrated in Table 3, occurs in an intermediate version of the model. During this phase, DeepSeek-R1-Zero learns to allocate more thinking time to a problem by reevaluating its initial approach. This behavior is not only a testament to the model’s growing reasoning abilities but also a captivating example of how reinforcement learning can lead to unexpected and sophisticated outcomes.

This moment is not only an “aha moment” for the model but also for the researchers observing its behavior. It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies. The “aha moment” serves as a powerful reminder of the potential of RL to unlock new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future.

This is one of the most powerful affirmations yet of The Bitter Lesson: you don’t need to teach the AI how to reason, you can just give it enough compute and data and it will teach itself!

Well, almost: R1-Zero reasons, but in a way that humans have trouble understanding. Back to the introduction:

However, DeepSeek-R1-Zero encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates a small amount of cold-start data and a multi-stage training pipeline. Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model. Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.

This sounds a lot like what OpenAI did for o1: DeepSeek started the model out with a bunch of examples of chain-of-thought thinking so it could learn the proper format for human consumption, and then did the reinforcement learning to enhance its reasoning, along with a number of editing and refinement steps; the output is a model that appears to be very competitive with o1.

Here again it seems plausible that DeepSeek benefited from distillation, particularly in terms of training R1. That, though, is itself an important takeaway: we have a situation where AI models are teaching AI models, and where AI models are teaching themselves. We are watching the assembly of an AI takeoff scenario in realtime.

So are we close to AGI?

It definitely seems like it. This also explains why Softbank (and whatever investors Masayoshi Son brings together) would provide the funding for OpenAI that Microsoft will not: the belief that we are reaching a takeoff point where there will in fact be real returns towards being first.

But isn’t R1 now in the lead?

I don’t think so; this has been overstated. R1 is competitive with o1, although there do seem to be some holes in its capability that point towards some amount of distillation from o1-Pro. OpenAI, meanwhile, has demonstrated o3, a far more powerful reasoning model. DeepSeek is absolutely the leader in efficiency, but that is different than being the leader overall.

So why is everyone freaking out?

I think there are multiple factors. First, there is the shock that China has caught up to the leading U.S. labs, despite the widespread assumption that China isn’t as good at software as the U.S.. This is probably the biggest thing I missed in my surprise over the reaction. The reality is that China has an extremely proficient software industry generally, and a very good track record in AI model building specifically.

Second is the low training cost for V3, and DeepSeek’s low inference costs. This part was a big surprise for me as well, to be sure, but the numbers are plausible. This, by extension, probably has everyone nervous about Nvidia, which obviously has a big impact on the market.

Third is the fact that DeepSeek pulled this off despite the chip ban. Again, though, while there are big loopholes in the chip ban, it seems likely to me that DeepSeek accomplished this with legal chips.

I own Nvidia! Am I screwed?

There are real challenges this news presents to the Nvidia story. Nvidia has two big moats:
- CUDA is the language of choice for anyone programming these models, and CUDA only works on Nvidia chips.
- Nvidia has a massive lead in terms of its ability to combine multiple chips together into one large virtual GPU.
These two moats work together. I noted above that if DeepSeek had access to H100s they probably would have used a larger cluster to train their model, simply because that would have been the easier option; the fact they didn’t, and were bandwidth constrained, drove a lot of their decisions in terms of both model architecture and their training infrastructure. Just look at the U.S. labs: they haven’t spent much time on optimization because Nvidia has been aggressively shipping ever more capable systems that accommodate their needs. The route of least resistance has simply been to pay Nvidia. DeepSeek, however, just demonstrated that another route is available: heavy optimization can produce remarkable results on weaker hardware and with lower memory bandwidth; simply paying Nvidia more isn’t the only way to make better models.

That noted, there are three factors still in Nvidia’s favor. First, how capable might DeepSeek’s approach be if applied to H100s, or upcoming GB100s? Just because they found a more efficient way to use compute doesn’t mean that more compute wouldn’t be useful. Second, lower inference costs should, in the long run, drive greater usage. Microsoft CEO Satya Nadella, in a late night tweet almost assuredly directed at the market, said exactly that:

Jevons paradox strikes again! As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of. https://t.co/omEcOPhdIz
— Satya Nadella (@satyanadella) January 27, 2025

Third, reasoning models like R1 and o1 derive their superior performance from using more compute. To the extent that increasing the power and capabilities of AI depend on more compute is the extent that Nvidia stands to benefit!

Still, it’s not all rosy. At a minimum DeepSeek’s efficiency and broad availability cast significant doubt on the most optimistic Nvidia growth story, at least in the near term. The payoffs from both model and infrastructure optimization also suggest there are significant gains to be had from exploring alternative approaches to inference in particular. For example, it might be much more plausible to run inference on a standalone AMD GPU, completely sidestepping AMD’s inferior chip-to-chip communications capability. Reasoning models also increase the payoff for inference-only chips that are even more specialized than Nvidia’s GPUs.

In short, Nvidia isn’t going anywhere; the Nvidia stock, however, is suddenly facing a lot more uncertainty that hasn’t been priced in. And that, by extension, is going to drag everyone down.

So what about the chip ban?

The easiest argument to make is that the importance of the chip ban has only been accentuated given the U.S.’s rapidly evaporating lead in software. Software and knowhow can’t be embargoed — we’ve had these debates and realizations before — but chips are physical objects and the U.S. is justified in keeping them away from China.

At the same time, there should be some humility about the fact that earlier iterations of the chip ban seem to have directly led to DeepSeek’s innovations. Those innovations, moreover, would extend to not just smuggled Nvidia chips or nerfed ones like the H800, but to Huawei’s Ascend chips as well. Indeed, you can very much make the case that the primary outcome of the chip ban is today’s crash in Nvidia’s stock price.

What concerns me is the mindset undergirding something like the chip ban: instead of competing through innovation in the future the U.S. is competing through the denial of innovation in the past. Yes, this may help in the short term — again, DeepSeek would be even more effective with more computing — but in the long run it simply sews the seeds for competition in an industry — chips and semiconductor equipment — over which the U.S. has a dominant position.

Like AI models?

AI models are a great example. I mentioned above I would get to OpenAI’s greatest crime, which I consider to be the 2023 Biden Executive Order on AI. I wrote in Attenuating Innovation:

The point is this: if you accept the premise that regulation locks in incumbents, then it sure is notable that the early AI winners seem the most invested in generating alarm in Washington, D.C. about AI. This despite the fact that their concern is apparently not sufficiently high to, you know, stop their work. No, they are the responsible ones, the ones who care enough to call for regulation; all the better if concerns about imagined harms kneecap inevitable competitors.

That paragraph was about OpenAI specifically, and the broader San Francisco AI community generally. For years now we have been subject to hand-wringing about the dangers of AI by the exact same people committed to building it — and controlling it. These alleged dangers were the impetus for OpenAI becoming closed back in 2019 with the release of GPT-2:

Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code⁠(opens in a new window). We are not releasing the dataset, training code, or GPT-2 model weights…We are aware that some researchers have the technical capacity to reproduce and open source our results. We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems.

We also think governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems. If pursued, these efforts could yield a better evidence base for decisions by AI labs and governments regarding publication decisions and AI policy more broadly.

The arrogance in this statement is only surpassed by the futility: here we are six years later, and the entire world has access to the weights of a dramatically superior model. OpenAI’s gambit for control — enforced by the U.S. government — has utterly failed. In the meantime, how much innovation has been foregone by virtue of leading edge models not having open weights? More generally, how much time and energy has been spent lobbying for a government-enforced moat that DeepSeek just obliterated, that would have been better devoted to actual innovation?

So you’re not worried about AI doom scenarios?

I definitely understand the concern, and just noted above that we are reaching the stage where AIs are training AIs and learning reasoning on their own. I recognize, though, that there is no stopping this train. More than that, this is exactly why openness is so important: we need more AIs in the world, not an unaccountable board ruling all of us.

Wait, why is China open-sourcing their model?

Well DeepSeek is, to be clear; CEO Liang Wenfeng said in a must-read interview that open source is key to attracting talent:

In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat.

Open source, publishing papers, in fact, do not cost us anything. For technical talent, having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one, and contributing to it earns us respect. There is also a cultural attraction for a company to do this.

The interviewer asked if this would change:

DeepSeek, right now, has a kind of idealistic aura reminiscent of the early days of OpenAI, and it’s open source. Will you change to closed source later on? Both OpenAI and Mistral moved from open-source to closed-source.

We will not change to closed source. We believe having a strong technical ecosystem first is more important.

This actually makes sense beyond idealism. If models are commodities — and they are certainly looking that way — then long-term differentiation comes from having a superior cost structure; that is exactly what DeepSeek has delivered, which itself is resonant of how China has come to dominate other industries. This is also contrary to how most U.S. companies think about differentiation, which is through having differentiated products that can sustain larger margins.

So is OpenAI screwed?

Not necessarily. ChatGPT made OpenAI the accidental consumer tech company, which is to say a product company; there is a route to building a sustainable consumer business on commoditizable models through some combination of subscriptions and advertisements. And, of course, there is the bet on winning the race to AI take-off.

Anthropic, on the other hand, is probably the biggest loser of the weekend. DeepSeek made it to number one in the App Store, simply highlighting how Claude, in contrast, hasn’t gotten any traction outside of San Francisco. The API business is doing better, but API businesses in general are the most susceptible to the commoditization trends that seem inevitable (and do note that OpenAI and Anthropic’s inference costs look a lot higher than DeepSeek because they were capturing a lot of margin; that’s going away).

So this is all pretty depressing, then?

Actually, no. I think that DeepSeek has provided a massive gift to nearly everyone. The biggest winners are consumers and businesses who can anticipate a future of effectively-free AI products and services. Jevons Paradox will rule the day in the long run, and everyone who uses AI will be the biggest winners.

Another set of winners are the big consumer tech companies. A world of free AI is a world where product and distribution matters most, and those companies already won that game; The End of the Beginning was right.

China is also a big winner, in ways that I suspect will only become apparent over time. Not only does the country have access to DeepSeek, but I suspect that DeepSeek’s relative success to America’s leading AI labs will result in a further unleashing of Chinese innovation as they realize they can compete.

That leaves America, and a choice we have to make. We could, for very logical reasons, double down on defensive measures, like massively expanding the chip ban and imposing a permission-based regulatory regime on chips and semiconductor equipment that mirrors the E.U.’s approach to tech; alternatively, we could realize that we have real competition, and actually give ourself permission to compete. Stop wringing our hands, stop campaigning for regulations — indeed, go the other way, and cut out all of the cruft in our companies that has nothing to do with winning. If we choose to compete we can still win, and, if we do, we will have a Chinese company to thank.

I wrote a follow-up to this Article in this Daily Update.
Get notified about new Articles

Sign Up

Please verify your email address to proceed.
Stratechery Plus + Asianometry

Monday, January 20, 2025

Listen to Podcast
Listen to this post:

Log in to listen

Back in 2022, I rebranded a Stratechery subscription as Stratechery Plus, a bundle of content that would enhance the value of your subscription; today the bundle includes:
- The Stratechery Update
- The Stratechery Interview series
- The Sharp Tech with Ben Thompson podcast, hosted by Andrew Sharp
- The Sharp China with Bill Bishop podcast, hosted by Andrew Sharp
- The Dithering podcast, with John Gruber and myself
- The Greatest of All Talk podcast, with Ben Golliver and Andrew Sharp
Today I am excited to announce a new addition: the Asianometry newsletter and podcast, by Jon Yu.

Asianometry is one of the best tech YouTube channels in existence, with over 768,000 subscribers. Jon produces in-depth videos explaining every aspect of technology, with a particular expertise in semiconductors. To give you an idea of Jon’s depth, he has made 31 videos about TSMC alone. His semiconductor course includes 30 videos covering everything from designing chips to how ASML builds EUV machines to Moore’s Law. His video on the end of Dennard’s Law is a particular standout:

Jon is about more than semiconductors though: he’s made videos about other tech topics like The Tragedy of Compaq, and non-tech topics like Japanese Whisky and Taiwan convenience stores. In short, Jon is an intensely curious person who does his research, and we are blessed that he puts in the work to share what he learns.

I am blessed most of all, however. Over the last year Jon has been making Stratechery Articles into video essays and cutting clips for Sharp Tech; he did a great job with one of my favorite articles of 2024:

And now, starting today, Stratechery Plus subscribers can get exclusive access to Asianometry’s content in newsletter and podcast form. The Asianometry YouTube Channel will remain free and Jon’s primary focus, but from now on all of his content will be simultaneously released as a transcript and podcast. Stratechery Plus subscribers can head over to the new Asianometry Passport site to subscribe to his emails, or to add the podcast feed to your favorite podcast player.

And, of course, subscribe to Jon’s YouTube channel, along with Stratechery Plus.
Get notified about new Articles

Sign Up

Please verify your email address to proceed.
AI’s Uneven Arrival

Monday, January 13, 2025

Listen to Podcast

Watch on YouTube

Listen to this post:

Log in to listen

Box’s route to its IPO, ten years ago this month, was a difficult one: the company first released an S-1 in March 2014, and potential investors were aghast at the company’s mounting losses; the company took a down round and, eight months later, released an updated S-1 that created the template for money-losing SaaS businesses to explain themselves going forward:

Our business model focuses on maximizing the lifetime value of a customer relationship. We make significant investments in acquiring new customers and believe that we will be able to achieve a positive return on these investments by retaining customers and expanding the size of our deployments within our customer base over time…

We experience a range of profitability with our customers depending in large part upon what stage of the customer phase they are in. We generally incur higher sales and marketing expenses for new customers and existing customers who are still in an expanding stage…For typical customers who are renewing their Box subscriptions, our associated sales and marketing expenses are significantly less than the revenue we recognize from those customers.

This was the justification for those top-line losses; I wrote in an Update at the time:

That right there is the SaaS business model: you’re not so much selling a product as you are creating annuities with a lifetime value that far exceeds whatever you paid to acquire them. Moreover, if the model is working — and in retrospect, we know it has for that 2010 cohort — then I as an investor absolutely would want Box to spend even more on customer acquisition, which, of course, Box has done. The 2011 cohort is bigger than 2010, the 2012 cohort bigger than 2011, etc. This, though, has meant that the aggregate losses have been very large, which looks bad, but, counterintuitively, is a good thing.

Numerous SaaS businesses would include some version of this cohort chart in their S-1’s, each of them manifestations of what I’ve long considered tech’s sixth giant: Apple, Amazon, Google, Meta, Microsoft, and what I call “Silicon Valley Inc.”, the pipeline of SaaS companies that styled themselves as world-changing startups but which were, in fact, color-by-numbers business model disruptions enabled by cloud computing and a dramatically expanded venture capital ecosystem that increasingly accepted relatively low returns in exchange for massively reduced risk profiles.

This is not, to be clear, an Article about Box, or any one SaaS company in particular; it is, though, an exploration of how an era that opened — at least in terms of IPOs — a decade ago is both doomed in the long run and yet might have more staying power than you expect.

Digital Advertising Differences

John Wanamaker, a department store founder and advertising pioneer, famously said, “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” That, though, was the late 19th century; the last two decades have seen the rise of digital advertising, the defining characteristic of which is knowledge about whom is being targeted, and whether or not they converted. The specifics of how this works have shifted over time, particularly with the crackdown on cookies and Apple’s App Tracking Transparency initiative, which made digital advertising less deterministic and more probabilistic; the probabilities at play, though, are a lot closer to 100% than they are to a flip-of-a-coin.

What is interesting is that this advertising approach hasn’t always worked for everything, most notably some of the most advertising-centric businesses in the world. Back in 2016 Procter & Gamble announced they were scaling back targeted Facebook ads; from the Wall Street Journal:

Procter & Gamble Co., the biggest advertising spender in the world, will move away from ads on Facebook that target specific consumers, concluding that the practice has limited effectiveness. Facebook Inc. has spent years developing its ability to zero in on consumers based on demographics, shopping habits and life milestones. P&G, the maker of myriad household goods including Tide and Pampers, initially jumped at the opportunity to market directly to subsets of shoppers, from teenage shavers to first-time homeowners.

Marc Pritchard, P&G’s chief marketing officer, said the company has realized it took the strategy too far. “We targeted too much, and we went too narrow,” he said in an interview, “and now we’re looking at: What is the best way to get the most reach but also the right precision?”…On a broader scale, P&G’s shift highlights the limits of such targeting for big brands, one of the cornerstones of Facebook’s ad business. The social network is able to command higher prices for its targeted marketing; the narrower the targeting the more expensive the ad.

P&G is a consumer packaged goods (CPG) company, and what mattered most for CPG companies was shelf space. Consumers would become aware of a brand through advertising, motivated to buy through things like coupons, and the payoff came when they were in the store and chose one of the CPG brands off the shelf; of course CPG companies paid for that shelf space, particularly coveted end-caps that made it more likely consumers saw the brands they were familiar with through advertising. There were returns to scale, as well: manufacturing is a big one; the more advertising you bought the less paid per ad; more importantly, the more shelf space you had the more room you had to expand your product lines, and crowd out competitors.

The advertising component specifically was usually outsourced to ad agencies, for reasons I explained in a 2017 Article:

Few advertisers actually buy ads, at least not directly. Way back in 1841, Volney B. Palmer, the first ad agency, was opened in Philadelphia. In place of having to take out ads with multiple newspapers, an advertiser could deal directly with the ad agency, vastly simplifying the process of taking out ads. The ad agency, meanwhile, could leverage its relationships with all of those newspapers by serving multiple clients:

It’s a classic example of how being in the middle can be a really great business opportunity, and the utility of ad agencies only increased as more advertising formats like radio and TV became available. Particularly in the case of TV, advertisers not only needed to place ads, but also needed a lot more help in making ads; ad agencies invested in ad-making expertise because they could scale said expertise across multiple clients.

At the same time, the advertisers were rapidly expanding their geographic footprints, particularly after the Second World War; naturally, ad agencies increased their footprint at the same time, often through M&A. The overarching business opportunity, though, was the same: give advertisers a one-stop shop for all of their advertising needs.

The Internet provided two big challenges to this approach. First, the primary conversion point changed from the cash register to the check-out page; the products that benefited the most were either purely digital (like apps) or — at least in the earlier days of e-commerce — spur-of-the-moment purchases without major time pressure. CPG products didn’t really fall in either bucket.

Second, these types of purchases aligned well with the organizing principle of digital advertising, which is the individual consumer. What Facebook — now Meta — is better at than anyone in the world is understanding consumers not as members of a cohort or demographic group but rather as individuals, and serving them ads that are uniquely interesting to them.

Notice, though, that nothing in the traditional advertiser model was concerned with the individual: brands are created for cohorts or demographic groups, because they need to be manufactured at scale; then, ad agencies would advertise at scale — making money along the way — and the purchase would be consummated in physical stores at some later point in time, constrained (and propelled by) limited shelf space. Thus P&G’s pullback — and thus the opportunity for an entirely new wave of companies that were built around digital advertising and its deep personalization from the get-go.

This bifurcation manifested itself most starkly in the summer of 2020, when large advertisers boycotted Facebook over the company’s refusal to censor then-President Trump; Facebook was barely affected. I wrote in Apple and Facebook:

This is a very different picture from Facebook, where as of Q1 2019 the top 100 advertisers made up less than 20% of the company’s ad revenue; most of the $69.7 billion the company brought in last year came from its long tail of 8 million advertisers…

This explains why the news about large CPG companies boycotting Facebook is, from a financial perspective, simply not a big deal. Unilever’s $11.8 million in U.S. ad spend, to take one example, is replaced with the same automated efficiency that Facebook’s timeline ensures you never run out of content. Moreover, while Facebook loses some top-line revenue — in an auction-based system, less demand corresponds to lower prices — the companies that are the most likely to take advantage of those lower prices are those that would not exist without Facebook, like the direct-to-consumer companies trying to steal customers from massive conglomerates like Unilever.

In this way Facebook has a degree of anti-fragility that even Google lacks: so much of its business comes from the long tail of Internet-native companies that are built around Facebook from first principles, that any disruption to traditional advertisers — like the coronavirus crisis or the current boycotts — actually serves to strengthen the Facebook ecosystem at the expense of the TV-centric ecosystem of which these CPG companies are a part.

It has been nine years since that P&G pullback I referenced above, and one of the big changes that P&G has made in that timeframe is to take most of their ad-buying in-house. This was in the long run inevitable, as the Internet ate everything, including traditional TV viewing, and as the rise of Aggregation platforms meant that the number of places you needed to actually buy an ad to reach everyone decreased even as potential reach increased. Those platforms also got better: programmatic platforms achieve P&G’s goal of mass reach in a way that actually increased efficiency instead of over-spending to over-target; programmatic advertising also covers more platforms now, including TV.

o3 Ammunition

Late last month OpenAI announced its o3 model, validating its initial o1 release and the returns that come from test-time scaling; I explained in an Update when o1 was released:

There has been a lot of talk about the importance of scale in terms of LLM performance; for auto-regressive LLMs that has meant training scale. The more parameters you have, the larger the infrastructure you need, but the payoff is greater accuracy because the model is incorporating that much more information. That certainly still applies to o1, as the chart on the left indicates.

It’s the chart on the right that is the bigger deal: o1 gets more accurate the more time it spends on compute at inference time. This makes sense intuitively given what I laid out above: the more time spent on compute the more time o1 can spend spinning up multiple chains-of-thought, checking its answers, and iterating through different approaches and solutions.

It’s also a big departure from how we have thought about LLMs to date: one of the “benefits” of auto-regressive LLMs is that you’re only generating one answer in a serial manner. Yes, you can get that answer faster with beefier hardware, but that is another way of saying that the pay-off from more inference compute is getting the answer faster; the accuracy of the answer is a function of the underlying model, not the amount of compute brought to bear. Another way to think about it is that the more important question for inference is how much memory is available; the more memory there is, the larger the model, and therefore, the greater amount of accuracy.

In this o1 represents a new inference paradigm: yes, you need memory to load the model, but given the same model, answer quality does improve with more compute. The way that I am thinking about it is that more compute is kind of like having more branch predictors, which mean more registers, which require more cache, etc.; this isn’t a perfect analogy, but it is interesting to think about inference compute as being a sort of dynamic memory architecture for LLMs that lets them explore latent space for the best answer.

o3 significantly outperforms o1, and the extent of that outperformance is dictated by how much computing is allocated to the problem at hand. One of the most stark examples was o3‘s performance on the ARC prize, a visual puzzle test that is designed to be easy for humans but hard for LLMs:

OpenAI’s new o3 system – trained on the ARC-AGI-1 Public Training set – has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit. A high-compute (172x) o3 configuration scored 87.5%.

This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models. For context, ARC-AGI-1 took 4 years to go from 0% with GPT-3 in 2020 to 5% in 2024 with GPT-4o. All intuition about AI capabilities will need to get updated for o3…

Despite the significant cost per task, these numbers aren’t just the result of applying brute force compute to the benchmark. OpenAI’s new o3 model represents a significant leap forward in AI’s ability to adapt to novel tasks. This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs. o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain.

Of course, such generality comes at a steep cost, and wouldn’t quite be economical yet: you could pay a human to solve ARC-AGI tasks for roughly $5 per task (we know, we did that), while consuming mere cents in energy. Meanwhile o3 requires $17-20 per task in the low-compute mode. But cost-performance will likely improve quite dramatically over the next few months and years, so you should plan for these capabilities to become competitive with human work within a fairly short timeline.

I don’t believe that o3 and inference-time scaling will displace traditional LLMs, which will remain both faster and cheaper; indeed, they will likely make traditional LLMs better through their ability to generate synthetic data for further scaling of pre-training. There remains a large product overhang for traditional LLMS — the technology is far more capable than the products that have been developed to date — but even the current dominant product, chatbots, are better experienced with a traditional LLM.

That very use case, however, gets at traditional LLM limitations: because they lack the ability to think and decide and verify they are best thought of as a tool for humans to leverage. Indeed, while conventional wisdom about these models is that it allows anyone to generate good enough writing and research, the biggest returns come to those with the most expertise and agency, who are able to use their own knowledge and judgment to reap efficiency gains while managing hallucinations and mistakes.

What o3 and inference-time scaling point to is something different: AI’s that can actually be given tasks and trusted to complete them. This, by extension, looks a lot more like an independent worker than an assistant — ammunition, rather than a rifle sight. That may seem an odd analogy, but it comes from a talk Keith Rabois gave at Stanford:

So I like this idea of barrels and ammunition. Most companies, once they get into hiring mode…just hire a lot of people, you expect that when you add more people your horsepower or your velocity of shipping things is going to increase. Turns out it doesn’t work that way. When you hire more engineers you don’t get that much more done. You actually sometimes get less done. You hire more designers, you definitely don’t get more done, you get less done in a day.

The reason why is because most great people actually are ammunition. But what you need in your company are barrels. And you can only shoot through the number of unique barrels that you have. That’s how the velocity of your company improves is adding barrels. Then you stock them with ammunition, then you can do a lot. You go from one barrel company, which is mostly how you start, to a two barrel company, suddenly you get twice as many things done in a day, per week, per quarter. If you go to three barrels, great. If you go to four barrels, awesome. Barrels are very difficult to find. But when you have them, give them lots of equity. Promote them, take them to dinner every week, because they are virtually irreplaceable. They are also very culturally specific. So a barrel at one company may not be a barrel at another company because one of the ways, the definition of a barrel is, they can take an idea from conception and take it all the way to shipping and bring people with them. And that’s a very cultural skill set.

The promise of AI generally, and inference-time scaling models in particular, is that they can be ammunition; in this context, the costs — even marginal ones — will in the long run be immaterial compared to the costs of people, particularly once you factor in non-salary costs like coordination and motivation.

The Uneven AI Arrival

There is a long way to go to realize this vision technically, although the arrival of first o1 and then o3 signal that the future is arriving more quickly than most people realize. OpenAI CEO Sam Altman wrote on his blog:

We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.

I grant the technical optimism; my definition of AGI is that it can be ammunition, i.e. it can be given a task and trusted to complete it at a good-enough rate (my definition of Artificial Super Intelligence (ASI) is the ability to come up with the tasks in the first place). The reason for the extended digression on advertising, however, is to explain why I’m skeptical about AI “materially chang[ing] the output of companies”, at least in 2025.

In this analogy CPG companies stand in for the corporate world generally. What will become clear once AI ammunition becomes available is just how unsuited most companies are for high precision agents, just as P&G was unsuited for highly-targeted advertising. No matter how well-documented a company’s processes might be, it will become clear that there are massive gaps that were filled through experience and tacit knowledge by the human ammunition.

SaaS companies, meanwhile, are the ad agencies. The ad agencies had value by providing a means for advertisers to scale to all sorts of media across geographies; SaaS companies have value by giving human ammunition software to do their job. Ad agencies, meanwhile, made money by charging a commission on the advertising they bought; SaaS companies make money by charging a per-seat licensing fee. Look again at that S-1 excerpt I opened with:

Our business model focuses on maximizing the lifetime value of a customer relationship. We make significant investments in acquiring new customers and believe that we will be able to achieve a positive return on these investments by retaining customers and expanding the size of our deployments within our customer base over time…

The positive return on investment comes from retaining and increasing seat licenses; those seats, however, are proxies for actually getting work done, just as advertising was just a proxy for actually selling something. Part of what made direct response digital advertising fundamentally different is that it was tied to actually making a sale, as opposed to lifting brand awareness, which is a proxy for the ultimate goal of increasing revenue. To that end, AI — particularly AI’s like o3 that scale with compute — will be priced according to the value of the task they complete; the amount that companies will pay for inference time compute will be a function of how much the task is worth. This is analogous to digital ads that are priced by conversion, not CPM.

The companies that actually leveraged that capability, however, were not, at least for a good long while, the companies that dominated the old advertising paradigm. Facebook became a juggernaut by creating its own customer base, not by being the advertising platform of choice for companies like P&G; meanwhile, TV and the economy built on it stayed relevant far longer than anyone expected. And, by the time TV truly collapsed, both the old guard and digital advertising had evolved to the point that they could work together.

If something similar plays out with AI agents, then the most important AI customers will primarily be new companies, and probably a lot of them will be long tail type entities that take the barrel and ammunition analogy to its logical extreme. Traditional companies, meanwhile, will struggle to incorporate AI (outside of whole-scale job replacement a la the mainframe); the true AI takeover of enterprises that retain real world differentiation will likely take years.

None of this is to diminish what is coming with AI; rather, as the saying goes, the future may arrive but be unevenly distributed, and, contrary to what you might think, the larger and more successful a company is the less they may benefit in the short term. Everything that makes a company work today is about harnessing people — and the entire SaaS ecosystem is predicated on monetizing this reality; the entities that will truly leverage AI, however, will not be the ones that replace them, but start without them.

Get notified about new Articles

Sign Up

Please verify your email address to proceed.
The 2024 Stratechery Year in Review

Thursday, December 19, 2024
Stratechery, incredibly enough, has been my full-time job for over a decade; this is the 12th year-in-review. Here are the previous editions:

2023 | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013

It has long been a useful cliché to say that covering tech is easy, because something is always happening; now that that something is AI, that is more true than ever. Nearly every Article on Stratechery this year was about AI in some way or another, and that is likely to be true for years to come.

This year Stratechery published 29 free Articles, 109 subscriber Updates, and 40 Interviews. Today, as per tradition, I summarize the most popular and most important posts of the year.

The Five Most-Viewed Articles

The five most-viewed articles on Stratechery according to page views:
1. Intel Honesty — The best way to both save Intel and have leading edge manufacturing in the U.S. is to split the company, and for the U.S. government to pick up the bill via purchase guarantees.
2. Gemini and Google’s Culture — The Google Gemini fiasco shows that the biggest challenge for Google in AI is not business model but rather company culture; change is needed from the top down.
3. Intel’s Humbling — Intel under Pat Gelsinger is reaping the disaster that came from a lack of investment and execution a decade ago; the company, though, appears to be headed in the right direction, as evidenced by its execution and recent deal with UMC.
4. The Apple Vision Pro — The Apple Vision Pro is a disappointment for productivity, in part because of choices made to deliver a remarkable entertainment experience. Plus, the future of AR/VR for Apple and Meta.
5. MKBHDs For Everything — Marques Brownlee has tremendous power because he can go direct to consumers; that is possible in media, and AI will make it possible everywhere.
AI and the Future

Looking ahead to how AI will change everything.
- Enterprise Philosophy and The First Wave of AI — The first wave of successful AI implementations will probably look more like the first wave of computing, which was dominated by large-scale enterprise installations that eliminated jobs. Consumer will come later. YouTube
- The Gen AI Bridge to the Future — Generative AI is the bridge to the next computing paradigm of wearables, just like the Internet bridged the gap from PCs to smartphones.
- The New York Times’ AI Opportunity — The New York Times is suing OpenAI, but it is the New York Times that stands to benefit the most from large language models, thanks to its transformation to being an Internet entity. YouTube
- AI Integration and Modularization — Breaking down the Big Tech AI landscape through the lens of integration and modularization. YouTube
- Aggregator’s AI Risk — A single AI can never make everyone happy, which is fundamentally threatening to the Aggregator business model; the solution is personalized AI. YouTube
Government and Regulation

An emerging theme this year — which I expect to continue alongside AI — is the rising importance of non-economic factors in terms of technological development, even as regulators ramp up pressure on the giants of the Aggregator era.
- A Chance to Build — Silicon Valley has always been deeply integrated with Asia; Trump’s attempt to change trade could hurt Silicon Valley more than expected, and also present opportunities to build something new. YouTube
- Intel’s Death and Potential Revival — Intel died when mobile cost it its software differentiation; if the U.S. wants a domestic foundry, then it ought to leverage the need for AI chips to make an independent Intel foundry viable. YouTube
- The E.U. Goes Too Far — Recent E.U. regulatory decisions cross the line from market correction to property theft; if the E.U. continues down this path they are likely to see fewer new features and no new companies. YouTube
- Friendly Google and Enemy Remedies — The DOJ brought the right kind of case against an Aggregator, which stagnates by being too nice; the goal is for companies to act like they actually have enemies. YouTube
- United States v. Apple — Apple is being sued by the DOJ, but most of the complaints aren’t about the App Store. I think, though, Apple’s approach to the App Store is what led to this case.
Big Tech

The biggest tech companies, as usual, provided the most consistent lens on how the world is changing.
- Meta’s AI Abundance — Meta is well-positioned to the biggest beneficiary of AI and the largest company in the world. YouTube
- Gemini 1.5 and Google’s Nature — Google Cloud Next 2024 was Google’s most impressive assertion yet that it has the AI scale advantage and is determined to use it. YouTube (See also: Integration and Android)
- Elon Dreams and Bitter Lessons — SpaceX’s triumph is downstream of a dream and getting the cost structure necessary to make it happen; Elon Musk is trying the same approach for Tesla self-driving cars. YouTube
- Apple Intelligence is Right On Time — Apple is expected to announce a range of AI features at WWDC; the company is well placed to benefit from AI: they are not too late, but right on time. YouTube (See also: WWDC, Apple Intelligence, Apple Aggregates AI)
- Nvidia Waves and Moats — Nvidia’s GTC was an absolute spectacle; it was also a different kind of keynote than before ChatGPT, which is related to Nvidia’s need to dig a new kind of software moat. YouTube
Other Articles this year included: The Apple Vision Pro’s Missing Apps | Sora, Groq, and Virtual Reality | Meta and Open | Meta and Reasonable Doubt | The Great Flattening | Windows Returns | Crashes and Competition | Boomer Apple

Stratechery Interviews

Thursdays are for Stratechery Interview — in podcast and transcript form — with public company executives, founders and private company executives, and other analysts.

Public Company Executive Interviews:

Arm CEO Rene Haas | Netflix co-CEO Greg Peters | Zoom CEO Eric Yuan | dLocal Founder Sebastian Kanovich and CEO Pedro Arnt | Google Cloud CEO Thomas Kurian | Walmart CEO Doug McMillon | Microsoft CEO Satya Nadella and CTO Kevin Scott | AMD CEO Lisa Su | Google SVP Rick Osterloh | Zillow CEO Jeremy Wacksman | Meta CTO Andrew Bosworth | Salesforce CEO Marc Benioff | Synopsys CEO Sassine Ghazi

Startup/Private Company Executive Interviews:

Rescale CEO Joris Poort | Databricks CEO Ali Ghodsi | Terraform Industries CEO Casey Handmer | Scale AI CEO Alex Wang | Canva CEO Melanie Perkins

Analysts:

Om Malik on tech history | Joanna Stern on the Apple Vision Pro | Eric Seufert on digital advertising in February and October | Matthew Ball on VR and gaming | Daniel Gross and Nat Friedman on AI in February and June | Hugo Barra on AR and VR in March and October | Benedict Evans on regulation and AI | Michael Morton on e-commerce | Matthew Belloni on Hollywood and streaming | Marques Brownlee (MKBHD) on YouTube | Ben Bajarin on Apple and Intel | Craig Moffett on Apple and telecoms | Gregory Allen in October on the U.S. defense industry, and December on the China chip ban | Timothy B. Lee on AI and self-driving cars | Dylan Patel and Doug O’Laughlin on the semiconductor industry | Byrne Hobart on innovation | Tae Kim on The Nvidia Way

The Year in Stratechery Updates

Some of my favorite Stratechery Updates:
- January 29: Apple and the DMA, Apple and “Or”, A Reluctant Apple Apologist (See also: European Commission Charges Apple, Apple Delays New Features for E.U.)
- February 19: Xbox’s Announcement; Microsoft’s Messy Middle; Apple in Europe, Continued
- March 12: Walmart Earnings, Walmart Connect and Closing the Loop, Walmart Acquires Vizio
- April 1: MLS on Vision Pro, The Vision Pro’s Missing Content, The Vision Pro’s DRI
- May 7: TSMC Earnings, TSMC’s Pricing Mistake, Intel v. TSMC
- May 20: Netflix and the NFL, Netflix Internalizes Ads, Comcast’s Bundle (See also: Netflix’s Boxing Event, Customer Acquisition vs. Churn Mitigation, Accounting for Events)
- June 18: FTC Sues Adobe, The Legal Question, The Value of Doing Right
- June 24: Perplexity and Robots.txt, Perplexity’s Defense, Google and Competition
- July 17: Tech For Trump, Breaking the Deal, From Inertness to Interest
- August 26: Telegram CEO Arrested, Telegram’s Non-Encrypted Advantage, Telegram Complexities
- September 16: OpenAI’s New Model, How o1 Works, Scaling Inference
- September 30: More on Orion, Where Vision Pro Went Wrong, Apple’s Response and Meta’s Motivation
- October 1: Taking Waymo, Uber and Waymo (See also: GM Kills Cruise, Fleets Versus Autonomy, Robotaxi Outlook)
- October 7: U.S. Communications Hacked, The History of CALEA, Encryption and Backdoors
- October 22: Stripe Acquires Bridge, Stablecoins, Platform of Platforms
- October 28: Trump on Rogan, The Voters Decide, The Podcast Election
- November 6: President Trump, Take Two; Big Tech, Little Tech, Chips, and Hardware; Elon Musk’s Triumph
- November 13: Shopify Earnings, Software Self-Awareness, Rebels and the Arms Dealer
- December 4: AWS re:Invent, Nova and Model Choice, AI as Commodity
- December 17: Google Announces Veo 2, The Empire Strikes Back, Free ChatGPT Search
I am so grateful to the subscribers that make it possible for me to do this as a job. I wish all of you a Merry Christmas and Happy New Year, and I’m looking forward to a great 2025!
Get notified about new Articles

Sign Up

Please verify your email address to proceed.
Intel’s Death and Potential Revival

Monday, December 9, 2024

Listen to Podcast

Watch on YouTube
Listen to this post:

Log in to listen

In 1980 IBM, under pressure from its customers to provide computers for personal use, not just mainframes, set out to create the IBM PC; given the project’s low internal priority but high external demand they decided to outsource two critical components: Microsoft would provide the DOS operating system, which would run on the Intel 8088 processor.

Those two deals would shape the computing industry for the following 27 years. Given that the point of the personal computer was to run applications, the operating system that provided the APIs for those applications would have unassailable lock-in, leading to Microsoft’s dominance with first DOS and then Windows, which was backwards compatible.

The 8088 processor, meanwhile, was a low-cost variant of the 8086 processor; up to that point most new processors came with their own instruction set, but the 8088 and 8086 used the same instruction set, which became the foundation of Intel processors going forward. That meant that the 286 and 386 processors that followed were backwards compatible with the 8088 in the IBM PC; in other words, Intel, too, had lock-in, and not just with MS-DOS: while the majority of applications leveraged operating system-provided APIs, it was much more common at that point in computing history to leverage lower level APIs, including calling on the processor instruction set directly. This was particularly pertinent for things like drivers, which powered all of the various peripherals a PC required.

Intel’s CISC Moat

The 8086 processor that undergirded the x86 instruction set was introduced in 1978, when memory was limited, expensive, and slow; that’s why the x86 used Complex Instruction Set Computing (CISC), which combined multiple steps into a single instruction. The price of this complexity was the necessity of microcode, dedicated logic that translated CISC instructions into its component steps so they could actually be executed.

The same year that IBM cut those deals, however, was the year that David Patterson and a team at Berkeley started work on what became known as the RISC-1 processor, which took an entirely different approach: Reduced Instruction Set Computing (RISC) replaced the microcode-focused transistors with registers, i.e. memory that operated at the same speed as the processor itself, and filled them with simple instructions that corresponded directly to transistor functionality. This would, in theory, allow for faster computing with the same number of transistors, but memory access was still expensive and more likely to be invoked given the greater number of instructions necessary to do anything, and programs and compilers needed to be completely reworked to take advantage of the new approach.

Intel, more than anyone, realized that this would be manageable in the long run. “Moore’s Law”, the observation that the number of transistors in an integrated circuit doubles every two years, was coined by Gordon Moore, their co-founder and second CEO; the implication for instruction sets was that increased software complexity and slow hardware would be solved through ever faster chips, and those chips could get even faster if they were simplified RISC designs. That is why most of the company wanted, in the mid-1980s, to abandon x86 and its CISC instruction set for RISC.

There was one man, however, who interpreted the Moore’s Law implications differently, and that was Pat Gelsinger; he led the development of the 486 processor and was adamant that Intel stick with CISC, as he explained in an oral history at the Computer Museum:

Gelsinger: We had a mutual friend that found out that we had Mr. CISC working as a student of Mr. RISC, the commercial versus the university, the old versus the new, teacher versus student. We had public debates of John and Pat. And Bear Stearns had a big investor conference, a couple thousand people in the audience, and there was a public debate of RISC versus CISC at the time, of John versus Pat.

And I start laying out the dogma of instruction set compatibility, architectural coherence, how software always becomes the determinant of any computer architecture being developed. “Software follows instruction set. Instruction set follows Moore’s Law. And unless you’re 10X better and John, you’re not 10X better, you’re lucky if you’re 2X better, Moore’s Law will just swamp you over time because architectural compatibility becomes so dominant in the adoption of any new computer platform.” And this is when x86– there was no server x86. There’s no clouds at this point in time. And John and I got into this big public debate and it was so popular.

Brock: So the claim wasn’t that the CISC could beat the RISC or keep up to what exactly but the other overwhelming factors would make it the winner in the end.

Gelsinger: Exactly. The argument was based on three fundamental tenets. One is that the gap was dramatically overstated and it wasn’t an asymptotic gap. There was a complexity gap associated with it but you’re going to make it leap up and that the CISC architecture could continue to benefit from Moore’s Law. And that Moore’s Law would continue to carry that forward based on simple ones, number of transistors to attack the CISC problems, frequency of transistors. You’ve got performance for free. And if that gap was in a reasonable frame, you know, if it’s less than 2x, hey, in a Moore’s Law’s term that’s less than a process generation. And the process generation is two years long. So how long does it take you to develop new software, porting operating systems, creating optimized compilers? If it’s less than five years you’re doing extraordinary in building new software systems. So if that gap is less than five years I’m going to crush you John because you cannot possibly establish a new architectural framework for which I’m not going to beat you just based on Moore’s Law, and the natural aggregation of the computer architecture benefits that I can bring in a compatible machine. And, of course, I was right and he was wrong.

Intel would, over time, create more RISC-like processors, switching out microcode for micro-ops processing units that dynamically generated RISC-like instructions from CISC-based software that maintained backwards compatibility; Gelsinger was right that no one wanted to take the time to rewrite all of the software that assumed an x86 instruction set when Intel processors were getting faster all of the time, and far out-pacing RISC alternatives thanks to Intel’s manufacturing prowess.

That, though, turned out to be Intel’s soft underbelly; while the late Intel CEO Paul Ottelini claimed that he turned down the iPhone processor contract because of price, Tony Fadell, who led the creation of the iPod and iPhone hardware, told me in a Stratechery Interview that the real issue was Intel’s obsession with performance and neglect of efficiency.

The new dimension that always came in with embedded computing was always the power element, because on battery-operated devices, you have to rethink how you do your interrupt structures, how you do your networking, how you do your memory. You have to think about so many other parameters when you think about power and doing enough processing effectively, while having long battery life. So everything for me was about long, long battery life and why do we do what we do? David Tupman was on the team, the iPod team with me, he would always say every nanocoulomb was sacred, and we would go after that and say, “Okay, where’s that next coulomb? Where are we going to go after it?” And so when you take that microscopic view of what you’re building, you look at the world very differently.

For me, when it came to Intel at the time, back in the mid-2000s, they were always about, “Well, we’ll just repackage what we have on the desktop for the laptop and then we’ll repackage that again for embedding.” It reminded me of Windows saying, “I’m going to do Windows and then I’m going to do Windows Mobile and I’m going to do Windows embedded.” It was using those same cores and kernels and trying to slim them down.

I was always going, “Look, do you see how the iPhone was created? It started with the guts of the iPod, and we grew up from very little computing, and very little space, and we grew into an iPhone, and added more layers to it.” But we weren’t taking something big and shrinking it down. We were starting from the bottom up and yeah, we were taking Mac OS and shrinking it down, but we were significantly shrinking it down. Most people don’t want to take those real hard cuts to everything because they’re too worried about compatibility. Whereas if you’re just taking pieces and not worrying about compatibility, it’s a very different way of thinking about how building and designing products happens.

This is why I was so specific with that “27 year” reference above; Apple’s 2007 launch of the iPhone marked the end of both Microsoft and Intel’s dominance, and for the same reason. The shift to efficiency as the top priority meant that you needed to rewrite everything; that, by extension, meant that Microsoft’s API and Intel’s x86 instruction set were no longer moats but millstones. On the operating side Apple stripped macOS to the bones and rebuilt it for efficiency; that became iOS, and the new foundation for apps; on the processor side Apple used processors based on the ARM instruction set, which was RISC from the beginning. Yes, that meant a lot of things had to be rewritten, but here the rewriting wasn’t happening by choice, but by necessity.

This leads, as I remarked to Fadell in that interview, to a rather sympathetic interpretation of Microsoft and Intel’s failure to capture the mobile market; neither company had a chance. They were too invested in the dominant paradigm at the time, and thus unable to start from scratch; by the time they realized their mistake, Apple, Android, and ARM had already won.

Intel’s Missed Opportunity

It was their respective response to missing mobile that saved Microsoft, and doomed Intel. For the first seven years of the iPhone both companies refused to accept their failure, and tried desperately to leverage what they viewed as their unassailable advantages: Microsoft declined to put its productivity applications on iOS or Android, trying to get customers to adopt Windows Mobile, while Intel tried to bring its manufacturing prowess to bear to build processors that were sufficiently efficient while still being x86 compatible.

It was in 2014 that their paths diverged: Microsoft named Satya Nadella its new CEO, and his first public decision was to launch Office on iPad. This was a declaration of purpose: Microsoft would no longer be defined by Windows, and would instead focus on Azure and the cloud; no, that didn’t have the software lock-in of Windows — particularly since a key Azure decision was shifting from Windows servers to Linux — but it was a business that met Microsoft’s customers where they were, and gave the company a route to participating in the massive business opportunities enabled by mobile (given that most apps are in fact cloud services), and eventually, AI.

The equivalent choice for Intel would have been to start manufacturing ARM chips for 3rd parties, i.e. becoming a foundry instead of an integrated device manufacturer (IDM); I wrote that they should do exactly that in 2013:

It is manufacturing capability, on the other hand, that is increasingly rare, and thus, increasingly valuable. In fact, today there are only four major foundries: Samsung, GlobalFoundries, Taiwan Semiconductor Manufacturing Company, and Intel. Only four companies have the capacity to build the chips that are in every mobile device today, and in everything tomorrow.

Massive demand, limited suppliers, huge barriers to entry. It’s a good time to be a manufacturing company. It is, potentially, a good time to be Intel. After all, of those four companies, the most advanced, by a significant margin, is Intel. The only problem is that Intel sees themselves as a design company, come hell or high water.

Making chips for other companies would have required an overhaul of Intel’s culture and processes for the sake of what was then a significantly lower margin opportunity; Intel wasn’t interested, and proceeded to make a ton of money building server chips for the cloud.

In fact, though, the company was already fatally wounded. Mobile meant volume, and as the cost of new processes skyrocketed, the need for volume to leverage those costs skyrocketed as well. It was TSMC that met the moment, with Apple’s assistance: the iPhone maker would buy out the first year of every new process advancement, giving TSMC the confidence to invest, and eventually surpass Intel. That, in turn, benefited AMD, Intel’s long-time rival which now fabbed its chips at TSMC, which not only had better processor designs but, for the first time, had a better process, leading to huge gains in the data center. All of that low-level work on ARM, meanwhile, helped make ARM in PCS and in the datacenter viable, putting further pressure on Intel’s core markets.

AI was the final blow: not only did Intel not have a competitive product, it also did not have a foundry through which it could have benefitted from the exploding demand for AI chips; making matters worse is the fact that data center spending on GPUs is coming at the expense of traditional server chips, Intel’s core market.

Intel’s Death

The fundamental flaw with Pat Gelsinger’s 2020 return to Intel and his IDM 2.0 plan is that it was a decade too late. Gelsinger’s plan was to become a foundry, with Intel as its first-best customer. The former was the way to participate in mobile and AI and gain the volume necessary to push technology forward, which Intel has always done better than anyone else (EUV was the exception to the rule that Intel invents and introduces every new advance in processor technology); the latter was the way to fund the foundry and give it guaranteed volume.

Again, this is exactly what Intel should have done a decade ago, while TSMC was still in their rear-view mirror in terms of processing technology, and when its products were still dominant in PCs and the data center. By the time Gelsinger came on board, though, it was already too late: Intel’s process was behind, its product market share was threatened on all of the fronts noted above, and high-performance ARM processors had been built by TSMC for years (which meant a big advantage in terms of pre-existing IP, design software, etc.). Intel brought nothing to the table as a foundry other than being a potential second source to TSMC, which, to make matters worse, has dramatically increased its investment in leading edge nodes to absorb that skyrocketing demand. Intel’s products, meanwhile, are either non-competitive (because they are made by Intel) or not-very-profitable (because they are made by TSMC), which means that Intel is simply running out of cash.

Given this, you can make the case that Gelsinger was never the right person for the job; shortly after he took over I wrote in Intel Problems that the company needed to be split up, but he told me in a 2022 Stratechery Interview that he — and the board — weren’t interested in that:

So last week, AMD briefly passed Intel in market value, and I think Nvidia did a while ago, and neither of these companies build their own chips. It’s kind of like an inverse of the Jerry Sanders quote about “Real men have fabs!” When you were contemplating your strategy for Intel as you came back, how much consideration was there about going the same path, becoming a fabless company and leaning into your design?

PG: Let me give maybe three different answers to that question, and these become more intellectual as we go along. The first one was I wrote a strategy document for the board of directors and I said if you want to split the company in two, then you should hire a PE kind of guy to go do that, not me. My strategy is what’s become IDM 2.0 and I described it. So if you’re hiring me, that’s the strategy and 100% of the board asked me to be the CEO and supported the strategy I laid out, of which this is one of the pieces. So the first thing was all of that discussion happened before I took the job as the CEO, so there was no debate, no contemplation, et cetera, this is it.

Fast forward to last week, and the Intel board — which is a long-running disaster — is no longer on board, firing Gelsinger in the process. And, to be honest, I noted a couple of months ago that Gelsinger’s plan probably wasn’t going to work without a split and a massive cash infusion from the U.S. government, far in excess of the CHIPS Act.

That, though, doesn’t let the board off the hook: not only are they abandoning a plan they supported, their ideas for moving Intel forward are fundamentally wrong. Chairman Frank Yeary, who has inexplicable been promoted despite being present for the entirety of the Intel disaster, said in Intel’s press release about Gelsinger’s departure:

While we have made significant progress in regaining manufacturing competitiveness and building the capabilities to be a world-class foundry, we know that we have much more work to do at the company and are committed to restoring investor confidence. As a board, we know first and foremost that we must put our product group at the center of all we do. Our customers demand this from us, and we will deliver for them. With MJ’s permanent elevation to CEO of Intel Products along with her interim co-CEO role of Intel, we are ensuring the product group will have the resources needed to deliver for our customers. Ultimately, returning to process leadership is central to product leadership, and we will remain focused on that mission while driving greater efficiency and improved profitability.

Intel’s products are irrelevant to the future; that’s the fundamental foundry problem. If x86 still mattered, then Intel would be making enough money to fund its foundry efforts. Moreover, prospective Intel customers are wary that Intel — as it always has — will favor itself at the expense of its customers; the board is saying that is exactly what they want to do.

In fact, it is Intel’s manufacturing that must be saved. This is a business that yes, needs billions upon billions of dollars in funding, but it not only has a market as a TSMC competitor, but also the potential to lead that market in the long run. Moreover, Intel foundry existing is critical to national security: currently the U.S. is completely dependent on TSMC and Taiwan and all of the geopolitical risk that entails. That means it will fall on the U.S. government to figure out a solution.

Saving Intel

Last month, in A Chance to Build, I explained how tech has modularized itself over the decades, with hardware — including semiconductor fabrication — largely being outsourced to Asia, while software is developed in the U.S. The economic forces undergirding this modularization, including the path dependency from the past sixty years, will be difficult to overcome, even with tariffs.

Apple could not only not manufacture an iPhone in the U.S. because of cost, it also can’t do so because of capability; that capability is downstream of an ecosystem that has developed in Asia and a long learning curve that China has traveled and that the U.S. has abandoned. Ultimately, though, the benefit to Apple has been profound: the company has the best supply chain in the world, centered in China, that gives it the capability to build computers on an unimaginable scale with maximum quality for not that much money at all. This benefit has extended to every tech company, whether they make their own hardware or not. Software has to run on something, whether that be servers or computers or phones; hardware is software’s most essential complement.

The inverse may be the key to American manufacturing: software as hardware’s grantor of viability through integration. This is what Tesla did: the company is deeply integrated from software down through components, and builds vehicles in California (of course it has an even greater advantage with its China factory).

This is also what made Intel profitable for so long: the company’s lock-in was predicated on software, which allowed for massive profit margins that funded all of that innovation and leading edge processes in America, even as every other part of the hardware value chain went abroad. And, by extension, the reason why a product focus is a dead end for the company is because nothing is preserving x86 other than the status quo.

It follows, then, that if the U.S. wants to make Intel viable, it ideally will not just give out money, but also a point of integration. To that end, consider this report from Reuters:

A U.S. congressional commission on Tuesday proposed a Manhattan Project-style initiative to fund the development of AI systems that will be as smart or smarter than humans, amid intensifying competition with China over advanced technologies. The bipartisan U.S.-China Economic and Security Review Commission stressed that public-private partnerships are key in advancing artificial general intelligence, but did not give any specific investment strategies as it released its annual report.

To quote the report’s recommendation directly:
The Commission recommends:
1. Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability. AGI is generally defined as systems that are as good as or better than human capabilities across all cognitive domains and would surpass the sharpest human minds at every task. Among the specific actions the Commission
  recommends for Congress:
- Provide broad multiyear contracting authority to the executive branch and associated funding for leading artificial intelligence, cloud, and data center companies and others to advance the stated policy at a pace and scale consistent with the goal of U.S. AGI leadership; and
- Direct the U.S. secretary of defense to provide a Defense Priorities and Allocations System “DX Rating” to items in the artificial intelligence ecosystem to ensure this project receives national priority.
The problem with this proposal is that spending the money via “public-private partnerships” will simply lock-in the current paradigm; I explained in A Chance to Build:
Software runs on hardware, and here Asia dominates. Consider AI:
- Chip design, a zero marginal cost activity, is done by Nvidia, a Silicon Valley company.
- Chip manufacturing, a minimal marginal cost activity that requires massive amounts of tacit knowledge gained through experience, is done by TSMC, a Taiwanese company.
- An AI system contains multiple components beyond the chip, many if not most of which are manufactured in China, or other countries in Asia.
- Final assembly generally happens outside of China due to U.S. export controls; Foxconn, for example, assembles many of its systems in Mexico.
- AI is deployed mostly by U.S. companies, and the vast majority of application development is done by tech companies and startups, primarily in Silicon Valley.
The fact that the U.S. is the bread in the AI sandwich is no accident: those are the parts of the value chain where marginal cost is non-existent and where the software talent has the highest leverage. Similarly, it’s no accident that the highest value add in terms of hardware happens in Asia, where expertise has been developing for fifty years. The easiest — and by extension, most low-value — aspect is assembly, which can happen anywhere labor is cheap.
Given this, if the U.S. is serious about AGI, then the true Manhattan Project — doing something that will be very expensive and not necessarily economically rational — is filling in the middle of the sandwich. Saving Intel, in other words.

Start with the fact that we know that leading AI model companies are interested in dedicated chips; OpenAI is reportedly working on its own chip with Broadcom, after flirting with the idea of building its own fabs. The latter isn’t viable for a software company in a world where TSMC exists, but it is for the U.S. government if it’s serious about domestic capabilities continuing to exist. The same story applies to Google, Amazon, Microsoft, and Meta.

To that end, the U.S. government could fund an independent Intel foundry — spin out the product group along with the clueless board to Broadcom or Qualcomm or private equity — and provide price support for model builders to design and buy their chips there. Or, if the U.S. government wanted to build the whole sandwich, it could directly fund model builders — including one developed in-house — and dictate that they not just use but deeply integrate with Intel-fabricated integrated chips (it’s not out of the question that a fully integrated stack might actually be the optimal route to AGI).

It would, to be sure, be a challenge to keep such an effort out of the clutches of the federal bureaucracy and the dysfunction that has befallen the U.S. defense industry. It would be essential to give this effort the level of independence and freedom that the original Manhattan Project had, with compensation packages to match; perhaps this would be a better use of Elon Musk’s time — himself another model builder — than DOGE?

This could certainly be bearish for Nvidia, at least in the long run. Nvidia is a top priority for TSMC, and almost certainly has no interest in going anywhere else; that’s also why it would be self-defeating for a U.S. “Manhattan Project” to simply fund the status quo, which is Nvidia chips manufactured in Taiwan. Competition is ok, though; the point isn’t to kill TSMC, but to stand up a truly domestic alternative (i.e. not just a fraction of non-leading edge capacity in Arizona). Nvidia for its part deserves all of the success it is enjoying, but government-funded alternatives would ultimately manifest for consumers and businesses as lower prices for intelligence.

This is all pretty fuzzy, to be clear. What does exist, however, is a need — domestically sourced and controlled AI, which must include chips — and a company, in Intel, that is best placed to meet that need, even as it needs a rescue. Intel lost its reason to exist, even as the U.S. needs it to exist more than ever; AI is the potential integration point to solve both problems at the same time.
Get notified about new Articles

Sign Up

Please verify your email address to proceed.
The Gen AI Bridge to the Future

Monday, December 2, 2024

Listen to Podcast

Watch on YouTube

Listen to this post:

Log in to listen

In the beginning was the mainframe.

In 1945 the U.S. government built ENIAC, an acronym for Electronic Numerical Integrator and Computer, to do ballistics trajectory calculations for the military; World War 2 was nearing its conclusion, however, so ENIAC’s first major job was to do calculations that undergirded the development of the hydrogen bomb. Six years later, J. Presper Eckert and John Mauchly, who led the development of ENIAC, launched UNIVAC, the Universal Automatic Computer, for broader government and commercial applications. Early use cases included calculating the U.S. census and assisting with calculation-intensive back office operations like payroll and bookkeeping.

These were hardly computers as we know them today, but rather calculation machines that took in reams of data (via punch cards or magnetic tape) and returned results according to hardwired calculation routines; the “operating system” were the humans actually inputting the data, scheduling jobs, and giving explicit hardware instructions. Originally this instruction also happened via punch cards and magnetic tape, but later models added consoles to both provide status and also allow for register-level control; these consoles evolved into terminals, but the first versions of these terminals, like the one that was available for the original version of the IBM System/360, were used to initiate batch programs.

Any recounting of computing history usually focuses on the bottom two levels of that stack — the device and the input method — because they tend to evolve in parallel. For example, here are the three major computing paradigms to date:

These aren’t perfect delineations; the first PCs had terminal-like interfaces, and pre-iPhone smartphones used windows-icons-menus-pointer (WIMP) interaction paradigms, with built-in keyboards and styluses. In the grand scheme of things, though, the distinction is pretty clear, and, by extension, it’s pretty easy to predict what is next:

Wearables is an admittedly broad category that includes everything from smart watches to earpieces to glasses, but I think it is a cogent one: the defining characteristic of all of these devices, particularly in contrast to the three previous paradigms, is the absence of a direct mechanical input mechanism; that leaves speech, gestures, and at the most primitive level, thought.

Fortunately there is good progress being made on on all of these fronts: the quality and speed of voice interaction has increased dramatically over the last few years; camera-intermediated gestures on the Oculus and Vision Pro work well, and Meta’s Orion wristband uses electromyography (EMG) to interpret gestures without any cameras at all. Neuralink is even more incredible: an implant in the brain captures thoughts directly and translates them into actions.

These paradigms, however, do not exist in isolation. First off, mainframes still exist, and I’m typing this Article on a PC, even if you may consume it on a phone or via a wearable like a set of AirPods. What stands out to me, however, is the top level of the initial stack I illustrated above: the application layer on one paradigm provides the bridge to the next one. This, more than anything, is why generative AI is a big deal in terms of realizing the future.

Bridges to the Future

I mentioned the seminal IBM System/360 above, which was actually a family of mainframes; the first version was the Model 30, which, as I noted, did batch processing: you would load up a job using punch cards or magnetic tape and execute the job, just like you did with the ENIAC or UNIVAC. Two years later, however, IBM came out with the Model 67 and the TSS/360 operating system: now you could actually interact with a program via the terminal. This represented a new paradigm at the application layer:

It is, admittedly, a bit confusing to refer to this new paradigm at the application layer as Applications, but it is the most accurate nomenclature; what differentiated an application from a program was that while the latter was a pre-determined set of actions that ran as a job, the former could be interacted with and amended while running.

That new application layer, meanwhile, opened up the possibility for an entirely new industry to create those applications, which could run across the entire System/360 family of mainframes. New applications, in turn, drove demand for more convenient access to the computer itself. This ultimately led to the development of the personal computer (PC), which was an individual application platform:

Initial PCs operated from a terminal-like text interface, but truly exploded in popularity with the roll-out of the WIMP interface, which was invented by Xerox PARC, commercialized by Apple, and disseminated by Microsoft. The key point in terms of this Article, however, is that Applications came first: the concept created the bridge from mainframes to PCs.

PCs underwent their own transformation over their two decades of dominance, first in terms of speed and then in form factor, with the rise of laptops. The key innovation at the application layer, however, was the Internet:

The Internet differed from traditional applications by virtue of being available on every PC, facilitating communication between PCs, and by being agnostic to the actual device it was accessed on. This, in turn, provided the bridge to the next device paradigm, the smartphone, with its touch interface:

I’ve long noted that Microsoft did not miss mobile; their error was in trying to extend the PC paradigm to mobile. This not only led to a focus on the wrong interface (WIMP via stylus and built-in keyboard), but also an assumption that the application layer, which Windows dominated, would be a key differentiator.

Apple, famously, figured out the right interface for the smartphone, and built an entirely new operating system around touch. Yes, iOS is based on macOS at a low level, but it was a completely new operating system in a way that Windows Mobile was not; at the same time, because iOS was based on macOS, it was far more capable than smartphone-only alternatives like BlackBerry OS or PalmOS. The key aspect of this capability was that the iPhone could access the real Internet.

What is funny is that Steve Jobs’ initial announcement of this capability was met with much less enthusiasm than the iPhone’s other two selling points of being a widescreen iPod and a mobile phone:

Today, we’re introducing three revolutionary products of this class. The first one is a wide-screen iPod with touch controls. The second is a revolutionary mobile phone. The third is a breakthrough Internet communications device…These are not three separate devices, this is one device, and we are calling iPhone. Today, Apple is going to reinvent the phone.

I’ve watched that segment hundreds of times, and the audience’s confusion at “Internet communications device” cracks me up every time; in fact, that was the key factor in reinventing the phone, because it was the bridge that linked a device in your pocket to the world of computing writ large, via the Internet. Jobs listed the initial Internet features later on in the keynote:

Now let’s take a look at an Internet communications device, part of the iPhone. What’s this all about? Well, we’ve got some real breakthroughs here: to start off with, we’ve got rich HTML email on iPhone. The first time, really rich email on a mobile device, and it works with any IMAP or POP email service. You’ve got your favorite mail service, it’ll likely work with it, and it’s rich text email. We wanted the best web browser on our phone, not a baby browser or a WAP browser, a real browser, and we picked the best one in the world: Safari, and we have Safari running on iPhone. It is the first fully-usable HTML browser on a phone. Third, we have Google Maps. Maps, satellite images, directions, and traffic. This is unbelievable, wait until you see it. We have Widgets, starting off with weather and stocks. And, this communicates with the Internet over Edge and Wifi, and iPhone automatically detects Wifi and switches seamless to it. You don’t have to manage the network, it just does the right thing.

Notice that the Internet is not just the web; in fact, while Apple wouldn’t launch a 3rd-party App Store until the following year, it did, with the initial iPhone, launch the app paradigm which, in contrast to standalone Applications from the PC days, assumed and depended on the Internet for functionality.

The Generative AI Bridge

We already established above that the next paradigm is wearables. Wearables today, however, are very much in the pre-iPhone era. On one hand you have standalone platforms like Oculus, with its own operating system, app store, etc.; the best analogy is a video game console, which is technically a computer, but is not commonly thought of as such given its singular purpose. On the other hand, you have devices like smart watches, AirPods, and smart glasses, which are extensions of the phone; the analogy here is the iPod, which provided great functionality but was not a general computing device.

Now Apple might dispute this characterization in terms of the Vision Pro specifically, which not only has a PC-class M2 chip, along with its own visionOS operating system and apps, but can also run iPad apps. In truth, though, this makes the Vision Pro akin to Microsoft Mobile: yes, it is a capable device, but it is stuck in the wrong paradigm, i.e. the previous one that Apple dominated. Or, to put it another way, I don’t view “apps” as the bridge between mobile and wearables; apps are just the way we access the Internet on mobile, and the Internet was the old bridge, not the new one.

To think about the next bridge, it’s useful to jump forward to the future and work backwards; that jump forward is a lot easier to envision, for me anyways, thanks to my experience with Meta’s Orion AR glasses:

They’re real and they’re spectacular. pic.twitter.com/hIJZuS6taY
— Ben Thompson (@benthompson) September 25, 2024

The most impressive aspect of Orion is the resolution, which is perfect. I’m referring, of course, to the fact that you can see the real world with your actual eyes; I wrote in an Update:

The reality is that the only truly satisfactory answer to passthrough is to not need it at all. Orion has perfect field-of-view and infinite resolution because you’re looking at the real world; it’s also dramatically smaller and lighter. Moreover, this perfect fidelity actually gives more degrees of freedom in terms of delivering the AR experience: no matter how high resolution the display is, it will still be lower resolution than the world around it; I tried a version of Orion with double the resolution and, honestly, it wasn’t that different, because the magic was in having augmented reality at all, not in its resolution. I suspect the same thing applies to field of view: 70 degrees seemed massive on Orion, even though that is less than the Vision Pro’s 100 degrees, because the edge of the field of view for Orion was reality, whereas the edge for the Vision Pro is, well, nothing.

The current iteration of Orion’s software did have an Oculus-adjacent launch screen, and an Instagram prototype; it was, in my estimation, the least impressive part of the demonstration, for the same reason that I think the Vision Pro’s iPad app compatibility is a long-term limitation: it was simply taking the mobile paradigm and putting it in front of my face, and honestly, I’d rather just use my phone.

One of the most impressive demos, meanwhile, had the least UI: it was just a notification. I glanced up, saw that someone was calling me, touched my fingers together to “click” on the accept button that accompanied the notification, and was instantly talking to someone in another room while still being able to interact freely with the world around me. Of course phone calls aren’t some sort of new invention; what made the demo memorable was that I only got the UI I needed when I needed it.

This, I think, is the future: the exact UI you need — and nothing more — exactly when you need it, and at no time else. This specific example was, of course, programmed deterministically, but you can imagine a future where the glasses are smart enough to generate UI on the fly based on the context of not just your request, but also your broader surroundings and state.

This is where you start to see the bridge: what I am describing is an application of generative AI, specifically to on-demand UI interfaces. It’s also an application that you can imagine being useful on devices that already exist. A watch application, for example, would be much more usable if, instead of trying to navigate by touch like a small iPhone, it could simply show you the exact choices you need to make at a specific moment in time. Again, we get hints of that today through deterministic programming, but the ultimate application will be on-demand via generative AI.

Of course generative AI is also usable on the phone, and that is where I expect most of the exploration around generative UI to happen for now. We certainly see plenty of experimentation and rapid development of generative AI broadly, just as we saw plenty of experimentation and rapid development of the Internet on PCs. That experimentation and development was not just usable on the PC, but it also created the bridge to the smartphone; I think that generative AI is doing the same thing in terms of building a bridge to wearables that are not accessories, but general purpose computers in their own right:

This is exciting in the long-term, and bullish for Meta (and I’ve previously noted how generative AI is the key to the metaverse, as well). It’s also, clearly, well into the future. It also helps explain why Orion isn’t shipping today: it’s not just that the hardware isn’t yet in a production state, particularly from a cost perspective, but the entire application layer needs to be built out, first on today’s devices, enabling the same sort of smooth transition that the iPhone had. No, Apple didn’t have the App Store, but the iPhone was extraordinarily useful on day one, because it was an Internet Communicator.

Survey Complete

Ten years ago I wrote a post entitled The State of Consumer Technology in 2014, where I explored some of the same paradigm-shifts I detailed in this Article. This was the illustration I made then:

There is a perspective in which 2024 has been a bit of a letdown in terms of generative AI; there hasn’t been a GPT-5 level model released; the more meaningful developments have been in the vastly increased efficiency and reduction in size of GPT-4 level models, and the inference-scaling possibilities of o1. Concerns are rising that we may have hit a data wall, and that there won’t be more intelligent AI without new fundamental breakthroughs in AI architecture.

I, however, feel quite optimistic. To me the story of 2024 has been filling in those question marks in that illustration. The product overhang from the generative AI capabilities we have today are absolutely massive: there are so many new things to be built, and completely new application layer paradigms are at the top of the list. That, by extension, is the bridge that will unlock entirely new paradigms of computing. The road to the future needs to be built; it’s exciting to have the sense that the surveying is now complete.

Get notified about new Articles

Sign Up

Please verify your email address to proceed.