Stratechery Plus Update

  • AI Promise and Chip Precariousness

    Yesterday Anthropic released Claude Sonnet 3.7; Dylan Patel had the joke of the day about Anthropic’s seeming aversion to the number “4”, which sounds like the word for “death” in Chinese:

    Jokes aside, the correction appended to this post by Ethan Mollick suggests that Anthropic did not increment the main version number because Sonnet 3.7 is still in the GPT-4 class of models as far as compute is concerned.

    After publishing this piece, I was contacted by Anthropic who told me that Sonnet 3.7 would not be considered a 10^26 FLOP model and cost a few tens of millions of dollars to train, though future models will be much bigger. I updated the post with that information. The only significant change is that Claude 3 is now referred to as an advanced model but not a Gen3 model.
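
    For a rough sense of scale, a back-of-envelope calculation shows why a true 10^26 FLOP run implies a training bill far beyond “a few tens of millions of dollars”; the throughput, utilization, and pricing figures below are my own illustrative assumptions, not Anthropic’s:

      # Back-of-envelope: GPU-hours and dollars implied by a 10^26 FLOP training run.
      # Throughput, utilization, and price are illustrative assumptions.
      TRAIN_FLOPS = 1e26          # the reporting threshold discussed above
      PEAK_FLOPS_PER_GPU = 1e15   # ~1 PFLOP/s dense BF16, roughly H100-class (assumed)
      UTILIZATION = 0.4           # assumed effective utilization of that peak
      COST_PER_GPU_HOUR = 2.00    # assumed $/GPU-hour at scale

      gpu_hours = TRAIN_FLOPS / (PEAK_FLOPS_PER_GPU * UTILIZATION) / 3600
      cost_usd = gpu_hours * COST_PER_GPU_HOUR
      print(f"{gpu_hours:,.0f} GPU-hours, ~${cost_usd / 1e6:,.0f}M")  # ~69 million GPU-hours, ~$139M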

    I love Mollick’s work, but reject his neutral naming scheme: whoever gets to a generation first deserves the honor of the name. In other words, if Gen2 models are GPT-4 class, then Gen3 models are Grok 3 class.

    And, whereas Sonnet 3.7 is an evolution of Sonnet 3.5’s fascinating mixture of personality and coding prowess, likely a result of some Anthropic special sauce in post-training, Grok 3 feels like a model that is the result of a step-change increase in compute capacity, with a much lighter layer of reinforcement learning from human feedback (RLHF). Its answers are far more in-depth and detailed (model good!), but frequently become too verbose (RLHF lacking); it gets math problems right (model good!), but its explanations are harder to follow (RLHF lacking). It is also much more willing to generate forbidden content, from erotica to bomb recipes, while having on the surface the political sensibilities of Tumblr, with something more akin to 4chan under the surface if you prod.1 Grok 3, more than any model yet, feels like the distilled Internet; it’s my favorite so far.

    Grok 3 is also a reminder of how much speed matters, and, by extension, why base models are still important in a world of AIs that reason. Grok 3 is tangibly faster than the competition, which is a better user experience; more generally, conversation is the realm of quick wits, not deep thinkers. The latter is who I want doing research or other agentic-type tasks; the former makes for a better consumer user experience in a chatbot or voice interface.

    ChatGPT, meanwhile, still has the best product experience — its Mac app in particular is dramatically better than Claude’s2 — and it handles more consumer-y use cases like math homework in a much more user-friendly way. Deep Research, meanwhile, is significantly better than all of its competitors (including Grok’s “Deep Search”), and, for me anyways, the closest experience yet to AGI.

    OpenAI’s biggest asset, however, is the ChatGPT brand and associated mindshare; COO Brad Lightcap just told CNBC that the service had surpassed 400 million weekly active users, a 33% increase in less than 3 months. OpenAI is, as I declared four months after the release of ChatGPT, the accidental consumer tech company. Consumer tech companies are the hardest to build and have the potential to be the most valuable; they also require a completely different culture and value chain than a research organization with an API on the side. That is the fundamental reality that I suspect has driven much of the OpenAI upheaval over the last two-and-a-half years: long-time OpenAI employees didn’t sign up to be the next Google Search or Meta, nor is Microsoft interested in being a mere component supplier to a company that must own the consumer relationship to succeed.

    In fact, though, OpenAI has moved too slowly: the company should absolutely have an ad-supported version by now, no matter how much the very idea might make AI researchers’ skin crawl; one of the takeaways from the DeepSeek phenomenon was how many consumers didn’t understand how good OpenAI’s best models were because they were not paying customers. It is very much in OpenAI’s competitive interest to make it cost-effective to give free users the best models, and that means advertising. More importantly, the only way for a consumer tech company to truly scale to the entire world is by having an ad model, which maximizes the addressable market while still making it possible to continually increase the average revenue per user (this doesn’t foreclose a subscription model of course; indeed, ads + subscriptions is the ultimate destination for a consumer content business).

    DeepSeek, meanwhile, has been the biggest story of the year, in part because it is the yin to Grok 3’s yang. DeepSeek’s V3 and R1 models are excellent and worthy competitors in the GPT-4 class, and they achieved this excellence through extremely impressive engineering on both the infrastructure and model layers; Grok 3, on the other hand, simply bought more top-of-the-line Nvidia chips than anyone else, leveraging Nvidia’s networking to build the biggest computing cluster yet, and came out with a model that is better, but not astronomically so.

    The fact that DeepSeek is Chinese is critically important, for reasons I will get to below, but it is just as important that it is an open lab, regularly publishing papers, full model weights, and underlying source code. DeepSeek’s models — which are both better than Meta’s Llama models and more open (and unencumbered by an “openish” license) — set the bar for “minimum open capability”; any model at or below DeepSeek’s models has no real excuse to not be open. Safety concerns are moot when you can just run DeepSeek, while competitive concerns are dwarfed by the sacrifice in uptake and interest entailed in having a model that is both worse and closed.

    Both DeepSeek and Llama, meanwhile, put significant pressure on pricing; API costs in both the U.S. and China have come down in response to the Chinese research lab’s releases, and the only way to have a sustainable margin in the long run is to either have a cost advantage in infrastructure (i.e. Google), have a sustainable model capability advantage (potentially Claude and coding), or be an Aggregator (which is what OpenAI ought to pursue with ChatGPT).

    The State of AI Chips

    All of this is — but for those with high p-doom concerns — great news. AI at the moment seems to be in a Goldilocks position: there is sufficient incentive for the leading research labs to raise money and continue investing in new foundation models (in the hope of building an AI that improves itself), even as competition drives API prices down relentlessly, further incentivizing model makers to come up with differentiated products and capabilities.

    The biggest winner, of course, continues to be Nvidia, whose chips are fabbed by TSMC: DeepSeek’s success is causing Chinese demand for the H20, Nvidia’s reduced-compute-and-reduced-bandwidth-to-abide-by-export-controls version of the H200, to skyrocket, even as xAI just demonstrated that the fastest way to compete is to pay for the best chips. DeepSeek’s innovations will make other models more efficient, but it’s reasonable to argue that those efficiencies are downstream from the chip ban, and that it’s understandable why companies who can just buy the best chips haven’t pursued — but will certainly borrow! — similar gains.

    That latter point is a problem for AMD in particular: SemiAnalysis published a brutal breakdown late last year demonstrating just how poor the Nvidia competitor’s software is relative to its hardware; AMD promises to do better, but, frankly, great chips limited by poor software has been the story of AMD for its entire five decades of existence. Some companies, like Meta or Microsoft, might put in the work to write better software, but leading labs have neither the time nor the expertise.

    The story is different for Huawei and its Ascend line of AI chips. Those chips are fabbed on China’s Semiconductor Manufacturing International Corporation’s (SMIC) 7nm process, using western-built deep ultraviolet lithography (DUV) and quad-patterning; that this is possible isn’t a surprise, but it’s reasonable to assume that the fab won’t progress further without a Chinese supplier developing extreme ultraviolet lithography (EUV) (and no, calling an evolution of the 7nm process 5.5nm doesn’t count).

    Still, the primary limitation for AI chips — particularly when it comes to inference — isn’t necessarily chip speed, but rather memory bandwidth, and that can be improved at the current process level. Moreover, one way to (somewhat) overcome the necessity of using less efficient chips is to simply build more data centers with more power, something that China is much better at than the U.S. Most importantly, however, China’s tech companies have the motivation — and the software chops — to make the Ascend a viable contender, particularly for inference.
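
    To make the bandwidth point concrete, consider the arithmetic of generating one token at a time for a single user; the model size, weight precision, and bandwidth figures below are illustrative assumptions, not the specs of any particular chip:

      # Why single-user decoding is memory-bandwidth bound: each generated token
      # requires streaming (roughly) all model weights from memory.
      PARAMS = 70e9             # assumed dense model size (parameters)
      BYTES_PER_PARAM = 1       # assumed 8-bit weights
      MEM_BANDWIDTH = 3.35e12   # bytes/sec, roughly current high-end HBM (assumed)

      tokens_per_sec = MEM_BANDWIDTH / (PARAMS * BYTES_PER_PARAM)
      print(f"~{tokens_per_sec:.0f} tokens/sec per user, at best")  # ~48 tokens/sec
      # More or faster memory stacks raise this ceiling even if the logic die
      # stays on the same process node, which is the point made above.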

    There is one more player who should be mentioned alongside Nvidia/TSMC and Huawei/SMIC, and that is the hyperscalers who design their own chips, either on their own (AWS with Trainium and Microsoft with Maia) or in collaboration with Broadcom (Google with TPUs and Meta with MTIA). The capabilities and importance of these efforts vary — Google has been investing in TPUs for a decade now, and trains its own models on them, while the next-generation Anthropic model is being trained on Trainium; Meta’s MTIA is about recommendations and not generative AI, while Microsoft’s Maia is a much more nascent effort — but what they all have in common is that their chips are fabbed by TSMC.

    TSMC and Intel

    That TSMC is dominant isn’t necessarily a surprise. Yes, much has been written, including on this site, about Intel’s stumbles and TSMC’s rise, but even if Intel had managed to stay on the leading edge — and 18A is looking promising — there is still the matter of the company needing to transform itself from an integrated device manufacturer (IDM) who designs and makes its own chips, to a foundry that has the customer service, IP library, and experience to make chips for 3rd parties like all of the entities I just discussed.

    Nvidia, to take a pertinent example, was making its chips at TSMC (and Samsung) even when Intel had the leading process; indeed, it was the creation of TSMC and its pure-play foundry model that even made Nvidia possible.3 This also means that TSMC doesn’t just have leading edge capacity, but trailing edge capacity as well. There are a lot of chips in the world — both on AI servers and also in everything from cars to stereos to refrigerators — that don’t need to be on the cutting edge and which benefit from the low costs afforded by the fully depreciated foundries TSMC still maintains, mostly in Taiwan. And TSMC, in turn, can take that cash flow — along with increasing prices for the leading edge — and invest in new fabs on the cutting edge.

    Those leading edge fabs continue to skyrocket in price, which means volume is critical. That is why it was clear to me back when this site started in 2013 that Intel needed to become a foundry; unfortunately the company didn’t follow my advice, preferring to see its stock price soar on the back of cloud server demand. Fast forward to 2021 and Intel — now no longer on the leading edge, and with its cloud server business bleeding share to a resurgent AMD on TSMC’s superior process — tried, under the leadership of Pat Gelsinger, to become a foundry; unfortunately the company’s cash position is dwindling faster than its foundry customer base is growing, and that base is mostly experimental chips or x86 variants.
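
    A simple sketch of the fixed-cost arithmetic shows why volume matters so much; the capex, depreciation schedule, and wafer volumes below are illustrative assumptions:

      # Depreciation per wafer falls directly with volume: capex is fixed.
      FAB_CAPEX = 20e9         # assumed ~$20B for a leading edge fab
      DEPRECIATION_YEARS = 5   # assumed straight-line depreciation

      def depreciation_per_wafer(wafer_starts_per_month):
          annual_wafers = wafer_starts_per_month * 12
          return (FAB_CAPEX / DEPRECIATION_YEARS) / annual_wafers

      for wspm in (100_000, 50_000, 25_000):
          print(f"{wspm:>7,} wafer starts/month -> ${depreciation_per_wafer(wspm):,.0f}/wafer")
      # 100,000 -> $3,333; 50,000 -> $6,667; 25,000 -> $13,333: half the volume means
      # double the fixed-cost burden per wafer, which is why a foundry serving many
      # customers beats an IDM trying to fill a fab alone.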

    Intel’s core problem goes back to the observation above: becoming a foundry is about more than having the leading edge process; Intel might have been able to develop those skills in conjunction with customers eager to be on the best process in the world, but once Intel didn’t even have that, it had nothing to offer. There simply is no reason for an Apple or AMD or Nvidia to take the massive risk entailed in working with Intel when TSMC is an option.

    China and a Changing World

    TSMC is, of course, headquartered in Taiwan; that is where the company’s R&D and leading edge fabs are located, along with most of its trailing edge capacity. SMIC, obviously, is in China; another foundry is Samsung, in South Korea. I told the story as to why so much of this industry ended up in Asia last fall in A Chance to Build:

    Semiconductors are so integral to the history of Silicon Valley that they give the region its name, and, more importantly, its culture: chips require huge amounts of up-front investment, but they have, relative to most other manufactured goods, minimal marginal costs; this economic reality helped drive the development of the venture capital model, which provided unencumbered startup capital to companies who could earn theoretically unlimited returns at scale. This model worked even better with software, which was perfectly replicable.

    That history starts in 1956, when William Shockley founded the Shockley Semiconductor Laboratory to commercialize the transistor that he had helped invent at Bell Labs; he chose Mountain View to be close to his ailing mother. A year later the so-called “Traitorous Eight”, led by Robert Noyce, left and founded Fairchild Semiconductor down the road. Six years after that Fairchild Semiconductor opened a facility in Hong Kong to assemble and test semiconductors. Assembly required manually attaching wires to a semiconductor chip, a labor-intensive and monotonous task that was difficult to do economically with American wages, which ran about $2.50/hour; Hong Kong wages were a tenth of that. Four years later Texas Instruments opened a facility in Taiwan, where wages were $0.19/hour; two years after that Fairchild Semiconductor opened another facility in Singapore, where wages were $0.11/hour.

    In other words, you can make the case that the classic story of Silicon Valley isn’t completely honest. Chips did have marginal costs, but that marginal cost was, within single digit years of the founding of Silicon Valley, exported to Asia.

    I recounted in that Article how this outsourcing was an intentional policy of the U.S. government, and launched into a broader discussion about the post-War Pax Americana global order that placed the U.S. consumer market at the center of global trade, denominated by the dollar, and why that led to an inevitable decline in American manufacturing and the rise of China, a country that, in retrospect, was simply too big, and thus too expensive, for America to bear.

    That, anyways, is how one might frame many of the signals coming out of the 2nd Trump administration, including what appears to be a Monroe 2.0 Doctrine approach to North America, an attempt to extricate the U.S. from the Ukraine conflict specifically and Europe broadly, and, well, a perhaps tamer approach to China to start, at least compared to Trump’s rhetoric on the campaign trail.

    One possibility is that Trump is actually following through on the “pivot to Asia” that U.S. Presidents have been talking about but failing to execute on for years; in this view the U.S. is girding itself up to defend Taiwan and other entities in Asia, and hopefully break up the burgeoning China-Russia relationship in the process.

    The other explanation is more depressing, but perhaps more realistic: President Trump may believe that the unipolar U.S.-dominated world that has been the norm since the fall of the Soviet Union is drawing to a close, and it’s better for the U.S. to proactively shift to a new norm than to have one forced upon it.

    The important takeaway that is relevant to this Article is that Taiwan is the flashpoint in both scenarios. A pivot to Asia is about gearing up to defend Taiwan from a potential Chinese invasion or embargo; a retrenchment to the Americas is about potentially granting — or acknowledging — China as the hegemon of Asia, which would inevitably lead to Taiwan’s envelopment by China.

    This is, needless to say, a discussion where I tread gingerly, not least because I have lived in Taipei off and on for over two decades. And, of course, there is the moral component entailed in Taiwan being a vibrant democracy with a population that has no interest in reunification with China. To that end, the status quo has been simultaneously absurd and yet surprisingly sustainable: Taiwan is an independent country in nearly every respect, with its own border, military, currency, passports, and — pertinent to tech — economy, increasingly dominated by TSMC; at the same time, Taiwan has not declared independence, and the official position of the United States is to acknowledge that China believes Taiwan is theirs, without endorsing either that position or Taiwanese independence.

    Chinese and Taiwanese do, in my experience, handle this sort of ambiguity much more easily than do Americans; still, gray zones only go so far. What has been just as important are realist factors like military strength (once in favor of Taiwan, now decidedly in favor of China), economic ties (extremely deep between Taiwan and China, and China and the U.S.), and war-waging credibility. Here the Ukraine conflict and the resultant China-Russia relationship looms large, thanks to the sharing of military technology and overland supply chains for oil and food that have resulted, even as the U.S. has depleted itself. That, by extension, gets at another changing factor: the hollowing out of American manufacturing under Pax Americana has been directly correlated with China’s dominance of the business of making things, the most essential war-fighting capability.

    Still, there is — or rather was — a critical factor that might give China pause: the importance of TSMC. Chips undergird every aspect of the modern economy; the rise of AI, and the promise of the massive gains that might result, only make this need even more pressing. And, as long as China needs TSMC chips, they have a powerful incentive to leave Taiwan alone.

    Trump, Taiwan, and TSMC

    Anyone who has been following the news for the last few years, however, can surely see the problem: the various iterations of the chip ban, going back to the initial action against ZTE in 2018, have the perhaps-unintended effect of making China less dependent on TSMC. I wrote at the time of the ZTE ban:

    What seems likely to happen in the long run is a separation at the hardware layer as well; China is already investing heavily in chips, and this action will certainly spur the country to focus on the sort of relatively low-volume high-precision components that other countries like the U.S., Taiwan, and Japan specialize in (to date it has always made more sense for Chinese companies to focus on higher-volume lower-precision components). To catch up will certainly take time, but if this action harms ZTE as much as it seems it will I suspect the commitment will be even more significant than it already is.

    I added two years later, after President Trump barred Huawei from TSMC chips in 2020:

    I am, needless to say, not going to get into the finer details of the relationship between China and Taiwan (and the United States, which plays a prominent role); it is less that reasonable people may disagree and more that expecting reasonableness is probably naive. It is sufficient to note that should the United States and China ever actually go to war, it would likely be because of Taiwan.

    In this, TSMC specifically, and the Taiwan manufacturing base generally, are a significant deterrent: both China and the U.S. need access to the best chip maker in the world, along with a host of other high-precision pieces of the global electronics supply chain. That means that a hot war, which would almost certainly result in some amount of destruction to these capabilities, would be devastating…one of the risks of cutting China off from TSMC is that the deterrent value of TSMC’s operations is diminished.

    Now you can see the fly in Goldilocks’ porridge! China would certainly like the best chips from TSMC, but they are figuring out how to manage with SMIC and the Ascend and surprisingly efficient state-of-the-art models; the entire AI economy in the U.S., on the other hand — the one that is developing so nicely, with private funding pursuing the frontier, and competition and innovation up-and-down the stack — is completely dependent on TSMC and Taiwan. We have created a situation where China is less dependent on Taiwan, even while we are more dependent on the island.

    This is the necessary context for two more will-he-or-won’t-he ideas floated by President Trump; both are summarized in this Foreign Policy article:

    U.S. President Donald Trump has vowed to impose tariffs on Taiwan’s semiconductor industry and has previously accused Taiwan of stealing the U.S. chip industry…The primary strategic goal for the administration is to revitalize advanced semiconductor manufacturing in the United States…As the negotiations between TSMC and the White House unfold, several options are emerging.

    The most discussed option is a deal between TSMC, Intel, the U.S. government, and U.S. chip designers such as Broadcom and Qualcomm. Multiple reports indicate that the White House has proposed a deal that would have TSMC acquire a stake in Intel Foundry Services and take a leading role in its operations after IFS separated from Intel. Other reports suggest a potential joint venture involving TSMC, Intel, the U.S. government, and industry partners, with technology transfer and technical support from TSMC.

    The motivation for such a proposal is clear: Intel’s board, who fired Gelsinger late last year, seems to want out of the foundry business, and Broadcom or Qualcomm are natural landing places for the design division; the U.S., however, is the entity that needs a leading edge foundry in the U.S., and the Trump administration is trying to compel TSMC to make it happen.

    Unfortunately, I don’t think this plan is a good one. It’s simply not possible for one foundry to “take over” another: while the final output is the same — a microprocessor — nearly every step of the process is different in a multitude of ways. Transistors — even ones of the same class — can have different dimensions, with different layouts (TSMC, for example, packs its transistors more densely); production lines can be organized differently, to serve different approaches to lithography; chemicals are tuned to individual processes, and can’t be shared; equipment is tailored to a specific line, and can’t be switched out; materials can differ, throughout the chip, along with how exactly they are prepared and applied. Sure, most of the equipment could be repurposed, but one doesn’t simply layer a TSMC process onto an Intel fab! The best you could hope for is that TSMC could rebuild the fabs using the existing equipment according to their specifications.

    That, though, doesn’t actually solve the Taiwan problem: TSMC is still headquartered in Taiwan, still has its R&D division there, and is still beholden to a Taiwanese government directive to not export its most cutting edge processes (and yes, there is truth to Trump’s complaints that Taiwan sees TSMC as leverage to guarantee that the U.S. defends Taiwan in the event of a Chinese invasion). Moreover, the U.S. chip problem isn’t just about the leading edge, but also the trailing edge. I wrote in Chips and China:

    It’s worth pointing out, though, that this is producing a new kind of liability for the U.S., and potentially more danger for Taiwan…these aren’t difficult chips to make, but that is precisely why it makes little sense to build new trailing edge foundries in the U.S.: Taiwan already has it covered (with the largest marketshare in both categories), and China has the motivation to build more just so it can learn.

    What, though, if TSMC were taken off the board?

    Much of the discussion around a potential invasion of Taiwan — which would destroy TSMC (foundries don’t do well in wars) — centers around TSMC’s lead in high end chips. That lead is real, but Intel, for all of its struggles, is only a few years behind. That is a meaningful difference in terms of the processors used in smartphones, high performance computing, and AI, but the U.S. is still in the game. What would be much more difficult to replace are, paradoxically, trailing node chips, made in fabs that Intel long ago abandoned…

    The more that China builds up its chip capabilities — even if that is only at trailing nodes — the more motivation there is to make TSMC a target, not only to deny the U.S. its advanced capabilities, but also the basic chips that are more integral to everyday life than we ever realized.

    It’s good that the administration is focused on the issue of TSMC and Taiwan: what I’m not sure anyone realizes is just how deep the dependency goes, and just how vulnerable the U.S. — and our future in AI — really is.

    What To Do

    Everything that I’ve written until now has been, in some respects, trivial: it’s easy to identify problems and criticize proposed solutions; it’s much more difficult to come up with solutions of one’s own. The problem is less the need for creative thinking and more the courage to make trade-offs: the fact of the matter is that there are no good solutions to the situation the U.S. has got itself into with regards to Taiwan and chips. That is a long-winded way to say that the following proposal includes several ideas that, in isolation, I find some combination of distasteful, against my principles, and even downright dangerous. So here goes.

    End the China Chip Ban

    The first thing the U.S. should do — and, by all means, make this a negotiating plank in a broader agreement with China — is let Chinese companies, including Huawei, make chips at TSMC, and further, let Chinese companies buy top-of-the-line Nvidia chips.

    The Huawei one is straightforward: Huawei’s founder may have told Chinese President Xi Jinping that Huawei doesn’t need external chip makers, but I think that the reality of having access to cutting edge TSMC fabrication would show that the company’s revealed preference would be for better chips than Huawei can get from SMIC — and the delta is only going to grow. Sure, Huawei would still work with SMIC, but the volume would go down; critically, so would the urgency of having no other choice. This, by extension, would restart China’s dependency on TSMC, thereby increasing the cost of making a move on Taiwan.

    At the same time, giving Huawei access to cutting edge chips would be a significant threat to Nvidia’s dominance; the reason the company is so up-in-arms about the chip ban isn’t simply foregone revenue but the forced development of an alternative to their CUDA ecosystem. The best way to neuter that challenge — and it is in the U.S.’s interest to have Nvidia in control, not Huawei — is to give companies like Bytedance, Alibaba, and DeepSeek the opportunity to buy the best.

    This does, without question, unleash China in terms of AI; preventing that has been the entire point of the various flavors of chip bans that came down from the Biden administration. DeepSeek’s success, however, should force a re-evaluation about just how viable it is to completely cut China off from AI.

    It’s also worth noting that success in stopping China’s AI efforts has its own risks: another reason why China has held off from moving against Taiwan is the knowledge that every year they wait increases their relative advantages in all the real-world factors I listed above; that makes it more prudent to wait. The prospect of the U.S. developing the sort of AI that matters in a military context, however, even as China is cut off, changes that calculus: now the prudent course is to move sooner rather than later, particularly if the U.S. is dependent on Taiwan for the chips that make that AI possible.

    Double Down on the Semiconductor Equipment Ban

    While I’ve continually made references to “chip bans”, that’s actually incomplete: the U.S. has also made moves to limit China’s access to semiconductor equipment necessary for making leading edge chips (SMIC’s 7nm process, for example, is almost completely dependent on western semiconductor equipment). Unfortunately, this effort has mostly been a failure, thanks to generous loopholes that are downstream from China being a large market for U.S. semiconductor equipment manufacturers.

    It’s time for those loopholes to go away; remember, the overriding goal is for China to increase its dependence on Taiwan, and that means cutting SMIC and China’s other foundries off at the knees. Yes, this increases the risk that China will develop its own alternatives to western semiconductor manufacturers, leading to long-term competition and diminished money for R&D, but this is a time for hard choices, and increasing Taiwan’s importance to China matters more.

    Build Trailing Edge Fabs in the U.S.

    The U.S.’s dependency on TSMC for trailing edge chip capacity remains a massive problem; if you think the COVID chip shortages were bad, then a scenario where the U.S. is stuck with GlobalFoundries and no one else is a disaster so great it is hard to contemplate. However, as long as TSMC exists, there is zero economic rationale for anyone to build more trailing edge fabs.

    This, then, is a textbook example of where government subsidies are the answer: there is a national security need for trailing edge capacity, and no economic incentive to build it. And, as an added bonus, this helps fill in some of the revenue for semiconductor manufacturers who are now fully cut off from China. TSMC takes a blow, of course, but it is also being buttressed by orders from Huawei and other Chinese chip designers.

    Intel and the Leading Edge

    That leaves Intel and the need for native leading edge capacity, and this is in some respects the hardest problem to solve.

    First, the U.S. should engineer a spin-off of Intel’s x86 chip business to Broadcom or Qualcomm at a nominal price; the real cost for the recipient company will be a commitment to guaranteed Intel Foundry orders, not just for the acquired Intel chips but also for a large portion of its existing chips. This will provide the foundational customer to get Intel Foundry off the ground.

    Second, the U.S. should offer to subsidize Nvidia chips made at Intel Foundry. Yes, this is an offer worth billions of dollars, but it is the shortest, fastest route to ground the U.S. AI industry in U.S. fabs.

    Third, if Nvidia declines — and they probably will, given the risks entailed in a foundry change — then the U.S. should make a massive order for Intel Gaudi AI accelerators, build data centers to house them, and make them freely available to companies and startups who want to build their own AI models, with the caveat that everything is open source.

    Fourth, the U.S. should heavily subsidize chip startups to build at Intel Foundry, with the caveat that all of the resultant IP that is developed to actually build chips — the basic building blocks that are separate from the “secret sauce” of the chip itself — is open-sourced.

    Fifth, the U.S. should indemnify every model created on U.S.-manufactured chips against any copyright violations, with the caveat that the data used to train the model must be made freely available.


    Here is the future state the U.S. wants to get to: a strong AI industry running on U.S.-made chips, along with trailing edge capacity that is beyond the reaches of China. Getting there, however, will take significant interventions into the market to undo the overwhelming incentives for U.S. companies to simply rely on TSMC; even then, such a shift will take time, which is why making Taiwan indispensable to China’s technology industry is the price that needs to be paid in the meantime.

    AI is in an exciting place; it’s also a very precarious one. I believe this plan, with all of the risks and sacrifices it entails, is the best way to ensure that all of the trees that are sprouting have time to actually take root and change the world.

    I wrote a follow-up to this Article in this Daily Update.


    1. This suggests a surprising takeaway: it’s possible that while RLHF on ChatGPT and especially Claude blocks off the 4chan elements, it also tamps down the Tumblr elements, which is to say the politics don’t come from the post-training, but from the dataset — i.e. the Internet. In other words, if I’m right about Grok 3 having a much lighter layer of RLHF, then that explains both the surface politics, and what is available under the surface.

    2. Grok doesn’t yet have a Mac app, but its iPhone app is very good 

    3. Although Nvidia’s first chip was made by SGS-Thomson Microelectronics 


  • Deep Research and Knowledge Value

    “When did you feel the AGI?”

    This is a question that has been floating around AI circles for a while, and it’s a hard one to answer for two reasons. First, what is AGI, and second, “feel” is a bit like obscenity: as Supreme Court Justice Potter Stewart famously said in Jacobellis v. Ohio, “I know it when I see it.”

    I gave my definition of AGI in AI’s Uneven Arrival:

    What o3 and inference-time scaling point to is something different: AI’s that can actually be given tasks and trusted to complete them. This, by extension, looks a lot more like an independent worker than an assistant — ammunition, rather than a rifle sight. That may seem an odd analogy, but it comes from a talk Keith Rabois gave at Stanford…My definition of AGI is that it can be ammunition, i.e. it can be given a task and trusted to complete it at a good-enough rate (my definition of Artificial Super Intelligence (ASI) is the ability to come up with the tasks in the first place).

    The “feel” part of that question is a more recent discovery: Deep Research from OpenAI feels like AGI; I just got a new employee for the shockingly low price of $200/month.

    Deep Research Bullets

    OpenAI announced Deep Research in a February 2 blog post:

    Today we’re launching deep research in ChatGPT, a new agentic capability that conducts multi-step research on the internet for complex tasks. It accomplishes in tens of minutes what would take a human many hours.

    Deep research is OpenAI’s next agent that can do work for you independently — you give it a prompt, and ChatGPT will find, analyze, and synthesize hundreds of online sources to create a comprehensive report at the level of a research analyst. Powered by a version of the upcoming OpenAI o3 model that’s optimized for web browsing and data analysis, it leverages reasoning to search, interpret, and analyze massive amounts of text, images, and PDFs on the internet, pivoting as needed in reaction to information it encounters.

    The ability to synthesize knowledge is a prerequisite for creating new knowledge. For this reason, deep research marks a significant step toward our broader goal of developing AGI, which we have long envisioned as capable of producing novel scientific research.

    It’s honestly hard to keep track of OpenAI’s AGI definitions these days — CEO Sam Altman, just yesterday, defined it as “a system that can tackle increasingly complex problems, at human level, in many fields” — but in my rather more modest definition Deep Research sits right in the middle of that excerpt: it synthesizes research in an economically valuable way, but doesn’t create new knowledge.
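
    Mechanically, the “agentic capability” described in that excerpt amounts to a loop of searching, reading, and re-planning; here is a deliberately simplified sketch of such a loop (my own illustration with hypothetical placeholder functions, not OpenAI’s implementation):

      # Hypothetical stand-ins for a search API, an HTTP/PDF fetcher, and an LLM call.
      def search(query): return []          # would return a list of URLs
      def fetch(url): return ""             # would return page or PDF text
      def model(prompt): return "DONE"      # would return an LLM completion

      def deep_research(task, max_steps=20):
          notes = []
          for _ in range(max_steps):
              # The model decides what to look up next, pivoting on what it has read so far.
              query = model(f"Task: {task}\nNotes so far: {notes}\nNext search query, or DONE:")
              if query.strip() == "DONE":
                  break
              for url in search(query)[:5]:
                  notes.append(model(f"Extract what is relevant to '{task}' from: {fetch(url)}"))
          # Synthesize the accumulated notes into a cited, analyst-style report.
          return model(f"Write a research report on '{task}' using these notes: {notes}")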

    I already published two examples of Deep Research in last Tuesday’s Stratechery Update. While I suggest reading the whole thing, to summarize:

    • First, I published my (brief) review of Apple’s recent earnings, including three observations:

      • It was notable that Apple earned record revenue even though iPhone sales were down year-over-year, in the latest datapoint about the company’s transformation into a Services juggernaut.
      • China sales were down again, but this wasn’t a new trend: it actually goes back nearly a decade, but you can only see that if you realize how the Huawei chip ban gave Apple a temporary boost in the country.
      • While Apple executives claimed that Apple Intelligence drove iPhone sales, there really wasn’t any evidence in the geographic sales numbers supporting that assertion.
    • Second, I published a Deep Research report using a generic prompt:

      I am Ben Thompson, the author of Stratechery. This is important information because I want you to understand my previous analysis of Apple, and the voice in which I write on Stratechery. I want a research report about Apple's latest earnings in the style and voice of Stratechery that is in line with my previous analysis.

    • Third, I published a Deep Research report using a prompt that incorporated my takeaways from the earnings:

      I am Ben Thompson, the author of Stratechery. This is important information because I want you to understand my previous analysis of Apple, and the voice in which I write on Stratechery. I want a research report about Apple's latest earnings for fiscal year 2025 q1 (calendar year 2024 q4). There are a couple of angles I am particularly interested in:

      - First, there is the overall trend of services revenue carrying the companies earnings. How has that trend continued, what does it mean for margins, etc.

      - Second, I am interested in the China angle. My theory is that Apple's recent decline in China is not new, but is actually part of a longer trend going back nearly a decade. I believe that trend was arrested by the chip ban on Huawei, but that that was only a temporary bump in terms of a long-term decline. In addition, I would like to marry this to deeper analysis of the Chinese phone market, the distinction between first tier cities and the rest of China, and what that says about Apple's prospects in the country.

      - Third, what takeaways are there about Apple's AI prospects? The company claims that Apple Intelligence is helping sales in markets where it has launched, but isn't this a function of not being available in China?

      Please deliver this report in a format and style that is suitable for Stratechery.

    You can read the Update for the output, but this was my evaluation:

    The first answer was decent given the paucity of instruction; it’s really more of a summary than anything, but there are a few insightful points. The second answer was considerably more impressive. This question relied much more heavily on my previous posts, and weaved points I’ve made in the past into the answer. I don’t, to be honest, think I learned anything new, but I think that anyone encountering this topic for the first time would have. Or, to put it another way, were I looking for a research assistant, I would consider hiring whoever wrote the second answer.

    In other words, Deep Research isn’t a rifle barrel, but for this question at least, it was a pretty decent piece of ammunition.

    Deep Research Examples

    Still, that ammunition wasn’t that valuable to me; I read the transcript of Apple’s earnings call before my 8am Dithering recording and came up with my three points immediately; that’s the luxury of having thought about and covered Apple for going on twelve years. And, as I noted above, the entire reason that the second Deep Research report was interesting was because I came up with the ideas and Deep Research substantiated them; the substantiation, however, wasn’t nearly to the standard (in my very biased subjective opinion!) of a Stratechery Update.

    I found a much more beneficial use case the next day. Before I conduct a Stratechery Interview I do several hours of research on the person I am interviewing, their professional background, the company they work for, etc.; in this case I was talking to Bill McDermott, the Chairman and CEO of ServiceNow, a company I am somewhat familiar with but not intimately so. So, I asked Deep Research for help:

    I am going to conduct an interview with Bill McDermott, the CEO of ServiceNow, and I need to do research about both McDermott and ServiceNow to prepare my questions.

    First, I want to know more about McDermott and his background. Ideally there are some good profiles of him I can read. I know he used to work at SAP and I would like to know what is relevant about his experience there. Also, how and why did he take the ServiceNow job?

    Then, what is the background of ServiceNow? How did it get started? What was its initial product-market fit, and how has it expanded over time? What kind of companies use ServiceNow?

    What is the ServiceNow business model? What is its go-to-market strategy?

    McDermott wants to talk about ServiceNow's opportunities in AI. What are those opportunities, and how are they meaningfully unique, or different from simple automation?

    What do users think of ServiceNow? Is it very ugly and hard to use? Why is it very sticky? What attracts companies to it?

    What competitors does ServiceNow have? Can it be a platform for other companies? Or is there an opportunity to disrupt ServiceNow?

    What other questions do you have that would be useful for me to ask?

    You can use previous Stratechery Interviews as a resource to understand the kinds of questions I typically ask.

    I found the results eminently useful, although the questions were pretty mid; I did spend some time doing some additional reading of things like earnings reports before conducting the Interview with my own questions. In short, it saved me a fair bit of time and gave me a place to start from, and that alone more than paid for my monthly subscription.

    Another compelling example came in researching a friend’s complicated medical issue; I’m not going to share my prompt and results for obvious reasons. What I will note is that this friend has been struggling with this issue for over a year, and has seen multiple doctors and tried several different remedies. Deep Research identified a possible issue in ten minutes that my friend only just learned about from a specialist last week; while it is still to be determined if this is the answer he is looking for, it is notable that Deep Research may have accomplished in ten minutes what has taken my friend many hours over many months with many medical professionals.

    It is the final example, however, that is the most interesting, precisely because it is the question on which Deep Research most egregiously failed. I generated a report about another friend’s industry, asking for the major players, supply chain analysis, customer segments, etc. It was by far my most comprehensive and detailed prompt. And, sure enough, Deep Research came back with a fully fleshed out report answering all of my questions.

    It was also completely wrong, but in a really surprising way. The best way to characterize the issue is to go back to that famous Donald Rumsfeld quote:

    There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns — the ones we don’t know we don’t know.

    The issue with the report I generated — and once again, I’m not going to share the results, but this time for reasons that are non-obvious — is that it completely missed a major entity in the industry in question. This particular entity is not a well-known brand, but is a major player in the supply chain. It is a significant enough entity that any report about the industry that did not include them is, if you want to be generous, incomplete.

    It is, in fact, the fourth categorization that Rumsfeld didn’t mention: “the unknown known.” Anyone who read the report that Deep Research generated would be given the illusion of knowledge, but would not know what they think they know.

    Knowledge Value

    One of the most painful lessons of the Internet was the realization by publishers that news was worthless. I’m not speaking about societal value, but rather economic value: something everyone knows is both important and also non-monetizable, which is to say that the act of publishing is economically destructive. I wrote in Publishers and the Pursuit of the Past:

    Too many newspaper advocates utterly and completely fail to understand this; the truth is that newspapers made money in the past not by providing societal value, but by having quasi-monopolistic control of print advertising in their geographic area; the societal value was a bonus. Thus, when Chavern complains that “today’s internet distribution systems distort the flow of economic value derived from good reporting”, he is in fact conflating societal value with economic value; the latter does not exist and has never existed.

    This failure to understand the past leads to a misdiagnosis of the present: Google and Facebook are not profitable because they took newspapers’ reporting, they are profitable because they took their advertising. Moreover, the utility of both platforms is so great that even if all newspaper content were magically removed — which has been tried in Europe — the only thing that would change is that said newspapers would lose even more revenue as they lost traffic.

    This is why this solution is so misplaced: newspapers no longer have a monopoly on advertising, can never compete with the Internet when it comes to bundling content, and news remains both valuable to society and, for the same reasons, worthless economically (reaching lots of people is inversely correlated to extracting value, and facts — both real and fake ones — spread for free).

    It is maybe a bit extreme to say it has always been such; in truth it is very hard to draw direct lines from the analog era, defined as it was by friction and scarcity, to the Internet era’s transparency and abundance. It may have technically been the case that those of us old enough to remember newsstands bought the morning paper because a local light manufacturing company owned printing presses, delivery trucks, and an advertising sales team, but we too believed we simply wanted to know what was happening. Now we get that need fulfilled for free, and probably by social media (for better or worse); I sometimes wish I knew less!

    Still, what Deep Research reveals is how much more could be known. I read a lot of things on the Internet, but it’s not as if I will ever come close to reading everything. Moreover, as the amount of slop increases — whether human or AI generated — the difficulty in finding the right stuff to read is only increasing. This is also one problem with Deep Research that is worth pointing out: the worst results are often, paradoxically, for the most popular topics, precisely because those are the topics that are the most likely to be contaminated by slop. The more precise and obscure the topic, the more likely it is that Deep Research will have to find papers and articles that actually cover the topic well:

    This graph, however, is only half complete, as the example of my friend’s industry shows:

    There is a good chance that Deep Research, particularly as it evolves, will become the most effective search engine there has ever been; it will find whatever information there is to find about a particular topic and present it in a relevant way. It is the death, in other words, of security through obscurity. Previously we shifted from a world where you had to pay for the news to the news being fed to you; now we will shift from a world where you had to spend hours researching a topic to having a topic reported to you on command.

    Unless, of course, the information that matters is not on the Internet. This is why I am not sharing the Deep Research report that provoked this insight: I happen to know some things about the industry in question — which is not related to tech, to be clear — because I have a friend who works in it, and it is suddenly clear to me how much future economic value is wrapped up in information not being public. In this case the entity in question is privately held, so there aren’t stock market filings, public reports, barely even a webpage! And so AI is blind.

    There is another example, this time in tech, of just how valuable secrecy can be. Amazon launched S3, the first primitive offered by AWS, in 2006, followed by EC2 later that year, and soon transformed startups and venture capital. What wasn’t clear was to what extent AWS was transforming Amazon; the company slowly transitioned Amazon.com to AWS, and that was reason enough to list AWS’s financials under Amazon.com until 2012, and then under “Other” — along with things like credit card and (then small amounts of) advertising revenue — after that.

    The grand revelation would come in 2015, when Amazon announced in January that it would break AWS out into a separate division for reporting purposes. From a Reuters report at the time:

    After years of giving investors the cold shoulder, Amazon.com Inc is starting to warm up to Wall Street. The No. 1 U.S. online retailer was unusually forthcoming during its fourth-quarter earnings call on Thursday, saying it will break out results this year, for the first time, for its fast-growing cloud computing unit, Amazon Web Services

    The additional information shared during Amazon’s fourth-quarter results as well as its emphasis on becoming more efficient signaled a new willingness by Amazon executives to listen to investors as well. “This quarter, Amazon flexed its muscles and said this is what we can do when we focus on profits,” said Rob Plaza, senior equity analyst for Key Private Bank. “If they could deliver that upper teens, low 20s revenue growth and be able to deliver profits on top of that, the stock is going to respond.” The change is unlikely to be dramatic. When asked whether this quarter marked a permanent shift in Amazon’s relationship with Wall Street, Plaza laughed: “I wouldn’t be chasing the stock here based on that.”

    Still, the shift is a good sign for investors, who have been clamoring for Amazon to disclose more about its fastest-growing and likely most profitable division that some analysts say accounts for 4 percent of total sales.

    In fact, AWS accounted for nearly 7 percent of total sales, and it was dramatically more profitable than anyone expected. The revelation caused such a massive uptick in the stock price that I called it The AWS IPO:

    One of the technology industry’s biggest and most important IPOs occurred late last month, with a valuation of $25.6 billion dollars. That’s more than Google, which IPO’d at a valuation of $24.6 billion, and certainly a lot more than Amazon, which finished its first day on the public markets with a valuation of $438 million. Don’t feel too bad for the latter, though: the “IPO” I’m talking about was Amazon Web Services, and it just so happens to still be owned by the same e-commerce company that went public nearly 20 years ago.

    I’m obviously being facetious; there was no actual IPO for AWS, just an additional line item on Amazon’s financial reports finally breaking out the cloud computing service Amazon pioneered nine years ago. That line item, though, was almost certainly the primary factor in driving an overnight increase in Amazon’s market capitalization from $182 billion on April 23 to $207 billion on April 24. It’s not only that AWS is a strong offering in a growing market with impressive economics, it also may, in the end, be the key to realizing the potential of Amazon.com itself.

    That $25.6 billion increase in market cap, however, came with its own costs: both Microsoft and Google doubled down on their own cloud businesses in response, and while AWS is still the market leader, it faces stiff competition. That’s a win for consumers and customers, but also a reminder that known unknowns have a value all their own.

    Surfacing Data

    I wouldn’t go so far as to say that Amazon was wrong to disclose AWS’s financials. In fact, SEC rules would have required as much once AWS revenue became 10% of the company’s overall business (today it is 15%, which might seem low until you remember that Amazon’s top-line revenue includes first-party e-commerce sales). Moreover, releasing AWS’s financials gave investors renewed confidence in the company, giving management freedom to continue investing heavily in capital expenditures for both AWS and the e-commerce business, fueling Amazon’s transformation into a logistics company. The point, rather, is to note that secrets are valuable.

    What is interesting to consider is what this means for AI tools like Deep Research. Hedge funds have long known the value of proprietary data, paying for everything from satellite images to traffic observers and everything in between in order to get a market edge. My suspicion is that work like this is going to become even more valuable as security by obscurity disappears; it’s going to be more difficult to harvest alpha from reading endless financial filings when an AI can do that research in a fraction of the time.1

    The problem with those hedge fund reports is that they themselves are proprietary; they are not, however, a complete secret. After all, the way to monetize that research is through making trades on the open market, which is to say those reports have an impact on prices. Pricing is a signal that is available to everyone, and it’s going to become an increasingly important one.

    That, by extension, is why AIs like Deep Research are one of the most powerful arguments yet for prediction markets. Prediction markets had their moment in the sun last fall during the U.S. presidential election, when they were far more optimistic about a Trump victory than polls. However, the potential — in fact, the necessity — of prediction markets is only going to increase with AI. AI’s capability of knowing everything that is public is going to increase the incentive to keep things secret; prediction markets in everything will provide a profit incentive for knowledge to be disseminated, by price if nothing else.

    It is also interesting that prediction markets have become associated with crypto, another technology that is poised to come into its own in an AI-dominated world; infinite content generation increases the value of digital scarcity and verification, just as infinite transparency increases the value of secrecy. AI is likely to be the key to tying all of this together: a combination of verifiable information and understandable price movements may be the only way to derive any meaning from the slop that is slowly drowning the Internet.

    This is the other reality of AI, and why it is inescapable. Just as the Internet’s transparency and freedom to publish have devolved into torrents of information of questionable veracity, requiring ever more heroic efforts to parse, even as they created undeniable opportunities to thrive by building independent brands — like this site — AI will both be the cause of further pollution of the information ecosystem and, simultaneously, the only way out.

    Deep Research Impacts

    Much of this is in the (not-so-distant) future; for now Deep Research is one of the best bargains in technology. Yes, $200/month is a lot, and yes, Deep Research is limited by the quality of information on the Internet and is highly dependent on the quality of the prompt. I can’t say that I’ve encountered any particular sparks of creativity, at least in arenas that I know well, but at the same time, there is a lot of work that isn’t creative in nature, but necessary all the same. I personally feel much more productive, and, truth be told, I was never going to hire a researcher anyways.

    That, though, speaks to the peril in two distinct ways. First, one reason I’ve never hired a researcher is that I see tremendous value in the search for and sifting of information. There is so much you learn on the way to a destination, and I value that learning; will serendipity be an unwelcome casualty to reports on demand? Moreover, what of those who haven’t — to take the above example — been reading Apple earnings reports for 12 years, or thinking and reading about technology for three decades? What will be lost for the next generation of analysts?

    And, of course, there is the job question: lots of other entities employ researchers, in all sorts of fields, and those salaries are going to be increasingly hard to justify. I’ve known intellectually that AI would replace wide swathes of knowledge work; it is another thing to feel it viscerally.

    At the same time, that is why the value of secrecy is worth calling out. Secrecy is its own form of friction, the purposeful imposition of scarcity on valuable knowledge. It speaks to what will be valuable in an AI-denominated future: yes, the real world and human-denominated industries will rise in economic value, but so will both the tools and infrastructure that drive original research and discoveries, and the mechanisms to price them. The power of AI, at least on our current trajectory, comes from knowing everything; the (perhaps doomed) response of many will be to build walls, toll gates, and marketplaces to protect and harvest the fruits of their human expeditions.


    1. I don’t think Deep Research is good at something like this, at least not yet. For example, I generated a report about what happened in 2015 surrounding Amazon’s disclosure, and the results were pretty poor; this is, however, the worst the tool will ever be. 




  • DeepSeek FAQ

    Listen to this post:

    It’s Monday, January 27. Why haven’t you written about DeepSeek yet?

    I did! I wrote about R1 last Tuesday.

    I totally forgot about that.

    I take responsibility. I stand by the post, including the two biggest takeaways that I highlighted (emergent chain-of-thought via pure reinforcement learning, and the power of distillation), and I mentioned the low cost (which I expanded on in Sharp Tech) and chip ban implications, but those observations were too localized to the current state of the art in AI. What I totally failed to anticipate were the broader implications this news would have for the overall meta-discussion, particularly in terms of the U.S. and China.

    Is there precedent for such a miss?

    There is. In September 2023 Huawei announced the Mate 60 Pro with a SMIC-manufactured 7nm chip. The existence of this chip wasn’t a surprise for those paying close attention: SMIC had made a 7nm chip a year earlier (the existence of which I had noted even earlier than that), and TSMC had shipped 7nm chips in volume using nothing but DUV lithography (later iterations of 7nm were the first to use EUV). Intel had also made 10nm (TSMC 7nm equivalent) chips years earlier using nothing but DUV, but couldn’t do so with profitable yields; the idea that SMIC could ship 7nm chips using their existing equipment, particularly if they didn’t care about yields, wasn’t remotely surprising — to me, anyways.

    What I totally failed to anticipate was the overwrought reaction in Washington D.C. The dramatic expansion in the chip ban that culminated in the Biden administration transforming chip sales to a permission-based structure was downstream from people not understanding the intricacies of chip production, and being totally blindsided by the Huawei Mate 60 Pro. I get the sense that something similar has happened over the last 72 hours: the details of what DeepSeek has accomplished — and what they have not — are less important than the reaction and what that reaction says about people’s pre-existing assumptions.

    So what did DeepSeek announce?

    The most proximate announcement to this weekend’s meltdown was R1, a reasoning model that is similar to OpenAI’s o1. However, many of the revelations that contributed to the meltdown — including DeepSeek’s training costs — actually accompanied the V3 announcement over Christmas. Moreover, many of the breakthroughs that undergirded V3 were actually revealed with the release of the V2 model last January.

    Is this model naming convention the greatest crime that OpenAI has committed?

    Second greatest; we’ll get to the greatest momentarily.

    Let’s work backwards: what was the V2 model, and why was it important?

    The DeepSeek-V2 model introduced two important breakthroughs: DeepSeekMoE and DeepSeekMLA. The “MoE” in DeepSeekMoE refers to “mixture of experts”. Some models, like GPT-3.5, activate the entire model during both training and inference; it turns out, however, that not every part of the model is necessary for the topic at hand. MoE splits the model into multiple “experts” and only activates the ones that are necessary; GPT-4 was a MoE model that was believed to have 16 experts with approximately 110 billion parameters each.

    DeepSeekMoE, as implemented in V2, introduced important innovations on this concept, including differentiating between more finely-grained specialized experts, and shared experts with more generalized capabilities. Critically, DeepSeekMoE also introduced new approaches to load-balancing and routing during training; traditionally MoE increased communications overhead in training in exchange for efficient inference, but DeepSeek’s approach made training more efficient as well.
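
    To make the mixture-of-experts concept concrete, here is a minimal sketch of top-k expert routing in PyTorch; the layer sizes, expert count, and top-2 routing are illustrative assumptions, and real implementations like DeepSeekMoE add shared experts, fine-grained expert splitting, and load-balancing machinery on top of this basic pattern.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        """Toy mixture-of-experts layer: each token is routed to its top-k experts."""
        def __init__(self, d_model=64, d_hidden=256, n_experts=8, k=2):
            super().__init__()
            self.k = k
            self.gate = nn.Linear(d_model, n_experts)   # router: scores each expert per token
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):                            # x: [tokens, d_model]
            scores = self.gate(x)                        # [tokens, n_experts]
            weights, idx = scores.topk(self.k, dim=-1)   # keep only the top-k experts
            weights = F.softmax(weights, dim=-1)         # normalize their mixing weights
            out = torch.zeros_like(x)
            for slot in range(self.k):                   # only k experts run per token
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out

    tokens = torch.randn(16, 64)
    print(TopKMoE()(tokens).shape)                       # torch.Size([16, 64])
    ```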

    DeepSeekMLA was an even bigger breakthrough. One of the biggest limitations on inference is the sheer amount of memory required: you both need to load the model into memory and also load the entire context window. Context windows are particularly expensive in terms of memory, as every token requires both a key and corresponding value; DeepSeekMLA, or multi-head latent attention, makes it possible to compress the key-value store, dramatically decreasing memory usage during inference.
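
    To get a feel for why the key-value store dominates inference memory, here is a back-of-the-envelope estimate; every number below (layer count, head count, head dimension, context length) is an illustrative assumption for a generic dense transformer, not DeepSeek’s architecture, but it shows the term that multi-head latent attention compresses.

    ```python
    # Rough KV-cache footprint for a generic dense transformer (illustrative numbers).
    n_layers   = 60        # assumed
    n_kv_heads = 64        # assumed
    head_dim   = 128       # assumed
    bytes_each = 2         # FP16/BF16 per element
    context    = 128_000   # tokens kept in the context window

    # Every token stores one key vector and one value vector per layer.
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_each
    total_gb = per_token_bytes * context / 1e9
    print(f"{per_token_bytes / 1e6:.1f} MB per token -> {total_gb:.0f} GB for the full window")
    # ~2.0 MB per token -> ~252 GB for a 128K-token window, before any compression;
    # shrinking the stored keys and values is exactly what MLA targets.
    ```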

    I’m not sure I understood any of that.

    The key implications of these breakthroughs — and the part you need to understand — only became apparent with V3, which added a new approach to load balancing (further reducing communications overhead) and multi-token prediction in training (further densifying each training step, again reducing overhead): V3 was shockingly cheap to train. DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a mere $5.576 million.

    That seems impossibly low.

    DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:

    Lastly, we emphasize again the economical training costs of DeepSeek-V3, summarized in Table 1, achieved through our optimized co-design of algorithms, frameworks, and hardware. During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pre-training stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.

    So no, you can’t replicate DeepSeek the company for $5.576 million.

    I still don’t believe that number.

    Actually, the burden of proof is on the doubters, at least once you understand the V3 architecture. Remember that bit about DeepSeekMoE: V3 has 671 billion parameters, but only 37 billion parameters in the active experts are computed per token; this equates to 333.3 billion FLOPs of compute per token. Here I should mention another DeepSeek innovation: while parameters were stored with BF16 or FP32 precision, they were reduced to FP8 precision for calculations; 2048 H800 GPUs have a capacity of 3.97 exaflops, i.e. 3.97 billion billion FLOPS. The training set, meanwhile, consisted of 14.8 trillion tokens; once you do all of the math it becomes apparent that 2.8 million H800 hours is sufficient for training V3. Again, this was just the final run, not the total cost, but it’s a plausible number.
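
    Here is a quick sanity check using only the figures above; the per-token FLOPs, token count, cluster throughput, and claimed GPU hours come from the text and the V3 paper, while the utilization number that falls out is my own back-of-the-envelope inference, not something DeepSeek disclosed.

    ```python
    # Back-of-the-envelope check on the V3 training claim, using the numbers above.
    flops_per_token = 333.3e9      # compute per token with 37B active parameters
    tokens          = 14.8e12      # size of the training set
    cluster_flops   = 3.97e18      # 2,048 H800s at FP8, per second
    claimed_hours   = 2_664_000    # pre-training H800 GPU hours from the V3 paper

    total_flops   = flops_per_token * tokens              # ~4.9e24 FLOPs
    ideal_seconds = total_flops / cluster_flops           # at 100% utilization
    ideal_gpu_hrs = ideal_seconds / 3600 * 2048

    print(f"ideal GPU hours at full utilization: {ideal_gpu_hrs:,.0f}")                  # ~707,000
    print(f"implied utilization of the claimed hours: {ideal_gpu_hrs / claimed_hours:.0%}")  # ~27%
    # The claimed 2,664K pre-training hours imply roughly 25-30% effective
    # utilization, a plausible figure rather than an impossible one.
    ```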

    Scale AI CEO Alexandr Wang said they have 50,000 H100s.

    I don’t know where Wang got his information; I’m guessing he’s referring to this November 2024 tweet from Dylan Patel, which says that DeepSeek had “over 50k Hopper GPUs”. H800s, however, are Hopper GPUs; they just have much more constrained memory bandwidth than H100s because of U.S. sanctions.

    Here’s the thing: a huge number of the innovations I explained above are about overcoming the lack of memory bandwidth implied in using H800s instead of H100s. Moreover, if you actually did the math on the previous question, you would realize that DeepSeek had an excess of compute; that’s because DeepSeek programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA. DeepSeek engineers had to drop down to PTX, a low-level instruction set for Nvidia GPUs that is basically like assembly language. This is an insane level of optimization that only makes sense if you are using H800s.

    Meanwhile, DeepSeek also makes their models available for inference: that requires a whole bunch of GPUs above-and-beyond whatever was used for training.

    So was this a violation of the chip ban?

    Nope. H100s were prohibited by the chip ban, but not H800s. Everyone assumed that training leading edge models required more interchip memory bandwidth, but that assumption is exactly what DeepSeek optimized both their model structure and infrastructure to get around.

    Again, just to emphasize this point, all of the decisions DeepSeek made in the design of this model only make sense if you are constrained to the H800; if DeepSeek had access to H100s, they probably would have used a larger training cluster with far fewer optimizations specifically focused on overcoming the lack of bandwidth.

    So V3 is a leading edge model?

    It’s definitely competitive with OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5, and appears to be better than Meta’s biggest Llama model. What does seem likely is that DeepSeek was able to distill those models to give V3 high quality tokens to train on.

    What is distillation?

    Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.

    Distillation obviously violates the terms of service of various models, but the only way to stop it is to actually cut off access, via IP banning, rate limiting, etc. It’s assumed to be widespread in terms of model training, and is why there are an ever-increasing number of models converging on GPT-4o quality. This doesn’t mean that we know for a fact that DeepSeek distilled 4o or Claude, but frankly, it would be odd if they didn’t.
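
    For a concrete picture of what API-based distillation involves, here is a minimal sketch: query a teacher model, record prompt/completion pairs, and fine-tune a student on the result. The query_teacher function is a hypothetical stand-in for whatever frontier-model API is being harvested, not any particular vendor’s endpoint.

    ```python
    import json

    def query_teacher(prompt: str) -> str:
        """Hypothetical call to a teacher model; in practice this is an HTTP request
        to whatever frontier model is being distilled from."""
        raise NotImplementedError

    def build_distillation_set(prompts, path="distill.jsonl"):
        # Record the teacher's outputs as supervised fine-tuning data for a student.
        with open(path, "w") as f:
            for prompt in prompts:
                completion = query_teacher(prompt)
                f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")

    # The student model is then fine-tuned on distill.jsonl with ordinary supervised
    # training; it never sees the teacher's weights, only the teacher's outputs.
    ```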

    Distillation seems terrible for leading edge models.

    It is! On the positive side, OpenAI and Anthropic and Google are almost certainly using distillation to optimize the models they use for inference for their consumer-facing apps; on the negative side, they are effectively bearing the entire cost of training the leading edge, while everyone else is free-riding on their investment.

    Indeed, this is probably the core economic factor undergirding the slow divorce of Microsoft and OpenAI. Microsoft is interested in providing inference to its customers, but much less enthused about funding $100 billion data centers to train leading edge models that are likely to be commoditized long before that $100 billion is depreciated.

    Is this why all of the Big Tech stock prices are down?

    In the long run, model commoditization and cheaper inference — which DeepSeek has also demonstrated — is great for Big Tech. A world where Microsoft gets to provide inference to its customers for a fraction of the cost means that Microsoft has to spend less on data centers and GPUs, or, just as likely, sees dramatically higher usage given that inference is so much cheaper. Another big winner is Amazon: AWS has by-and-large failed to make their own quality model, but that doesn’t matter if there are very high quality open source models that they can serve at far lower costs than expected.

    Apple is also a big winner. Dramatically decreased memory requirements for inference make edge inference much more viable, and Apple has the best hardware for exactly that. Apple Silicon uses unified memory, which means that the CPU, GPU, and NPU (neural processing unit) have access to a shared pool of memory; this means that Apple’s high-end hardware actually has the best consumer chip for inference (Nvidia gaming GPUs max out at 32GB of VRAM, while Apple’s chips go up to 192 GB of RAM).
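
    To put rough numbers on why unified memory matters for edge inference, here is a simple weights-only estimate; the model sizes and quantization levels are illustrative examples, not claims about any specific model, and real inference also needs memory for the KV cache and activations.

    ```python
    # Rough weight-memory requirements at common quantization levels (illustrative).
    def weights_gb(params_billion: float, bits: int) -> float:
        return params_billion * 1e9 * bits / 8 / 1e9

    for params in (8, 70, 405):                   # example model sizes, in billions
        line = ", ".join(f"{bits}-bit: {weights_gb(params, bits):5.0f} GB"
                         for bits in (16, 8, 4))
        print(f"{params:>3}B params -> {line}")
    # A 70B model needs ~35 GB of weights at 4-bit: too big for a 32 GB gaming
    # GPU's VRAM, comfortable in 192 GB of unified memory.
    ```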

    Meta, meanwhile, is the biggest winner of all. I already laid out last fall how every aspect of Meta’s business benefits from AI; a big barrier to realizing that vision is the cost of inference, which means that dramatically cheaper inference — and dramatically cheaper training, given the need for Meta to stay on the cutting edge — makes that vision much more achievable.

    Google, meanwhile, is probably in worse shape: a world of decreased hardware requirements lessens the relative advantage they have from TPUs. More importantly, a world of zero-cost inference increases the viability and likelihood of products that displace search; granted, Google gets lower costs as well, but any change from the status quo is probably a net negative.

    I asked why the stock prices are down; you just painted a positive picture!

    My picture is of the long run; today is the short run, and it seems likely the market is working through the shock of R1’s existence.

    Wait, you haven’t even talked about R1 yet.

    R1 is a reasoning model like OpenAI’s o1. It has the ability to think through a problem, producing much higher quality results, particularly in areas like coding, math, and logic (but I repeat myself).

    Is this more impressive than V3?

    Actually, the reason why I spent so much time on V3 is that V3 was the model that actually demonstrated a lot of the dynamics that seem to be generating so much surprise and controversy. R1 is notable, however, because o1 stood alone as the only reasoning model on the market, and as the clearest sign that OpenAI was the market leader.

    R1 undoes the o1 mythology in a couple of important ways. First, there is the fact that it exists. OpenAI does not have some sort of special sauce that can’t be replicated. Second, R1 — like all of DeepSeek’s models — has open weights (the problem with saying “open source” is that we don’t have the data that went into creating it). This means that instead of paying OpenAI to get reasoning, you can run R1 on the server of your choice, or even locally, at dramatically lower cost.

    How did DeepSeek make R1?

    DeepSeek actually made two models: R1 and R1-Zero. I actually think that R1-Zero is the bigger deal; as I noted above, it was my biggest focus in last Tuesday’s Update:

    R1-Zero, though, is the bigger deal in my mind. From the paper:

    In this paper, we take the first step toward improving language model reasoning capabilities using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ GRPO as the RL framework to improve model performance in reasoning. During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. After thousands of RL steps, DeepSeek-R1-Zero exhibits super performance on reasoning benchmarks. For instance, the pass@1 score on AIME 2024 increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912.

    Reinforcement learning is a technique where a machine learning model is given a bunch of data and a reward function. The classic example is AlphaGo, where DeepMind gave the model the rules of Go with the reward function of winning the game, and then let the model figure everything else out on its own. This famously ended up working better than other more human-guided techniques.

    LLMs to date, however, have relied on reinforcement learning with human feedback; humans are in the loop to help guide the model, navigate difficult choices where rewards aren’t obvious, etc. RLHF was the key innovation in transforming GPT-3 into ChatGPT, with well-formed paragraphs, answers that were concise and didn’t trail off into gibberish, etc.

    R1-Zero, however, drops the HF part — it’s just reinforcement learning. DeepSeek gave the model a set of math, code, and logic questions, and set two reward functions: one for the right answer, and one for the right format that utilized a thinking process. Moreover, the technique was a simple one: instead of trying to evaluate step-by-step (process supervision), or doing a search of all possible answers (a la AlphaGo), DeepSeek encouraged the model to try several different answers at a time and then graded them according to the two reward functions.
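
    Here is a stripped-down sketch of that reward step: sample several candidate answers per question and grade each one for correctness and for format. This illustrates the general shape of the approach rather than DeepSeek’s actual GRPO implementation; the answer template, reward values, and sample_fn model call are all assumptions for the sake of the example.

    ```python
    import re

    def format_reward(answer: str) -> float:
        # Reward answers that expose their reasoning in the expected template.
        pattern = r"<think>.*</think>\s*<answer>.*</answer>"
        return 1.0 if re.search(pattern, answer, re.DOTALL) else 0.0

    def accuracy_reward(answer: str, reference: str) -> float:
        # Reward answers whose final result matches the known-correct one.
        match = re.search(r"<answer>(.*?)</answer>", answer, re.DOTALL)
        return 1.0 if match and match.group(1).strip() == reference else 0.0

    def score_group(question: str, reference: str, sample_fn, n: int = 8):
        """Sample n candidate answers and score each with both rewards; a policy
        update (e.g. GRPO) would then push the model toward samples that score
        above the group average."""
        samples = [sample_fn(question) for _ in range(n)]   # sample_fn: hypothetical model call
        return [(s, accuracy_reward(s, reference) + format_reward(s)) for s in samples]
    ```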

    What emerged is a model that developed reasoning and chains-of-thought on its own, including what DeepSeek called “Aha Moments”:

    A particularly intriguing phenomenon observed during the training of DeepSeek-R1-Zero is the occurrence of an “aha moment”. This moment, as illustrated in Table 3, occurs in an intermediate version of the model. During this phase, DeepSeek-R1-Zero learns to allocate more thinking time to a problem by reevaluating its initial approach. This behavior is not only a testament to the model’s growing reasoning abilities but also a captivating example of how reinforcement learning can lead to unexpected and sophisticated outcomes.

    This moment is not only an “aha moment” for the model but also for the researchers observing its behavior. It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies. The “aha moment” serves as a powerful reminder of the potential of RL to unlock new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future.

    This is one of the most powerful affirmations yet of The Bitter Lesson: you don’t need to teach the AI how to reason, you can just give it enough compute and data and it will teach itself!

    Well, almost: R1-Zero reasons, but in a way that humans have trouble understanding. Back to the introduction:

    However, DeepSeek-R1-Zero encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates a small amount of cold-start data and a multi-stage training pipeline. Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model. Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.

    This sounds a lot like what OpenAI did for o1: DeepSeek started the model out with a bunch of examples of chain-of-thought thinking so it could learn the proper format for human consumption, and then did the reinforcement learning to enhance its reasoning, along with a number of editing and refinement steps; the output is a model that appears to be very competitive with o1.

    Here again it seems plausible that DeepSeek benefited from distillation, particularly in terms of training R1. That, though, is itself an important takeaway: we have a situation where AI models are teaching AI models, and where AI models are teaching themselves. We are watching the assembly of an AI takeoff scenario in realtime.

    So are we close to AGI?

    It definitely seems like it. This also explains why Softbank (and whatever investors Masayoshi Son brings together) would provide the funding for OpenAI that Microsoft will not: the belief that we are reaching a takeoff point where there will in fact be real returns to being first.

    But isn’t R1 now in the lead?

    I don’t think so; this has been overstated. R1 is competitive with o1, although there do seem to be some holes in its capability that point towards some amount of distillation from o1-Pro. OpenAI, meanwhile, has demonstrated o3, a far more powerful reasoning model. DeepSeek is absolutely the leader in efficiency, but that is different than being the leader overall.

    So why is everyone freaking out?

    I think there are multiple factors. First, there is the shock that China has caught up to the leading U.S. labs, despite the widespread assumption that China isn’t as good at software as the U.S. This is probably the biggest thing I missed in my surprise over the reaction. The reality is that China has an extremely proficient software industry generally, and a very good track record in AI model building specifically.

    Second is the low training cost for V3, and DeepSeek’s low inference costs. This part was a big surprise for me as well, to be sure, but the numbers are plausible. This, by extension, probably has everyone nervous about Nvidia, which obviously has a big impact on the market.

    Third is the fact that DeepSeek pulled this off despite the chip ban. Again, though, while there are big loopholes in the chip ban, it seems likely to me that DeepSeek accomplished this with legal chips.

    I own Nvidia! Am I screwed?

    There are real challenges this news presents to the Nvidia story. Nvidia has two big moats:

    • CUDA is the language of choice for anyone programming these models, and CUDA only works on Nvidia chips.
    • Nvidia has a massive lead in terms of its ability to combine multiple chips together into one large virtual GPU.

    These two moats work together. I noted above that if DeepSeek had access to H100s they probably would have used a larger cluster to train their model, simply because that would have been the easier option; the fact they didn’t, and were bandwidth constrained, drove a lot of their decisions in terms of both model architecture and their training infrastructure. Just look at the U.S. labs: they haven’t spent much time on optimization because Nvidia has been aggressively shipping ever more capable systems that accommodate their needs. The route of least resistance has simply been to pay Nvidia. DeepSeek, however, just demonstrated that another route is available: heavy optimization can produce remarkable results on weaker hardware and with lower memory bandwidth; simply paying Nvidia more isn’t the only way to make better models.

    That noted, there are three factors still in Nvidia’s favor. First, how capable might DeepSeek’s approach be if applied to H100s, or upcoming GB100s? Just because they found a more efficient way to use compute doesn’t mean that more compute wouldn’t be useful. Second, lower inference costs should, in the long run, drive greater usage. Microsoft CEO Satya Nadella, in a late night tweet almost assuredly directed at the market, said exactly that:

    Third, reasoning models like R1 and o1 derive their superior performance from using more compute. To the extent that increasing the power and capabilities of AI depend on more compute is the extent that Nvidia stands to benefit!

    Still, it’s not all rosy. At a minimum DeepSeek’s efficiency and broad availability cast significant doubt on the most optimistic Nvidia growth story, at least in the near term. The payoffs from both model and infrastructure optimization also suggest there are significant gains to be had from exploring alternative approaches to inference in particular. For example, it might be much more plausible to run inference on a standalone AMD GPU, completely sidestepping AMD’s inferior chip-to-chip communications capability. Reasoning models also increase the payoff for inference-only chips that are even more specialized than Nvidia’s GPUs.

    In short, Nvidia isn’t going anywhere; the Nvidia stock, however, is suddenly facing a lot more uncertainty that hasn’t been priced in. And that, by extension, is going to drag everyone down.

    So what about the chip ban?

    The easiest argument to make is that the importance of the chip ban has only been accentuated given the U.S.’s rapidly evaporating lead in software. Software and knowhow can’t be embargoed — we’ve had these debates and realizations before — but chips are physical objects and the U.S. is justified in keeping them away from China.

    At the same time, there should be some humility about the fact that earlier iterations of the chip ban seem to have directly led to DeepSeek’s innovations. Those innovations, moreover, would extend to not just smuggled Nvidia chips or nerfed ones like the H800, but to Huawei’s Ascend chips as well. Indeed, you can very much make the case that the primary outcome of the chip ban is today’s crash in Nvidia’s stock price.

    What concerns me is the mindset undergirding something like the chip ban: instead of competing through innovation in the future the U.S. is competing through the denial of innovation in the past. Yes, this may help in the short term — again, DeepSeek would be even more effective with more compute — but in the long run it simply sows the seeds for competition in an industry — chips and semiconductor equipment — over which the U.S. has a dominant position.

    Like AI models?

    AI models are a great example. I mentioned above I would get to OpenAI’s greatest crime, which I consider to be the 2023 Biden Executive Order on AI. I wrote in Attenuating Innovation:

    The point is this: if you accept the premise that regulation locks in incumbents, then it sure is notable that the early AI winners seem the most invested in generating alarm in Washington, D.C. about AI. This despite the fact that their concern is apparently not sufficiently high to, you know, stop their work. No, they are the responsible ones, the ones who care enough to call for regulation; all the better if concerns about imagined harms kneecap inevitable competitors.

    That paragraph was about OpenAI specifically, and the broader San Francisco AI community generally. For years now we have been subject to hand-wringing about the dangers of AI by the exact same people committed to building it — and controlling it. These alleged dangers were the impetus for OpenAI becoming closed back in 2019 with the release of GPT-2:

    Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights…We are aware that some researchers have the technical capacity to reproduce and open source our results. We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems.

    We also think governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems. If pursued, these efforts could yield a better evidence base for decisions by AI labs and governments regarding publication decisions and AI policy more broadly.

    The arrogance in this statement is only surpassed by the futility: here we are six years later, and the entire world has access to the weights of a dramatically superior model. OpenAI’s gambit for control — enforced by the U.S. government — has utterly failed. In the meantime, how much innovation has been foregone by virtue of leading edge models not having open weights? More generally, how much time and energy has been spent lobbying for a government-enforced moat that DeepSeek just obliterated, that would have been better devoted to actual innovation?

    So you’re not worried about AI doom scenarios?

    I definitely understand the concern, and just noted above that we are reaching the stage where AIs are training AIs and learning reasoning on their own. I recognize, though, that there is no stopping this train. More than that, this is exactly why openness is so important: we need more AIs in the world, not an unaccountable board ruling all of us.

    Wait, why is China open-sourcing their model?

    Well DeepSeek is, to be clear; CEO Liang Wenfeng said in a must-read interview that open source is key to attracting talent:

    In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat.

    Open source, publishing papers, in fact, do not cost us anything. For technical talent, having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one, and contributing to it earns us respect. There is also a cultural attraction for a company to do this.

    The interviewer asked if this would change:

    DeepSeek, right now, has a kind of idealistic aura reminiscent of the early days of OpenAI, and it’s open source. Will you change to closed source later on? Both OpenAI and Mistral moved from open-source to closed-source.

    We will not change to closed source. We believe having a strong technical ecosystem first is more important.

    This actually makes sense beyond idealism. If models are commodities — and they are certainly looking that way — then long-term differentiation comes from having a superior cost structure; that is exactly what DeepSeek has delivered, which itself is resonant of how China has come to dominate other industries. This is also contrary to how most U.S. companies think about differentiation, which is through having differentiated products that can sustain larger margins.

    So is OpenAI screwed?

    Not necessarily. ChatGPT made OpenAI the accidental consumer tech company, which is to say a product company; there is a route to building a sustainable consumer business on commoditizable models through some combination of subscriptions and advertisements. And, of course, there is the bet on winning the race to AI take-off.

    Anthropic, on the other hand, is probably the biggest loser of the weekend. DeepSeek made it to number one in the App Store, simply highlighting how Claude, in contrast, hasn’t gotten any traction outside of San Francisco. The API business is doing better, but API businesses in general are the most susceptible to the commoditization trends that seem inevitable (and do note that OpenAI and Anthropic’s inference costs look a lot higher than DeepSeek’s because they were capturing a lot of margin; that’s going away).

    So this is all pretty depressing, then?

    Actually, no. I think that DeepSeek has provided a massive gift to nearly everyone. The biggest winners are consumers and businesses who can anticipate a future of effectively-free AI products and services. Jevons Paradox will rule the day in the long run, and everyone who uses AI will be the biggest winners.

    Another set of winners are the big consumer tech companies. A world of free AI is a world where product and distribution matters most, and those companies already won that game; The End of the Beginning was right.

    China is also a big winner, in ways that I suspect will only become apparent over time. Not only does the country have access to DeepSeek, but I suspect that DeepSeek’s success relative to America’s leading AI labs will result in a further unleashing of Chinese innovation as they realize they can compete.

    That leaves America, and a choice we have to make. We could, for very logical reasons, double down on defensive measures, like massively expanding the chip ban and imposing a permission-based regulatory regime on chips and semiconductor equipment that mirrors the E.U.’s approach to tech; alternatively, we could realize that we have real competition, and actually give ourselves permission to compete. Stop wringing our hands, stop campaigning for regulations — indeed, go the other way, and cut out all of the cruft in our companies that has nothing to do with winning. If we choose to compete we can still win, and, if we do, we will have a Chinese company to thank.

    I wrote a follow-up to this Article in this Daily Update.




  • Stratechery Plus + Asianometry

    Listen to this post:

    Back in 2022, I rebranded a Stratechery subscription as Stratechery Plus, a bundle of content that would enhance the value of your subscription; today the bundle includes:

    Today I am excited to announce a new addition: the Asianometry newsletter and podcast, by Jon Yu.

    Asianometry is one of the best tech YouTube channels in existence, with over 768,000 subscribers. Jon produces in-depth videos explaining every aspect of technology, with a particular expertise in semiconductors. To give you an idea of Jon’s depth, he has made 31 videos about TSMC alone. His semiconductor course includes 30 videos covering everything from designing chips to how ASML builds EUV machines to Moore’s Law. His video on the end of Dennard’s Law is a particular standout:

    Jon is about more than semiconductors though: he’s made videos about other tech topics like The Tragedy of Compaq, and non-tech topics like Japanese Whisky and Taiwan convenience stores. In short, Jon is an intensely curious person who does his research, and we are blessed that he puts in the work to share what he learns.

    I am blessed most of all, however. Over the last year Jon has been making Stratechery Articles into video essays and cutting clips for Sharp Tech; he did a great job with one of my favorite articles of 2024:

    And now, starting today, Stratechery Plus subscribers can get exclusive access to Asianometry’s content in newsletter and podcast form. The Asianometry YouTube Channel will remain free and Jon’s primary focus, but from now on all of his content will be simultaneously released as a transcript and podcast. Stratechery Plus subscribers can head over to the new Asianometry Passport site to subscribe to his emails, or to add the podcast feed to your favorite podcast player.

    And, of course, subscribe to Jon’s YouTube channel, along with Stratechery Plus.




  • AI’s Uneven Arrival

    Listen to this post:

    Box’s route to its IPO, ten years ago this month, was a difficult one: the company first released an S-1 in March 2014, and potential investors were aghast at the company’s mounting losses; the company took a down round and, eight months later, released an updated S-1 that created the template for money-losing SaaS businesses to explain themselves going forward:

    Our business model focuses on maximizing the lifetime value of a customer relationship. We make significant investments in acquiring new customers and believe that we will be able to achieve a positive return on these investments by retaining customers and expanding the size of our deployments within our customer base over time…

    We experience a range of profitability with our customers depending in large part upon what stage of the customer phase they are in. We generally incur higher sales and marketing expenses for new customers and existing customers who are still in an expanding stage…For typical customers who are renewing their Box subscriptions, our associated sales and marketing expenses are significantly less than the revenue we recognize from those customers.

    SaaS company cohort analysis

    This was the justification for those top-line losses; I wrote in an Update at the time:

    That right there is the SaaS business model: you’re not so much selling a product as you are creating annuities with a lifetime value that far exceeds whatever you paid to acquire them. Moreover, if the model is working — and in retrospect, we know it has for that 2010 cohort — then I as an investor absolutely would want Box to spend even more on customer acquisition, which, of course, Box has done. The 2011 cohort is bigger than 2010, the 2012 cohort bigger than 2011, etc. This, though, has meant that the aggregate losses have been very large, which looks bad, but, counterintuitively, is a good thing.

    Numerous SaaS businesses would include some version of this cohort chart in their S-1’s, each of them manifestations of what I’ve long considered tech’s sixth giant: Apple, Amazon, Google, Meta, Microsoft, and what I call “Silicon Valley Inc.”, the pipeline of SaaS companies that styled themselves as world-changing startups but which were, in fact, color-by-numbers business model disruptions enabled by cloud computing and a dramatically expanded venture capital ecosystem that increasingly accepted relatively low returns in exchange for massively reduced risk profiles.

    This is not, to be clear, an Article about Box, or any one SaaS company in particular; it is, though, an exploration of how an era that opened — at least in terms of IPOs — a decade ago is both doomed in the long run and yet might have more staying power than you expect.

    Digital Advertising Differences

    John Wanamaker, a department store founder and advertising pioneer, famously said, “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” That, though, was the late 19th century; the last two decades have seen the rise of digital advertising, the defining characteristic of which is knowledge about who is being targeted, and whether or not they converted. The specifics of how this works have shifted over time, particularly with the crackdown on cookies and Apple’s App Tracking Transparency initiative, which made digital advertising less deterministic and more probabilistic; the probabilities at play, though, are a lot closer to 100% than they are to a flip-of-a-coin.

    What is interesting is that this advertising approach hasn’t always worked for everything, most notably some of the most advertising-centric businesses in the world. Back in 2016 Procter & Gamble announced they were scaling back targeted Facebook ads; from the Wall Street Journal:

    Procter & Gamble Co., the biggest advertising spender in the world, will move away from ads on Facebook that target specific consumers, concluding that the practice has limited effectiveness. Facebook Inc. has spent years developing its ability to zero in on consumers based on demographics, shopping habits and life milestones. P&G, the maker of myriad household goods including Tide and Pampers, initially jumped at the opportunity to market directly to subsets of shoppers, from teenage shavers to first-time homeowners.

    Marc Pritchard, P&G’s chief marketing officer, said the company has realized it took the strategy too far. “We targeted too much, and we went too narrow,” he said in an interview, “and now we’re looking at: What is the best way to get the most reach but also the right precision?”…On a broader scale, P&G’s shift highlights the limits of such targeting for big brands, one of the cornerstones of Facebook’s ad business. The social network is able to command higher prices for its targeted marketing; the narrower the targeting the more expensive the ad.

    P&G is a consumer packaged goods (CPG) company, and what mattered most for CPG companies was shelf space. Consumers would become aware of a brand through advertising, motivated to buy through things like coupons, and the payoff came when they were in the store and chose one of the CPG brands off the shelf; of course CPG companies paid for that shelf space, particularly coveted end-caps that made it more likely consumers saw the brands they were familiar with through advertising. There were returns to scale, as well: manufacturing is a big one; the more advertising you bought the less you paid per ad; more importantly, the more shelf space you had the more room you had to expand your product lines, and crowd out competitors.

    The advertising component specifically was usually outsourced to ad agencies, for reasons I explained in a 2017 Article:

    Few advertisers actually buy ads, at least not directly. Way back in 1841, Volney B. Palmer, the first ad agency, was opened in Philadelphia. In place of having to take out ads with multiple newspapers, an advertiser could deal directly with the ad agency, vastly simplifying the process of taking out ads. The ad agency, meanwhile, could leverage its relationships with all of those newspapers by serving multiple clients:

    A drawing of The Pre-Internet Ad Agency Structure

    It’s a classic example of how being in the middle can be a really great business opportunity, and the utility of ad agencies only increased as more advertising formats like radio and TV became available. Particularly in the case of TV, advertisers not only needed to place ads, but also needed a lot more help in making ads; ad agencies invested in ad-making expertise because they could scale said expertise across multiple clients.

    At the same time, the advertisers were rapidly expanding their geographic footprints, particularly after the Second World War; naturally, ad agencies increased their footprint at the same time, often through M&A. The overarching business opportunity, though, was the same: give advertisers a one-stop shop for all of their advertising needs.

    The Internet provided two big challenges to this approach. First, the primary conversion point changed from the cash register to the check-out page; the products that benefited the most were either purely digital (like apps) or — at least in the earlier days of e-commerce — spur-of-the-moment purchases without major time pressure. CPG products didn’t really fall in either bucket.

    Second, these types of purchases aligned well with the organizing principle of digital advertising, which is the individual consumer. What Facebook — now Meta — is better at than anyone in the world is understanding consumers not as members of a cohort or demographic group but rather as individuals, and serving them ads that are uniquely interesting to them.

    Notice, though, that nothing in the traditional advertiser model was concerned with the individual: brands are created for cohorts or demographic groups, because they need to be manufactured at scale; then, ad agencies would advertise at scale — making money along the way — and the purchase would be consummated in physical stores at some later point in time, constrained (and propelled by) limited shelf space. Thus P&G’s pullback — and thus the opportunity for an entirely new wave of companies that were built around digital advertising and its deep personalization from the get-go.

    This bifurcation manifested itself most starkly in the summer of 2020, when large advertisers boycotted Facebook over the company’s refusal to censor then-President Trump; Facebook was barely affected. I wrote in Apple and Facebook:

    This is a very different picture from Facebook, where as of Q1 2019 the top 100 advertisers made up less than 20% of the company’s ad revenue; most of the $69.7 billion the company brought in last year came from its long tail of 8 million advertisers…

    This explains why the news about large CPG companies boycotting Facebook is, from a financial perspective, simply not a big deal. Unilever’s $11.8 million in U.S. ad spend, to take one example, is replaced with the same automated efficiency that Facebook’s timeline ensures you never run out of content. Moreover, while Facebook loses some top-line revenue — in an auction-based system, less demand corresponds to lower prices — the companies that are the most likely to take advantage of those lower prices are those that would not exist without Facebook, like the direct-to-consumer companies trying to steal customers from massive conglomerates like Unilever.

    In this way Facebook has a degree of anti-fragility that even Google lacks: so much of its business comes from the long tail of Internet-native companies that are built around Facebook from first principles, that any disruption to traditional advertisers — like the coronavirus crisis or the current boycotts — actually serves to strengthen the Facebook ecosystem at the expense of the TV-centric ecosystem of which these CPG companies are a part.

    It has been nine years since that P&G pullback I referenced above, and one of the big changes that P&G has made in that timeframe is to take most of their ad-buying in-house. This was in the long run inevitable, as the Internet ate everything, including traditional TV viewing, and as the rise of Aggregation platforms meant that the number of places you needed to actually buy an ad to reach everyone decreased even as potential reach increased. Those platforms also got better: programmatic platforms achieve P&G’s goal of mass reach in a way that actually increased efficiency instead of over-spending to over-target; programmatic advertising also covers more platforms now, including TV.

    o3 Ammunition

    Late last month OpenAI announced its o3 model, validating its initial o1 release and the returns that come from test-time scaling; I explained in an Update when o1 was released:

    There has been a lot of talk about the importance of scale in terms of LLM performance; for auto-regressive LLMs that has meant training scale. The more parameters you have, the larger the infrastructure you need, but the payoff is greater accuracy because the model is incorporating that much more information. That certainly still applies to o1, as the chart on the left indicates.

    o1 scales with both training and inference compute

    It’s the chart on the right that is the bigger deal: o1 gets more accurate the more time it spends on compute at inference time. This makes sense intuitively given what I laid out above: the more time spent on compute the more time o1 can spend spinning up multiple chains-of-thought, checking its answers, and iterating through different approaches and solutions.

    It’s also a big departure from how we have thought about LLMs to date: one of the “benefits” of auto-regressive LLMs is that you’re only generating one answer in a serial manner. Yes, you can get that answer faster with beefier hardware, but that is another way of saying that the pay-off from more inference compute is getting the answer faster; the accuracy of the answer is a function of the underlying model, not the amount of compute brought to bear. Another way to think about it is that the more important question for inference is how much memory is available; the more memory there is, the larger the model, and therefore, the greater amount of accuracy.

    In this o1 represents a new inference paradigm: yes, you need memory to load the model, but given the same model, answer quality does improve with more compute. The way that I am thinking about it is that more compute is kind of like having more branch predictors, which mean more registers, which require more cache, etc.; this isn’t a perfect analogy, but it is interesting to think about inference compute as being a sort of dynamic memory architecture for LLMs that lets them explore latent space for the best answer.
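
    One simple way to see how answer quality can scale with inference compute is self-consistency: sample several chains of thought and take a majority vote over the final answers. To be clear, this is not how o1 actually works under the hood, just the most basic illustration of spending more compute at inference time; sample_answer is a hypothetical model call.

    ```python
    from collections import Counter

    def answer_with_more_compute(question: str, sample_answer, n_samples: int) -> str:
        """Spend more inference compute by sampling several chains of thought and
        returning the most common final answer (self-consistency voting)."""
        finals = [sample_answer(question) for _ in range(n_samples)]  # hypothetical model call
        return Counter(finals).most_common(1)[0][0]

    # Doubling n_samples doubles inference cost; on many reasoning benchmarks the
    # accuracy of the voted answer keeps improving as n_samples grows, which is the
    # basic intuition behind test-time scaling.
    ```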

    o3 significantly outperforms o1, and the extent of that outperformance is dictated by how much computing is allocated to the problem at hand. One of the most stark examples was o3’s performance on the ARC prize, a visual puzzle test that is designed to be easy for humans but hard for LLMs:

    OpenAI’s new o3 system – trained on the ARC-AGI-1 Public Training set – has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit. A high-compute (172x) o3 configuration scored 87.5%.

    o3 test results on ARC

    This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models. For context, ARC-AGI-1 took 4 years to go from 0% with GPT-3 in 2020 to 5% in 2024 with GPT-4o. All intuition about AI capabilities will need to get updated for o3…

    Despite the significant cost per task, these numbers aren’t just the result of applying brute force compute to the benchmark. OpenAI’s new o3 model represents a significant leap forward in AI’s ability to adapt to novel tasks. This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs. o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain.

    Of course, such generality comes at a steep cost, and wouldn’t quite be economical yet: you could pay a human to solve ARC-AGI tasks for roughly $5 per task (we know, we did that), while consuming mere cents in energy. Meanwhile o3 requires $17-20 per task in the low-compute mode. But cost-performance will likely improve quite dramatically over the next few months and years, so you should plan for these capabilities to become competitive with human work within a fairly short timeline.

    I don’t believe that o3 and inference-time scaling will displace traditional LLMs, which will remain both faster and cheaper; indeed, they will likely make traditional LLMs better through their ability to generate synthetic data for further scaling of pre-training. There remains a large product overhang for traditional LLMs — the technology is far more capable than the products that have been developed to date — but even the currently dominant product, the chatbot, is better experienced with a traditional LLM.

    That very use case, however, gets at traditional LLM limitations: because they lack the ability to think and decide and verify they are best thought of as a tool for humans to leverage. Indeed, while conventional wisdom about these models is that it allows anyone to generate good enough writing and research, the biggest returns come to those with the most expertise and agency, who are able to use their own knowledge and judgment to reap efficiency gains while managing hallucinations and mistakes.

    What o3 and inference-time scaling point to is something different: AI’s that can actually be given tasks and trusted to complete them. This, by extension, looks a lot more like an independent worker than an assistant — ammunition, rather than a rifle sight. That may seem an odd analogy, but it comes from a talk Keith Rabois gave at Stanford:

    So I like this idea of barrels and ammunition. Most companies, once they get into hiring mode…just hire a lot of people, you expect that when you add more people your horsepower or your velocity of shipping things is going to increase. Turns out it doesn’t work that way. When you hire more engineers you don’t get that much more done. You actually sometimes get less done. You hire more designers, you definitely don’t get more done, you get less done in a day.

    The reason why is because most great people actually are ammunition. But what you need in your company are barrels. And you can only shoot through the number of unique barrels that you have. That’s how the velocity of your company improves is adding barrels. Then you stock them with ammunition, then you can do a lot. You go from one barrel company, which is mostly how you start, to a two barrel company, suddenly you get twice as many things done in a day, per week, per quarter. If you go to three barrels, great. If you go to four barrels, awesome. Barrels are very difficult to find. But when you have them, give them lots of equity. Promote them, take them to dinner every week, because they are virtually irreplaceable. They are also very culturally specific. So a barrel at one company may not be a barrel at another company because one of the ways, the definition of a barrel is, they can take an idea from conception and take it all the way to shipping and bring people with them. And that’s a very cultural skill set.

    The promise of AI generally, and inference-time scaling models in particular, is that they can be ammunition; in this context, the costs — even marginal ones — will in the long run be immaterial compared to the costs of people, particularly once you factor in non-salary costs like coordination and motivation.

    The Uneven AI Arrival

    There is a long way to go to realize this vision technically, although the arrival of first o1 and then o3 signal that the future is arriving more quickly than most people realize. OpenAI CEO Sam Altman wrote on his blog:

    We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.

    I grant the technical optimism; my definition of AGI is that it can be ammunition, i.e. it can be given a task and trusted to complete it at a good-enough rate (my definition of Artificial Super Intelligence (ASI) is the ability to come up with the tasks in the first place). The reason for the extended digression on advertising, however, is to explain why I’m skeptical about AI “materially chang[ing] the output of companies”, at least in 2025.

    In this analogy CPG companies stand in for the corporate world generally. What will become clear once AI ammunition becomes available is just how unsuited most companies are for high precision agents, just as P&G was unsuited for highly-targeted advertising. No matter how well-documented a company’s processes might be, it will become clear that there are massive gaps that were filled through experience and tacit knowledge by the human ammunition.

    SaaS companies, meanwhile, are the ad agencies. The ad agencies had value by providing a means for advertisers to scale to all sorts of media across geographies; SaaS companies have value by giving human ammunition software to do their job. Ad agencies, meanwhile, made money by charging a commission on the advertising they bought; SaaS companies make money by charging a per-seat licensing fee. Look again at that S-1 excerpt I opened with:

    Our business model focuses on maximizing the lifetime value of a customer relationship. We make significant investments in acquiring new customers and believe that we will be able to achieve a positive return on these investments by retaining customers and expanding the size of our deployments within our customer base over time…

    The positive return on investment comes from retaining and increasing seat licenses; those seats, however, are proxies for actually getting work done, just as advertising was just a proxy for actually selling something. Part of what made direct response digital advertising fundamentally different is that it was tied to actually making a sale, as opposed to lifting brand awareness, which is a proxy for the ultimate goal of increasing revenue. To that end, AI — particularly AI’s like o3 that scale with compute — will be priced according to the value of the task they complete; the amount that companies will pay for inference time compute will be a function of how much the task is worth. This is analogous to digital ads that are priced by conversion, not CPM.

    The companies that actually leveraged that capability, however, were not, at least for a good long while, the companies that dominated the old advertising paradigm. Facebook became a juggernaut by creating its own customer base, not by being the advertising platform of choice for companies like P&G; meanwhile, TV and the economy built on it stayed relevant far longer than anyone expected. And, by the time TV truly collapsed, both the old guard and digital advertising had evolved to the point that they could work together.

    If something similar plays out with AI agents, then the most important AI customers will primarily be new companies, and probably a lot of them will be long-tail entities that take the barrel and ammunition analogy to its logical extreme. Traditional companies, meanwhile, will struggle to incorporate AI (outside of wholesale job replacement a la the mainframe); the true AI takeover of enterprises that retain real-world differentiation will likely take years.

    None of this is to diminish what is coming with AI; rather, as the saying goes, the future may arrive but be unevenly distributed, and, contrary to what you might think, the larger and more successful a company is, the less it may benefit in the short term. Everything that makes a company work today is about harnessing people — and the entire SaaS ecosystem is predicated on monetizing this reality; the entities that will truly leverage AI, however, will not be the ones that replace those people, but the ones that start without them.





  • The 2024 Stratechery Year in Review

    Stratechery, incredibly enough, has been my full-time job for over a decade; this is the 12th year-in-review. Here are the previous editions:

    2023 | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013

    It has long been a useful cliché to say that covering tech is easy, because something is always happening; now that that something is AI, that is more true than ever. Nearly every Article on Stratechery this year was about AI in some way or another, and that is likely to be true for years to come.

    A drawing of The Generative AI Bridge

    This year Stratechery published 29 free Articles, 109 subscriber Updates, and 40 Interviews. Today, as per tradition, I summarize the most popular and most important posts of the year.

    The Five Most-Viewed Articles

    The five most-viewed articles on Stratechery according to page views:

    1. Intel Honesty — The best way to both save Intel and have leading edge manufacturing in the U.S. is to split the company, and for the U.S. government to pick up the bill via purchase guarantees.
    2. Gemini and Google’s Culture — The Google Gemini fiasco shows that the biggest challenge for Google in AI is not business model but rather company culture; change is needed from the top down.
    3. Intel’s Humbling — Intel under Pat Gelsinger is reaping the disaster that came from a lack of investment and execution a decade ago; the company, though, appears to be headed in the right direction, as evidenced by its execution and recent deal with UMC.
    4. The Apple Vision Pro — The Apple Vision Pro is a disappointment for productivity, in part because of choices made to deliver a remarkable entertainment experience. Plus, the future of AR/VR for Apple and Meta.
    5. MKBHDs For Everything — Marques Brownlee has tremendous power because he can go direct to consumers; that is possible in media, and AI will make it possible everywhere.

    A drawing of The Carrot and Stick of the iTunes Music Store

    AI and the Future

    Looking ahead to how AI will change everything.

    • Enterprise Philosophy and The First Wave of AI — The first wave of successful AI implementations will probably look more like the first wave of computing, which was dominated by large-scale enterprise installations that eliminated jobs. Consumer will come later. YouTube
    • The Gen AI Bridge to the Future — Generative AI is the bridge to the next computing paradigm of wearables, just like the Internet bridged the gap from PCs to smartphones.
    • The New York Times’ AI Opportunity — The New York Times is suing OpenAI, but it is the New York Times that stands to benefit the most from large language models, thanks to its transformation to being an Internet entity. YouTube
    • AI Integration and Modularization — Breaking down the Big Tech AI landscape through the lens of integration and modularization. YouTube
    • Aggregator’s AI Risk — A single AI can never make everyone happy, which is fundamentally threatening to the Aggregator business model; the solution is personalized AI. YouTube

    A drawing of Google's Integrated AI Stack

    Government and Regulation

    An emerging theme this year — which I expect to continue alongside AI — is the rising importance of non-economic factors in terms of technological development, even as regulators ramp up pressure on the giants of the Aggregator era.

    • A Chance to Build — Silicon Valley has always been deeply integrated with Asia; Trump’s attempt to change trade could hurt Silicon Valley more than expected, and also present opportunities to build something new. YouTube
    • Intel’s Death and Potential Revival — Intel died when mobile cost it its software differentiation; if the U.S. wants a domestic foundry, then it ought to leverage the need for AI chips to make an independent Intel foundry viable. YouTube
    • The E.U. Goes Too Far — Recent E.U. regulatory decisions cross the line from market correction to property theft; if the E.U. continues down this path they are likely to see fewer new features and no new companies. YouTube
    • Friendly Google and Enemy Remedies — The DOJ brought the right kind of case against an Aggregator, which stagnates by being too nice; the goal is for companies to act like they actually have enemies. YouTube
    • United States v. Apple — Apple is being sued by the DOJ, but most of the complaints aren’t about the App Store. I think, though, Apple’s approach to the App Store is what led to this case.

    A drawing of Integrated Versus Modular: Smartphones

    Big Tech

    The biggest tech companies, as usual, provided the most consistent lens on how the world is changing.

    A drawing of Social Media and the Smiling Curve


    Other Articles this year included: The Apple Vision Pro’s Missing Apps | Sora, Groq, and Virtual Reality | Meta and Open | Meta and Reasonable Doubt | The Great Flattening | Windows Returns | Crashes and Competition | Boomer Apple

    Stratechery Interviews

    Thursdays are for the Stratechery Interview — in podcast and transcript form — with public company executives, founders and private company executives, and other analysts.

    Public Company Executive Interviews:

    Arm CEO Rene Haas | Netflix co-CEO Greg Peters | Zoom CEO Eric Yuan | dLocal Founder Sebastian Kanovich and CEO Pedro Arnt | Google Cloud CEO Thomas Kurian | Walmart CEO Doug McMillon | Microsoft CEO Satya Nadella and CTO Kevin Scott | AMD CEO Lisa Su | Google SVP Rick Osterloh | Zillow CEO Jeremy Wacksman | Meta CTO Andrew Bosworth | Salesforce CEO Marc Benioff | Synopsys CEO Sassine Ghazi

    Startup/Private Company Executive Interviews:

    Rescale CEO Joris Poort | Databricks CEO Ali Ghodsi | Terraform Industries CEO Casey Handmer | Scale AI CEO Alex Wang | Canva CEO Melanie Perkins

    Analysts:

    Om Malik on tech history | Joanna Stern on the Apple Vision Pro | Eric Seufert on digital advertising in February and October | Matthew Ball on VR and gaming | Daniel Gross and Nat Friedman on AI in February and June | Hugo Barra on AR and VR in March and October | Benedict Evans on regulation and AI | Michael Morton on e-commerce | Matthew Belloni on Hollywood and streaming | Marques Brownlee (MKBHD) on YouTube | Ben Bajarin on Apple and Intel | Craig Moffett on Apple and telecoms | Gregory Allen in October on the U.S. defense industry, and December on the China chip ban | Timothy B. Lee on AI and self-driving cars | Dylan Patel and Doug O’Laughlin on the semiconductor industry | Byrne Hobart on innovation | Tae Kim on The Nvidia Way

    The Year in Stratechery Updates

    Some of my favorite Stratechery Updates:


    I am so grateful to the subscribers that make it possible for me to do this as a job. I wish all of you a Merry Christmas and Happy New Year, and I’m looking forward to a great 2025!




  • Intel’s Death and Potential Revival

    Listen to this post:

    In 1980 IBM, under pressure from its customers to provide computers for personal use, not just mainframes, set out to create the IBM PC; given the project’s low internal priority but high external demand they decided to outsource two critical components: Microsoft would provide the DOS operating system, which would run on the Intel 8088 processor.

    The original IBM PC

    Those two deals would shape the computing industry for the following 27 years. Given that the point of the personal computer was to run applications, the operating system that provided the APIs for those applications would have unassailable lock-in, leading to Microsoft’s dominance with first DOS and then Windows, which was backwards compatible.

    The 8088 processor, meanwhile, was a low-cost variant of the 8086 processor; up to that point most new processors came with their own instruction set, but the 8088 and 8086 used the same instruction set, which became the foundation of Intel processors going forward. That meant that the 286 and 386 processors that followed were backwards compatible with the 8088 in the IBM PC; in other words, Intel, too, had lock-in, and not just with MS-DOS: while the majority of applications leveraged operating system-provided APIs, it was much more common at that point in computing history to leverage lower level APIs, including calling on the processor instruction set directly. This was particularly pertinent for things like drivers, which powered all of the various peripherals a PC required.

    Intel’s CISC Moat

    The 8086 processor that undergirded the x86 instruction set was introduced in 1978, when memory was limited, expensive, and slow; that’s why the x86 used Complex Instruction Set Computing (CISC), which combined multiple steps into a single instruction. The price of this complexity was the necessity of microcode, dedicated logic that translated CISC instructions into their component steps so they could actually be executed.

    The same year that IBM cut those deals, however, was the year that David Patterson and a team at Berkeley started work on what became known as the RISC-1 processor, which took an entirely different approach: Reduced Instruction Set Computing (RISC) replaced the microcode-focused transistors with registers, i.e. memory that operated at the same speed as the processor itself, and filled them with simple instructions that corresponded directly to transistor functionality. This would, in theory, allow for faster computing with the same number of transistors, but memory access was still expensive and more likely to be invoked given the greater number of instructions necessary to do anything, and programs and compilers needed to be completely reworked to take advantage of the new approach.
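
    To make the contrast concrete, here is a minimal, purely illustrative sketch: a single CISC-style memory-to-memory add next to the load/add/store sequence a RISC design would require. The toy mnemonics and register model are my own assumptions for illustration, not any real instruction set.

```python
# Purely illustrative toy machine contrasting one CISC-style instruction with
# the equivalent sequence of RISC-style instructions. The mnemonics and
# register model are hypothetical, not any real instruction set.

def cisc_add_mem(memory, dst, src):
    """One 'complex' instruction: a memory-to-memory add in a single step.
    On real CISC hardware, microcode expands this into internal micro-steps."""
    memory[dst] += memory[src]

def risc_add_mem(memory, dst, src):
    """The same work as four 'simple' instructions: load, load, add, store.
    Each step corresponds directly to a basic hardware operation."""
    registers = {}
    registers["r1"] = memory[dst]                         # LOAD  r1, [dst]
    registers["r2"] = memory[src]                         # LOAD  r2, [src]
    registers["r1"] = registers["r1"] + registers["r2"]   # ADD   r1, r1, r2
    memory[dst] = registers["r1"]                         # STORE [dst], r1

a = {0x10: 7, 0x14: 5}
b = dict(a)
cisc_add_mem(a, 0x10, 0x14)
risc_add_mem(b, 0x10, 0x14)
assert a == b == {0x10: 12, 0x14: 5}   # same result, different instruction granularity
```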

    Intel, more than anyone, realized that this would be manageable in the long run. “Moore’s Law”, the observation that the number of transistors in an integrated circuit doubles every two years, was coined by Gordon Moore, their co-founder and second CEO; the implication for instruction sets was that increased software complexity and slow hardware would be solved through ever faster chips, and those chips could get even faster if they were simplified RISC designs. That is why most of the company wanted, in the mid-1980s, to abandon x86 and its CISC instruction set for RISC.

    There was one man, however, who interpreted the Moore’s Law implications differently, and that was Pat Gelsinger; he led the development of the 486 processor and was adamant that Intel stick with CISC, as he explained in an oral history at the Computer History Museum:

    Gelsinger: We had a mutual friend that found out that we had Mr. CISC working as a student of Mr. RISC, the commercial versus the university, the old versus the new, teacher versus student. We had public debates of John and Pat. And Bear Stearns had a big investor conference, a couple thousand people in the audience, and there was a public debate of RISC versus CISC at the time, of John versus Pat.

    And I start laying out the dogma of instruction set compatibility, architectural coherence, how software always becomes the determinant of any computer architecture being developed. “Software follows instruction set. Instruction set follows Moore’s Law. And unless you’re 10X better and John, you’re not 10X better, you’re lucky if you’re 2X better, Moore’s Law will just swamp you over time because architectural compatibility becomes so dominant in the adoption of any new computer platform.” And this is when x86– there was no server x86. There’s no clouds at this point in time. And John and I got into this big public debate and it was so popular.

    Brock: So the claim wasn’t that the CISC could beat the RISC or keep up to what exactly but the other overwhelming factors would make it the winner in the end.

    Gelsinger: Exactly. The argument was based on three fundamental tenets. One is that the gap was dramatically overstated and it wasn’t an asymptotic gap. There was a complexity gap associated with it but you’re going to make it leap up and that the CISC architecture could continue to benefit from Moore’s Law. And that Moore’s Law would continue to carry that forward based on simple ones, number of transistors to attack the CISC problems, frequency of transistors. You’ve got performance for free. And if that gap was in a reasonable frame, you know, if it’s less than 2x, hey, in a Moore’s Law’s term that’s less than a process generation. And the process generation is two years long. So how long does it take you to develop new software, porting operating systems, creating optimized compilers? If it’s less than five years you’re doing extraordinary in building new software systems. So if that gap is less than five years I’m going to crush you John because you cannot possibly establish a new architectural framework for which I’m not going to beat you just based on Moore’s Law, and the natural aggregation of the computer architecture benefits that I can bring in a compatible machine. And, of course, I was right and he was wrong.

    Intel would, over time, create more RISC-like processors, switching out microcode for micro-ops processing units that dynamically generated RISC-like instructions from CISC-based software that maintained backwards compatibility; Gelsinger was right that no one wanted to take the time to rewrite all of the software that assumed an x86 instruction set when Intel processors were getting faster all of the time, and far out-pacing RISC alternatives thanks to Intel’s manufacturing prowess.
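
    To put rough numbers on Gelsinger’s argument, here is a back-of-the-envelope sketch; the doubling cadence, performance gap, and porting timeline are illustrative assumptions drawn loosely from the exchange above, not precise figures.

```python
# Back-of-the-envelope version of the CISC-versus-RISC argument quoted above.
# All numbers are illustrative assumptions, not figures from the interview.
import math

doubling_period_years = 2.0   # assumed Moore's Law cadence
risc_advantage = 2.0          # assumed RISC performance edge ("lucky if you're 2X better")
porting_time_years = 5.0      # assumed time to port operating systems, compilers, and apps

# Years of scaling the compatible incumbent needs to close the challenger's head start:
years_to_close_gap = doubling_period_years * math.log2(risc_advantage)

# How much the incumbent improves while the challenger is still porting software:
incumbent_gain_while_porting = 2 ** (porting_time_years / doubling_period_years)

print(f"A {risc_advantage:.0f}x gap closes in ~{years_to_close_gap:.0f} years (about one process generation)")
print(f"The incumbent gains ~{incumbent_gain_while_porting:.1f}x during a {porting_time_years:.0f}-year porting effort")
```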

    That, though, turned out to be Intel’s soft underbelly; while the late Intel CEO Paul Otellini claimed that he turned down the iPhone processor contract because of price, Tony Fadell, who led the creation of the iPod and iPhone hardware, told me in a Stratechery Interview that the real issue was Intel’s obsession with performance and neglect of efficiency.

    The new dimension that always came in with embedded computing was always the power element, because on battery-operated devices, you have to rethink how you do your interrupt structures, how you do your networking, how you do your memory. You have to think about so many other parameters when you think about power and doing enough processing effectively, while having long battery life. So everything for me was about long, long battery life and why do we do what we do? David Tupman was on the team, the iPod team with me, he would always say every nanocoulomb was sacred, and we would go after that and say, “Okay, where’s that next coulomb? Where are we going to go after it?” And so when you take that microscopic view of what you’re building, you look at the world very differently.

    For me, when it came to Intel at the time, back in the mid-2000s, they were always about, “Well, we’ll just repackage what we have on the desktop for the laptop and then we’ll repackage that again for embedding.” It reminded me of Windows saying, “I’m going to do Windows and then I’m going to do Windows Mobile and I’m going to do Windows embedded.” It was using those same cores and kernels and trying to slim them down.

    I was always going, “Look, do you see how the iPhone was created? It started with the guts of the iPod, and we grew up from very little computing, and very little space, and we grew into an iPhone, and added more layers to it.” But we weren’t taking something big and shrinking it down. We were starting from the bottom up and yeah, we were taking Mac OS and shrinking it down, but we were significantly shrinking it down. Most people don’t want to take those real hard cuts to everything because they’re too worried about compatibility. Whereas if you’re just taking pieces and not worrying about compatibility, it’s a very different way of thinking about how building and designing products happens.

    This is why I was so specific with that “27 year” reference above; Apple’s 2007 launch of the iPhone marked the end of both Microsoft and Intel’s dominance, and for the same reason. The shift to efficiency as the top priority meant that you needed to rewrite everything; that, by extension, meant that Microsoft’s API and Intel’s x86 instruction set were no longer moats but millstones. On the operating system side Apple stripped macOS to the bones and rebuilt it for efficiency; that became iOS, and the new foundation for apps. On the processor side Apple used processors based on the ARM instruction set, which was RISC from the beginning. Yes, that meant a lot of things had to be rewritten, but here the rewriting wasn’t happening by choice, but by necessity.

    This leads, as I remarked to Fadell in that interview, to a rather sympathetic interpretation of Microsoft and Intel’s failure to capture the mobile market; neither company had a chance. They were too invested in the dominant paradigm at the time, and thus unable to start from scratch; by the time they realized their mistake, Apple, Android, and ARM had already won.

    Intel’s Missed Opportunity

    It was their respective response to missing mobile that saved Microsoft, and doomed Intel. For the first seven years of the iPhone both companies refused to accept their failure, and tried desperately to leverage what they viewed as their unassailable advantages: Microsoft declined to put its productivity applications on iOS or Android, trying to get customers to adopt Windows Mobile, while Intel tried to bring its manufacturing prowess to bear to build processors that were sufficiently efficient while still being x86 compatible.

    It was in 2014 that their paths diverged: Microsoft named Satya Nadella its new CEO, and his first public decision was to launch Office on iPad. This was a declaration of purpose: Microsoft would no longer be defined by Windows, and would instead focus on Azure and the cloud; no, that didn’t have the software lock-in of Windows — particularly since a key Azure decision was shifting from Windows servers to Linux — but it was a business that met Microsoft’s customers where they were, and gave the company a route to participating in the massive business opportunities enabled by mobile (given that most apps are in fact cloud services), and eventually, AI.

    The equivalent choice for Intel would have been to start manufacturing ARM chips for 3rd parties, i.e. becoming a foundry instead of an integrated device manufacturer (IDM); I wrote that they should do exactly that in 2013:

    It is manufacturing capability, on the other hand, that is increasingly rare, and thus, increasingly valuable. In fact, today there are only four major foundries: Samsung, GlobalFoundries, Taiwan Semiconductor Manufacturing Company, and Intel. Only four companies have the capacity to build the chips that are in every mobile device today, and in everything tomorrow.

    Massive demand, limited suppliers, huge barriers to entry. It’s a good time to be a manufacturing company. It is, potentially, a good time to be Intel. After all, of those four companies, the most advanced, by a significant margin, is Intel. The only problem is that Intel sees themselves as a design company, come hell or high water.

    Making chips for other companies would have required an overhaul of Intel’s culture and processes for the sake of what was then a significantly lower margin opportunity; Intel wasn’t interested, and proceeded to make a ton of money building server chips for the cloud.

    In fact, though, the company was already fatally wounded. Mobile meant volume, and as the cost of new processes skyrocketed, the need for volume to leverage those costs skyrocketed as well. It was TSMC that met the moment, with Apple’s assistance: the iPhone maker would buy out the first year of every new process advancement, giving TSMC the confidence to invest, and eventually surpass Intel. That, in turn, benefited AMD, Intel’s long-time rival, which now fabbed its chips at TSMC and thus not only had better processor designs but, for the first time, a better process, leading to huge gains in the data center. All of that low-level work on ARM, meanwhile, helped make ARM in PCs and in the data center viable, putting further pressure on Intel’s core markets.

    AI was the final blow: not only did Intel not have a competitive product, it also did not have a foundry through which it could have benefitted from the exploding demand for AI chips; making matters worse is the fact that data center spending on GPUs is coming at the expense of traditional server chips, Intel’s core market.

    Intel’s Death

    The fundamental flaw with Pat Gelsinger’s 2021 return to Intel and his IDM 2.0 plan is that it was a decade too late. Gelsinger’s plan was to become a foundry, with Intel itself as its first and best customer. The former was the way to participate in mobile and AI and gain the volume necessary to push technology forward, which Intel has always done better than anyone else (EUV was the exception to the rule that Intel invents and introduces every new advance in processor technology); the latter was the way to fund the foundry and give it guaranteed volume.

    Again, this is exactly what Intel should have done a decade ago, while TSMC was still in their rear-view mirror in terms of processing technology, and when its products were still dominant in PCs and the data center. By the time Gelsinger came on board, though, it was already too late: Intel’s process was behind, its product market share was threatened on all of the fronts noted above, and high-performance ARM processors had been built by TSMC for years (which meant a big advantage in terms of pre-existing IP, design software, etc.). Intel brought nothing to the table as a foundry other than being a potential second source to TSMC, which, to make matters worse, has dramatically increased its investment in leading edge nodes to absorb that skyrocketing demand. Intel’s products, meanwhile, are either non-competitive (because they are made by Intel) or not-very-profitable (because they are made by TSMC), which means that Intel is simply running out of cash.

    Given this, you can make the case that Gelsinger was never the right person for the job; shortly after he took over I wrote in Intel Problems that the company needed to be split up, but he told me in a 2022 Stratechery Interview that he — and the board — weren’t interested in that:

    So last week, AMD briefly passed Intel in market value, and I think Nvidia did a while ago, and neither of these companies build their own chips. It’s kind of like an inverse of the Jerry Sanders quote about “Real men have fabs!” When you were contemplating your strategy for Intel as you came back, how much consideration was there about going the same path, becoming a fabless company and leaning into your design?

    PG: Let me give maybe three different answers to that question, and these become more intellectual as we go along. The first one was I wrote a strategy document for the board of directors and I said if you want to split the company in two, then you should hire a PE kind of guy to go do that, not me. My strategy is what’s become IDM 2.0 and I described it. So if you’re hiring me, that’s the strategy and 100% of the board asked me to be the CEO and supported the strategy I laid out, of which this is one of the pieces. So the first thing was all of that discussion happened before I took the job as the CEO, so there was no debate, no contemplation, et cetera, this is it.

    Fast forward to last week, and the Intel board — which is a long-running disaster — is no longer on board, firing Gelsinger in the process. And, to be honest, I noted a couple of months ago that Gelsinger’s plan probably wasn’t going to work without a split and a massive cash infusion from the U.S. government, far in excess of the CHIPS Act.

    That, though, doesn’t let the board off the hook: not only are they abandoning a plan they supported, but their ideas for moving Intel forward are fundamentally wrong. Chairman Frank Yeary, who has inexplicably been promoted despite being present for the entirety of the Intel disaster, said in Intel’s press release about Gelsinger’s departure:

    While we have made significant progress in regaining manufacturing competitiveness and building the capabilities to be a world-class foundry, we know that we have much more work to do at the company and are committed to restoring investor confidence. As a board, we know first and foremost that we must put our product group at the center of all we do. Our customers demand this from us, and we will deliver for them. With MJ’s permanent elevation to CEO of Intel Products along with her interim co-CEO role of Intel, we are ensuring the product group will have the resources needed to deliver for our customers. Ultimately, returning to process leadership is central to product leadership, and we will remain focused on that mission while driving greater efficiency and improved profitability.

    Intel’s products are irrelevant to the future; that’s the fundamental foundry problem. If x86 still mattered, then Intel would be making enough money to fund its foundry efforts. Moreover, prospective Intel customers are wary that Intel — as it always has — will favor itself at the expense of its customers; the board is saying that is exactly what they want to do.

    In fact, it is Intel’s manufacturing that must be saved. This is a business that yes, needs billions upon billions of dollars in funding, but it not only has a market as a TSMC competitor, but also the potential to lead that market in the long run. Moreover, Intel foundry existing is critical to national security: currently the U.S. is completely dependent on TSMC and Taiwan and all of the geopolitical risk that entails. That means it will fall on the U.S. government to figure out a solution.

    Saving Intel

    Last month, in A Chance to Build, I explained how tech has modularized itself over the decades, with hardware — including semiconductor fabrication — largely being outsourced to Asia, while software is developed in the U.S. The economic forces undergirding this modularization, including the path dependency from the past sixty years, will be difficult to overcome, even with tariffs.

    Apple not only can’t manufacture an iPhone in the U.S. because of cost, it also can’t do so because of capability; that capability is downstream of an ecosystem that has developed in Asia and a long learning curve that China has traveled and that the U.S. has abandoned. Ultimately, though, the benefit to Apple has been profound: the company has the best supply chain in the world, centered in China, that gives it the capability to build computers on an unimaginable scale with maximum quality for not that much money at all. This benefit has extended to every tech company, whether they make their own hardware or not. Software has to run on something, whether that be servers or computers or phones; hardware is software’s most essential complement.

    The inverse may be the key to American manufacturing: software as hardware’s grantor of viability through integration. This is what Tesla did: the company is deeply integrated from software down through components, and builds vehicles in California (of course it has an even greater advantage with its China factory).

    This is also what made Intel profitable for so long: the company’s lock-in was predicated on software, which allowed for massive profit margins that funded all of that innovation and leading edge processes in America, even as every other part of the hardware value chain went abroad. And, by extension, the reason why a product focus is a dead end for the company is because nothing is preserving x86 other than the status quo.

    It follows, then, that if the U.S. wants to make Intel viable, it ideally will not just give out money, but also a point of integration. To that end, consider this report from Reuters:

    A U.S. congressional commission on Tuesday proposed a Manhattan Project-style initiative to fund the development of AI systems that will be as smart or smarter than humans, amid intensifying competition with China over advanced technologies. The bipartisan U.S.-China Economic and Security Review Commission stressed that public-private partnerships are key in advancing artificial general intelligence, but did not give any specific investment strategies as it released its annual report.

    To quote the report’s recommendation directly:

    The Commission recommends:

    1. Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability. AGI is generally defined as systems that are as good as or better than human capabilities across all cognitive domains and would surpass the sharpest human minds at every task. Among the specific actions the Commission recommends for Congress:
    • Provide broad multiyear contracting authority to the executive branch and associated funding for leading artificial intelligence, cloud, and data center companies and others to advance the stated policy at a pace and scale consistent with the goal of U.S. AGI leadership; and
    • Direct the U.S. secretary of defense to provide a Defense Priorities and Allocations System “DX Rating” to items in the artificial intelligence ecosystem to ensure this project receives national priority.

    The problem with this proposal is that spending the money via “public-private partnerships” will simply lock in the current paradigm; I explained in A Chance to Build:

    Software runs on hardware, and here Asia dominates. Consider AI:

    • Chip design, a zero marginal cost activity, is done by Nvidia, a Silicon Valley company.
    • Chip manufacturing, a minimal marginal cost activity that requires massive amounts of tacit knowledge gained through experience, is done by TSMC, a Taiwanese company.
    • An AI system contains multiple components beyond the chip, many if not most of which are manufactured in China, or other countries in Asia.
    • Final assembly generally happens outside of China due to U.S. export controls; Foxconn, for example, assembles many of its systems in Mexico.
    • AI is deployed mostly by U.S. companies, and the vast majority of application development is done by tech companies and startups, primarily in Silicon Valley.

    The fact that the U.S. is the bread in the AI sandwich is no accident: those are the parts of the value chain where marginal cost is non-existent and where the software talent has the highest leverage. Similarly, it’s no accident that the highest value add in terms of hardware happens in Asia, where expertise has been developing for fifty years. The easiest — and by extension, most low-value — aspect is assembly, which can happen anywhere labor is cheap.

    Given this, if the U.S. is serious about AGI, then the true Manhattan Project — doing something that will be very expensive and not necessarily economically rational — is filling in the middle of the sandwich. Saving Intel, in other words.

    Start with the fact that we know that leading AI model companies are interested in dedicated chips; OpenAI is reportedly working on its own chip with Broadcom, after flirting with the idea of building its own fabs. The latter isn’t viable for a software company in a world where TSMC exists, but it is for the U.S. government if it’s serious about domestic capabilities continuing to exist. The same story applies to Google, Amazon, Microsoft, and Meta.

    To that end, the U.S. government could fund an independent Intel foundry — spin out the product group along with the clueless board to Broadcom or Qualcomm or private equity — and provide price support for model builders to design and buy their chips there. Or, if the U.S. government wanted to build the whole sandwich, it could directly fund model builders — including one developed in-house — and dictate that they not just use but deeply integrate with Intel-fabricated chips (it’s not out of the question that a fully integrated stack might actually be the optimal route to AGI).

    It would, to be sure, be a challenge to keep such an effort out of the clutches of the federal bureaucracy and the dysfunction that has befallen the U.S. defense industry. It would be essential to give this effort the level of independence and freedom that the original Manhattan Project had, with compensation packages to match; perhaps this would be a better use of Elon Musk’s time — himself another model builder — than DOGE?

    This could certainly be bearish for Nvidia, at least in the long run. Nvidia is a top priority for TSMC, and almost certainly has no interest in going anywhere else; that’s also why it would be self-defeating for a U.S. “Manhattan Project” to simply fund the status quo, which is Nvidia chips manufactured in Taiwan. Competition is ok, though; the point isn’t to kill TSMC, but to stand up a truly domestic alternative (i.e. not just a fraction of non-leading edge capacity in Arizona). Nvidia for its part deserves all of the success it is enjoying, but government-funded alternatives would ultimately manifest for consumers and businesses as lower prices for intelligence.

    This is all pretty fuzzy, to be clear. What does exist, however, is a need — domestically sourced and controlled AI, which must include chips — and a company, in Intel, that is best placed to meet that need, even as it needs a rescue. Intel lost its reason to exist, even as the U.S. needs it to exist more than ever; AI is the potential integration point to solve both problems at the same time.





  • The Gen AI Bridge to the Future

    Listen to this post:

    In the beginning was the mainframe.

    In 1945 the U.S. government built ENIAC, an acronym for Electronic Numerical Integrator and Computer, to do ballistics trajectory calculations for the military; World War 2 was nearing its conclusion, however, so ENIAC’s first major job was to do calculations that undergirded the development of the hydrogen bomb. Six years later, J. Presper Eckert and John Mauchly, who led the development of ENIAC, launched UNIVAC, the Universal Automatic Computer, for broader government and commercial applications. Early use cases included calculating the U.S. census and assisting with calculation-intensive back office operations like payroll and bookkeeping.

    These were hardly computers as we know them today, but rather calculation machines that took in reams of data (via punch cards or magnetic tape) and returned results according to hardwired calculation routines; the “operating system” was the humans actually inputting the data, scheduling jobs, and giving explicit hardware instructions. Originally this instruction also happened via punch cards and magnetic tape, but later models added consoles to both provide status and also allow for register-level control; these consoles evolved into terminals, but the first versions of these terminals, like the one that was available for the original version of the IBM System/360, were used to initiate batch programs.

    The mainframe stack

    Any recounting of computing history usually focuses on the bottom two levels of that stack — the device and the input method — because they tend to evolve in parallel. For example, here are the three major computing paradigms to date:

    Computing paradigms to date

    These aren’t perfect delineations; the first PCs had terminal-like interfaces, and pre-iPhone smartphones used windows-icons-menus-pointer (WIMP) interaction paradigms, with built-in keyboards and styluses. In the grand scheme of things, though, the distinction is pretty clear, and, by extension, it’s pretty easy to predict what is next:

    Future computing paradigms

    Wearables is an admittedly broad category that includes everything from smart watches to earpieces to glasses, but I think it is a cogent one: the defining characteristic of all of these devices, particularly in contrast to the three previous paradigms, is the absence of a direct mechanical input mechanism; that leaves speech, gestures, and at the most primitive level, thought.

    Fortunately there is good progress being made on all of these fronts: the quality and speed of voice interaction has increased dramatically over the last few years; camera-intermediated gestures on the Oculus and Vision Pro work well, and Meta’s Orion wristband uses electromyography (EMG) to interpret gestures without any cameras at all. Neuralink is even more incredible: an implant in the brain captures thoughts directly and translates them into actions.

    These paradigms, however, do not exist in isolation. First off, mainframes still exist, and I’m typing this Article on a PC, even if you may consume it on a phone or via a wearable like a set of AirPods. What stands out to me, however, is the top level of the initial stack I illustrated above: the application layer on one paradigm provides the bridge to the next one. This, more than anything, is why generative AI is a big deal in terms of realizing the future.

    Bridges to the Future

    I mentioned the seminal IBM System/360 above, which was actually a family of mainframes; the first version was the Model 30, which, as I noted, did batch processing: you would load up a job using punch cards or magnetic tape and execute the job, just like you did with the ENIAC or UNIVAC. Two years later, however, IBM came out with the Model 67 and the TSS/360 operating system: now you could actually interact with a program via the terminal. This represented a new paradigm at the application layer:

    The shift to Applications

    It is, admittedly, a bit confusing to refer to this new paradigm at the application layer as Applications, but it is the most accurate nomenclature; what differentiated an application from a program was that while the latter was a pre-determined set of actions that ran as a job, the former could be interacted with and amended while running.

    That new application layer, meanwhile, opened up the possibility for an entirely new industry to create those applications, which could run across the entire System/360 family of mainframes. New applications, in turn, drove demand for more convenient access to the computer itself. This ultimately led to the development of the personal computer (PC), which was an individual application platform:

    The Application bridge to PCs

    Initial PCs operated from a terminal-like text interface, but truly exploded in popularity with the roll-out of the WIMP interface, which was invented by Xerox PARC, commercialized by Apple, and disseminated by Microsoft. The key point in terms of this Article, however, is that Applications came first: the concept created the bridge from mainframes to PCs.

    PCs underwent their own transformation over their two decades of dominance, first in terms of speed and then in form factor, with the rise of laptops. The key innovation at the application layer, however, was the Internet:

    The shift to the Internet

    The Internet differed from traditional applications by virtue of being available on every PC, facilitating communication between PCs, and by being agnostic to the actual device it was accessed on. This, in turn, provided the bridge to the next device paradigm, the smartphone, with its touch interface:

    The Internet bridge to smartphones

    I’ve long noted that Microsoft did not miss mobile; their error was in trying to extend the PC paradigm to mobile. This not only led to a focus on the wrong interface (WIMP via stylus and built-in keyboard), but also an assumption that the application layer, which Windows dominated, would be a key differentiator.

    Apple, famously, figured out the right interface for the smartphone, and built an entirely new operating system around touch. Yes, iOS is based on macOS at a low level, but it was a completely new operating system in a way that Windows Mobile was not; at the same time, because iOS was based on macOS, it was far more capable than smartphone-only alternatives like BlackBerry OS or PalmOS. The key aspect of this capability was that the iPhone could access the real Internet.

    What is funny is that Steve Jobs’ initial announcement of this capability was met with much less enthusiasm than the iPhone’s other two selling points of being a widescreen iPod and a mobile phone:

    Today, we’re introducing three revolutionary products of this class. The first one is a wide-screen iPod with touch controls. The second is a revolutionary mobile phone. The third is a breakthrough Internet communications device…These are not three separate devices, this is one device, and we are calling iPhone. Today, Apple is going to reinvent the phone.

    I’ve watched that segment hundreds of times, and the audience’s confusion at “Internet communications device” cracks me up every time; in fact, that was the key factor in reinventing the phone, because it was the bridge that linked a device in your pocket to the world of computing writ large, via the Internet. Jobs listed the initial Internet features later on in the keynote:

    Now let’s take a look at an Internet communications device, part of the iPhone. What’s this all about? Well, we’ve got some real breakthroughs here: to start off with, we’ve got rich HTML email on iPhone. The first time, really rich email on a mobile device, and it works with any IMAP or POP email service. You’ve got your favorite mail service, it’ll likely work with it, and it’s rich text email. We wanted the best web browser on our phone, not a baby browser or a WAP browser, a real browser, and we picked the best one in the world: Safari, and we have Safari running on iPhone. It is the first fully-usable HTML browser on a phone. Third, we have Google Maps. Maps, satellite images, directions, and traffic. This is unbelievable, wait until you see it. We have Widgets, starting off with weather and stocks. And, this communicates with the Internet over Edge and Wifi, and iPhone automatically detects Wifi and switches seamless to it. You don’t have to manage the network, it just does the right thing.

    Notice that the Internet is not just the web; in fact, while Apple wouldn’t launch a 3rd-party App Store until the following year, it did, with the initial iPhone, launch the app paradigm which, in contrast to standalone Applications from the PC days, assumed and depended on the Internet for functionality.

    The Generative AI Bridge

    We already established above that the next paradigm is wearables. Wearables today, however, are very much in the pre-iPhone era. On one hand you have standalone platforms like Oculus, with its own operating system, app store, etc.; the best analogy is a video game console, which is technically a computer, but is not commonly thought of as such given its singular purpose. On the other hand, you have devices like smart watches, AirPods, and smart glasses, which are extensions of the phone; the analogy here is the iPod, which provided great functionality but was not a general computing device.

    Now Apple might dispute this characterization in terms of the Vision Pro specifically, which not only has a PC-class M2 chip, along with its own visionOS operating system and apps, but can also run iPad apps. In truth, though, this makes the Vision Pro akin to Windows Mobile: yes, it is a capable device, but it is stuck in the wrong paradigm, i.e. the previous one that Apple dominated. Or, to put it another way, I don’t view “apps” as the bridge between mobile and wearables; apps are just the way we access the Internet on mobile, and the Internet was the old bridge, not the new one.

    To think about the next bridge, it’s useful to jump forward to the future and work backwards; that jump forward is a lot easier to envision, for me anyways, thanks to my experience with Meta’s Orion AR glasses:

    The most impressive aspect of Orion is the resolution, which is perfect. I’m referring, of course, to the fact that you can see the real world with your actual eyes; I wrote in an Update:

    The reality is that the only truly satisfactory answer to passthrough is to not need it at all. Orion has perfect field-of-view and infinite resolution because you’re looking at the real world; it’s also dramatically smaller and lighter. Moreover, this perfect fidelity actually gives more degrees of freedom in terms of delivering the AR experience: no matter how high resolution the display is, it will still be lower resolution than the world around it; I tried a version of Orion with double the resolution and, honestly, it wasn’t that different, because the magic was in having augmented reality at all, not in its resolution. I suspect the same thing applies to field of view: 70 degrees seemed massive on Orion, even though that is less than the Vision Pro’s 100 degrees, because the edge of the field of view for Orion was reality, whereas the edge for the Vision Pro is, well, nothing.

    The current iteration of Orion’s software did have an Oculus-adjacent launch screen, and an Instagram prototype; it was, in my estimation, the least impressive part of the demonstration, for the same reason that I think the Vision Pro’s iPad app compatibility is a long-term limitation: it was simply taking the mobile paradigm and putting it in front of my face, and honestly, I’d rather just use my phone.

    One of the most impressive demos, meanwhile, had the least UI: it was just a notification. I glanced up, saw that someone was calling me, touched my fingers together to “click” on the accept button that accompanied the notification, and was instantly talking to someone in another room while still being able to interact freely with the world around me. Of course phone calls aren’t some sort of new invention; what made the demo memorable was that I only got the UI I needed when I needed it.

    This, I think, is the future: the exact UI you need — and nothing more — exactly when you need it, and at no time else. This specific example was, of course, programmed deterministically, but you can imagine a future where the glasses are smart enough to generate UI on the fly based on the context of not just your request, but also your broader surroundings and state.

    This is where you start to see the bridge: what I am describing is an application of generative AI, specifically to on-demand UI interfaces. It’s also an application that you can imagine being useful on devices that already exist. A watch application, for example, would be much more usable if, instead of trying to navigate by touch like a small iPhone, it could simply show you the exact choices you need to make at a specific moment in time. Again, we get hints of that today through deterministic programming, but the ultimate application will be on-demand via generative AI.
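
    As a sketch of what that might look like in practice, consider the following; the call_model function, the prompt, and the UI spec format are all hypothetical stand-ins for illustration, not any real API. The device describes its current context and receives back only the handful of controls that moment requires.

```python
# A sketch of on-demand, generated UI: the device describes its context and a
# model returns only the controls needed at that moment. call_model is a
# hypothetical stand-in for a real model API, and the UI "spec" format is
# invented purely for illustration.
import json

def call_model(prompt: str) -> str:
    # Hypothetical: in a real system this would call a generative model.
    # Hard-coded here so the sketch is self-contained and runnable.
    return json.dumps({
        "surface": "watch",
        "elements": [
            {"type": "label",  "text": "Call from Alice"},
            {"type": "button", "text": "Accept",  "action": "accept_call"},
            {"type": "button", "text": "Decline", "action": "decline_call"},
        ],
    })

def render_on_demand_ui(context: dict) -> dict:
    """Ask the model for the minimal UI this exact moment requires."""
    prompt = (
        "Given this device context, return a JSON UI spec containing only "
        f"the controls the user needs right now: {json.dumps(context)}"
    )
    return json.loads(call_model(prompt))

context = {"device": "watch", "event": "incoming_call", "caller": "Alice",
           "user_state": "walking, hands free"}
for element in render_on_demand_ui(context)["elements"]:
    print(element)   # a real device would render these controls natively
```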

    Of course generative AI is also usable on the phone, and that is where I expect most of the exploration around generative UI to happen for now. We certainly see plenty of experimentation and rapid development of generative AI broadly, just as we saw plenty of experimentation and rapid development of the Internet on PCs. That experimentation and development was not just usable on the PC, but it also created the bridge to the smartphone; I think that generative AI is doing the same thing in terms of building a bridge to wearables that are not accessories, but general purpose computers in their own right:

    The generative AI bridge

    This is exciting in the long-term, and bullish for Meta (and I’ve previously noted how generative AI is the key to the metaverse, as well). It’s also, clearly, well into the future. It also helps explain why Orion isn’t shipping today: it’s not just that the hardware isn’t yet in a production state, particularly from a cost perspective, but the entire application layer needs to be built out, first on today’s devices, enabling the same sort of smooth transition that the iPhone had. No, Apple didn’t have the App Store, but the iPhone was extraordinarily useful on day one, because it was an Internet Communicator.

    Survey Complete

    Ten years ago I wrote a post entitled The State of Consumer Technology in 2014, where I explored some of the same paradigm-shifts I detailed in this Article. This was the illustration I made then:

    Tech's epochs

    There is a perspective in which 2024 has been a bit of a letdown in terms of generative AI; there hasn’t been a GPT-5 level model released; the more meaningful developments have been in the vastly increased efficiency and reduction in size of GPT-4 level models, and the inference-scaling possibilities of o1. Concerns are rising that we may have hit a data wall, and that there won’t be more intelligent AI without new fundamental breakthroughs in AI architecture.

    I, however, feel quite optimistic. To me the story of 2024 has been filling in those question marks in that illustration. The product overhang from the generative AI capabilities we have today is absolutely massive: there are so many new things to be built, and completely new application layer paradigms are at the top of the list. That, by extension, is the bridge that will unlock entirely new paradigms of computing. The road to the future needs to be built; it’s exciting to have the sense that the surveying is now complete.





  • A Chance to Build

    Listen to this post:

    Semiconductors are so integral to the history of Silicon Valley that they give the region its name, and, more importantly, its culture: chips require huge amounts of up-front investment, but they have, relative to most other manufactured goods, minimal marginal costs; this economic reality helped drive the development of the venture capital model, which provided unencumbered startup capital to companies who could earn theoretically unlimited returns at scale. This model worked even better with software, which was perfectly replicable.

    That history starts in 1956, when William Shockley founded the Shockley Semiconductor Laboratory to commercialize the transistor that he had helped invent at Bell Labs; he chose Mountain View to be close to his ailing mother. A year later the so-called “Traitorous Eight”, led by Robert Noyce, left and founded Fairchild Semiconductor down the road. Six years after that Fairchild Semiconductor opened a facility in Hong Kong to assemble and test semiconductors. Assembly required manually attaching wires to a semiconductor chip, a labor-intensive and monotonous task that was difficult to do economically with American wages, which ran about $2.50/hour; Hong Kong wages were a tenth of that. Four years later Texas Instruments opened a facility in Taiwan, where wages were $0.19/hour; two years after that Fairchild Semiconductor opened another facility in Singapore, where wages were $0.11/hour.

    In other words, you can make the case that the classic story of Silicon Valley isn’t completely honest. Chips did have marginal costs, but that marginal cost was, within single digit years of the founding of Silicon Valley, exported to Asia.

    Moreover, that exportation was done with the help of the U.S. government. In 1962 the U.S. Congress passed the Tariff Classification Act of 1962, which amended the Tariff Act of 1930 to implement new tariff schedules developed by the United States Tariff Commission; those new schedules were implemented in 1963, and included Tariff Item 807.00, which read:

    Articles assembled abroad in whole or in part of products of the United States which were exported for such purpose and which have not been advanced in value or improved in condition abroad by any means other than by the act of assembly:

    • A duty upon the full value of the imported article, less the cost or value of such products of the United States.
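
    In other words, only the value added abroad was dutiable. A quick back-of-the-envelope sketch shows how little that was for chip assembly; the duty rate below is an illustrative assumption, since Item 807.00 defines the dutiable base rather than the rate.

```python
# Back-of-the-envelope math behind Tariff Item 807.00's effect on chip assembly.
# Wage and throughput figures come from the text above; the duty rate is an
# illustrative assumption (807.00 defines the dutiable base, not the rate).

us_wage_per_hour = 2.50
hk_wage_per_hour = us_wage_per_hour / 10       # "a tenth of that": $0.25/hour
chips_per_hour = 24

# Only the value added abroad (essentially the assembly labor) was dutiable:
dutiable_value_per_chip = hk_wage_per_hour / chips_per_hour   # ~$0.0104

assumed_duty_rate = 0.06                       # hypothetical rate, for illustration only
duty_per_chip = dutiable_value_per_chip * assumed_duty_rate

print(f"Dutiable value added per chip: ${dutiable_value_per_chip:.4f}")
print(f"Duty per chip at an assumed {assumed_duty_rate:.0%} rate: ${duty_per_chip:.5f}")
```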

    The average Hong Kong worker assembled around 24 chips per hour; that meant their value add to the overall cost of the chip was just over $0.01, which meant that tariffs were practically non-existent. This was by design! Chris Miller writes in Chip War:

    South Vietnam would send shock waves across Asia. Foreign policy strategists perceived ethnic Chinese communities all over the region as ripe for Communist penetration, ready to fall to Communist influence like a cascade of dominoes. Malaysia’s ethnic Chinese minority formed the backbone of that country’s Communist Party, for example. Singapore’s restive working class was majority ethnic Chinese. Beijing was searching for allies—and probing for U.S. weakness…

    By the end of the 1970s, American semiconductor firms employed tens of thousands of workers internationally, mostly in Korea, Taiwan, and Southeast Asia. A new international alliance emerged between Texan and Californian chipmakers, Asian autocrats, and the often ethnic-Chinese workers who staffed many of Asia’s semiconductor assembly facilities.

    Semiconductors recast the economies and politics of America’s friends in the region. Cities that had been breeding grounds for political radicalism were transformed by diligent assembly line workers, happy to trade unemployment or subsistence farming for better paying jobs in factories. By the early 1980s, the electronics industry accounted for 7 percent of Singapore’s GNP and a quarter of its manufacturing jobs. Of electronics production, 60 percent was semiconductor devices, and much of the rest was goods that couldn’t work without semiconductors. In Hong Kong, electronics manufacturing created more jobs than any sector except textiles. In Malaysia, semiconductor production boomed in Penang, Kuala Lumpur, and Melaka, with new manufacturing jobs providing work for many of the 15 percent of Malaysian workers who had left farms and moved to cities between 1970 and 1980. Such vast migrations are often politically destabilizing, but Malaysia kept its unemployment rate low with many relatively well-paid electronics assembly jobs.

    This is a situation that, at least in theory, should not persist indefinitely; increased demand for Asian labor should push up both the cost of that labor and also the currency of the countries where that labor is in demand, making those countries less competitive over time. The former has certainly happened: Taiwan, where I live, is one of the richest countries in the world. And yet chip-making is centered here to a greater extent than ever before, in seeming defiance of theory.

    The Post-War Order

    The problem with theory is usually reality; in 1944 the U.S. led the way in establishing what came to be known as the Bretton Woods System, which pegged exchange rates to the U.S. dollar. This was a boon to the devastated economies of Europe and Japan, and later the rest of Asia: an influx of U.S. capital rebuilt their manufacturing capability by leveraging their relatively lower cost of labor. This did raise labor costs, but thanks to the currency peg, the U.S. currency couldn’t depreciate in response; that in turn made U.S. debt a much more attractive destination for those manufacturing profits than it might have been otherwise, which in turn helped to fund both the Vietnam War and the 1960s expansion in social programs.

    Ultimately this pressure on the U.S. dollar was too intense, leading to the dissolution of Bretton Woods in 1971 and a depreciation of the U.S. dollar relative to gold; the overall structure of the world economy, however, was set: trade was denominated in dollars — i.e. the U.S. dollar was the world’s reserve currency — which kept its value higher than economic theory would dictate. This made U.S. debt attractive, which funded deficit spending; that spending fueled the U.S. consumer market, which bought imported manufactured goods; the profits of those goods were reinvested into U.S. debt, which helped pay for the military that kept the entire system secure.

    The biggest winner was the U.S. consumer. Money was cycled into the economy through an impressive and seemingly impossible array of service sector jobs and quickly spent on cheap imports. Those cheap imports were getting better too: to take chips as an example, increased automation decreased costs, and the development of software made those chips much more valuable. This applied not just to the chips directly, but to everything built with and enabled by them; the actual building of electronics happened in Asia, as countries rapidly ascended the technological ladder, but the software was the province of Silicon Valley.

    This is where it matters that software is truly a zero marginal cost product. R&D costs for tech companies have skyrocketed for decades, but that increase has been more than offset by the value created by software and captured by scale. Moreover, those increasing costs manifested as the highest salaries in the world for talent, the true scarce resource in technology. This meant that the most capable technologists made their way to the U.S. generally and Silicon Valley specifically to earn the most money, and, if they had the opportunity and drive, to create new companies as software ate the world.

    Still, software runs on hardware, and here Asia dominates. Consider AI:

    • Chip design, a zero marginal cost activity, is done by Nvidia, a Silicon Valley company.
    • Chip manufacturing, a minimal marginal cost activity that requires massive amounts of tacit knowledge gained through experience, is done by TSMC, a Taiwanese company.
    • An AI system contains multiple components beyond the chip, many if not most of which are manufactured in China, or other countries in Asia.
    • Final assembly generally happens outside of China due to U.S. export controls; Foxconn, for example, assembles many of these systems in Mexico.
    • AI is deployed mostly by U.S. companies, and the vast majority of application development is done by tech companies and startups, primarily in Silicon Valley.

    The fact that the U.S. is the bread in the AI sandwich is no accident: those are the parts of the value chain where marginal cost is non-existent and where the software talent has the highest leverage. Similarly, it’s no accident that the highest value add in terms of hardware happens in Asia, where expertise has been developing for fifty years. The easiest — and by extension, lowest-value — aspect is assembly, which can happen anywhere labor is cheap.

    All of this has happened in a world where the trend in trade was towards more openness and fewer barriers, at least in terms of facilitating this cycle. One key development was the Information Technology Agreement (ITA), a 1996 World Trade Organization agreement, which completely eliminated tariffs on IT products, including chips. The Internet, meanwhile, meant there were no barriers to the spread of software, with the notable exception of China’s Great Firewall; the end result is that while U.S. software ran on Asian hardware, it was U.S. companies that ultimately reaped the largest returns from scale.

    Cars and China

    Perhaps the defining characteristic of the Clinton-Bush-Obama era was the assumption that this system would continue forever; it certainly was plausible in terms of products. Consider cars: for a hundred years cars were marvelous mechanical devices with tens of thousands of finely engineered parts predicated on harnessing the power of combustion to transport people and products wherever they wished to go. Electric cars, however, are something else entirely: yes, there is still a mechanical aspect, as there must be to achieve movement in physical space, but the entire process is predicated on converting electricity to mechanical movement, and governed entirely by chips and software.

    This product looks a lot more like a computer on wheels than the mechanical cars we are familiar with; it follows, then, that the ultimate structure of the car industry might end up looking something like the structure of AI: the U.S. dominates the zero marginal cost components like design and the user experience, while Asia — China specifically, given the scale and labor requirements — dominates manufacturing.

    This has been Waymo’s plan; while current self-driving cars on the road are retrofitted Jaguar I-Pace sedans, the 6th-generation Waymo vehicle is manufactured by Chinese car company Geely. This car, called Zeekr, is purpose-built for transportation, but ideally it would be custom-built for Waymo’s purposes: you can imagine future fleets of self-driving cars with designs for different use cases, from individual taxis to groups to working offices to sleeper cars. The analogy here would be to personal computing devices: you can get a computer in rack form, a desktop, a laptop, or a phone; the chips and software are by-and-large the same.

    Cars aren’t there yet, but they’re not far off; the relative simplicity of electric cars makes it more viable for established car manufacturers to basically offer customizable platforms: that is how a company like Xiaomi can develop its own SUV. The consumer electronics company, best known for its smartphones, contracts with Beijing Automotive Group for manufacturing, while doing the design and technological integration. Huawei has a similar arrangement with Seres, Changan, and Chery Automobile.

    Tesla, it should be noted, is a bit different: the company is extremely vertically integrated, building not only its own hardware and software but also a significant number of the components that go into its cars; this isn’t a surprise, given Tesla’s pioneering role in electric cars (pioneers are usually vertically integrated), but it does mean that Tesla faces a significant long-term threat from the more modular Chinese approach. Given this, it’s not a surprise that Elon Musk is staking Tesla’s long-term future on autonomy, in effect doubling down on the company’s integration.

    Regardless, what is notable — and ought to be a wake-up call to Silicon Valley — is the fact that the Xiaomi and Huawei cars run Chinese software. One of the under-appreciated benefits of the Great Firewall is that it created an attractive market for software developers that was not reachable from Silicon Valley; this means that while a good number of Chinese software engineers are in the U.S., there is a lot of talent in China as well, and that talent is being applied to products that can leverage Chinese manufacturing to win markets Silicon Valley thought would be theirs to sandwich forever.

    Waymo’s Zeekr car, meanwhile, has a problem; from Bloomberg in May:

    President Joe Biden will quadruple tariffs on Chinese electric vehicles and sharply increase levies for other key industries this week, unveiling the measures at a White House event framed as a defense of American workers, people familiar with the matter said. Biden will hike or add tariffs in the targeted sectors after nearly two years of review. The total tariff on Chinese EVs will rise to 102.5% from 27.5%, the people said, speaking on condition of anonymity ahead of the announcement. Others will double or triple in targeted industries, though the scope remains unclear.

    Given this, it’s no surprise that Waymo had a new announcement in October: a partnership with Hyundai for new self-driving cars that are manufactured in America. This car is a retrofitted IONIQ 5, which is built as a passenger car, unlike the transportation-focused Zeekr; in other words, Google is taking a step back in functionality because of government policy.

    Trump’s Tariffs

    Waymo may not be the only company taking a step back: newly (re-)elected President Trump’s signature economic proposal is tariffs. From the 2024 GOP Platform:

    Our Trade deficit in goods has grown to over $1 Trillion Dollars a year. Republicans will support baseline Tariffs on Foreign-made goods, pass the Trump Reciprocal Trade Act, and respond to unfair Trading practices. As Tariffs on Foreign Producers go up, Taxes on American Workers, Families, and Businesses can come down.

    Foreign Policy published an explainer over the weekend entitled Everything You Wanted to Know About Trump’s Tariffs But Were Afraid to Ask:

    U.S. President-elect Donald Trump, the self-proclaimed “tariff man,” campaigned on the promise of ratcheting import duties as high as 60 percent against all goods from China, and perhaps 20 percent on everything from everywhere else. And he might be able to do it—including by drawing on little-remembered authorities from the 1930 Smoot-Hawley Tariff Act, the previous nadir of U.S. trade policy.

    Trump’s tariff plans are cheered by most of his economic advisers, who see them as a useful tool to rebalance an import-dependent U.S. economy. Most economists fear the inflationary impacts of sharply higher taxes on U.S. consumers and businesses, as well as the deliberate drag on economic growth that comes from making everything more expensive. Other countries are mostly confused, uncertain whether Trump’s tariff talk is just bluster to secure favorable trade deals for the United States, or if they’ll be more narrowly targeted or smaller than promised. Big economies, such as China and the European Union, are preparing their reprisals, just in case.

    What makes it hard for economists to model and other countries to understand is that nobody, even in Trump world, seems to know exactly why tariffs are on the table.

    Sounds like the explainer needs an explainer! Or maybe the author was afraid to ask, but I digress.

    The story to me seems straightforward: the big loser in the post-World War 2 reconfiguration I described above was the American worker; yes, we have all of those service jobs, but what we have far fewer of are traditional manufacturing jobs. What happened to chips in the 1960s happened to manufacturing of all kinds over the ensuing decades. Countries like China started with labor cost advantages, and, over time, moved up learning curves that the U.S. dismantled; that is how you end up with this from Walter Isaacson in his Steve Jobs biography about a dinner with then-President Obama:

    When Jobs’s turn came, he stressed the need for more trained engineers and suggested that any foreign students who earned an engineering degree in the United States should be given a visa to stay in the country. Obama said that could be done only in the context of the “Dream Act,” which would allow illegal aliens who arrived as minors and finished high school to become legal residents — something that the Republicans had blocked. Jobs found this an annoying example of how politics can lead to paralysis. “The president is very smart, but he kept explaining to us reasons why things can’t get done,” he recalled. “It infuriates me.”

    Jobs went on to urge that a way be found to train more American engineers. Apple had 700,000 factory workers employed in China, he said, and that was because it needed 30,000 engineers on-site to support those workers. “You can’t find that many in America to hire,” he said. These factory engineers did not have to be PhDs or geniuses; they simply needed to have basic engineering skills for manufacturing. Tech schools, community colleges, or trade schools could train them. “If you could educate these engineers,” he said, “we could move more manufacturing plants here.” The argument made a strong impression on the president. Two or three times over the next month he told his aides, “We’ve got to find ways to train those 30,000 manufacturing engineers that Jobs told us about.”

    I think that Jobs had cause-and-effect backwards: there are not 30,000 manufacturing engineers in the U.S. because there are not 30,000 manufacturing engineering jobs to be filled. That is because the structure of the world economy — choices made starting with Bretton Woods in particular, and cemented by the removal of tariffs over time — made them nonviable. Say what you will about the viability or wisdom of Trump’s tariffs, the motivation — to undo eighty years of structural changes — is pretty straightforward!

    The other thing about Jobs’ answer is how ultimately self-serving it was. This is not to say it was wrong: Apple can’t manufacture an iPhone in the U.S. not only because of cost, but also because of capability; that capability is downstream of an ecosystem that has developed in Asia and a long learning curve that China has traveled and that the U.S. has abandoned. Ultimately, though, the benefit to Apple has been profound: the company has the best supply chain in the world, centered in China, that gives it the capability to build computers on an unimaginable scale with maximum quality for not that much money at all.

    This benefit has extended to every tech company, whether they make their own hardware or not. Software has to run on something, whether that be servers or computers or phones; hardware is software’s most essential complement. Joel Spolsky, in his canonical post about commoditizing your complements, wrote:

    Every product in the marketplace has substitutes and complements…A complement is a product that you usually buy together with another product. Gas and cars are complements. Computer hardware is a classic complement of computer operating systems. And babysitters are a complement of dinner at fine restaurants. In a small town, when the local five star restaurant has a two-for-one Valentine’s day special, the local babysitters double their rates. (Actually, the nine-year-olds get roped into early service.)…

    Demand for a product increases when the price of its complements decreases. In general, a company’s strategic interest is going to be to get the price of their complements as low as possible. The lowest theoretically sustainable price would be the “commodity price” — the price that arises when you have a bunch of competitors offering indistinguishable goods…If you can run your software anywhere, that makes hardware more of a commodity. As hardware prices go down, the market expands, driving more demand for software (and leaving customers with extra money to spend on software which can now be more expensive.)…

    Spolsky’s post was written in 2002, well before the rise of smartphones and, more pertinently, ad-supported software that now permeates our world. That, though, only makes his point: hardware has become so cheap and so widespread that software can be astronomically valuable even as it’s free to end users. That, by the way, is the other part of the boon to consumers I noted above.

    It’s Time to Build

    A mistake many analysts make, particularly Americans, is viewing the U.S. as the only agent of change in the world; the wars in Ukraine and Gaza are reminders that we aren’t in control of world events, and nothing would make that lesson clearer than a Chinese move on Taiwan. At the same time, we are living in a system the U.S. built, so it’s worth thinking seriously about the implications of a President with a mandate to blow the whole thing up.

    The first point is perhaps the most comforting: there is a good chance that Trump makes a lot of noise and accomplishes little, at least in terms of trade and — pertinently for this blog — its impact on tech. That is arguably what happened in his first term: there were China tariffs (that Apple was excluded from), and a ban on chip shipments to Huawei (that massively buoyed Apple), and TSMC committed to building N-1 fabs in Arizona. From a big picture perspective, though, today Silicon Valley is more powerful and richer than ever, and the hardware dominance of Asia generally and China specifically is larger than ever.

    The reality is that uprooting the current system would take years of upheaval and political and economic pain; those who argue it is impossible are wrong, but believing it’s highly improbable is very legitimate. Indeed, it may be the case that systems can only truly be remade in the presence of an exogenous destructive force, which is to say war.

    The second point, though, is that there does seem to be both more risk and opportunity than many people think. Tariffs do change things; by virtue of my location I talk to plenty of people on the ground who have been busy for years moving factories, not from China to the U.S., but to places like Thailand or Vietnam. That doesn’t really affect the trade deficit, but things that matter don’t always show up in aggregate numbers.

    To that end, the risk for tech is that tariffs specifically and Trump’s approach to trade generally do more damage to the golden goose than expected. More expensive hardware ultimately constricts the market for software; tariffs in violation of agreements like the ITA give other countries an opening to impose levies of their own, and U.S. tech companies could very well be popular targets.

    The opportunity, meanwhile, is to build new kinds of manufacturing companies that can seize on a tariff-granted price advantage. These sorts of companies, perhaps to Trump’s frustration, are not likely to be employment powerhouses; the real opportunity is taking advantage of robotics and AI to make physical goods into zero marginal cost items in their own right (outside of commodities). This is what has happened to chips: assembly and testing are fully automated, which makes a U.S. buildout viable.

    To take a perhaps unintuitive example, consider Amazon: the company is investing deeply in automation for its fulfillment centers, which decreases the marginal cost of picking, enabling the company to sell more items like “Everyday Essentials” that don’t cost much but are purchased frequently; it’s also no surprise that Amazon has invested in drones and self-driving car startups, to take the same costs out of delivery. It’s a long journey, to be sure, but it’s a destination that is increasingly possible to imagine.

    The analogy to manufacturing is that a combination of automation and modular platforms, defined by software, is both necessary and, perhaps for the first time in a long time, possible to build. It won’t be an easy road — see Tesla’s struggles with automation — but there is, at a minimum, a market in national security, and perhaps in arenas like self-driving cars, for building something scalable with assumptions around modularity and software-defined functionality at the core.

    Again, I don’t know if this will work: the symbiotic relationship between Silicon Valley software makers and Asian hardware manufacturers is one of the most potent economic combinations in history, and it may be impossible to compete with; if it’s ever going to work, though, the best opportunity — absent a war, God forbid — is probably right now.





  • Meta’s AI Abundance

    Listen to this post:

    Stratechery has benefited from a Meta cheat code since its inception: wait for investors to panic, the stock to drop, and write an Article that says Meta is fine — better than fine even — and sit back and watch the take be proven correct. Notable examples include 2013’s post-IPO swoon, the 2018 Stories swoon, and most recently, the 2022 TikTok/Reels swoon (if you want a bonus, I was optimistic during the 2020 COVID swoon too):

    Meta's nadirs as a public company

    Perhaps with that in mind I wrote a cautionary note earlier this year about Meta and Reasonable Doubt: while investors were concerned about the sustainability of Meta’s spending on AI, I was worried about increasing ad prices and the lack of new formats after Stories and then Reels; the long-term future, particularly in terms of the metaverse, was just as much of a mystery as always.

    Six months on and I feel the exact opposite: it seems increasingly clear to me that Meta is in fact the most well-placed company to take advantage of generative AI. Yes, investors are currently optimistic, so this isn’t my usual contrarian take — unless you consider the fact that I think Meta has the potential to be the most valuable company in the world. As evidence of that conviction I’m writing today, a day before Meta’s earnings: I don’t care if they’re up or down, because the future is that bright.

    Short-term: Generative AI and Digital Advertising

    Generative AI is clearly a big deal, but the biggest winner so far is Nvidia, in one of the clearest examples of the picks-and-shovels ethos on which San Francisco was founded: the most money to be made is in furnishing the Forty-niners (yes, I am using a linear scale instead of the log scale above for effect):

    Nvidia's dramatic rise

    The big question weighing on investors’ minds is when all of this GPU spend will generate a return. Tesla and xAI are dreaming of autonomy; Azure, Google Cloud, AWS, and Oracle want to undergird the next generation of AI-powered startups; and Microsoft and Salesforce are bickering about how to sell AI into the enterprise. All of these bets are somewhat speculative; what would be the most valuable in the short-term, at least in terms of justifying the massive ongoing capital expenditure necessary to create the largest models, is a guaranteed means to translate those costs into bottom-line benefit.

    Meta is the best positioned to do that in the short-term, thanks to the obvious benefit of applying generative AI to advertising. Meta is already highly reliant on machine learning for its ads product: right now an advertiser can buy ads based on desired outcomes, whether that be an app install or a purchase, and leave everything else up to Meta; the company will then work across its vast troves of data, in a way that is only possible using machine learning-derived algorithms, to find the right targets for an ad and deliver exactly the business goals requested.

    What makes this process somewhat galling for the advertiser is that the more of a black box Meta’s advertising becomes, the better the advertising results, even as Meta makes more margin. The big reason for the former is the App Tracking Transparency (ATT)-driven shift in digital advertising to probabilistic models in place of deterministic ones.

    It used to be that ads shown to users could be perfectly matched to conversions made in 3rd-party apps or on 3rd-party websites; Meta was better at this than everyone else, thanks to its scale and fully built-out ad infrastructure (including SDKs in apps and pixels on websites), but this was a type of targeting and conversion tracking that could be done in some fashion by other entities, whether that be smaller social networks like Snap, ad networks, or even sophisticated marketers themselves.

    ATT severed that link, and Meta’s business suffered greatly; from a February post-earnings Update:

    It is worth noting that while the digital ecosystem did not disappear, it absolutely did shrink: [MoffettNathanson’s Michael] Nathanson, in his Meta earnings note, explained what he was driving at with that question:

    While revenues have recovered, with +22% organic growth in the fourth quarter, we think that the more important driver of the outperformance has been the company’s focus on tighter cost controls. Coming in 2023, Meta CEO Mark Zuckerberg made a New Year’s resolution, declaring 2023 the “Year of Efficiency.” By remaining laser-focused on reining in expense growth as the top line reaccelerated, Meta’s operating margins (excluding restructuring) expanded almost +1,100 bps vs last 4Q, reaching nearly 44%. Harking back to Zuckerberg’s resolution, Meta’s 2023 was, in fact, highly efficient…

    Putting this in perspective, two years ago, after the warnings on the 4Q 2021 earnings call, we forecasted that Meta Family of Apps would generate $155 billion of revenues and nearly $68 billion of GAAP operating income in 2023. Fast forward to today, and last night Meta reported that Family of Apps delivered only $134.3 billion of revenues ($22 billion below our 2-year ago estimate), yet FOA operating income (adjusted for one-time expenses) was amazingly in-line with that two-year old forecast. For 2024, while we now forecast Family of Apps revenues of $151.2 billion (almost $30 billion below the forecast made on February 2, 2022), our current all-in Meta operating profit estimate of $56.8 billion is also essentially in line. In essence, Meta has emerged as a more profitable (dare we say, efficient) business.

    That shrunken revenue figure is digital advertising that simply disappeared — in many cases, along with the companies that bought it — in the wake of ATT. The fact that Meta responded by becoming so much leaner, though, was not just critical to surviving ATT; it also laid the groundwork for where the company is going next.

    Increased company efficiency is a reason to be bullish on Meta, but three years on, the key takeaway from ATT is that it validated my thesis that Meta is anti-fragile. From 2020’s Apple and Facebook:

    This is a very different picture from Facebook, where as of Q1 2019 the top 100 advertisers made up less than 20% of the company’s ad revenue; most of the $69.7 billion the company brought in last year came from its long tail of 8 million advertisers. This focus on the long-tail, which is only possible because of Facebook’s fully automated ad-buying system, has turned out to be a tremendous asset during the coronavirus slow-down…

    This explains why the news about large CPG companies boycotting Facebook is, from a financial perspective, simply not a big deal. Unilever’s $11.8 million in U.S. ad spend, to take one example, is replaced with the same automated efficiency that Facebook’s timeline ensures you never run out of content. Moreover, while Facebook loses some top-line revenue — in an auction-based system, less demand corresponds to lower prices — the companies that are the most likely to take advantage of those lower prices are those that would not exist without Facebook, like the direct-to-consumer companies trying to steal customers from massive conglomerates like Unilever.

    In this way Facebook has a degree of anti-fragility that even Google lacks: so much of its business comes from the long tail of Internet-native companies that are built around Facebook from first principles, that any disruption to traditional advertisers — like the coronavirus crisis or the current boycotts — actually serves to strengthen the Facebook ecosystem at the expense of the TV-centric ecosystem of which these CPG companies are a part.

    Make no mistake, a lot of these kinds of companies were killed by ATT; the ones that survived, though, emerged into a world where no one other than Meta — thanks in part to a massive GPU purchase the same month the company reached its most-recent stock market nadir — had the infrastructure to rebuild the type of ad system they depended on. This rebuild had to be probabilistic — making a best guess as to the right target, and, more confoundingly, a best guess as to conversion — which is only workable with an astronomical amount of data and an astronomical amount of infrastructure to process that data, such that advertisers could once again buy based on promised results, and have those promises met.
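
    To make the distinction concrete, here is a minimal sketch in Python of the difference between deterministic and probabilistic attribution; it is an illustration of the concept, not Meta’s actual system, and the class names and the placeholder exposure rate are invented for the example.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Impression:
        user_id: Optional[str]  # None when the identifier is unavailable post-ATT
        campaign: str

    @dataclass
    class Conversion:
        user_id: Optional[str]
        value: float

    def deterministic_revenue(impressions, conversions):
        """Credit a conversion only when it can be exactly matched to an ad view."""
        exposed = {i.user_id for i in impressions if i.user_id is not None}
        return sum(c.value for c in conversions if c.user_id in exposed)

    def probabilistic_revenue(conversions, p_exposed=0.4):
        """Without identifiers, credit each conversion with a modeled probability
        that the converter saw the ad; a real system would condition this estimate
        on vast amounts of aggregate data rather than a single placeholder rate."""
        return sum(c.value * p_exposed for c in conversions)

    The probabilistic path only delivers on promised results if the modeled probabilities are accurate, which is exactly why the astronomical data and infrastructure requirements matter.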

    Now into this cauldron Meta is adding generative AI. Advertisers have long understood the importance of giving platforms like Meta multiple pieces of creative for ads; Meta’s platform will test different pieces of creative with different audiences and quickly home in on what works, putting more money behind the best arrow. Generative AI puts this process on steroids: advertisers can provide Meta with broad parameters and brand guidelines, and let the black box test not just a few pieces of creative, but an effectively unlimited number. Critically, this generative AI application has a verification function: did the generated ad drive more revenue or less? That feedback, meanwhile, is data in its own right, and can be leveraged to better target individuals in the future.
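
    The post describes the behavior, not the algorithm, but that testing loop behaves like a multi-armed bandit; a minimal sketch, assuming Thompson sampling and made-up variant statistics, looks like this:

    import random

    def allocate_impressions(stats, n_impressions):
        """stats maps each creative variant to (conversions, impressions shown).
        For each upcoming impression, sample a conversion rate from each variant's
        Beta posterior and serve the variant with the highest draw."""
        allocation = {variant: 0 for variant in stats}
        for _ in range(n_impressions):
            draws = {
                variant: random.betavariate(1 + conv, 1 + shown - conv)
                for variant, (conv, shown) in stats.items()
            }
            allocation[max(draws, key=draws.get)] += 1
        return allocation

    # Hypothetical results so far: variant "b" converts best, so it should
    # receive most of the next 1,000 impressions.
    stats = {"a": (2, 100), "b": (9, 100), "c": (4, 100)}
    print(allocate_impressions(stats, 1000))

    Generative AI changes the number of arms, not the loop: instead of three hand-made variants, the black box can mint and retire thousands, with conversion feedback flowing straight back into targeting.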

    The second piece to all of this — the galling part I referenced above — is the margin question. The Department of Justice’s lawsuit against Google’s ad business explains why black boxes are so beneficial to big ad platforms:

    Over time, as Google’s monopoly over the publisher ad server was secured, Google surreptitiously manipulated its Google Ads’ bids to ensure it won more high-value ad inventory on Google’s ad exchange while maintaining its own profit margins by charging much higher fees on inventory that it expected to be less competitive. In doing so, Google was able to keep both categories of inventory out of the hands of rivals by competing in ways that rivals without similar dominant positions could not. In doing so, Google preserved its own profits across the ad tech stack, to the detriment of publishers. Once again, Google engaged in overt monopoly behavior by grabbing publisher revenue and keeping it for itself. Google called this plan “Project Bernanke.”

    I’m skeptical about the DOJ’s case for reasons I laid out in this Update; publishers made more money using Google’s ad server than they would have otherwise, while the advertisers, who paid more, are not locked in. The black box effect, however, is real: platforms like Google or Meta can meet an advertiser’s goals — at a price point determined by an open auction — without the advertisers knowing which ads worked and which ones didn’t, keeping the margin from the latter. The galling bit is that this works out best for everyone: these platforms are absolutely finding customers you wouldn’t get otherwise, which means advertisers earn more when the platforms earn more too, and these effects will only be supercharged with generative ads.

    There’s more upside for Meta, too. Google and Amazon will benefit from generative ads, but I expect the effect will be the most powerful at the top of the funnel where Meta’s advertising operates, as opposed to the bottom-of-the-funnel search ads where Amazon and Google make most of their money. Moreover, there is that long tail I mentioned above: one of the challenges for Meta in moving from text (Feed) to images (Stories) to video (Reels) is that effective creative becomes more difficult to execute, especially if you want multiple variations. Meta has devoted a lot of resources over the years to tooling to help advertisers make effective ads, much of which will be obviated by generative AI. This, by extension, will give long tail advertisers more access to more inventory, which will increase demand and ultimately increase prices.

    There is one more channel that is exclusive to Meta: click-to-message ads. These are ads where the conversion event is initiating a chat with an advertiser, an e-commerce channel that is particularly popular in Asia. The distinguishing factor in the markets where these ads are taking off is low labor costs, which AI addresses. Zuckerberg explained in a 2023 earnings call:

    And then the one that I think is going to have the fastest direct business loop is going to be around helping people interact with businesses. You can imagine a world on this where over time, every business has as an AI agent that basically people can message and interact with. And it’s going to take some time to get there, right? I mean, this is going to be a long road to build that out. But I think that, that’s going to improve a lot of the interactions that people have with businesses as well as if that does work, it should alleviate one of the biggest issues that we’re currently having around messaging monetization is that in order for a person to interact with a business, it’s quite human labor-intensive for a person to be on the other side of that interaction, which is one of the reasons why we’ve seen this take off in some countries where the cost of labor is relatively low. But you can imagine in a world where every business has an AI agent, that we can see the kind of success that we’re seeing in Thailand or Vietnam with business messaging could kind of spread everywhere. And I think that’s quite exciting.

    Both of these use cases — generative ads and click-to-message AI agents — are great examples as to why it makes sense for Meta to invest in its Llama models and make them open(ish): more and better AI means more and better creative and more and better agents, all of which can be monetized via advertising.

    Medium-Term: The Smiling Curve and Infinite Content

    Of course all of this depends on people continuing to use Meta properties, and here AI plays an important role as well. First, there is the addition of Meta AI, which makes Meta’s apps more useful. Meta AI also opens the door to a search-like product, which The Information just reported the company was working on; potential search advertising is a part of the bull case as well, although for me a relatively speculative one.

    Second is the insertion of AI content into the Meta content experience, which Meta just announced it is working on. From The Verge:

    If you think avoiding AI-generated images is difficult as it is, Facebook and Instagram are now going to put them directly into your feeds. At the Meta Connect event on Wednesday, the company announced that it’s testing a new feature that creates AI-generated content for you “based on your interests or current trends” — including some that incorporate your face.

    When you come across an “Imagined for You” image in your feed, you’ll see options to share the image or generate a new picture in real time. One example (embedded below) shows several AI-generated images of “an enchanted realm, where magic fills the air.” But others could contain your face… which I’d imagine will be a bit creepy to stumble upon as you scroll…

    In a statement to The Verge, Meta spokesperson Amanda Felix says the platform will only generate AI images of your face if you “onboarded to Meta’s Imagine yourself feature, which includes adding photos to that feature” and accepting its terms. You’ll be able to remove AI images from your feed as well.

    This sounds like a company crossing the Rubicon, but in fact said crossing already happened a few years ago. Go back to 2015’s Facebook and the Feed, where I argued that Facebook was too hung up on being a social network, and concluded:

    Consider Facebook’s smartest acquisition, Instagram. The photo-sharing service is valuable because it is a network, but it initially got traction because of filters. Sometimes what gets you started is only a lever to what makes you valuable. What, though, lies beyond the network? That was Facebook’s starting point, and I think the answer to what lies beyond is clear: the entire online experience of over a billion people. Will Facebook seek to protect its network — and Zuckerberg’s vision — or make a play to be the television of mobile?

    It took Facebook another five years — and the competitive threat of TikTok — but the company finally did make the leap to showing you content from across the entire service, not just that which was posted by your network. The latter was an artificial limitation imposed by the company’s self-conception as a social network, when in reality it is a content network; true social networking — where you talk to people you actually know — happens in group chats:

    Social media and the smiling curve

    The structure of this illustration may look familiar; it’s another manifestation of The Smiling Curve, which I first wrote about in the context of publishing:

    Over time, as this cycle repeats itself and as people grow increasingly accustomed to getting most of their “news” from Facebook (or Google or Twitter), value moves to the ends, just like it did in the IT manufacturing industry or smartphone industry:

    The Publishing Smiling Curve

    On the right you have the content aggregators, names everyone is familiar with: Google ($369.7 billion), Facebook ($209.0 billion), Twitter ($26.4 billion), Pinterest (private). They are worth by far the most of anyone in this discussion. Traditional publishers, meanwhile, are stuck in the middle…publishers (all of them, not just newspapers) don’t really have an exclusive on anything anymore. They are Acer, offering the same PC as the next guy, and watching as the lion’s share of the value goes to the folks who are actually putting the content in front of readers.

    It speaks to the inevitability of the smiling curve that it has even come for Facebook (which I wrote about in 2020’s Social Networking 2.0); moving to global content and purely individualized feeds unconstrained by your network was the aforementioned Rubicon crossing. The provenance of that content is a tactical question, not a strategic one.

    To that end, I’ve heard whispers that these AI content tests are going extremely well, which raises an interesting financial question. One of Meta’s great strengths is that it gets its content for free from users. There certainly are costs incurred in personalizing your feed, but AI-generated content is one of the rare cases where the AI option is actually more expensive than the human alternative, which Meta gets for nothing. It’s possible, though, that it simply is that much better and more engaging, in part because it is perfectly customized to you.

    This leads to a third medium-term AI-derived benefit that Meta will enjoy: at some point ads will be indistinguishable from content. You can already see the outlines of that given I’ve discussed both generative ads and generative content; they’re the same thing! That image that is personalized to you just might happen to include a sweater or a belt that Meta knows you probably want; simply click-to-buy.

    It’s not just generative content, though: AI can figure out what is in other content, including authentic photos and videos. Suddenly every item in that influencer photo can be labeled and linked — provided the supplier bought into the black box, of course — making not just every piece of generative AI a potential ad, but every piece of content period.

    The market implications of this are profound. One of the oddities of analyzing digital ad platforms is that some of the most important indicators are counterintuitive; I wrote this spring:

    The most optimistic time for Meta’s advertising business is, counter-intuitively, when the price-per-ad is dropping, because that means that impressions are increasing. This means that Meta is creating new long-term revenue opportunities, even as its ads become cost competitive with more of its competitors; it’s also notable that this is the point when previous investor freak-outs have happened.

    When I wrote that I was, as I noted in the introduction, feeling more cautious about Meta’s business, given that Reels was built out and the inventory opportunities of Meta AI were not immediately obvious. I realize now, though, that I was distracted by Meta AI: the real impact of AI is to make everything inventory, which is to say that the price-per-ad on Meta can keep falling toward $0 basically forever. Would-be competitors are finding it difficult enough to compete with Meta’s userbase and resources in a probabilistic world; to do so with basically no price umbrella seems all-but-impossible.
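
    The arithmetic behind that counterintuitive indicator is simple; a toy calculation, with made-up numbers rather than Meta’s actuals:

    # Toy numbers: revenue can grow even as price-per-ad falls,
    # as long as impressions grow faster than prices decline.
    price_change = -0.10       # price-per-ad down 10%
    impression_growth = 0.25   # ad impressions up 25%
    revenue_growth = (1 + price_change) * (1 + impression_growth) - 1
    print(f"revenue growth: {revenue_growth:.1%}")  # 12.5%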

    The Long-term: XR and Generative UI

    Notice that I am thousands of words into this Article and, like Meta Myths, haven’t even mentioned VR or AR. Meta’s AI-driven upside is independent of XR becoming the platform of the future. What is different now, though, is that the likelihood of XR mattering feels dramatically higher than it did even six months ago.

    The first reason is obviously Orion, which I wrote about last month. Augmented reality is definitely going to be a thing — I would buy a pair of Meta’s prototypes now if they were for sale.

    Once again, however, the real enabler will be AI. In the smartphone era, user interfaces started out being pixel perfect, and have gradually evolved into being declarative interfaces that scale to different device sizes. AI, however, will enable generative UI, where you are only presented with the appropriate UI to accomplish the specific task at hand. This will be somewhat useful on phones, and much more compelling on something like a smartwatch; instead of having to craft an interface for a tiny screen, generative UIs will surface exactly what you need when you need it, and nothing else.

    Where this will really make a difference is with hardware like Orion. Smartphone UIs will be clunky and annoying in augmented reality; the magic isn’t in being pixel perfect, but rather in being able to do something with zero friction. Generative UI will make this possible: you’ll only see what you need to see, and be able to interact with it via neural interfaces like the Orion neural wristband. Oh, and this applies to ads as well: everything in the world will be potential inventory.

    AI will have a similarly transformative effect on VR, which I wrote about back in 2022 in DALL-E, the Metaverse, and Zero Marginal Content. That article traced the evolution of both games and user-generated content from text to images to video to 3D; the issue was that games had hit a wall, given the cost of producing compelling 3D content, a challenge that would only be magnified by the immersive nature of VR. Generative AI, though, will solve that problem:

    In the very long run this points to a metaverse vision that is much less deterministic than your typical video game, yet much richer than what is generated on social media. Imagine environments that are not drawn by artists but rather created by AI: this not only increases the possibilities, but crucially, decreases the costs.

    Here once again Meta’s advantages come to the fore: not only are they leading the way in VR with the Quest line of headsets, but they are also justified in building out the infrastructure necessary to generate metaverses — advertising included — because every part of their business benefits from AI.

    From Abundance to Infinity

    This was all a lot of words to explain the various permutations of an obvious truth: a world of content abundance is going to benefit the biggest content Aggregator first and foremost. Of course Meta needs to execute on all of these vectors, but that is where the company also benefits from being founder-led, particularly given that its founder seems more determined and locked in than ever.

    It’s also going to cost a lot of money, both in terms of training and inference. The inference part is inescapable: Meta may have a materially higher cost of revenue in the long run. The training part, however, has some intriguing possibilities. Specifically, Meta’s AI opportunities are so large and so central to the company’s future that there is no question that Zuckerberg will spend whatever is necessary to keep pushing Llama forward. Other companies, however, with less obvious use cases, or more dependency on third-party development that may take longer than expected to generate real revenue, may at some point start to question their infrastructure spend, and wonder if it might make more sense to simply license Llama (this is where the “ish” part of “open(ish)” looms large). It’s definitely plausible that Meta ends up being subsidized for building the models that give the company so much upside.

    Regardless, it’s good to be back on the Meta bull train, no matter what tomorrow’s earnings say about last quarter or next year. Stratechery from the beginning has been focused on the implications of abundance and the companies able to navigate it on behalf of massive user bases — the Aggregators. AI takes abundance to infinity, and Meta is the purest play of all.

    I wrote a follow-up to this Article in this Daily Update.


