Apple launched "Apple Intelligence" yesterday – their take on AI.
I want to zoom in on the new Siri, but first here's my mental model of the whole thing.
Overview
Here's the Apple Intelligence marketing page. Lots of pics!
Here's the Apple Intelligence press release. It's an easy read too.
Apple Intelligence is (a) a platform and (b) a bundle of user-facing features.
The platform is Apple's take on AI infra to meet their values – on-device models, Private Cloud Compute, and the rest.
The user-facing features fall into five buckets:
- Generation/Summarisation. Bounded to avoid hallucination, deepfakes, and IP risks (no making a picture in the style of a particular artist).
- Agents. This is what underpins Siri: on-device tasks using high personal context. (They call it "orchestration.")
- Natural interfaces. Voice, handwriting, nodding/shaking the head with AirPods Pro.
- Do what I mean. This is a combination of gen-AI and traditional ML: recognising people in photos, knowing which notifications are important, spotting salient data in emails.
- World knowledge. Cautiously delivered as an integration with ChatGPT – think of this as web search++. Also used to turbo-charge text and image generation, if the user opts in.
Buckets 1–4 are delivered using Apple's own models.
Apple's terminology distinguishes between "personal intelligence," on-device and under their control, and "world knowledge," which is prone to hallucinations – but is also what consumers expect when they use AI, and it's what may replace Google search as the "point of first intent" one day soon.
It's wise for them to keep world knowledge separate, behind a very clear gate, but still engage with it. Protects the brand and hedges their bets.
There are also a couple of early experiments:
- Attach points for inter-op. How do you integrate your own image generation models? How could the user choose their own chatbot? There's a promise to allow integration of models other than OpenAI's GPT-4o.
- Copilots. A copilot is an AI UX that is deeply integrated into an app, allowing for context-aware generation and refinement, chat, app-specific actions, and more. There's the beginning of a copilot UX in Xcode in the form of Swift Assist – I'd love to see this across the OS eventually.
A few areas weren't touched on:
- Multiplayer. I feel like solving for multiplayer is a prerequisite for really great human-AI collaboration, and their app Freeform looks like a sandbox for it.
- Long-running or off-device agent tasks. Say, booking a restaurant. That's where Google Assistant ran to. But having taken a stab at this in old client projects, I'm of the opinion that we'll need whole new UX primitives to do a good job of it. (Progress bars??)
- Character/vibe. Large language models have personality, and people love chatting with them. ChatGPT has a vibe and character.ai is hugely popular… but nobody really talks about this. I think it's awkwardly close to virtual-girlfriend territory? Still, Anthropic are taking character seriously now, so I'm hopeful for some real research in this area.
- Refining, tuning, steering. Note that Apple's main use cases are prompt-led and one-and-done. Steering is a cutting-edge research topic with barely-understood tech, let alone UX; there are hard problems.
Gotta leave something for iOS 19.
Architecture
Someone shared the Apple Intelligence high-level architecture – I snagged it as it went by on the socials but forget who shared it, sorry.
Here's the architecture slide.
The boxes I want to point out, so I can come back to them in a sec:
- Semantic index. This must be something like a vector database with embeddings of all your texts, emails, appointments, and so on. Your personal context. I talked about embeddings the other day – imagine a really effective search engine that you can query by meaning.
- App Intents toolbox. That's the list of functions or tools offered by all the apps on your phone, and whatever else is required to make it work. Apple apps now, but open to everyone.
- Orchestration. That's the agent runtime: the part that takes a user request, breaks it into actions, and performs them. I imagine this serves both generation tasks, which take a number of operations behind the scenes, and the more obvious multi-step agent tasks via Siri.
What's neat about the Apple Intelligence platform is how clearly buildable it all is.
Each component is straightforwardly specific (we know what a vector database is), improvable over time along an obvious gradient (you can put an engineering team on making generation real-time and they'll manage themselves), and scalable across the ecosystem and future features (it's obvious how App Intents could be extended to the entire App Store).
A very deft architecture.
And the user-facing features are chosen to minimise hallucination, avoid prompt injection/data exfiltration, and dodge other risks. Good job.
Siri
Siri – the voice assistant that was once terrible and is now, well, looking pretty good actually.
Iâve been immersed in agents recently.
(Here's my recent paper: Lares smart home assistant: A toy AI agent demonstrating emergent behavior.)
So I'm seeing everything through that lens. Three observations/speculations.
1. Siri is now a runtime for micro agents, programmed in plain English.
Take another look at the Apple Intelligence release and look at the requests that Siri can handle now: "Send the photos from the barbecue on Saturday to Malia" (hi you) or "Add this address to his contact card."
These are multi-step tasks across multiple apps.
The App Intents database (the database of operations that Siri can use in each app) is almost good enough to run this. But my experience is that a GPT-3.5-level model is not always reliable… especially when there are many possible actions to choose from…
You know what massively improves reliability? When the prompt includes the exact steps to perform.
Oh and look at that, Siri now includes a detailed device guide:
Siri can now give users device support everywhere they go, and answer thousands of questions about how to do something on iPhone, iPad, and Mac.
The example given is "Here's how to schedule a text message to send later," and the instructions have four steps.
Handy for users!
BUT.
Look. This is not aimed at humans. These are instructions written to be consumed by Siri itself, for use in the Orchestration agent runtime.
Given these instructions, even a 3.5-level agent is capable of combining steps and performing basic reasoning.
It's a gorgeously clever solution. I love that Apple just wrote thousands of step-by-step guides to achieving everything on your phone, which, sure, you can read if you ask. But then also: embed them, RAG the right ones in against a user request, and run the steps via App Intents. Such a straightforward approach with minimal code.
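The speculated pipeline above can be sketched in a few lines. Everything here is invented for illustration – the guide titles, the steps, the retrieval – and the "retrieval" is toy word-overlap rather than real embeddings, but it shows the shape: retrieve the right guide for a request, then hand its steps to the agent runtime.

```python
# Hypothetical device guides: each maps a task title to step-by-step instructions,
# written in plain English but consumable by an agent.
GUIDES = {
    "schedule a text message to send later": [
        "open Messages", "compose the message", "long-press send", "pick a time",
    ],
    "set a timer": ["open Clock", "choose Timer", "set duration", "start"],
}

def overlap(a: str, b: str) -> int:
    # Toy retrieval: word overlap instead of embeddings + a vector database.
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve_guide(request: str) -> str:
    # "RAG the right one in": pick the guide whose title best matches the request.
    return max(GUIDES, key=lambda title: overlap(title, request))

def run_agent(request: str) -> list:
    title = retrieve_guide(request)
    # A real orchestrator would map each step to an App Intent call;
    # here we just return the plan it would follow.
    return GUIDES[title]

print(run_agent("please schedule this text to send later tonight"))
```

The model never has to invent the procedure – it just follows retrieved steps, which is exactly why reliability jumps.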
i.e. Siri's new capabilities are programmed in plain English.
Can I prove it? No. But I'll eat my hat if it's not something like that.
2. Semantic indexing isn't enough. You need salience too, and we got a glimpse of that in the Journal app.
Siriâs instruction manual is an example of how Apple often surfaces technical capabilities as user-facing features.
Here's another one I can't prove: the prototype of the "personal context" in the semantic index.
It's not enough just to know that you went to such-and-such location yesterday, or happened to be in the same room as X and Y, or listened to whatever podcast. Semantic search isn't enough.
You also need salience.
Was it notable that you went to such-and-such location? Like, is meeting up in whatever bookshop with whatever person unusual and significant? Did you deliberately play whatever podcast, or did it just run on from the one before?
Thatâs tough to figure out.
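One way to picture the problem: salience as a score layered on top of the semantic index, combining signals like novelty (how unusual is this event for you?) and deliberateness (did you choose it, or did it just happen?). The signals and weights below are entirely invented – a sketch of the idea, not anyone's actual system.

```python
def salience(event: dict, history: list) -> float:
    # Novelty: events of a kind you rarely log score higher.
    same_kind = [e for e in history if e["kind"] == event["kind"]]
    novelty = 1.0 / (1 + len(same_kind))
    # Deliberateness: did the user initiate this, or did it autoplay/run on?
    deliberate = 1.0 if event.get("user_initiated") else 0.2
    # Invented weights; a real system would learn these from engagement signals.
    return 0.6 * novelty + 0.4 * deliberate

history = [{"kind": "podcast"}] * 50  # the podcast just keeps running on

outing = {"kind": "bookshop_visit", "user_initiated": True}
autoplay = {"kind": "podcast", "user_initiated": False}

# The one-off, deliberate bookshop outing outranks episode 51 of autoplay.
print(salience(outing, history) > salience(autoplay, history))
```

Clicks on Journal prompts are exactly the kind of engagement signal you'd use to tune those weights.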
Fortunately Apple has been testing this for many months: Apple launched their Journal app in December 2023 as part of the OS, and it includes "intelligently curated personalised suggestions" as daily writing prompts.
Like, you had an outing with someone – that kind of thing is the kind of suggestion they give you. It's all exposed by the Journaling Suggestions API.
Imagine the training data that comes from seeing whether people click on the prompts or not. Valuable for training the salience engine, I'm sure. You don't need to train with the actual data, just get a signal that the weights are right.
Again, nothing I can prove. But!
3. App Intents? How about Web App Intents?
AI agents use tools or functions.
Siri uses "App Intents," which developers declare as part of their app, and Siri stores them all in a database. "Intent" is also the term of art on Android for "a meaningful operation that an app can do." App Intents aren't new for this generation of AI; Apple and Android both laid the groundwork for this many, many years ago.
Intents == agent tools.
It is useful that there is a language for this now!
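"Intents == agent tools" can be shown in a dozen lines. This is a generic tool-registry sketch, not Apple's App Intents API: apps register named operations with parameters, and an orchestrator resolves a planned step to a registered tool and calls it. All the intent names and apps are invented.

```python
# Registry mapping intent names to callables – the "App Intents database."
INTENTS = {}

def app_intent(name: str):
    # Decorator an app would use to declare an operation it offers.
    def register(fn):
        INTENTS[name] = fn
        return fn
    return register

@app_intent("contacts.add_address")
def add_address(contact: str, address: str) -> str:
    return f"Added {address} to {contact}'s card"

@app_intent("photos.send")
def send_photos(query: str, recipient: str) -> str:
    return f"Sent photos matching '{query}' to {recipient}"

def orchestrate(intent: str, **params) -> str:
    # The agent runtime resolves a planned step to a registered tool and invokes it.
    return INTENTS[intent](**params)

print(orchestrate("photos.send", query="barbecue on Saturday", recipient="Malia"))
```

From the agent's point of view there's no difference between this and the function-calling/tools interface of any LLM runtime – which is exactly why the old intents groundwork pays off now.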
The new importance of App Intents to AI-powered Siri provokes a bunch of follow-up questions:
- What about intents that can only be fulfilled off-device, like booking a restaurant? In the future, do you need an app to advertise that intent to Siri, or could Siri index "Web App Intents" too, accessed remotely, no app required?
- How will new intents be discovered? Like, if I want to use the smart TV in an Airbnb and I don't have the app yet? Or book a train ticket in a country I'm visiting for the first time?
- When there are competing intents, how will Siri decide who wins? Like, Google Maps and Resy can both recommend restaurants – who gets to respond when I ask for dinner suggestions?
- How will personal information be shared and protected?
I unpack a lot of these questions in my post about search engines for personal AI agents, from March this year. Siri's new powers make them more relevant.
On a more technical level: in the Speculations section of my recent agent paper, I suggested that systems will need an agent-facing API – we can reframe that now as future Web App Intents.
In that paper, I started sketching out some technical requirements for that agent-facing API, and now I can add a new one: in addition to an API, any system (like Google Maps for restaurant booking) will need to publish a large collection of instruction cards – something that parallels Siri's device guides.
Good to know!
I'm impressed with Apple Intelligence.
It will have taken a ton of work to make it so straightforward, and also to align it so well with what users want, with the brand, and with the strategy.
Let me add one more exceptionally speculative speculation, seeing as I keep accusing Apple of hiding the future in plain sight…
Go back to the Apple Intelligence page and check out the way Siri appears now. No longer a glowing orb, it's an iridescent ring on the perimeter of the phone screen.
Another perimeter feature: in iOS 18, pressing the volume button pushes in the display bezel.
I bet the upcoming iPhones have curved screens à la the Samsung Galaxy S6 Edge from 2015.
Or at least that it has been strongly considered.
But iPhones with Siri AI should totally have curved glass. Because that would look sick.