Description
| SEP Number | #1577 |
|---|---|
| Title | Sampling With Tools |
| Author | Olivier Chafik |
| Sponsor | @bhosmer-ant |
| Status | Draft |
| Created | 2025-09-29 |
| Specification | MCP 2025-06-18 |
| Prototype | modelcontextprotocol/typescript-sdk#991 |
| PR | #1796 |
| SDKs | modelcontextprotocol/python-sdk#1594 modelcontextprotocol/typescript-sdk#1101 |
Updates:
- Oct 1: renamed `tool_choice` -> `toolChoice` (+ `"none"` value); removed exotic stopReasons `"refusal"` & `"other"`; allowed `{CreateMessageResult,SamplingMessage}.content` to be a single content or an array of contents
- Oct 6: aligned `ToolResultContent` on `CallToolResult` (support image / audio); added "Possible Follow Ups" section
- Oct 10: updated reference impl example w/ simple tool registry (unify MCP tools w/ tool loop tools, see comment below) and a "choose your own adventure" game that uses sampling w/ tools + elicitation
- Oct 27: aligned `ToolResultContent.content` on `CallToolResult.content` (using `ContentBlock`); added `ToolResultContent._meta`
- Nov 5:
  - kept `stopReason` as an open string but w/ redundant explicit enums for visibility
  - removed the requirement to throw when `includeContext` does not match the advertised `ClientCapabilities.sampling.context`
  - mitigated the backwards-compatibility issue of `CreateMessageResult.content` being an array of contents OR a single content by saying sampling MUST NOT return an array in earlier spec versions (+ acknowledging that code using sampling will need small changes after SDK updates)
- Nov 7: renamed type `ToolCallContent` to `ToolUseContent` (to match its `tool_use` type & the `toolUse` stopReason). SEP was approved!
- Nov 10: removed `disable_parallel_tool_use` / keeping it for a later update, as the Gemini API has no way to implement this for now
- Nov 11: added extra notes about the Gemini API's function calling modes & roles; requiring that a SamplingMessage w/ tool result contents not be mixed w/ other content types
Abstract
This SEP introduces `tools` & `toolChoice` params to `sampling/createMessage` and soft-deprecates `includeContext` (fences `thisServer` & `allServers` under a capability). This allows MCP servers to run their own agentic loops using the client's tokens (still under the user's supervision), and reduces the complexity of client implementations (context support becoming explicitly optional).
Motivation
- Sampling doesn't support tool calling, although it's a cornerstone of modern agentic behaviour. Without explicit support for it, MCP servers that use Sampling can either try to emulate tool calling w/ complex prompting / custom parsing of the outputs, or are limited to simpler, non-agentic requests. Adding support for tool calling could unlock many novel use cases in the MCP ecosystem.
- Context inclusion is ambiguously defined (see this doc): it makes it particularly tricky to fully implement sampling, which, along with other precautions needed for sampling (unaffected by this SEP), may have contributed to low adoption of the feature in clients (the feature was introduced in the MCP Nov 2024 spec).
Please note some related work:
- MCP Sampling (@jerome3o-anthropic): extremely similar proposal:
  - Adds the same tools semantics,
  - Deprecates `includeContext` (the doc explains why its semantics are ambiguous)
  - (goes further to suggest explicit context sharing, which is out of scope for this proposal)
- Allow Prompt/Sampling Messages to contain multiple content blocks. #198
  - In this PR we've made `{CreateMessageResult,SamplingMessage}.content` accept a single content or an array of contents. The `result.content` change is backwards incompatible but is required to support parallel tool calls. The `SamplingMessage.content` change then makes it much more natural to write a tool loop (see example in reference implementation: toolLoopSampling.ts)

In the "Possible Follow Ups" section below, we give examples of features that were kept out of scope for this SEP but which we took care to make this SEP reasonably compatible with.
Specification
Overview
- Add traditional tool call support in CreateMessageRequest w/ `tools` (w/ JSON schemas) & `toolChoice` params, requiring a server-side tool loop:
  - Sampling may now yield `ToolUseContent` responses
  - Server needs to call tools by itself
  - Server calls sampling again with `ToolResultContent` to inject tool results
- `toolChoice.mode` can be `"auto" | "required" | "none"` to allow the common structured-outputs use case (see below for possible follow-up improvements)
- Fenced by new capability (`sampling: { tools: {} }`)
- Fix/update underspecified strings in CreateMessageResult:
  - `stopReason: "endTurn" | "stopSequence" | "toolUse" | "maxToken" | string` (explicit enums + open string for compat)
  - `role: "assistant"`
- Soft-deprecate `CreateMessageRequest.params.includeContext != 'none'` (now fenced by a capability)
  - Incentivize context-free sampling implementations
Protocol changes
sampling/createMessageMUST throw an error whenincludeContext is “thisServer” | “allServers”butclientCapabilities.sampling.contextis missing- MUST throw an error when
toolortoolChoiceare defined butclientCapabilities.sampling.toolsis missing - Servers SHOULD avoid
[includeContext](https://modelcontextprotocol.io/specification/2025-06-18/schema#createmessagerequest)!= ‘none’as values“thisServer”and“allServers”` may be removed in future spec releases. CreateMessageRequest.messagesMUST balance any “assistant” message w/ aToolUseContent(andid: $id1) w/ a “user” message w/ a ToolResultContent (andtool_result_id: $id1)- Note: this is a requirement for Claude API implementation (parallel tool call must all be responded to in one go)
- SamplingMessage with tool result content blocks MUST NOT contain other content types.
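The tool loop these rules imply can be sketched as follows. This is an illustrative, self-contained sketch under simplified stand-in types; names like `toolLoop` and `Sampler` are hypothetical, not part of the spec or any SDK:

```typescript
// Illustrative sketch of the server-side tool loop (simplified stand-in
// types; `toolLoop` / `Sampler` are hypothetical names).
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: object }
  | { type: "tool_result"; toolUseId: string; content: { type: "text"; text: string }[] };

interface Message { role: "user" | "assistant"; content: Block[] }
interface Result { role: "assistant"; content: Block[]; stopReason?: string }

type Sampler = (messages: Message[]) => Promise<Result>;
type ToolImpl = (input: object) => Promise<{ type: "text"; text: string }[]>;

async function toolLoop(
  sample: Sampler, // wraps sampling/createMessage w/ tools & toolChoice set
  tools: Record<string, ToolImpl>,
  messages: Message[],
): Promise<Result> {
  for (;;) {
    const result = await sample(messages);
    const uses = result.content.filter(
      (b): b is Extract<Block, { type: "tool_use" }> => b.type === "tool_use",
    );
    if (result.stopReason !== "toolUse" || uses.length === 0) return result;

    // Echo the assistant turn, then answer ALL tool calls in a single
    // "user" message (the balancing rule above; required by the Claude API).
    messages.push({ role: "assistant", content: result.content });
    const toolResults: Block[] = [];
    for (const use of uses) {
      const impl = tools[use.name];
      toolResults.push({
        type: "tool_result",
        toolUseId: use.id,
        content: await impl(use.input),
      });
    }
    messages.push({ role: "user", content: toolResults });
  }
}
```

The loop terminates on any `stopReason` other than `toolUse`, so an implementation built this way degrades gracefully when the model never calls a tool.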
Schema changes
- `ClientCapabilities`:

  ```typescript
  interface ClientCapabilities {
    ...
    sampling?: {
      context?: object; // NEW: Allows CreateMessageRequest.params.includeContext != "none"
      tools?: object;   // NEW: Allows CreateMessageRequest.params.{tools,toolChoice}
    };
  }
  ```
- `CreateMessageRequest` (use existing `Tool`):

  ```typescript
  interface CreateMessageRequest {
    method: "sampling/createMessage";
    params: {
      ...
      messages: SamplingMessage[]; // Note: type updated, see below
      tools?: Tool[];          // NEW (existing type)
      toolChoice?: ToolChoice; // NEW
    };
  }

  interface ToolChoice { // NEW
    mode?: "auto" | "required" | "none";
    // disable_parallel_tool_use?: boolean; // Update (Nov 10): removed, see below
  }
  ```
- Notes:
  - OpenAI vs. Anthropic API idioms to avoid parallel tool calls:
    - OpenAI: `parallel_tool_calls: false` (top-level param)
    - Anthropic: `tool_choice.disable_parallel_tool_use: true`
      - Preferred here, as the default value if unset is false (i.e. parallel tool calls allowed)
  - OpenAI vs. Anthropic API re/ `tool_choice` `"none"` vs. `tools`:
    - OpenAI: `tools: [$Foo], tool_choice: "none"` forbids any tool call
      - Preferred behaviour here
    - Anthropic: `tools: [$Foo], tool_choice: {mode: "none"}` may still call tool `Foo`
  - Gemini vs. OAI / Anthropic re/ `disable_parallel_tool_use`:
    - The Gemini API has no way to disable parallel tool calls atm (unlike the OAI / Anthropic APIs). Removing this flag for now, to be reintroduced when Gemini has a way of supporting it. Otherwise clients would get unexpected multiple tool calls (or, alternatively, if implemented that way, unexpected failures / costly retries until a single tool call is emitted)
    - The Gemini API's function calling modes have an `ANY` value that should match the proposed `required`
- `SamplingMessage`:

  ```typescript
  /* BEFORE:
  interface SamplingMessage {
    content: TextContent | ImageContent | AudioContent;
    role: Role;
  }
  */
  type SamplingMessage = UserMessage | AssistantMessage; // NEW

  type AssistantMessageContent = TextContent | ImageContent | AudioContent | ToolUseContent;
  type UserMessageContent = TextContent | ImageContent | AudioContent | ToolResultContent;

  interface AssistantMessage { // NEW
    role: "assistant";
    content: AssistantMessageContent | AssistantMessageContent[];
  }

  interface ToolUseContent { // NEW
    type: "tool_use";
    name: string;
    id: string;
    input: object;
  }

  interface UserMessage { // NEW
    role: "user";
    content: UserMessageContent | UserMessageContent[];
  }

  interface ToolResultContent { // NEW
    _meta?: { [key: string]: unknown };
    type: "tool_result";
    toolUseId: string;
    content: ContentBlock[];
    structuredContent: object;
    isError?: boolean;
  }
  ```
- Notes:
  - Differences of role vs. content type when it comes to tool calling between APIs:
    - OpenAI: `role: "system" | "user" | "assistant" | "tool"` (where `tool` is for tool results), while tool calls are nested in assistant messages; `content` is then typically null, but some "OpenAI compatible" APIs accept non-null values:

      ```json
      [
        {"role": "user", "content": "what is the temperature in london?"},
        {
          "role": "assistant",
          "content": "Let me use a tool...",
          "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"London\"}"
            }
          }]
        },
        {
          "role": "tool",
          "content": "{\"temperature\": 20, \"condition\": \"sunny\"}",
          "tool_call_id": "call_1"
        }
      ]
      ```

    - Claude API: `role: "user" | "assistant"`; tool use and results are passed through specially-typed message content parts:

      ```json
      [
        {"role": "user", "content": [{"type": "text", "text": "what is the temperature in london?"}]},
        {
          "role": "assistant",
          "content": [
            {"type": "text", "text": "Let me use a tool..."},
            {"type": "tool_use", "id": "call_1", "name": "get_weather", "input": {"location": "London"}}
          ]
        },
        {
          "role": "user",
          "content": [
            {"type": "tool_result", "tool_use_id": "call_1", "content": "{\"temperature\": 20, \"condition\": \"sunny\"}"}
          ]
        }
      ]
      ```

    - Gemini API:
      - `function` role (similar to OAI's `tool` role)
      - No tool call id concept in function calling: Gemini requires tool results to be provided in the exact same order as the tool use parts. An implementation could generate the tool call ids and use them to reorder the tool results if needed.
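The Gemini reordering idea above can be made concrete with a small helper. This is a sketch under simplified stand-in types; `reorderForGemini` is a hypothetical name, not an SDK function:

```typescript
// Sketch: Gemini has no tool call ids and expects tool results in the exact
// order of the tool use parts, so a client can generate ids and use them
// to restore that order. Simplified stand-in types.
interface ToolUse { id: string; name: string }
interface ToolResult { toolUseId: string; text: string }

function reorderForGemini(uses: ToolUse[], results: ToolResult[]): ToolResult[] {
  const byId = new Map(results.map((r) => [r.toolUseId, r]));
  return uses.map((u) => {
    const r = byId.get(u.id);
    if (r === undefined) throw new Error(`missing tool result for ${u.id}`);
    return r;
  });
}
```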
- `CreateMessageResult`:

  ```typescript
  /* BEFORE:
  interface CreateMessageResult {
    _meta?: { [key: string]: unknown };
    content: TextContent | ImageContent | AudioContent;
    role: Role;
    stopReason?: string;
    [key: string]: unknown;
  }
  */
  interface CreateMessageResult {
    _meta?: { [key: string]: unknown };
    content: AssistantMessageContent | AssistantMessageContent[]; // UPDATED
    role: "assistant"; // UPDATED
    stopReason?: "endTurn" | "stopSequence" | "toolUse" | "maxToken" | string; // UPDATED
    [key: string]: unknown;
  }
  ```
- Notes:
  - Backwards compatibility issue: returning `CreateMessageResult.content` as an array of contents OR a single content is problematic, so we propose:
    - `sampling/createMessage` MUST NOT return an array in `CreateMessageResult.content` before spec version Nov 2025.
      - This guarantees wire-level backwards compatibility
    - Existing code that uses sampling may break w/ new SDK releases, as it will need to test `content` to know whether it's an array or a single block, and act accordingly.
      - This seems reasonable(?)
  - The `CreateMessageResult.stopReason` field is currently defined as an open `string`, and the spec only mentions `endTurn` as an example value.
    - OpenAI vs. Anthropic API idioms for finish/stop reasons:
      - OpenAI's ChatCompletion: `finish_reason: "stop" | "length" | "tool_calls"` (…?)
      - Anthropic: `stop_reason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use" | "pause_turn" | "refusal"`
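Given the single-vs-array compromise above, SDK consumers migrating across spec versions will typically normalize before reading the result. A minimal sketch (simplified stand-in types; `contentBlocks` is a hypothetical helper name):

```typescript
// Sketch: CreateMessageResult.content may be a single content block
// (pre-Nov-2025 behaviour) or an array (new spec). Normalizing to an
// array keeps downstream code uniform across both shapes.
interface TextContent { type: "text"; text: string }
type Content = TextContent; // stand-in for the full content union

function contentBlocks(content: Content | Content[]): Content[] {
  return Array.isArray(content) ? content : [content];
}
```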
Possible Follow ups
These are out of scope for this SEP, but care was taken not to preclude them, so where appropriate we give examples of how they could be implemented on top of / after this SEP.
Streaming support
See: Streaming tool use results #117
This could be important for some longer-running use cases or when latency matters, but would play better w/ streaming support in MCP tools.
A possible way to implement this would be to use notifications w/ payloads, and possibly to create a new method `sampling/createMessageStreamed`. Both should be orthogonal to this SEP (but we'd need to create delta types for results, similar to the streaming variants of inference APIs such as the Claude API and OpenAI API).
Cache friendliness updates
Two bits are needed here:
- Introduce cache awareness
  - Implicit caching guidelines phrased as SHOULDs
  - Explicit cache points and TTL semantics as in the Claude API? (incl. beta behaviour for longer caching)
    - Pros: easy to implement for at least 1 implementor (Anthropic)
    - Cons: if hard to implement for others, unlikely to get approval
  - "Whole prompt" / prompt-prefix cache w/ an explicit key as in the OpenAI API?
    - Pros:
      - simpler for users (no need to think about where the shared prefix stops)
      - implicitly supports updating the cache (maybe even as a subtree)
    - Cons: possibly harder to implement / more storage-inefficient
- Introduce an `allowed_tools` feature to enable / disable tools w/o breaking context caching
  - Relevant to this SEP, as we may want to merge this feature under the `tool_choice` field, similar to what OpenAI did:

    ```typescript
    interface ToolChoice { // NEW
      mode?: "auto" | "required";
      allowed_tools?: string[];
    }
    ```
-
Allow client to call the server’s tools by itself in an agentic loop
From the server’s perspective, that would remove the need to call tools by itself / inject tool results in follow up sampling calls.
The MCP server would just allowlist its own tools in the sampling request, w/ a dedicated tool definition such as:

```typescript
{
  type: "server-tool"; // MCP tool from same server.
  name: string;
}
```

Pros:
- Safe, limited to that server's tools.
- If we propagate the mcp-session-id, can leverage / keep any server-side session context / caching
Allow client to call any other MCP servers’ tools by itself in an agentic loop
Although this sounds similar to the previous one (allow only same server’s tools), this option wouldn’t need a protocol change / could be entirely done by the client as an implementation detail of their sampling support.
The end user would allowlist tools from any other MCP server for use in a sampling request, without the server having to ask for anything. The client UI would e.g. display a tool selection UI as part of the sampling approval flow, auto enabling tools from same server by default.
Pros:
- Technically no spec change needed (if anything, mention this as a freedom clients have)
- Possibly similar to what the intended semantics of `CreateMessageRequest.params.includeContext = thisServer / allServers` may have been
  - e.g. `CreateMessageRequest.params.allowImplicitToolCalls = "none" | "thisServer" | "allServers"` (assuming we wanted to give the server any control over this)
Cons:
- High potential for privacy leaks / abuse; a classifier might be needed to mitigate this
- If user approves Gmail MCP tool usage / delegation by mistake, server gets access to their private emails through sampling
Allow server to list & call clients’ tools (client/server → p2p)
If we say the client can now expose tools that the server can call, it opens a set of possibilities:
- The client can “forward” other servers’ tools (maybe w/ some namespacing for seamless aggregation)
- The server can then call these tools as part of its tool loop.
- Client & server semantics start to carry less weight; we enter a more peer-to-peer, symmetrical relationship
- Client could also ask a server for sampling, while we’re at it
- Symmetry at the protocol layer, but still directionality at the transport layer (e.g. for HTTP transport, direction of POST requests still matters)
Simplify structured outputs use case
A major use case of sampling is to get outputs that conform to a given schema.
This is possible in OpenAI’s API for instance.
The most common workaround is to give a single tool and set `toolChoice: {mode: "required"}`, which guarantees the output is a tool call whose input conforms to the tool's input schema.
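Concretely, with that workaround the structured output is just the forced tool call's `input`. A sketch under simplified stand-in types; `structuredOutput` is a hypothetical helper name:

```typescript
// Sketch of the "single tool + toolChoice required" workaround: the model
// must call the one declared tool, so that call's validated `input` object
// IS the structured output.
interface ToolUseContent { type: "tool_use"; id: string; name: string; input: object }
interface SamplingResult { content: ToolUseContent | ToolUseContent[] }

function structuredOutput(result: SamplingResult, toolName: string): object {
  const blocks = Array.isArray(result.content) ? result.content : [result.content];
  const use = blocks.find((b) => b.name === toolName);
  if (use === undefined) throw new Error(`expected a ${toolName} tool call`);
  return use.input;
}
```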
While this SEP proposes to enable this `"required"`-based workaround, as a follow-up it would be great to provide more explicit / simpler JSON schema support. That would also allow schema types not allowed in tool inputs (tool inputs require an object w/ properties, so one has to pick at least a name for the outputs, which requires thinking about the interplay w/ the prompting strategy):
```typescript
interface CreateMessageRequest {
  method: "sampling/createMessage";
  params: {
    messages: SamplingMessage[];
    ...
    format: {
      type: "json_schema",
      schema: {
        "type": "array",
        "minItems": 5,
        "maxItems": 100
      }
    }
  }
}
```