
SEP-1577 - Sampling With Tools #1577

@ochafik

Description

SEP Number: #1577
Title: Sampling With Tools
Author: Olivier Chafik
Sponsor: @bhosmer-ant
Status: Draft
Created: 2025-09-29
Specification: MCP 2025-06-18
Prototype: modelcontextprotocol/typescript-sdk#991
PR: #1796
SDKs: modelcontextprotocol/python-sdk#1594, modelcontextprotocol/typescript-sdk#1101

Updates:

  • Oct 1: renamed tool_choice -> toolChoice (+ "none" value); removed exotic stopReasons "refusal" & "other"; allowed {CreateMessageResult,SamplingMessage}.content to be single contents or arrays of contents;
  • Oct 6: aligned ToolResultContent on CallToolResult (support image / audio); added "Possible Follow Ups" section.
  • Oct 10: updated reference impl example w/ simple tool registry (unify mcp tools w/ tool loop tools, see comment below) and a "choose your own adventure" game that uses sampling w/ tools + elicitation.
  • Oct 27: aligned ToolResultContent.content on CallToolResult.content (using ContentBlock); added ToolResultContent._meta
  • Nov 5:
    • kept stopReason as open string but w/ redundant explicit enums for visibility
    • removed requirement to throw when includeContext not matching advertised ClientCapabilities.sampling.context
    • mitigates backwards compatibility issue of CreateMessageResult.content being an array of contents OR a single content by saying sampling MUST NOT return an array in earlier spec versions (+ acknowledging SDK updates of code w/ sampling will need small code changes)
  • Nov 7: renamed type ToolCallContent to ToolUseContent (to match its tool_use type & the toolUse stopReason). SEP was approved!
  • Nov 10: removing disable_parallel_tool_use / keeping for a later update as the Gemini API has no way to implement this for now.
  • Nov 11: added extra notes about Gemini API's function calling modes & roles; requiring SamplingMessage w/ tool result contents not be mixed w/ other content types

Abstract

This SEP introduces tools & toolChoice params to sampling/createMessage and soft-deprecates includeContext (fences thisServer & allServers under a capability). This allows MCP servers to run their own agentic loops using the client's tokens (still under the user supervision), and reduces the complexity of client implementations (context support becoming explicitly optional).

Motivation

  • Sampling doesn't support tool calling, although it's a cornerstone of modern agentic behaviour. Without explicit support for it, MCP servers that use Sampling must either emulate tool calling w/ complex prompting / custom parsing of the outputs, or limit themselves to simpler, non-agentic requests. Adding support for tool calling could unlock many novel use cases in the MCP ecosystem.

  • Context inclusion is ambiguously defined (see this doc): it makes it particularly tricky to fully implement sampling, which along with other precautions needed for sampling (unaffected by this SEP) may have contributed to low adoption of the feature in clients (feature was introduced in the MCP Nov 2024 spec).

Please note some related work:

  • MCP Sampling (@jerome3o-anthropic): extremely similar proposal:
    • Add same tools semantics,
    • Deprecate includeContext (doc explains why its semantics are ambiguous)
    • (goes further to suggest explicit context sharing, which is out of scope from this proposal)
  • Allow Prompt/Sampling Messages to contain multiple content blocks. #198
    • In this PR we've made {CreateMessageResult,SamplingMessage}.content accept a single content or an array of contents. The result.content change is backwards incompatible but is required to support parallel tool calls. The SamplingMessage.content change then makes it much more natural to write a tool loop (see example in reference implementation: toolLoopSampling.ts)

In the "Possible Follow ups" Section below, we give examples of features that were kept out of scope from this SEP but which we took care to make this SEP reasonably compatible with.

Specification

Overview

  • Add traditional tool call support in CreateMessageRequest w/ tools (w/ JSON schemas) & toolChoice params, requiring a server-side tool loop
    • Sampling may now yield ToolUseContent responses
    • Server needs to call tools by itself
    • Server calls sampling again with ToolResultContent to inject tool results
    • toolChoice.mode can be "auto" | "required" | "none" to allow common structured outputs use case (see below for possible follow up improvements)
    • Fenced by new capability (sampling { tools {} })
  • Fix/update underspecified strings in CreateMessageResult:
    • stopReason: "endTurn" | "stopSequence" | "toolUse" | "maxTokens" | string (explicit enums + open string for compat)
    • role: "assistant"
  • Soft-deprecate CreateMessageRequest.params.includeContext != "none" (now fenced by capability)
    • Incentivize context-free sampling implementation
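The overall request shape this adds can be sketched as follows. This is a hedged sketch, not an SDK type: `Tool` and `ToolChoice` below are minimal local stand-ins for the spec types, and the weather tool is purely illustrative.

```typescript
// Minimal local stand-ins mirroring the SEP's shapes (illustrative, not SDK exports).
interface Tool {
  name: string;
  description?: string;
  inputSchema: object;
}
interface ToolChoice {
  mode?: "auto" | "required" | "none";
}

// A sampling request a server might send once the client advertises
// ClientCapabilities.sampling.tools.
const request = {
  method: "sampling/createMessage",
  params: {
    messages: [
      { role: "user", content: { type: "text", text: "What's the weather in London?" } },
    ],
    tools: [
      {
        name: "get_weather", // illustrative tool
        description: "Look up the current weather for a city",
        inputSchema: {
          type: "object",
          properties: { location: { type: "string" } },
          required: ["location"],
        },
      },
    ] as Tool[],
    toolChoice: { mode: "auto" } as ToolChoice,
    maxTokens: 1024,
  },
};
```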

Protocol changes

  • sampling/createMessage
    • MUST throw an error when includeContext is "thisServer" | "allServers" but clientCapabilities.sampling.context is missing
    • MUST throw an error when tool or toolChoice are defined but clientCapabilities.sampling.tools is missing
    • Servers SHOULD avoid [includeContext](https://modelcontextprotocol.io/specification/2025-06-18/schema#createmessagerequest) != "none", as the values "thisServer" and "allServers" may be removed in future spec releases.
    • CreateMessageRequest.messages MUST balance any "assistant" message w/ a ToolUseContent (and id: $id1) w/ a "user" message w/ a ToolResultContent (and toolUseId: $id1)
      • Note: this is a requirement for Claude API implementation (parallel tool call must all be responded to in one go)
    • SamplingMessage with tool result content blocks MUST NOT contain other content types.
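The balancing rule above can be checked mechanically. A minimal sketch, under the assumption that messages use the SEP's content-block shapes (the helper name and trimmed local types are illustrative):

```typescript
// Illustrative content-block shapes, trimmed to what the check needs.
type Block =
  | { type: "tool_use"; id: string; name: string; input: object }
  | { type: "tool_result"; toolUseId: string; content: unknown[] }
  | { type: "text"; text: string };

interface Msg {
  role: "user" | "assistant";
  content: Block | Block[];
}

// True iff every assistant ToolUseContent id is answered by a user
// ToolResultContent with a matching toolUseId (and there are no orphans).
function toolCallsBalanced(messages: Msg[]): boolean {
  const uses = new Set<string>();
  const results = new Set<string>();
  for (const m of messages) {
    for (const b of Array.isArray(m.content) ? m.content : [m.content]) {
      if (b.type === "tool_use" && m.role === "assistant") uses.add(b.id);
      if (b.type === "tool_result" && m.role === "user") results.add(b.toolUseId);
    }
  }
  return uses.size === results.size && Array.from(uses).every((id) => results.has(id));
}
```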

Schema changes

  • ClientCapabilities

    interface ClientCapabilities {
      ...
      sampling?: {
        context?: object; // NEW: Allows CreateMessageRequest.params.includeContext != "none"
        tools?: object;   // NEW: Allows CreateMessageRequest.params.{tools,toolChoice}
      };
    }
  • CreateMessageRequest (use existing Tool)

    interface CreateMessageRequest {
      method: "sampling/createMessage";
      params: {
        ...
        messages: SamplingMessage[]; // Note: type updated, see below
        
        tools?: Tool[] // NEW (existing type)
        
        toolChoice?: ToolChoice // NEW
      };
    }
    
    interface ToolChoice { // NEW
      mode?: "auto" | "required" | "none";
      // disable_parallel_tool_use?: boolean; // Update (Nov 10): removed, see below
    }
    • Notes:
      • OpenAI vs. Anthropic API idioms to avoid parallel tool calls:
        • OpenAI: parallel_tool_calls: false (top-level param)
        • Anthropic: tool_choice.disable_parallel_tool_use: true
          • Preferred here as default value if unset is false (e.g. parallel tool calls allowed)
      • OpenAI vs. Anthropic API re/ tool_choice "none" vs. tools:
        • OpenAI: tools: [$Foo], tool_choice: "none" forbids any tool call
          • Preferred behaviour here
        • Anthropic: tools: [$Foo], tool_choice: {mode: "none"} may still call tool Foo
      • Gemini vs. OAI / Anthropic re/ disable_parallel_tool_use:
        • Gemini API has no way to disable parallel tool calls atm (unlike OAI / Anthropic APIs). Removing this flag for now, to be reintroduced when Gemini has any way of supporting it. Otherwise clients would get unexpected multiple tool calls (or alternatively if implemented that way, unexpected failures / costly retry until a single tool call is emitted)
        • Gemini API's Function calling modes have an ANY value that should match the proposed required
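To make the trade-offs above concrete, a client bridging this SEP to provider APIs might translate ToolChoice roughly as follows. The mapping functions are hypothetical sketches; the provider parameter values ("required", "any", "none", "auto") are the providers' documented ones:

```typescript
interface ToolChoice {
  mode?: "auto" | "required" | "none";
}

// OpenAI: tool_choice accepts "auto" | "required" | "none" (among others);
// "none" forbids any tool call, matching the behaviour preferred by this SEP.
function toOpenAIToolChoice(tc: ToolChoice): "auto" | "required" | "none" {
  return tc.mode ?? "auto";
}

// Anthropic: tool_choice.type is "auto" | "any" | "none" | "tool", where
// "any" forces the model to use some tool (the analogue of "required").
function toAnthropicToolChoice(tc: ToolChoice): { type: "auto" | "any" | "none" } {
  switch (tc.mode) {
    case "required":
      return { type: "any" };
    case "none":
      return { type: "none" };
    default:
      return { type: "auto" };
  }
}
```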
  • SamplingMessage:

    /*
      BEFORE:
      
      interface SamplingMessage {
        content: TextContent | ImageContent | AudioContent
        role: Role;
      }
    */
    
    type SamplingMessage = UserMessage | AssistantMessage // NEW
    
    type AssistantMessageContent = TextContent | ImageContent | AudioContent | ToolUseContent;
    type UserMessageContent =      TextContent | ImageContent | AudioContent | ToolResultContent
    interface AssistantMessage { // NEW
      role: "assistant";
      content: AssistantMessageContent | AssistantMessageContent[]
    }
    
    interface ToolUseContent { // NEW
      type: "tool_use";
      name: string;
      id: string;
      input: object;
    }
    
    interface UserMessage { // NEW
      role: "user";
      content: UserMessageContent | UserMessageContent[]
    }
    
    interface ToolResultContent { // NEW
      _meta?: { [key: string]: unknown };
      type: "tool_result";
      toolUseId: string;
      content: ContentBlock[];
      structuredContent?: object;
      isError?: boolean;
    }
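Putting these shapes together, the server-side tool loop this SEP requires could look roughly like the following. This is a sketch, not the reference implementation: `createMessage` and `callTool` are assumed stand-ins for the SDK's sampling request and the server's own tool registry, and the local types are trimmed versions of the schema above.

```typescript
// Trimmed local shapes; see the schema above for the full definitions.
type ContentBlock = { type: string; [k: string]: unknown };
interface SamplingMessage { role: "user" | "assistant"; content: ContentBlock | ContentBlock[] }
interface Result { role: "assistant"; content: ContentBlock | ContentBlock[]; stopReason?: string }

type CreateMessage = (messages: SamplingMessage[]) => Promise<Result>;
type CallTool = (name: string, input: object) => Promise<ContentBlock[]>;

async function toolLoop(
  createMessage: CreateMessage,
  callTool: CallTool,
  messages: SamplingMessage[],
): Promise<Result> {
  for (;;) {
    const result = await createMessage(messages);
    const blocks = Array.isArray(result.content) ? result.content : [result.content];
    messages.push({ role: "assistant", content: blocks });
    if (result.stopReason !== "toolUse") return result;

    // Answer all tool_use blocks (possibly parallel calls) in a single
    // "user" message, as the balancing rule requires.
    const toolResults: ContentBlock[] = [];
    for (const b of blocks) {
      if (b.type !== "tool_use") continue;
      toolResults.push({
        type: "tool_result",
        toolUseId: b.id as string,
        content: await callTool(b.name as string, b.input as object),
      });
    }
    messages.push({ role: "user", content: toolResults });
  }
}
```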
  • Notes:

    • Differences of role vs. content type when it comes to tool calling between APIs:
      • OpenAI: role: "system" | "user" | "assistant" | "tool" (where tool is for tool results), while tool calls are nested in assistant messages; content is then typically null but some "OpenAI compatible" APIs accept non-null values
        • [
            {"role": "user", "content": "what is the temperature in london?"},
            {
              "role": "assistant",
              "content": "Let me use a tool...",
              "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                  "name": "get_weather",
                  "arguments": "{\"location\": \"London\"}"
                }
              }]
            },
            {
              "role": "tool",
              "content": "{\"temperature\": 20, \"condition\": \"sunny\"}",
              "tool_call_id": "call_1"
            }
          ]
      • Claude API: role: "user" | "assistant", tool use and result are passed through specially-typed message content parts:
        • [
            {
              "role": "user",
              "content": [
                {
                  "type": "text",
                  "text": "what is the temperature in london?"
                }
              ]
            },
            {
              "role": "assistant",
              "content": [
                {
                  "type": "text",
                  "text": "Let me use a tool..."
                },
                {
                  "type": "tool_use",
                  "id": "call_1",
                  "name": "get_weather",
                  "input": {"location": "London"}
                }
              ]
            },
            {
              "role": "user",
              "content": [
                {
                  "type": "tool_result",
                  "tool_use_id": "call_1",
                  "content": "{\"temperature\": 20, \"condition\": \"sunny\"}"
                }
              ]
            }
          ]
      • Gemini API:
        • function role (similar to OAI's tool role)
        • No tool call id concept in function calling: Gemini requires tool results to be provided in the exact same order as the tool use parts. An implementation could generate tool call ids and use them to reorder the tool results if needed.
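The reordering idea in the Gemini note above can be sketched as follows (a hypothetical helper; the shapes and the positional-matching assumption are illustrative):

```typescript
interface ToolUse { id: string; name: string }
interface ToolResult { toolUseId: string; content: unknown }

// Given tool_use parts in emission order and tool results in arbitrary order,
// sort the results back into emission order so they can be handed to an API
// (like Gemini's) that matches results positionally rather than by id.
function reorderToolResults(uses: ToolUse[], results: ToolResult[]): ToolResult[] {
  const order = new Map(uses.map((u, i) => [u.id, i] as const));
  return [...results].sort(
    (a, b) => (order.get(a.toolUseId) ?? Infinity) - (order.get(b.toolUseId) ?? Infinity),
  );
}
```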
  • CreateMessageResult

    /*
      BEFORE:
      
      interface CreateMessageResult {
        _meta?: { [key: string]: unknown };
        content: TextContent | ImageContent | AudioContent;
        role: Role;
        stopReason?: string;
        [key: string]: unknown;
      }
    */
    interface CreateMessageResult {
      _meta?: { [key: string]: unknown };
      
      content: AssistantMessageContent | AssistantMessageContent[] // UPDATED
      
      role: "assistant"; // UPDATED
      
      stopReason?: "endTurn" | "stopSequence" | "toolUse" | "maxTokens" | string // UPDATED
      
      [key: string]: unknown;
    }
    • Notes:

      • Backwards compatibility issue: returning CreateMessageResult.content as an array of contents OR a single content is problematic, so we propose:
        • sampling/createMessage MUST NOT return an array in CreateMessageResult.content before spec version Nov 2025.
          • This guarantees wire-level backwards-compatibility
        • Existing code that uses sampling may break w/ new SDK releases as it will need to test content to know if it's an array or a single block, and act accordingly.
        • This seems reasonable(?)
      • CreateMessageResult.stopReason field is currently defined as an open string, and the spec only mentions endTurn as an example value.
      • OpenAI vs. Anthropic API idioms
        • Finish/stop reason
          • OpenAI’s ChatCompletion: finish_reason: "stop" | "length" | "tool_calls" (…?)
          • Anthropic: stop_reason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use" | "pause_turn" | "refusal"
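The "small code change" mentioned in the backwards-compatibility note above — existing callers handling content as either a single block or an array — can be sketched with a tiny normalization helper (the name is illustrative):

```typescript
type ContentBlock = { type: string; [k: string]: unknown };

// Normalize CreateMessageResult.content, which may now be a single block or
// an array of blocks (e.g. parallel tool calls), into an array.
function asBlocks(content: ContentBlock | ContentBlock[]): ContentBlock[] {
  return Array.isArray(content) ? content : [content];
}
```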

Possible Follow ups

These are out of scope for this SEP, but care was taken not to preclude them, so where appropriate we give examples of how they could be implemented on top of / after this SEP.

Streaming support

See: Streaming tool use results #117

This could be important for some longer-running use cases or when latency is important, but would play better w/ streaming support in MCP tools.

A possible way to implement this would be to use notifications w/ payload, and possibly create a new method sampling/createMessageStreamed. Both should be orthogonal to this SEP (but we'd need to create delta types for results, similar to the streaming APIs of inference providers such as the Claude API and OpenAI API).

Cache friendliness updates

Two bits needed here:

  • Introduce cache awareness
    • Implicit caching guidelines phrased as SHOULDs
    • Explicit cache points and TTL semantics as in the Claude API? (incl. beta behaviour for longer caching)
      • Pros: easy to implement for at least 1 implementor (Anthropic)
      • Cons: if hard to implement for others, unlikely to get approval.
    • “Whole prompt” / prompt-prefix cache w/ an explicit key as in the OpenAI API?
      • Pros:
        • simpler for users (no need to think about where the shared prefix stops)
        • implicitly supports updating the cache (maybe even as subtree)
      • Cons: possibly harder to implement / less storage-efficient
  • Introduce allowed_tools feature to enable / disable tools w/o breaking context caching

Allow client to call the server’s tools by itself in an agentic loop

From the server’s perspective, that would remove the need to call tools by itself / inject tool results in follow up sampling calls.

The MCP server would just allowlist its own tools in the sampling request, w/ a dedicated tool definition such as:

{
  type: "server-tool"; // MCP tool from same server.
  name: string;
}

Pros:

  • Safe, limited to that server’s tools.
  • If we propagate the mcp-session-id, the client can leverage any server-side session context / caching

Allow client to call any other MCP servers’ tools by itself in an agentic loop

Although this sounds similar to the previous one (allow only same server’s tools), this option wouldn’t need a protocol change / could be entirely done by the client as an implementation detail of their sampling support.

The end user would allowlist tools from any other MCP server for use in a sampling request, without the server having to ask for anything. The client UI would e.g. display a tool selection UI as part of the sampling approval flow, auto enabling tools from same server by default.

Pros:

  • Technically no spec change needed (if anything, mention this as a freedom clients have)
  • Possibly close to what the intended semantics of CreateMessageRequest.params.includeContext = thisServer / allServers may have been
    • CreateMessageRequest.params.allowImplicitToolCalls = “none” | “thisServer” | “allServers”
      (assuming we wanted to give the server any control over this)

Cons:

  • High potential for privacy leaks / abuse; a classifier might be needed to mitigate it
    • If user approves Gmail MCP tool usage / delegation by mistake, server gets access to their private emails through sampling

Allow server to list & call clients’ tools (client/server → p2p)

If we say the client can now expose tools that the server can call, it opens a set of possibilities:

  • The client can “forward” other servers’ tools (maybe w/ some namespacing for seamless aggregation)
    • The server can then call these tools as part of its tool loop.
  • Client & Server semantics start to lose weight, we enter a more peer-to-peer, symmetrical relationship
    • Client could also ask a server for sampling, while we’re at it
    • Symmetry at the protocol layer, but still directionality at the transport layer (e.g. for HTTP transport, direction of POST requests still matters)

Simplify structured outputs use case

A major use case of sampling is to get outputs that conform to a given schema.

This is natively possible in OpenAI’s API, for instance (structured outputs via response_format).

The most common workaround is to give a single tool and set toolChoice: { mode: "required" }, which guarantees the output is a ToolUseContent whose input conforms to the tool’s input schema.

While this SEP proposes we enable this "required"-based workaround, as a follow up it would be great to provide more explicit / simpler JSON schema support, which would also allow schema types not allowed in tool inputs (tool inputs require an object w/ properties, so one has to pick at least a name for their outputs, which requires thinking / interplay w/ the prompting strategy):

interface CreateMessageRequest {
  method: "sampling/createMessage";
  params: {
    messages: SamplingMessage[];
    ...
    format: {
      type: "json_schema",
      schema: {
        type: "array",
        minItems: 5,
        maxItems: 100
      }
    }
  }
}
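For comparison, the "required"-based workaround that this SEP already enables can be sketched as follows (the emit_result tool name and its schema are illustrative choices, not part of the SEP):

```typescript
// A single tool whose inputSchema is the desired output schema; with
// toolChoice: { mode: "required" } the model must respond with a tool_use
// whose input conforms to that schema.
const structuredOutputRequest = {
  method: "sampling/createMessage",
  params: {
    messages: [
      { role: "user", content: { type: "text", text: "List three prime numbers." } },
    ],
    tools: [
      {
        name: "emit_result", // illustrative: callers must pick some tool name
        description: "Return the final structured answer",
        inputSchema: {
          type: "object",
          properties: {
            primes: { type: "array", items: { type: "integer" }, minItems: 3 },
          },
          required: ["primes"],
        },
      },
    ],
    toolChoice: { mode: "required" },
    maxTokens: 256,
  },
};
```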

Metadata

Labels: SEP, accepted (SEP accepted by core maintainers, but still requires final wording and reference implementation), spec

Status: Accepted (project status: Done)