Python: Fix streaming path to emit mcp_server_tool_result on output_item.done instead of output_item.added#4821
Python: Fix streaming path to emit mcp_server_tool_result on output_item.done instead of output_item.added#4821giles17 wants to merge 5 commits intomicrosoft:mainfrom
Conversation
…ft#4814) Remove premature mcp_server_tool_result emission from the response.output_item.added/mcp_call handler — at that point the MCP server has not yet responded and output is always None. Add a handler for response.mcp_call.completed that emits mcp_server_tool_result with the actual tool output, matching the non-streaming path behavior. Co-authored-by: Copilot <[email protected]>
…ft#4814) Stop eagerly emitting mcp_server_tool_result on response.output_item.added (when output is always None). Instead, handle response.output_item.done for mcp_call items, which carries the full McpCall with populated output. This matches the non-streaming path which guards with 'if item.output is not None' before emitting the result. Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
giles17
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 96% | Result: All clear
Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach
Automated review by giles17's agents
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull request overview
Fixes the OpenAI Responses streaming parser so MCP tool results are emitted only once the MCP call item is complete, restoring parity with non-streaming behavior and enabling UIs to display MCP outputs during streaming.
Changes:
- Stop emitting
mcp_server_tool_resultduringresponse.output_item.addedformcp_callitems (when output fields are not populated yet). - Add handling for
response.output_item.doneto emitmcp_server_tool_resultwith the finalizedmcp_call.output. - Update/extend unit tests to validate deferral behavior and
output_item.doneresult emission (including the no-output case).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
python/packages/core/agent_framework/openai/_responses_client.py |
Defers MCP result parsing to response.output_item.done and emits mcp_server_tool_result from the completed item output. |
python/packages/core/tests/openai/test_openai_responses_client.py |
Adjusts tests to assert no result on output_item.added and adds coverage for output_item.done result emission. |
…icrosoft#4814) - Add call_id fallback in response.output_item.done mcp_call handler to match the output_item.added handler pattern - Use done_item instead of event for raw_representation to keep consistent with other output_item branches and non-streaming path - Add test for call_id fallback when id attribute is missing - Add raw_representation assertions to existing done handler tests Co-authored-by: Copilot <[email protected]>
giles17
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 93%
✓ Correctness
This PR makes two small, correct fixes to the streaming MCP call handling in
_parse_chunk_from_openai: (1) adds acall_idattribute fallback when extracting the call identifier from a done mcp_call item, and (2) fixesraw_representationto pass the item object instead of the event wrapper, aligning with the non-streaming path and theresponse.output_item.addedstreaming path. Tests are well-constructed and correctly validate the changes. No issues found.
✓ Security Reliability
The changes are minor, correct, and improve reliability: (1) adding a
call_idattribute fallback whenidis missing on thedone_itemguards against different event object shapes from the OpenAI API, and (2) passingdone_iteminstead ofeventasraw_representationis consistent with the non-streaming path (line 1560) and the streamingoutput_item.addedhandler (line 1932), which both pass the item object. The test coverage for the fallback and raw_representation assertions is adequate. No security or reliability issues found.
✗ Test Coverage
The diff fixes two bugs in the streaming MCP call handler: (1) adds a
call_idfallback whenidis missing, and (2) passesdone_iteminstead ofeventasraw_representation. Tests cover the happy path, null-output, and call_id fallback cases, plus verifyraw_representation. However, the new fallback test is missing araw_representationassertion, and there is no test for the case where bothidandcall_idare absent (empty-string fallback). The broader issue from the linked discussion—missing handler forresponse.mcp_call.completed—is not addressed here, but that is a separate feature/fix beyond this PR's scope.
✓ Design Approach
The two changes are well-motivated and consistent with the rest of the codebase. The
call_idfallback inresponse.output_item.donenow mirrors the identical pattern already used inresponse.output_item.added(line 1925), andraw_representation=done_itemaligns with both the non-streaming path (line 1560) and the streamingmcp_calladded handler (line 1932), which all use the item rather than the outer event wrapper. No design-level concerns found. The only gap is a missingraw_representationassertion in the new fallback test.
Suggestions
- Add
assert result_content.raw_representation is mock_itemtotest_parse_chunk_from_openai_with_mcp_output_item_done_call_id_fallbackfor consistency with the other two MCP done tests, since theraw_representation=done_itemfix is part of what this PR explicitly changes. - The non-streaming path at line 1545 uses
item.iddirectly without thecall_idfallback. Consider applying the same defensive fallback (getattr(item, 'id', None) or getattr(item, 'call_id', None) or '') there for consistency, so both paths handle unexpected API shapes gracefully. - Add a test case where the mock item has neither
idnorcall_idto verify the empty-string fallback path (the thirdor ""branch).
Automated review by giles17's agents
…rage (microsoft#4814) - Apply defensive call_id fallback (getattr with id/call_id/empty) to non-streaming mcp_call path for consistency with streaming path - Add raw_representation assertion to call_id fallback test - Add test for empty-string fallback when neither id nor call_id exist Co-authored-by: Copilot <[email protected]>
giles17
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 91%
✓ Correctness
The PR makes the non-streaming mcp_call path at line 1545 defensive against missing
idattribute by using getattr with fallback tocall_idthen empty string, matching the pattern already used in the streaming path (line 2208). The code change is correct. The new test covers the streamingresponse.output_item.donehandler for the edge case where neitheridnorcall_idexists. However, the new test (and the existing fallback test at line 1282) test the streaming path (_parse_chunk_from_openai), not the non-streaming path (_parse_response_from_openai) where the actual code change was made. The non-streaming path change at line 1545 lacks a dedicated test for thecall_idfallback and the no-attribute fallback scenarios. This is a gap but not a blocking issue since the pattern is well-tested in the streaming path and the logic is identical.
✓ Security Reliability
The diff makes a small, focused reliability improvement: the non-streaming MCP call path now uses defensive
getattrfor extractingcall_id, matching the already-defensive streaming path. The fallback chain (id→call_id→ empty string) is consistent across both paths. The new test covers the edge case where neither attribute exists. No security concerns. One minor reliability gap exists in the same non-streaming block where other attributes (item.name,item.server_label,item.arguments,item.output) are still accessed directly withoutgetattr, but these are pre-existing and out of scope for this PR.
✗ Test Coverage
The production code change is on line 1545 in the non-streaming path (
_parse_response_from_openai), addinggetattrfallback forcall_idwhenitem.idis missing. However, the new tests exclusively target the streaming path (_parse_chunk_from_openaiviaresponse.output_item.done), which already had the samegetattrfallback logic before this PR (line 2208). This means the actual code change has zero new test coverage. The non-streaming path'smcp_callhandling at line 1545 needs at least two new tests: one forcall_idfallback (noidattribute) and one for empty-string fallback (neitheridnorcall_id). Additionally, the existing non-streaming test at line 1161 usesMagicMock()(withoutspec), which auto-creates.id, so it never exercised the old bug either.
✗ Design Approach
The diff changes
item.idto agetattrfallback chain in_parse_response_from_openai(non-streaming path, line 1545) to make it consistent with the already-fixed streamingresponse.output_item.donehandler at line 2208. The approach is defensible but has two design problems: (1) the new tests are written against_parse_chunk_from_openai(streaming path, line 2208), not against the code that was actually changed (_parse_response_from_openai, line 1545), leaving the changed line untested; (2) silently falling back to an empty-stringcall_idwhen neither attribute exists — and then cementing that with an assertion — codifies broken correlation behavior for concurrent MCP calls instead of failing fast or at least logging a warning.
Flagged Issues
- All new tests target the streaming path
_parse_chunk_from_openai(line 2208), which already had thegetattrfallback before this PR. The actual code change is in the non-streaming_parse_response_from_openai(line 1545) and has no test coverage. Add tests for the non-streamingmcp_callcase usingMagicMock(spec=[])(so.idisn't auto-created): one withcall_idset to verify the fallback, and one with neitheridnorcall_idto verify the empty-string fallback. - When neither
idnorcall_idexists, silently falling back to""means concurrent MCP calls become indistinguishable. The code should at minimum emit a warning log before falling back; a stronger alternative is raising a descriptive exception so integration failures are immediately visible.
Suggestions
- Consider making
item.name,item.server_label,item.arguments, anditem.outputaccess defensive withgetattrin the non-streamingmcp_callblock (lines 1549–1555) to match the defensive style now used forcall_id. This is pre-existing but could be addressed for consistency. - The
orchaingetattr(item, 'id', None) or getattr(item, 'call_id', None) or ""silently swallows a falsy-but-presentidvalue (e.g.,id=""). Prefer an explicitis not Nonecheck or usehasattrto avoid conflating a missing attribute with a present-but-empty one. - The streaming
response.output_item.addedpath (line 1925) also uses the samegetattrfallback pattern but has no fallback tests—consider adding them for completeness.
Automated review by giles17's agents
Motivation and Context
The streaming code path attempted to extract MCP tool results from
response.output_item.addedevents, where the result/output fields are not yet populated. This meantmcp_server_tool_resultcontent was either empty or malformed during streaming, breaking parity with the non-streaming and agent-as-tool paths.Fixes #4814
Description
The root cause was that
_parse_chunk_from_openaitried to readresult/output/outputsfrom the MCP call item at theresponse.output_item.addedstage, before the API has populated those fields. The fix removes that premature extraction and instead handles MCP call results in a newresponse.output_item.donecase, where the completed item carries the finaloutputstring. This aligns streaming behavior with the non-streaming path and correctly emitsContent.from_mcp_server_tool_resultwith the actual tool output.Contribution Checklist