feat(telemetry): enrich llm_request span with response metadata and error details #4693
Add 6 new attributes to qwen-code.llm_request OTel spans that were previously only available in log events (ApiResponseEvent), closing the observability gap that blocked cross-system debugging (e.g. correlating qwen-code traces with DashScope request logs).

New span attributes (with GenAI semconv duals where applicable):
- response_id / gen_ai.response.id: provider request ID
- finish_reason / gen_ai.response.finish_reasons: model stop reason
- thoughts_token_count / gen_ai.usage.reasoning_tokens: reasoning tokens
- subagent_name: originating subagent
- error_type / error.type: structured error classification
- error_status_code: HTTP status from provider errors

Implementation details:
- Extend LLMRequestMetadata with 6 new optional fields
- Track lastFinishReason and lastError as closure variables in the streaming path (consolidatedResponse is try-scoped, inaccessible from finally)
- Capture subagentName eagerly at method entry to avoid AsyncLocalStorage context loss in setTimeout/finally
- Update all 5 endLLMRequestSpan call sites with appropriate field subsets
- gen_ai.response.finish_reasons emitted as string[] per OTel semconv
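The scoping point in the implementation details (values declared inside `try` are not visible from `finally`) can be illustrated with a minimal sketch. This is not the PR's code: `Chunk`, `SpanEndMetadata`, `traceStream`, and the `endSpan` callback are hypothetical names, and a synchronous generator stands in for the real async stream to keep the example short. The same scoping rule applies to async generators.

```typescript
// Hypothetical, simplified shapes; the real types in qwen-code are richer.
interface Chunk {
  finishReason?: string;
}

interface SpanEndMetadata {
  success: boolean;
  finishReason?: string;
  errorType?: string;
}

// Wraps a chunk stream and reports span metadata exactly once, from `finally`.
function* traceStream(
  source: Iterable<Chunk>,
  endSpan: (meta: SpanEndMetadata) => void,
): Generator<Chunk> {
  // Declared outside `try` so that `finally` can read them.
  let lastFinishReason: string | undefined;
  let lastError: unknown;

  try {
    for (const chunk of source) {
      // Keep the most recent finish reason seen on any chunk.
      if (chunk.finishReason !== undefined) {
        lastFinishReason = chunk.finishReason;
      }
      yield chunk;
    }
  } catch (err) {
    lastError = err;
    throw err;
  } finally {
    endSpan({
      success: lastError === undefined,
      finishReason: lastFinishReason,
      errorType: lastError instanceof Error ? lastError.name : undefined,
    });
  }
}
```

Because the callback runs in `finally`, it fires on normal completion, on a thrown error, and when the consumer stops iterating early.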
Pull request overview
This PR enriches qwen-code.llm_request OpenTelemetry spans with response metadata and structured error details that were previously only available in log events, improving trace/log correlation across systems (e.g., provider request IDs and finish reasons).
Changes:
- Extend LLMRequestMetadata and endLLMRequestSpan() to emit 6 new span attributes (with GenAI semconv dual-emits where applicable).
- Populate the new metadata from LoggingContentGenerator for both non-streaming and streaming code paths.
- Add unit tests covering the new endLLMRequestSpan() attributes and edge cases (e.g., thoughtsTokenCount === 0).
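As a rough sketch of the dual-emit behavior described above, the following helper stamps each present field under both its legacy key and its GenAI semconv key. The attribute keys come from this PR's description; `SpanLike`, `ResponseMetadata`, and `stampResponseMetadata` are hypothetical stand-ins rather than the actual `endLLMRequestSpan()` code. It also assumes that a zero reasoning-token count is still emitted, which is what the `thoughtsTokenCount === 0` test case suggests.

```typescript
// Minimal structural stand-in for an OTel span; only setAttribute is used.
type AttributeValue = string | number | boolean | string[];
interface SpanLike {
  setAttribute(key: string, value: AttributeValue): void;
}

// Hypothetical subset of the metadata fields described in this PR.
interface ResponseMetadata {
  responseId?: string;
  finishReason?: string;
  thoughtsTokenCount?: number;
  subagentName?: string;
  errorType?: string;
  errorStatusCode?: number;
}

function stampResponseMetadata(span: SpanLike, meta: ResponseMetadata): void {
  if (meta.responseId !== undefined) {
    span.setAttribute("response_id", meta.responseId);
    span.setAttribute("gen_ai.response.id", meta.responseId);
  }
  if (meta.finishReason !== undefined) {
    span.setAttribute("finish_reason", meta.finishReason);
    // The semconv attribute is an array of strings.
    span.setAttribute("gen_ai.response.finish_reasons", [meta.finishReason]);
  }
  // Compare against undefined, not truthiness, so that 0 is still emitted.
  if (meta.thoughtsTokenCount !== undefined) {
    span.setAttribute("thoughts_token_count", meta.thoughtsTokenCount);
    span.setAttribute("gen_ai.usage.reasoning_tokens", meta.thoughtsTokenCount);
  }
  if (meta.subagentName !== undefined) {
    span.setAttribute("subagent_name", meta.subagentName);
  }
  if (meta.errorType !== undefined) {
    span.setAttribute("error_type", meta.errorType);
    span.setAttribute("error.type", meta.errorType);
  }
  if (meta.errorStatusCode !== undefined) {
    span.setAttribute("error_status_code", meta.errorStatusCode);
  }
}
```

Checking `!== undefined` instead of truthiness is the design choice that matters here: a plain `if (meta.thoughtsTokenCount)` would silently drop a legitimate zero.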
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/core/src/telemetry/session-tracing.ts | Adds new LLM request metadata fields and stamps them onto spans (including GenAI semconv dual attributes). |
| packages/core/src/telemetry/session-tracing.test.ts | Adds tests validating emission/omission behavior for the new span attributes. |
| packages/core/src/core/loggingContentGenerator/loggingContentGenerator.ts | Forwards response metadata and error details into endLLMRequestSpan() across non-stream and streaming paths. |
tanzhenxin left a comment
Review
This adds response-metadata and error attributes (response id, finish reason, reasoning-token count, subagent name, error type and status code) to the LLM-request telemetry span, with GenAI-semconv duals where they exist. The data was already captured in the log events but missing from the span, which broke cross-system trace correlation. The implementation follows the file's existing attribute-emission conventions exactly, adds no new control flow or span-end calls (so the existing leak/double-end/timeout guarantees hold), and the token accounting is correct — reasoning tokens are emitted separately rather than double-counted. Error enrichment reuses the same classification helpers that feed the existing error log event, and no message content or secrets land in attributes.
One non-blocking gap: the added tests exercise the span helper in isolation, but the actual extraction wiring — reading finish reason / response id / token counts from the response, tracking them across streaming chunks, and deriving error type and status on the failure paths — has no new coverage. That extraction is where a regression would actually surface; extending the existing stream/success/error tests to assert the new fields would close it.
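To make the failure-path part of that gap concrete, here is a minimal sketch of what deriving an error type and status code might look like, together with the kind of assertion a wiring test could make. `deriveErrorFields` and `HttpError` are hypothetical: the PR reuses existing classification helpers whose names and behavior are not shown in this thread, so this only illustrates the shape of the check.

```typescript
// Hypothetical result shape for the two error attributes.
interface ErrorFields {
  errorType: string;
  errorStatusCode?: number;
}

// Hypothetical derivation: use the error's name as its type, and read a
// numeric `status` property if the error carries one.
function deriveErrorFields(err: unknown): ErrorFields {
  const errorType = err instanceof Error ? err.name : "unknown";
  let errorStatusCode: number | undefined;
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    if (typeof status === "number") {
      errorStatusCode = status;
    }
  }
  return { errorType, errorStatusCode };
}

// Hypothetical provider error used only to exercise the helper.
class HttpError extends Error {
  status: number;
  constructor(message: string, status: number) {
    super(message);
    this.name = "HttpError";
    this.status = status;
  }
}

const fields = deriveErrorFields(new HttpError("rate limited", 429));
console.log(fields.errorType, fields.errorStatusCode);
```

A wiring test along these lines would throw such an error from a mocked provider call and assert that the resulting span carries the matching `error_type` and `error_status_code`.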
For context: this PR targets the daemon_mode_b_main branch, not main.
Verdict
APPROVE — correct, convention-following enrichment with no leak or double-end risk; the untested extraction wiring is worth closing but not blocking.
Summary
- Add 6 new attributes to qwen-code.llm_request OTel spans that were previously only in log events (ApiResponseEvent), closing the observability gap for cross-system debugging (e.g. correlating qwen-code traces with DashScope request logs)
- New attributes: response_id, finish_reason, thoughts_token_count, subagent_name, error_type, error_status_code (with GenAI semconv duals where applicable)
- Track lastFinishReason and lastError as closure variables in the streaming path, since consolidatedResponse is try-scoped and inaccessible from finally

New span attributes
| Span attribute | GenAI semconv dual |
|---|---|
| response_id | gen_ai.response.id |
| finish_reason | gen_ai.response.finish_reasons (array) |
| thoughts_token_count | gen_ai.usage.reasoning_tokens |
| subagent_name | (none) |
| error_type | error.type |
| error_status_code | (none) |

Motivation
While investigating a stuck session (ccb6e187), the llm_request span in the trace carried no DashScope request ID or finish_reason, so the problem could not be correlated across systems. This data was already captured in the ApiResponseEvent log event but had not been propagated to the span.

Test plan
- session-tracing.test.ts: 12 new tests for all new attributes, including zero-value edge cases
- loggingContentGenerator.test.ts: existing tests, no regression
- npx tsc --noEmit
- llm_request spans in ARMS
- error_type and error_status_code appear on error spans

🤖 Generated with Qwen Code