feat(telemetry): enrich llm_request span with response metadata and error details by doudouOUC · Pull Request #4693 · QwenLM/qwen-code

doudouOUC · 2026-06-02T07:40:18Z

Summary

Add 6 new attributes to qwen-code.llm_request OTel spans that were previously only in log events (ApiResponseEvent), closing the observability gap for cross-system debugging (e.g. correlating qwen-code traces with DashScope request logs)
New attributes: response_id, finish_reason, thoughts_token_count, subagent_name, error_type, error_status_code (with GenAI semconv duals where applicable)
Track lastFinishReason and lastError as closure variables in the streaming path since consolidatedResponse is try-scoped and inaccessible from finally

New span attributes

Private name	GenAI semconv dual	Source
`response_id`	`gen_ai.response.id`	Provider request ID (DashScope/OpenAI)
`finish_reason`	`gen_ai.response.finish_reasons` (array)	Model stop reason (STOP/MAX_TOKENS/etc)
`thoughts_token_count`	`gen_ai.usage.reasoning_tokens`	Reasoning token count
`subagent_name`	—	Originating subagent name
`error_type`	`error.type`	Structured error classification
`error_status_code`	—	HTTP status from provider errors

Motivation

排查 session ccb6e187 卡住问题时，trace 中 llm_request span 没有 DashScope request ID 和 finish_reason，无法跨系统关联定位问题。这些数据在 ApiResponseEvent log 事件中已采集但未同步到 span。

Test plan

94 unit tests pass (session-tracing.test.ts — 12 new tests for all new attributes including zero-value edge cases)
43 unit tests pass (loggingContentGenerator.test.ts — existing tests, no regression)
TypeScript type-check pass (npx tsc --noEmit)
Deploy to staging, send a prompt with tool calls, verify new attributes appear on llm_request spans in ARMS
Trigger an API error, verify error_type and error_status_code appear on error spans

🤖 Generated with Qwen Code

…rror details Add 6 new attributes to qwen-code.llm_request OTel spans that were previously only available in log events (ApiResponseEvent), closing the observability gap that blocked cross-system debugging (e.g. correlating qwen-code traces with DashScope request logs). New span attributes (with GenAI semconv duals where applicable): - response_id / gen_ai.response.id — provider request ID - finish_reason / gen_ai.response.finish_reasons — model stop reason - thoughts_token_count / gen_ai.usage.reasoning_tokens — reasoning tokens - subagent_name — originating subagent - error_type / error.type — structured error classification - error_status_code — HTTP status from provider errors Implementation details: - Extend LLMRequestMetadata with 6 new optional fields - Track lastFinishReason and lastError as closure variables in the streaming path (consolidatedResponse is try-scoped, inaccessible from finally) - Capture subagentName eagerly at method entry to avoid AsyncLocalStorage context loss in setTimeout/finally - Update all 5 endLLMRequestSpan call sites with appropriate field subsets - gen_ai.response.finish_reasons emitted as string[] per OTel semconv

github-actions · 2026-06-02T07:42:15Z

📋 Review Summary

This PR enriches llm_request OTel spans with 6 new attributes previously only available in log events, addressing a critical observability gap for cross-system debugging. The implementation is well-tested with 12 new unit tests and follows existing patterns in the codebase. Overall, this is a solid, focused change that improves debugging capabilities without introducing breaking changes.

🔍 General Feedback

Strong motivation: The PR clearly addresses a real pain point (debugging session ccb6e187 where trace spans lacked DashScope request ID and finish_reason)
Good test coverage: 12 new tests in session-tracing.test.ts cover all new attributes including edge cases (zero values, undefined handling)
Consistent patterns: The dual-emission approach (private name + GenAI semconv) aligns with existing attribute handling in the codebase
Thoughtful design: Using closure variables (lastFinishReason, lastError) to capture data from try-scoped variables is the right approach for streaming paths
Documentation: The table in the PR body clearly maps private names to GenAI semconv duals and sources

🎯 Specific Feedback

🟡 High

File: loggingContentGenerator.ts:475-477 - The closure variables lastFinishReason, lastError, and subagentName are declared at line 475-477, but subagentName is captured immediately while the others are assigned later. Consider adding a comment explaining why subagentName doesn't need deferred capture (it's synchronous from context vs. async from stream response).
File: session-tracing.ts:92-106 - The thoughtsTokenCount JSDoc mentions "For OpenAI-compatible providers, this value is already INCLUDED in outputTokens" — this is critical information that should also be documented at the call sites in loggingContentGenerator.ts where thoughtsTokenCount is set, to prevent future developers from accidentally double-counting.

🟢 Medium

File: loggingContentGenerator.ts:291-295 - The non-streaming generateContent path captures responseId, finishReason, thoughtsTokenCount, and subagentName directly from the response. Consider extracting this into a helper function (e.g., extractResponseMetadata()) to avoid duplication with the streaming path at lines 650-656.
File: session-tracing.ts:475-483 - The dual-emission for finish_reason wraps the string in an array for the GenAI semconv (gen_ai.response.finish_reasons). This is correct per the semconv spec, but consider adding a comment citing the spec section to prevent future "simplification" attempts.
File: loggingContentGenerator.ts:655-656 - The error attributes use lastError ? getErrorType(lastError) : undefined pattern. Since errorType and errorStatusCode are already optional in LLMRequestMetadata, you could simplify to just getErrorType(lastError) and let the interface handle undefined.

🔵 Low

File: session-tracing.ts:104 - The subagentName JSDoc says "or undefined for main" — consider being more explicit: "undefined when the request originates from the main agent (not a subagent)".
File: loggingContentGenerator.ts:294 - The subagentName is captured via subagentNameContext.getStore() in both success and error paths. Consider extracting this to a local variable at the start of the method to avoid repeated ALS lookups (minor performance optimization).
File: session-tracing.test.ts:826 - The test "emits all new attributes together" is excellent, but consider adding a test case where success: false to verify error attributes don't leak into success spans and vice versa (though this is partially covered by existing tests).

✅ Highlights

Excellent test design: The tests cover zero-value edge cases (e.g., thoughtsTokenCount: 0 is meaningful vs. undefined), which shows deep understanding of the data semantics
Proper GenAI semconv alignment: Dual-emitting private names alongside standard attributes allows gradual migration and backward compatibility
Smart closure variable usage: Tracking lastFinishReason and lastError as closure variables solves the try-scope accessibility problem elegantly
Defensive error handling: The error path in streaming (lines 655-656) uses lastError ? getErrorType(lastError) : undefined to prevent null/undefined from being passed when no error occurred

Copilot

Pull request overview

This PR enriches qwen-code.llm_request OpenTelemetry spans with response metadata and structured error details that were previously only available in log events, improving trace/log correlation across systems (e.g., provider request IDs and finish reasons).

Changes:

Extend LLMRequestMetadata and endLLMRequestSpan() to emit 6 new span attributes (with GenAI semconv dual-emits where applicable).
Populate the new metadata from LoggingContentGenerator for both non-streaming and streaming code paths.
Add unit tests covering the new endLLMRequestSpan() attributes and edge cases (e.g., thoughtsTokenCount === 0).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
packages/core/src/telemetry/session-tracing.ts	Adds new LLM request metadata fields and stamps them onto spans (including GenAI semconv dual attributes).
packages/core/src/telemetry/session-tracing.test.ts	Adds tests validating emission/omission behavior for the new span attributes.
packages/core/src/core/loggingContentGenerator/loggingContentGenerator.ts	Forwards response metadata and error details into `endLLMRequestSpan()` across non-stream and streaming paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tanzhenxin

Review

This adds response-metadata and error attributes (response id, finish reason, reasoning-token count, subagent name, error type and status code) to the LLM-request telemetry span, with GenAI-semconv duals where they exist. The data was already captured in the log events but missing from the span, which broke cross-system trace correlation. The implementation follows the file's existing attribute-emission conventions exactly, adds no new control flow or span-end calls (so the existing leak/double-end/timeout guarantees hold), and the token accounting is correct — reasoning tokens are emitted separately rather than double-counted. Error enrichment reuses the same classification helpers that feed the existing error log event, and no message content or secrets land in attributes.

One non-blocking gap: the added tests exercise the span helper in isolation, but the actual extraction wiring — reading finish reason / response id / token counts from the response, tracking them across streaming chunks, and deriving error type and status on the failure paths — has no new coverage. That extraction is where a regression would actually surface; extending the existing stream/success/error tests to assert the new fields would close it.

For context: this PR targets the daemon_mode_b_main branch, not main.

Verdict

APPROVE — correct, convention-following enrichment with no leak or double-end risk; the untested extraction wiring is worth closing but not blocking.

Copilot AI review requested due to automatic review settings June 2, 2026 07:40

Copilot started reviewing on behalf of doudouOUC June 2, 2026 07:40 View session

Copilot AI reviewed Jun 2, 2026

View reviewed changes

Comment thread packages/core/src/core/loggingContentGenerator/loggingContentGenerator.ts

Comment thread packages/core/src/core/loggingContentGenerator/loggingContentGenerator.ts

Comment thread packages/core/src/core/loggingContentGenerator/loggingContentGenerator.ts

doudouOUC requested review from chiga0 and wenshao June 2, 2026 07:57

doudouOUC self-assigned this Jun 2, 2026

tanzhenxin added the type/feature-request New feature or enhancement request label Jun 3, 2026

tanzhenxin approved these changes Jun 3, 2026

View reviewed changes

yiliang114 approved these changes Jun 3, 2026

View reviewed changes

doudouOUC merged commit 89122ca into daemon_mode_b_main Jun 3, 2026
11 checks passed

doudouOUC deleted the feat/llm-request-span-enrichment branch June 3, 2026 09:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(telemetry): enrich llm_request span with response metadata and error details#4693

feat(telemetry): enrich llm_request span with response metadata and error details#4693
doudouOUC merged 1 commit into
daemon_mode_b_mainfrom
feat/llm-request-span-enrichment

doudouOUC commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tanzhenxin left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

doudouOUC commented Jun 2, 2026

Summary

New span attributes

Motivation

Test plan

Uh oh!

github-actions Bot commented Jun 2, 2026

📋 Review Summary

🔍 General Feedback

🎯 Specific Feedback

🟡 High

🟢 Medium

🔵 Low

✅ Highlights

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tanzhenxin left a comment

Choose a reason for hiding this comment

Review

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants