Skip to content

feat(telemetry): enrich llm_request span with response metadata and error details#4693

Merged
doudouOUC merged 1 commit into
daemon_mode_b_mainfrom
feat/llm-request-span-enrichment
Jun 3, 2026
Merged

feat(telemetry): enrich llm_request span with response metadata and error details#4693
doudouOUC merged 1 commit into
daemon_mode_b_mainfrom
feat/llm-request-span-enrichment

Conversation

@doudouOUC
Copy link
Copy Markdown
Collaborator

Summary

  • Add 6 new attributes to qwen-code.llm_request OTel spans that were previously only in log events (ApiResponseEvent), closing the observability gap for cross-system debugging (e.g. correlating qwen-code traces with DashScope request logs)
  • New attributes: response_id, finish_reason, thoughts_token_count, subagent_name, error_type, error_status_code (with GenAI semconv duals where applicable)
  • Track lastFinishReason and lastError as closure variables in the streaming path since consolidatedResponse is try-scoped and inaccessible from finally

New span attributes

Private name GenAI semconv dual Source
response_id gen_ai.response.id Provider request ID (DashScope/OpenAI)
finish_reason gen_ai.response.finish_reasons (array) Model stop reason (STOP/MAX_TOKENS/etc)
thoughts_token_count gen_ai.usage.reasoning_tokens Reasoning token count
subagent_name Originating subagent name
error_type error.type Structured error classification
error_status_code HTTP status from provider errors

Motivation

排查 session ccb6e187 卡住问题时,trace 中 llm_request span 没有 DashScope request ID 和 finish_reason,无法跨系统关联定位问题。这些数据在 ApiResponseEvent log 事件中已采集但未同步到 span。

Test plan

  • 94 unit tests pass (session-tracing.test.ts — 12 new tests for all new attributes including zero-value edge cases)
  • 43 unit tests pass (loggingContentGenerator.test.ts — existing tests, no regression)
  • TypeScript type-check pass (npx tsc --noEmit)
  • Deploy to staging, send a prompt with tool calls, verify new attributes appear on llm_request spans in ARMS
  • Trigger an API error, verify error_type and error_status_code appear on error spans

🤖 Generated with Qwen Code

…rror details

Add 6 new attributes to qwen-code.llm_request OTel spans that were
previously only available in log events (ApiResponseEvent), closing
the observability gap that blocked cross-system debugging (e.g.
correlating qwen-code traces with DashScope request logs).

New span attributes (with GenAI semconv duals where applicable):
- response_id / gen_ai.response.id — provider request ID
- finish_reason / gen_ai.response.finish_reasons — model stop reason
- thoughts_token_count / gen_ai.usage.reasoning_tokens — reasoning tokens
- subagent_name — originating subagent
- error_type / error.type — structured error classification
- error_status_code — HTTP status from provider errors

Implementation details:
- Extend LLMRequestMetadata with 6 new optional fields
- Track lastFinishReason and lastError as closure variables in the
  streaming path (consolidatedResponse is try-scoped, inaccessible
  from finally)
- Capture subagentName eagerly at method entry to avoid AsyncLocalStorage
  context loss in setTimeout/finally
- Update all 5 endLLMRequestSpan call sites with appropriate field subsets
- gen_ai.response.finish_reasons emitted as string[] per OTel semconv
Copilot AI review requested due to automatic review settings June 2, 2026 07:40
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

📋 Review Summary

This PR enriches llm_request OTel spans with 6 new attributes previously only available in log events, addressing a critical observability gap for cross-system debugging. The implementation is well-tested with 12 new unit tests and follows existing patterns in the codebase. Overall, this is a solid, focused change that improves debugging capabilities without introducing breaking changes.

🔍 General Feedback

  • Strong motivation: The PR clearly addresses a real pain point (debugging session ccb6e187 where trace spans lacked DashScope request ID and finish_reason)
  • Good test coverage: 12 new tests in session-tracing.test.ts cover all new attributes including edge cases (zero values, undefined handling)
  • Consistent patterns: The dual-emission approach (private name + GenAI semconv) aligns with existing attribute handling in the codebase
  • Thoughtful design: Using closure variables (lastFinishReason, lastError) to capture data from try-scoped variables is the right approach for streaming paths
  • Documentation: The table in the PR body clearly maps private names to GenAI semconv duals and sources

🎯 Specific Feedback

🟡 High

  • File: loggingContentGenerator.ts:475-477 - The closure variables lastFinishReason, lastError, and subagentName are declared at line 475-477, but subagentName is captured immediately while the others are assigned later. Consider adding a comment explaining why subagentName doesn't need deferred capture (it's synchronous from context vs. async from stream response).

  • File: session-tracing.ts:92-106 - The thoughtsTokenCount JSDoc mentions "For OpenAI-compatible providers, this value is already INCLUDED in outputTokens" — this is critical information that should also be documented at the call sites in loggingContentGenerator.ts where thoughtsTokenCount is set, to prevent future developers from accidentally double-counting.

🟢 Medium

  • File: loggingContentGenerator.ts:291-295 - The non-streaming generateContent path captures responseId, finishReason, thoughtsTokenCount, and subagentName directly from the response. Consider extracting this into a helper function (e.g., extractResponseMetadata()) to avoid duplication with the streaming path at lines 650-656.

  • File: session-tracing.ts:475-483 - The dual-emission for finish_reason wraps the string in an array for the GenAI semconv (gen_ai.response.finish_reasons). This is correct per the semconv spec, but consider adding a comment citing the spec section to prevent future "simplification" attempts.

  • File: loggingContentGenerator.ts:655-656 - The error attributes use lastError ? getErrorType(lastError) : undefined pattern. Since errorType and errorStatusCode are already optional in LLMRequestMetadata, you could simplify to just getErrorType(lastError) and let the interface handle undefined.

🔵 Low

  • File: session-tracing.ts:104 - The subagentName JSDoc says "or undefined for main" — consider being more explicit: "undefined when the request originates from the main agent (not a subagent)".

  • File: loggingContentGenerator.ts:294 - The subagentName is captured via subagentNameContext.getStore() in both success and error paths. Consider extracting this to a local variable at the start of the method to avoid repeated ALS lookups (minor performance optimization).

  • File: session-tracing.test.ts:826 - The test "emits all new attributes together" is excellent, but consider adding a test case where success: false to verify error attributes don't leak into success spans and vice versa (though this is partially covered by existing tests).

✅ Highlights

  • Excellent test design: The tests cover zero-value edge cases (e.g., thoughtsTokenCount: 0 is meaningful vs. undefined), which shows deep understanding of the data semantics
  • Proper GenAI semconv alignment: Dual-emitting private names alongside standard attributes allows gradual migration and backward compatibility
  • Smart closure variable usage: Tracking lastFinishReason and lastError as closure variables solves the try-scope accessibility problem elegantly
  • Defensive error handling: The error path in streaming (lines 655-656) uses lastError ? getErrorType(lastError) : undefined to prevent null/undefined from being passed when no error occurred

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enriches qwen-code.llm_request OpenTelemetry spans with response metadata and structured error details that were previously only available in log events, improving trace/log correlation across systems (e.g., provider request IDs and finish reasons).

Changes:

  • Extend LLMRequestMetadata and endLLMRequestSpan() to emit 6 new span attributes (with GenAI semconv dual-emits where applicable).
  • Populate the new metadata from LoggingContentGenerator for both non-streaming and streaming code paths.
  • Add unit tests covering the new endLLMRequestSpan() attributes and edge cases (e.g., thoughtsTokenCount === 0).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
packages/core/src/telemetry/session-tracing.ts Adds new LLM request metadata fields and stamps them onto spans (including GenAI semconv dual attributes).
packages/core/src/telemetry/session-tracing.test.ts Adds tests validating emission/omission behavior for the new span attributes.
packages/core/src/core/loggingContentGenerator/loggingContentGenerator.ts Forwards response metadata and error details into endLLMRequestSpan() across non-stream and streaming paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@doudouOUC doudouOUC requested review from chiga0 and wenshao June 2, 2026 07:57
@doudouOUC doudouOUC self-assigned this Jun 2, 2026
@tanzhenxin tanzhenxin added the type/feature-request New feature or enhancement request label Jun 3, 2026
Copy link
Copy Markdown
Collaborator

@tanzhenxin tanzhenxin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

This adds response-metadata and error attributes (response id, finish reason, reasoning-token count, subagent name, error type and status code) to the LLM-request telemetry span, with GenAI-semconv duals where they exist. The data was already captured in the log events but missing from the span, which broke cross-system trace correlation. The implementation follows the file's existing attribute-emission conventions exactly, adds no new control flow or span-end calls (so the existing leak/double-end/timeout guarantees hold), and the token accounting is correct — reasoning tokens are emitted separately rather than double-counted. Error enrichment reuses the same classification helpers that feed the existing error log event, and no message content or secrets land in attributes.

One non-blocking gap: the added tests exercise the span helper in isolation, but the actual extraction wiring — reading finish reason / response id / token counts from the response, tracking them across streaming chunks, and deriving error type and status on the failure paths — has no new coverage. That extraction is where a regression would actually surface; extending the existing stream/success/error tests to assert the new fields would close it.

For context: this PR targets the daemon_mode_b_main branch, not main.

Verdict

APPROVE — correct, convention-following enrichment with no leak or double-end risk; the untested extraction wiring is worth closing but not blocking.

@doudouOUC doudouOUC merged commit 89122ca into daemon_mode_b_main Jun 3, 2026
11 checks passed
@doudouOUC doudouOUC deleted the feat/llm-request-span-enrichment branch June 3, 2026 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/feature-request New feature or enhancement request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants