fix(python): bypass expensive dealiasing in parse_sse_obj for ~28x SSE speedup #16074

Open
tstanmay13 wants to merge 1 commit into main from devin/1779478358-fix-sse-streaming-perf

Conversation

@tstanmay13
Contributor

@tstanmay13 tstanmay13 commented May 22, 2026

Description

Fixes the major SSE streaming performance regression reported by Cohere. The previous TypeAdapter caching fix (PR #16013) addressed only ~17% of the cost. This PR eliminates the dominant bottleneck, which accounts for ~75% of the per-event cost.

Root Cause

The call chain parse_sse_obj → parse_obj_as → convert_and_respect_annotation_metadata runs expensive recursive type introspection on every single SSE event. For a large discriminated union like V2ChatStreamResponse, this function iterates through all union variants and calls _convert_mapping for each Pydantic model variant, which costs O(variants × fields) per event.

This dealiasing pass is designed for TypedDict aliasing support and is completely unnecessary for SSE data, which arrives as wire-format JSON and maps directly to Pydantic models.
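To make the O(variants × fields) cost concrete, here is a toy model that uses only the standard library. The `ContentDelta`, `MessageEnd`, and `introspection_steps` names are hypothetical, and the function only counts field visits; it is not the actual `convert_and_respect_annotation_metadata` logic.

```python
from dataclasses import dataclass, fields
from typing import Union, get_args


# Hypothetical stand-ins for two variants of a streamed union.
@dataclass
class ContentDelta:
    type: str
    text: str


@dataclass
class MessageEnd:
    type: str
    finish_reason: str
    total_tokens: int


StreamEvent = Union[ContentDelta, MessageEnd]


def introspection_steps(union_type: object) -> int:
    """Count field visits when every variant's fields are walked once."""
    return sum(len(fields(variant)) for variant in get_args(union_type))


# The walk does not depend on the payload, so it repeats for every event.
steps_per_event = introspection_steps(StreamEvent)
steps_for_stream = steps_per_event * 10_000
print(steps_per_event, steps_for_stream)
```

With a real union of many variants, each with many fields, the per-event count grows as the product of the two, which is why repeating the walk on every event dominates the profile.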

Profiling breakdown (cohere 6.1.0, 10K events)

| Component | us/event | % of total |
| --- | --- | --- |
| convert_and_respect_annotation_metadata | 2952 | 75.4% |
| TypeAdapter() creation (uncached) | 654 | 16.7% |
| TypeAdapter.validate_python | 132 | 3.4% |
| asdict / discriminator / json.loads | ~6 | ~0.1% |
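A per-event figure like the ones above can be obtained with a simple timing loop. The following is a minimal sketch, assuming a hypothetical `per_event_microseconds` helper and a made-up payload; it is not the profiling script used for this PR, and it only measures the `json.loads` baseline.

```python
import json
import time
from typing import Callable, Sequence


def per_event_microseconds(parse: Callable[[str], object], payloads: Sequence[str]) -> float:
    """Return the mean wall-clock cost of parse() per payload, in microseconds."""
    start = time.perf_counter()
    for payload in payloads:
        parse(payload)
    elapsed = time.perf_counter() - start
    return elapsed / len(payloads) * 1e6


# Made-up payload shaped like a small stream event.
payloads = ['{"type": "content-delta", "text": "hello"}'] * 10_000
baseline_us = per_event_microseconds(json.loads, payloads)
print(f"json.loads only: {baseline_us:.2f} us/event")
```

Passing a different callable, such as a full SSE parse function, in place of `json.loads` gives a number that can be compared against this baseline.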

Changes Made

  • Added _validate_sse_data() helper that calls TypeAdapter.validate_python() directly, bypassing the unnecessary convert_and_respect_annotation_metadata dealiasing step
  • Added _get_sse_type_info() cache for discriminator/variant analysis per type (avoids repeated type introspection)
  • Replaced dataclasses.asdict(sse) with direct attribute access (sse.data, sse.event, etc.)
  • Applied fix to all 4 pydantic_utilities.py variants:
    • shared/pydantic_utilities.py (pydantic v2 native)
    • shared/with_pydantic_aliases/pydantic_utilities.py (pydantic v2 with aliases)
    • shared/with_pydantic_v1_on_v2/pydantic_utilities.py (pydantic v1 compat)
    • shared/with_pydantic_v1_on_v2/with_aliases/pydantic_utilities.py (v1 compat with aliases)
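The first and third changes can be sketched as follows, assuming Pydantic v2. The `ServerSentEvent` dataclass, the two model classes, and `parse_event` are hypothetical stand-ins chosen for illustration; the actual `_validate_sse_data()` and `_get_sse_type_info()` implementations in this PR may differ.

```python
import json
from dataclasses import dataclass
from typing import Annotated, Literal, Optional, Union

from pydantic import BaseModel, Field, TypeAdapter


# Hypothetical stand-in for the SSE event object yielded by the HTTP layer.
@dataclass
class ServerSentEvent:
    data: str
    event: Optional[str] = None


# Hypothetical variants of a discriminated union keyed on "type".
class ContentDelta(BaseModel):
    type: Literal["content-delta"]
    text: str


class MessageEnd(BaseModel):
    type: Literal["message-end"]
    finish_reason: str


StreamEvent = Annotated[Union[ContentDelta, MessageEnd], Field(discriminator="type")]

# Built once at import time so no TypeAdapter is constructed per event.
_STREAM_ADAPTER = TypeAdapter(StreamEvent)


def parse_event(sse: ServerSentEvent) -> Union[ContentDelta, MessageEnd]:
    """Validate one event's wire-format JSON directly, with no dealiasing pre-pass."""
    payload = json.loads(sse.data)  # direct attribute access instead of dataclasses.asdict(sse)
    return _STREAM_ADAPTER.validate_python(payload)


parsed = parse_event(ServerSentEvent(data='{"type": "content-delta", "text": "hi"}'))
print(type(parsed).__name__, parsed.text)
```

Reading `sse.data` directly avoids the recursive copy that `dataclasses.asdict` performs, and the module-level adapter lets Pydantic handle discriminator dispatch on each call.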

Cohere SDK Benchmark Results

Comprehensive benchmark (100K events, V2ChatStreamResponse):

================================================================================
                      COMPREHENSIVE SSE BENCHMARK RESULTS                       
================================================================================
Label                                         Total      Per Event     vs SOL
--------------------------------------------------------------------------------
json.loads only (speed of light)            0.780s        7.80 us      1.0x
5.20.4 (construct_type)                    51.104s      511.04 us     65.5x
7.0.0 BEFORE (original)                   332.036s     3320.36 us    425.7x
7.0.0 AFTER (this PR)                      16.070s      160.70 us     20.6x
--------------------------------------------------------------------------------

Speedup of patch vs 7.0.0 original: 20.7x
Patch overhead vs json.loads only:   20.6x
================================================================================

| Version / Path | us/event | vs speed-of-light | Notes |
| --- | --- | --- | --- |
| json.loads only | 7.8 | 1.0x | Theoretical minimum (no typing) |
| 5.20.4 (construct_type) | 511 | 65.5x | Old path before parse_sse_obj |
| 7.0.0 BEFORE (parse_sse_obj original) | 3320 | 425.7x | Current regression |
| 7.0.0 AFTER (this PR) | 161 | 20.6x | Fixed |

The ~20x overhead vs json.loads-only is the irreducible cost of Pydantic's TypeAdapter.validate_python(), which provides proper typing, discriminated union dispatch, and field validation. The patched path is about 3.2x faster (161 vs 511 µs/event) than the old 5.20.4 construct_type path that preceded the parse_sse_obj regression.
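The ratios quoted above follow directly from the per-event timings in the benchmark output. The short check below recomputes them; the variable names are illustrative only.

```python
# Per-event timings in microseconds, copied from the benchmark output above.
per_event_us = {
    "json.loads only": 7.80,
    "5.20.4 (construct_type)": 511.04,
    "7.0.0 BEFORE": 3320.36,
    "7.0.0 AFTER": 160.70,
}

baseline = per_event_us["json.loads only"]

# Overhead of each path relative to the json.loads-only baseline.
overhead = {label: round(us / baseline, 1) for label, us in per_event_us.items()}

# Speedup of the patched path over the regressed path and over the 5.20.4 path.
speedup_vs_before = round(per_event_us["7.0.0 BEFORE"] / per_event_us["7.0.0 AFTER"], 1)
speedup_vs_5204 = round(per_event_us["5.20.4 (construct_type)"] / per_event_us["7.0.0 AFTER"], 1)

print(overhead)
print(speedup_vs_before, speedup_vs_5204)
```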

Additional per-version results

cohere 7.0.0 (latest):

BEFORE:  338.4s   3384 us/event
AFTER:    14.7s    147 us/event  →  22.9x faster

cohere 6.1.0:

BEFORE:  405.9s   4059 us/event
AFTER:    14.7s    147 us/event  →  27.6x faster

Testing

  • Comprehensive benchmark: 5.20.4, 7.0.0 before/after, json.loads speed-of-light
  • Before/after Cohere 7.0.0: 20.7x speedup (3320 → 161 µs/event)
  • Before/after Cohere 6.1.0: 27.6x speedup (4059 → 147 µs/event)
  • Patched path is 3.2x faster than pre-regression 5.20.4 construct_type (161 vs 511 µs)
  • All 53 existing SSE tests pass (test_http_sse.py)
  • All 3 parse_obj_as tests pass (test_parse_obj_as.py)
  • Pydantic v1-on-v2 hardening test passes
  • poetry run pre-commit run -a passes cleanly
  • All CI checks green (56 passed)

Link to Devin session: https://app.devin.ai/sessions/44e42c69b1c240cc8d38f1b9416aaa3a
Requested by: @tstanmay13



…E speedup

parse_sse_obj was calling parse_obj_as which runs
convert_and_respect_annotation_metadata on every SSE event. This function
recursively introspects all union variants and accounts for ~75% of total
parse time on large discriminated unions like V2ChatStreamResponse.

SSE data arrives as wire-format JSON and does not need the TypedDict
dealiasing pass. This change:
- Adds _validate_sse_data() that calls TypeAdapter.validate_python directly
- Caches discriminator/variant analysis per type
- Replaces dataclasses.asdict() with direct attribute access

Benchmark (100k events, Cohere V2ChatStreamResponse):
  BEFORE: 4059 us/event (405.9s total)
  AFTER:   147 us/event  (14.7s total)
  Speedup: 27.6x

Co-Authored-By: tanmay.singh <tstanmay13@gmail.com>

@claude claude Bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

@github-actions
Contributor

SDK Generation Benchmark Results

Comparing PR branch against median of 5 nightly run(s) on main (latest: 2026-05-22T05:19:54Z).

Full benchmark table

| Generator | Spec | main (generator) | main (E2E) | PR (generator) | Delta |
| --- | --- | --- | --- | --- | --- |
| python-sdk | square | 128s (n=5) | 229s (n=5) | 140s | +12s (+9.4%) |

main (generator): generator-only time via --skip-scripts (includes Docker image build, container startup, IR parsing, and code generation — this is the same Docker-based flow customers use via fern generate). main (E2E): full customer-observable time including build/test scripts (nightly baseline, informational). Delta is computed against generator-only baseline.
⚠️ = generation exited with a non-zero exit code (timing may not reflect a successful run).
Baseline from nightly runs on main (latest: 2026-05-22T05:19:54Z). Trigger benchmark-baseline to refresh.
Last updated: 2026-05-22 19:55 UTC
