feat: per-segment video pipeline latency telemetry #127

Open
gilankpam wants to merge 1 commit into OpenIPC:master from gilankpam:feat/g2g-latency-upstream

gilankpam commented May 22, 2026

Summary

Adds an opt-in probe on the ground-station that consumes the waybeam_venc RTP sidecar and publishes a per-segment breakdown of video pipeline latency to the OSD — from frame capture on the drone to DRM display-commit on the GS. (Not photons-to-photons: the camera sensor and panel scan-out sit outside the measured window.)

Segments published (sum = total_ms):

| Fact | Where | Spans |
| --- | --- | --- |
| video.latency.capture_to_encode_{ms,us} | drone | capture timestamp → encoder feed |
| video.latency.encode_to_send_{ms,us} | drone | encoder finish → RTP send |
| video.latency.wire_ms | transport | drone send → GS receive |
| video.latency.gs_pipeline_ms | GS | GStreamer feed → DRM commit (decode + handover) |
| video.latency.total_ms | sum | capture → display-commit |
| video.latency.clock_offset_us / clock_rtt_us | sync | NTP-style min-RTT clock-offset diagnostics |
| video.latency.wire_clamp_count | sync | counts clamped (negative) wire deltas; non-zero ⇒ offset drift |

Design

  • New latency_probe module: RTP header parser, NTP-style min-RTT clock-offset estimator, frame matcher keyed by (ssrc, rtp_timestamp), UDP sidecar wire codec, probe thread, and OSD publish path.
  • Hot-path hooks: gstrtpreceiver calls on_rtp_buffer() on each packet; the display thread forwards decode_and_handover_ms via record_gs_pipeline_ms() only on real video commits (fb_id != 0) — OSD-only redraws would otherwise dilute the rolling figure.
  • Default-disabled; enabled via the new latency_probe block in config_osd.json (enable/host/port). Sidecar default port 5602.
  • Suppresses wire_ms / total_ms until the first MSG_SYNC_RESP so the cross-clock subtraction can't bleed garbage onto the OSD. clock_rtt_us == 0 is the explicit "syncing…" signal.

Companion drone-side change (RTP sidecar emitter) lives in waybeam_venc. This PR is GS-side only and is a no-op when latency_probe.enable is false.
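
As a rough illustration of the hot-path gating described in the Design list, the sketch below shows how a display-thread hook might skip OSD-only redraws and stay a no-op while the probe is disabled. The namespace, the `on_display_wakeup` helper, and the counters are hypothetical names chosen for this example; only `record_gs_pipeline_ms`, the `active` atomic, and the `fb_id != 0` condition come from the PR text.

```cpp
#include <atomic>
#include <cstdint>

namespace latency_probe_sketch {

// Assumption: a single flag flipped by start()/stop(); false by default.
std::atomic<bool> active{false};
std::atomic<uint64_t> last_gs_pipeline_ms{0};
std::atomic<uint64_t> samples{0};

// Stores the most recent GS pipeline figure, but only while the probe runs.
void record_gs_pipeline_ms(uint64_t ms) {
    if (!active.load(std::memory_order_acquire)) return;  // probe disabled
    last_gs_pipeline_ms.store(ms, std::memory_order_relaxed);
    samples.fetch_add(1, std::memory_order_relaxed);
}

// Hypothetical display-thread call site: fb_id == 0 means an OSD-only
// redraw with no new video plane, so nothing is recorded.
void on_display_wakeup(uint32_t fb_id, uint64_t decode_and_handover_ms) {
    if (fb_id == 0) return;
    record_gs_pipeline_ms(decode_and_handover_ms);
}

}  // namespace latency_probe_sketch
```

With `active` left at its default, every call returns early, which matches the "no-op when latency_probe.enable is false" behaviour stated above.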

Tests

Catch2 unit tests cover the header parser, clock offset, frame matcher, compute_and_publish, and wire codec. A loopback integration test against a fake-waybeam UDP server asserts all 9 spec facts. tests/run_latency_tests.sh provides a standalone build path that bypasses the Rockchip MPP/RGA toolchain (g++ + Catch2 + spdlog/fmt).

Test plan

  • BUILD_TESTS=ON cmake build passes
  • tests/run_latency_tests.sh — all 117 assertions in 23 test cases pass
  • On-target smoke test with a waybeam_venc peer publishing sidecar messages: OSD shows per-segment latency facts and total_ms stabilises after a few sync round-trips
  • Default config (no latency_probe block in user config): probe stays off, no thread started

🤖 Generated with Claude Code

* docs: design spec for VideoStutterWidget (GS-local stutter metrics)

Adds a design for an OSD widget that surfaces frame-interval stutter
quantification — average interval, stutter event rate, and a 10s-decaying
peak — sampled in the display thread on GS-local CLOCK_MONOTONIC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add moving-graph variant to stutter metrics spec

Extends the spec with a second surface — reuse BarChartWidget subscribed
to video.frame_interval_ms, plus a small fixed-Y-axis enhancement to
keep calm streams from looking chaotic under auto-scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: implementation plan for video stutter metrics

Nine TDD-style tasks covering: pure helpers (is_stutter, update_peak)
with Catch2 tests, VideoStutterWidget class + JSON parser branch,
frame-interval fact publish in display thread, and BarChartWidget
fixed-Y-axis enhancement + parser support.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(spec): glass-to-glass latency telemetry design (GS side)

Consumes the waybeam_venc RTP sidecar plus three new GS-local timestamps
(arrival, decode-out, display-submit) to publish a per-segment latency
breakdown on the OSD. FIFO-based frame matching, no GstMeta plumbing.
NTP-style min-RTT clock-offset estimator over the sidecar SYNC handshake.
Default-off via osd.json.

* plan(g2g-latency): bite-sized TDD implementation plan

11 tasks covering RTP header parser, ClockOffset estimator, FrameMatcher,
publish layer, sidecar wire codec, probe thread, config, hot-path hooks,
loopback integration test, and final verification.

* feat(latency-probe): skeleton module + CMake wiring

Add latency_probe.hpp/.cpp with the full public API surface (active atomic,
start/stop, on_rtp_buffer, record_decode_done, record_display_submit, now_us)
as deliberate no-op stubs.  Wire both files into LIB_SOURCE_FILES before
gstrtpreceiver.cpp so subsequent tasks compile against a stable header.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(latency-probe): include cstddef for size_t (review feedback)

* feat(latency-probe): RTP header parser

Pure parse_rtp_header() extracts ssrc/timestamp/marker from the 12-byte
fixed RTP prefix; rejects version != 2 and short/null buffers. Four
Catch2 test cases cover the happy path, marker=0, wrong version, and
too-short/null inputs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
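
For readers unfamiliar with the RTP layout, here is a minimal sketch of what such a parser could look like. It follows the 12-byte fixed header from RFC 3550; the struct and function names are hypothetical and the PR's actual signature may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical result type; the PR's real struct is not shown in this thread.
struct RtpHeaderSketch {
    uint32_t ssrc;
    uint32_t timestamp;
    bool marker;
};

// Reads a big-endian 32-bit value from four consecutive bytes.
static uint32_t read_be32(const uint8_t* p) {
    return (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) |
           (uint32_t{p[2]} << 8) | uint32_t{p[3]};
}

// Parses the fixed RTP prefix: byte 0 holds the version in its top two bits,
// byte 1 holds the marker in its top bit, bytes 4-7 the timestamp,
// bytes 8-11 the SSRC.
std::optional<RtpHeaderSketch> parse_rtp_header_sketch(const uint8_t* buf,
                                                       std::size_t len) {
    if (buf == nullptr || len < 12) return std::nullopt;  // null or too short
    if ((buf[0] >> 6) != 2) return std::nullopt;          // version must be 2
    RtpHeaderSketch h{};
    h.marker = (buf[1] & 0x80) != 0;
    h.timestamp = read_be32(buf + 4);
    h.ssrc = read_be32(buf + 8);
    return h;
}
```

CSRC entries and header extensions sit after these 12 bytes, so they do not affect the three fields extracted here.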

* test(latency-probe): cover CSRC + extension headers (review feedback)

* feat(latency-probe): NTP-style min-RTT clock offset

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
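
The commit gives no detail, so the following is only a sketch of the usual NTP-style calculation with a min-RTT filter. It assumes four timestamps per SYNC exchange (t0 and t3 on the GS clock, t1 and t2 on the drone clock) and defines the offset as drone clock minus GS clock; the struct name, method name, and sign convention are assumptions, not the PR's API.

```cpp
#include <cstdint>

// Keeps the offset from the exchange with the smallest round-trip time,
// since that sample has the least queueing noise.
struct ClockOffsetSketch {
    int64_t offset_us = 0;  // drone clock minus GS clock (assumed convention)
    uint64_t rtt_us = 0;    // 0 means "no sync yet"

    // t0: GS sends request, t1: drone receives it,
    // t2: drone sends response, t3: GS receives it.
    void on_sync_response(int64_t t0, int64_t t1, int64_t t2, int64_t t3) {
        const int64_t rtt = (t3 - t0) - (t2 - t1);  // network time only
        if (rtt <= 0) return;  // implausible, and 0 is reserved as sentinel
        if (rtt_us != 0 && static_cast<uint64_t>(rtt) >= rtt_us) return;
        rtt_us = static_cast<uint64_t>(rtt);
        offset_us = ((t1 - t0) + (t2 - t3)) / 2;
    }
};
```

For example, with the drone clock 1 000 000 µs ahead and 500 µs of delay each way, the exchange (100, 1000600, 1000700, 1200) yields an RTT of 1000 µs and an offset of 1 000 000 µs.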

* feat(latency-probe): FrameMatcher with FIFO bind + TTL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(latency-probe): tighten matcher assertions + fix header comment

* feat(latency-probe): compute_and_publish per-segment latency facts

Adds PublishedFacts, PublishUintFn/PublishIntFn, and compute_and_publish
to latency_probe, computing the eight video.latency.* OSD facts from a
FrameTimings + clock offset/rtt pair. Wire clamping increments a caller-
owned counter so tests and production code can observe it. Two TDD test
cases added to test_latency_probe.cpp; math verified via standalone g++
runner (all 14 assertions pass).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(latency-probe): sidecar wire codec (subscribe, sync, frame)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(latency-probe): cover null/unknown-type/short-len + full 64-bit MSG_FRAME

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(latency-probe): probe thread, UDP socket, OSD publish path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(latency-probe): acquire ordering on hot-path + fcntl error path

* feat(latency-probe): wire osd.json config + start/stop in main

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(latency-probe): guard start() within try/catch (review feedback)

* feat(latency-probe): pad probe + decode/display timestamp hooks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(latency-probe): loopback integration vs fake waybeam

Stand up a UDP server thread mimicking waybeam's sidecar (SUBSCRIBE,
SYNC_REQ/RESP, MSG_FRAME), run latency_probe::start() against it, feed
synthetic arrival/decode/display events, and assert all three frames
publish video.latency.total_ms and clock_offset_us via captured callbacks.
Adds set_publish_overrides_for_test() to the public API to intercept
compute_and_publish without touching the real OSD path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(latency-probe): assert all 9 spec facts in integration test

* test(latency-probe): nix-shell + standalone runner, fix TTL off-by-one

The CMake test target requires the full Rockchip MPP/RGA toolchain even
for unit tests. Add a minimal standalone runner that bypasses cmake by
compiling only the latency_probe.cpp + test files against Catch2,
stubbing osd.h via -DUSE_SIMULATOR. shell.nix provides the build deps.

Also fixes a TTL test off-by-one: the matcher evicts only when the age is
strictly greater than the TTL, so the test's now=501000 with insert=1000
and ttl=500000 (age exactly 500000) did NOT actually evict. Bumped now to
501001 to clear the strict `>` boundary.

All 117 assertions in 23 test cases pass under nix-shell.
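
The boundary case can be reproduced with a small stand-in for the eviction check. The helper name below is hypothetical; only the strict greater-than comparison and the numbers come from the commit message.

```cpp
#include <cstdint>

// Returns true when an entry inserted at insert_us is older than ttl_us.
// Strict '>' means an entry whose age equals the TTL is still kept.
bool is_expired_sketch(uint64_t now_us, uint64_t insert_us, uint64_t ttl_us) {
    if (now_us < insert_us) return false;  // guard against unsigned wrap
    return (now_us - insert_us) > ttl_us;
}
```

With insert=1000 and ttl=500000, now=501000 gives an age of exactly 500000 and the entry survives; now=501001 gives 500001 and the entry is evicted.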

* fix(latency-probe): gate display hook on fb_id != 0

Display thread wakes on either "new video frame" OR "OSD update pending"
(video_cond is signaled by both paths). The previous unconditional
record_display_submit() fired on OSD-only redraws too, where no new
video plane was committed (fb_id == 0). Those extra events stamped
the oldest awaiting matcher slot too early, causing permanent FIFO
drift: real video display events then stamped later slots, and the
published display_ms / decode_ms values got wildly inflated.

Only record the display timestamp when an actual video frame was
committed.

Caught during on-drone smoke testing.

* feat(latency-probe): publish capture_to_encode_us + encode_to_send_us

Drone-side latency segments often run sub-millisecond: the encoder + RTP
packetizer on the test rig finish ~80 µs after frame_ready. The existing
*_ms facts truncate that to 0, hiding real signal. Add microsecond-
precision sibling facts so sub-ms values stay visible on the OSD.

Same guards as the ms versions: capture_to_encode_us is only published
when capture_us is non-zero and frame_ready_us >= capture_us.
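
A short sketch of why the microsecond facts matter, using integer division as the assumed source of truncation. The struct and function names are hypothetical; the two guards mirror the ones stated above.

```cpp
#include <cstdint>
#include <optional>

// One segment expressed at both precisions.
struct SegmentSketch {
    uint64_t us;
    uint64_t ms;  // us / 1000, so anything under 1000 us becomes 0
};

// Returns nothing when the capture timestamp is missing or the two
// timestamps are out of order; otherwise returns the segment duration.
std::optional<SegmentSketch> capture_to_encode_sketch(uint64_t capture_us,
                                                      uint64_t frame_ready_us) {
    if (capture_us == 0 || frame_ready_us < capture_us) return std::nullopt;
    const uint64_t us = frame_ready_us - capture_us;
    return SegmentSketch{us, us / 1000};
}
```

An 80 µs segment therefore reports `ms == 0` but `us == 80`, which is the signal the sibling facts keep visible.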

* refactor(latency-probe): drop decode/display hooks, reuse decode_and_handover_ms

Our own decode_ms / display_ms were measuring the wrong thing: decode_ms
captured the whole gstreamer-to-MPP-output pipeline depth (~340ms on the
test rig, dominated by appsink queueing), and display_ms was broken by
FIFO drift from OSD-only redraws stamping the matcher.

The display thread already publishes video.decode_and_handover_ms
(feed-time → display-commit, ~30-50ms — the real MPP+handover latency).
Reuse that instead of re-instrumenting:

- Remove latency_probe::record_decode_done + record_display_submit hooks.
- Add latency_probe::record_gs_pipeline_ms(uint64_t) — called from the
  display thread on real video commits (fb_id != 0), forwarding the
  value already computed for decode_and_handover_ms.
- Simplify FrameMatcher: only two events per frame (marker + sidecar),
  keyed by (ssrc, rtp_timestamp), no FIFO scan. is_complete becomes
  sidecar_seen && gs_recv_last_us != 0.
- Drop FrameTimings fields gs_decode_done_us / gs_display_submit_us.
- compute_and_publish gains a gs_pipeline_ms parameter; drops
  video.latency.decode_ms / display_ms facts; total_ms is now
  capture + encode + wire + gs_pipeline (always published; missing
  summands contribute 0).

Net: -1 thread hook, -2 OSD facts, simpler matcher, honest total.
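
To make the simplified matcher concrete, here is a minimal sketch of a two-event matcher keyed by (ssrc, rtp_timestamp). Class and method names are hypothetical, TTL eviction is left out, and the real FrameMatcher carries more timing fields; only the key and the `is_complete` condition are taken from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

struct FrameSlotSketch {
    bool sidecar_seen = false;
    uint64_t gs_recv_last_us = 0;  // GS arrival time of the marker packet
    bool is_complete() const { return sidecar_seen && gs_recv_last_us != 0; }
};

class FrameMatcherSketch {
public:
    using Key = std::pair<uint32_t, uint32_t>;  // (ssrc, rtp_timestamp)

    // Each handler returns true when its event completed the frame;
    // a completed frame is removed from the pending set.
    bool on_marker(uint32_t ssrc, uint32_t rtp_timestamp, uint64_t recv_us) {
        const Key key{ssrc, rtp_timestamp};
        slots_[key].gs_recv_last_us = recv_us;
        return finish_if_complete(key);
    }
    bool on_sidecar(uint32_t ssrc, uint32_t rtp_timestamp) {
        const Key key{ssrc, rtp_timestamp};
        slots_[key].sidecar_seen = true;
        return finish_if_complete(key);
    }
    std::size_t pending() const { return slots_.size(); }

private:
    bool finish_if_complete(const Key& key) {
        if (!slots_[key].is_complete()) return false;
        slots_.erase(key);
        return true;
    }
    std::map<Key, FrameSlotSketch> slots_;
};
```

Because lookups go straight to the key, an unrelated event can no longer stamp the wrong slot, which is the failure mode the FIFO version suffered from.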

* docs(spec): TODO for waybeam sidecar rate-limit (option B)

* update osd

* fix(latency-probe): suppress wire_ms/total_ms until first clock sync

Between start() and the first MSG_SYNC_RESP, ClockOffset::get() returns
(offset=0, rtt=0). compute_and_publish then computed
  wire_us = gs_recv_last_us - (last_pkt_send_us - 0)
which subtracts two unrelated CLOCK_MONOTONIC clocks (GS uptime vs drone
uptime), producing wildly large values that propagated into total_ms.
Field symptom: absurd latency right after startup, settling to ~50 ms
once sync converged.

Gate wire_ms / total_ms / wire_clamp_counter on rtt_us != 0 (the existing
"no sync yet" sentinel). Drone-local segments and clock diagnostics still
publish, so clock_rtt_us=0 is the explicit "syncing…" signal on the OSD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
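
The gating and clamping can be sketched as a single helper. The function name and the use of `std::optional` to mean "suppressed" are assumptions for illustration; the formula, the `rtt_us != 0` gate, and the caller-owned clamp counter follow the commit messages.

```cpp
#include <cstdint>
#include <optional>

// Returns the wire segment in microseconds, or nothing while unsynced.
// offset_us is assumed to be drone clock minus GS clock.
std::optional<uint64_t> wire_us_sketch(uint64_t gs_recv_last_us,
                                       uint64_t last_pkt_send_us,
                                       int64_t offset_us,
                                       uint64_t rtt_us,
                                       uint64_t& clamp_counter) {
    if (rtt_us == 0) return std::nullopt;  // no sync yet: suppress the fact
    // Translate the drone-side send time into the GS clock domain.
    const int64_t send_in_gs_clock =
        static_cast<int64_t>(last_pkt_send_us) - offset_us;
    const int64_t wire =
        static_cast<int64_t>(gs_recv_last_us) - send_in_gs_clock;
    if (wire < 0) {  // offset drifted: clamp to zero and count it
        ++clamp_counter;
        return uint64_t{0};
    }
    return static_cast<uint64_t>(wire);
}
```

Returning nothing before the first sync keeps the counter untouched as well, so a non-zero `wire_clamp_count` only ever reflects drift observed after synchronisation.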

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gilankpam changed the title from "feat: glass-to-glass latency telemetry (GS side)" to "feat: per-segment video pipeline latency telemetry (GS side)" May 22, 2026
gilankpam changed the title from "feat: per-segment video pipeline latency telemetry (GS side)" to "feat: per-segment video pipeline latency telemetry" May 22, 2026