feat: per-segment video pipeline latency telemetry #127
Open
gilankpam wants to merge 1 commit into
* docs: design spec for VideoStutterWidget (GS-local stutter metrics)
  Adds a design for an OSD widget that surfaces frame-interval stutter quantification (average interval, stutter event rate, and a 10s-decaying peak), sampled in the display thread on GS-local CLOCK_MONOTONIC.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: add moving-graph variant to stutter metrics spec
  Extends the spec with a second surface: reuse BarChartWidget subscribed to video.frame_interval_ms, plus a small fixed-Y-axis enhancement to keep calm streams from looking chaotic under auto-scale.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: implementation plan for video stutter metrics
  Nine TDD-style tasks covering: pure helpers (is_stutter, update_peak) with Catch2 tests, VideoStutterWidget class + JSON parser branch, frame-interval fact publish in display thread, and BarChartWidget fixed-Y-axis enhancement + parser support.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(spec): glass-to-glass latency telemetry design (GS side)
  Consumes the waybeam_venc RTP sidecar plus three new GS-local timestamps (arrival, decode-out, display-submit) to publish a per-segment latency breakdown on the OSD. FIFO-based frame matching, no GstMeta plumbing. NTP-style min-RTT clock-offset estimator over the sidecar SYNC handshake. Default-off via osd.json.
* plan(g2g-latency): bite-sized TDD implementation plan
  11 tasks covering RTP header parser, ClockOffset estimator, FrameMatcher, publish layer, sidecar wire codec, probe thread, config, hot-path hooks, loopback integration test, and final verification.
* feat(latency-probe): skeleton module + CMake wiring
  Add latency_probe.hpp/.cpp with the full public API surface (active atomic, start/stop, on_rtp_buffer, record_decode_done, record_display_submit, now_us) as deliberate no-op stubs. Wire both files into LIB_SOURCE_FILES before gstrtpreceiver.cpp so subsequent tasks compile against a stable header.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(latency-probe): include cstddef for size_t (review feedback)
* feat(latency-probe): RTP header parser
  Pure parse_rtp_header() extracts ssrc/timestamp/marker from the 12-byte fixed RTP prefix; rejects version != 2 and short/null buffers. Four Catch2 test cases cover the happy path, marker=0, wrong version, and too-short/null inputs.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(latency-probe): cover CSRC + extension headers (review feedback)
* feat(latency-probe): NTP-style min-RTT clock offset
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(latency-probe): FrameMatcher with FIFO bind + TTL
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(latency-probe): tighten matcher assertions + fix header comment
* feat(latency-probe): compute_and_publish per-segment latency facts
  Adds PublishedFacts, PublishUintFn/PublishIntFn, and compute_and_publish to latency_probe, computing the eight video.latency.* OSD facts from a FrameTimings + clock offset/rtt pair. Wire clamping increments a caller-owned counter so tests and production code can observe it. Two TDD test cases added to test_latency_probe.cpp; math verified via standalone g++ runner (all 14 assertions pass).
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(latency-probe): sidecar wire codec (subscribe, sync, frame)
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(latency-probe): cover null/unknown-type/short-len + full 64-bit MSG_FRAME
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(latency-probe): probe thread, UDP socket, OSD publish path
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(latency-probe): acquire ordering on hot-path + fcntl error path
* feat(latency-probe): wire osd.json config + start/stop in main
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(latency-probe): guard start() within try/catch (review feedback)
* feat(latency-probe): pad probe + decode/display timestamp hooks
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(latency-probe): loopback integration vs fake waybeam
  Stand up a UDP server thread mimicking waybeam's sidecar (SUBSCRIBE, SYNC_REQ/RESP, MSG_FRAME), run latency_probe::start() against it, feed synthetic arrival/decode/display events, and assert all three frames publish video.latency.total_ms and clock_offset_us via captured callbacks. Adds set_publish_overrides_for_test() to the public API to intercept compute_and_publish without touching the real OSD path.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(latency-probe): assert all 9 spec facts in integration test
* test(latency-probe): nix-shell + standalone runner, fix TTL off-by-one
  The CMake test target requires the full Rockchip MPP/RGA toolchain even for unit tests. Add a minimal standalone runner that bypasses cmake by compiling only the latency_probe.cpp + test files against Catch2, stubbing osd.h via -DUSE_SIMULATOR. shell.nix provides the build deps.
  Also fixes a TTL test off-by-one: the matcher uses strict-> eviction, so the test's now=501000 with insert=1000 and ttl=500000 did NOT actually evict. Bumped to 501001 to clear the strict-> boundary. All 117 assertions in 23 test cases pass under nix-shell.
* fix(latency-probe): gate display hook on fb_id != 0
  Display thread wakes on either "new video frame" OR "OSD update pending" (video_cond is signaled by both paths). The previous unconditional record_display_submit() fired on OSD-only redraws too, where no new video plane was committed (fb_id == 0). Those extra events stamped the oldest awaiting matcher slot too early, causing permanent FIFO drift: real video display events then stamped later slots, and the published display_ms / decode_ms values got wildly inflated.
  Only record the display timestamp when an actual video frame was committed. Caught during on-drone smoke testing.
* feat(latency-probe): publish capture_to_encode_us + encode_to_send_us
  Drone-side latency segments often run sub-millisecond: the encoder + RTP packetizer on the test rig finish ~80 µs after frame_ready. The existing *_ms facts truncate that to 0, hiding real signal. Add microsecond-precision sibling facts so sub-ms values stay visible on the OSD.
  Same guards as the ms versions: capture_to_encode_us is only published when capture_us is non-zero and frame_ready_us >= capture_us.
* refactor(latency-probe): drop decode/display hooks, reuse decode_and_handover_ms
  Our own decode_ms / display_ms were measuring the wrong thing: decode_ms captured the whole gstreamer-to-MPP-output pipeline depth (~340ms on the test rig, dominated by appsink queueing), and display_ms was broken by FIFO drift from OSD-only redraws stamping the matcher.
  The display thread already publishes video.decode_and_handover_ms (feed-time → display-commit, ~30-50ms, the real MPP+handover latency). Reuse that instead of re-instrumenting:
  - Remove latency_probe::record_decode_done + record_display_submit hooks.
  - Add latency_probe::record_gs_pipeline_ms(uint64_t), called from the display thread on real video commits (fb_id != 0), forwarding the value already computed for decode_and_handover_ms.
  - Simplify FrameMatcher: only two events per frame (marker + sidecar), keyed by (ssrc, rtp_timestamp), no FIFO scan. is_complete becomes sidecar_seen && gs_recv_last_us != 0.
  - Drop FrameTimings fields gs_decode_done_us / gs_display_submit_us.
  - compute_and_publish gains a gs_pipeline_ms parameter; drops video.latency.decode_ms / display_ms facts; total_ms is now capture + encode + wire + gs_pipeline (always published; missing summands contribute 0).
  Net: -1 thread hook, -2 OSD facts, simpler matcher, honest total.
* docs(spec): TODO for waybeam sidecar rate-limit (option B)
* update osd
* fix(latency-probe): suppress wire_ms/total_ms until first clock sync
  Between start() and the first MSG_SYNC_RESP, ClockOffset::get() returns (offset=0, rtt=0). compute_and_publish then computed wire_us = gs_recv_last_us - (last_pkt_send_us - 0), which subtracts two unrelated CLOCK_MONOTONIC clocks (GS uptime vs drone uptime), producing wildly large values that propagated into total_ms. Field symptom: absurd latency right after startup, settling to ~50 ms once sync converged.
  Gate wire_ms / total_ms / wire_clamp_counter on rtt_us != 0 (the existing "no sync yet" sentinel). Drone-local segments and clock diagnostics still publish, so clock_rtt_us=0 is the explicit "syncing…" signal on the OSD.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
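The clock-sync gating described in the last commit can be illustrated with a rough sketch. This is not the PR's code: `ClockOffsetSketch`, `WireResult`, and `wire_latency_us` are hypothetical names, and the four-timestamp exchange is the textbook NTP form, assumed here to resemble what the sidecar SYNC handshake carries. The min-RTT rule keeps the offset from the lowest-RTT sample, and `rtt_us == 0` doubles as the "no sync yet" sentinel.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; names and layout are assumptions, not the PR's ClockOffset.
struct ClockOffsetSketch {
    int64_t  offset_us = 0;  // drone clock minus GS clock
    uint64_t rtt_us    = 0;  // 0 means "no sync yet"

    // t1: GS send, t2: drone receive, t3: drone send, t4: GS receive (all in us).
    void update(int64_t t1, int64_t t2, int64_t t3, int64_t t4) {
        const int64_t rtt = (t4 - t1) - (t3 - t2);
        if (rtt <= 0) return;  // implausible sample; keeps 0 reserved as the sentinel
        // Min-RTT filter: the lowest-RTT sample has the least queueing asymmetry.
        if (rtt_us == 0 || static_cast<uint64_t>(rtt) < rtt_us) {
            rtt_us    = static_cast<uint64_t>(rtt);
            offset_us = ((t2 - t1) + (t3 - t4)) / 2;
        }
    }
};

struct WireResult {
    bool     valid;    // false until the first sync sample has landed
    uint64_t wire_us;
};

// Maps the drone-side send time onto the GS clock, then subtracts.
// Negative results are clamped to 0 and counted in a caller-owned counter.
inline WireResult wire_latency_us(const ClockOffsetSketch& clk,
                                  int64_t gs_recv_last_us,
                                  int64_t last_pkt_send_us,
                                  uint64_t& clamp_counter) {
    if (clk.rtt_us == 0) return {false, 0};  // suppress: clocks are unrelated
    const int64_t send_on_gs_clock = last_pkt_send_us - clk.offset_us;
    int64_t wire = gs_recv_last_us - send_on_gs_clock;
    if (wire < 0) {
        wire = 0;
        ++clamp_counter;
    }
    return {true, static_cast<uint64_t>(wire)};
}
```

Returning an explicit `valid` flag, rather than publishing 0, mirrors the commit's intent that callers skip `wire_ms` and `total_ms` entirely until a sync sample exists.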
Summary

Adds an opt-in probe on the ground station that consumes the `waybeam_venc` RTP sidecar and publishes a per-segment breakdown of video pipeline latency to the OSD, from frame capture on the drone to DRM display-commit on the GS. (Not photons-to-photons: the camera sensor and panel scan-out sit outside the measured window.)

Segments published (sum = `total_ms`):

- `video.latency.capture_to_encode_{ms,us}`
- `video.latency.encode_to_send_{ms,us}`
- `video.latency.wire_ms`
- `video.latency.gs_pipeline_ms`
- `video.latency.total_ms`
- `video.latency.clock_offset_us` / `clock_rtt_us`
- `video.latency.wire_clamp_count`

Design

- `latency_probe` module: RTP header parser, NTP-style min-RTT clock-offset estimator, frame matcher keyed by `(ssrc, rtp_timestamp)`, UDP sidecar wire codec, probe thread, and OSD publish path.
- `gstrtpreceiver` calls `on_rtp_buffer()` on each packet; the display thread forwards `decode_and_handover_ms` via `record_gs_pipeline_ms()` only on real video commits (`fb_id != 0`), since OSD-only redraws would otherwise dilute the rolling figure.
- `latency_probe` block in `config_osd.json` (enable/host/port). Sidecar default port `5602`.
- Suppresses `wire_ms` / `total_ms` until the first `MSG_SYNC_RESP` so the cross-clock subtraction can't bleed garbage onto the OSD. `clock_rtt_us == 0` is the explicit "syncing…" signal.

Companion drone-side change (RTP sidecar emitter) lives in `waybeam_venc`. This PR is GS-side only and is a no-op when `latency_probe.enable` is `false`.

Tests

Catch2 unit tests cover the header parser, clock offset, frame matcher, `compute_and_publish`, and wire codec. A loopback integration test against a fake-waybeam UDP server asserts all 9 spec facts. `tests/run_latency_tests.sh` provides a standalone build path that bypasses the Rockchip MPP/RGA toolchain (g++ + Catch2 + spdlog/fmt).

Test plan

- `BUILD_TESTS=ON` cmake build passes
- `tests/run_latency_tests.sh`: all 117 assertions in 23 test cases pass
- With a `waybeam_venc` peer publishing sidecar messages: OSD shows per-segment latency facts and `total_ms` stabilises after a few sync round-trips
- No `latency_probe` block in user config: probe stays off, no thread started

🤖 Generated with Claude Code
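For reviewers less familiar with the RTP fixed header, the parse that the unit tests exercise can be sketched roughly as below. The field positions follow the standard 12-byte RTP header; `RtpHeaderSketch` and `sketch_parse_rtp_header` are hypothetical names, and the real `parse_rtp_header()` signature may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch; the PR's actual types and signature may differ.
struct RtpHeaderSketch {
    uint32_t ssrc;
    uint32_t timestamp;
    bool     marker;
};

// Reads a big-endian (network order) 32-bit value.
inline uint32_t be32(const uint8_t* p) {
    return (static_cast<uint32_t>(p[0]) << 24) |
           (static_cast<uint32_t>(p[1]) << 16) |
           (static_cast<uint32_t>(p[2]) << 8)  |
            static_cast<uint32_t>(p[3]);
}

// Extracts ssrc/timestamp/marker from the 12-byte fixed RTP prefix.
// Rejects null or short buffers and any version other than 2.
inline bool sketch_parse_rtp_header(const uint8_t* data, size_t len,
                                    RtpHeaderSketch& out) {
    if (data == nullptr || len < 12) return false;
    if ((data[0] >> 6) != 2) return false;   // top two bits: version
    out.marker    = (data[1] & 0x80) != 0;   // top bit of byte 1: marker
    out.timestamp = be32(data + 4);          // bytes 4..7
    out.ssrc      = be32(data + 8);          // bytes 8..11
    return true;
}
```

CSRC entries and header extensions follow the fixed prefix, so this sketch can ignore them when only these three fields are needed.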