[draft] Add OtlpSink wrapper for ComponentStats OTLP/gRPC export (Phase 2) #1184
Draft
yutongzhang-microsoft wants to merge 3 commits into
Wire the build system to optionally pull in opentelemetry-cpp so a follow-up
PR can add an OTLP sink to ComponentStats. The default build is unchanged:
* configure.ac gains a new --enable-otlp option (default: disabled).
When enabled, the build probes for opentelemetry-cpp via pkg-config and
falls back to a header check + a hard-coded -l<lib> list for SDKs that
are not packaged with .pc files. HAVE_OTLP is defined and OTLP is
exposed as an automake conditional, plus OPENTELEMETRY_CFLAGS /
OPENTELEMETRY_LIBS substitutions for use by Makefile.am in later PRs.
* debian/rules gains a new 'otlp' build profile. When the profile is
active, --enable-otlp is passed to configure; otherwise --disable-otlp
is passed, which is the current behaviour.
This is the build-system half of Phase 2 in the Component Statistics HLD
(sonic-net/SONiC#2312). It does not add any source files, does not change
the public API, and does not affect any default build path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Introduce a thin C++ wrapper, swss::OtlpSink, that converts a snapshot of
ComponentStats counters into an OTLP/gRPC batch destined for a local
OpenTelemetry Collector. The class is the OTLP half of the dual-sink design
described in the Component Statistics HLD (sonic-net/SONiC#2312).
What the wrapper provides:
* A small public header (common/component_stats_otlp.h) that exposes a
Config struct, a DataPoint struct, exportBatch(), and shutdown(), and
intentionally hides every OpenTelemetry C++ SDK type behind PIMPL so
callers (notably ComponentStats) do not transitively include any OTel
headers.
* An implementation (common/component_stats_otlp.cpp) that:
  - constructs an OtlpGrpcMetricExporter lazily and shares one gRPC
    channel for the lifetime of the sink;
  - groups data points by metric name so that 'entity' is exported as a
    label rather than as part of the metric name (HLD section 7.7);
  - maps isMonotonic=true to a CUMULATIVE Sum, isMonotonic=false to a
    Gauge with last-value semantics;
  - never throws: all SDK exceptions and Export() error results are
    caught, logged, and converted to a 'false' return so a dead Collector
    cannot stall the ComponentStats writer thread or affect the DB sink
    (HLD requirement R9);
  - is move-only and idempotent on shutdown.
* Five smoke tests (tests/component_stats_otlp_ut.cpp) covering the
contract: construct/destruct, empty batch is a no-op, exporting to an
unreachable Collector does not throw, shutdown is idempotent, and a
moved-from instance is harmless. A real in-process gRPC mock server test
is deferred to a follow-up.
Build wiring:
* common/Makefile.am: when --enable-otlp is active, append
component_stats_otlp.cpp to libswsscommon and link OPENTELEMETRY_LIBS.
* tests/Makefile.am: when --enable-otlp is active, append the unit test
and link OPENTELEMETRY_LIBS.
Default builds are unaffected because --enable-otlp is opt-in and ships
disabled by default (added in sonic-net#1183).
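The PIMPL, never-throw, move-only and idempotent-shutdown points above can be illustrated with a minimal sketch. This is not the PR's code: the namespace sketch, the class Sink, the constructor taking an endpoint string, and the stand-in send() that always throws are hypothetical, and the return values chosen for an empty batch (true) and for a shut-down or moved-from sink (false) are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// --- What a public header might contain: no SDK types are visible here. ---
struct DataPoint {
    std::string entity;
    std::string metric;
    std::uint64_t value;
    bool isMonotonic;
};

class Sink {
public:
    explicit Sink(std::string endpoint);
    ~Sink();                                   // defined where Impl is complete
    Sink(Sink&&) noexcept;                     // move-only
    Sink& operator=(Sink&&) noexcept;
    Sink(const Sink&) = delete;
    Sink& operator=(const Sink&) = delete;

    bool exportBatch(const std::vector<DataPoint>& points) noexcept;
    void shutdown() noexcept;                  // safe to call repeatedly

private:
    struct Impl;                               // opaque to callers
    std::unique_ptr<Impl> m_impl;
};

// --- What the .cpp might contain: the only place that knows Impl. ---
struct Sink::Impl {
    explicit Impl(std::string ep) : endpoint(std::move(ep)), stopped(false) {}
    std::string endpoint;
    bool stopped;

    // Hypothetical stand-in for the exporter call; it always throws to
    // simulate an unreachable Collector.
    void send(const std::vector<DataPoint>&) {
        throw std::runtime_error("collector unreachable: " + endpoint);
    }
};

Sink::Sink(std::string endpoint) : m_impl(new Impl(std::move(endpoint))) {}
Sink::~Sink() = default;
Sink::Sink(Sink&&) noexcept = default;
Sink& Sink::operator=(Sink&&) noexcept = default;

bool Sink::exportBatch(const std::vector<DataPoint>& points) noexcept {
    if (!m_impl || m_impl->stopped) return false;  // moved-from or shut down
    if (points.empty()) return true;               // assumed: no-op succeeds
    try {
        m_impl->send(points);
        return true;
    } catch (const std::exception&) {
        return false;                              // real code would log here
    } catch (...) {
        return false;
    }
}

void Sink::shutdown() noexcept {
    if (m_impl) m_impl->stopped = true;            // second call changes nothing
}

}  // namespace sketch
```

Because every failure path collapses to a false return, a caller such as a writer thread can treat the sink as best-effort and carry on with its other work.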
Phase 2 follow-ups:
* PR C: connect OtlpSink to ComponentStats writer-thread fan-out.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
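The grouping rule in this commit (one metric per name, with entity carried as a label) can be sketched as follows, using the sonic.<componentName>.<metric> naming given in the PR description. The types MetricGroup, LabeledValue and Kind, the function groupByMetric, and the sample names in the usage note are hypothetical. Taking the kind from the first point seen for each metric is an assumption, not something the commit states.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

namespace sketch {

struct DataPoint {
    std::string entity;
    std::string metric;
    std::uint64_t value;
    bool isMonotonic;
};

enum class Kind { Sum, Gauge };

// One exported value; 'entity' travels as a label, not in the metric name.
struct LabeledValue {
    std::string entity;
    std::uint64_t value;
};

struct MetricGroup {
    Kind kind;
    std::vector<LabeledValue> points;
};

// Groups a flat batch by final metric name "sonic.<componentName>.<metric>".
inline std::map<std::string, MetricGroup>
groupByMetric(const std::string& componentName,
              const std::vector<DataPoint>& batch) {
    std::map<std::string, MetricGroup> groups;
    for (const DataPoint& p : batch) {
        const std::string name = "sonic." + componentName + "." + p.metric;
        auto it = groups.find(name);
        if (it == groups.end()) {
            // Assumption: the first point seen for a metric decides its kind.
            const Kind kind = p.isMonotonic ? Kind::Sum : Kind::Gauge;
            it = groups.emplace(name, MetricGroup{kind, {}}).first;
        }
        it->second.points.push_back(LabeledValue{p.entity, p.value});
    }
    return groups;
}

}  // namespace sketch
```

With a hypothetical component "orchagent" reporting set_count for two entities and queue_depth for one, the result holds two groups: sonic.orchagent.set_count with two labeled points, and sonic.orchagent.queue_depth with one.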
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Geneva mdm rejects OTLP metrics whose Sum points carry CUMULATIVE
aggregation temporality (mdm log: 'Raw metrics data were dropped because
OTLP metrics with cumulative aggregation temporality is not supported.
Data Dropped Count: 1'). Switch the sink to DELTA so the very first
production deployment is not silently a no-op.
Behaviour change inside Impl::exportBatch():
* Per-series cache (lastValue, lastEndTs) keyed by '<entity>\x1f<metric>'.
* For each Sum point: delta = current - lastValue, with a counter-reset
guard (current < lastValue is treated as 'delta = current', so there is
no uint64_t underflow). The cache then advances unconditionally so a
transient Export() failure costs at most one batch.
* Per-metric MetricData.start_ts is the previous end_ts (or
creationTs on the first export of that series) and end_ts is the
current time, matching the OTLP delta contract.
* Gauge points are unchanged (LastValuePointData has no temporality);
their MetricData.aggregation_temporality is set to Unspecified.
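The cumulative-to-delta rules listed above can be sketched as a small per-series cache. This is an illustration under stated assumptions rather than the code in Impl::exportBatch(): the names DeltaCache, SeriesState, DeltaPoint and advance() are hypothetical, timestamps are plain integers, and a series seen for the first time is assumed to start from lastValue = 0, so its first delta equals the current value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

namespace sketch {

struct SeriesState {
    std::uint64_t lastValue;
    std::uint64_t lastEndTs;
};

struct DeltaPoint {
    std::uint64_t delta;
    std::uint64_t startTs;
    std::uint64_t endTs;
};

class DeltaCache {
public:
    explicit DeltaCache(std::uint64_t creationTs) : m_creationTs(creationTs) {}

    // Converts one cumulative sample into a delta point and advances the
    // cache unconditionally, whether or not a later export succeeds.
    DeltaPoint advance(const std::string& entity, const std::string& metric,
                       std::uint64_t current, std::uint64_t nowTs) {
        const std::string key = entity + '\x1f' + metric;
        // emplace() leaves an existing entry untouched; a new series starts
        // at value 0 (assumption) with the sink creation time as its start.
        auto it = m_series.emplace(key, SeriesState{0, m_creationTs}).first;
        SeriesState& s = it->second;

        // Counter-reset guard: never compute current - lastValue when it
        // would wrap around in unsigned arithmetic.
        const std::uint64_t delta =
            (current < s.lastValue) ? current : current - s.lastValue;

        const DeltaPoint out{delta, s.lastEndTs, nowTs};
        s.lastValue = current;
        s.lastEndTs = nowTs;
        return out;
    }

private:
    std::uint64_t m_creationTs;
    std::unordered_map<std::string, SeriesState> m_series;
};

}  // namespace sketch
```

For cumulative samples 10, 25 and then 7 (a reset) on one series, the sketch yields deltas 10, 15 and 7, and each point's start timestamp equals the previous point's end timestamp.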
API stays cumulative: callers (ComponentStats in PR C) still pass the
cumulative in-memory counter, so the Phase 1 COUNTERS_DB sink and this
sink share the exact same input. Cumulative-to-delta conversion is the
sink's responsibility, not the caller's.
Tests: two new GTest cases cover (1) three consecutive snapshots of the
same series across exportBatch() calls, and (2) a counter-reset
(current < last) without underflow.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
What
Introduce swss::OtlpSink, a thin C++ wrapper that converts a snapshot of ComponentStats counters into an OTLP/gRPC metric batch destined for a local OpenTelemetry Collector or Geneva mdm container. This is the OTLP half of the dual-sink design described in the Component Statistics HLD (sonic-net/SONiC#2312).
This PR is stacked on top of #1183 (build plumbing) and is intentionally opened as a draft; see Status below.
Why
Phase 1 (#1180 + sonic-net/sonic-swss#4516) already lands the in-memory counters and the COUNTERS_DB sink. Phase 2 needs to fan the same snapshot out over OTLP/gRPC so SONiC components can be observed by the standard OpenTelemetry pipeline (local Collector or Geneva mdm → any OTLP backend).
Splitting the wrapper out of the writer-thread integration keeps each PR small and reviewable:
* PR A (#1183): --enable-otlp build flag + opentelemetry-cpp link configuration (no behaviour change).
* PR B (this PR): OtlpSink wrapper class + smoke tests (no caller yet).
* PR C: connect OtlpSink to the ComponentStats writer-thread fan-out.
What this PR adds
common/component_stats_otlp.h
A small public header that exposes Config, DataPoint, exportBatch() and shutdown(), and intentionally hides every OpenTelemetry C++ SDK type behind the PIMPL idiom so callers (notably ComponentStats) do not transitively include any OTel headers.
common/component_stats_otlp.cpp
* Constructs an OtlpGrpcMetricExporter lazily and shares one gRPC channel for the lifetime of the sink.
* Groups data points by metric name so that entity is exported as a label, not as part of the metric name (HLD §7.7). Final metric name is sonic.<componentName>.<metric>.
* isMonotonic=true → DELTA Sum; isMonotonic=false → Gauge with last-value semantics.
* Per-series cache: delta = current − last with a counter-reset guard (no uint64_t underflow).
* Never throws: all SDK exceptions and Export() error results are caught, logged with SWSS_LOG_WARN, and converted to a false return so a dead Collector cannot stall the ComponentStats writer thread or affect the DB sink (HLD requirement R9).
* Move-only; shutdown() is idempotent.
Why DELTA, not CUMULATIVE
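Geneva mdm currently rejects OTLP metrics whose Sum points carry AGGREGATION_TEMPORALITY_CUMULATIVE and silently drops them:
'Raw metrics data were dropped because OTLP metrics with cumulative aggregation temporality is not supported. Data Dropped Count: 1'
To make the very first production deployment functional rather than a silent no-op, this sink emits DELTA out of the box. Each MetricData's start_ts is the previous export's end_ts (or sink-creation time on the first export of a series) and end_ts is the current wall-clock time, matching the OTLP delta contract.
tests/component_stats_otlp_ut.cpp
Seven smoke tests covering the contract:
* construct/destruct;
* an empty batch is a no-op;
* exporting to an unreachable Collector does not throw;
* shutdown() is idempotent;
* a moved-from instance is harmless;
* three consecutive snapshots of the same series across exportBatch() calls;
* a counter reset (current < last) does not underflow.
A real in-process gRPC mock-server test is deferred to a follow-up.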
Build wiring
Both common/Makefile.am and tests/Makefile.am add an if OTLP … endif block mirroring the existing YANGMODS pattern, so the new sources and OPENTELEMETRY_LIBS are only compiled and linked when --enable-otlp is passed.
Status: why this PR is a Draft
* opentelemetry-cpp is not yet packaged in the SONiC build environment. CI will run with --disable-otlp and therefore exercise none of the new code; this PR is non-mergeable until the SDK is available in sonic-buildimage.
* Having telegraf read COUNTERS_DB and export OTLP to the Geneva mdm container keeps swss-common code unchanged and may be pursued in parallel; this PR remains useful for the eventual application-direct-emit path.
Compatibility
--enable-otlp is opt-in and ships disabled.
Checklist
* Unit tests under tests/
* CI with --enable-otlp (blocked on SDK packaging; tracked in PR A)