
HLD: Reusable service-level component statistics. #2312

Open
yutongzhang-microsoft wants to merge 5 commits into sonic-net:master from yutongzhang-microsoft:sonic-component-stats-hld



yutongzhang-microsoft commented Apr 28, 2026

What I'm doing

Adding a new High-Level Design document at doc/component-stats/component-stats-hld.md that specifies a reusable mechanism for exposing service-level (control-plane software) counters from SONiC containers.

The HLD introduces:

  1. A new shared library swss::ComponentStats in sonic-swss-common.
  2. A SWSS-specific facade SwssStats in sonic-swss built on top of that library, as the first consumer.

Counters are published to two sinks driven from a single in-process atomic snapshot:

  • COUNTERS_DB: for parity with the existing Flex-Counter pipeline and for on-box diagnostic tooling (redis-cli, show ... stats).
  • A local OpenTelemetry Collector sidecar, reached over OTLP: so the same counters can be forwarded to off-box telemetry systems (e.g. Geneva mdm) that consume OTLP.
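A minimal C++ sketch of the "one snapshot, two sinks" idea, assuming a fixed set of counters held in std::atomic slots. CounterBlock, Counter, Snapshot, Sink and publish() are hypothetical names for illustration, not the swss::ComponentStats API. Each flush takes one snapshot and hands the same copy to every sink, so the COUNTERS_DB writer and the OTLP exporter cannot disagree about a given flush.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical counter identifiers, for illustration only.
enum class Counter : std::size_t { TasksProcessed = 0, TaskErrors = 1, Count = 2 };

class CounterBlock {
public:
    static constexpr std::size_t kSize = static_cast<std::size_t>(Counter::Count);
    using Snapshot = std::array<std::uint64_t, kSize>;

    CounterBlock() {
        // Explicitly zero every slot.
        for (auto& v : values_) v.store(0, std::memory_order_relaxed);
    }

    // Hot path: one relaxed atomic add, no lock and no I/O.
    void inc(Counter c, std::uint64_t delta = 1) {
        values_[static_cast<std::size_t>(c)].fetch_add(delta, std::memory_order_relaxed);
    }

    // Writer path: copy every counter once into a plain array.
    Snapshot snapshot() const {
        Snapshot out{};
        for (std::size_t i = 0; i < kSize; ++i)
            out[i] = values_[i].load(std::memory_order_relaxed);
        return out;
    }

private:
    std::array<std::atomic<std::uint64_t>, kSize> values_;
};

// A sink consumes a snapshot (e.g. a hypothetical DB writer or OTLP exporter).
using Sink = std::function<void(const CounterBlock::Snapshot&)>;

// One snapshot per flush, handed unchanged to every sink.
void publish(const CounterBlock& block, const std::vector<Sink>& sinks) {
    const CounterBlock::Snapshot snap = block.snapshot();
    for (const auto& sink : sinks) sink(snap);
}
```

Each slot is read atomically, but the array as a whole is not a point-in-time cut across counters; what the sketch guarantees is only that both sinks receive identical values for a given flush.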

Why

SONiC already has dataplane counters (Flex-Counter / SAI), but no uniform mechanism for service-level counters such as orchagent task throughput, gNMI request rate, or BMP error counts. A naive per-container implementation would duplicate atomic counter management, dirty tracking, the writer thread, the Redis schema, and an OTLP exporter in every container; concurrency reviews, bug fixes, and on-the-wire schemas would then drift apart. This HLD specifies one reusable producer that any container can adopt with a ~100-line facade.
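To make "dirty tracking" and "the writer thread" concrete, here is a hedged C++ sketch, assuming each counter carries a dirty bit that the hot path sets and a single background writer clears. DirtyCounters, flushOnce(), start() and stop() are hypothetical names chosen for illustration. The hot path is one relaxed fetch_add plus a release store; the writer's acquire exchange guarantees that a counter observed as dirty is read together with the increment that dirtied it, and unchanged counters are never handed to the sink.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: N counters, each with a dirty bit, drained by one writer thread.
template <std::size_t N>
class DirtyCounters {
public:
    using Update = std::pair<std::size_t, std::uint64_t>;  // (counter index, value)
    using Sink = std::function<void(const std::vector<Update>&)>;

    DirtyCounters() {
        for (std::size_t i = 0; i < N; ++i) {
            values_[i].store(0, std::memory_order_relaxed);
            dirty_[i].store(false, std::memory_order_relaxed);
        }
    }
    ~DirtyCounters() { stop(); }

    // Hot path: relaxed add, then a release store so a writer that sees the
    // dirty bit also sees the increment.
    void inc(std::size_t i, std::uint64_t delta = 1) {
        values_[i].fetch_add(delta, std::memory_order_relaxed);
        dirty_[i].store(true, std::memory_order_release);
    }

    // Collects counters changed since the last flush; an all-idle cycle makes
    // no sink call. Returns the number of updates produced.
    std::size_t flushOnce(const Sink& sink) {
        std::vector<Update> updates;
        for (std::size_t i = 0; i < N; ++i) {
            if (dirty_[i].exchange(false, std::memory_order_acquire))
                updates.emplace_back(i, values_[i].load(std::memory_order_relaxed));
        }
        if (!updates.empty()) sink(updates);
        return updates.size();
    }

    // Starts the background writer, flushing every `interval` until stop().
    void start(Sink sink, std::chrono::milliseconds interval) {
        stopping_ = false;
        writer_ = std::thread([this, sink = std::move(sink), interval] {
            std::unique_lock<std::mutex> lk(mu_);
            while (!stopping_) {
                cv_.wait_for(lk, interval, [this] { return stopping_; });
                lk.unlock();
                flushOnce(sink);
                lk.lock();
            }
            lk.unlock();
            flushOnce(sink);  // final drain so increments made before stop() are not lost
        });
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        if (writer_.joinable()) writer_.join();
    }

private:
    std::array<std::atomic<std::uint64_t>, N> values_;
    std::array<std::atomic<bool>, N> dirty_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool stopping_ = false;  // guarded by mu_
    std::thread writer_;
};
```

Clearing the bit before reading the value means an increment that races with a flush re-sets the bit, so at worst the value is written again on the next cycle; updates may be duplicated but are never lost.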

Companion PRs

  • sonic-net/sonic-swss-common #1180: swss::ComponentStats library + unit tests.
  • sonic-net/sonic-swss #4516: SwssStats, a thin facade over ComponentStats in orchagent/.
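The "thin facade" split can be sketched as follows, assuming a generic library object that accepts (object, field) increments. GenericStats stands in for the shared library and SwssStatsSketch for the facade; both class names, the "SWSS" component string and the TASKS/ERRORS field names are hypothetical and are not the interfaces in sonic-swss-common#1180 or sonic-swss#4516. The stand-in uses a mutex-protected map for brevity rather than the atomic counters the HLD describes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the shared library: counts keyed by (object, field).
class GenericStats {
public:
    explicit GenericStats(std::string component) : component_(std::move(component)) {}

    void increment(const std::string& object, const std::string& field, std::uint64_t delta = 1) {
        std::lock_guard<std::mutex> lk(mu_);
        counts_[{object, field}] += delta;
    }

    std::uint64_t get(const std::string& object, const std::string& field) const {
        std::lock_guard<std::mutex> lk(mu_);
        auto it = counts_.find({object, field});
        return it == counts_.end() ? 0 : it->second;
    }

    const std::string& component() const { return component_; }

private:
    std::string component_;
    mutable std::mutex mu_;
    std::map<std::pair<std::string, std::string>, std::uint64_t> counts_;
};

// Hypothetical SWSS facade: pins the component name and exposes SWSS vocabulary only.
class SwssStatsSketch {
public:
    SwssStatsSketch() : stats_("SWSS") {}

    void recordTask(const std::string& table) { stats_.increment(table, "TASKS"); }
    void recordTaskError(const std::string& table) { stats_.increment(table, "ERRORS"); }

    std::uint64_t tasks(const std::string& table) const { return stats_.get(table, "TASKS"); }
    std::uint64_t errors(const std::string& table) const { return stats_.get(table, "ERRORS"); }
    const std::string& component() const { return stats_.component(); }

private:
    GenericStats stats_;
};
```

The facade only fixes the component name and the vocabulary; storage, concurrency and publishing stay in the shared library, which is what keeps each per-container facade small.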

Adds a new HLD describing swss::ComponentStats: a reusable library in sonic-swss-common that produces service-level (control-plane) counters, mirrors them to COUNTERS_DB, and exports them via OTLP to a local OpenTelemetry Collector. The existing SwssStats class in sonic-swss is refactored into a thin facade over this library.

Related PRs:

- sonic-swss-common#1180

- sonic-swss#4516

- sonic-buildimage#26924

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
mssonicbld (Collaborator)

/azp run

azure-pipelines

No pipelines are associated with this pull request.

… label

Address review feedback:
- Replace 'Initial draft' with 'Initial revision' in the revision table.
- Treat the SwssStats facade as freshly introduced by this work; remove all
  references to sonic-swss#4434 in Scope, Overview, Requirements, the
  facade section, Warmboot, Memory, and Testing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
linux-foundation-easycla (Bot) commented Apr 28, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.


yutongzhang-microsoft and others added 2 commits April 28, 2026 10:47
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
…mplify section 9

- Reword non-swss vocabulary out-of-scope item as future work.
- Remove the sonic-buildimage submodule row from the repositories table; not needed.
- Section 9: collapse Manifest / CLI / CONFIG_DB subsections into a single
  'Not applicable' note.
- Update Phase 1 wording and system-test bullet to reference two companion PRs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
yutongzhang-microsoft changed the title "Add HLD for SONiC component statistics" to "HLD: Reusable service-level component statistics" Apr 28, 2026
yutongzhang-microsoft changed the title "HLD: Reusable service-level component statistics" to "[doc] HLD: Reusable service-level component statistics (swss::ComponentStats + SwssStats facade)" Apr 28, 2026
yutongzhang-microsoft changed the title "[doc] HLD: Reusable service-level component statistics (swss::ComponentStats + SwssStats facade)" to "HLD: Reusable service-level component statistics." Apr 28, 2026
Split the previous single component-stats-hld.md into two documents so
that responsibilities map cleanly to the teams involved:

* component-stats-framework-hld.md (SONiC team): the swss::ComponentStats
  library, the SwssStats facade pattern, hot path, threading, memory
  ordering, warmboot, memory and testing for the producer. The DB sink
  is the only sink documented; OTLP is moved to future work.

* component-stats-reporting-hld.md (SONiC team, contract with NDM): the
  COUNTERS_DB schema (key layout, hash fields, idle suppression) and
  SWSS-specific vocabulary, plus conventions for future components. The
  reporting transport (telegraf -> mdm -> Geneva) is owned by the NDM
  HLD and referenced here, not duplicated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
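The idle-suppression rule in the reporting document can be illustrated with a small C++ sketch, assuming a COUNTERS_DB row is a key plus a hash of string fields and that a row is rewritten only when its fields differ from the last write. The COMPONENT_STATS:<component>:<object> key format, makeKey(), toFields() and IdleSuppressor are hypothetical; the actual key layout and field names are defined in component-stats-reporting-hld.md.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using Fields = std::map<std::string, std::string>;  // hash field -> value
using Rows = std::map<std::string, Fields>;         // key -> hash fields

// Hypothetical key layout, for illustration only.
std::string makeKey(const std::string& component, const std::string& object) {
    return "COMPONENT_STATS:" + component + ":" + object;
}

// Redis hash values are strings, so counters are rendered as decimal text.
Fields toFields(const std::map<std::string, std::uint64_t>& counters) {
    Fields out;
    for (const auto& [name, value] : counters) out[name] = std::to_string(value);
    return out;
}

// Remembers the last write per key and drops rows that have not changed (idle rows).
// Assumes the caller writes every row that diff() returns.
class IdleSuppressor {
public:
    Rows diff(const Rows& current) {
        Rows toWrite;
        for (const auto& [key, fields] : current) {
            auto it = lastWritten_.find(key);
            if (it == lastWritten_.end() || it->second != fields) {
                toWrite[key] = fields;
                lastWritten_[key] = fields;
            }
        }
        return toWrite;
    }

private:
    Rows lastWritten_;
};
```

Comparing whole rows keeps the sketch short; a producer could equally suppress at field granularity and write only the changed hash fields.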
