test(smp): Agent vs Agent+ADP 1Hz dogstatsd benchmark by jszwedko · Pull Request #1592 · DataDog/saluki

jszwedko · 2026-05-05T22:53:41Z

Summary

Adds an SMP regression benchmark comparing the Datadog Agent (baseline) against a converged Agent+Agent-Data-Plane image (comparison) for raw dogstatsd ingest at a 1-second aggregation bucket interval. Stacked on #1459 so the single `aggregator_bucket_size_seconds: 1` knob in `datadog.yaml` reaches both Agent core and ADP. Reuses `test/smp/regression/adp/` in place — OTLP and quality_gates cases removed; the 15 `dsd_uds_*` cases rewritten to drive the Agent entrypoint and a converged Dockerfile.

Based on #1327, the PR that compared tag filter experiments.

Draft because the meaningful review signal is the SMP report comment, which CI posts after the benchmark runs.

Test plan

CI green: `build-adp-baseline-image`, `build-adp-comparison-image`, `build-agent-adp-baseline-image`, `build-agent-adp-comparison-image`, `run-benchmarks-adp`, `binary-size-analysis`.
SMP report posted as a PR comment under header "Regression Detector (Agent vs Agent+ADP, 1Hz dogstatsd)".
Spot-check the report: comparison side shows ADP-side metrics (`sub_agent: adp` series), 1Hz bucketing visible (flush rate / output volume).

References

Stacked on #1459 (`luke/configurable-aggregation`). Modeled after #1327 but slimmer — no tag filtering, reuses existing dsd_uds cases.

🤖 Generated with Claude Code

Stacks on PR #1459 (luke/configurable-aggregation). Reuses test/smp/regression/adp/ in-place: deletes OTLP + quality_gates cases, rewrites dsd_uds_* to drive a Datadog Agent baseline vs converged Agent+ADP comparison with aggregator_bucket_size_seconds: 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Nine tasks covering case file rewrites, Dockerfile build args, GitLab CI build jobs, run-benchmarks-adp repointing, and the draft PR open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The adp/ SMP target directory will be repurposed for an Agent-vs-Agent+ADP 1Hz dogstatsd comparison. Only the dsd_uds_* cases are kept; everything else moves out of scope on this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The converged Agent+ADP image boots ADP via the Agent's s6 supervisor (not via target.command), so /etc/agent-data-plane/empty.yaml and the IPC cert are no longer referenced. Remove them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…f1e04a-full Per user request, base off the converged Agent dev image carrying Luke's configurable-aggregation patch on top of 15f1e04a, instead of the upstream 15f1e04a-py3-jmx tag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Single source of truth for aggregator_bucket_size_seconds. Agent core reads the file directly; ADP reads it via config-stream gRPC when DD_DATA_PLANE_USE_NEW_CONFIG_STREAM_ENDPOINT=true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
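To make the bucket-size knob concrete, here is a minimal sketch of time-bucketed aggregation in Python. It is not the Agent's or ADP's aggregator; `bucket_start` and `aggregate_counts` are hypothetical names, and the sketch only sums counter-style values.

```python
from collections import defaultdict


def bucket_start(timestamp: float, bucket_size_seconds: int) -> int:
    """Return the start of the bucket that contains `timestamp`."""
    ts = int(timestamp)
    return ts - (ts % bucket_size_seconds)


def aggregate_counts(samples, bucket_size_seconds):
    """Sum (timestamp, name, value) samples per (bucket start, name)."""
    buckets = defaultdict(float)
    for timestamp, name, value in samples:
        key = (bucket_start(timestamp, bucket_size_seconds), name)
        buckets[key] += value
    return dict(buckets)


# Illustrative samples only; not taken from the benchmark.
samples = [(100.2, "hits", 1), (100.9, "hits", 2), (101.1, "hits", 4), (109.5, "hits", 8)]
one_second = aggregate_counts(samples, 1)   # buckets start at 100, 101, and 109
ten_second = aggregate_counts(samples, 10)  # everything lands in the bucket starting at 100
```

With a 1-second bucket the same traffic produces more, smaller buckets, which is presumably why the test plan looks for 1Hz bucketing in flush rate and output volume.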

Targets the Agent's /bin/entrypoint.sh instead of the ADP binary directly, so the same case files drive both the Datadog Agent baseline image and the converged Agent+ADP comparison image. Memory allotment bumped from 2GiB to 3200MiB to fit Agent + ADP + JVM in one container. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Add 127.0.0.1:9092 blackhole for the process-agent's process_dd_url.
- Scrape Agent core telemetry on 127.0.0.1:5000 alongside ADP on :5102, tagging both with sub_agent so the SMP report can attribute metrics.
- Switch dogstatsd UDS path to /tmp/dsd.socket to match datadog.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
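For readers unfamiliar with dogstatsd over a Unix datagram socket, the following is a rough sketch of what a single client write to `/tmp/dsd.socket` could look like. The load generator used by SMP is not shown in this PR; `format_count`, `send_datagram`, the metric name, and the tag are all hypothetical.

```python
import socket


def format_count(name: str, value: int, tags: list[str]) -> bytes:
    """Encode a counter in the DogStatsD datagram format: name:value|c|#tag1,tag2."""
    line = f"{name}:{value}|c"
    if tags:
        line += "|#" + ",".join(tags)
    return line.encode("utf-8")


def send_datagram(socket_path: str, payload: bytes) -> None:
    """Send one datagram to the Unix datagram socket bound at `socket_path`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        client.sendto(payload, socket_path)


payload = format_count("example.requests", 1, ["env:bench"])
# send_datagram("/tmp/dsd.socket", payload)  # needs a listener bound to that path
```

The send is left commented out because it only succeeds when something, such as the Agent or ADP, is listening on that socket path.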

Lets the converged Agent+ADP image bake in the env vars ADP needs to boot in non-standalone, config-stream-driven mode (REMOTE_AGENT_ENABLED=true, USE_NEW_CONFIG_STREAM_ENDPOINT=true, DOGSTATSD_ENABLED=true). The build job in .gitlab/benchmark.yml supplies the values for the SMP comparison image. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

build-agent-adp-baseline-image retags the upstream Datadog Agent dev image (luke-configurable-aggregation-15f1e04a-full, from datadog-agent#49676) as the SMP baseline. build-agent-adp-comparison-image builds Dockerfile.datadog-agent on top of the comparison ADP image with all DD_DATA_PLANE_* knobs set to drive non-standalone, config-stream-driven mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Now consumes BASELINE_AGENT_IMG (vanilla Datadog Agent dev image) and COMPARISON_AGENT_IMG (converged Agent+ADP image) from the new build jobs. Same SMP target dir; the dsd_uds cases are now driven through the Agent entrypoint with aggregator_bucket_size_seconds: 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pr-commenter · 2026-05-05T23:00:31Z

Binary Size Analysis (Agent Data Plane)

Target: dc8f4b9 (baseline) vs a25ec14 (comparison)
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 37.11 MiB
Comparison Size: 37.00 MiB
Size Change: -111.49 KiB (-0.29%)
Pass/Fail Threshold: +5%
Result: PASSED ✅
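The percentage and the pass/fail result above can be reproduced from the reported figures. This is a minimal sketch, not the analysis job's actual code: `size_change_percent` and `passes_threshold` are hypothetical names, and treating the threshold as inclusive is an assumption.

```python
def size_change_percent(baseline_kib: float, delta_kib: float) -> float:
    """Express a size delta as a percentage of the baseline size."""
    return 100.0 * delta_kib / baseline_kib


def passes_threshold(change_percent: float, threshold_percent: float = 5.0) -> bool:
    """Pass when growth stays at or below the threshold (assumed inclusive)."""
    return change_percent <= threshold_percent


baseline_kib = 37.11 * 1024  # Baseline Size, converted from MiB to KiB
delta_kib = -111.49          # Size Change as reported
change = size_change_percent(baseline_kib, delta_kib)
print(f"{change:+.2f}%", "PASSED" if passes_threshold(change) else "FAILED")
```

Note that subtracting the two rounded MiB figures gives a slightly different delta than the reported -111.49 KiB, so the sketch uses the reported delta directly.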

Changes by Module

Module	File Size	Symbols
`figment`	-118.82 KiB	125
`otlp_protos::otlp_include::opentelemetry`	-41.79 KiB	103
`hyper`	+25.69 KiB	76
`prost`	+24.60 KiB	66
`hyper_util`	-14.05 KiB	13
`hashbrown`	+11.34 KiB	72
`h2`	-8.86 KiB	92
`[sections]`	-8.52 KiB	6
`tonic`	-7.37 KiB	33
`core`	-7.09 KiB	878
`serde_core`	+6.51 KiB	85
`serde`	+6.39 KiB	18
`tower`	+5.15 KiB	11
`async_compression`	+4.62 KiB	19
`tokio_util`	+3.94 KiB	14
`alloc`	+3.76 KiB	50
`saluki_components::sources::otlp`	+3.75 KiB	17
`tokio`	+3.67 KiB	124
`saluki_core::data_model::event`	-3.40 KiB	8
`futures_channel`	-3.35 KiB	7

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +18.5Ki  [NEW] +18.3Ki    saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::flush::h4cec187aab531472
  [NEW] +16.5Ki  [NEW] +16.4Ki    saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::add_span::hf95b8e429cc5bbe0
  [NEW] +16.1Ki  [NEW] +16.0Ki    saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::new_stat_span_from_span::h831751497d326aef
  +283% +15.9Ki  +288% +15.9Ki    h2::proto::connection::DynConnection<B>::recv_frame::h9d7adeb5727e1522
  [NEW] +12.3Ki  [NEW] +12.1Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::hf9a832fc7767a946
  [NEW] +9.39Ki  [NEW] +9.23Ki    _<hyper::proto::h2::server::Server<T,S,B,E> as core::future::future::Future>::poll::h7fad89436d42473e
  +750% +6.74Ki  +828% +6.74Ki    prost::encoding::message::merge_repeated::hc52fda914c63fb75
  [NEW] +6.50Ki  [NEW] +6.28Ki    saluki_components::common::datadog::apm::_::_<impl serde_core::de::Deserialize for saluki_components::common::datadog::apm::ApmConfiguration>::deserialize::h2b55df90d15c8dc3
  +739% +6.39Ki  +819% +6.39Ki    prost::encoding::message::merge_repeated::h125609fe5afef278
  [DEL] -6.56Ki  [DEL] -6.41Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::h14607bccbe25f0f5
 -24.0% -8.01Ki -24.1% -7.98Ki    _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::h63fe22416badd464
  [DEL] -8.13Ki  [DEL] -8.00Ki    figment::value::de::_<impl figment::value::value::Value>::deserialize_from::hc178e2144edf2db7
 -67.1% -9.24Ki -67.7% -9.24Ki    saluki_components::transforms::trace_obfuscation::sql::obfuscate_sql_string::hbc6c7c370aac7ff9
  [DEL] -9.49Ki  [DEL] -9.34Ki    _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::h44f59c3078bb5bee
  [DEL] -9.74Ki  [DEL] -9.59Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::h795d206f112d7dfd
 -81.7% -10.1Ki -82.6% -10.1Ki    _<core::pin::Pin<P> as core::future::future::Future>::poll::h901e76ef802f2f4c
  [DEL] -11.6Ki  [DEL] -11.5Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h262edabf8ad9b351
  [DEL] -15.4Ki  [DEL] -15.3Ki    _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::hf4491bd4db3bec82
  [DEL] -15.9Ki  [DEL] -15.7Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::hf41c2cc956726b9d
  [DEL] -32.1Ki  [DEL] -32.0Ki    saluki_components::transforms::apm_stats::ApmStats::process_trace::h7d2f794f20a992a4
  -1.4% -83.5Ki  -1.4% -69.0Ki    [4675 Others]
  -0.3%  -111Ki  -0.3% -96.7Ki    TOTAL

The adp/ SMP target dir is generated from experiments.yaml via generate_experiments.py, and CI's check-smp-experiments verifies cases/ stays in sync. The hand-rewritten case files for the Agent-vs-Agent+ADP 1Hz benchmark were drifting from that source of truth, so running the check would fail.

This commit makes experiments.yaml the source of truth again:

- Rewrites the global block to target the Datadog Agent entrypoint (name: datadog-agent, command: /bin/entrypoint.sh, files: datadog.yaml sourced from shared/datadog.yaml).
- Bumps memory_allotment to 3200MiB and trims env to DD_API_KEY + DD_HOSTNAME; the Agent reads the rest from datadog.yaml.
- Drops ADP-standalone-only env from the dsd_base template and switches the unix_datagram path to /tmp/dsd.socket.
- Drops OTLP and quality_gates experiments + templates (out of scope on this branch).
- Adds shared/datadog.yaml (the 1Hz config the Agent reads; aggregator_bucket_size_seconds: 1 reaches ADP via config-stream).
- Adds a `${EXPERIMENT_NAME}` placeholder substitution to generate_experiments.py so DD_INTERNAL_PROFILING_EXTRA_TAGS gets the per-case expanded name without duplicating each experiment three times.

After regen, `make check-smp-experiments` passes against the 15 dsd_uds_* cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
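The `${EXPERIMENT_NAME}` substitution could look roughly like the sketch below. This is not the code in generate_experiments.py, which is not shown here; `expand_placeholders`, the tag format, and the hostname value are hypothetical. It relies on the standard library's `string.Template`, whose `safe_substitute` leaves unknown placeholders untouched.

```python
from string import Template


def expand_placeholders(value, experiment_name):
    """Recursively replace ${EXPERIMENT_NAME} in strings nested in dicts and lists."""
    if isinstance(value, str):
        return Template(value).safe_substitute(EXPERIMENT_NAME=experiment_name)
    if isinstance(value, dict):
        return {k: expand_placeholders(v, experiment_name) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(v, experiment_name) for v in value]
    return value  # numbers, booleans, None pass through unchanged


template_env = {
    # Hypothetical tag format; only the variable name comes from the commit message.
    "DD_INTERNAL_PROFILING_EXTRA_TAGS": "experiment:${EXPERIMENT_NAME}",
    "DD_HOSTNAME": "smp-regression",  # hypothetical value
}
expanded = expand_placeholders(template_env, "dsd_uds_1mb_3k_contexts_cpu")
```

Writing the placeholder once in a shared template and expanding it per case is what avoids repeating each experiment definition for its cpu, memory, and throughput variants.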

pr-commenter · 2026-05-05T23:18:04Z

Regression Detector (Agent vs Agent+ADP, 1Hz dogstatsd)

Regression Detector Results

Run ID: 6e12d29a-e079-4e47-b394-db9d8f0fb254

Baseline: 15f1e04a
Comparison: a25ec14

Optimization Goals: ❌ Regression(s) detected

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
❌	dsd_uds_512kb_3k_contexts_memory	memory utilization	+5.96	[+5.74, +6.18]	1	bounds checks dashboard
✅	dsd_uds_10mb_3k_contexts_memory	memory utilization	-15.26	[-15.43, -15.09]	1	bounds checks dashboard
❌	dsd_uds_500mb_3k_contexts_throughput	ingress throughput	-21.53	[-21.68, -21.39]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_cpu	% cpu utilization	-46.76	[-70.35, -23.18]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_cpu	% cpu utilization	-56.79	[-65.52, -48.07]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_memory	memory utilization	-56.99	[-57.14, -56.84]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_memory	memory utilization	-76.10	[-76.21, -75.99]	1	bounds checks dashboard

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
❌	dsd_uds_512kb_3k_contexts_memory	memory utilization	+5.96	[+5.74, +6.18]	1	bounds checks dashboard
➖	dsd_uds_1mb_3k_contexts_memory	memory utilization	+3.90	[+3.68, +4.12]	1	bounds checks dashboard
➖	dsd_uds_512kb_3k_contexts_throughput	ingress throughput	+0.00	[-0.06, +0.06]	1	bounds checks dashboard
➖	dsd_uds_10mb_3k_contexts_throughput	ingress throughput	+0.00	[-0.06, +0.06]	1	bounds checks dashboard
➖	dsd_uds_1mb_3k_contexts_throughput	ingress throughput	-0.00	[-0.06, +0.06]	1	bounds checks dashboard
➖	dsd_uds_100mb_3k_contexts_throughput	ingress throughput	-0.05	[-0.21, +0.10]	1	bounds checks dashboard
✅	dsd_uds_10mb_3k_contexts_memory	memory utilization	-15.26	[-15.43, -15.09]	1	bounds checks dashboard
➖	dsd_uds_10mb_3k_contexts_cpu	% cpu utilization	-18.44	[-79.85, +42.98]	1	bounds checks dashboard
❌	dsd_uds_500mb_3k_contexts_throughput	ingress throughput	-21.53	[-21.68, -21.39]	1	bounds checks dashboard
➖	dsd_uds_512kb_3k_contexts_cpu	% cpu utilization	-26.11	[-92.52, +40.30]	1	bounds checks dashboard
➖	dsd_uds_1mb_3k_contexts_cpu	% cpu utilization	-42.31	[-106.56, +21.93]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_cpu	% cpu utilization	-46.76	[-70.35, -23.18]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_cpu	% cpu utilization	-56.79	[-65.52, -48.07]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_memory	memory utilization	-56.99	[-57.14, -56.84]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_memory	memory utilization	-76.10	[-76.21, -75.99]	1	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we flag a change in performance as a "regression" (a change worth investigating further) if all of the following criteria are true:

- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
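The three criteria can be expressed as a short check. This is a sketch of the decision rule as described in the report, not the Regression Detector's own code; `Result` and `is_significant` are hypothetical names, and the sample rows are copied from the tables above.

```python
from dataclasses import dataclass


@dataclass
class Result:
    experiment: str
    delta_mean_pct: float
    ci_low: float
    ci_high: float
    erratic: bool = False


def is_significant(result: Result, tolerance_pct: float = 5.0) -> bool:
    """Apply the three criteria: effect size, CI excludes zero, not erratic."""
    big_enough = abs(result.delta_mean_pct) >= tolerance_pct
    ci_excludes_zero = not (result.ci_low <= 0.0 <= result.ci_high)
    return big_enough and ci_excludes_zero and not result.erratic


rows = [
    Result("dsd_uds_512kb_3k_contexts_memory", 5.96, 5.74, 6.18),
    Result("dsd_uds_1mb_3k_contexts_memory", 3.90, 3.68, 4.12),
    Result("dsd_uds_10mb_3k_contexts_cpu", -18.44, -79.85, 42.98),
    Result("dsd_uds_500mb_3k_contexts_throughput", -21.53, -21.68, -21.39),
]
flagged = [r.experiment for r in rows if is_significant(r)]
```

The check only says whether a change is significant. Whether it counts as better or worse depends on the optimization goal (for example, lower memory is better while lower throughput is worse), which this sketch does not model.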

jszwedko and others added 11 commits May 5, 2026 15:53

docs: implementation plan for Agent-vs-Agent+ADP 1Hz benchmark

a823960

dd-octo-sts Bot added the area/ci, area/docs, and area/test labels May 5, 2026

jszwedko mentioned this pull request May 6, 2026: test(smp): add default-bucket dsd_uds variant alongside 1Hz cases #1594 (Draft, 3 tasks)

jszwedko wants to merge 12 commits into luke/configurable-aggregation from jszwedko/agent-vs-adp-1hz-benchmark
