test(smp): Agent vs Agent+ADP 1Hz dogstatsd benchmark #1592
Stacks on PR #1459 (luke/configurable-aggregation). Reuses test/smp/regression/adp/ in-place: deletes OTLP + quality_gates cases, rewrites dsd_uds_* to drive a Datadog Agent baseline vs converged Agent+ADP comparison with aggregator_bucket_size_seconds: 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nine tasks covering case file rewrites, Dockerfile build args, GitLab CI build jobs, run-benchmarks-adp repointing, and the draft PR open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The adp/ SMP target directory will be repurposed for an Agent-vs-Agent+ADP 1Hz dogstatsd comparison. Only the dsd_uds_* cases are kept; everything else moves out of scope on this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The converged Agent+ADP image boots ADP via the Agent's s6 supervisor (not via target.command), so /etc/agent-data-plane/empty.yaml and the IPC cert are no longer referenced. Remove them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…f1e04a-full Per user request, base off the converged Agent dev image carrying Luke's configurable-aggregation patch on top of 15f1e04a, instead of the upstream 15f1e04a-py3-jmx tag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single source of truth for aggregator_bucket_size_seconds. Agent core reads the file directly; ADP reads it via config-stream gRPC when DD_DATA_PLANE_USE_NEW_CONFIG_STREAM_ENDPOINT=true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Targets the Agent's /bin/entrypoint.sh instead of the ADP binary directly, so the same case files drive both the Datadog Agent baseline image and the converged Agent+ADP comparison image. Memory allotment bumped from 2GiB to 3200MiB to fit Agent + ADP + JVM in one container. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add 127.0.0.1:9092 blackhole for the process-agent's process_dd_url. - Scrape Agent core telemetry on 127.0.0.1:5000 alongside ADP on :5102, tagging both with sub_agent so the SMP report can attribute metrics. - Switch dogstatsd UDS path to /tmp/dsd.socket to match datadog.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets the converged Agent+ADP image bake in the env vars ADP needs to boot in non-standalone, config-stream-driven mode (REMOTE_AGENT_ENABLED=true, USE_NEW_CONFIG_STREAM_ENDPOINT=true, DOGSTATSD_ENABLED=true). The build job in .gitlab/benchmark.yml supplies the values for the SMP comparison image. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build-agent-adp-baseline-image retags the upstream Datadog Agent dev image (luke-configurable-aggregation-15f1e04a-full, from datadog-agent#49676) as the SMP baseline. build-agent-adp-comparison-image builds Dockerfile.datadog-agent on top of the comparison ADP image with all DD_DATA_PLANE_* knobs set to drive non-standalone, config-stream-driven mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now consumes BASELINE_AGENT_IMG (vanilla Datadog Agent dev image) and COMPARISON_AGENT_IMG (converged Agent+ADP image) from the new build jobs. Same SMP target dir; the dsd_uds cases are now driven through the Agent entrypoint with aggregator_bucket_size_seconds: 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Binary Size Analysis (Agent Data Plane)
Target: dc8f4b9 (baseline) vs a25ec14 (comparison) diff

| Module | File Size | Symbols |
|---|---|---|
| figment | -118.82 KiB | 125 |
| otlp_protos::otlp_include::opentelemetry | -41.79 KiB | 103 |
| hyper | +25.69 KiB | 76 |
| prost | +24.60 KiB | 66 |
| hyper_util | -14.05 KiB | 13 |
| hashbrown | +11.34 KiB | 72 |
| h2 | -8.86 KiB | 92 |
| [sections] | -8.52 KiB | 6 |
| tonic | -7.37 KiB | 33 |
| core | -7.09 KiB | 878 |
| serde_core | +6.51 KiB | 85 |
| serde | +6.39 KiB | 18 |
| tower | +5.15 KiB | 11 |
| async_compression | +4.62 KiB | 19 |
| tokio_util | +3.94 KiB | 14 |
| alloc | +3.76 KiB | 50 |
| saluki_components::sources::otlp | +3.75 KiB | 17 |
| tokio | +3.67 KiB | 124 |
| saluki_core::data_model::event | -3.40 KiB | 8 |
| futures_channel | -3.35 KiB | 7 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +18.5Ki [NEW] +18.3Ki saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::flush::h4cec187aab531472
[NEW] +16.5Ki [NEW] +16.4Ki saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::add_span::hf95b8e429cc5bbe0
[NEW] +16.1Ki [NEW] +16.0Ki saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::new_stat_span_from_span::h831751497d326aef
+283% +15.9Ki +288% +15.9Ki h2::proto::connection::DynConnection<B>::recv_frame::h9d7adeb5727e1522
[NEW] +12.3Ki [NEW] +12.1Ki _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::hf9a832fc7767a946
[NEW] +9.39Ki [NEW] +9.23Ki _<hyper::proto::h2::server::Server<T,S,B,E> as core::future::future::Future>::poll::h7fad89436d42473e
+750% +6.74Ki +828% +6.74Ki prost::encoding::message::merge_repeated::hc52fda914c63fb75
[NEW] +6.50Ki [NEW] +6.28Ki saluki_components::common::datadog::apm::_::_<impl serde_core::de::Deserialize for saluki_components::common::datadog::apm::ApmConfiguration>::deserialize::h2b55df90d15c8dc3
+739% +6.39Ki +819% +6.39Ki prost::encoding::message::merge_repeated::h125609fe5afef278
[DEL] -6.56Ki [DEL] -6.41Ki _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::h14607bccbe25f0f5
-24.0% -8.01Ki -24.1% -7.98Ki _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::h63fe22416badd464
[DEL] -8.13Ki [DEL] -8.00Ki figment::value::de::_<impl figment::value::value::Value>::deserialize_from::hc178e2144edf2db7
-67.1% -9.24Ki -67.7% -9.24Ki saluki_components::transforms::trace_obfuscation::sql::obfuscate_sql_string::hbc6c7c370aac7ff9
[DEL] -9.49Ki [DEL] -9.34Ki _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::h44f59c3078bb5bee
[DEL] -9.74Ki [DEL] -9.59Ki _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::h795d206f112d7dfd
-81.7% -10.1Ki -82.6% -10.1Ki _<core::pin::Pin<P> as core::future::future::Future>::poll::h901e76ef802f2f4c
[DEL] -11.6Ki [DEL] -11.5Ki _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h262edabf8ad9b351
[DEL] -15.4Ki [DEL] -15.3Ki _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::hf4491bd4db3bec82
[DEL] -15.9Ki [DEL] -15.7Ki _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::hf41c2cc956726b9d
[DEL] -32.1Ki [DEL] -32.0Ki saluki_components::transforms::apm_stats::ApmStats::process_trace::h7d2f794f20a992a4
-1.4% -83.5Ki -1.4% -69.0Ki [4675 Others]
-0.3% -111Ki -0.3% -96.7Ki TOTAL
The adp/ SMP target dir is generated from experiments.yaml via
generate_experiments.py, and CI's check-smp-experiments verifies
cases/ stays in sync. The hand-rewritten case files for the
Agent-vs-Agent+ADP 1Hz benchmark were drifting from that source of
truth — running the check would fail.
This commit makes experiments.yaml the source of truth again:
- Rewrites the global block to target the Datadog Agent entrypoint
(name: datadog-agent, command: /bin/entrypoint.sh, files:
datadog.yaml sourced from shared/datadog.yaml).
- Bumps memory_allotment to 3200MiB and trims env to DD_API_KEY +
DD_HOSTNAME — the Agent reads the rest from datadog.yaml.
- Drops ADP-standalone-only env from the dsd_base template and
switches the unix_datagram path to /tmp/dsd.socket.
- Drops OTLP and quality_gates experiments + templates (out of scope
on this branch).
- Adds shared/datadog.yaml (the 1Hz config the Agent reads;
aggregator_bucket_size_seconds: 1 reaches ADP via config-stream).
- Adds a `${EXPERIMENT_NAME}` placeholder substitution to
generate_experiments.py so DD_INTERNAL_PROFILING_EXTRA_TAGS gets
the per-case expanded name without duplicating each experiment
three times.
After regen, `make check-smp-experiments` passes against the 15
dsd_uds_* cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
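The `${EXPERIMENT_NAME}` substitution described in the commit above could look roughly like the following. This is a minimal sketch, not the actual code in generate_experiments.py: the helper name `expand_placeholders`, the shape of the template, and the `experiment:` tag prefix are all assumptions made for illustration.

```python
# Hypothetical sketch of a ${EXPERIMENT_NAME} placeholder pass.
# None of these names are taken from generate_experiments.py.

PLACEHOLDER = "${EXPERIMENT_NAME}"


def expand_placeholders(value, experiment_name):
    """Return a copy of `value` with the placeholder replaced in every string.

    Walks nested dicts and lists; any other type is returned unchanged.
    """
    if isinstance(value, str):
        return value.replace(PLACEHOLDER, experiment_name)
    if isinstance(value, dict):
        return {k: expand_placeholders(v, experiment_name) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(v, experiment_name) for v in value]
    return value


# One shared template (the tag value format is an assumption) ...
template_env = {"DD_INTERNAL_PROFILING_EXTRA_TAGS": "experiment:${EXPERIMENT_NAME}"}

# ... expanded once per generated case, so the experiment is declared only once.
expanded = {}
for goal in ("memory", "throughput", "cpu"):
    case_name = f"dsd_uds_512kb_3k_contexts_{goal}"
    expanded[case_name] = expand_placeholders(template_env, case_name)
```

Returning a new structure rather than mutating the template keeps the shared block reusable across all generated cases.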
Regression Detector (Agent vs Agent+ADP, 1Hz dogstatsd)
Regression Detector Results
Run ID: 6e12d29a-e079-4e47-b394-db9d8f0fb254
Baseline: 15f1e04a
Optimization Goals: ❌ Regression(s) detected

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ❌ | dsd_uds_512kb_3k_contexts_memory | memory utilization | +5.96 | [+5.74, +6.18] | 1 | bounds checks dashboard |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | +3.90 | [+3.68, +4.12] | 1 | bounds checks dashboard |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.06, +0.06] | 1 | bounds checks dashboard |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.06, +0.06] | 1 | bounds checks dashboard |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.06] | 1 | bounds checks dashboard |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | -0.05 | [-0.21, +0.10] | 1 | bounds checks dashboard |
| ✅ | dsd_uds_10mb_3k_contexts_memory | memory utilization | -15.26 | [-15.43, -15.09] | 1 | bounds checks dashboard |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | -18.44 | [-79.85, +42.98] | 1 | bounds checks dashboard |
| ❌ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | -21.53 | [-21.68, -21.39] | 1 | bounds checks dashboard |
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | -26.11 | [-92.52, +40.30] | 1 | bounds checks dashboard |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | -42.31 | [-106.56, +21.93] | 1 | bounds checks dashboard |
| ✅ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | -46.76 | [-70.35, -23.18] | 1 | bounds checks dashboard |
| ✅ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | -56.79 | [-65.52, -48.07] | 1 | bounds checks dashboard |
| ✅ | dsd_uds_100mb_3k_contexts_memory | memory utilization | -56.99 | [-57.14, -56.84] | 1 | bounds checks dashboard |
| ✅ | dsd_uds_500mb_3k_contexts_memory | memory utilization | -76.10 | [-76.21, -75.99] | 1 | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
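The three criteria can be written as a small predicate. This is a sketch for reading the table, not the Regression Detector's implementation; the function name and parameters are assumptions. It only decides whether a change is significant. Whether that change counts as better (✅) or worse (❌) depends on the optimization goal and is not modeled here.

```python
def is_significant_change(delta_mean_pct, ci_low, ci_high,
                          erratic=False, tolerance_pct=5.00):
    """Apply the three stated criteria to one experiment row.

    delta_mean_pct  -- the "Δ mean %" column
    ci_low, ci_high -- the bounds of the "Δ mean % CI" column
    erratic         -- True if the experiment's configuration marks it erratic
    """
    big_enough = abs(delta_mean_pct) >= tolerance_pct
    ci_excludes_zero = not (ci_low <= 0.0 <= ci_high)
    return big_enough and ci_excludes_zero and not erratic


# Rows taken from the table above.
# dsd_uds_512kb_3k_contexts_memory: large enough, CI excludes zero.
flagged = is_significant_change(5.96, 5.74, 6.18)
# dsd_uds_1mb_3k_contexts_memory: CI excludes zero, but below the 5.00% tolerance.
too_small = is_significant_change(3.90, 3.68, 4.12)
# dsd_uds_10mb_3k_contexts_cpu: large mean shift, but the CI contains zero.
too_noisy = is_significant_change(-18.44, -79.85, 42.98)
```

Under these inputs the first row is significant and the other two are not, which matches the ❌ and ➖ markers in the perf column.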
Summary
Adds an SMP regression benchmark comparing the Datadog Agent (baseline) against a converged Agent+Agent-Data-Plane image (comparison) for raw dogstatsd ingest at a 1-second aggregation bucket interval. Stacked on #1459 so the single `aggregator_bucket_size_seconds: 1` knob in `datadog.yaml` reaches both Agent core and ADP. Reuses `test/smp/regression/adp/` in place — OTLP and quality_gates cases removed; the 15 `dsd_uds_*` cases rewritten to drive the Agent entrypoint and a converged Dockerfile.
Based on #1327, the PR that compared tag filter experiments.
Draft because the meaningful review signal is the SMP report comment, which CI posts after the benchmark runs.
Test plan
References
Stacked on #1459 (`luke/configurable-aggregation`). Modeled after #1327 but slimmer — no tag filtering, reuses existing dsd_uds cases.
🤖 Generated with Claude Code