test(antithesis): enable forwarder disk persistence — flags a log-amplification bug #1800

Draft
blt wants to merge 1 commit into blt/antithesis-workload-sampling from blt/antithesis-disk-persistence


@blt blt commented Jun 2, 2026

Summary

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

How did you test this PR?

References

blt commented Jun 2, 2026

datadog-prod-us1-4 Bot commented Jun 2, 2026

Pipelines

⚠️ Warnings

🚦 1 Pipeline job failed

Semantic PR Title Check | Check For Semantic PR Title   View in Datadog   GitHub Actions

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. Unknown release type 'test' in pull request title 'test(antithesis): enable forwarder disk persistence — flags a log-amplification bug'.

🔗 Commit SHA: 00d479d

pr-commenter Bot commented Jun 2, 2026

Binary Size Analysis (Agent Data Plane)

Baseline: dc62bbd · Comparison: 00d479d · diff
Analysis Configuration: stripped binaries · Pass/Fail Threshold: +5%
Sizes: 37.85 MiB (baseline) vs 37.92 MiB (comparison)
Size Change: +70.89 KiB (+0.18%)

✅ Binary size difference within threshold
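The percentage above can be reproduced from the reported figures. The following is a minimal sketch of that arithmetic, not the analysis tool's actual code: the function names are hypothetical, and treating the +5% threshold as inclusive is an assumption.

```rust
/// Number of KiB in one MiB, used to put both inputs in the same unit.
const KIB_PER_MIB: f64 = 1024.0;

/// Relative size change in percent, given the baseline size in MiB and the
/// delta in KiB (the units the report uses). Hypothetical helper.
fn size_change_pct(baseline_mib: f64, delta_kib: f64) -> f64 {
    delta_kib / (baseline_mib * KIB_PER_MIB) * 100.0
}

/// Assumption: a change passes when it is at or below the threshold.
fn within_threshold(change_pct: f64, threshold_pct: f64) -> bool {
    change_pct <= threshold_pct
}
```

With the values from this report, `size_change_pct(37.85, 70.89)` comes out at roughly 0.18, well under the 5% threshold.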

Changes by Module

| Module | File Size | Symbols |
| --- | --- | --- |
| `tonic` | -27.60 KiB | 207 |
| `tokio` | +25.05 KiB | 1008 |
| `alloc` | +17.22 KiB | 442 |
| `[sections]` | +16.64 KiB | 6 |
| `figment` | -16.19 KiB | 163 |
| `comfy_table` | +16.17 KiB | 9 |
| `agent_data_plane::main::_{{closure}}` | -15.61 KiB | 1 |
| `chrono` | -12.27 KiB | 9 |
| `http_body_util` | +11.95 KiB | 46 |
| `serde_core` | +11.14 KiB | 50 |
| `agent_data_plane::internal::env` | +10.50 KiB | 24 |
| `hyper_timeout` | +8.77 KiB | 8 |
| `tower_layer` | +6.56 KiB | 13 |
| `serde_json` | -5.83 KiB | 38 |
| `anyhow` | +4.60 KiB | 232 |
| `agent_data_plane::components::ottl_transform_processor` | +4.53 KiB | 10 |
| `anon.fc864d48bd6fb31b92d502387c127e70.164.llvm.6438344938255192491` | -3.90 KiB | 1 |
| `agent_data_plane::cli::run` | -3.83 KiB | 7 |
| `anon.906af368ca29b4bed0ab339065337fa0.53.llvm.14344814396300750576` | +3.82 KiB | 1 |
| `anon.3cc1e2c151556d16a22a602cac2ac04c.190.llvm.12895873079736457573` | +3.00 KiB | 1 |

Detailed Symbol Changes
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.5% +55.6Ki  +0.7% +58.0Ki    [11604 Others]
   +51% +20.2Ki   +51% +20.2Ki    agent_data_plane::internal::env::workload::build_collector::_{{closure}}::h48c3bde421c3ca9e
   +52% +19.8Ki   +52% +19.8Ki    agent_data_plane::internal::env::ADPEnvironmentProvider::from_configuration::_{{closure}}::h9670fb6d171f7edc
  [NEW] +17.8Ki  [NEW] +17.7Ki    tokio::runtime::park::CachedParkThread::block_on::hf364174df10d3359
 +34e2% +16.2Ki +41e2% +16.2Ki    comfy_table::cell::Cell::new::hbfd14b14cc593aba
  +1.4% +15.3Ki  +1.4% +15.3Ki    [section .gcc_except_table]
  [NEW] +4.87Ki  [NEW] +4.61Ki    agent_data_plane::components::ottl_transform_processor::config::_::_<impl serde_core::de::Deserialize for agent_data_plane::components::ottl_transform_processor::config::OttlTransformConfig>::deserialize::h6879914fbe295bfb
  [NEW] +4.23Ki  [NEW] +4.10Ki    agent_data_plane::state::metrics::rules::aggregation::get_aggregation_remappings::h8df5781a6f761921
   +67% +3.93Ki   +68% +3.93Ki    h2::proto::connection::Connection<T,P,B>::poll::hf5517bad6e618457
  +689% +3.88Ki  [ = ]       0    core::ptr::drop_in_place<http_body_util::combinators::map_err::MapErr<tonic::body::Body,axum_core::error::Error::new<tonic::status::Status>>>::heab8229a141536d3
  [NEW] +3.87Ki  [NEW] +3.72Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::h7d89aa3704ac3a45
  [NEW] +3.82Ki  [NEW]     +16    anon.906af368ca29b4bed0ab339065337fa0.53.llvm.14344814396300750576
  [DEL] -3.90Ki  [DEL]     -16    anon.fc864d48bd6fb31b92d502387c127e70.164.llvm.6438344938255192491
 -83.0% -4.04Ki -85.0% -4.04Ki    agent_data_plane::state::metrics::rules::get_datadog_agent_remappings::hbdc0274dbd81cbc3
  [DEL] -4.19Ki  [DEL]    -340    core::ptr::drop_in_place<core::option::Option<$LP$u32,tokio_util::sync::reusable_box::ReusableBoxFuture<core::result::Result<tokio::sync::semaphore::OwnedSemaphorePermit,tokio::sync::batch_semaphore::AcquireError>>$RP$>>::h3084f24fce0a93d5
  [DEL] -4.21Ki  [DEL]    -341    core::ptr::drop_in_place<http_body_util::combinators::map_err::MapErr<http_body_util::combinators::map_err::MapErr<http_body_util::combinators::map_frame::MapFrame<tonic::body::Body,tonic::codec::decode::Streaming<datadog_protos::agent_include::datadog::remoteagent::flare::v1::GetFlareFilesRequest>::new<tonic::body::Body,tonic_prost::codec::ProstDecoder<datadog_protos::agent_include::datadog::remoteagent::flare::v1::GetFlareFilesRequest>>::{{closure}}>,tonic::codec::decode::Streaming<datadog_protos::agent_include::datadog::remoteagent::flare::v1::GetFlareFilesRequest>::new<tonic::body::Body,tonic_prost::codec::ProstDecoder<datadog_protos::agent_include::datadog::remoteagent::flare::v1::GetFlareFilesRequest>>::{{closure}}>,tonic::status::Status::map_error<tonic::status::Status>>>::he50075b158634d51
  -2.9% -4.29Ki  -2.9% -4.29Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::hdf07f62ba689f516
  [DEL] -12.3Ki  [DEL] -12.2Ki    _<chrono::format::formatting::DelayedFormat<I> as core::fmt::Display>::fmt::hf5d2386408d8d014
 -31.8% -15.6Ki -31.9% -15.5Ki    agent_data_plane::main::_{{closure}}::h91f918e534067526
  [DEL] -17.3Ki  [DEL] -17.1Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::hcfe30cbe49586720
  [DEL] -32.6Ki  [DEL] -32.4Ki    agent_data_plane::internal::env::workload::RemoteAgentWorkloadProvider::from_configuration::_{{closure}}::hc52c5cf280184190
  +0.2% +70.9Ki  +0.2% +77.2Ki    TOTAL

pr-commenter Bot commented Jun 2, 2026

Regression Detector (Agent Data Plane)

Run ID: 678020e6-f2ab-4c7b-afd7-eb33a24474cb
Baseline: dc62bbd2 · Comparison: 00d479d1 · diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment (35)

Experiments configured erratic: true are tagged (ignored) and skipped when determining which experiments regressed or improved. Experiments which are detected as erratic at runtime are tagged (erratic) to flag that the run's sample dispersion was high, but their regression / improvement signal still counts.

| experiment | goal | Δ mean % | links |
| --- | --- | --- | --- |
| otlp_ingest_traces_5mb_cpu (erratic) | cpu | ⚪ +2.27 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_cpu (erratic) | cpu | ⚪ +1.88 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_cpu (erratic) | cpu | ⚪ +1.34 | metrics profiles logs |
| otlp_ingest_metrics_5mb_cpu (erratic) | cpu | ⚪ +1.19 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_cpu (erratic) | cpu | ⚪ +0.58 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_cpu (erratic) | cpu | ⚪ +0.35 | metrics profiles logs |
| quality_gates_rss_dsd_medium | memory | ⚪ +0.29 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_memory | memory | ⚪ +0.25 | metrics profiles logs |
| otlp_ingest_traces_5mb_memory | memory | ⚪ +0.18 | metrics profiles logs |
| otlp_ingest_traces_5mb_throughput | throughput | ⚪ -0.09 | metrics profiles logs |
| quality_gates_rss_dsd_low | memory | ⚪ +0.09 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_memory | memory | ⚪ +0.07 | metrics profiles logs |
| otlp_ingest_metrics_5mb_memory | memory | ⚪ +0.02 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_throughput | throughput | ⚪ -0.01 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_throughput | throughput | ⚪ -0.01 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_throughput | throughput | ⚪ +0.00 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_throughput | throughput | ⚪ +0.00 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_throughput | throughput | ⚪ +0.00 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_throughput | throughput | ⚪ +0.00 | metrics profiles logs |
| otlp_ingest_logs_5mb_throughput (ignored) | throughput | ⚪ +0.03 | metrics profiles logs |
| otlp_ingest_metrics_5mb_throughput | throughput | ⚪ +0.04 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_memory | memory | ⚪ -0.06 | metrics profiles logs |
| quality_gates_rss_dsd_heavy | memory | ⚪ -0.06 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_memory | memory | ⚪ -0.07 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_memory | memory | ⚪ -0.15 | metrics profiles logs |
| quality_gates_rss_idle | memory | ⚪ -0.17 | metrics profiles logs |
| quality_gates_rss_dsd_ultraheavy | memory | ⚪ -0.27 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_memory | memory | ⚪ -0.39 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_memory | memory | ⚪ -0.48 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_cpu (erratic) | cpu | ⚪ -0.64 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_throughput | throughput | ⚪ +0.75 | metrics profiles logs |
| otlp_ingest_logs_5mb_cpu (ignored) | cpu | ⚪ -1.12 | metrics profiles logs |
| otlp_ingest_logs_5mb_memory (ignored) | memory | ⚪ -2.18 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_cpu (erratic) | cpu | ⚪ -3.85 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_cpu (erratic) | cpu | ⚪ -4.62 | metrics profiles logs |

Bounds Checks: ✅ Passed (5)

| experiment | check | replicates | observed | links |
| --- | --- | --- | --- | --- |
| quality_gates_rss_dsd_heavy | memory_usage | 10/10 | ✅ 128 MiB ≤ 140 MiB | metrics profiles logs |
| quality_gates_rss_dsd_low | memory_usage | 10/10 | ✅ 40.1 MiB ≤ 50 MiB | metrics profiles logs |
| quality_gates_rss_dsd_medium | memory_usage | 10/10 | ✅ 60.5 MiB ≤ 75 MiB | metrics profiles logs |
| quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | ✅ 182 MiB ≤ 200 MiB | metrics profiles logs |
| quality_gates_rss_idle | memory_usage | 10/10 | ✅ 26.7 MiB ≤ 40 MiB | metrics profiles logs |

Explanation

A change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression (is_regression: true). Improvements use the matching criteria for the improving direction. Experiments configured erratic: true (tagged (ignored)) are skipped outright; experiments detected as erratic at runtime (tagged (erratic)) still count, since that flag describes sample dispersion rather than directional certainty. The Δ mean % cell is colored accordingly: 🟢 = improvement, 🔴 = regression, ⚪ = neutral. Reduction in CPU or memory is an improvement; reduction in ingress throughput is a regression.
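The decision rule in the explanation above can be sketched as a small function. This is an illustration under stated assumptions, not the detector's implementation: the type and function names are hypothetical, and the `smp_is_improvement` flag is assumed by analogy with `is_regression`, since the text only says improvements "use the matching criteria".

```rust
/// Optimization goal, as listed in the "goal" column. Hypothetical type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Goal {
    Cpu,
    Memory,
    Throughput,
}

/// Outcome for one experiment. Hypothetical type.
#[derive(Debug, PartialEq)]
enum Verdict {
    Improvement,
    Regression,
    Neutral,
    Ignored,
}

/// Threshold on |Δ mean %| stated in the explanation.
const THRESHOLD_PCT: f64 = 5.0;

fn classify(
    goal: Goal,
    delta_mean_pct: f64,
    configured_erratic: bool,  // `erratic: true` in config, tagged (ignored)
    smp_is_regression: bool,   // `is_regression: true` from SMP
    smp_is_improvement: bool,  // assumed counterpart flag, not named in the text
) -> Verdict {
    // Experiments configured as erratic are skipped outright.
    if configured_erratic {
        return Verdict::Ignored;
    }
    // The magnitude must exceed the threshold before direction matters.
    if delta_mean_pct.abs() <= THRESHOLD_PCT {
        return Verdict::Neutral;
    }
    // More CPU or memory is worse; less ingress throughput is worse.
    let regressing_direction = match goal {
        Goal::Cpu | Goal::Memory => delta_mean_pct > 0.0,
        Goal::Throughput => delta_mean_pct < 0.0,
    };
    if regressing_direction && smp_is_regression {
        Verdict::Regression
    } else if !regressing_direction && smp_is_improvement {
        Verdict::Improvement
    } else {
        Verdict::Neutral
    }
}
```

Runtime-detected erratic experiments are deliberately not an input here, because the explanation says that tag does not change whether a result counts. Under this sketch, `dsd_uds_512kb_3k_contexts_cpu` at -4.62 stays neutral because its magnitude is below 5.00, which matches the ⚪ shown in the table.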

@blt blt force-pushed the blt/antithesis-workload-sampling branch from dd0c580 to 6945527 Compare June 2, 2026 16:26
@blt blt force-pushed the blt/antithesis-disk-persistence branch from 04cada1 to e3b7d43 Compare June 2, 2026 16:27
@blt blt force-pushed the blt/antithesis-workload-sampling branch from 6945527 to 6e47ff1 Compare June 2, 2026 20:16
@blt blt force-pushed the blt/antithesis-disk-persistence branch from e3b7d43 to 1d74c55 Compare June 2, 2026 20:16
@blt blt force-pushed the blt/antithesis-workload-sampling branch from 6e47ff1 to b291254 Compare June 2, 2026 20:33
@blt blt force-pushed the blt/antithesis-disk-persistence branch 2 times, most recently from 1a275ee to 52a95f4 Compare June 2, 2026 20:48
@blt blt force-pushed the blt/antithesis-workload-sampling branch 2 times, most recently from 2ac1e74 to 90323e5 Compare June 2, 2026 21:00
@blt blt force-pushed the blt/antithesis-disk-persistence branch 2 times, most recently from 1925259 to 709025d Compare June 2, 2026 21:27
@blt blt force-pushed the blt/antithesis-workload-sampling branch from 90323e5 to 5a9453c Compare June 2, 2026 21:27
…lification bug

Sample forwarder_storage_max_size_in_bytes 50/50 on/off with forwarder_storage_path
on a persistent compose volume, so the on-disk retry queue and restart-recovery paths
run for the first time.

BUG this branch surfaces: with persistence on, a network partition fills the
disk-backed retry queue, and the forwarder logs error! per failed retry attempt
(io.rs:462/472/421). Over a large backlog that is unbounded log amplification — it
floods per-moment output, tripping 'very high output ... fail to materialize' at
cx=134896 on run 4ecf6d1b, which masks other findings. The same path also opens the
non-atomic torn-write hunt at persisted.rs:184 under node termination.
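To make the amplification concrete: one log line per failed retry scales with backlog size, whereas a throttle bounds output per unit of time. The sketch below is one possible mitigation, not something this commit implements or proposes; `FailureLogThrottle` and its methods are hypothetical names, and it uses only the standard library rather than the forwarder's real logging path.

```rust
use std::time::{Duration, Instant};

/// Collapses repeated failure reports into at most one log line per interval.
/// Hypothetical helper, not part of the forwarder.
struct FailureLogThrottle {
    interval: Duration,
    last_emit: Option<Instant>,
    suppressed: u64,
}

impl FailureLogThrottle {
    fn new(interval: Duration) -> Self {
        Self { interval, last_emit: None, suppressed: 0 }
    }

    /// Records one failed retry observed at `now`.
    ///
    /// Returns `Some(n)` when the caller should emit a log line, where `n` is
    /// the number of failures suppressed since the previous line. Returns
    /// `None` while still inside the quiet interval.
    fn record(&mut self, now: Instant) -> Option<u64> {
        match self.last_emit {
            Some(prev) if now.duration_since(prev) < self.interval => {
                // Still inside the interval: count the failure, stay quiet.
                self.suppressed += 1;
                None
            }
            _ => {
                // First failure, or the interval has elapsed: emit a summary.
                let n = self.suppressed;
                self.suppressed = 0;
                self.last_emit = Some(now);
                Some(n)
            }
        }
    }
}
```

A caller would log only when `record` returns `Some(n)`, including `n` in the message so the suppressed failures remain visible. Passing `now` in explicitly keeps the throttle deterministic under test, which matters in a simulation-driven environment.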
@blt blt force-pushed the blt/antithesis-workload-sampling branch from 5a9453c to e5880cc Compare June 2, 2026 21:37
@blt blt force-pushed the blt/antithesis-disk-persistence branch from 709025d to 00d479d Compare June 2, 2026 21:37
Labels

area/test All things testing: unit/integration, correctness, SMP regression, etc.
