feat(datadog): support forwarder_outdated_file_in_days for stale retry file cleanup#1777
feat(datadog): support forwarder_outdated_file_in_days for stale retry file cleanup#1777jszwedko wants to merge 13 commits into
forwarder_outdated_file_in_days for stale retry file cleanup#1777Conversation
|
| // If the storage size is set, enable disk persistence for the retry queue. | ||
| if config.retry().storage_max_size_bytes() > 0 { | ||
| // Remove stale retry files before opening the queue. | ||
| remove_outdated_retry_files(config.retry().storage_path(), config.retry().outdated_file_in_days()).await; |
There was a problem hiding this comment.
This matches the Agent's behavior of only checking at start-up.
forwarder_outdated_file_in_days for stale retry file cleanupforwarder_outdated_file_in_days for stale retry file cleanup
Binary Size Analysis (Agent Data Plane)Baseline: b078b0d · Comparison: d24e244 · diff ✅ Binary size difference within thresholdChanges by Module
Detailed Symbol Changes |
forwarder_outdated_file_in_days for stale retry file cleanupforwarder_outdated_file_in_days for stale retry file cleanup
forwarder_outdated_file_in_days for stale retry file cleanupforwarder_outdated_file_in_days for stale retry file cleanup
Regression Detector (Agent Data Plane)Run ID: Optimization Goals: ❌ 2 regressions detected
Fine details of change detection per experiment (33)Experiments configured
Bounds Checks: ✅ Passed (5)
ExplanationA change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression ( |
…try file cleanup Adds forwarder_outdated_file_in_days (default 10) to RetryConfiguration. When disk persistence is enabled, ADP now removes retry-*.json files older than the configured number of days each time it starts, preventing unbounded disk growth after long outages. Set to 0 to disable. Matches the core Agent's behavior in comp/forwarder/defaultforwarder/default_forwarder.go. Closes #1360 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…le scope Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ile_in_days Schema type: number maps to Float in smoke tests, causing injection of 1.5 which fails to deserialize into u32. Explicit Integer override makes the smoke test inject 42 instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Scan the per-queue subdirectory (storage_path/{queue_id}) not the root;
retry files live at storage_path/{queue_id}/retry-*.json
- Use the filename-embedded creation timestamp via decode_timestamped_filename
(now pub-exported from saluki-io) instead of filesystem mtime, which can
be reset by backup/restore tools
- break (not continue) on next_entry() error to avoid potential infinite loop
on macOS with persistent readdir errors
- Downgrade ENOENT on remove_file to debug; it indicates a concurrent sibling
endpoint task already deleted the same file
- Update tests to use valid filename-encoded timestamps; remove filetime dep
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ueue::from_root_path Cleanup now lives inside saluki-io alongside the filename format it depends on. No cross-crate exports needed. RetryQueue::with_disk_persistence and PersistedQueue::from_root_path each gain a max_age_days: u32 parameter; io.rs passes outdated_file_in_days() at the single call site. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…y files The core Agent's FileRemovalPolicy with outdatedFileDayCount=0 sets the cutoff to now, deleting all retry files. Remove the early-return guard (which incorrectly documented 0 as "disable") to match that behavior. Update the test and field doc accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Simplify doc last line to just 'Defaults to 10.' - Use 10 (not 0) in with_disk_persistence test call for clarity Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…suite Remove the nested mod, match the style of storage_ratio_exceeded and other existing tests (tempfile::tempdir, flat helpers, files_in_dir). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Test via the public API (from_root_path) rather than calling the private remove_outdated_retry_files directly. Uses make_persisted_queue helper, FakeData, and DiskUsageRetrieverImpl — consistent with storage_ratio_exceeded and other tests in the module. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8dc55a2 to
d24e244
Compare
forwarder_outdated_file_in_days for stale retry file cleanupforwarder_outdated_file_in_days for stale retry file cleanup
forwarder_outdated_file_in_days for stale retry file cleanupforwarder_outdated_file_in_days for stale retry file cleanup
Summary
Adds support for
forwarder_outdated_file_in_days(default 10), matching the core Agent's startup behavior. When disk persistence is enabled, ADP now scansforwarder_storage_pathat startup and removesretry-*.jsonfiles whose filesystem mtime exceeds the configured age, preventing unbounded disk growth from stale retry data after extended outages. Set to0to disable cleanup. The config key is moved from the unsupported registry to the supported forwarder registry.Closes #1360
Test plan
cargo check --workspace && cargo check --workspace --testspassescargo test -p saluki-components --lib outdatedpasses (3 new tests)cargo test -p saluki-components --lib config_registrypassesforwarder_storage_max_size_in_bytesset andforwarder_outdated_file_in_days: 10, old retry files are removed at startup; recent files and non-retry files are untouchedforwarder_outdated_file_in_days: 0, no files are removed🤖 Generated with Claude Code