[CASCL-506] Allow minReplicas=0 on DatadogPodAutoscaler #3048

Open
avolant wants to merge 1 commit into DataDog:main from avolant:arthur.volant/cascl-506-allow-min-replicas-zero

@avolant avolant commented May 27, 2026

What does this PR do?

Relaxes the kubebuilder validation on `DatadogPodAutoscalerConstraints.MinReplicas` from `Minimum=1` to `Minimum=0`, so a `DatadogPodAutoscaler` (and `DatadogPodAutoscalerClusterProfile`) with `spec.constraints.minReplicas: 0` is admitted by the API server. CRDs were regenerated via `make manifests`.
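
For readers unfamiliar with kubebuilder markers, the change amounts to something like the sketch below. This is a trimmed illustration, not a copy of the real type: the struct shows a single field, the field type and JSON tag are assumptions, and `validateMinReplicas` is a hypothetical local stand-in for the schema check that the API server derives from the generated CRD.

```go
package main

import "fmt"

// DatadogPodAutoscalerConstraints is a trimmed, illustrative version of the
// constraints type. Only MinReplicas is shown; its type and JSON tag are
// assumptions made for this sketch.
type DatadogPodAutoscalerConstraints struct {
	// MinReplicas is the lower bound on the number of replicas.
	// The marker below is what this PR relaxes (it was Minimum=1).
	// +kubebuilder:validation:Minimum=0
	// +optional
	MinReplicas *int32 `json:"minReplicas,omitempty"`
}

// validateMinReplicas is a hypothetical helper that mimics the "minimum: 0"
// rule locally. In a cluster, the API server enforces this from the CRD
// schema; no such function exists in the operator.
func validateMinReplicas(c DatadogPodAutoscalerConstraints) error {
	if c.MinReplicas != nil && *c.MinReplicas < 0 {
		return fmt.Errorf("minReplicas %d is below the minimum 0", *c.MinReplicas)
	}
	return nil
}

func main() {
	for _, v := range []int32{0, -1} {
		v := v // take the address of a per-iteration copy
		err := validateMinReplicas(DatadogPodAutoscalerConstraints{MinReplicas: &v})
		fmt.Printf("minReplicas=%d admitted=%t\n", v, err == nil)
	}
}
```

With `Minimum=0`, a value of 0 passes while negative values are still rejected, which matches the manual verification suggested in the test plan.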

Motivation

Refs https://datadoghq.atlassian.net/browse/CASCL-506 (feature request: support horizontal scale from/to 0).

We hit this in dd-analytics while experimenting with DPA on Airflow Celery worker groups. Workers are queue-driven and idle between DAG runs, so scaling to 0 between bursts is a clear cost win. The CRD currently rejects this configuration outright:

```
DatadogPodAutoscaler.datadoghq.com "airflow-wg-adp-integrations" is invalid:
spec.constraints.minReplicas: Invalid value: 0:
spec.constraints.minReplicas in body should be greater than or equal to 1
```

This PR removes the admission-level block. It is intentionally a minimal change to unblock experimentation/design work on CASCL-506; it does not by itself make scale-from-0 work end-to-end -- see Additional Notes.

Additional Notes

  • The horizontal controller in datadog-agent (`pkg/clusteragent/autoscaling/workload/controller_horizontal.go`) already honors any non-nil `MinReplicas`. The hardcoded `defaultMinReplicas = 1` only applies when the field is unset, so this is purely an admission relaxation -- no controller code change is required.
  • Scale-from-zero still depends on the recommendation source being able to emit a signal at 0 replicas. CPU-based recommendations are unusable at 0 pods (no metrics to read), and a custom-query formula like `(running + queued) / replicas` is undefined at 0 replicas. Queue-length-style external metrics work because the source is external to the workload.
  • Updated the MinReplicas godoc to call this out explicitly so users don't expect magic scale-up behavior just from setting minReplicas: 0.
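
The two behaviors described in the notes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the datadog-agent implementation: `effectiveMinReplicas` and `perReplicaLoad` are hypothetical helpers, and only the `defaultMinReplicas = 1` fallback and the `(running + queued) / replicas` formula come from this description.

```go
package main

import "fmt"

// defaultMinReplicas mirrors the fallback described in this PR: it applies
// only when the field is unset.
const defaultMinReplicas int32 = 1

// effectiveMinReplicas is a hypothetical helper showing the defaulting rule:
// any non-nil value (including 0) is honored as-is.
func effectiveMinReplicas(minReplicas *int32) int32 {
	if minReplicas == nil {
		return defaultMinReplicas
	}
	return *minReplicas
}

// perReplicaLoad evaluates (running + queued) / replicas. The second return
// value is false when replicas is 0, where the formula is undefined and no
// recommendation can be derived from it.
func perReplicaLoad(running, queued, replicas int32) (float64, bool) {
	if replicas == 0 {
		return 0, false
	}
	return float64(running+queued) / float64(replicas), true
}

func main() {
	zero := int32(0)
	fmt.Println(effectiveMinReplicas(nil))   // unset: falls back to 1
	fmt.Println(effectiveMinReplicas(&zero)) // explicit 0 is honored

	// 12 queued tasks but no replicas: the per-replica formula has no value,
	// whereas the raw queue length (12) is still a usable external signal.
	_, ok := perReplicaLoad(0, 12, 0)
	fmt.Println(ok)
}
```

The sketch also shows why a queue-length metric is a better fit here: the queued count stays observable at 0 replicas, while any per-replica ratio does not.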

Minimum Agent Versions

No agent-side changes are required for this PR (controller already accepts the value). Mentioning for completeness:

  • Agent: n/a
  • Cluster Agent: n/a

Describe your test plan

  • `make generate && make manifests` (run; regenerated CRD YAML/JSON files included in this PR)
  • `go vet ./api/...` -- clean
  • `go test ./api/datadoghq/common/... ./api/datadoghq/v1alpha2/...` -- pass

Manual verification (suggested for reviewer): apply the regenerated CRD to a kind cluster, then `kubectl apply` a DPA with `spec.constraints.minReplicas: 0` and confirm admission succeeds; apply one with `-1` and confirm it is still rejected.

Checklist

  • PR has at least one valid label: `enhancement`
  • PR has a milestone or the `qa/skip-qa` label (deferring to reviewer)
  • All commits are signed

Relax the kubebuilder validation on
DatadogPodAutoscalerConstraints.MinReplicas from Minimum=1 to Minimum=0
to admit scale-to-zero configurations.

The horizontal controller already honors any non-nil MinReplicas value
(default of 1 only applies when unset), so this is purely an admission
relaxation. Scale-from-zero remains gated on the recommendation source
being able to emit signals at 0 replicas (e.g. queue-based metrics);
CPU-based recommendations are unusable at 0 pods.

Regenerated CRDs (DatadogPodAutoscaler v1alpha1 + v1alpha2 and
DatadogPodAutoscalerClusterProfile) via 'make manifests'.

Refs: https://datadoghq.atlassian.net/browse/CASCL-506
@avolant avolant requested a review from a team May 27, 2026 11:08
@avolant avolant requested a review from a team as a code owner May 27, 2026 11:08
datadog-datadog-prod-us1 Bot commented May 27, 2026


Pipelines

⚠️ Warnings

🚦 2 Pipeline jobs failed

pull request linter | Check Milestone (GitHub Actions)

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. Missing milestone or `qa/skip-qa` label.

pull request linter | build (GitHub Actions)

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. Missing required labels in the pull request. Valid labels are: ['bug', 'enhancement', 'refactoring', 'documentation', 'tooling', 'dependencies'].

🔗 Commit SHA: 72198ac

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.36%. Comparing base (ed139da) to head (72198ac).

@@           Coverage Diff           @@
##             main    #3048   +/-   ##
=======================================
  Coverage   42.36%   42.36%           
=======================================
  Files         337      337           
  Lines       28964    28964           
=======================================
  Hits        12272    12272           
  Misses      15887    15887           
  Partials      805      805           

| Flag | Coverage Δ |
|------|------------|
| unittests | 42.36% <ø> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data