Validate and auto-resolve AWS region before Karpenter operations#3057
Draft
L3n41c wants to merge 2 commits into
Draft
Validate and auto-resolve AWS region before Karpenter operations#3057L3n41c wants to merge 2 commits into
L3n41c wants to merge 2 commits into
Conversation
The `kubectl datadog autoscaling cluster` commands build their AWS clients
from the default credential chain while the target EKS cluster comes from the
kubeconfig context. When AWS_REGION was unset, users hit an opaque STS error
("Invalid Configuration: Missing Region" buried in endpoint-resolution noise);
when it was set to the wrong region, operations silently looked at the wrong
place (e.g. uninstall reporting "stack does not exist, skipping").
The kubeconfig context for an EKS cluster is an ARN
(arn:aws:eks:<region>:<account>:cluster/<name>) that already carries the
region — the same source PR #2892 uses for the AWS account-consistency check.
Reconcile the region from it inside clients.Build, before the service clients
are constructed:
- AWS_REGION unset but derivable from the kubeconfig ARN: derive it and reload
the config with config.WithRegion so credential providers (assume-role /
web-identity STS clients) also pick it up, then proceed with a notice.
- AWS_REGION unset and not derivable: clear, actionable error.
- AWS_REGION set but different from the cluster's region: hard error
(RegionMismatchError) in all commands (install, update, uninstall).
Refactors the kubeconfig-ARN parse into a shared getClusterARNFromKubeconfig
helper (also tightened to only trust EKS cluster ARNs) that both the account
and region extractors use.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3057 +/- ##
==========================================
+ Coverage 43.03% 43.11% +0.08%
==========================================
Files 339 339
Lines 29215 29262 +47
==========================================
+ Hits 12573 12617 +44
- Misses 15820 15823 +3
Partials 822 822
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
The patch-coverage gate (target 80%) failed because the region-reconciliation logic lived inline in clients.Build, which is integration-only and not unit testable. Extract it verbatim into a reconcileRegion(ctx, awsConfig, configFlags) helper (behavior-preserving) and add a hermetic table-driven TestReconcileRegion covering the match, derive, mismatch, undeterminable, and unreadable-kubeconfig paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Reconciles the AWS region with the target EKS cluster before the
kubectl datadog autoscaling clustercommands (install/update/uninstall) build their AWS clients:AWS_REGIONunset but derivable from the kubeconfig context ARN → derive the region and reload the AWS config withconfig.WithRegion(so credential providers also pick it up), then proceed with a log notice.AWS_REGIONunset and not derivable → clear, actionable error instead of the opaque STS "Missing Region" noise.AWS_REGIONset but different from the cluster's region → hard error (RegionMismatchError) in all three commands.Motivation
QA hit two confusing failures, both rooted in the AWS region:
AWS_REGIONunset, the only feedback was an opaque, deeply-wrapped STS error:failed to get AWS caller identity: operation error STS: GetCallerIdentity, failed to resolve service endpoint, endpoint rule error, Invalid Configuration: Missing Region.AWS_REGIONset to the wrong region, operations silently targeted the wrong place — e.g. uninstall printingStack ... does not exist, skipping deletion.The kubeconfig context for an EKS cluster is an ARN (
arn:aws:eks:<region>:<account>:cluster/<name>) that already carries the region — the same source PR #2892 uses for the AWS account-consistency check. This change reuses it for the region, with no extra AWS API call.Additional Notes
clients.Build, before the service clients are constructed, because credential providers (assume-role / web-identity STS clients) capture the region at config-load time — a post-load mutation wouldn't reach them. The config is reloaded with the derived region only in the derive path.getClusterARNFromKubeconfighelper used by both the account and region extractors. The helper is tightened to only trust EKS cluster ARNs (Service == "eks"+cluster/resource prefix).RegionMismatchErrormirrors the existingAccountMismatchError.Minimum Agent Versions
N/A — this is a
kubectl-datadogplugin change, not an agent change.Describe your test plan
resolveRegion(all branches incl. mismatch viaerrors.As, GovCloud partitions) andgetClusterARNFromKubeconfig(EKS/GovCloud ARNs, plain names, eksctl FQDNs, non-EKS ARN rejection).go build ./cmd/kubectl-datadog/...,go vet, and the package tests pass; fullmake lintreports 0 issues.AWS region not set; using "us-east-2" from the kubeconfig context.and proceeds.AWS_REGION=us-west-2against aus-east-2cluster) → fails immediately withRegionMismatchErrorfor install, update, and uninstall.Checklist
bug,enhancement,refactoring,documentation,tooling, and/ordependenciesqa/skip-qalabel🤖 Generated with Claude Code