Bug: Hardcoded --authorization-mode prevents usage of --authorization-config #1120

@ayan-khan-rtds

Description

Kamaji unconditionally sets --authorization-mode=Node,RBAC in the kube-apiserver desiredArgs map at [internal/builders/controlplane/deployment.go:723](https://github.com/clastix/kamaji/blob/master/internal/builders/controlplane/deployment.go#L723). While user-provided extraArgs can override values via MergeMaps(), they cannot remove keys from the map. Kubernetes rejects --authorization-config when --authorization-mode or --authorization-webhook-* flags are present:

E0326 13:22:36.127190       1 run.go:72] "command failed" err="--authorization-config can not be specified when --authorization-mode or --authorization-webhook-* flags are defined"

This is documented Kubernetes behavior:

You cannot combine the --authorization-mode command line argument with the --authorization-config command line argument used for configuring authorization using a local file. If you try this, the API server reports an error message during startup, then exits immediately.

Why This Is a Problem

The Structured Authorization Configuration API (via --authorization-config, beta since Kubernetes 1.30 and GA since 1.32 / KEP-3221) is the only way to set failurePolicy and timeout on authorization webhooks. Without it, authorization webhooks are fail-closed by default: if the webhook becomes unreachable, all API requests to the tenant cluster are denied.

| Feature | Flag-based (--authorization-mode) | Structured Config (--authorization-config) |
|---|---|---|
| failurePolicy (NoOpinion/Deny) | Not available | ✅ Available |
| timeout per webhook | Not available | ✅ Available |
| matchConditions (CEL pre-filters) | Not available | ✅ Available |
| Multiple authorization webhooks | Not available | ✅ Available |

Impact on Cluster Health

  • Complete API server lockout: If the authorization webhook goes down (network partition, service crash, latency spike), no user, controller, or kubelet can perform any action.
  • Cascading failures: Kubelet lease renewals fail → nodes go NotReady → pods get evicted → self-healing controllers cannot reschedule workloads because they also cannot reach the API server.
  • No graceful degradation: failurePolicy: NoOpinion (only available via --authorization-config) would allow falling through to the next authorizer (RBAC), keeping the cluster functional during a webhook outage. This safety net is currently unreachable.

Steps to Reproduce

  1. Create a TenantControlPlane with extraArgs specifying --authorization-config:
apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: test-tcp
  namespace: test
spec:
  controlPlane:
    deployment:
      extraVolumes:
        - configMap:
            name: authz-config
          name: authz-config-vol
      additionalVolumeMounts:
        apiServer:
          - name: authz-config-vol
            mountPath: /authz-config
      extraArgs:
        apiServer:
          - "--authorization-config=/authz-config/authorization-config.yaml"
  2. The kube-apiserver pod fails to start with:
E0326 13:22:36.127190       1 run.go:72] "command failed" err="--authorization-config can not be specified when --authorization-mode or --authorization-webhook-* flags are defined"
  3. Inspecting the pod confirms both flags are present: --authorization-config from extraArgs and --authorization-mode=Node,RBAC from Kamaji's hardcoded defaults.

Root Cause

In buildKubeAPIServerCommand(), --authorization-mode is hardcoded in desiredArgs:

desiredArgs := map[string]string{
    "--allow-privileged":    "true",
    "--authorization-mode":  "Node,RBAC",  // <-- always set
    // ...
}
// ...
return utilities.MergeMaps(current, desiredArgs, extraArgs)

MergeMaps() merges all maps (last wins for overlapping keys), but since --authorization-config and --authorization-mode are different keys, both end up in the final args. There is no YAML-level workaround - extraArgs can override a key's value but cannot delete a key.

Source: [internal/builders/controlplane/deployment.go:723](https://github.com/clastix/kamaji/blob/master/internal/builders/controlplane/deployment.go#L723)
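
The merge behavior described above can be reproduced in isolation. The sketch below is not Kamaji's code: mergeMaps is a hypothetical stand-in that assumes utilities.MergeMaps has plain "last map wins" semantics, and the argument maps are trimmed to the keys relevant here.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeMaps is a hypothetical stand-in for utilities.MergeMaps. It assumes
// "last map wins" semantics: every key from every input is copied into the
// result, and later maps only override values of keys they share.
func mergeMaps(maps ...map[string]string) map[string]string {
	result := make(map[string]string)
	for _, m := range maps {
		for k, v := range m {
			result[k] = v
		}
	}
	return result
}

func main() {
	// Trimmed-down version of the hardcoded defaults.
	desiredArgs := map[string]string{
		"--allow-privileged":   "true",
		"--authorization-mode": "Node,RBAC",
	}
	// What the user supplies through extraArgs.
	extraArgs := map[string]string{
		"--authorization-config": "/authz-config/authorization-config.yaml",
	}

	merged := mergeMaps(desiredArgs, extraArgs)

	// Print in sorted order so the result is stable across runs.
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, merged[k])
	}
}
```

Both --authorization-config and --authorization-mode appear in the merged result, which is the combination kube-apiserver rejects at startup.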

Proposed Fix

When --authorization-config is present in extraArgs, remove --authorization-mode (and any --authorization-webhook-* flags) from desiredArgs before merging:

if _, hasAuthzConfig := extraArgs["--authorization-config"]; hasAuthzConfig {
    delete(desiredArgs, "--authorization-mode")
    for k := range desiredArgs {
        if strings.HasPrefix(k, "--authorization-webhook-") {
            delete(desiredArgs, k)
        }
    }
}

return utilities.MergeMaps(current, desiredArgs, extraArgs)

This mirrors how Kubernetes itself treats these flags as mutually exclusive - if the user provides a structured config file, the flag-based equivalents should not be injected.
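
To make the proposed behavior easy to check outside the controller, the same logic can be pulled into a small function. This is a minimal sketch, not a patch against Kamaji: dropConflictingAuthzArgs is a hypothetical name, and it assumes both desiredArgs and extraArgs are already available as map[string]string, as in the snippet above.

```go
package main

import (
	"fmt"
	"strings"
)

// dropConflictingAuthzArgs is a hypothetical helper. When the user supplies
// --authorization-config, it removes the flag-based authorization defaults
// from desiredArgs in place; otherwise it leaves desiredArgs untouched.
func dropConflictingAuthzArgs(desiredArgs, extraArgs map[string]string) {
	if _, ok := extraArgs["--authorization-config"]; !ok {
		return
	}
	delete(desiredArgs, "--authorization-mode")
	// Deleting entries while ranging over a map is safe in Go.
	for k := range desiredArgs {
		if strings.HasPrefix(k, "--authorization-webhook-") {
			delete(desiredArgs, k)
		}
	}
}

func main() {
	// Case 1: structured config requested, so the default mode is dropped.
	withConfig := map[string]string{
		"--allow-privileged":   "true",
		"--authorization-mode": "Node,RBAC",
	}
	dropConflictingAuthzArgs(withConfig, map[string]string{
		"--authorization-config": "/authz-config/authorization-config.yaml",
	})
	_, kept := withConfig["--authorization-mode"]
	fmt.Println("mode kept with --authorization-config:", kept)

	// Case 2: no structured config, so the defaults stay as they are.
	withoutConfig := map[string]string{
		"--allow-privileged":   "true",
		"--authorization-mode": "Node,RBAC",
	}
	dropConflictingAuthzArgs(withoutConfig, map[string]string{})
	_, kept = withoutConfig["--authorization-mode"]
	fmt.Println("mode kept without --authorization-config:", kept)
}
```

The helper only touches the defaults. If a user passes both --authorization-config and --authorization-mode through extraArgs, the conflict would still reach kube-apiserver, which seems reasonable since that combination is a user-side misconfiguration.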

Current Workaround

Fall back to flag-based configuration, losing failurePolicy and timeout:

extraArgs:
  apiServer:
    - "--authorization-mode=Node,Webhook,RBAC"
    - "--authorization-webhook-config-file=/authz-config/authz.yml"
    - "--authorization-webhook-cache-authorized-ttl=5m"
    - "--authorization-webhook-cache-unauthorized-ttl=30s"

This works because Kamaji (since v0.4.2 / #415) allows overriding --authorization-mode. However, this approach offers no failurePolicy or timeout, so a webhook outage can lock out the apiserver as described under Impact on Cluster Health.
