Add prereq, install page, upgrade #1088

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:main from a-mccarthy:docs/p0-core on May 14, 2026

Conversation

@a-mccarthy

Contributor

Splitting out docs work from the larger #1068. This PR focuses solely on the prereqs and install setup and will hopefully be easier to review.

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Adds prereq and install docs

Which issue(s) this PR is related to:

Split out from #1068

Special notes for your reviewer:

Does this PR introduce a user-facing change?

adds docs


Additional documentation (design docs, usage docs, etc.):


Checklist

  • make check test passes locally
  • make check-generate passes if api/ changed (CRDs, deepcopy, informers, listers, clientset)
  • make check-modules passes if go.mod / go.sum changed
  • Tests added or updated for the change
  • Helm chart (deployments/helm) updated if flags, RBAC, or defaults changed

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 30, 2026
@netlify

netlify Bot commented Apr 30, 2026


Deploy Preview for dra-driver-nvidia-gpu ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 0305e68 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/dra-driver-nvidia-gpu/deploys/6a052841dba9c20008259eb1 |
| 😎 Deploy Preview | https://deploy-preview-1088--dra-driver-nvidia-gpu.netlify.app |

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 30, 2026
@k8s-ci-robot

Contributor

Hi @a-mccarthy. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 30, 2026
@a-mccarthy a-mccarthy mentioned this pull request May 1, 2026
5 tasks
@dims

dims commented May 1, 2026

Member

/release-note-none
/ok-to-test

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 1, 2026
@dims

dims commented May 1, 2026

Member

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 1, 2026
Comment thread docs/install.md Outdated
| `resources.gpus.enabled` | `true` | Enable the GPU kubelet plugin. Requires `gpuResourcesEnabledOverride=true`. |
| `resources.computeDomains.enabled` | `true` | Enable the ComputeDomain controller and kubelet plugin. |
| `gpuResourcesEnabledOverride` | `false` | Required to enable GPU allocation resources. |
| `featureGates` | `{}` | Map of feature gate name to boolean. See [Feature gates](reference/feature-gates.md). |
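
For readers following this thread, the rows above can be read together as a Helm values override. The fragment below is only a sketch built from the parameter names and defaults quoted in this (now outdated) snippet. The nesting is inferred from the dotted parameter names, and the final chart may differ.

```yaml
# Sketch of a values override based on the table rows quoted above.
# Key names come from the reviewed snippet; the nesting is an assumption.
resources:
  gpus:
    enabled: true          # default true, but only effective with the override below
  computeDomains:
    enabled: true          # default true
gpuResourcesEnabledOverride: true   # default false; the table says this is required for GPU allocation
featureGates: {}           # map of feature gate name to boolean
```

The one non-default value is `gpuResourcesEnabledOverride`, since the table states that `resources.gpus.enabled` depends on it.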

Contributor

Should we have a table of the feature gates (FGs)?

Contributor Author

I have a table in a separate PR, #1098

Comment thread docs/prerequisites.md Outdated

On Kubernetes v1.32 and v1.33, the feature gate must be manually enabled on four components: the API server, scheduler, controller manager, and each kubelet.

### kubeadm

@guptaNswati guptaNswati May 1, 2026

Contributor

This can be simplified with a kubeadm-init.yaml:

```yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
  extraArgs:
  - name: "feature-gates"
    value: "DynamicResourceAllocation=true"
  - name: "runtime-config"
    value: "resource.k8s.io/v1beta1=true"
controllerManager:
  extraArgs:
  - name: "feature-gates"
    value: "DynamicResourceAllocation=true"
scheduler:
  extraArgs:
  - name: "feature-gates"
    value: "DynamicResourceAllocation=true"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true
```

Contributor Author

@guptaNswati I've updated these steps, can you review again?

@guptaNswati

Contributor

Thank you @a-mccarthy for splitting it up. It's easier to review and iterate on. I have some nits, but otherwise it looks good.

@a-mccarthy

Contributor Author

@guptaNswati thanks for the review! I've made the updates. Please let me know if you have any other ideas for improvement.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 13, 2026
@a-mccarthy

Contributor Author

@dims for review!

@a-mccarthy a-mccarthy changed the title Add prereq and install page Add prereq, install page, upgrade May 13, 2026
@a-mccarthy a-mccarthy requested a review from bmozaffa May 13, 2026 17:22
Comment thread docs/prerequisites.md Outdated

The DRA Driver for NVIDIA GPUs requires the following:

- Kubernetes cluster running Kubernetes v1.32 or later.

Contributor Author

@shivamerla is 1.32 still accurate?

@shivamerla shivamerla May 13, 2026

Contributor

Let's bump it to 1.34 or later, as DRA has been GA since that version.

Contributor Author

I added 1.34.2+ here, because there was an issue in an earlier 1.34 version, right?

Comment thread docs/prerequisites.md Outdated
| `DynamicResourceAllocation` feature gate | Enabled by default in Kubernetes v1.34+. On v1.32 and v1.33, [enable it manually](#enable-dra-on-kubernetes-v132-and-v133). |
| Helm | v3.8 or later. |
| NVIDIA Driver | v565 or later for GPU allocation. v570.158.01 or later if using [ComputeDomains](#computedomains-additional-prerequisites). |
| NVIDIA Container Toolkit | Configures the container runtime on each node and enables Container Device Interface (CDI) support, which the DRA Driver uses to expose GPUs to containers. |

Contributor

Instead of listing the NVIDIA Container Toolkit as a dependency here, mention that CDI should be enabled, as below (it is the default from containerd 2.0 and CRI-O v1.27+).

CDI must be enabled in the underlying container runtime (such as containerd or CRI-O).

Contributor Author

updated.

Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
@shivamerla

Contributor

/cherry-pick release-0.4

@k8s-infra-cherrypick-robot


@shivamerla: once the present PR merges, I will cherry-pick it on top of release-0.4 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-0.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@shivamerla

Contributor

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 14, 2026
@k8s-ci-robot

Contributor

LGTM label has been added.

Git tree hash: c3d176fcbec75c24b2345e9720695d229085f855

@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: a-mccarthy, dims, shivamerla

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 04ef19e into kubernetes-sigs:main May 14, 2026
16 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in DRA Driver for NVIDIA GPUs May 14, 2026
@github-actions


❌ Failed to create backport PR for release-0.4

Error: GitHub Actions is not permitted to create or approve pull requests. - https://docs.github.com/rest/pulls/pulls#create-a-pull-request

Please backport manually.

@k8s-infra-cherrypick-robot


@shivamerla: new pull request created: #1141


Labels

  • `approved`: Indicates a PR has been approved by an approver from all required OWNERS files.
  • `cherry-pick/release-0.4`
  • `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
  • `kind/documentation`: Categorizes issue or PR as related to documentation.
  • `lgtm`: "Looks good to me", indicates that a PR is ready to be merged.
  • `ok-to-test`: Indicates a non-member PR verified by an org member that is safe to test.
  • `release-note-none`: Denotes a PR that doesn't merit a release note.
  • `size/XL`: Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

Status: Done

7 participants