Add prereq, install page, upgrade #1088
✅ Deploy Preview for dra-driver-nvidia-gpu ready!
Hi @a-mccarthy. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test.
/release-note-none

/approve
| Parameter | Default | Description |
| --- | --- | --- |
| `resources.gpus.enabled` | `true` | Enable the GPU kubelet plugin. Requires `gpuResourcesEnabledOverride=true`. |
| `resources.computeDomains.enabled` | `true` | Enable the ComputeDomain controller and kubelet plugin. |
| `gpuResourcesEnabledOverride` | `false` | Required to enable GPU allocation resources. |
| `featureGates` | `{}` | Map of feature gate name to boolean. See [Feature gates](reference/feature-gates.md). |
Should we have the table of feature gates (FGs) here?
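A minimal sketch of a Helm values file built from the parameters in the table, assuming the dotted parameter names map to nested keys in the usual Helm way. The file name `values-dra.yaml` is hypothetical, and `featureGates` is left empty because no gate names are listed here.

```yaml
# values-dra.yaml (hypothetical file name): overrides for the DRA Driver chart.
# Dotted table keys such as resources.gpus.enabled become nested YAML keys.
resources:
  gpus:
    enabled: true          # GPU kubelet plugin
  computeDomains:
    enabled: true          # ComputeDomain controller and kubelet plugin
# Per the table, GPU allocation resources need this override; it defaults to false.
gpuResourcesEnabledOverride: true
# Map of feature gate name to boolean; see reference/feature-gates.md for valid names.
featureGates: {}
```

Such a file would be passed to `helm install` or `helm upgrade` with `-f values-dra.yaml`; for a single value, `--set gpuResourcesEnabledOverride=true` is the equivalent inline form.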
On Kubernetes v1.32 and v1.33, the feature gate must be manually enabled on four components: the API server, scheduler, controller manager, and each kubelet.
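On clusters where the control plane runs as static pods (the layout kubeadm uses, with manifests in `/etc/kubernetes/manifests/`), the same settings can be applied by editing each manifest. A sketch of the relevant excerpts, assuming that layout; the `resource.k8s.io/v1beta1` runtime-config value mirrors the kubeadm configuration in this thread, and other distributions may place these flags elsewhere.

```yaml
# Excerpt of /etc/kubernetes/manifests/kube-apiserver.yaml (assumed static-pod layout).
# If a --feature-gates flag already exists, append to its comma-separated value
# instead of adding a second flag.
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        # ...existing flags unchanged...
        - --feature-gates=DynamicResourceAllocation=true
        - --runtime-config=resource.k8s.io/v1beta1=true
---
# Excerpt of /etc/kubernetes/manifests/kube-scheduler.yaml;
# kube-controller-manager.yaml takes the same --feature-gates flag.
spec:
  containers:
    - name: kube-scheduler
      command:
        - kube-scheduler
        # ...existing flags unchanged...
        - --feature-gates=DynamicResourceAllocation=true
```

The kubelet watches the manifest directory and recreates each control-plane pod after its file changes. Each kubelet still needs `DynamicResourceAllocation: true` under `featureGates` in its KubeletConfiguration, followed by a kubelet restart.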
### kubeadm
This can be simplified with a `kubeadm-init.yaml`:
```yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
  extraArgs:
    - name: "feature-gates"
      value: "DynamicResourceAllocation=true"
    - name: "runtime-config"
      value: "resource.k8s.io/v1beta1=true"
controllerManager:
  extraArgs:
    - name: "feature-gates"
      value: "DynamicResourceAllocation=true"
scheduler:
  extraArgs:
    - name: "feature-gates"
      value: "DynamicResourceAllocation=true"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true
```
@guptaNswati I've updated these steps, can you review again?
Thank you @a-mccarthy for splitting it up; it's easier to review and iterate on. I have some nits, but otherwise it looks good.
@guptaNswati thanks for the review! I've made the updates; please let me know if you have any other ideas for improving this.
@dims for review!
The DRA Driver for NVIDIA GPUs requires the following:
- A Kubernetes cluster running v1.32 or later.
Let's bump it to 1.34 or later, since DRA is GA as of that version.
I added 1.34.2+ here because there was an issue in an earlier 1.34 version, right?
| Requirement | Details |
| --- | --- |
| `DynamicResourceAllocation` feature gate | Enabled by default in Kubernetes v1.34+. On v1.32 and v1.33, [enable it manually](#enable-dra-on-kubernetes-v132-and-v133). |
| Helm | v3.8 or later. |
| NVIDIA Driver | v565 or later for GPU allocation. v570.158.01 or later if using [ComputeDomains](#computedomains-additional-prerequisites). |
| NVIDIA Container Toolkit | Configures the container runtime on each node and enables Container Device Interface (CDI) support, which the DRA Driver uses to expose GPUs to containers. |
Instead of listing the NVIDIA Container Toolkit as a dependency here, mention that CDI must be enabled, as below (the default from containerd 2.0 and CRI-O v1.27+):
CDI must be enabled in the underlying container runtime (such as containerd or CRI-O).
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
/cherry-pick release-0.4
@shivamerla: once the present PR merges, I will cherry-pick it on top of
/approve
LGTM label has been added. Git tree hash: c3d176fcbec75c24b2345e9720695d229085f855
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: a-mccarthy, dims, shivamerla.
❌ Failed to create backport PR for Error: GitHub Actions is not permitted to create or approve pull requests (https://docs.github.com/rest/pulls/pulls#create-a-pull-request). Please backport manually.
@shivamerla: new pull request created: #1141
Splitting out docs work from the larger #1068. This focuses solely on the prerequisites and installation setup, and will hopefully be easier to review.
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Adds prerequisite and installation docs.
Which issue(s) this PR is related to:
Split out from #1068.
Special notes for your reviewer:
Does this PR introduce a user-facing change?
adds docs
Additional documentation (design docs, usage docs, etc.):
Checklist
- `make check test` passes locally
- `make check-generate` passes if `api/` changed (CRDs, deepcopy, informers, listers, clientset)
- `make check-modules` passes if `go.mod`/`go.sum` changed
- `deployments/helm` updated if flags, RBAC, or defaults changed