10 changes: 8 additions & 2 deletions cmd/gpu-kubelet-plugin/vfio-device.go
@@ -76,6 +76,12 @@ func NewVfioPciManager(containerDriverRoot string, hostDriverRoot string, nvlib
return nil, fmt.Errorf("IOMMU is not enabled in the kernel")
}

if _, err := os.Stat(hostRoot); os.IsNotExist(err) {

Contributor

When do you hit this case? If the PassthroughSupport feature gate is enabled, this host mount should already be added by the Helm templates. This function is itself only invoked when that feature gate is enabled, here:

if featuregates.Enabled(featuregates.PassthroughSupport) {

Author

I forked the repo (adding some standard attributes to the driver) and was using a custom-built image, which I deployed via Helm with some custom values. I agree that the stock Helm chart has the mount present. This was just something I hit while doing dev work.

return nil, fmt.Errorf("%s volume mount is missing: the kubelet-plugin container requires a hostPath volume "+
"mounted at %s to check GPU device availability via fuser. Ensure the helm chart has "+
"featureGates.PassthroughSupport=true or add the volume mount manually", hostRoot, hostRoot)
}

vm := &VfioPciManager{
containerDriverRoot: containerDriverRoot,
hostDriverRoot: hostDriverRoot,
@@ -118,11 +124,11 @@ func (vm *VfioPciManager) WaitForGPUFree(ctx context.Context, info *VfioDeviceIn
if exitErr, ok := cmdErr.(*exec.ExitError); ok && exitErr.ExitCode() == 1 {
return nil
}
- err = fmt.Errorf("unexpected error checking if gpu device %q is free: %w", info.PciBusID, cmdErr)
+ err = fmt.Errorf("error checking if gpu device %q is free (verify %s is mounted and contains fuser): %w", info.PciBusID, hostRoot, cmdErr)
klog.V(6).Infof("[DEBUG] %s", err.Error())
continue
}
- err = fmt.Errorf("gpu device %q has open fds by process(es): %q", info.PciBusID, string(out))
+ err = fmt.Errorf("gpu device %q has open fds by process(es): %q -- external processes (e.g. dcgm-exporter) holding GPU device handles will block VFIO passthrough", info.PciBusID, string(out))
klog.V(6).Infof("[DEBUG] %s", err.Error())
}
}