
Fix NVML memory reporting regression on coherent UMA platforms (Fixes #449) #463

Closed

parallelArchitect wants to merge 1 commit into Syllo:master from parallelArchitect:fix/gb10-unified-memory-detection

Conversation

parallelArchitect commented Apr 15, 2026


On GB10 / DGX Spark, nvmlDeviceGetMemoryInfo returns NVML_SUCCESS with total == system MemTotal (~121GB). This prevents has_unified_memory from being set, causing incorrect VRAM reporting and a broken memory graph since 3.3.1.
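For context, a minimal repro sketch of the reported behavior (illustrative only, not taken from nvtop's source; link against -lnvidia-ml). On GB10 this should print a total matching /proc/meminfo MemTotal rather than a dedicated VRAM size:

```c
/* Repro sketch: query NVML memory info for device 0 and print it. */
#include <nvml.h>
#include <stdio.h>

int main(void) {
  if (nvmlInit_v2() != NVML_SUCCESS)
    return 1;
  nvmlDevice_t dev;
  nvmlMemory_t mem;
  if (nvmlDeviceGetHandleByIndex_v2(0, &dev) == NVML_SUCCESS &&
      nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS)
    printf("total=%llu used=%llu free=%llu\n",
           (unsigned long long)mem.total, (unsigned long long)mem.used,
           (unsigned long long)mem.free);
  nvmlShutdown();
  return 0;
}
```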

Fix: detect UMA by comparing NVML total against /proc/meminfo MemTotal. If total >= 90% of system RAM, classify as unified memory and use MemAvailable instead of MemTotal for display.
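A minimal sketch of that heuristic; read_meminfo_field and looks_like_uma are hypothetical helper names for illustration, not the patch's actual code:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse a field such as "MemTotal" or "MemAvailable" from /proc/meminfo,
 * returning its value in bytes, or 0 on failure. */
static unsigned long long read_meminfo_field(const char *field) {
  FILE *f = fopen("/proc/meminfo", "r");
  if (!f)
    return 0;
  char line[256];
  size_t len = strlen(field);
  unsigned long long kib = 0;
  while (fgets(line, sizeof line, f)) {
    if (strncmp(line, field, len) == 0 && line[len] == ':') {
      sscanf(line + len + 1, "%llu", &kib);
      break;
    }
  }
  fclose(f);
  return kib * 1024ULL; /* /proc/meminfo reports values in kB */
}

/* Classify as unified memory when the NVML-reported total covers at least
 * ~90% of system RAM (approximate integer math, per the PR's heuristic). */
static bool looks_like_uma(unsigned long long nvml_total_bytes) {
  unsigned long long mem_total = read_meminfo_field("MemTotal");
  return mem_total && nvml_total_bytes >= mem_total / 10 * 9;
}
```

When looks_like_uma() holds, the display path would then read MemAvailable from the same file instead of trusting the NVML used/free split.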

Note: this requires validation on GB10 / DGX Spark hardware; the author does not have access to a coherent UMA system.

References

NVML API documentation on SOC/UMA behavior: https://docs.nvidia.com/deploy/nvml-api/nvml-api-reference.html
Community NVML shim for GB10 UMA: https://forums.developer.nvidia.com/t/nvml-support-for-dgx-spark-grace-blackwell-unified-memory-community-solution/358869
NVML memory fix at the shim layer: https://github.com/parallelArchitect/nvml-unified-shim
btop PR: aristocratos/btop#1611
nvitop PR: XuehaiPan/nvitop#208

parallelArchitect (Author) commented:

For anyone needing the NVML memory fix now while this PR is under review — the fix is available in this fork: https://github.com/parallelArchitect/nvml-unified-shim

Syllo force-pushed the fix/gb10-unified-memory-detection branch from e756faf to e436c6b on April 29, 2026
Syllo (Owner) commented Apr 29, 2026

Hey,

Sorry, I merged #466 before getting to your patch and resolved the merge conflicts.

I'm a bit concerned about the workaround (>= 90% of RAM), as it could falsely classify many systems as UMA, e.g. 32GB of RAM plus a dedicated GPU with 32GB of GDDR. Maybe for now the only system like this has 128GB of RAM (and discrete GPUs afaik don't go that far), so we could differentiate on this metric.

I guess we'll have to see how NVIDIA is going to expose this through their NVML library so we can avoid this scenario.

parallelArchitect (Author) commented:

Valid concern on the threshold. The proper detection is via CUDA device attributes:

- cudaDevAttrPageableMemoryAccessUsesHostPageTables: true on GB10
- cudaDevAttrConcurrentManagedAccess: true on GB10

Together they identify hardware-coherent UMA without relying on a memory-size comparison. An SM 12.1 check is also an option, but that's too narrow.
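For illustration, a sketch of that attribute check via the CUDA runtime API; is_hw_coherent_uma is a hypothetical helper, and note this would add a CUDA runtime dependency that nvtop (which talks to NVML directly) does not currently have:

```c
#include <cuda_runtime.h>
#include <stdbool.h>

/* Return true when the device reports hardware-coherent unified memory
 * via the two attributes discussed above. */
static bool is_hw_coherent_uma(int device) {
  int host_page_tables = 0, concurrent_managed = 0;
  if (cudaDeviceGetAttribute(&host_page_tables,
                             cudaDevAttrPageableMemoryAccessUsesHostPageTables,
                             device) != cudaSuccess)
    return false;
  if (cudaDeviceGetAttribute(&concurrent_managed,
                             cudaDevAttrConcurrentManagedAccess,
                             device) != cudaSuccess)
    return false;
  return host_page_tables && concurrent_managed;
}
```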

The NVML path is the problem — nvmlDeviceGetMemoryInfo returns NVML_SUCCESS with total == system MemTotal on GB10, which is correct behavior for UMA but breaks the has_unified_memory detection. The fix should detect UMA via device attributes first, then use /proc/meminfo MemAvailable for display on those platforms.

I don't have direct GB10 access but azampatti and dustin1925 in the community do — happy to coordinate validation if you want to iterate on a cleaner fix.

parallelArchitect closed this by deleting the head repository on May 2, 2026
