Fix NVML memory reporting regression on coherent UMA platforms (Fixes… by parallelArchitect · Pull Request #463 · Syllo/nvtop

parallelArchitect · 2026-04-15T11:08:27Z

On GB10 / DGX Spark, nvmlDeviceGetMemoryInfo returns NVML_SUCCESS with total == system MemTotal (~121GB). This prevents has_unified_memory from being set, which has caused incorrect VRAM reporting and a broken memory graph since 3.3.1.

Fix: detect UMA by comparing the NVML total against MemTotal from /proc/meminfo. If the NVML total is >= 90% of system RAM, classify the device as unified memory and use MemAvailable instead of MemTotal for display.

Note: requires validation on GB10 / DGX Spark hardware. Author does not have access to a coherent UMA system.
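
For reviewers who want to reason about the heuristic without the full diff, here is a minimal sketch of the check described above. It is not the patch itself: the function names (`meminfo_read_bytes`, `looks_like_unified_memory`, `displayed_total_bytes`) are hypothetical, and returning MemAvailable as the displayed total is one reading of "use MemAvailable instead of MemTotal for display".

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read one "<key>: <value> kB" field from a meminfo-style file (for example
 * /proc/meminfo) and return it in bytes. Returns false if the file cannot be
 * opened or the key is missing. Hypothetical helper, not part of nvtop. */
bool meminfo_read_bytes(const char *path, const char *key, uint64_t *out_bytes) {
  FILE *f = fopen(path, "r");
  if (!f)
    return false;

  char line[256];
  size_t key_len = strlen(key);
  bool found = false;
  while (fgets(line, sizeof line, f)) {
    /* Match "MemTotal:" exactly, not a longer key sharing the same prefix. */
    if (strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
      uint64_t kib;
      if (sscanf(line + key_len + 1, "%" SCNu64, &kib) == 1) {
        *out_bytes = kib * 1024u; /* meminfo reports values in kB */
        found = true;
      }
      break;
    }
  }
  fclose(f);
  return found;
}

/* Heuristic from the PR description: the NVML total is >= 90% of MemTotal.
 * Written with integers; realistic memory sizes are far below the range
 * where multiplying by 10 could overflow a uint64_t. */
bool looks_like_unified_memory(uint64_t nvml_total, uint64_t sys_total) {
  if (sys_total == 0)
    return false;
  return nvml_total * 10 >= sys_total * 9;
}

/* Assumed display rule: on UMA show MemAvailable, otherwise the NVML total. */
uint64_t displayed_total_bytes(uint64_t nvml_total, uint64_t sys_total,
                               uint64_t sys_available) {
  return looks_like_unified_memory(nvml_total, sys_total) ? sys_available
                                                          : nvml_total;
}
```

A caller would read `MemTotal` and `MemAvailable` from `/proc/meminfo` with `meminfo_read_bytes` and pass them in together with the `total` field reported by nvmlDeviceGetMemoryInfo.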

References

NVML API documentation on SOC/UMA behavior: https://docs.nvidia.com/deploy/nvml-api/nvml-api-reference.html
Community NVML shim for GB10 UMA: https://forums.developer.nvidia.com/t/nvml-support-for-dgx-spark-grace-blackwell-unified-memory-community-solution/358869
NVML memory fix at the shim layer: https://github.com/parallelArchitect/nvml-unified-shim
btop PR: aristocratos/btop#1611
nvitop PR: XuehaiPan/nvitop#208

parallelArchitect · 2026-04-15T19:45:30Z

For anyone needing the NVML memory fix now while this PR is under review — the fix is available in this fork: https://github.com/parallelArchitect/nvml-unified-shim


Syllo · 2026-04-29T09:28:57Z

Hey,

Sorry, I merged #466 before coming to your patch and resolved the merge conflicts.

I'm a bit concerned about the workaround (>= 90% of RAM), as it could falsely classify many systems as UMA, e.g. a machine with 32GB of RAM and a dedicated GPU with 32GB of GDDR. Maybe for now the only system like this has 128GB of RAM (and discrete GPUs, afaik, don't go that far), so we could differentiate on this metric.

I guess that we'll have to see how NVIDIA is going to expose this through their NVML library to be able to avoid this scenario.
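
To make the concern concrete, here is a small sketch comparing the ratio-only check with a variant that also requires a large system RAM size, as suggested above. The 100 GiB floor (`UMA_MIN_SYSTEM_BYTES`) and both function names are assumptions made for illustration; nothing in this thread fixes an actual value, and any such floor would need revisiting as hardware changes.

```c
#include <stdbool.h>
#include <stdint.h>

#define GIB (1024ULL * 1024ULL * 1024ULL)

/* Hypothetical floor, chosen only for this illustration. */
#define UMA_MIN_SYSTEM_BYTES (100ULL * GIB)

/* Ratio-only heuristic: NVML total is >= 90% of system MemTotal. */
bool ratio_only_uma(uint64_t nvml_total, uint64_t sys_total) {
  if (sys_total == 0)
    return false;
  return nvml_total * 10 >= sys_total * 9;
}

/* Ratio heuristic gated on system RAM size, so that a 32GB host paired with
 * a 32GB discrete GPU is no longer classified as UMA. */
bool ratio_and_size_uma(uint64_t nvml_total, uint64_t sys_total) {
  return sys_total >= UMA_MIN_SYSTEM_BYTES &&
         ratio_only_uma(nvml_total, sys_total);
}
```

With 32 GiB of RAM and a 32 GiB discrete GPU, `ratio_only_uma` returns true (the false positive) while `ratio_and_size_uma` returns false. With both totals at 121 GiB, as reported for GB10, both return true.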

parallelArchitect · 2026-04-30T11:02:10Z

Valid concern on the threshold. The proper detection is via CUDA device attributes:

cudaDevAttrPageableMemoryAccessUsesHostPageTables — true on GB10
cudaDevAttrConcurrentManagedAccess — true on GB10

Both together identify hardware-coherent UMA without relying on a memory size comparison. An SM 12.1 check is also an option, but that's too narrow.

The NVML path is the problem: nvmlDeviceGetMemoryInfo returns NVML_SUCCESS with total == system MemTotal on GB10, which is correct behavior for UMA but breaks the has_unified_memory detection. The fix should detect UMA via device attributes first, then use /proc/meminfo MemAvailable for display on those platforms.

I don't have direct GB10 access but azampatti and dustin1925 in the community do — happy to coordinate validation if you want to iterate on a cleaner fix.
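
A minimal sketch of the attribute-based detection described above, for discussion only. `cudaDeviceGetAttribute` and the two attribute enums are part of the CUDA runtime API; the function names and the `USE_CUDA_RUNTIME` build flag are hypothetical. Note that a tool which currently only talks to NVML would also have to load or link the CUDA runtime to use this path.

```c
#include <stdbool.h>

/* Decision rule from the comment above: both attributes must be set.
 * Kept separate from the CUDA calls so it can be checked without a GPU. */
bool is_coherent_uma(int pageable_uses_host_page_tables,
                     int concurrent_managed_access) {
  return pageable_uses_host_page_tables != 0 && concurrent_managed_access != 0;
}

#ifdef USE_CUDA_RUNTIME /* hypothetical build flag for this sketch */
#include <cuda_runtime_api.h>

/* Returns 1 if the device reports both attributes, 0 if it does not,
 * and -1 if either attribute query fails. */
int query_coherent_uma(int device) {
  int host_page_tables = 0;
  int concurrent_managed = 0;

  if (cudaDeviceGetAttribute(&host_page_tables,
                             cudaDevAttrPageableMemoryAccessUsesHostPageTables,
                             device) != cudaSuccess)
    return -1;
  if (cudaDeviceGetAttribute(&concurrent_managed,
                             cudaDevAttrConcurrentManagedAccess,
                             device) != cudaSuccess)
    return -1;

  return is_coherent_uma(host_page_tables, concurrent_managed) ? 1 : 0;
}
#endif
```

Treating a failed query as "unknown" (-1) rather than "not UMA" would let the caller fall back to the existing NVML-based path.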

parallelArchitect mentioned this pull request Apr 15, 2026

Fix NVML memory reporting regression on coherent UMA platforms aristocratos/btop#1611

Closed

parallelArchitect mentioned this pull request Apr 16, 2026

Fix incorrect memory reporting on coherent UMA platforms (GB10 / DGX … XuehaiPan/nvitop#208

Open

Syllo force-pushed the fix/gb10-unified-memory-detection branch from e756faf to e436c6b Compare April 29, 2026 08:29

parallelArchitect closed this by deleting the head repository May 2, 2026

parallelArchitect wants to merge 1 commit into Syllo:master from parallelArchitect:fix/gb10-unified-memory-detection