fix: add missing delwin() calls in free_device_windows (#467)
Merged
Syllo merged 1 commit into Syllo:master from danbedford:fix/memory-leaks-in-free-device-windows on May 6, 2026
Conversation
shader_cores, l2_cache_size, and exec_engines were allocated with newwin() in alloc_device_window() but never freed in free_device_windows(). This caused a memory leak every time device windows were torn down (e.g. on refresh or hot-unplug). The function now frees all windows it allocates, preventing resource leaks in long-running sessions.
danbedford added a commit to danbedford/nvtop that referenced this pull request on May 3, 2026
…plot_window, and unsigned underflow guard

- free_device_windows: delwin() for shader_cores, l2_cache_size, exec_engines (upstream PR Syllo#467 fix/memory-leaks-in-free-device-windows)
- delete_all_windows: delwin() for plots[i].plot_window (upstream PR Syllo#468 fix/plot-window-memory-leak)
- nvtop_get_nvlink_info: guard against unsigned underflow in CLI throughput delta if hardware counter wraps or resets
danbedford added a commit to danbedford/nvtop that referenced this pull request on May 3, 2026
Upstream: Syllo/nvtop (commit 095d91c "Remove unused function in ixml")
Fork: danbedford/nvtop, branch `nvlink`
GPU Tested: `NVIDIA GeForce RTX 3090`
Scope: 5 files changed, 706 insertions(+), 19 deletions(-)

---

## Overview

Extends nvtop with per-GPU NVLink info in unused space of the existing interface. When no NVLink-connected GPU is detected, layout and behavior are identical to upstream -- no visual or functional difference. The goal is to bring useful data and throughput visibility to all nvtop users with NVLink-capable hardware, from consumer (2080, 3090 series) to datacenter (Ampere, Hopper, Blackwell series).

### Main bar (row 2, shown by default)

Appended at the end, after `power_info` -- NVLink version, link count, and aggregate throughput. Two display states for an NVLink-supported device.

No bridge or no active links (0-link case, no row 2 padding compaction applied):

    NVL5 0x

With active links (row 2 padding compaction applied, throughput displayed). Example (theoretical fully saturated GB200 with NVLink 5.0):

    NVL518x 1.636 TiB/s

When NVLink is supported but no bridge is connected or links are inactive, only the version and link count are displayed -- no compaction is applied to reclaim space on row 2 since there is no throughput to display. The `NVL5 0x` text extends past the panel edge without affecting the layout. Only when active links exist does fan field compaction kick in (11 to 8 characters) to make room for the throughput value.

- **Label**: `NVL`, a minimal label for NVLink.
- **Version**: Marketing NVLink version via `nvmlDeviceGetNvLinkVersion` (raw NVML enum values require remapping):
  - Raw 1 -> NVLink V1.0 -> Display 1
  - Raw 2 -> V2.0 -> 2
  - Raw 3 -> V2.2 -> 2
  - Raw 4 -> V3.0 -> 3
  - Raw 5 -> V3.1 -> 3
  - Raw 6 -> V4.0 -> 4
  - Raw 7 -> V5.0 -> 5
  - Raw 8 -> V6.0 -> 6 (assumed Rubin)

  Display shows the single-digit major version due to limited space.
- **Link count**: Total physical links on the device (static hardware property).
Maximum is 36 to future-proof for planned Nvidia Rubin.
- **Throughput**: Aggregate transmit plus receive utilization, currently read via an `nvidia-smi` CLI fallback for all NVLink-connected GPUs. This carries measurable overhead from forking a full binary and parsing its text output, but providing real throughput visibility to consumer GPU users outweighs the cost, and all non-NVLink users are isolated from it. The 2-second interval is hardcoded and independent of the global nvtop refresh rate, capping CLI calls regardless of display speed. Uses "r" (raw) counters, which include payload plus protocol overhead and therefore reflect true bandwidth utilization. Parses "Link N: Raw Tx: NNNNN KiB" / "Raw Rx" per link. Delta = `(current - previous) / time_delta` per link, summed for the aggregate; an unsigned underflow guard checks `new >= old` before subtracting. No smoothing is applied -- raw accuracy over display smoothness. **TODO:** On datacenter GPUs with `nvmlDeviceGetNvLinkUtilizationCounter`, replace with a direct API call; keep the CLI fallback for consumer GPUs.
- **Layout compaction**: The fan field shrinks from 11 to 8 characters ONLY when `any_device_has_nvlink_active` is true (at least one monitored GPU has active NVLink links). GPUs with NVLink hardware but no bridge (0-link case) do NOT get compaction -- `NVL3 0x` extends past the panel edge without needing reclaimed space. Panel width is determined by device name length (`device_name` column = `largest_device_name + 11`), so longer names produce more room for NVLink.
- **Throughput display**: Uses `print_data_at_scale()` (renamed from `print_pcie_at_scale()`) with IEC binary prefixes. The array bounds check was extended from `< 5` to `< 6` to support up to tebibytes/s (TiB/s) for Blackwell NVLink 5.0 devices at ~1.636 TiB/s aggregate. The `memory_prefix[]` array already contains entries up to "Pi" -- only the loop guard needed updating.
### Extra GPU info bar (row 4, not shown by default)

Appended at the end, after `exec_engines` -- error and correction counters since nvtop launch.

Example with zeroed counters:

    NVL E:00000 C:00000

Example with non-zero counters (errors in red, corrections in yellow):

    NVL E:00420 C:00069

- **Label**: `NVL`, a minimal label for NVLink.
- **Error counters**: Replay, recovery, CRC FLIT, and CRC DATA errors via `nvmlDeviceGetNvLinkErrorCounter`, summed across all links. Baseline subtraction ensures counters start at zero on nvtop launch.
- **CRC corrections**: Per-lane CRC flit corrections via `nvmlDeviceGetFieldValues` (field IDs 32-247 for links 0-35), summed across all links. Uses the modern signature `(device, valuesCount, fieldValues)` with field IDs populated in place in the `nvmlFieldValue_t` buffer (48 bytes on NVML 11.515+: fieldId at offset 0, scopeId at 4, timestamp at 8, latencyUsec at 16, valueType at 24, nvmlReturn at 28, value union at 32). Offsets are parsed manually since `nvml.h` is not exposed in the nvtop build.
- Error counters are read during the refresh cycle (`gpuinfo_nvidia_refresh_dynamic_info()`), not during the startup probe (`nvtop_probe_nvlink_list()` calls `nvtop_get_nvlink_info()` before the display is drawn). This ensures the baseline is established at the moment of the first display refresh, guaranteeing counters read zero on launch. `nvtop_get_nvlink_info()` does NOT read error counters in the display path.
---

## Files Changed

### include/nvtop/extract_gpuinfo_common.h (+31 lines, -1 line)

- `NVTOP_NVLINK_MAX_LINKS` defined as 36
- Flat struct `nvlink_info`: `num_links`, `version`, `supported`, `has_throughput`, `aggregate_tx`, `aggregate_rx`, `total_errors`, `total_corrections`
- `nvtop_get_nvlink_info()`: returns cached NVLink data; vendor guard skips non-NVIDIA GPUs before `container_of()`
- `nvtop_get_nvlink_error_counts()`: public getter for display-ready error/correction counts; bridges `interface.c` to per-device error state in `extract_gpuinfo_nvidia.c`
- `nvtop_probe_nvlink_list()`: probes all devices for NVLink support before curses init; short-circuits if `any_device_has_nvlink` is already true
- `nvtop_set_nvlink_probe()`: sets the `any_device_has_nvlink` global flag only (leaves `any_device_has_nvlink_active` untouched)
- `nvtop_reset_nvlink_cache()`: resets all per-device NVLink caching (probe flag, cached link count, cached version, cached info struct) on monitored GPU set change; vendor guard for non-NVIDIA

### include/nvtop/interface_internal_common.h (+4 lines, -1 line)

- `WINDOW *nvlink_info` added to `struct device_window` (row 2 throughput)
- `WINDOW *nvlink_errors` added to `struct device_window` (row 4 errors)
- `device_nvlink_errors` added to `enum device_field` with size 19

### src/extract_gpuinfo_nvidia.c (+451 insertions, -1 deletion)

- Four NVML (NVIDIA Management Library) symbols resolved via `dlsym()`: `nvmlDeviceGetNvLinkState`, `nvmlDeviceGetNvLinkVersion`, `nvmlDeviceGetNvLinkErrorCounter`, `nvmlDeviceGetFieldValues` (modern 3-param signature)
- Per-device state: `device_index`, `cli_poll_active`, per-link CLI counters, baseline/display error fields, probe cache (`nvlink_probed`, `nvlink_cached_linkcount`, `nvlink_cached_version`), full struct cache (`cached_nvlink_info`, `cached_nvlink_info_populated`)
- Link discovery: probes links 0-35 via `nvmlDeviceGetNvLinkState`, counts consecutive successes, stops on the first hard error or `NVML_ERROR_NOT_SUPPORTED`;
only active links (`isActive == 1`) are counted -- physical slots with no bridge are excluded
- Caching: 3 layers -- (1) link count/version via `nvlink_probe_and_cache()`, (2) full struct via `nvlink_refresh_cached_info()`, (3) list-level probe short-circuit in `nvtop_probe_nvlink_list()`; all reset by `nvtop_reset_nvlink_cache()` on GPU set change
- Throughput: `nvidia-smi nvlink --getthroughput r -i <dev>` every 2 seconds (hardcoded, independent of display refresh rate), delta-based rate computation with an unsigned underflow guard
- Error reading via `nvlink_read_errors()`: called from `gpuinfo_nvidia_refresh_dynamic_info()` (not `nvtop_get_nvlink_info()`) to ensure the baseline is established at the first display refresh; reads errors via `nvmlDeviceGetNvLinkErrorCounter` and corrections via `nvmlDeviceGetFieldValues`; unsigned underflow guard prevents counter wrap artifacts

### src/interface.c (+220 insertions, -20 deletions)

- Conditional layout: `any_device_has_nvlink` controls window allocation; `any_device_has_nvlink_active` controls fan compaction (shrinks from 11 to 8 chars only when active links exist -- 0-link devices do not get compaction)
- `device_length()` always uses the base layout (clock + mem_clock + temp + fan + power + 5) regardless of NVLink state; the NVLink window on line 2 extends past the nominal panel edge, which ncurses handles gracefully
- `nvtop_adjust_field_sizes_for_nvlink()` checks `any_device_has_nvlink_active` (not `any_device_has_nvlink`) for fan compaction
- `interface_check_monitored_gpu_change()` resets ALL mutable NVLink state: both global flags plus `sizeof_device_field[device_fan_speed] = 11`, then calls per-device `nvtop_reset_nvlink_cache()`
- Fan N/A fallback branch uses `any_device_has_nvlink_active` for the correct 11-character format on 0-link devices
- NVLink info window (row 2): displays `print_data_at_scale()`-formatted throughput (renamed from `print_pcie_at_scale()`; bounds check extended to `< 6` for the TiB/s ceiling)
- NVLink errors
window (row 4): reads via `nvtop_get_nvlink_error_counts()` (does NOT call `nvtop_get_nvlink_info()` in the display path)
- Memory leak fixes: added missing `delwin()` for `shader_cores`, `l2_cache_size`, `exec_engines`, `plots[i].plot_window`, and `nvlink_errors`. Two of these are also submitted as standalone upstream PRs: the `free_device_windows()` fix (PR Syllo#467) and the `plots[i].plot_window` fix (PR Syllo#468).

### src/nvtop.c (+5 lines, -1 line)

- `nvtop_probe_nvlink_list()` and `nvtop_set_nvlink_probe()` called before curses init (first layout pass)
- Re-evaluated in the main loop after `interface_check_monitored_gpu_change()` for GPU hotplug

---

## Design Decisions

### Flat struct over nested

Single struct per device. Error and correction counters are cumulative totals (unsigned long long) summed across all links. Avoids per-link arrays and dynamic allocation in the hot refresh path.

### Two-tier error state: baseline plus display

Five fields in `struct gpu_info_nvidia` track error state: `baseline_errors`, `baseline_corrections`, `nvlink_errors_baseline_read` (bool), `display_errors`, `display_corrections`. Baselines persist for the entire process lifetime. Display values are computed each refresh as `cumulative - baseline`.

### total_errors / total_corrections retained for API compatibility

Populated from `display_errors`/`display_corrections` in `nvlink_refresh_cached_info()`. The primary display path uses `nvtop_get_nvlink_error_counts()`, but both carry the same data.

### No new dependencies

Uses only NVML symbols already present in the `nvidia-ml` driver library and the `nvidia-smi` binary already on the system.

---

## What Was Not Changed

Process listing, memory/GPU charts, configuration options, keyboard shortcuts, menu behavior, and all non-NVLink display fields remain identical to upstream.

---

## Testing

Dual RTX 3090 Founders Edition 24GB with a 3-slot NVLink bridge (RTXA6000NVLINK3S-KIT). Displays in the UI as `NVIDIA GeForce RTX 3090`. 4 physical links per GPU.
`enum nvmlNvlinkVersion_t` returns `5`, representing NVLink V3.1. When idle, NVLink shows ~1.2 MiB/s of aggregate residual throughput, presumably from protocol keep-alives/link maintenance. Errors/corrections correctly display `E:00000 C:00000` on every launch, incrementing only when new errors occur (no errors were experienced during testing, so this behavior is not fully confirmed).
Owner
Thank you