[Feature]: API to release IPC memory pin created by hsa_amd_ipc_memory_create #4754

@AndreasKaratzas

Suggestion Description

hsa_amd_ipc_memory_create permanently pins GPU memory at the HSA driver level. The ROCm HSA runtime provides no API to release this pin from the creating process. hsa_amd_ipc_memory_detach only works on the receiving (attaching) side.
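The asymmetry is visible in the declarations (simplified here from `hsa_ext_amd.h`; consult the installed header for the authoritative signatures): the exporting side has a create but no matching destroy, while only the importing side gets an attach/detach pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified declarations mirroring hsa_ext_amd.h */
typedef struct { uint32_t handle[8]; } hsa_amd_ipc_memory_t;
typedef int hsa_status_t;
typedef struct { uint64_t handle; } hsa_agent_t;

/* Exporting side: pins `ptr` and fills an opaque handle. No inverse exists. */
hsa_status_t hsa_amd_ipc_memory_create(void* ptr, size_t len,
                                       hsa_amd_ipc_memory_t* handle);

/* Importing side: maps the exported handle, and can unmap it again. */
hsa_status_t hsa_amd_ipc_memory_attach(const hsa_amd_ipc_memory_t* handle,
                                       size_t len, uint32_t num_agents,
                                       const hsa_agent_t* agents,
                                       void** mapped_ptr);
hsa_status_t hsa_amd_ipc_memory_detach(void* mapped_ptr);
```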

This leaves GPU memory unrecoverable once the application is done with it: even after the tensors are freed, the caches are emptied, and GC has run, the memory stays pinned.

In vLLM (LLM serving engine), KV caches can consume 120+ GB of GPU memory. When the engine registers these buffers with UCX for potential inter-node transfers, hsa_amd_ipc_memory_create is called during uct_rocm_ipc_pack_key (reached via ucp_mem_map / memory registration). After engine shutdown, this memory is permanently leaked at the driver level — even though PyTorch has freed all tensors and the PyTorch CUDA cache is empty.

This makes it impossible to reuse the GPU within the same process (e.g., running multiple test iterations, reinitializing the engine, etc.).

Reproduction

import torch
import ctypes
import gc

hsa = ctypes.CDLL('/opt/rocm/lib/libhsa-runtime64.so.1')

class hsa_amd_ipc_memory_t(ctypes.Structure):
    _fields_ = [('handle', ctypes.c_uint32 * 8)]  # opaque 32-byte IPC handle

hsa.hsa_amd_ipc_memory_create.restype = ctypes.c_int
hsa.hsa_amd_ipc_memory_create.argtypes = [
    ctypes.c_void_p, ctypes.c_size_t, ctypes.POINTER(hsa_amd_ipc_memory_t)
]

# Measure baseline (this call also initializes the HIP/HSA runtime,
# so no explicit hsa_init() is needed before the ctypes call below)
free_before, total = torch.cuda.mem_get_info()
print(f"Before: {free_before / 1024**3:.1f} GB free / {total / 1024**3:.1f} GB total")

# Allocate 50 GB GPU tensor
t = torch.zeros(50 * 1024**3 // 4, dtype=torch.float32, device='cuda')

# Create IPC handle (this is what UCX does internally during ucp_mem_map)
handle = hsa_amd_ipc_memory_t()
status = hsa.hsa_amd_ipc_memory_create(
    ctypes.c_void_p(t.data_ptr()),
    ctypes.c_size_t(t.nelement() * t.element_size()),
    ctypes.byref(handle),
)
print(f"hsa_amd_ipc_memory_create status: {status}")  # 0 == HSA_STATUS_SUCCESS

# Free everything
del t, handle
gc.collect()
torch.cuda.empty_cache()

# Measure after cleanup
free_after, _ = torch.cuda.mem_get_info()
leaked = (free_before - free_after) / 1024**3
print(f"After:  {free_after / 1024**3:.1f} GB free")
print(f"Leaked: {leaked:.1f} GB")

# Expected: Leaked: ~50 GB (memory is permanently pinned)

Requested API

A function to release the IPC pin from the creating process, e.g.:

hsa_status_t hsa_amd_ipc_memory_destroy(void *ptr, size_t len);

or

hsa_status_t hsa_amd_ipc_memory_release(hsa_amd_ipc_memory_t *handle);

This would allow UCX (and other IPC consumers) to:

  1. Create IPC handles when needed for sharing
  2. Release them when the memory is no longer being shared
  3. Free the GPU memory normally afterward
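
The lifecycle this would enable might look like the sketch below. Everything here is illustrative: hsa_amd_ipc_memory_destroy is the proposed call and does not exist in current ROCm, and `buf`/`buf_len` stand in for a registered GPU buffer.

```c
hsa_amd_ipc_memory_t key;
hsa_status_t st;

/* 1. Export the buffer when a peer needs it
 *    (this is what uct_rocm_ipc_pack_key does today). */
st = hsa_amd_ipc_memory_create(buf, buf_len, &key);

/* ... peer attaches, transfers complete, peer detaches ... */

/* 2. Proposed: release the driver-level pin on the exporting side. */
st = hsa_amd_ipc_memory_destroy(buf, buf_len);

/* 3. The allocation can now be freed and the VRAM actually reclaimed. */
hsa_amd_memory_pool_free(buf);
```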

Without this API, any GPU memory that has ever been IPC-shared is permanently pinned for the lifetime of the process.

Operating System

Ubuntu 22.04

GPU

MI355

ROCm Component

vLLM
