Suggestion Description
hsa_amd_ipc_memory_create permanently pins GPU memory at the HSA driver level. The ROCm HSA runtime provides no API to release this pin from the creating process. hsa_amd_ipc_memory_detach only works on the receiving (attaching) side.
This leaves the GPU memory unrecoverable once the application is done with it: the allocation stays pinned even after the tensors are freed, allocator caches are emptied, and garbage collection has run.
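For reference, the relevant API surface (prototypes as declared in hsa_ext_amd.h) is asymmetric: attach and detach pair up on the importing side, but create on the exporting side has no counterpart:

```c
/* Exporter: pins the allocation at the driver level; no matching release exists. */
hsa_status_t hsa_amd_ipc_memory_create(void* ptr, size_t len,
                                       hsa_amd_ipc_memory_t* handle);

/* Importer: maps a view of the exported buffer into this process. */
hsa_status_t hsa_amd_ipc_memory_attach(const hsa_amd_ipc_memory_t* handle, size_t len,
                                       uint32_t num_agents, const hsa_agent_t* mapping_agents,
                                       void** mapped_ptr);

/* Importer: unmaps its own view only; does not touch the exporter-side pin. */
hsa_status_t hsa_amd_ipc_memory_detach(void* mapped_ptr);
```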
In vLLM (LLM serving engine), KV caches can consume 120+ GB of GPU memory. When the engine registers these buffers with UCX for potential inter-node transfers, hsa_amd_ipc_memory_create is called during uct_rocm_ipc_pack_key (via ucp_mem_map → mem_reg). After engine shutdown, this memory is permanently leaked at the driver level — even though PyTorch has freed all tensors and the CUDA cache is empty.
This makes it impossible to reuse the GPU within the same process (e.g., running multiple test iterations, reinitializing the engine, etc.).
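As a stopgap, UCX's transport selection can exclude the ROCm IPC transport so that ucp_mem_map never packs an IPC key. This is a sketch assuming UCX's standard UCX_TLS negation syntax, and it gives up intra-node device-to-device IPC transfers entirely:

```shell
# Exclude the rocm_ipc transport; registration then skips uct_rocm_ipc_pack_key
export UCX_TLS=^rocm_ipc
```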
Reproduction
import torch
import ctypes
import gc
hsa = ctypes.CDLL('/opt/rocm/lib/libhsa-runtime64.so.1')
class hsa_amd_ipc_memory_t(ctypes.Structure):
    _fields_ = [('handle', ctypes.c_uint32 * 8)]
# Measure baseline
free_before, total = torch.cuda.mem_get_info()
print(f"Before: {free_before / 1024**3:.1f} GB free / {total / 1024**3:.1f} GB total")
# Allocate 50 GB GPU tensor
t = torch.zeros(50 * 1024**3 // 4, dtype=torch.float32, device='cuda')
# Create IPC handle (this is what UCX does internally during ucp_mem_map)
handle = hsa_amd_ipc_memory_t()
status = hsa.hsa_amd_ipc_memory_create(
    ctypes.c_void_p(t.data_ptr()),
    ctypes.c_size_t(t.nelement() * t.element_size()),
    ctypes.byref(handle),
)
print(f"hsa_amd_ipc_memory_create status: {status}")
# Free everything
del t, handle
gc.collect()
torch.cuda.empty_cache()
# Measure after cleanup
free_after, _ = torch.cuda.mem_get_info()
leaked = (free_before - free_after) / 1024**3
print(f"After: {free_after / 1024**3:.1f} GB free")
print(f"Leaked: {leaked:.1f} GB")
# Expected: Leaked: ~50 GB (memory is permanently pinned)
Requested API
A function to release the IPC pin from the creating process, e.g.:
hsa_status_t hsa_amd_ipc_memory_destroy(void *ptr, size_t len);
or
hsa_status_t hsa_amd_ipc_memory_release(hsa_amd_ipc_memory_t *handle);
This would allow UCX (and other IPC consumers) to:
- Create IPC handles when needed for sharing
- Release them when the memory is no longer being shared
- Free the GPU memory normally afterward
Without this API, any GPU memory that has ever been IPC-shared is permanently pinned for the lifetime of the process.
Operating System
Ubuntu 22.04
GPU
MI355
ROCm Component
vLLM