Description:
rccl-UnitTests fails in the Register test suite when NCCL_LOCAL_REGISTER is disabled. The enabled variants pass, but three *_Disabled isolated tests fail because a registration handle that is expected to be NULL is non-NULL.
Tests Failed:
CommRegisterDeregister_Disabled
MultipleBufferRegistration_Disabled
VariableSizeBuffers_Disabled
Error snippets:
2026-04-06T12:14:59.9595502Z Expected equality of these values:
2026-04-06T12:14:59.9595973Z regHandles[i]
2026-04-06T12:14:59.9596365Z Which is: 0x794820a03c50
2026-04-06T12:14:59.9596767Z nullptr
2026-04-06T12:14:59.9597117Z Which is: (nullptr)
2026-04-06T12:14:59.9597723Z Expected NULL handle for buffer 3
2026-04-06T12:14:59.9598061Z
2026-04-06T12:14:59.9598694Z [ INFO ] Test 'MultipleBufferRegistration_Disabled' (PID: 5247) FAILED with exit code 1 after 2955 ms
2026-04-06T12:14:59.9600142Z [ INFO ] Running isolated test 'MultipleBufferRegistration_Enabled' (PID: 5259) with env: NCCL_LOCAL_REGISTER=1
2026-04-06T12:15:02.1466077Z [12:15:02Z] Mem: 96.0/3023.4GB (3%) | CPU: 0% | Jobs: ~1/384 | Disk: 784GB free
2026-04-06T12:15:02.9117231Z [ INFO ] Test 'MultipleBufferRegistration_Enabled' PASSED (2953 ms)
2026-04-06T12:15:02.9126660Z [ INFO ] Running isolated test 'VariableSizeBuffers_Disabled' (PID: 5272)
2026-04-06T12:15:05.8646421Z /__w/TheRock/TheRock/rocm-systems/projects/rccl/test/RegisterTests.cpp:164: Failure
2026-04-06T12:15:11.7703016Z [ INFO ] Process-Isolated Tests: 4 passed, 3 failed, 0 skipped (20669 ms total)
2026-04-06T12:15:11.7704030Z [ INFO ] Failed: CommRegisterDeregister_Disabled - Test failed with exit code 1
2026-04-06T12:15:11.7705069Z [ INFO ] Failed: MultipleBufferRegistration_Disabled - Test failed with exit code 1
2026-04-06T12:15:11.7706003Z [ INFO ] Failed: VariableSizeBuffers_Disabled - Test failed with exit code 1
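To make the failing expectation concrete, the following is a minimal C++ sketch of the check that the *_Disabled tests appear to perform, based only on the assertion text in the log above. `localRegisterEnabled`, `registerBufferModel`, and `firstUnexpectedHandle` are hypothetical names. They model the expected contract (when NCCL_LOCAL_REGISTER is "0", registration is a no-op and every handle stays NULL) and do not call or reproduce the real RCCL registration API. Treating an unset variable as "enabled" is also an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper, NOT part of RCCL: returns true unless
// NCCL_LOCAL_REGISTER is explicitly set to "0". Treating an unset
// variable as "enabled" is an assumption of this sketch.
bool localRegisterEnabled() {
  const char* v = std::getenv("NCCL_LOCAL_REGISTER");
  return v == nullptr || std::strcmp(v, "0") != 0;
}

// Hypothetical stand-in for a buffer-registration call. When local
// registration is disabled it leaves *handle as nullptr, which is what
// the *_Disabled tests assert for every buffer.
void registerBufferModel(void* buff, std::size_t size, void** handle) {
  (void)size;  // size is not used by this simplified model
  *handle = nullptr;
  if (!localRegisterEnabled()) return;  // disabled: handle must stay NULL
  *handle = buff;  // placeholder: any non-NULL value means "registered"
}

// Mirrors the failing check: registers each buffer and returns the index
// of the first non-NULL handle, or -1 if every handle is NULL.
int firstUnexpectedHandle(std::vector<std::vector<char>>& buffers) {
  std::vector<void*> regHandles(buffers.size(), nullptr);
  for (std::size_t i = 0; i < buffers.size(); ++i) {
    registerBufferModel(buffers[i].data(), buffers[i].size(), &regHandles[i]);
    if (regHandles[i] != nullptr) return static_cast<int>(i);
  }
  return -1;
}
```

With NCCL_LOCAL_REGISTER=0, `firstUnexpectedHandle` should return -1 for any set of non-empty buffers; a non-negative return (for example 3) corresponds to the "Expected NULL handle for buffer 3" failure seen in the log.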
Full Logs:
Impact:
These failures were detected in the Rock - Multiarch CI check for the rocm-libraries submodule bump: Bump rocm-libraries from f000f77 to a9ff799.
They are currently blocking its promotion.