Problem Description
Problem
In non-attach mode, code object lifecycle is fully tracked:
- hsa_executable_freeze registers code objects in CodeobjTableTranslatorSynchronized
- executable_destroy removes entries from the translation table and flushes buffers
In attach mode, only the load path is covered:
- load_attach_code_objects() iterates already-loaded code objects via rocprofiler_attach_iterate_all_code_objects
- notify_new_code_object is chained for newly loaded code objects
There is no corresponding unload/destroy hook in attach mode. RocAttachDispatchTable does not expose a notify_destroy_code_object callback, so there is no way to be notified when a code object is unloaded while the profiler is attached.
This affects both the main code_object module and the pc_sampling::code_object module in attach/detach mode.
Impact
In the context of PC sampling, when a code object is destroyed while the profiler is still attached, its entry stays in the translation table as a stale entry. This can lead to incorrect PC sample decoding:
- Profiler attaches and iterates loaded code objects
- A code object is unloaded, but its entry is not removed from the translation table
- A new code object is loaded at a similar or overlapping address range
- PC samples from the new code object are decoded against the stale entry from the old code object
- This produces incorrect code object ID mappings and wrong results in the PC sampling decoder/parser
The risk is lower when the attach interval is short and the profiler detaches before any code objects are unloaded. However, this is not guaranteed in general usage.
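The sequence listed above can be reproduced with a toy model. This is a minimal sketch, not the real CodeobjTableTranslatorSynchronized: ToyTranslationTable, ToyEntry, replay, the addresses, and the first-match lookup rule are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for one translation table entry.
struct ToyEntry {
    std::uint64_t id;     // code object ID
    std::uint64_t begin;  // load address
    std::uint64_t size;   // size of the loaded range in bytes
};

// Hypothetical table; the real translator is not modeled here.
class ToyTranslationTable {
public:
    void add(std::uint64_t id, std::uint64_t begin, std::uint64_t size) {
        assert(size > 0);
        entries_.push_back({id, begin, size});
    }

    // Removes every entry with the given ID; returns true if one existed.
    bool remove(std::uint64_t id) {
        auto it = std::remove_if(entries_.begin(), entries_.end(),
                                 [id](const ToyEntry& e) { return e.id == id; });
        const bool found = it != entries_.end();
        entries_.erase(it, entries_.end());
        return found;
    }

    // Assumption: the first entry whose range contains pc wins.
    std::optional<std::uint64_t> decode(std::uint64_t pc) const {
        for (const auto& e : entries_) {
            if (pc >= e.begin && pc - e.begin < e.size) return e.id;
        }
        return std::nullopt;
    }

private:
    std::vector<ToyEntry> entries_;
};

// Replays the scenario: object 1 is loaded and unloaded, then object 2 is
// loaded at an overlapping range, and a sample from object 2 is decoded.
std::optional<std::uint64_t> replay(bool unload_is_reported) {
    ToyTranslationTable table;
    table.add(1, 0x1000, 0x1000);              // seen while iterating at attach
    if (unload_is_reported) table.remove(1);   // only possible with a destroy hook
    table.add(2, 0x1800, 0x1000);              // reported via the load path
    return table.decode(0x1900);               // PC that belongs to object 2
}
```

In this model, replay(false) attributes the sample to the stale object 1, while replay(true) attributes it to object 2.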
Proposed Fix
Add a notify_destroy_code_object entry to RocAttachDispatchTable and hook into it the same way notify_new_code_object currently works for creation. Both the main code_object module and the pc_sampling::code_object module should handle this callback to remove stale entries and flush buffers on code object destruction during attach mode.
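A rough sketch of what the chaining could look like is below. The real RocAttachDispatchTable layout and callback signatures are not shown in this report, so ToyAttachDispatchTable, toy_code_object_cb_t, and everything in the toy_tool namespace are hypothetical. Only the two callback names come from the text above.

```cpp
#include <cstdint>
#include <set>

// Hypothetical callback signature; the real one may carry more data.
using toy_code_object_cb_t = void (*)(std::uint64_t code_object_id);

// Hypothetical stand-in for RocAttachDispatchTable.
struct ToyAttachDispatchTable {
    toy_code_object_cb_t notify_new_code_object     = nullptr;  // exists today
    toy_code_object_cb_t notify_destroy_code_object = nullptr;  // proposed addition
};

namespace toy_tool {

// Stand-in for the translation table: the set of live code object IDs.
std::set<std::uint64_t>& tracked() {
    static std::set<std::uint64_t> ids;
    return ids;
}

// Previously installed callbacks, kept so the chain is preserved.
toy_code_object_cb_t next_new     = nullptr;
toy_code_object_cb_t next_destroy = nullptr;

void on_new(std::uint64_t id) {
    tracked().insert(id);
    if (next_new) next_new(id);
}

void on_destroy(std::uint64_t id) {
    // A real handler would also flush buffers before dropping the entry.
    tracked().erase(id);
    if (next_destroy) next_destroy(id);
}

// Chains both hooks in the same way: save the old pointer, install ours.
void install(ToyAttachDispatchTable& table) {
    next_new = table.notify_new_code_object;
    table.notify_new_code_object = &on_new;
    next_destroy = table.notify_destroy_code_object;
    table.notify_destroy_code_object = &on_destroy;
}

}  // namespace toy_tool
```

With this shape, the main code_object module and the pc_sampling::code_object module could each install their own handler pair, and an unload would reach both through the chain.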
Operating System
Ubuntu 22.04.5 LTS
CPU
AMD EPYC 9354 32-Core Processor
GPU
MI300X
ROCm Version
ROCm 7.2.0
ROCm Component
rocprofiler-sdk
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response