Skip to content

Fix sampling profiler race when registered thread exits#1995

Open
antonis wants to merge 1 commit into
facebook:static_hfrom
antonis:fix-sampling-profiler-pthread-kill-1853
Open

Fix sampling profiler race when registered thread exits#1995
antonis wants to merge 1 commit into
facebook:static_hfrom
antonis:fix-sampling-profiler-pthread-kill-1853

Conversation

@antonis
Copy link
Copy Markdown

@antonis antonis commented Apr 23, 2026

Summary

Fix a crash in the sampling profiler where pthread_kill is invoked on a thread that has exited but whose corresponding SamplingProfiler is still registered. On Android bionic this aborts the process with invalid pthread_t 0x... passed to pthread_kill; on other POSIX platforms the sampler silently signals a stale thread ID.

Root cause: thread termination was never synchronized with the sampling path. If a JS thread exited while a host held the Runtime alive, the sampler would still attempt to sample it.

Fix: register each profiler with a thread-local ThreadDeathGuard backed by pthread_key_create. Its destructor, which runs before the pthread_t is reaped, invalidates the registered thread under the existing runtimeDataLock_. A shared_ptr-based ProfilerHandle keeps the synchronization state alive across either-end destruction, so there is no UAF if the profiler is destroyed on a different thread than the one that registered it. The sampler now skips any profiler whose registered thread has exited, and also returns cleanly on a non-fatal ESRCH from pthread_kill on platforms where that window is observable.

Related: #1853 and getsentry/sentry-react-native#5441.

Test Plan

  • New unit test SamplingProfilerTest.SamplingAfterRegisteredThreadExitDoesNotCrash reproduces the scenario deterministically on all POSIX platforms and asserts that the registered thread is invalidated after std::thread::join().
  • Existing SamplingProfilerTest suite continues to pass.
  • Local build and test run on macOS arm64 (Xcode 16.4, ninja) against static_h:
    $ ninja -C build HermesVMRuntimeTests
    [484/484] Linking CXX executable unittests/VMRuntime/HermesVMRuntimeTests
    
    $ ./build/unittests/VMRuntime/HermesVMRuntimeTests --gtest_filter='SamplingProfilerTest.*'
    [==========] Running 4 tests from 1 test case.
    [----------] 4 tests from SamplingProfilerTest
    [ RUN      ] SamplingProfilerTest.Invariants
    [       OK ] SamplingProfilerTest.Invariants (96 ms)
    [ RUN      ] SamplingProfilerTest.RegisterDifferentThread
    [       OK ] SamplingProfilerTest.RegisterDifferentThread (29 ms)
    [ RUN      ] SamplingProfilerTest.RegisterIdenticalThread
    [       OK ] SamplingProfilerTest.RegisterIdenticalThread (30 ms)
    [ RUN      ] SamplingProfilerTest.SamplingAfterRegisteredThreadExitDoesNotCrash
    [       OK ] SamplingProfilerTest.SamplingAfterRegisteredThreadExitDoesNotCrash (85 ms)
    [==========] 4 tests from 1 test case ran. (240 ms total)
    [  PASSED  ] 4 tests.
    
    $ ./build/unittests/VMRuntime/HermesVMRuntimeTests
    [==========] 351 tests from 79 test cases ran.
    [  PASSED  ] 351 tests.
    

@meta-cla meta-cla Bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Apr 23, 2026
@antonis antonis force-pushed the fix-sampling-profiler-pthread-kill-1853 branch from 50668b4 to 7443718 Compare April 24, 2026 08:37
@antonis antonis marked this pull request as ready for review April 24, 2026 08:52
@tmikov
Copy link
Copy Markdown
Contributor

tmikov commented Apr 25, 2026

Hi, the main branch is not actively maintained, so no guarantees about when this can be reviewed or merged. Does the problem exist in V1?

If a thread that registered a SamplingProfiler exits while the profiler
instance lives on (for example when the host keeps the Runtime alive
after the JS thread is gone), the sampling timer thread can call
pthread_kill on a stale pthread_t. On Android bionic this aborts the
process with "invalid pthread_t 0x... passed to pthread_kill"; on other
POSIX platforms it silently signals a stale thread ID.

The sampler never synchronized with thread termination, so there was no
way to know the registered thread had gone away.

Register each profiler with a thread-local ThreadDeathGuard backed by
pthread_key_create. Its destructor runs before the pthread_t is reaped
and, under the existing runtimeDataLock_, invalidates the profiler's
registered thread so subsequent sample attempts skip it cleanly. A
shared_ptr-based ProfilerHandle keeps the synchronization state alive
across either-end destruction, so there is no UAF if the profiler is
destroyed on a different thread than the one that registered it.

The sampler now checks for the invalidated registration before calling
pthread_kill and also handles an ESRCH return for platforms that return
it instead of aborting.

Related: facebook#1853 and getsentry/sentry-react-native#5441.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@antonis antonis force-pushed the fix-sampling-profiler-pthread-kill-1853 branch from 7443718 to 90501e6 Compare April 28, 2026 09:57
@antonis antonis changed the base branch from main to static_h April 28, 2026 09:57
@antonis antonis marked this pull request as draft April 28, 2026 09:57
@antonis
Copy link
Copy Markdown
Author

antonis commented Apr 28, 2026

Thanks for the heads-up. Confirmed the same code path exists in static_h (same pthread_kill(posixProfiler->currentThread_, SIGPROF) call, no thread-death synchronization), so I've retargeted this PR to static_h and force-pushed 🙇

@antonis antonis marked this pull request as ready for review April 28, 2026 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed Do not delete this pull request or issue due to inactivity.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants