RUM-16442: Fix profiler safe memory reads#2967
Conversation
This comment has been minimized.
This comment has been minimized.
c49f01f to
678a9f1
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 678a9f1e2a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const size_t region_size = std::min( | ||
| current_frame_size + (static_cast<size_t>(max_depth) * STACK_REGION_FRAME_SIZE_ESTIMATE), | ||
| STACK_REGION_MAX_READ |
There was a problem hiding this comment.
Avoid truncating stacks at the heuristic read window
When a sampled thread has any caller frame more than about max_depth * 128 bytes above the current frame pointer (or a current frame larger than the 64 KB cap), the saved FP/PC pair remains mapped but lies outside bytes_read, so walk_frames_in_buffer stops after the leaf frames. The previous per-frame read followed valid frame pointers up to max_depth; this heuristic window silently drops the rest of otherwise valid stacks for code with large stack allocations, biasing collected profiles rather than just degrading on invalid memory.
Useful? React with 👍 / 👎.
| STACK_REGION_MAX_READ | ||
| ); | ||
|
|
||
| bytes_read = read_stack_region(sp, stack_buf, region_size); |
There was a problem hiding this comment.
Do not call Mach VM reads while the target thread is suspended
In the default all-thread mode this call runs from sample_thread after thread_suspend, despite that critical section explicitly warning to avoid system calls that may acquire locks. If the sampled thread is suspended while it is in a VM operation (for example mmap/deallocation or another path holding the task VM map lock), vm_read_overwrite has to enter the same Mach VM machinery and can block forever, leaving the target thread suspended and hanging the app; the old direct read avoided introducing a Mach syscall in this suspended region.
Useful? React with 👍 / 👎.
678a9f1 to
b532078
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b5320784f8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| if (has_thread_state) { | ||
| stack_trace_sample_context(&trace, fp, pc, sp, config.max_stack_depth); |
There was a problem hiding this comment.
Keep the stack snapshot synchronized with the captured registers
When the sampled thread resumes before stack_trace_sample_context copies the stack, any thread that returns from the captured frame or pushes new frames in that gap is no longer guaranteed to have stack memory matching the suspended fp/pc/sp values. The walker then combines a PC from the suspended instant with caller frames from later execution (or stops early after the stack is reused), which can corrupt profiler samples for actively running threads rather than representing a consistent stack.
Useful? React with 👍 / 👎.
fuzzybinary
left a comment
There was a problem hiding this comment.
This LGTM other than some minor nits about style, but I'm trusting your tests exercise this properly as I'm a little out of my depth on the specifics of stack walking for profiling.
| const uintptr_t address = reinterpret_cast<uintptr_t>(addr); | ||
| if (address > UINTPTR_MAX - size) { | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Is this supposed to be a check on addr or on buf? Or should it check both? I would think since addr is the read address, if were going to pass UINTPTR_MAX than it would just return the amount read when calling vm_read_overwirte.
| } | ||
|
|
||
| total_read += bytes_read; | ||
| if (total_read == size || current_address > UINTPTR_MAX - bytes_read) { |
There was a problem hiding this comment.
I'm confused about this check:
current_address > UINTPTR_MAX - bytes_read
If this is true, doesn't that mean read_memory_region has already read past UINTPTR_MAX?
| size_t bytes_read, | ||
| void* next_frame[2] | ||
| ) { | ||
| if (stack_base == 0 || stack_buf == nullptr || bytes_read < sizeof(void*[2]) || fp < stack_base) { |
There was a problem hiding this comment.
if you can uses constexpr it might be worth factoring out the repeated sizeof checks:
constexpr size_t read_size = sizeof(void*[2]);There was a problem hiding this comment.
Actually, this might be a larger refactor, but I'm seeing a lot of use of void*[2] -- refactoring this into a struct would make this easier to read with no impact on overhead:
struct Frame {
void* next_frame_pointer;
void* return _addresss;
}Maybe worth a the change sometime in the future.
What and why?
Fixes a crash in the iOS Mach profiler caused by reading stale or invalid frame pointers during stack unwinding.
The previous safe-read path, introduced in #2605, relied on faulting memory access recovery but invalid stack memory could still terminate the app with
EXC_BAD_ACCESS/SIGSEGV.This PR makes stack memory reads non-faulting so the profiler degrades safely instead of crashing.
How?
Replaces direct memory reads during stack sampling with
vm_read_overwrite.The profiler now first tries to read the expected stack region in one batch for performance. If that read fails, it falls back to page-by-page reads from the stack pointer and keeps the readable prefix, allowing partial stack recovery without touching invalid memory.
Review checklist
make api-surfacewhen adding new APIs