Skip to content

RUM-16442: Fix profiler safe memory reads#2967

Open
simaoseica-dd wants to merge 1 commit into
feature/profilingfrom
simaoseica/RUM-16442/use-vm_read_overwrite
Open

RUM-16442: Fix profiler safe memory reads#2967
simaoseica-dd wants to merge 1 commit into
feature/profilingfrom
simaoseica/RUM-16442/use-vm_read_overwrite

Conversation

@simaoseica-dd

@simaoseica-dd simaoseica-dd commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

What and why?

Fixes a crash in the iOS Mach profiler caused by reading stale or invalid frame pointers during stack unwinding.

The previous safe-read path, introduced in #2605, relied on faulting memory access recovery but invalid stack memory could still terminate the app with EXC_BAD_ACCESS / SIGSEGV.
This PR makes stack memory reads non-faulting so the profiler degrades safely instead of crashing.

How?

Replaces direct memory reads during stack sampling with vm_read_overwrite.
The profiler now first tries to read the expected stack region in one batch for performance. If that read fails, it falls back to page-by-page reads from the stack pointer and keeps the readable prefix, allowing partial stack recovery without touching invalid memory.

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes
  • Add Objective-C interface for public APIs - see our guidelines (internal)
  • Run make api-surface when adding new APIs

@datadog-prod-us1-5

This comment has been minimized.

@simaoseica-dd simaoseica-dd force-pushed the simaoseica/RUM-16442/use-vm_read_overwrite branch 2 times, most recently from c49f01f to 678a9f1 Compare June 3, 2026 10:33
@simaoseica-dd simaoseica-dd marked this pull request as ready for review June 3, 2026 14:28
@simaoseica-dd simaoseica-dd requested review from a team as code owners June 3, 2026 14:28

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 678a9f1e2a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +426 to +428
const size_t region_size = std::min(
current_frame_size + (static_cast<size_t>(max_depth) * STACK_REGION_FRAME_SIZE_ESTIMATE),
STACK_REGION_MAX_READ

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid truncating stacks at the heuristic read window

When a sampled thread has any caller frame more than about max_depth * 128 bytes above the current frame pointer (or a current frame larger than the 64 KB cap), the saved FP/PC pair remains mapped but lies outside bytes_read, so walk_frames_in_buffer stops after the leaf frames. The previous per-frame read followed valid frame pointers up to max_depth; this heuristic window silently drops the rest of otherwise valid stacks for code with large stack allocations, biasing collected profiles rather than just degrading on invalid memory.

Useful? React with 👍 / 👎.

STACK_REGION_MAX_READ
);

bytes_read = read_stack_region(sp, stack_buf, region_size);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Do not call Mach VM reads while the target thread is suspended

In the default all-thread mode this call runs from sample_thread after thread_suspend, despite that critical section explicitly warning to avoid system calls that may acquire locks. If the sampled thread is suspended while it is in a VM operation (for example mmap/deallocation or another path holding the task VM map lock), vm_read_overwrite has to enter the same Mach VM machinery and can block forever, leaving the target thread suspended and hanging the app; the old direct read avoided introducing a Mach syscall in this suspended region.

Useful? React with 👍 / 👎.

@simaoseica-dd simaoseica-dd force-pushed the simaoseica/RUM-16442/use-vm_read_overwrite branch from 678a9f1 to b532078 Compare June 3, 2026 16:28

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5320784f8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

if (has_thread_state) {
stack_trace_sample_context(&trace, fp, pc, sp, config.max_stack_depth);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep the stack snapshot synchronized with the captured registers

When the sampled thread resumes before stack_trace_sample_context copies the stack, any thread that returns from the captured frame or pushes new frames in that gap is no longer guaranteed to have stack memory matching the suspended fp/pc/sp values. The walker then combines a PC from the suspended instant with caller frames from later execution (or stops early after the stack is reused), which can corrupt profiler samples for actively running threads rather than representing a consistent stack.

Useful? React with 👍 / 👎.

@fuzzybinary fuzzybinary left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM other than some minor nits about style, but I'm trusting your tests exercise this properly as I'm a little out of my depth on the specifics of stack walking for profiling.

Comment on lines +100 to +103
const uintptr_t address = reinterpret_cast<uintptr_t>(addr);
if (address > UINTPTR_MAX - size) {
return 0;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be a check on addr or on buf? Or should it check both? I would think since addr is the read address, if were going to pass UINTPTR_MAX than it would just return the amount read when calling vm_read_overwirte.

}

total_read += bytes_read;
if (total_read == size || current_address > UINTPTR_MAX - bytes_read) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about this check:

current_address > UINTPTR_MAX - bytes_read

If this is true, doesn't that mean read_memory_region has already read past UINTPTR_MAX?

size_t bytes_read,
void* next_frame[2]
) {
if (stack_base == 0 || stack_buf == nullptr || bytes_read < sizeof(void*[2]) || fp < stack_base) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can uses constexpr it might be worth factoring out the repeated sizeof checks:

constexpr size_t read_size = sizeof(void*[2]);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this might be a larger refactor, but I'm seeing a lot of use of void*[2] -- refactoring this into a struct would make this easier to read with no impact on overhead:

struct Frame {
  void* next_frame_pointer;
  void* return _addresss;
}

Maybe worth a the change sometime in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants