Skip to content

Serialize DAC access and materialize unsafe enumerations#1473

Open
leculver wants to merge 4 commits into
microsoft:mainfrom
leculver:split/dg-dac-serialization
Open

Serialize DAC access and materialize unsafe enumerations#1473
leculver wants to merge 4 commits into
microsoft:mainfrom
leculver:split/dg-dac-serialization

Conversation

@leculver

@leculver leculver commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Introduces and applies a single DAC serialization model for stateful DAC entry points.

The DAC is not safe to call concurrently across independent helpers/enumerators. This PR adds and applies the DacLibrary/SOSDac SyncRoot convention so DAC calls that can race native state are serialized and results are materialized before returning to callers.

Included here:

  • shared DAC/SOS SyncRoot plumbing and documentation
  • service singleton construction and DAC flush serialization
  • heap service initialization under the DAC lock
  • runtime/heap/thread cache population protection so partial native enumerations cannot be permanently cached
  • stack walking materialized under the DAC lock with maxFrames enforced inside the DAC walk
  • remaining DAC helper surfaces serialized through SyncRoot
  • native heap traversal materialized under the lock instead of returning lazy DAC-backed enumerables

Part of #420.

Validation

  • Built as part of the split branch verification: dotnet build src/Microsoft.Diagnostics.Runtime/Microsoft.Diagnostics.Runtime.csproj --framework net10.0 -- passed with 0 warnings and 0 errors.
  • This split branch was recombined with the other reduced branches and compared against lockfree-mmf-threadsafe; all non-stress paths matched the original combined branch.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a single DAC serialization model (DacLibrary.SyncRootSOSDac.SyncRoot) and updates key DAC-backed helpers to (1) serialize stateful DAC entry points and (2) materialize previously lazy enumerations before returning, preventing concurrent DAC enumerations from corrupting each other and avoiding caching of partial results.

Changes:

  • Added a shared DAC SyncRoot and routed SOS/DAC helpers through it to serialize stateful enumerations and cache fills.
  • Converted multiple DAC-backed IEnumerable implementations from lazy iterators to fully materialized collections created under the DAC lock.
  • Updated thread/heap/runtime cache population to be lock-protected to prevent publishing truncated results; updated stack walking to enforce maxFrames inside the DAC walk.
Show a summary per file
File Description
src/Microsoft.Diagnostics.Runtime/DacLibrary.cs Introduces SyncRoot used as the central DAC serialization lock.
src/Microsoft.Diagnostics.Runtime/DacInterface/SosDac.cs Exposes SyncRoot and adjusts internal name caches to be eagerly allocated dictionaries.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacTypeHelpers.cs Serializes DAC calls and materializes method/field enumeration results under the lock.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacThreadHelpers.cs Serializes stack root/trace enumeration and materializes results; enforces maxFrames inside the walk.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacServiceProvider.cs Adds locking for singleton service construction and serializes heap initialization/flush against the DAC lock.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacRuntime.cs Serializes/materalizes threads/appdomains/handles/jit manager enumerations.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacNativeHeaps.cs Serializes and materializes native heap traversals and region enumerations.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacModuleHelpers.cs Serializes module/metadata access; passes DAC lock into metadata reader.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacMethodLocator.cs Serializes method lookup helpers through SyncRoot.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacMetadataReader.cs Serializes metadata import calls and materializes enumerations under the DAC lock.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacLegacyThreadPool.cs Serializes threadpool DAC reads through SyncRoot.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacHeap.cs Serializes/materalizes heap-related enumerations and validates method tables under the DAC lock.
src/Microsoft.Diagnostics.Runtime/DacImplementation/DacComHelpers.cs Serializes COM-related DAC reads and materializes RCW cleanup enumeration.
src/Microsoft.Diagnostics.Runtime/ClrThread.cs Serializes stack root/trace cache population and updates stack trace helper signature usage.
src/Microsoft.Diagnostics.Runtime/ClrRuntime.cs Serializes runtime cache population and adds handle caching with lock protection.
src/Microsoft.Diagnostics.Runtime/ClrHeap.cs Serializes heap cache population for sync blocks, allocation contexts, and dependent handles.
src/Microsoft.Diagnostics.Runtime/AbstractDac/IAbstractThreadHelpers.cs Updates stack trace helper contract to include maxFrames enforced by the DAC walk.

Copilot's findings

  • Files reviewed: 17/17 changed files
  • Comments generated: 3

Comment thread src/Microsoft.Diagnostics.Runtime/ClrRuntime.cs Outdated
Introduce and apply the DacLibrary/SOS SyncRoot model to stateful DAC entry points. Serialize service construction, flush, stack walks, runtime/heap/thread cache population, and the remaining DAC helper surfaces so partial native enumerations cannot race and be permanently cached. Materialize native heap and other DAC traversal results under the lock, and enforce stack maxFrames inside the DAC walk before the helper can keep walking past the caller's cap.
@leculver leculver force-pushed the split/dg-dac-serialization branch from d43c438 to c8355eb Compare June 3, 2026 22:04
foreach (ClrHandleInfo info in GetDacRuntime().EnumerateHandles())
builder.Add(new ClrHandle(this, info));

_handles = builder.MoveOrCopyToImmutable();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't previously cached - is caching it intentional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, walking the handle table is expensive so I wanted to cache it. But this did remind me that there's a correctness issue with Flush. Using a volatile publish will fix this. It's ugly, but it's fast. I wish I hadn't chosen to use ImmutableArray so long ago.

.Select(r => CreateClrStackFrame(r))
.Take(maxFrames)
.ToArray();
if (frames.Length < maxFrames)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused on this line. Is the goal to only cache if frames is less than maxFrames? It can't be more than maxFrames because of the call to Take(maxFrames).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct because the if statement has frames.Length < maxFrames, not maxFrames < frames.Length. The result is we ONLY caches the result if we are sure we got the complete result, not if it was truncated.

For example, we requested 16 frames and got 8, we can safely cache it because we know the result is complete. If we requested 16 frames and got 16 back...we don't know if this was a full request or if it was truncated, so don't cache it...the caller may call back and request 32 frames and we don't want to hand back 16 frames not knowing if there were 17 in the list.

yield return root;
ClrHeap heap = Runtime.Heap;
ClrStackFrame[] frames = GetFramesForRoots();
return threadHelpers.EnumerateStackRoots(OSThreadId, traceErrors: IsAlive).Select(r => CreateClrStackRoot(heap, frames, r)).Where(r => r is not null)!;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the non-caching path need to be called under the _rootCacheLock as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the dac enumeration itself is properly serialized. The _rootCache lock is only to make sure we properly cache and publish a complete _rootCache.

return _heapHelper = new DacHeap(_sos, _sos8, _sos12, _sos16, _dataReader, _dac.TargetProperties, data, mts);

return null;
lock (_serviceLock)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm noticing that _serviceLock is distinct from _dac.SyncRoot and _sos.SyncRoot. Just wanting to make sure that (a) we shouldn't be taking _sos.SyncRoot instead of _dac.SyncRoot since you call _sos.GetGCHeapData under the DAC SyncRoot, and (b) just wondering why you need an additional _serviceLock here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Service lock is intentional, as it's distinct for singleton publishing here.

_dac.SyncRoot and _sos.SyncRoot are the same object, so that isn't an issue, but it is confusing. I changed some of these over to make it more consistent in the code so it's easier to understand.

brianrob
brianrob previously approved these changes Jun 11, 2026

@brianrob brianrob left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a few questions below, but if all is good, feel free to merge.

leculver and others added 2 commits June 12, 2026 07:47
_dac.SyncRoot and _sos.SyncRoot are the same object (SOSDac.SyncRoot =>
DacLibrary.SyncRoot, and _dac is that same DacLibrary), so locking the
GC-heap init and Flush on _dac.SyncRoot was already correct. Switch both
sites to _sos.SyncRoot to match the convention used by every other DAC
helper and remove reviewer confusion about mixed lock names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The lock-free double-checked reads of the _threads and _handles caches
read a non-volatile ImmutableArray<T> field. The struct is a single array
reference so the read is atomic, but the fast path has no acquire barrier
(unlike the volatile _heap/_domainAndModules fields beside it), so on weak
memory models a reader could observe the published reference before the
array contents. ImmutableArray<T> can't be volatile and
ImmutableCollectionsMarshal isn't available on netstandard2.0, so gate the
fast-path read behind a volatile published flag (release on write, acquire
on read). FlushCachedData clears the flag before resetting the array.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. EnumerateHandles / EnumerateDependentHandles: snapshot only the raw
   HandleData under SyncRoot, then resolve object pointers
   (IDataReader/IMemoryReader.ReadPointer, not DAC calls) and build results
   outside the lock, so a huge/slow handle table no longer holds the global
   DAC lock and starves all other DAC consumers. (Public ClrRuntime handle
   cache is intentionally kept - Flush-invalidated, same as Threads - the
   reliability issue was the lock-hold, now fixed.)
2. DacServiceProvider.Dispose: dispose the native/COM wrappers under
   _sos.SyncRoot so teardown can't race an in-flight DAC read.
3. ClrRuntime.FlushCachedData: reset the runtime caches under
   _runtimeCacheLock so a mid-population builder can't republish pre-flush
   data after the reset.
4. DacServiceProvider service singletons: mark the lock-free-read fields
   volatile (acquire on the fast path), same publication fix as _threads/_handles.
5. DacRuntime ctor: issue the DacVersion request under _sos.SyncRoot instead
   of leaving a DAC entry point uncovered during lazy service creation.
6. Enforce MaxThreads/MaxAppDomains/MaxModules at the DAC materialization
   boundary (EnumerateThreads/EnumerateAppDomains/GetModuleList now take a
   maxCount) so a corrupt/hostile dump can't drive unbounded materialization
   before the caller-side limit applies.

All builds clean (net10.0 + netstandard2.0); 595 thread/handle/root/appdomain/
module/runtime tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants