Serialize DAC access and materialize unsafe enumerations#1473
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a single DAC serialization model (DacLibrary.SyncRoot → SOSDac.SyncRoot) and updates key DAC-backed helpers to (1) serialize stateful DAC entry points and (2) materialize previously lazy enumerations before returning, preventing concurrent DAC enumerations from corrupting each other and avoiding caching of partial results.
Changes:
- Added a shared DAC
SyncRootand routed SOS/DAC helpers through it to serialize stateful enumerations and cache fills. - Converted multiple DAC-backed
IEnumerableimplementations from lazy iterators to fully materialized collections created under the DAC lock. - Updated thread/heap/runtime cache population to be lock-protected to prevent publishing truncated results; updated stack walking to enforce
maxFramesinside the DAC walk.
Show a summary per file
| File | Description |
|---|---|
| src/Microsoft.Diagnostics.Runtime/DacLibrary.cs | Introduces SyncRoot used as the central DAC serialization lock. |
| src/Microsoft.Diagnostics.Runtime/DacInterface/SosDac.cs | Exposes SyncRoot and adjusts internal name caches to be eagerly allocated dictionaries. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacTypeHelpers.cs | Serializes DAC calls and materializes method/field enumeration results under the lock. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacThreadHelpers.cs | Serializes stack root/trace enumeration and materializes results; enforces maxFrames inside the walk. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacServiceProvider.cs | Adds locking for singleton service construction and serializes heap initialization/flush against the DAC lock. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacRuntime.cs | Serializes/materalizes threads/appdomains/handles/jit manager enumerations. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacNativeHeaps.cs | Serializes and materializes native heap traversals and region enumerations. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacModuleHelpers.cs | Serializes module/metadata access; passes DAC lock into metadata reader. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacMethodLocator.cs | Serializes method lookup helpers through SyncRoot. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacMetadataReader.cs | Serializes metadata import calls and materializes enumerations under the DAC lock. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacLegacyThreadPool.cs | Serializes threadpool DAC reads through SyncRoot. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacHeap.cs | Serializes/materalizes heap-related enumerations and validates method tables under the DAC lock. |
| src/Microsoft.Diagnostics.Runtime/DacImplementation/DacComHelpers.cs | Serializes COM-related DAC reads and materializes RCW cleanup enumeration. |
| src/Microsoft.Diagnostics.Runtime/ClrThread.cs | Serializes stack root/trace cache population and updates stack trace helper signature usage. |
| src/Microsoft.Diagnostics.Runtime/ClrRuntime.cs | Serializes runtime cache population and adds handle caching with lock protection. |
| src/Microsoft.Diagnostics.Runtime/ClrHeap.cs | Serializes heap cache population for sync blocks, allocation contexts, and dependent handles. |
| src/Microsoft.Diagnostics.Runtime/AbstractDac/IAbstractThreadHelpers.cs | Updates stack trace helper contract to include maxFrames enforced by the DAC walk. |
Copilot's findings
- Files reviewed: 17/17 changed files
- Comments generated: 3
Introduce and apply the DacLibrary/SOS SyncRoot model to stateful DAC entry points. Serialize service construction, flush, stack walks, runtime/heap/thread cache population, and the remaining DAC helper surfaces so partial native enumerations cannot race and be permanently cached. Materialize native heap and other DAC traversal results under the lock, and enforce stack maxFrames inside the DAC walk before the helper can keep walking past the caller's cap.
d43c438 to
c8355eb
Compare
| foreach (ClrHandleInfo info in GetDacRuntime().EnumerateHandles()) | ||
| builder.Add(new ClrHandle(this, info)); | ||
|
|
||
| _handles = builder.MoveOrCopyToImmutable(); |
There was a problem hiding this comment.
This wasn't previously cached - is caching it intentional?
There was a problem hiding this comment.
Yes, walking the handle table is expensive so I wanted to cache it. But this did remind me that there's a correctness issue with Flush. Using a volatile publish will fix this. It's ugly, but it's fast. I wish I hadn't chosen to use ImmutableArray so long ago.
| .Select(r => CreateClrStackFrame(r)) | ||
| .Take(maxFrames) | ||
| .ToArray(); | ||
| if (frames.Length < maxFrames) |
There was a problem hiding this comment.
I'm a bit confused on this line. Is the goal to only cache if frames is less than maxFrames? It can't be more than maxFrames because of the call to Take(maxFrames).
There was a problem hiding this comment.
This is correct because the if statement has frames.Length < maxFrames, not maxFrames < frames.Length. The result is we ONLY caches the result if we are sure we got the complete result, not if it was truncated.
For example, we requested 16 frames and got 8, we can safely cache it because we know the result is complete. If we requested 16 frames and got 16 back...we don't know if this was a full request or if it was truncated, so don't cache it...the caller may call back and request 32 frames and we don't want to hand back 16 frames not knowing if there were 17 in the list.
| yield return root; | ||
| ClrHeap heap = Runtime.Heap; | ||
| ClrStackFrame[] frames = GetFramesForRoots(); | ||
| return threadHelpers.EnumerateStackRoots(OSThreadId, traceErrors: IsAlive).Select(r => CreateClrStackRoot(heap, frames, r)).Where(r => r is not null)!; |
There was a problem hiding this comment.
Does the non-caching path need to be called under the _rootCacheLock as well?
There was a problem hiding this comment.
No, because the dac enumeration itself is properly serialized. The _rootCache lock is only to make sure we properly cache and publish a complete _rootCache.
| return _heapHelper = new DacHeap(_sos, _sos8, _sos12, _sos16, _dataReader, _dac.TargetProperties, data, mts); | ||
|
|
||
| return null; | ||
| lock (_serviceLock) |
There was a problem hiding this comment.
I'm noticing that _serviceLock is distinct from _dac.SyncRoot and _sos.SyncRoot. Just wanting to make sure that (a) we shouldn't be taking _sos.SyncRoot instead of _dac.SyncRoot since you call _sos.GetGCHeapData under the DAC SyncRoot, and (b) just wondering why you need an additional _serviceLock here.
There was a problem hiding this comment.
Service lock is intentional, as it's distinct for singleton publishing here.
_dac.SyncRoot and _sos.SyncRoot are the same object, so that isn't an issue, but it is confusing. I changed some of these over to make it more consistent in the code so it's easier to understand.
brianrob
left a comment
There was a problem hiding this comment.
Had a few questions below, but if all is good, feel free to merge.
_dac.SyncRoot and _sos.SyncRoot are the same object (SOSDac.SyncRoot => DacLibrary.SyncRoot, and _dac is that same DacLibrary), so locking the GC-heap init and Flush on _dac.SyncRoot was already correct. Switch both sites to _sos.SyncRoot to match the convention used by every other DAC helper and remove reviewer confusion about mixed lock names. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The lock-free double-checked reads of the _threads and _handles caches read a non-volatile ImmutableArray<T> field. The struct is a single array reference so the read is atomic, but the fast path has no acquire barrier (unlike the volatile _heap/_domainAndModules fields beside it), so on weak memory models a reader could observe the published reference before the array contents. ImmutableArray<T> can't be volatile and ImmutableCollectionsMarshal isn't available on netstandard2.0, so gate the fast-path read behind a volatile published flag (release on write, acquire on read). FlushCachedData clears the flag before resetting the array. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. EnumerateHandles / EnumerateDependentHandles: snapshot only the raw HandleData under SyncRoot, then resolve object pointers (IDataReader/IMemoryReader.ReadPointer, not DAC calls) and build results outside the lock, so a huge/slow handle table no longer holds the global DAC lock and starves all other DAC consumers. (Public ClrRuntime handle cache is intentionally kept - Flush-invalidated, same as Threads - the reliability issue was the lock-hold, now fixed.) 2. DacServiceProvider.Dispose: dispose the native/COM wrappers under _sos.SyncRoot so teardown can't race an in-flight DAC read. 3. ClrRuntime.FlushCachedData: reset the runtime caches under _runtimeCacheLock so a mid-population builder can't republish pre-flush data after the reset. 4. DacServiceProvider service singletons: mark the lock-free-read fields volatile (acquire on the fast path), same publication fix as _threads/_handles. 5. DacRuntime ctor: issue the DacVersion request under _sos.SyncRoot instead of leaving a DAC entry point uncovered during lazy service creation. 6. Enforce MaxThreads/MaxAppDomains/MaxModules at the DAC materialization boundary (EnumerateThreads/EnumerateAppDomains/GetModuleList now take a maxCount) so a corrupt/hostile dump can't drive unbounded materialization before the caller-side limit applies. All builds clean (net10.0 + netstandard2.0); 595 thread/handle/root/appdomain/ module/runtime tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Introduces and applies a single DAC serialization model for stateful DAC entry points.
The DAC is not safe to call concurrently across independent helpers/enumerators. This PR adds and applies the
DacLibrary/SOSDacSyncRootconvention so DAC calls that can race native state are serialized and results are materialized before returning to callers.Included here:
SyncRootplumbing and documentationmaxFramesenforced inside the DAC walkSyncRootPart of #420.
Validation
dotnet build src/Microsoft.Diagnostics.Runtime/Microsoft.Diagnostics.Runtime.csproj --framework net10.0-- passed with 0 warnings and 0 errors.lockfree-mmf-threadsafe; all non-stress paths matched the original combined branch.