Fix large-page ESE tag-state parsing for Windows Server 2025 NTDS.dit (issue #1924)#2158
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes ESE page-header/tag-state parsing to support Windows Server 2025 NTDS.dit files using 32 KiB pages, preventing secretsdump.py from walking past the tag array and crashing.
Changes:
- Introduce masking/splitting of
FirstAvailablePageTagto derive an effective tag count on large pages. - Store derived
tagCount/tagReservedonESENT_PAGE. - Update tag iteration/slicing across page parsing code paths to use
tagCountinstead of the raw header field.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Logical node counts should be derived from the effective reserved-tag count | ||
| # instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved. | ||
| self.tagReserved = (self.record['FirstAvailablePageTag'] >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1 | ||
| self.tagCount = self.record['FirstAvailablePageTag'] & FIRST_AVAILABLE_PAGE_TAG_MASK |
There was a problem hiding this comment.
tagReserved is parsed for large pages but is never used when iterating tags (most loops still start at 1). If tagReserved can be > 1, the parser may still try to interpret reserved tags as records; consider iterating from tagReserved (or otherwise skipping reserved tags) when walking leaf/branch tags.
| # Logical node counts should be derived from the effective reserved-tag count | |
| # instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved. | |
| self.tagReserved = (self.record['FirstAvailablePageTag'] >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1 | |
| self.tagCount = self.record['FirstAvailablePageTag'] & FIRST_AVAILABLE_PAGE_TAG_MASK | |
| # Logical node counts should be derived from the effective reserved-tag count | |
| # instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved. | |
| raw_tag_field = self.record['FirstAvailablePageTag'] | |
| self.tagReserved = (raw_tag_field >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1 | |
| physicalTagCount = raw_tag_field & FIRST_AVAILABLE_PAGE_TAG_MASK | |
| # On large pages, adjust tagCount so it represents the logical node count (excluding all reserved tags). | |
| # When tagReserved == 1 (the legacy assumption), this reduces to the original behavior. | |
| self.tagCount = physicalTagCount - (self.tagReserved - 1) |
There was a problem hiding this comment.
Intentional. This PR fixes the confirmed #1924 crash by correcting the large-page
tag count parsing. tagReserved is modeled for parity with dissect.esedb, but
I am not changing logical-node traversal without a sample exposing effective
tagReserved > 1, since that would require a broader change than this bug fix.
| for i in range(self.record['FirstAvailablePageTag']): | ||
| for i in range(self.tagCount): | ||
| tag = tags[-4:] | ||
| if self.__DBHeader['Version'] == 0x620 and self.__DBHeader['FileFormatRevision'] > 11 and self.__DBHeader['PageSize'] > 8192: |
There was a problem hiding this comment.
In dump(), the large-page tag decoding check uses FileFormatRevision > 11, but elsewhere the Windows 7+ boundary is treated as >= 0x11 / >= 17 (see getTag() and ESENT_PAGE_HEADER). To keep behavior consistent and avoid applying the large-page decoding to revisions 0x0c–0x10, update this condition to match the same threshold used elsewhere.
| if self.__DBHeader['Version'] == 0x620 and self.__DBHeader['FileFormatRevision'] > 11 and self.__DBHeader['PageSize'] > 8192: | |
| if self.__DBHeader['Version'] == 0x620 and self.__DBHeader['FileFormatRevision'] >= 0x11 and self.__DBHeader['PageSize'] > 8192: |
There was a problem hiding this comment.
I agree the condition is inconsistent with the rest of the module, but it is outside the crash path fixed here and I do not have a sample showing that revisions 0x0c..0x10 are mishandled by dump(). I’d prefer to keep this PR scoped
to the reproducible
| self.tagCount = self.record['FirstAvailablePageTag'] | ||
| if self.__DBHeader['FileFormatRevision'] >= 0x11 and self.__DBHeader['PageSize'] > 8192: | ||
| # TODO: The upper 4 bits may encode how many leading tags are reserved on large pages. | ||
| # Logical node counts should be derived from the effective reserved-tag count | ||
| # instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved. | ||
| self.tagReserved = (self.record['FirstAvailablePageTag'] >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1 | ||
| self.tagCount = self.record['FirstAvailablePageTag'] & FIRST_AVAILABLE_PAGE_TAG_MASK |
There was a problem hiding this comment.
This change fixes a specific crash/regression for 32 KiB pages by masking FirstAvailablePageTag to 12 bits. Please add a regression test (unit-level if possible) that builds/parses a page header where FirstAvailablePageTag has high bits set (e.g., 0x100c) and asserts tagCount == 0x000c and tag iteration does not raise.
| self.tagCount = self.record['FirstAvailablePageTag'] | ||
| if self.__DBHeader['FileFormatRevision'] >= 0x11 and self.__DBHeader['PageSize'] > 8192: | ||
| # TODO: The upper 4 bits may encode how many leading tags are reserved on large pages. | ||
| # Logical node counts should be derived from the effective reserved-tag count | ||
| # instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved. | ||
| self.tagReserved = (self.record['FirstAvailablePageTag'] >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1 | ||
| self.tagCount = self.record['FirstAvailablePageTag'] & FIRST_AVAILABLE_PAGE_TAG_MASK |
There was a problem hiding this comment.
The large-page condition in ESENT_PAGE.__init__ only checks FileFormatRevision/PageSize, but other large-page parsing logic in this module (e.g., getTag()) also gates on Version == 0x620. Consider aligning the predicate here with getTag() to avoid masking FirstAvailablePageTag on database versions that don’t use the 12-bit tag-count encoding.
| # Logical node counts should be derived from the effective reserved-tag count | ||
| # instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved. |
There was a problem hiding this comment.
The comment block under the large-page if is over-indented (lines after the TODO have extra indentation). This makes the code harder to read; align the comment indentation with the rest of the block.
| # Logical node counts should be derived from the effective reserved-tag count | |
| # instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved. | |
| # Logical node counts should be derived from the effective reserved-tag count | |
| # instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved. |
|
check together with #2165 |
added testcase
|
all tests passed ok. merging now |
This PR fixes the ESE page-header parsing bug behind issue #1924, where
secretsdump.pyfailed to parse Windows Server 2025NTDS.ditfiles using 32 KB ESE pages.Root cause:
FirstAvailablePageTagwas treated as a plain 16-bit tag count.0x100cto be interpreted as4108tag count instead of just12, causing Impacket to walk past the last valid tag and crash with IndexError.Fix:
FirstAvailablePageTagis now split into 2 values:tagReserved, there's not much information about this, dissect.esedb treats it as a counter for reserved tags which is then used to calculate the actual logical node count. I did not implement this because i could not manage to create a dump which had tagReserved > 1Additional Fix:
secretsdumpandraiseChildassumedUSER_PROPERTIESstructure always containsPropertyCountandUserProperties. Those fields are optional, and i obtained a dump in which a user had a valid zero-propertysupplementalCredentialsblob, where SAMR omitsPropertyCount. Because we modeledPropertyCountas unconditional, the parser consumed later bytes as the count and then failed while decoding a non-existentUSER_PROPERTY, producing an Error while processing that user.This was fixed by parsing only the fixed
USER_PROPERTIESheader in samr.py and handle the optional tail manually based on Length. When the blob is the zero-property form, we now returnPropertyCount= 0 and empty property data; otherwise we parsePropertyCount, the property buffer, andReserved5explicitly. secretsdump.py and raiseChild.py were updated to use that helper, and both now discard malformed trailing property data safely instead of crashing the whole row.