Fix large-page ESE tag-state parsing for Windows Server 2025 NTDS.dit (issue #1924) #2158

Merged
anadrianmanrique merged 4 commits into fortra:master from alexisbalbachan:2025_ntds_parse_fix
Apr 30, 2026

Conversation

@alexisbalbachan
Collaborator

@alexisbalbachan alexisbalbachan commented Mar 27, 2026

This PR fixes the ESE page-header parsing bug behind issue #1924, where secretsdump.py failed to parse Windows Server 2025 NTDS.dit files using 32 KB ESE pages.

Root cause:

  • FirstAvailablePageTag was treated as a plain 16-bit tag count.
  • On large page databases, the low 12 bits contain the tag count, while the upper 4 bits appear to encode reserved tag state.
  • As a result, values such as 0x100c were interpreted as a tag count of 4108 instead of 12, so Impacket walked past the last valid tag and crashed with an IndexError.

Fix:

  • FirstAvailablePageTag is now split into two values:
    • The upper 4 bits are stored as tagReserved. Little information is available about this field; dissect.esedb treats it as a count of reserved tags, which it then uses to calculate the actual logical node count. I did not implement that calculation because I could not produce a dump with tagReserved > 1.
    • The remaining 12 bits are used as the tag count (instead of the full 16 bits used previously).
  • These changes are in line with what dissect.esedb does
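
The split described above can be sketched roughly as follows. The constant names match those in the diff, but their values (0x0FFF and 12) are inferred here from the "low 12 bits / upper 4 bits" description, and `split_first_available_page_tag` is a hypothetical helper, not a function in impacket.

```python
# Values inferred from the PR description (assumption, not copied from impacket/ese.py).
FIRST_AVAILABLE_PAGE_TAG_MASK = 0x0FFF
FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT = 12


def split_first_available_page_tag(raw):
    """Hypothetical helper: return (tag_count, tag_reserved) for a large page."""
    tag_count = raw & FIRST_AVAILABLE_PAGE_TAG_MASK
    # A zero in the upper bits falls back to 1, i.e. only tag 0 is reserved.
    tag_reserved = (raw >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1
    return tag_count, tag_reserved


tag_count, tag_reserved = split_first_available_page_tag(0x100C)
print(tag_count, tag_reserved)  # 12 1
```

Read as a plain 16-bit integer, the same 0x100c is 4108, which is the count that sent the tag walk past the end of the array.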

Additional Fix:

  • secretsdump and raiseChild assumed that the USER_PROPERTIES structure always contains PropertyCount and UserProperties. Those fields are optional, and I obtained a dump in which a user had a valid zero-property supplementalCredentials blob, where SAMR omits PropertyCount. Because we modeled PropertyCount as unconditional, the parser consumed later bytes as the count and then failed while decoding a non-existent USER_PROPERTY, producing an error while processing that user.

  • This was fixed by parsing only the fixed USER_PROPERTIES header in samr.py and handling the optional tail manually based on Length. When the blob is the zero-property form, we now return PropertyCount = 0 and empty property data; otherwise we parse PropertyCount, the property buffer, and Reserved5 explicitly. secretsdump.py and raiseChild.py were updated to use that helper, and both now safely discard malformed trailing property data instead of failing on the whole row.
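
A minimal sketch of the optional-tail handling, under stated assumptions: the field sizes below are taken from the MS-SAMR USER_PROPERTIES definition and should be verified before relying on them, `split_optional_tail` is a hypothetical name rather than the helper added to samr.py, and the sketch uses the size of the buffer as a stand-in for the Length field that the actual fix keys off.

```python
import struct

# Assumed layout of the fixed part (from the MS-SAMR USER_PROPERTIES definition):
# Reserved1 (4) + Length (4) + Reserved2 (2) + Reserved3 (2)
# + Reserved4 (96) + PropertySignature (2) = 110 bytes.
FIXED_HEADER_LEN = 4 + 4 + 2 + 2 + 96 + 2
RESERVED5_LEN = 1  # single trailing byte


def split_optional_tail(blob):
    """Hypothetical helper: return (property_count, property_bytes)."""
    tail = blob[FIXED_HEADER_LEN:]
    if len(tail) < RESERVED5_LEN:
        raise ValueError("USER_PROPERTIES blob is shorter than its fixed part")
    if len(tail) == RESERVED5_LEN:
        # Zero-property form: PropertyCount and UserProperties are absent.
        return 0, b""
    if len(tail) < 2 + RESERVED5_LEN:
        raise ValueError("truncated PropertyCount")
    (property_count,) = struct.unpack_from("<H", tail, 0)
    # Everything between PropertyCount and Reserved5 is the property buffer.
    return property_count, tail[2:-RESERVED5_LEN]


empty = b"\x00" * FIXED_HEADER_LEN + b"\x00"
print(split_optional_tail(empty))  # (0, b'')
```

Deciding on the form before reading PropertyCount is what keeps the parser from consuming unrelated bytes as a count.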

Copilot AI review requested due to automatic review settings March 27, 2026 08:36
Contributor

Copilot AI left a comment

Pull request overview

Fixes ESE page-header/tag-state parsing to support Windows Server 2025 NTDS.dit files using 32 KiB pages, preventing secretsdump.py from walking past the tag array and crashing.

Changes:

  • Introduce masking/splitting of FirstAvailablePageTag to derive an effective tag count on large pages.
  • Store derived tagCount/tagReserved on ESENT_PAGE.
  • Update tag iteration/slicing across page parsing code paths to use tagCount instead of the raw header field.

Comment thread impacket/ese.py Outdated
Comment on lines +451 to +454
# Logical node counts should be derived from the effective reserved-tag count
# instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved.
self.tagReserved = (self.record['FirstAvailablePageTag'] >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1
self.tagCount = self.record['FirstAvailablePageTag'] & FIRST_AVAILABLE_PAGE_TAG_MASK

Copilot AI Mar 27, 2026

tagReserved is parsed for large pages but is never used when iterating tags (most loops still start at 1). If tagReserved can be > 1, the parser may still try to interpret reserved tags as records; consider iterating from tagReserved (or otherwise skipping reserved tags) when walking leaf/branch tags.

Suggested change
- # Logical node counts should be derived from the effective reserved-tag count
- # instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved.
- self.tagReserved = (self.record['FirstAvailablePageTag'] >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1
- self.tagCount = self.record['FirstAvailablePageTag'] & FIRST_AVAILABLE_PAGE_TAG_MASK
+ # Logical node counts should be derived from the effective reserved-tag count
+ # instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved.
+ raw_tag_field = self.record['FirstAvailablePageTag']
+ self.tagReserved = (raw_tag_field >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1
+ physicalTagCount = raw_tag_field & FIRST_AVAILABLE_PAGE_TAG_MASK
+ # On large pages, adjust tagCount so it represents the logical node count (excluding all reserved tags).
+ # When tagReserved == 1 (the legacy assumption), this reduces to the original behavior.
+ self.tagCount = physicalTagCount - (self.tagReserved - 1)

Collaborator Author

Intentional. This PR fixes the confirmed #1924 crash by correcting the large-page
tag count parsing. tagReserved is modeled for parity with dissect.esedb, but
I am not changing logical-node traversal without a sample exposing effective
tagReserved > 1, since that would require a broader change than this bug fix.
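
For reference, a small sketch comparing the two traversal conventions discussed in this thread. It assumes, as the TODO in the diff suggests but no sample confirms, that reserved tags are the leading ones; both function names are hypothetical.

```python
def indices_from_reserved(physical_count, tag_reserved):
    """Skip the leading reserved tags, then walk to the last physical tag."""
    return list(range(tag_reserved, physical_count))


def indices_with_adjusted_count(physical_count, tag_reserved):
    """Shrink the count as in the suggested change, but keep loops starting at 1."""
    adjusted = physical_count - (tag_reserved - 1)
    return list(range(1, adjusted))


for reserved in (1, 3):
    print(reserved,
          indices_from_reserved(12, reserved),
          indices_with_adjusted_count(12, reserved))
```

With tagReserved == 1 both walks agree. With a larger value the adjusted count visits the same number of tags but different indices, which suggests a traversal change would need more than a count adjustment.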

Comment thread impacket/ese.py
- for i in range(self.record['FirstAvailablePageTag']):
+ for i in range(self.tagCount):
tag = tags[-4:]
if self.__DBHeader['Version'] == 0x620 and self.__DBHeader['FileFormatRevision'] > 11 and self.__DBHeader['PageSize'] > 8192:

Copilot AI Mar 27, 2026

In dump(), the large-page tag decoding check uses FileFormatRevision > 11, but elsewhere the Windows 7+ boundary is treated as >= 0x11 / >= 17 (see getTag() and ESENT_PAGE_HEADER). To keep behavior consistent and avoid applying the large-page decoding to revisions 0x0c–0x10, update this condition to match the same threshold used elsewhere.

Suggested change
- if self.__DBHeader['Version'] == 0x620 and self.__DBHeader['FileFormatRevision'] > 11 and self.__DBHeader['PageSize'] > 8192:
+ if self.__DBHeader['Version'] == 0x620 and self.__DBHeader['FileFormatRevision'] >= 0x11 and self.__DBHeader['PageSize'] > 8192:
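
The gap between the two thresholds can be checked with plain arithmetic. This snippet only compares the two conditions; it makes no claim about how any of those revisions are laid out on disk.

```python
# Revisions where the dump() check (> 11) and the check used elsewhere (>= 0x11) disagree.
disagree = [rev for rev in range(0x20) if (rev > 11) != (rev >= 0x11)]
print([hex(rev) for rev in disagree])  # ['0xc', '0xd', '0xe', '0xf', '0x10']
```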

Collaborator Author

I agree the condition is inconsistent with the rest of the module, but it is outside the crash path fixed here and I do not have a sample showing that revisions 0x0c..0x10 are mishandled by dump(). I’d prefer to keep this PR scoped
to the reproducible

Comment thread impacket/ese.py
Comment on lines +448 to +454
self.tagCount = self.record['FirstAvailablePageTag']
if self.__DBHeader['FileFormatRevision'] >= 0x11 and self.__DBHeader['PageSize'] > 8192:
# TODO: The upper 4 bits may encode how many leading tags are reserved on large pages.
# Logical node counts should be derived from the effective reserved-tag count
# instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved.
self.tagReserved = (self.record['FirstAvailablePageTag'] >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1
self.tagCount = self.record['FirstAvailablePageTag'] & FIRST_AVAILABLE_PAGE_TAG_MASK

Copilot AI Mar 27, 2026

This change fixes a specific crash/regression for 32 KiB pages by masking FirstAvailablePageTag to 12 bits. Please add a regression test (unit-level if possible) that builds/parses a page header where FirstAvailablePageTag has high bits set (e.g., 0x100c) and asserts tagCount == 0x000c and tag iteration does not raise.
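
One possible shape for such a regression test, as a sketch only. `walk_tags` is a hypothetical stand-in for the tag loop in ESENT_PAGE, and the tag array is a plain list rather than real page bytes; a test in the repository would need to build an actual page header.

```python
TAG_COUNT_MASK = 0x0FFF  # low 12 bits, per the PR description (assumed value)
TAGS = list(range(12))   # stand-in for a page holding 12 tags


def walk_tags(tags, raw_first_available_page_tag, large_page):
    """Hypothetical stand-in for the tag loop: index one tag per counted entry."""
    count = raw_first_available_page_tag
    if large_page:
        count &= TAG_COUNT_MASK
    return [tags[i] for i in range(count)]


def test_unmasked_count_overruns_tag_array():
    try:
        walk_tags(TAGS, 0x100C, large_page=False)
    except IndexError:
        return
    raise AssertionError("expected IndexError for the unmasked count")


def test_masked_count_stops_at_last_tag():
    assert len(walk_tags(TAGS, 0x100C, large_page=True)) == 0x000C


test_unmasked_count_overruns_tag_array()
test_masked_count_stops_at_last_tag()
```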

Comment thread impacket/ese.py
Comment on lines +448 to +454
self.tagCount = self.record['FirstAvailablePageTag']
if self.__DBHeader['FileFormatRevision'] >= 0x11 and self.__DBHeader['PageSize'] > 8192:
# TODO: The upper 4 bits may encode how many leading tags are reserved on large pages.
# Logical node counts should be derived from the effective reserved-tag count
# instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved.
self.tagReserved = (self.record['FirstAvailablePageTag'] >> FIRST_AVAILABLE_PAGE_TAG_RESERVED_SHIFT) or 1
self.tagCount = self.record['FirstAvailablePageTag'] & FIRST_AVAILABLE_PAGE_TAG_MASK

Copilot AI Mar 27, 2026

The large-page condition in ESENT_PAGE.__init__ only checks FileFormatRevision/PageSize, but other large-page parsing logic in this module (e.g., getTag()) also gates on Version == 0x620. Consider aligning the predicate here with getTag() to avoid masking FirstAvailablePageTag on database versions that don’t use the 12-bit tag-count encoding.
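
One way to keep the predicates aligned would be a single shared check, sketched below. The function names are hypothetical, and the three-part condition mirrors what this review attributes to getTag() rather than code verified against impacket/ese.py.

```python
def uses_large_page_tags(version, file_format_revision, page_size):
    """Hypothetical shared gate for every large-page code path."""
    return version == 0x620 and file_format_revision >= 0x11 and page_size > 8192


def effective_tag_count(raw, version, file_format_revision, page_size):
    """Mask to 12 bits only when the shared gate says the page is a large page."""
    if uses_large_page_tags(version, file_format_revision, page_size):
        return raw & 0x0FFF
    return raw


print(effective_tag_count(0x100C, 0x620, 0x11, 32768))  # 12
print(effective_tag_count(0x100C, 0x620, 0x11, 8192))   # 4108
```

Routing every caller through one predicate means a future change to the threshold cannot leave one code path behind.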

Comment thread impacket/ese.py Outdated
Comment on lines +451 to +452
# Logical node counts should be derived from the effective reserved-tag count
# instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved.

Copilot AI Mar 27, 2026

The comment block under the large-page if is over-indented (lines after the TODO have extra indentation). This makes the code harder to read; align the comment indentation with the rest of the block.

Suggested change
# Logical node counts should be derived from the effective reserved-tag count
# instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved.
# Logical node counts should be derived from the effective reserved-tag count
# instead of assuming only tag 0 is reserved, the logical node count should be tagCount - tagReserved.

@gabrielg5
Collaborator

check together with #2165

@anadrianmanrique anadrianmanrique self-assigned this Apr 16, 2026
@anadrianmanrique anadrianmanrique added the medium (Medium priority item) and bug (Unexpected problem or unintended behavior) labels Apr 16, 2026
@anadrianmanrique
Collaborator

all tests passed ok. merging now

Labels

bug (Unexpected problem or unintended behavior), medium (Medium priority item)

Development

Successfully merging this pull request may close these issues.

secretsdump.py does not parse Windows Server 2025 NTDS.dit

4 participants