feat(migration): trie migration #3659
Open
MaksymMalicki wants to merge 8 commits into
Codecov Report
@@ Coverage Diff @@
## maksym/statehistory-migration #3659 +/- ##
=================================================================
- Coverage 76.10% 75.98% -0.12%
=================================================================
Files 408 415 +7
Lines 36913 37428 +515
=================================================================
+ Hits 28091 28440 +349
- Misses 6792 6912 +120
- Partials 2030 2076 +46
Summary
One-shot migration that converts every deprecated Starknet trie on disk into the equivalent `trie2` layout. After this migration runs, the new state package can read directly from the new buckets, and the deprecated buckets are wiped.
Bucket mapping (deprecated → new):
`ClassesTrie` → `ClassTrie`
`StateTrie` → `ContractTrieContract`
`ContractStorage` → `ContractTrieStorage`
Format differences
Three things change between the two layouts: how nodes are keyed on disk, how each node's value is encoded, and how path compression is expressed.
On-disk keying
Both layouts share a common prefix; only the suffix differs:
common (both) suffix
───────────── ─────────────────────────────────────────
bucket [|| owner] → path-length-byte || path-bytes (deprecated)
→ nodeType-byte || path-length-byte || path-bytes (new)
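As a rough illustration of the two key shapes in the diagram above, the sketch below builds both from the same inputs. It is not the PR's code: the function names, the single-byte bucket identifier, and the two `nodeType` tag values are assumptions made for the example, since the description does not state them.

```go
package main

import "fmt"

// Hypothetical tag values: the description does not state the actual bytes
// used for the nodeType prefix.
const (
	nodeTypeNonLeaf byte = 0x00 // assumed
	nodeTypeLeaf    byte = 0x01 // assumed
)

// commonPrefix returns bucket [|| owner]. The bucket is assumed to be a
// single byte; owner is nil for tries that have no owner.
func commonPrefix(bucket byte, owner []byte) []byte {
	key := make([]byte, 0, 1+len(owner))
	key = append(key, bucket)
	return append(key, owner...)
}

// deprecatedKey appends path-length-byte || path-bytes to the common prefix.
func deprecatedKey(bucket byte, owner []byte, pathLen uint8, path []byte) []byte {
	key := commonPrefix(bucket, owner)
	key = append(key, pathLen)
	return append(key, path...)
}

// newKey appends nodeType-byte || path-length-byte || path-bytes.
func newKey(bucket byte, owner []byte, nodeType byte, pathLen uint8, path []byte) []byte {
	key := commonPrefix(bucket, owner)
	key = append(key, nodeType, pathLen)
	return append(key, path...)
}

func main() {
	owner := []byte{0xAA, 0xBB}
	path := []byte{0x05}
	fmt.Printf("%x\n", deprecatedKey(0x10, owner, 3, path))        // 10aabb0305
	fmt.Printf("%x\n", newKey(0x20, owner, nodeTypeLeaf, 3, path)) // 20aabb010305
}
```

Because the type byte sits directly after the common prefix, all keys of one node type share a contiguous key range within the bucket.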
`owner` is present only for storage tries. The new layout's extra `nodeType` byte splits leaves from internal nodes into two index slices within the same bucket; `trie2` state lookups use this to short-circuit between leaf reads and internal-node traversals.
Node encoding
Both layouts are raw byte streams (no length prefixes, no varints — field-element widths are fixed).
Deprecated: nodes are self-contained. Internal binary nodes embed the compressed paths to their children inline:
leaf value
binary value || left-child-path || right-child-path
[|| left-hash || right-hash, optional cache, ignored here]
`value` is the node's own Starknet trie hash, or the stored value when the node is a leaf. The trailing hash pair was a denormalised cache; the migrator does not read it (hashes are recomputed from scratch, see below).
New: every node has an explicit type tag and path compression lives in dedicated edge nodes:
value value
binary 0x01 || left-edge-hash || right-edge-hash
edge 0x02 || child-hash || encoded-path-segment
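A minimal sketch of the new binary and edge encodings shown above, for illustration only. The tags `0x01` and `0x02` come from the layout; the 32-byte hash width, the function names, and treating `encoded-path-segment` as opaque bytes are assumptions, because the segment's internal format is not described here.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	binaryTag byte = 0x01 // from the layout above
	edgeTag   byte = 0x02 // from the layout above
	hashLen        = 32   // assumed fixed field-element width in bytes
)

var errHashLen = errors.New("hash must be exactly hashLen bytes")

// encodeBinary returns 0x01 || left-edge-hash || right-edge-hash.
func encodeBinary(left, right []byte) ([]byte, error) {
	if len(left) != hashLen || len(right) != hashLen {
		return nil, errHashLen
	}
	out := make([]byte, 0, 1+2*hashLen)
	out = append(out, binaryTag)
	out = append(out, left...)
	out = append(out, right...)
	return out, nil
}

// encodeEdge returns 0x02 || child-hash || encoded-path-segment.
// encodedPath is passed through untouched; its format is not modelled here.
func encodeEdge(childHash, encodedPath []byte) ([]byte, error) {
	if len(childHash) != hashLen {
		return nil, errHashLen
	}
	out := make([]byte, 0, 1+hashLen+len(encodedPath))
	out = append(out, edgeTag)
	out = append(out, childHash...)
	out = append(out, encodedPath...)
	return out, nil
}

func main() {
	h := make([]byte, hashLen)
	bin, _ := encodeBinary(h, h)
	edge, _ := encodeEdge(h, []byte{0x03, 0x05})
	fmt.Println(len(bin), len(edge)) // 65 35
}
```

With fixed-width hashes, a reader can split a binary payload at known offsets, which is consistent with the absence of length prefixes noted above.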
Path compression — the key structural change
The deprecated format compresses paths inside the parent binary node (via its embedded child-path fields). The new format moves compression into dedicated edge nodes sitting between binary nodes and their children:
deprecated: binary ──────── child-path ────────► child
new: binary ──► edge ──► child
One consequence: the deprecated root marker (a single entry at the bare bucket prefix recording the root's path) disappears. Whatever the deprecated root embedded becomes either a direct binary/leaf at the empty path or, when the deprecated root path is itself non-empty, an edge node at the empty path that points "down" to the real root.
Traversal
DFS is a natural fit here. In the new layout a binary node's payload is `left-edge-hash || right-edge-hash`, so before we can write the parent, we need both children's hashes. A bottom-up walk reads each deprecated node exactly once and produces the child hash that the parent needs at the moment the parent is encoded: no separate hashing pass, no intermediate caches sized to the trie. Going top-down would force us to either re-read every child later or hold every visited node in memory until its subtree is hashed.
For each trie:
`ContractStorage` is keyed by `bucket || owner`, so the enumerator splits its tries into per-owner descriptors. Each descriptor records the root path (from the bare-prefix marker entry) and the node count.
Pipeline:
`IngestorCount` worker goroutines pull descriptors from the enumeration source; a single committer flushes filled batches to disk. A semaphore caps in-flight batches at `IngestorCount * 2`. Every flush and every channel send observes `ctx.Done`; on cancel, `Migrate` returns the `shouldRerun` sentinel and the migration runner re-invokes it on the next process start.
After the full pipeline finishes, the three deprecated buckets are wiped via `DeleteRange`. The wipe is gated on full success: a crash mid-migration leaves the deprecated source intact, so partially migrated tries either have a new-format root (skipped on the next pass) or don't (re-migrated from scratch).
Alternative considered: reverse-iteration BFS
An earlier attempt used reverse-iteration BFS over the deprecated bucket — iterating keys from longest-path to shortest so leaves are processed first, then their parents, then their parents, and so on. Each level's hashes are buffered until the next level up consumes them.
In practice it performed comparably to DFS on wall-clock time but with substantially higher peak memory — the buffer of "hashes waiting for their parent" grows roughly with the widest level of the trie, which for full or near-full subtrees is most of the leaves. DFS keeps only the current root-to-leaf path in flight, which is bounded by the trie depth (≤ 251), so memory stays flat regardless of trie size. Same correctness, worse memory profile, no speed win — dropped.
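The bottom-up walk described under Traversal can be sketched as a post-order recursion. This is a simplified illustration, not the migrator itself: the in-memory `node` type, the `emit` callback, and `toyHash` (SHA-256 standing in for Pedersen/Poseidon) are all hypothetical, and edge nodes are left out to keep the ordering visible.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// node is a hypothetical in-memory stand-in for a deprecated trie node.
// Internal nodes are assumed to always have both children.
type node struct {
	value       []byte // leaf payload; unused for internal nodes
	left, right *node
}

// toyHash is a placeholder for the real trie hash; it is NOT the Starknet hash.
func toyHash(a, b []byte) []byte {
	h := sha256.New()
	h.Write(a)
	h.Write(b)
	return h.Sum(nil)
}

// migrate walks post-order: both children are hashed before the parent is
// emitted, so emit runs exactly once per node, children first. Only the
// current root-to-leaf path lives on the call stack.
func migrate(n *node, emit func(hash []byte, isLeaf bool)) []byte {
	if n.left == nil && n.right == nil {
		emit(n.value, true)
		return n.value
	}
	l := migrate(n.left, emit)
	r := migrate(n.right, emit)
	h := toyHash(l, r)
	emit(h, false)
	return h
}

func main() {
	root := &node{
		left:  &node{left: &node{value: []byte{1}}, right: &node{value: []byte{2}}},
		right: &node{value: []byte{3}},
	}
	var order []string
	migrate(root, func(_ []byte, isLeaf bool) {
		if isLeaf {
			order = append(order, "leaf")
		} else {
			order = append(order, "binary")
		}
	})
	fmt.Println(order) // [leaf leaf binary leaf binary]
}
```

The recursion depth equals the length of the current root-to-leaf path, which is why memory stays bounded by trie depth rather than trie width.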
Hashing
Starknet trie hashes for the new layout:
leaf value
binary hashFn(left-edge-hash, right-edge-hash)
edge hashFn(child-hash, path-segment-as-felt) + segment-length
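The three rules above, together with the zero-length edge convention this section specifies, can be sketched with the hash function passed in as a parameter. The names `hashFn`, `felt`, and `toyHash` are hypothetical, and `toyHash` is an arithmetic placeholder rather than Pedersen or Poseidon; only the Starknet field prime is a known constant.

```go
package main

import (
	"fmt"
	"math/big"
)

// starknetPrime is the Starknet field prime 2^251 + 17*2^192 + 1.
var starknetPrime = func() *big.Int {
	p := new(big.Int).Lsh(big.NewInt(1), 251)
	p.Add(p, new(big.Int).Lsh(big.NewInt(17), 192))
	return p.Add(p, big.NewInt(1))
}()

// hashFn stands in for Pedersen or Poseidon, depending on the trie.
type hashFn func(a, b *big.Int) *big.Int

// felt is a small helper for building field elements in examples.
func felt(n int64) *big.Int { return big.NewInt(n) }

// binaryHash is hashFn(left-edge-hash, right-edge-hash).
func binaryHash(h hashFn, left, right *big.Int) *big.Int {
	return h(left, right)
}

// edgeHash is hashFn(child-hash, path-segment-as-felt) + segment-length,
// reduced into the field. A zero-length edge returns the bare child hash.
func edgeHash(h hashFn, child, path *big.Int, segLen uint8) *big.Int {
	if segLen == 0 {
		return new(big.Int).Set(child)
	}
	sum := new(big.Int).Add(h(child, path), big.NewInt(int64(segLen)))
	return sum.Mod(sum, starknetPrime)
}

// toyHash is a placeholder so the sketch runs; it is NOT a real trie hash.
func toyHash(a, b *big.Int) *big.Int {
	out := new(big.Int).Mul(a, big.NewInt(31))
	out.Add(out, b)
	return out.Mod(out, starknetPrime)
}

func main() {
	fmt.Println(binaryHash(toyHash, felt(1), felt(2)))  // 33
	fmt.Println(edgeHash(toyHash, felt(7), felt(3), 2)) // 222
	fmt.Println(edgeHash(toyHash, felt(7), felt(3), 0)) // 7
}
```

Passing the hash function as a parameter lets the same code serve both hash families; the caller decides which one applies to a given trie.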
A zero-length edge short-circuits to the bare `child-hash`, the convention for absent edges. Class tries hash with Poseidon; contract and storage tries with Pedersen.
Performance: for small tries (below `SmallTrieThreshold` nodes) every edge hash is computed inline. Above the threshold, edge-hash jobs are batched (`parallelHashBatchSize`) and dispatched to a fixed-size worker pool for parallel computation. The scheduler preserves the original job order so the persisted bytes are byte-identical to a natively built `trie2`, verified end-to-end in the tests by comparing the migrated DB against one built directly through `trie2.Trie.Update` (see `TestMigrationEndToEnd`).
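One common way to keep results in job order while hashing in parallel is to have each worker write into the result slot that matches its job index. The sketch below shows that pattern under stated assumptions: the `job` type, the threshold value, the worker count, and the toy hash are all hypothetical, and the PR's actual scheduler may work differently.

```go
package main

import (
	"fmt"
	"sync"
)

// job is a hypothetical edge-hash input; real jobs would carry field elements.
type job struct{ a, b uint64 }

// smallTrieThreshold is an assumed value; the PR's constant is not given here.
const smallTrieThreshold = 64

// hashBatch computes results[i] = hash(jobs[i]) on a fixed number of workers.
// Each index is handed to exactly one worker, so the output order matches the
// input order regardless of goroutine scheduling.
func hashBatch(jobs []job, workers int, hash func(job) uint64) []uint64 {
	if workers < 1 {
		workers = 1
	}
	results := make([]uint64, len(jobs))
	indices := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indices {
				results[i] = hash(jobs[i])
			}
		}()
	}
	for i := range jobs {
		indices <- i
	}
	close(indices)
	wg.Wait()
	return results
}

// hashAll hashes inline for small tries and in parallel otherwise.
func hashAll(jobs []job, trieNodes, workers int, hash func(job) uint64) []uint64 {
	if trieNodes < smallTrieThreshold {
		results := make([]uint64, len(jobs))
		for i, j := range jobs {
			results[i] = hash(j)
		}
		return results
	}
	return hashBatch(jobs, workers, hash)
}

func main() {
	toy := func(j job) uint64 { return j.a*31 + j.b } // placeholder hash
	jobs := []job{{1, 2}, {3, 4}, {5, 6}}
	fmt.Println(hashAll(jobs, 10, 4, toy))   // inline path: [33 97 161]
	fmt.Println(hashAll(jobs, 1000, 4, toy)) // parallel path: [33 97 161]
}
```

Both paths return identical slices for the same input, which is the property the byte-identical comparison against a natively built trie depends on.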