Ai opt p27 pre xdr by dmkozh · Pull Request #5301 · stellar/stellar-core

dmkozh · 2026-05-29T20:57:41Z

No description provided.

Replace xdrSha256(success) with streaming SHA256 calculation to avoid XDR re-serialization of InvokeHostFunctionSuccessPreImage. The return value and events are already available as XDR-encoded bytes, so we can hash them directly without round-trip serialization.
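A minimal sketch of the idea, not the PR's code. It assumes the pre-image is a return value followed by a variable-length array of events, so its XDR encoding is the return value bytes, a 4-byte big-endian element count, then each event's bytes. `StreamingHasher` is a hypothetical stand-in that uses FNV-1a rather than SHA-256, only to show the `add()`/`finish()` shape; all names here are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Stand-in streaming digest (64-bit FNV-1a). NOT SHA-256.
class StreamingHasher
{
    std::uint64_t mState = 0xcbf29ce484222325ULL; // FNV offset basis

  public:
    void
    add(std::uint8_t const* data, std::size_t len)
    {
        for (std::size_t i = 0; i < len; ++i)
        {
            mState ^= data[i];
            mState *= 0x100000001b3ULL; // FNV prime
        }
    }
    void add(Bytes const& b) { add(b.data(), b.size()); }
    std::uint64_t finish() const { return mState; }
};

// Feed the already-encoded pieces straight into the hasher.
std::uint64_t
hashPreImageStreaming(Bytes const& returnValueXdr,
                      std::vector<Bytes> const& eventsXdr)
{
    assert(eventsXdr.size() <= 0xFFFFFFFFu);
    auto const n = static_cast<std::uint32_t>(eventsXdr.size());
    // XDR variable-length arrays start with a 4-byte big-endian count.
    std::uint8_t const count[4] = {
        static_cast<std::uint8_t>(n >> 24), static_cast<std::uint8_t>(n >> 16),
        static_cast<std::uint8_t>(n >> 8), static_cast<std::uint8_t>(n)};

    StreamingHasher h;
    h.add(returnValueXdr);
    h.add(count, sizeof(count));
    for (auto const& e : eventsXdr)
        h.add(e);
    return h.finish();
}

// Reference path: build the whole buffer first, then hash it once.
std::uint64_t
hashPreImageByReserializing(Bytes const& returnValueXdr,
                            std::vector<Bytes> const& eventsXdr)
{
    auto const n = static_cast<std::uint32_t>(eventsXdr.size());
    Bytes buf = returnValueXdr;
    for (int shift = 24; shift >= 0; shift -= 8)
        buf.push_back(static_cast<std::uint8_t>(n >> shift));
    for (auto const& e : eventsXdr)
        buf.insert(buf.end(), e.begin(), e.end());
    StreamingHasher h;
    h.add(buf);
    return h.finish();
}
```

Both functions consume the same byte sequence, so they return the same digest; the streaming one just skips the intermediate buffer.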

Adds parallel processing to transaction set handling:

1. Parallel TxFrame creation: Creates TxFrames from XDR envelopes in parallel during transaction set deserialization. Uses work-stealing via std::async with even distribution across available threads.
2. Parallel transaction validation: Validates transactions in parallel in txsAreValid() when there are 2+ transactions.
3. Hash precomputation: Precomputes content and full hashes before parallel operations to avoid race conditions.
4. Test coverage: Adds StreamingShaTest for InvokeHostFunctionSuccessPreImage verification.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
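As a rough illustration of the validation step, the sketch below splits a transaction list into near-equal chunks and checks each chunk with `std::async`. `Tx`, `isValid`, and `allValidParallel` are hypothetical names, and the validity rule is a placeholder; the real `txsAreValid()` is not shown on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for a transaction frame.
struct Tx
{
    long long inclusionFee;
};

// Placeholder validity rule, for illustration only.
bool
isValid(Tx const& tx)
{
    return tx.inclusionFee > 0;
}

bool
allValidParallel(std::vector<Tx> const& txs, std::size_t numThreads)
{
    assert(numThreads > 0);
    // With fewer than two transactions there is nothing to parallelize.
    if (txs.size() < 2 || numThreads == 1)
        return std::all_of(txs.begin(), txs.end(), isValid);

    std::size_t const chunks = std::min(numThreads, txs.size());
    std::size_t const base = txs.size() / chunks;
    std::size_t const extra = txs.size() % chunks;

    std::vector<std::future<bool>> results;
    results.reserve(chunks);
    std::size_t begin = 0;
    for (std::size_t i = 0; i < chunks; ++i)
    {
        // The first `extra` chunks take one more element each.
        std::size_t const end = begin + base + (i < extra ? 1 : 0);
        results.push_back(std::async(std::launch::async, [&txs, begin, end] {
            auto first = txs.begin() + static_cast<std::ptrdiff_t>(begin);
            auto last = txs.begin() + static_cast<std::ptrdiff_t>(end);
            return std::all_of(first, last, isValid);
        }));
        begin = end;
    }

    bool ok = true;
    for (auto& r : results)
        ok = r.get() && ok; // call get() on every future so all tasks join
    return ok;
}
```

The workers only read `txs`, which is why anything lazily cached inside a transaction (such as a hash) would need to be computed before the tasks start.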

Add sizeBytes field to ContractDataMapEntryT to cache the XDR serialized size of ledger entries. This avoids repeated xdr_size() calls during state updates, reducing CPU overhead in the hot path. Also adds Tracy zone to updateState() for profiling visibility. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
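The caching pattern could look roughly like the sketch below. `CachedEntry`, `SizeTrackingState`, and `serializedSize` are hypothetical; `serializedSize` mimics the XDR size of a variable-length opaque (4-byte length plus body padded to a multiple of 4) and stands in for a more expensive size computation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for an expensive serialized-size computation.
std::size_t
serializedSize(std::vector<std::uint8_t> const& payload)
{
    return 4 + ((payload.size() + 3) / 4) * 4;
}

struct CachedEntry
{
    std::vector<std::uint8_t> payload;
    std::size_t sizeBytes; // computed once, when the entry is stored
};

class SizeTrackingState
{
    std::unordered_map<std::string, CachedEntry> mEntries;
    std::size_t mTotalBytes = 0;

  public:
    void
    upsert(std::string const& key, std::vector<std::uint8_t> payload)
    {
        std::size_t const newSize = serializedSize(payload);
        auto it = mEntries.find(key);
        if (it == mEntries.end())
        {
            mEntries.emplace(key, CachedEntry{std::move(payload), newSize});
            mTotalBytes += newSize;
        }
        else
        {
            // The old size comes from the cache; the old payload is not
            // re-measured.
            mTotalBytes = mTotalBytes - it->second.sizeBytes + newSize;
            it->second = CachedEntry{std::move(payload), newSize};
        }
    }

    void
    erase(std::string const& key)
    {
        auto it = mEntries.find(key);
        if (it == mEntries.end())
            return;
        assert(mTotalBytes >= it->second.sizeBytes);
        mTotalBytes -= it->second.sizeBytes;
        mEntries.erase(it);
    }

    std::size_t totalBytes() const { return mTotalBytes; }
};
```

Each update or erase measures at most the incoming payload; the size of whatever was stored before is read back from `sizeBytes`.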

During ledger close, three independent operations are now parallelized:

- addHotArchiveBatch (modifies mHotArchiveBucketList)
- addLiveBatch (modifies mLiveBucketList), runs on main thread
- updateInMemorySorobanState (modifies mInMemorySorobanState)

These operations modify completely independent data structures and can safely run concurrently. Added getInMemorySorobanStateForUpdate() to allow direct access to mInMemorySorobanState during COMMITTING phase. This reduces ledger close latency by overlapping CPU-bound operations.

# Conflicts:
#   src/ledger/LedgerManagerImpl.cpp
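The overlap described above can be sketched as two `std::async` tasks plus one update on the calling thread, joined before returning. `LedgerCloseState`, `append`, and `commitBatches` are hypothetical and use plain vectors in place of the bucket lists and in-memory Soroban state.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Three independent containers standing in for the three structures.
struct LedgerCloseState
{
    std::vector<int> hotArchive;
    std::vector<int> live;
    std::vector<int> inMemorySoroban;
};

void
append(std::vector<int>& dst, std::vector<int> const& batch)
{
    dst.insert(dst.end(), batch.begin(), batch.end());
}

void
commitBatches(LedgerCloseState& s, std::vector<int> const& hotBatch,
              std::vector<int> const& liveBatch,
              std::vector<int> const& sorobanBatch)
{
    std::size_t const before =
        s.hotArchive.size() + s.live.size() + s.inMemorySoroban.size();

    // Each task writes to exactly one member, so no locking is needed.
    auto hot = std::async(std::launch::async, [&s, &hotBatch] {
        append(s.hotArchive, hotBatch);
    });
    auto soroban = std::async(std::launch::async, [&s, &sorobanBatch] {
        append(s.inMemorySoroban, sorobanBatch);
    });

    // The live update stays on the calling thread.
    append(s.live, liveBatch);

    // Join both workers before anyone reads the state again.
    hot.get();
    soroban.get();

    assert(s.hotArchive.size() + s.live.size() + s.inMemorySoroban.size() ==
           before + hotBatch.size() + liveBatch.size() + sorobanBatch.size());
}
```

The safety argument rests entirely on the three targets being disjoint; if any task touched a second member, it would need synchronization.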

-5ms for 6400 SAC transfers scenario

libsodium uses a portable C SHA256 implementation, missing SHA-NI hardware instructions available on Intel Xeon Platinum. OpenSSL automatically uses SHA-NI, providing 4.6x speedup for streaming add() (893ns->193ns/call) and 56% total SHA256 self-time reduction (3,744ms->1,659ms per 30s trace). Use opaque aligned storage for SHA256_CTX in the header to avoid naming conflict between OpenSSL's ::SHA256 function and stellar::SHA256 class. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
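The opaque-storage trick can be illustrated with a single-file sketch, assuming C++17. The "header" part names no third-party type; the "source" part places the context into the reserved bytes. `vendor::HashCtx` is a hypothetical stand-in for the library's context struct (using FNV-1a, not SHA-256), and the 128-byte / 16-byte figures are illustrative, not the values the PR uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// ---- "Header" side: only raw, aligned bytes are declared. ----
namespace app
{
class Hasher
{
  public:
    static constexpr std::size_t kCtxSize = 128; // illustrative
    static constexpr std::size_t kCtxAlign = 16; // illustrative

    Hasher();
    ~Hasher();
    Hasher(Hasher const&) = delete;
    Hasher& operator=(Hasher const&) = delete;

    void add(unsigned char const* data, std::size_t len);
    std::uint64_t finish();

  private:
    alignas(kCtxAlign) unsigned char mStorage[kCtxSize];
};
}

// ---- "Third-party" side: stand-in for the library's context. ----
namespace vendor
{
struct HashCtx
{
    std::uint64_t state;
};
inline void init(HashCtx& c) { c.state = 0xcbf29ce484222325ULL; }
inline void
update(HashCtx& c, unsigned char const* p, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        c.state ^= p[i];
        c.state *= 0x100000001b3ULL;
    }
}
inline std::uint64_t digest(HashCtx const& c) { return c.state; }
}

// ---- "Source" side: the only place both names meet. ----
static_assert(sizeof(vendor::HashCtx) <= app::Hasher::kCtxSize,
              "storage too small for the context");
static_assert(alignof(vendor::HashCtx) <= app::Hasher::kCtxAlign,
              "storage under-aligned for the context");

namespace
{
vendor::HashCtx&
ctxOf(unsigned char* storage)
{
    return *std::launder(reinterpret_cast<vendor::HashCtx*>(storage));
}
}

app::Hasher::Hasher() { vendor::init(*new (mStorage) vendor::HashCtx{}); }
app::Hasher::~Hasher() { std::destroy_at(&ctxOf(mStorage)); }

void
app::Hasher::add(unsigned char const* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    vendor::update(ctxOf(mStorage), data, len);
}

std::uint64_t
app::Hasher::finish()
{
    return vendor::digest(ctxOf(mStorage));
}
```

The two `static_assert`s are what keep the hard-coded size and alignment honest if the library's struct ever grows.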

…5ms/ledger) Run LiveBucketIndex construction on async worker thread in parallel with the put loop in mergeInMemory. Both read mergedEntries as const — fully independent. Tracy confirms full overlap: index future wait averages 2.2µs. finalizeLedgerTxnChanges drops from 164ms to 136ms per ledger. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When ledgerCloseMeta is null (meta tracking disabled), operate directly on the parent LTX in processFeesSeqNums and processPostTxSetApply instead of creating a child LTX per-transaction. The child LTX was only needed for getChanges() meta tracking. Saves ~41ms/ledger from eliminating ~10.6K child LTX create/commit cycles. Combined with experiment 011 (meta tracking), TPS improves from 10,688 to 12,736 (+19.2%). Also raises APPLY_LOAD_MAX_SAC_TPS_MAX_TPS from 12000 to 15000. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> # Conflicts: # docs/apply-load-max-sac-tps.cfg

In commitChangesToLedgerTxn, determining whether an entry is INIT (new) vs LIVE (existing) required calling mInMemorySorobanState.get() which computes sha256(xdr_to_opaque(key)) for every CONTRACT_DATA entry. With ~40K entries per ledger, this added ~16ms of SHA256 per ledger. Track existence via a bool mIsNew flag in ParallelApplyEntry, set when a TX creates an entry that didn't previously exist. This replaces the expensive SHA256-based existence check with a simple boolean. commitChangesToLedgerTxn: 72.6ms -> 44.2ms (-39%) TPS: 16,640 -> 16,960 (+1.9%) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> # Conflicts: # src/transactions/ParallelApplyUtils.cpp
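A simplified sketch of the flag-based classification. `ParallelApplyEntrySketch`, `recordWrite`, and `classify` are hypothetical, and `existedBefore` is assumed to be known at write time because the writer already had to look the entry up.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class ChangeKind
{
    INIT, // entry did not exist before this ledger
    LIVE  // entry existed and was updated
};

struct ParallelApplyEntrySketch
{
    std::string value;
    bool isNew; // remembered from the first write in this ledger
};

using EntryMap = std::map<std::string, ParallelApplyEntrySketch>;

// Record a write. Existence is captured once, on the first write.
void
recordWrite(EntryMap& entries, std::string const& key, std::string value,
            bool existedBefore)
{
    auto it = entries.find(key);
    if (it == entries.end())
    {
        entries.emplace(
            key, ParallelApplyEntrySketch{std::move(value), !existedBefore});
    }
    else
    {
        // Later writes keep the original isNew: the entry is still new
        // relative to the state before this ledger.
        it->second.value = std::move(value);
    }
}

// Commit-time classification reads the flag; no state lookup, no hashing.
std::vector<std::pair<std::string, ChangeKind>>
classify(EntryMap const& entries)
{
    std::vector<std::pair<std::string, ChangeKind>> out;
    out.reserve(entries.size());
    for (auto const& kv : entries)
    {
        out.emplace_back(kv.first, kv.second.isNew ? ChangeKind::INIT
                                                   : ChangeKind::LIVE);
    }
    return out;
}
```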

Add move overloads for createWithoutLoading/updateWithoutLoading and ScopedLedgerEntryOpt::moveFromScope to eliminate two deep copies per entry when committing parallel apply state to LedgerTxn. Reduces commitChangesToLedgerTxn from 44ms to 39ms per ledger (-12.8%). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
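The effect of a move overload can be shown with a toy store. `Entry` and `Store` are hypothetical; only the overload pair mirrors the shape of `createWithoutLoading` described above.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct Entry
{
    std::vector<std::uint8_t> data;
};

class Store
{
    std::vector<Entry> mEntries;

  public:
    // Copy overload: the caller keeps its entry, the store deep-copies it.
    void createWithoutLoading(Entry const& e) { mEntries.push_back(e); }

    // Move overload: the store takes over the entry's heap buffer.
    void createWithoutLoading(Entry&& e) { mEntries.push_back(std::move(e)); }

    Entry const&
    back() const
    {
        assert(!mEntries.empty());
        return mEntries.back();
    }
};
```

Calling `store.createWithoutLoading(std::move(entry))` selects the rvalue overload, so the stored entry ends up owning the same buffer the caller allocated, whereas the lvalue overload allocates a second buffer.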

Pre-load Soroban read-only entries (contract instance, code, TTL) into the global parallel apply state during setup, so per-TX lookups hit thread-local maps instead of traversing to InMemorySorobanState. Also cache protocol version and skip Soroban merge tracking in processFeesSeqNums, and use std::move for mLatestTxResultSet. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> # Conflicts: # docs/success/049-skip-child-ltx-processFeesSeqNums.md

Use bitset instead of maps and relax invariants a bit. This is pretty impactful: -10ms apply time for SAC, -20ms apply time for soroswap.

Pre-compute expected entry counts from footprint sizes and call reserve() on ParallelApplyEntryMap containers before they accumulate entries. Eliminates log2(N) rehash operations during parallel apply, yielding -26% commitChangesFromThread and -27% commitChangesToLedgerTxn self-time. +576 TPS (+3.1%): 18,368 → 18,944 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> # Conflicts: # src/transactions/ParallelApplyUtils.cpp
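A small sketch of reserving from footprint sizes, using `std::unordered_map` in place of ParallelApplyEntryMap. `Footprint`, `expectedEntryCount`, and `buildEntryMap` are hypothetical names. The count is an upper bound, since footprints may share keys.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Footprint
{
    std::vector<std::string> readOnly;
    std::vector<std::string> readWrite;
};

// Upper bound on distinct keys: duplicates across footprints are counted
// more than once, which only over-reserves.
std::size_t
expectedEntryCount(std::vector<Footprint> const& footprints)
{
    std::size_t n = 0;
    for (auto const& f : footprints)
        n += f.readOnly.size() + f.readWrite.size();
    return n;
}

std::unordered_map<std::string, int>
buildEntryMap(std::vector<Footprint> const& footprints)
{
    std::unordered_map<std::string, int> entries;
    entries.reserve(expectedEntryCount(footprints));
    std::size_t const buckets = entries.bucket_count();

    for (auto const& f : footprints)
    {
        for (auto const& k : f.readOnly)
            entries.emplace(k, 0);
        for (auto const& k : f.readWrite)
            entries.emplace(k, 0);
    }

    // Staying within the reserved count means no rehash happened.
    assert(entries.bucket_count() == buckets);
    return entries;
}
```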

resolveBackgroundEvictionScan previously received an UnorderedSet<LedgerKey> built by getAllKeysWithoutSealing() containing ~128K entries (~20ms to build), but only performed ~10-100 lookups. Added isModifiedKey() to LedgerTxn for direct O(1) lookups in the existing EntryMap, eliminating the set construction. resolveEviction zone: 20ms -> 0.116ms per ledger (99.4% reduction). TPS: 18,944 -> 19,328 avg (+2.0%). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
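The difference between the two shapes can be sketched as follows. `LedgerTxnSketch` and `keysSafeToEvict` are hypothetical, and the rule that a modified key is not evicted is an assumption made for the example; only the `isModifiedKey` name comes from the commit message.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

class LedgerTxnSketch
{
    std::unordered_map<std::string, int> mEntries;

  public:
    void put(std::string const& key, int value) { mEntries[key] = value; }

    // Old shape: copy every key into a new set, O(N) work per call.
    std::unordered_set<std::string>
    getAllKeys() const
    {
        std::unordered_set<std::string> keys;
        keys.reserve(mEntries.size());
        for (auto const& kv : mEntries)
            keys.insert(kv.first);
        return keys;
    }

    // New shape: query the existing map directly, O(1) on average.
    bool
    isModifiedKey(std::string const& key) const
    {
        return mEntries.find(key) != mEntries.end();
    }
};

// Assumed rule for the example: candidates modified this ledger are kept.
std::vector<std::string>
keysSafeToEvict(LedgerTxnSketch const& ltx,
                std::vector<std::string> const& candidates)
{
    std::vector<std::string> out;
    for (auto const& k : candidates)
    {
        if (!ltx.isModifiedKey(k))
            out.push_back(k);
    }
    return out;
}
```

With a handful of candidates and a very large entry map, the direct query does work proportional to the candidates rather than to the map.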

Replace single global mutex + RandomEvictionCache with 16 sharded caches, each with its own mutex. This eliminates contention when 4 parallel threads verify signatures simultaneously. Also use maybeGet() instead of exists()+get() double-lookup, fix ZoneText string heap allocations, make counters atomic, and remove unused liveSnapshot copy in applySorobanStageClustersInParallel.
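A minimal sketch of the sharding idea, assuming C++17. `ShardedCache` and `fillFromThreads` are hypothetical; the sketch uses a plain map per shard with no eviction, unlike the RandomEvictionCache mentioned above, and a string key in place of a signature-cache key.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

template <std::size_t NumShards = 16> class ShardedCache
{
    static_assert(NumShards > 0, "need at least one shard");

    struct Shard
    {
        std::mutex mutex;
        std::unordered_map<std::string, bool> map;
    };
    std::array<Shard, NumShards> mShards;

    // Threads contend only when their keys land in the same shard.
    Shard&
    shardFor(std::string const& key)
    {
        return mShards[std::hash<std::string>{}(key) % NumShards];
    }

  public:
    void
    put(std::string const& key, bool verified)
    {
        Shard& s = shardFor(key);
        std::lock_guard<std::mutex> guard(s.mutex);
        s.map[key] = verified;
    }

    // One lookup under one lock, instead of exists() followed by get().
    std::optional<bool>
    maybeGet(std::string const& key)
    {
        Shard& s = shardFor(key);
        std::lock_guard<std::mutex> guard(s.mutex);
        auto it = s.map.find(key);
        if (it == s.map.end())
            return std::nullopt;
        return it->second;
    }
};

// Usage example: several threads insert disjoint keys concurrently.
void
fillFromThreads(ShardedCache<>& cache, int numThreads, int keysPerThread)
{
    assert(numThreads >= 0 && keysPerThread >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t)
    {
        workers.emplace_back([&cache, t, keysPerThread] {
            for (int i = 0; i < keysPerThread; ++i)
                cache.put(std::to_string(t) + ":" + std::to_string(i), true);
        });
    }
    for (auto& w : workers)
        w.join();
}
```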

Sort lightweight 24-byte EntryRef structs (type tag + pointer) instead of full BucketEntry objects (200-500 bytes) in convertToBucketEntry. Reduces sort swap cost by ~12x and materializes final vector in one cache-friendly sequential pass. Cuts convertToBucketEntry from 31.9ms to 25.4ms per ledger. Benchmark: 13,760 -> 14,144 TPS (+384 TPS, +2.8%)
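The sort-by-reference technique might look like the sketch below. `BigEntry`, `EntryRef`, and `sortedByTypeThenKey` are hypothetical, the ordering (type, then key) is a placeholder for the real comparison, and this `EntryRef` is not laid out to match the 24-byte struct in the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a large entry that is expensive to swap.
struct BigEntry
{
    std::uint32_t type;
    std::string key;
    std::vector<std::uint8_t> payload;
};

// Small sortable handle: a type tag plus a pointer to the full entry.
struct EntryRef
{
    std::uint32_t type;
    BigEntry* entry;
};

std::vector<BigEntry>
sortedByTypeThenKey(std::vector<BigEntry> input)
{
    std::vector<EntryRef> refs;
    refs.reserve(input.size());
    for (auto& e : input)
        refs.push_back(EntryRef{e.type, &e});

    // Swaps during the sort move small handles, not full entries.
    std::sort(refs.begin(), refs.end(),
              [](EntryRef const& a, EntryRef const& b) {
                  if (a.type != b.type)
                      return a.type < b.type;
                  return a.entry->key < b.entry->key;
              });

    // Materialize the result in one sequential pass, moving each entry once.
    std::vector<BigEntry> out;
    out.reserve(refs.size());
    for (auto const& r : refs)
        out.push_back(std::move(*r.entry));
    return out;
}
```

Keeping the type tag inside the handle lets the common case of the comparison finish without dereferencing the pointer.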

Skip building LedgerTxnDelta in setEffectsDeltaFromSuccessfulTx when INVARIANT_CHECKS is empty. The delta is consumed exclusively by checkOnOperationApply which iterates an empty list when no invariants are configured. This eliminates ~285ms of shared_ptr allocations and entry copies across 4 worker threads per ledger. Benchmark: 12,736 -> 13,760 TPS (+1,024 TPS, +8.0%)

…ol version

LedgerSnapshot was renamed to CheckValidLedgerViewWrapper and ApplyLedgerStateSnapshot to ApplyLedgerView in upstream's LedgerState refactor. Branch's parallel pre-apply paths used the old names; rename to match. ApplyLedgerView privately inherits from ImmutableLedgerView, so use executeWithMaybeInnerSnapshot to derive a CheckValidLedgerViewWrapper from it for the read-only pre-apply paths.

The previous adaptation used ApplyLedgerView::executeWithMaybeInnerSnapshot to derive a CheckValidLedgerViewWrapper, but ImmutableLedgerView (and therefore ApplyLedgerView via using-declaration) explicitly throws on that call. Instead, add a narrow accessor that hands out the underlying ImmutableLedgerView and use the existing CheckValidLedgerViewWrapper(ImmutableLedgerView const&) constructor.

The branch's parallel TxFrame creation paths only checked XDRProvidesValidFee() but missed the getInclusionFee() <= 0 check that upstream added in the sequential equivalents. Restore parity so generalized tx sets with negative-fee txs are rejected during construction.

dmkozh and others added 30 commits May 28, 2026 15:13

budget opt step 1

119e987

rollback env, update benchmark config

24c3ea3

disable test meta

083e0c8

validate txs in parallel, small improvement on some tests (?)

3e10875

Parallel pre-apply 5-20ms

cbe0cb5

profile flag for bench matrix

53ecfc4

Cache ledger info

87bb20e

add config flag for ledger close worker threads

eeaba98

Detailed apply stage breakdown

8e725ae

Optimize rescope using move.

1d2f2da

-5ms for 6400 SAC transfers scenario

add tracy support to bench matrix

80838cb

Optimize recordStorageChanges.

67f57bb

Use bitset instead of maps and relax invariants a bit. This is pretty impactful: -10ms apply time for SAC, -20ms apply time for soroswap.

Remove extra lookup from upsert

690373f

update scenarios

d1e7c10

More robust path handling in apply load matrix script

9183b6b

dmkozh added 9 commits May 28, 2026 15:23

Cache LedgerKey hash in parallel apply data structures - ~-5ms

f92295e

Manual txset building instrumentation

8eb6ed4

storage opt

369444f

revert host module to p26

dc18b67

format

18a753e

fix a bug - in-memory state update shouldn't be conditioned on protocol version

1a1a8b0

dmkozh wants to merge 39 commits into stellar:master from dmkozh:ai_opt_p27_pre_xdr
