
Ai opt p27 pre xdr #5301

Draft
dmkozh wants to merge 39 commits into stellar:master from dmkozh:ai_opt_p27_pre_xdr


dmkozh commented May 29, 2026

No description provided.

dmkozh and others added 30 commits May 28, 2026 15:13
Replace xdrSha256(success) with streaming SHA256 calculation to avoid
XDR re-serialization of InvokeHostFunctionSuccessPreImage. The return
value and events are already available as XDR-encoded bytes, so we can
hash them directly without round-trip serialization.
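A minimal sketch of why streaming works here, assuming the preimage encodes as the encoded return value followed by a length-prefixed array of encoded events. `StreamingHasher`, `xdrUint32` and `hashSuccessPreImage` are hypothetical names, and FNV-1a stands in for SHA-256 only so the example builds without a crypto library. Because a streaming hash consumes bytes strictly in order, feeding the pieces one at a time gives the same digest as hashing the concatenated buffer, provided every byte of framing (here the XDR array count) is fed in too.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for a streaming SHA-256 context. FNV-1a (64-bit) is used only so
// the sketch compiles without a crypto library; it is NOT a secure hash.
class StreamingHasher
{
    std::uint64_t mState = 14695981039346656037ULL;

  public:
    void add(std::string const& bytes)
    {
        for (unsigned char c : bytes)
        {
            mState ^= c;
            mState *= 1099511628211ULL;
        }
    }
    std::uint64_t finish() const { return mState; }
};

// XDR encodes a variable-length array as a 4-byte big-endian element count
// followed by the elements themselves.
std::string xdrUint32(std::uint32_t v)
{
    std::string out(4, '\0');
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<char>((v >> (24 - 8 * i)) & 0xffu);
    return out;
}

// Hashes <returnValue><eventCount><event0><event1>... piece by piece,
// assuming each piece is already XDR-encoded, so no combined buffer
// (and no re-serialization) is needed.
std::uint64_t hashSuccessPreImage(std::string const& encodedReturnValue,
                                  std::vector<std::string> const& encodedEvents)
{
    StreamingHasher h;
    h.add(encodedReturnValue);
    h.add(xdrUint32(static_cast<std::uint32_t>(encodedEvents.size())));
    for (auto const& ev : encodedEvents)
        h.add(ev);
    return h.finish();
}
```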
Adds parallel processing to transaction set handling:

1. Parallel TxFrame creation: Creates TxFrames from XDR envelopes in
   parallel during transaction set deserialization. Uses work-stealing
   via std::async with even distribution across available threads.

2. Parallel transaction validation: Validates transactions in parallel
   in txsAreValid() when there are 2+ transactions.

3. Hash precomputation: Precomputes content and full hashes before
   parallel operations to avoid race conditions.

4. Test coverage: Adds StreamingShaTest for InvokeHostFunctionSuccessPreImage
   verification.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
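A minimal sketch of the even-distribution part of the parallel TxFrame creation, assuming each input produces an independent output. `parallelMap` is a hypothetical helper; it uses static contiguous chunks, one per hardware thread, and does not attempt any work-stealing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical helper: computes out[i] = make(in[i]) in parallel, giving each
// worker one contiguous, roughly equal chunk of the input. Each task writes
// only its own slots of `out`, so no locking is needed (this would NOT hold
// for Out = bool, because std::vector<bool> packs elements into shared words).
template <typename Out, typename In, typename F>
std::vector<Out> parallelMap(std::vector<In> const& in, F make)
{
    std::vector<Out> out(in.size());
    if (in.empty())
        return out;

    std::size_t workers =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    workers = std::min(workers, in.size());
    std::size_t const chunk = (in.size() + workers - 1) / workers;

    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < in.size(); begin += chunk)
    {
        std::size_t const end = std::min(begin + chunk, in.size());
        tasks.push_back(std::async(std::launch::async, [&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = make(in[i]);
        }));
    }
    for (auto& t : tasks)
        t.get(); // waits, and rethrows any exception thrown by a worker
    return out;
}
```

The output type must be default-constructible (a `std::shared_ptr` to a frame is). Any lazily cached state on the inputs, such as the content and full hashes, has to be computed before the fan-out, which is the race the hash precomputation step avoids.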
Add sizeBytes field to ContractDataMapEntryT to cache the XDR serialized
size of ledger entries. This avoids repeated xdr_size() calls during
state updates, reducing CPU overhead in the hot path.

Also adds Tracy zone to updateState() for profiling visibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
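A minimal sketch of the caching idea. `ContractDataMapEntryT` and `sizeBytes` come from the commit; `CachedEntry` and `expensiveEncodedSize` are hypothetical, with the latter standing in for `xdr_size()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the real XDR size computation, which walks the
// whole entry; here it is just the string length.
std::size_t expensiveEncodedSize(std::string const& entry) { return entry.size(); }

// Map value that caches the encoded size next to the entry, so hot paths
// read `sizeBytes` instead of recomputing it.
struct CachedEntry
{
    std::string entry;     // declared first, so it is initialized first
    std::size_t sizeBytes; // computed from `entry` in the constructor

    explicit CachedEntry(std::string e)
        : entry(std::move(e)), sizeBytes(expensiveEncodedSize(entry))
    {
    }

    // Any mutation must refresh the cache, or it silently goes stale.
    void replace(std::string e)
    {
        entry = std::move(e);
        sizeBytes = expensiveEncodedSize(entry);
    }
};
```

The one obligation the cache adds is that every write path goes through something like `replace()`.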
During ledger close, three independent operations are now parallelized:
- addHotArchiveBatch (modifies mHotArchiveBucketList)
- addLiveBatch (modifies mLiveBucketList) - runs on main thread
- updateInMemorySorobanState (modifies mInMemorySorobanState)

These operations modify completely independent data structures and can
safely run concurrently. Added getInMemorySorobanStateForUpdate() to
allow direct access to mInMemorySorobanState during COMMITTING phase.

This reduces ledger close latency by overlapping CPU-bound operations.
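A minimal sketch of the fork/join shape described above, using `std::async` (the commit does not say which mechanism the real code uses). `LedgerStateSketch`, `append` and `closeLedgerSketch` are hypothetical; the vectors merely stand in for the two bucket lists and the in-memory Soroban state.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-ins for the three structures touched at ledger close.
struct LedgerStateSketch
{
    std::vector<int> hotArchive;   // ~ mHotArchiveBucketList
    std::vector<int> live;         // ~ mLiveBucketList
    std::vector<int> sorobanState; // ~ mInMemorySorobanState
};

void append(std::vector<int>& dst, std::vector<int> const& batch)
{
    dst.insert(dst.end(), batch.begin(), batch.end());
}

// Two updates run on worker threads while the third runs on the calling
// thread. This is race-free only because each task writes a different member
// and all three merely read `batch`.
void closeLedgerSketch(LedgerStateSketch& s, std::vector<int> const& batch)
{
    auto hot = std::async(std::launch::async, [&] { append(s.hotArchive, batch); });
    auto soroban = std::async(std::launch::async, [&] { append(s.sorobanState, batch); });
    append(s.live, batch); // main thread
    // Join both workers; get() also propagates any exception they threw.
    hot.get();
    soroban.get();
}
```

If the main-thread update throws, the destructors of futures returned by `std::async` still block until the workers finish, so the workers never outlive the state they reference.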

# Conflicts:
#	src/ledger/LedgerManagerImpl.cpp
-5ms for 6400 SAC transfers scenario
libsodium uses a portable C SHA256 implementation that does not use the SHA-NI
hardware instructions available on Intel Xeon Platinum. OpenSSL automatically uses
SHA-NI, providing a 4.6x speedup for streaming add() (893ns->193ns/call) and a
56% reduction in total SHA256 self-time (3,744ms->1,659ms per 30s trace).

Use opaque aligned storage for SHA256_CTX in the header to avoid naming
conflict between OpenSSL's ::SHA256 function and stellar::SHA256 class.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
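A minimal sketch of the opaque-storage technique, assuming a 128-byte upper bound on the library context (checked at compile time). `Sha256Sketch` loosely mirrors `stellar::SHA256`, and `LibraryCtx` is a hypothetical stand-in for OpenSSL's `SHA256_CTX` so the example builds without OpenSSL. The header only declares a suitably aligned byte buffer, so it never needs the library header that declares `::SHA256`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// ---- "header" part: names no crypto-library type, so no clash with ::SHA256.
class Sha256Sketch
{
  public:
    Sha256Sketch();
    ~Sha256Sketch();
    Sha256Sketch(Sha256Sketch const&) = delete;
    Sha256Sketch& operator=(Sha256Sketch const&) = delete;

    void add(unsigned char const* data, std::size_t len);
    std::uint64_t bytesSeen() const;

  private:
    static constexpr std::size_t kCtxSize = 128; // assumed upper bound
    alignas(std::max_align_t) unsigned char mCtxStorage[kCtxSize];
};

// ---- ".cpp" part: only this file would include the library header.
namespace
{
// Hypothetical stand-in for the library's context struct.
struct LibraryCtx
{
    std::uint64_t total = 0;
};

LibraryCtx* asCtx(unsigned char* p)
{
    return std::launder(reinterpret_cast<LibraryCtx*>(p));
}
LibraryCtx const* asCtx(unsigned char const* p)
{
    return std::launder(reinterpret_cast<LibraryCtx const*>(p));
}
} // namespace

Sha256Sketch::Sha256Sketch()
{
    // Compile-time proof that the opaque buffer can hold the real type.
    static_assert(sizeof(LibraryCtx) <= kCtxSize, "context buffer too small");
    static_assert(alignof(LibraryCtx) <= alignof(std::max_align_t),
                  "context buffer under-aligned");
    new (mCtxStorage) LibraryCtx(); // real code would also call the library's init
}

Sha256Sketch::~Sha256Sketch() { asCtx(mCtxStorage)->~LibraryCtx(); }

void Sha256Sketch::add(unsigned char const* data, std::size_t len)
{
    (void)data; // a real implementation would pass `data` to the library's update
    asCtx(mCtxStorage)->total += len;
}

std::uint64_t Sha256Sketch::bytesSeen() const { return asCtx(mCtxStorage)->total; }
```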
…5ms/ledger)

Run LiveBucketIndex construction on an async worker thread in parallel with
the put loop in mergeInMemory. Both only read mergedEntries (as const), so they
are fully independent. Tracy confirms full overlap: index future wait averages 2.2µs.
finalizeLedgerTxnChanges drops from 164ms to 136ms per ledger.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
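A minimal sketch of overlapping an index build with the put loop, assuming both only read the merged entries. `Entry`, `Index` and `mergeAndIndex` are hypothetical; the "index" is just key to position, far simpler than LiveBucketIndex, and the vector of values stands in for the put loop's output.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <unordered_map>
#include <vector>

struct Entry
{
    std::string key;
    int value;
};

// Hypothetical index: key -> position in the merged output.
using Index = std::unordered_map<std::string, std::size_t>;

// Both the index build and the put loop only read `merged`, so they can run
// concurrently without synchronization.
std::vector<int> mergeAndIndex(std::vector<Entry> const& merged, Index& indexOut)
{
    auto indexFuture = std::async(std::launch::async, [&merged] {
        Index idx;
        idx.reserve(merged.size());
        for (std::size_t i = 0; i < merged.size(); ++i)
            idx.emplace(merged[i].key, i);
        return idx;
    });

    std::vector<int> written; // stands in for the put loop
    written.reserve(merged.size());
    for (auto const& e : merged)
        written.push_back(e.value);

    indexOut = indexFuture.get(); // ideally already finished by now
    return written;
}
```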
When ledgerCloseMeta is null (meta tracking disabled), operate directly
on the parent LTX in processFeesSeqNums and processPostTxSetApply instead
of creating a child LTX per-transaction. The child LTX was only needed
for getChanges() meta tracking.

Saves ~41ms/ledger from eliminating ~10.6K child LTX create/commit
cycles. Combined with experiment 011 (meta tracking), TPS improves
from 10,688 to 12,736 (+19.2%).

Also raises APPLY_LOAD_MAX_SAC_TPS_MAX_TPS from 12000 to 15000.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Conflicts:
#	docs/apply-load-max-sac-tps.cfg
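A minimal sketch of the pattern, assuming the only reason for the per-transaction child is change tracking. `Txn`, `ChildTxn` and `processFeesSketch` are hypothetical and far simpler than LedgerTxn; the point is that both paths leave the parent in the same state, while only the meta path pays for one child per transaction.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical miniature of a ledger transaction: a key -> balance map.
struct Txn
{
    std::map<std::string, long> state;
};

// Hypothetical child: buffers writes so they can be reported as per-tx
// changes, then merges them into the parent on commit.
struct ChildTxn
{
    Txn& parent;
    std::map<std::string, long> pending;

    void put(std::string const& k, long v) { pending[k] = v; }
    std::vector<std::string> changedKeys() const
    {
        std::vector<std::string> keys;
        for (auto const& kv : pending)
            keys.push_back(kv.first);
        return keys;
    }
    void commit()
    {
        for (auto const& kv : pending)
            parent.state[kv.first] = kv.second;
    }
};

// Charges a fee of 1 per transaction. A child is created only when a meta
// sink is present, because the child exists only to report changes.
void processFeesSketch(Txn& root, std::vector<std::string> const& payers,
                       std::vector<std::vector<std::string>>* meta)
{
    for (auto const& payer : payers)
    {
        if (meta == nullptr)
        {
            root.state[payer] -= 1; // write straight into the parent
            continue;
        }
        ChildTxn child{root, {}};
        auto it = root.state.find(payer);
        long current = (it == root.state.end()) ? 0 : it->second;
        child.put(payer, current - 1);
        meta->push_back(child.changedKeys());
        child.commit();
    }
}
```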
In commitChangesToLedgerTxn, determining whether an entry is INIT (new)
vs LIVE (existing) required calling mInMemorySorobanState.get() which
computes sha256(xdr_to_opaque(key)) for every CONTRACT_DATA entry.
With ~40K entries per ledger, this added ~16ms of SHA256 per ledger.

Track existence via a bool mIsNew flag in ParallelApplyEntry, set when
a TX creates an entry that didn't previously exist. This replaces the
expensive SHA256-based existence check with a simple boolean.

commitChangesToLedgerTxn: 72.6ms -> 44.2ms (-39%)
TPS: 16,640 -> 16,960 (+1.9%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Conflicts:
#	src/transactions/ParallelApplyUtils.cpp
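A minimal sketch of recording existence once and classifying at commit time. `ApplyEntrySketch`, `recordWrite`, `classify` and `CommitKind` are hypothetical; only the idea of an `mIsNew`-style flag comes from the commit. How the real code treats an entry created and then deleted within the same ledger is not described there; this sketch simply emits nothing for it.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

enum class CommitKind { None, Init, Live, Dead };

// Hypothetical parallel-apply entry. `isNew` is set once, when the entry is
// first recorded, from the existence check the transaction already did, so
// commit time needs no hashing or extra lookup.
struct ApplyEntrySketch
{
    std::optional<std::string> value; // nullopt means deleted
    bool isNew = false;
};

void recordWrite(std::unordered_map<std::string, ApplyEntrySketch>& m,
                 std::string const& key, std::optional<std::string> value,
                 bool existedBefore)
{
    auto [it, inserted] = m.try_emplace(key);
    if (inserted)
        it->second.isNew = !existedBefore; // later writes keep the first answer
    it->second.value = std::move(value);
}

CommitKind classify(ApplyEntrySketch const& e)
{
    if (!e.value)
        return e.isNew ? CommitKind::None : CommitKind::Dead;
    return e.isNew ? CommitKind::Init : CommitKind::Live;
}
```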
Add move overloads for createWithoutLoading/updateWithoutLoading and
ScopedLedgerEntryOpt::moveFromScope to eliminate two deep copies per
entry when committing parallel apply state to LedgerTxn. Reduces
commitChangesToLedgerTxn from 44ms to 39ms per ledger (-12.8%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
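A minimal sketch of why a move overload saves a deep copy. `createWithoutLoading` is the name from the commit, but `LedgerSink` and `CountedEntry` are hypothetical, and the copy counter only exists to make the saved copy observable.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Counts deep copies so the effect of the move overload is observable.
struct CountedEntry
{
    static inline int copies = 0;
    std::string payload;

    explicit CountedEntry(std::string p) : payload(std::move(p)) {}
    CountedEntry(CountedEntry const& o) : payload(o.payload) { ++copies; }
    CountedEntry(CountedEntry&&) noexcept = default;
    CountedEntry& operator=(CountedEntry const& o)
    {
        payload = o.payload;
        ++copies;
        return *this;
    }
    CountedEntry& operator=(CountedEntry&&) noexcept = default;
};

// Hypothetical sink offering both overloads, like the const& / && pair the
// commit adds. Overload resolution picks && for rvalues (e.g. std::move(x)).
struct LedgerSink
{
    std::vector<CountedEntry> stored;
    void createWithoutLoading(CountedEntry const& e) { stored.push_back(e); } // one deep copy
    void createWithoutLoading(CountedEntry&& e) { stored.push_back(std::move(e)); } // no deep copy
};
```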
Pre-load Soroban read-only entries (contract instance, code, TTL) into
the global parallel apply state during setup, so per-TX lookups hit
thread-local maps instead of traversing to InMemorySorobanState. Also
cache protocol version and skip Soroban merge tracking in
processFeesSeqNums, and use std::move for mLatestTxResultSet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Conflicts:
#	docs/success/049-skip-child-ltx-processFeesSeqNums.md
Use a bitset instead of maps and relax invariants a bit.

This is quite impactful: -10ms apply time for SAC, -20ms apply time for soroswap.
Pre-compute expected entry counts from footprint sizes and call reserve()
on ParallelApplyEntryMap containers before they accumulate entries.
Eliminates log2(N) rehash operations during parallel apply, yielding
-26% commitChangesFromThread and -27% commitChangesToLedgerTxn self-time.

+576 TPS (+3.1%): 18,368 → 18,944

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Conflicts:
#	src/transactions/ParallelApplyUtils.cpp
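A minimal sketch of sizing a hash map from footprints up front. `ParallelApplyEntryMap` is the container the commit names; here a plain `std::unordered_map` and a hypothetical `Footprint` struct stand in. `reserve(n)` sizes the bucket array for `n` elements under the current maximum load factor, so inserting up to `n` elements afterwards never triggers a rehash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical footprint: the keys a transaction declares up front.
struct Footprint
{
    std::vector<std::string> readOnly;
    std::vector<std::string> readWrite;
};

// Reserves room for every declared key before any entry is inserted, so the
// map never rehashes while it fills.
std::unordered_map<std::string, int> makeEntryMap(std::vector<Footprint> const& txs)
{
    std::size_t expected = 0;
    for (auto const& fp : txs)
        expected += fp.readOnly.size() + fp.readWrite.size();
    std::unordered_map<std::string, int> m;
    m.reserve(expected);
    return m;
}
```

The footprint count is an upper bound when transactions share keys, so this trades a little memory for never rehashing.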
resolveBackgroundEvictionScan previously received an UnorderedSet<LedgerKey>
built by getAllKeysWithoutSealing() containing ~128K entries (~20ms to build),
but only performed ~10-100 lookups. Added isModifiedKey() to LedgerTxn for
direct O(1) lookups in the existing EntryMap, eliminating the set construction.

resolveEviction zone: 20ms -> 0.116ms per ledger (99.4% reduction).
TPS: 18,944 -> 19,328 avg (+2.0%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
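A minimal sketch of replacing a materialized key set with point lookups. `isModifiedKey` is the name from the commit; `TxnSketch` and `countModified` are hypothetical, and the real LedgerTxn key type and entry map are far richer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for LedgerTxn's internal entry map.
class TxnSketch
{
    std::unordered_map<std::string, int> mEntries;

  public:
    void put(std::string k, int v) { mEntries[std::move(k)] = v; }

    // Old shape: copy every key into a new set, O(total entries), even when
    // the caller only asks about a handful of them.
    std::unordered_set<std::string> getAllKeys() const
    {
        std::unordered_set<std::string> keys;
        keys.reserve(mEntries.size());
        for (auto const& kv : mEntries)
            keys.insert(kv.first);
        return keys;
    }

    // New shape: answer each query against the map that already exists,
    // O(1) on average per query.
    bool isModifiedKey(std::string const& k) const { return mEntries.count(k) != 0; }
};

std::size_t countModified(TxnSketch const& txn, std::vector<std::string> const& candidates)
{
    std::size_t n = 0;
    for (auto const& k : candidates)
        n += txn.isModifiedKey(k) ? 1 : 0;
    return n;
}
```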
Replace single global mutex + RandomEvictionCache with 16 sharded caches,
each with its own mutex. This eliminates contention when 4 parallel threads
verify signatures simultaneously. Also use maybeGet() instead of exists()+get()
double-lookup, fix ZoneText string heap allocations, make counters atomic,
and remove unused liveSnapshot copy in applySorobanStageClustersInParallel.
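A minimal sketch of lock sharding, assuming string keys. `ShardedCache` is hypothetical and leaves out eviction (the RandomEvictionCache part). Each key hashes to one of 16 shards with its own mutex, so threads verifying different signatures usually take different locks, and `maybeGet` does one lookup under one lock instead of `exists()` followed by `get()`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sharded cache: one mutex per shard instead of one global mutex.
template <typename V, std::size_t kShards = 16>
class ShardedCache
{
    struct Shard
    {
        std::mutex mutex;
        std::unordered_map<std::string, V> map;
    };
    std::array<Shard, kShards> mShards;

    Shard& shardFor(std::string const& key)
    {
        return mShards[std::hash<std::string>{}(key) % kShards];
    }

  public:
    void put(std::string const& key, V value)
    {
        Shard& s = shardFor(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        s.map[key] = std::move(value);
    }

    // Single lookup under the lock. exists() then get() would lock twice and
    // could observe different states between the two calls.
    std::optional<V> maybeGet(std::string const& key)
    {
        Shard& s = shardFor(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        auto it = s.map.find(key);
        if (it == s.map.end())
            return std::nullopt;
        return it->second;
    }
};
```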
Sort lightweight 24-byte EntryRef structs (type tag + pointer) instead of
full BucketEntry objects (200-500 bytes) in convertToBucketEntry. Reduces
sort swap cost by ~12x and materializes final vector in one cache-friendly
sequential pass. Cuts convertToBucketEntry from 31.9ms to 25.4ms per ledger.

Benchmark: 13,760 -> 14,144 TPS (+384 TPS, +2.8%)
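A minimal sketch of sorting small handles and materializing once. `EntryRef` is the name from the commit, but the fields here, along with `HeavyEntry`, `TaggedEntry` and `sortThenMaterialize`, are hypothetical; the real struct's layout (24 bytes per the commit) is not reproduced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical heavy entry (stands in for a large BucketEntry).
struct HeavyEntry
{
    std::string key;
    std::string payload;
};

struct TaggedEntry
{
    std::uint8_t tag; // e.g. which kind of bucket entry to emit
    HeavyEntry entry;
};

// Lightweight handle: a type tag plus a pointer to the source entry.
struct EntryRef
{
    std::uint8_t tag;
    HeavyEntry const* entry;
};

std::vector<TaggedEntry> sortThenMaterialize(std::vector<HeavyEntry> const& src,
                                             std::vector<std::uint8_t> const& tags)
{
    std::vector<EntryRef> refs;
    refs.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        refs.push_back({tags[i], &src[i]});

    // The sort swaps small handles, never the heavy entries themselves.
    std::sort(refs.begin(), refs.end(), [](EntryRef const& a, EntryRef const& b) {
        return a.entry->key < b.entry->key;
    });

    // One sequential pass builds the final vector in sorted order.
    std::vector<TaggedEntry> out;
    out.reserve(refs.size());
    for (auto const& r : refs)
        out.push_back({r.tag, *r.entry});
    return out;
}
```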
Skip building LedgerTxnDelta in setEffectsDeltaFromSuccessfulTx when
INVARIANT_CHECKS is empty. The delta is consumed exclusively by
checkOnOperationApply which iterates an empty list when no invariants
are configured. This eliminates ~285ms of shared_ptr allocations and
entry copies across 4 worker threads per ledger.

Benchmark: 12,736 -> 13,760 TPS (+1,024 TPS, +8.0%)
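A minimal sketch of skipping an expensive artifact when nothing consumes it. `INVARIANT_CHECKS` and `checkOnOperationApply` come from the commit; `DeltaSketch`, `Invariant` and `checkAfterApply` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical delta: copies of every entry a transaction touched.
struct DeltaSketch
{
    std::vector<std::string> entries;
};
using Invariant = std::function<bool(DeltaSketch const&)>;

// Builds the (expensive) delta only if some invariant will consume it.
// Returns the number of entries copied so the skip is observable.
std::size_t checkAfterApply(std::vector<std::string> const& touched,
                            std::vector<Invariant> const& invariants)
{
    if (invariants.empty())
        return 0; // no consumer: skip the shared_ptr allocation and the copies
    auto delta = std::make_shared<DeltaSketch>();
    delta->entries = touched; // the deep copy that the early return avoids
    for (auto const& inv : invariants)
        if (!inv(*delta))
            throw std::runtime_error("invariant failed");
    return delta->entries.size();
}
```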
dmkozh added 9 commits May 28, 2026 15:23
LedgerSnapshot was renamed to CheckValidLedgerViewWrapper and
ApplyLedgerStateSnapshot to ApplyLedgerView in upstream's LedgerState
refactor. Branch's parallel pre-apply paths used the old names; rename
to match. ApplyLedgerView privately inherits from ImmutableLedgerView,
so use executeWithMaybeInnerSnapshot to derive a
CheckValidLedgerViewWrapper from it for the read-only pre-apply paths.
The previous adaptation used ApplyLedgerView::executeWithMaybeInnerSnapshot
to derive a CheckValidLedgerViewWrapper, but ImmutableLedgerView (and
therefore ApplyLedgerView via using-declaration) explicitly throws on that
call. Instead, add a narrow accessor that hands out the underlying
ImmutableLedgerView and use the existing
CheckValidLedgerViewWrapper(ImmutableLedgerView const&) constructor.
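A minimal sketch of the accessor pattern described above. The three class names come from the commit, but their members here (including the integer ledger sequence) are invented for illustration, and `getImmutableView()` is a hypothetical accessor name.

```cpp
#include <cassert>

class ImmutableLedgerView
{
  public:
    explicit ImmutableLedgerView(int seq) : mSeq(seq) {}
    int ledgerSeq() const { return mSeq; }

  private:
    int mSeq;
};

class CheckValidLedgerViewWrapper
{
  public:
    explicit CheckValidLedgerViewWrapper(ImmutableLedgerView const& v) : mView(v) {}
    int ledgerSeq() const { return mView.ledgerSeq(); }

  private:
    ImmutableLedgerView const& mView; // borrowed, not owned
};

// Private inheritance hides the base's interface from outside callers...
class ApplyLedgerView : private ImmutableLedgerView
{
  public:
    explicit ApplyLedgerView(int seq) : ImmutableLedgerView(seq) {}

    // ...so a narrow accessor hands out the base explicitly. Converting
    // *this to a private base is legal inside the class's own members.
    ImmutableLedgerView const& getImmutableView() const { return *this; }
};
```

The wrapper only holds a reference, so the `ApplyLedgerView` must outlive it.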
The branch's parallel TxFrame creation paths only checked
XDRProvidesValidFee() but missed the getInclusionFee() <= 0 check that
upstream added in the sequential equivalents. Restore parity so
generalized tx sets with negative-fee txs are rejected during construction.
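A minimal sketch of keeping the sequential and parallel construction paths in parity by routing both through one predicate. `TxSketch`, `xdrProvidesValidFeeSketch`, `acceptableForTxSet` and `buildTxSet` are hypothetical. The body of the first check is a placeholder, since the commit does not say what `XDRProvidesValidFee()` verifies; only the `getInclusionFee() <= 0` rejection mirrors the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical miniature of the fee fields a frame exposes.
struct TxSketch
{
    std::int64_t declaredFee;  // input to the placeholder check below
    std::int64_t inclusionFee; // stands in for getInclusionFee()
};

// Placeholder for XDRProvidesValidFee(); the real rule is not shown here.
bool xdrProvidesValidFeeSketch(TxSketch const& tx) { return tx.declaredFee >= 0; }

// Single predicate shared by the sequential and parallel paths, so the two
// cannot drift apart again. Rejects inclusion fees <= 0.
bool acceptableForTxSet(TxSketch const& tx)
{
    return xdrProvidesValidFeeSketch(tx) && tx.inclusionFee > 0;
}

// Construction fails if any transaction fails either check.
bool buildTxSet(std::vector<TxSketch> const& txs)
{
    for (auto const& tx : txs)
        if (!acceptableForTxSet(tx))
            return false;
    return true;
}
```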