
fix(vara.eth/mempool): seed head + grace-window unresolved refs #5503

Draft
grishasobol wants to merge 1 commit into master from gsobol/ethexe/fix-mempool-cold-start-tolerance


@grishasobol
Member

Summary

Fixes three related bugs in the latest_head_height / purge_expired lifecycle of InjectedTxMempool, surfaced by the corner-case hunt (see PR #5491, log entries iter #11, #4, and #7).

Cold-start expiry bypass (iter #11)

insert's is_expired check was guarded on latest_head_height.is_some():

```rust
if let Some(ref_height) = ref_height_opt
    && let Some(head_height) = inner.latest_head_height
    && Self::is_expired(head_height, ref_height) { ... }
```

Between process boot and the observer's first set_chain_head tick, latest_head_height is None, so the whole let-chain short-circuits and expired txs slip through: RPC returns Accept, and the very next chain-head advance silently purges the tx. Users see this as flaky behaviour after every restart.

Fix: seed latest_head_height from db.globals().latest_synced_eb.header.height in with_capacity. The DB's last-synced EB is a sound proxy for the chain head while the observer catches up.
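A minimal sketch of the cold-start fix, as a toy model rather than the crate's code: SyncedEb, EbHeader, Globals, InsertResult and Mempool are hypothetical stand-ins for db.globals(), latest_synced_eb and InjectedTxMempool, VALIDITY_WINDOW = 30 is a made-up value, and is_expired is assumed to share grace_expired's "head - ref >= VALIDITY_WINDOW" shape. It shows with_capacity seeding the head from the DB so the insert-time expiry gate is armed before the first set_chain_head tick.

```rust
/// Illustrative value only; the real VALIDITY_WINDOW is not stated here.
const VALIDITY_WINDOW: u32 = 30;

/// Hypothetical stand-ins for `db.globals().latest_synced_eb.header.height`.
struct EbHeader {
    height: u32,
}
struct SyncedEb {
    header: EbHeader,
}
struct Globals {
    latest_synced_eb: SyncedEb,
}

#[derive(Debug, PartialEq)]
enum InsertResult {
    Accept,
    RejectExpired,
}

struct Mempool {
    capacity: usize,
    latest_head_height: Option<u32>,
}

impl Mempool {
    /// Seed the head from the DB's last-synced EB instead of starting at `None`.
    fn with_capacity(capacity: usize, globals: &Globals) -> Self {
        Self {
            capacity,
            latest_head_height: Some(globals.latest_synced_eb.header.height),
        }
    }

    /// Observer tick: the chain head advanced.
    fn set_chain_head(&mut self, height: u32) {
        self.latest_head_height = Some(height);
    }

    /// Assumed rule: expired once the head is VALIDITY_WINDOW or more blocks past the ref.
    fn is_expired(head_height: u32, ref_height: u32) -> bool {
        head_height.saturating_sub(ref_height) >= VALIDITY_WINDOW
    }

    /// `ref_height_opt` is `None` when the ref_block is not (yet) in the local DB.
    fn insert(&mut self, ref_height_opt: Option<u32>) -> InsertResult {
        if let (Some(ref_height), Some(head_height)) = (ref_height_opt, self.latest_head_height) {
            if Self::is_expired(head_height, ref_height) {
                return InsertResult::RejectExpired;
            }
        }
        InsertResult::Accept
    }
}
```

With a DB head of 100, a tx referencing height 50 is rejected immediately rather than accepted and then purged on the next tick, while a tx whose ref_block is unknown is still accepted under the best-effort resolution rule.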

Insert→purge race on a lagging observer (iter #4)

insert is intentionally tolerant of ref_blocks that haven't yet replicated to this validator's DB (the producer's EB lags the observer by O(seconds) in normal operation):

```rust
// ref_block resolution is best-effort: a recipient that hasn't yet
// observed the producer's reference Eth block accepts and filters
// at fetch time once the block lands locally. Only reject when
// the ref_block is known AND already past the validity window.
```

But purge_expired was not tolerant: its _ => false arm evicted entries with an unknown ref_block on the very next chain-head advance, so the local RPC's Accept was immediately orphaned.
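To make the race concrete, here is a toy reconstruction of the pre-fix pool purge. It assumes the retain predicate matches on the ref_block's height looked up in the local DB, modelled here as a HashMap from block hash to height; purge_expired_old, this PoolEntry shape, and VALIDITY_WINDOW = 30 are hypothetical, not the crate's code.

```rust
use std::collections::HashMap;

const VALIDITY_WINDOW: u32 = 30; // illustrative value only

/// Toy pool entry: only the hash of its reference block.
struct PoolEntry {
    ref_block: [u8; 32],
}

/// Pre-fix shape: an unresolved ref_block falls into the `_ => false` arm.
fn purge_expired_old(
    pool: &mut Vec<PoolEntry>,
    head_height: u32,
    local_db: &HashMap<[u8; 32], u32>, // ref_block hash -> height, once replicated
) {
    pool.retain(|entry| match local_db.get(&entry.ref_block) {
        Some(&ref_height) => head_height.saturating_sub(ref_height) < VALIDITY_WINDOW,
        // Bug: "not in my DB yet" is treated exactly like "expired".
        _ => false,
    });
}
```

An entry whose ref_block has simply not replicated yet is dropped on the first purge, even though insert just accepted it.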

Forget→purge dedup bypass (iter #7)

The seen table has the same shape: forget() stamps every committed tx into seen. When a committed tx's ref_block hadn't yet replicated to this node's DB, the next purge_expired evicted its seen entry, letting the same network-committed tx re-enter the local pool. This inflates pool occupancy with already-committed work.
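A toy model of the dedup bypass, under the same assumptions as any sketch here: ToyMempool, Hash32, the seen map from tx hash to ref_block hash, purge_expired_old and VALIDITY_WINDOW = 30 are all hypothetical. It reproduces the sequence above: forget, purge with an unresolved ref_block, then the committed tx is accepted again.

```rust
use std::collections::{HashMap, HashSet};

const VALIDITY_WINDOW: u32 = 30; // illustrative value only

type Hash32 = [u8; 32];

#[derive(Default)]
struct ToyMempool {
    pool: HashSet<Hash32>,         // pending tx hashes
    seen: HashMap<Hash32, Hash32>, // committed tx hash -> its ref_block hash
}

impl ToyMempool {
    /// Dedup gate: a tx recorded in `seen` is rejected.
    fn insert(&mut self, tx: Hash32) -> bool {
        if self.seen.contains_key(&tx) {
            return false;
        }
        self.pool.insert(tx)
    }

    /// Stamp a network-committed tx into `seen` and drop it from the pool.
    fn forget(&mut self, tx: Hash32, ref_block: Hash32) {
        self.pool.remove(&tx);
        self.seen.insert(tx, ref_block);
    }

    /// Pre-fix seen-retain loop: an unresolved ref_block evicts the entry.
    fn purge_expired_old(&mut self, head_height: u32, local_db: &HashMap<Hash32, u32>) {
        self.seen.retain(|_, ref_block| match local_db.get(&*ref_block) {
            Some(&ref_height) => head_height.saturating_sub(ref_height) < VALIDITY_WINDOW,
            None => false,
        });
    }
}
```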

Fix for iter #4 and #7: grace window

A naive "always keep entries with an unknown ref_block" rule would break the existing pool_retains_unresolved_ref_block_indefinitely invariant: a stream of txs with bogus ref_blocks could permanently exhaust capacity.

Intended semantics: keep entries whose ref_block is unresolved for a bounded grace window, and evict them afterwards:

```rust
// Pool: keep while `head_height - inserted_at_head_height < VALIDITY_WINDOW`.
// Seen: same shape on `seen_at_head_height`.
fn grace_expired(head_height: u32, inserted_at_head_height: u32) -> bool {
    head_height.saturating_sub(inserted_at_head_height) >= VALIDITY_WINDOW as u32
}
```
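One reading of the resulting retain rule, sketched under assumptions: a known ref_block is judged by is_expired (assumed to share grace_expired's >= VALIDITY_WINDOW shape), an unresolved one by grace_expired from its insertion height. VALIDITY_WINDOW = 30 is illustrative and should_retain is a hypothetical helper name.

```rust
const VALIDITY_WINDOW: u32 = 30; // illustrative value only

/// Assumed shape of the known-ref_block check.
fn is_expired(head_height: u32, ref_height: u32) -> bool {
    head_height.saturating_sub(ref_height) >= VALIDITY_WINDOW
}

fn grace_expired(head_height: u32, inserted_at_head_height: u32) -> bool {
    head_height.saturating_sub(inserted_at_head_height) >= VALIDITY_WINDOW
}

/// `ref_height` is `None` while the ref_block is unresolved in the local DB.
fn should_retain(head_height: u32, ref_height: Option<u32>, inserted_at_head_height: u32) -> bool {
    match ref_height {
        // Known ref_block: the ordinary validity window decides.
        Some(ref_height) => !is_expired(head_height, ref_height),
        // Unknown ref_block: tolerated only for a bounded grace window.
        None => !grace_expired(head_height, inserted_at_head_height),
    }
}
```

Because the comparison is >=, an entry stamped at head h survives heads h through h + VALIDITY_WINDOW - 1 and is evicted at h + VALIDITY_WINDOW; saturating_sub keeps a head that lags behind the stamp from underflowing.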

Pool entries gain inserted_at_head_height; seen entries gain seen_at_head_height. This satisfies both requirements: the insert→purge race is tolerated for a window long enough to cover any reasonable observer lag, and txs whose ref_block never lands exert only bounded back-pressure on capacity.
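An end-to-end toy model of the new bookkeeping, again a sketch rather than the crate's code. PoolEntry and SeenEntry carry inserted_at_head_height and seen_at_head_height as described; the ref_block field, the HashMap-backed local DB, the keep helper and VALIDITY_WINDOW = 30 are assumptions. It exercises both halves of the fix: a pool entry and a seen entry with a never-replicated ref_block survive a purge inside the grace window, and both are evicted once it elapses.

```rust
use std::collections::HashMap;

const VALIDITY_WINDOW: u32 = 30; // illustrative value only

type Hash32 = [u8; 32];

/// Pending tx; `ref_block` is an assumed field name.
struct PoolEntry {
    ref_block: Hash32,
    inserted_at_head_height: u32,
}

/// Committed tx remembered for dedup.
struct SeenEntry {
    ref_block: Hash32,
    seen_at_head_height: u32,
}

/// Known ref_block: validity window counted from its height.
/// Unresolved ref_block: grace window counted from when the entry was stamped.
fn keep(head_height: u32, ref_height: Option<u32>, stamped_at: u32) -> bool {
    let anchor = ref_height.unwrap_or(stamped_at);
    head_height.saturating_sub(anchor) < VALIDITY_WINDOW
}

#[derive(Default)]
struct ToyMempool {
    latest_head_height: Option<u32>,
    pool: HashMap<Hash32, PoolEntry>,
    seen: HashMap<Hash32, SeenEntry>,
}

impl ToyMempool {
    /// Head used for stamping; 0 is only a fallback in this toy, since the
    /// cold-start seeding makes the head known from construction.
    fn head(&self) -> u32 {
        self.latest_head_height.unwrap_or(0)
    }

    fn insert(&mut self, tx: Hash32, ref_block: Hash32) -> bool {
        if self.seen.contains_key(&tx) {
            return false; // already committed on the network
        }
        let inserted_at_head_height = self.head();
        self.pool.insert(tx, PoolEntry { ref_block, inserted_at_head_height });
        true
    }

    fn forget(&mut self, tx: Hash32, ref_block: Hash32) {
        self.pool.remove(&tx);
        let seen_at_head_height = self.head();
        self.seen.insert(tx, SeenEntry { ref_block, seen_at_head_height });
    }

    fn purge_expired(&mut self, head_height: u32, local_db: &HashMap<Hash32, u32>) {
        self.latest_head_height = Some(head_height);
        self.pool.retain(|_, e| {
            keep(head_height, local_db.get(&e.ref_block).copied(), e.inserted_at_head_height)
        });
        self.seen.retain(|_, e| {
            keep(head_height, local_db.get(&e.ref_block).copied(), e.seen_at_head_height)
        });
    }
}
```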

Test plan

  • Three new regression tests, no #[ignore]:
    • cold_start_insert_rejects_expired_ref_block_using_latest_synced_eb
    • purge_expired_keeps_tx_with_unresolved_ref_block_within_grace
    • forget_then_purge_keeps_seen_for_unresolved_ref_block_within_grace
  • The pre-existing pool_retains_unresolved_ref_block_indefinitely test continues to pass, which shows that the grace expiry actually fires once VALIDITY_WINDOW has elapsed.
  • All 18 mempool tests + entire ethexe-malachite test suite green (59/59 passing).
  • cargo fmt --check -p ethexe-malachite clean.
  • cargo clippy -p ethexe-malachite --tests clean.

Out of scope

  • A side benefit of the grace window: even when the ref_block IS in the DB, an arbitrary forever-stuck tx (e.g., one whose ref_block sits on an alternative branch that never becomes canonical) now has its pool lifetime bounded by VALIDITY_WINDOW. The existing fetch-time ancestor filter still keeps it from being included while it cannot win.
  • Per-sender quotas against capacity exhaustion are still tracked separately in #5474, "ethexe-malachite: mempool has no per-signer quota, one signer can fill the pool" (independent of this PR).

🤖 Generated with Claude Code

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses three critical bugs in the InjectedTxMempool lifecycle related to transaction expiry and deduplication. By implementing a grace window for unresolved reference blocks and properly seeding the initial chain head height, the changes prevent flaky behavior where transactions are prematurely purged or incorrectly accepted during node startup and observer lag. These improvements ensure that the mempool maintains strict validity invariants while remaining tolerant of expected network and database synchronization delays.

Highlights

  • Cold-start expiry fix: Seeded the mempool's latest head height from the database's last-synced block header during initialization to ensure expiry checks are active immediately upon process boot.
  • Grace-window implementation: Introduced a grace window for transactions with unresolved reference blocks, preventing premature eviction during observer lag and ensuring consistent deduplication.
  • Regression testing: Added three new regression tests to verify cold-start behavior, grace-window retention for pool entries, and deduplication stability for committed transactions.

@gemini-code-assist (Contributor, bot) left a comment


Code Review

This pull request introduces a grace window for mempool and 'seen' entries to handle scenarios where a transaction's reference block has not yet been replicated to the local database. It adds PoolEntry and SeenEntry structs to track the head height at the time of insertion or commitment and seeds the initial head height from the database to ensure the expiry logic is active during cold starts. Comprehensive regression tests were added to verify these changes. I have no feedback to provide.

@grishasobol marked this pull request as draft on May 22, 2026 15:13
Base automatically changed from gsobol/ethexe/malachite-new to master on May 25, 2026 16:59
Three related bugs in the mempool's `latest_head_height` /
`purge_expired` lifecycle (corner-case hunt #11, #4, #7):

- Cold-start: `is_expired` in `insert` was guarded on
  `latest_head_height.is_some()` — skipped between process boot and
  the observer's first `set_chain_head` tick. RPC returned `Accept`
  for arbitrarily-old txs that the very next chain-head advance
  silently purged. Fix: seed `latest_head_height` from
  `db.globals().latest_synced_eb` at construction so the gate is
  active from boot.

- Insert→purge race: `insert` tolerated not-yet-replicated ref_blocks
  (the producer's EB lags the observer by O(seconds)) but
  `purge_expired` evicted them on the very next chain-head advance.
  The local RPC's `Accept` was orphaned.

- Forget→purge dedup bypass: when a committed tx's ref_block hadn't
  replicated to this node's DB, `purge_expired`'s seen-retain loop
  evicted the seen entry — letting the same network-committed tx
  re-enter the local pool.

Fix for the latter two: keep entries with unresolved ref_block within
a `VALIDITY_WINDOW`-block grace period, evicting only once that grace
expires. Pool entries carry `inserted_at_head_height`; seen entries
carry `seen_at_head_height`. Satisfies both "tolerate observer lag"
and the existing `pool_retains_unresolved_ref_block_indefinitely`
invariant on bounded capacity exhaustion.

Three regression tests added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@grishasobol force-pushed the gsobol/ethexe/fix-mempool-cold-start-tolerance branch from 8f805f0 to e82387d on May 25, 2026 17:16