fuzz: model chanmon mempool mining #4657

Open
joostjager wants to merge 3 commits into lightningdevkit:main from joostjager:chanmon-mempool-mining

Conversation

@joostjager

Contributor

This prepares chanmon_consistency for force-close fuzzing by making its chain model closer to the environment LDK sees in normal operation.

Force-close scenarios depend heavily on transaction timing: claims may be broadcast, replaced, confirmed, followed by additional claims, and later become spendable only after more blocks. The previous harness mostly folded transaction confirmation into sync-style actions, which made it harder to express those flows accurately and made future force-close coverage depend on shortcuts in the test harness.

The updated model gives the harness an explicit mempool and block-mining path. Broadcast transactions can be relayed into the modeled mempool, mined into harness blocks, and then replayed to both monitors and managers through chain callbacks. The harness also tracks confirmed UTXOs and wallet change so later splice, anchor, and claim transactions have a realistic view of what can be spent.
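A minimal sketch of the relay/mine split described above, using toy types. `ToyChain`, `Tx`, `relay`, and `mine` are hypothetical names for illustration and are not the harness's actual API. For brevity the sketch rejects conflicting spends, whereas the discussion further down in this PR describes the harness letting a later relay replace a conflicting transaction.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for the harness types; not the structs used in the PR.
type TxId = u32;
type OutPoint = (TxId, u32); // (txid, output index)

#[derive(Clone, Debug, PartialEq)]
struct Tx {
    id: TxId,
    inputs: Vec<OutPoint>,
    num_outputs: u32,
}

#[derive(Default)]
struct ToyChain {
    blocks: Vec<Vec<Tx>>,     // blocks[h - 1] holds the transactions mined at height h
    mempool: Vec<Tx>,         // relayed but not yet confirmed
    utxos: HashSet<OutPoint>, // unspent outputs of confirmed transactions
}

impl ToyChain {
    fn tip_height(&self) -> u32 {
        self.blocks.len() as u32
    }

    // Relay admits a broadcast transaction into the mempool without moving the tip.
    // It returns false when an input is neither a confirmed UTXO nor an unspent
    // output of an earlier mempool transaction.
    fn relay(&mut self, tx: Tx) -> bool {
        let mut view = self.utxos.clone();
        for pending in &self.mempool {
            for input in &pending.inputs {
                view.remove(input);
            }
            for vout in 0..pending.num_outputs {
                view.insert((pending.id, vout));
            }
        }
        if !tx.inputs.iter().all(|input| view.contains(input)) {
            return false;
        }
        self.mempool.push(tx);
        true
    }

    // Mining confirms the whole mempool in the first block, then appends
    // `count - 1` empty blocks on top of it.
    fn mine(&mut self, count: u32) {
        if count == 0 {
            return;
        }
        let block: Vec<Tx> = std::mem::take(&mut self.mempool);
        for tx in &block {
            for input in &tx.inputs {
                self.utxos.remove(input);
            }
            for vout in 0..tx.num_outputs {
                self.utxos.insert((tx.id, vout));
            }
        }
        self.blocks.push(block);
        for _ in 1..count {
            self.blocks.push(Vec::new());
        }
    }
}
```

In this toy model a relayed transaction stays unconfirmed until `mine` is called, which is the property that the old sync-style path did not express.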

This should make upcoming force-close fuzzing changes easier to review: first establish a more faithful chain environment, then add the force-close-specific scenarios and invariants on top of it.

@ldk-reviews-bot

ldk-reviews-bot commented Jun 2, 2026


👋 Thanks for assigning @jkczyz as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@joostjager

Contributor Author

@wpaulino FYI, this may intersect with the existing splice fuzzing failures you’re looking at.

This PR makes splice transaction mining in chanmon_consistency more realistic: negotiated splice transactions no longer get implicitly confirmed through the old sync-style path. Instead, broadcasts have to pass through the harness’s modeled mempool and are confirmed only when the fuzz input mines blocks.

@joostjager joostjager force-pushed the chanmon-mempool-mining branch from b9a40e5 to bbbd224 on June 3, 2026 14:59
Restore cfg(splicing) to the fuzz check-cfg allow list and gate
chanmon consistency splice opcodes on that cfg again. Without the
cfg, those inputs stop before executing splice-specific operations.
@joostjager joostjager force-pushed the chanmon-mempool-mining branch 3 times, most recently from 4e6f262 to d02194d on June 4, 2026 05:19
@joostjager

Contributor Author

Fuzz failure is pre-existing

@joostjager joostjager force-pushed the chanmon-mempool-mining branch 2 times, most recently from 0ab5882 to ef0be5b on June 4, 2026 09:24
@joostjager joostjager marked this pull request as ready for review June 4, 2026 10:04
@joostjager joostjager removed the request for review from valentinewallace June 4, 2026 10:04
Comment thread fuzz/src/chanmon_consistency.rs
Comment thread fuzz/src/chanmon_consistency.rs Outdated
Comment thread fuzz/src/chanmon_consistency.rs Outdated
Comment on lines +1184 to +1236
// Connects this node from its tracked height to target_height, delivering
// each relevant chain callback to both ChainMonitor and ChannelManager.
fn connect_chain_range(&mut self, chain_state: &ChainState, target_height: u32) {
    // Mining commands can advance the harness chain by more than one block.
    // Transaction blocks must be connected explicitly so LDK learns about
    // on-chain spends, while empty depth blocks still need best-block
    // updates so CSV/CLTV-sensitive logic can run. This is the normal sync
    // path, so both the raw ChainMonitor and the ChannelManager receive the
    // callbacks and the node's tracked height advances to the target.
    let mut height = self.height;
    while height < target_height {
        let mut next_height = height + 1;
        while next_height <= target_height && chain_state.block_at(next_height).1.is_empty() {
            next_height += 1;
        }
        if next_height > target_height {
            // The rest of the range is empty. One best-block update to the
            // final height is enough because LDK's Confirm API explicitly
            // allows best_block_updated to skip intermediary blocks, and
            // empty blocks have no transactions_confirmed calls whose chain
            // order must be preserved.
            height = target_height;
            let (header, _) = chain_state.block_at(height);
            self.monitor.best_block_updated(header, height);
            self.node.best_block_updated(header, height);
            break;
        }
        if next_height > height + 1 {
            // Advance across the empty prefix before the next transaction
            // block. Confirm::best_block_updated may skip intermediary
            // blocks, so this compressed update still lets height-triggered
            // LDK work run at the correct tip before the transaction
            // confirmations are connected.
            height = next_height - 1;
            let (header, _) = chain_state.block_at(height);
            self.monitor.best_block_updated(header, height);
            self.node.best_block_updated(header, height);
        }
        height = next_height;
        let (header, txn) = chain_state.block_at(height);
        let txdata: Vec<_> = txn.iter().enumerate().map(|(i, tx)| (i + 1, tx)).collect();
        if !txdata.is_empty() {
            // Transaction blocks need the explicit confirmation callback
            // before the best-block update so watched spends are delivered
            // in chain order before the node advances to that tip.
            self.monitor.transactions_confirmed(header, &txdata, height);
            self.node.transactions_confirmed(header, &txdata, height);
        }
        self.monitor.best_block_updated(header, height);
        self.node.best_block_updated(header, height);
    }
    self.height = target_height;
}

Collaborator

Good improvement: the old sync_with_chain_state only notified self.node (ChannelManager), never self.monitor (ChainMonitor). This means ChannelMonitors weren't receiving transactions_confirmed / best_block_updated callbacks during normal fuzz-loop sync, which would have hidden any bugs in monitor chain-tracking logic. The new connect_chain_range fixing this is an important correctness improvement.

One behavioral change worth noting: the old code called best_block_updated for every individual block height. The new code batches consecutive empty blocks into a single best_block_updated call at the last empty height before the next tx block (or at the target). This is allowed by the Confirm trait ("May be skipped for intermediary blocks"), but it means height-triggered logic inside ChainMonitor/ChannelManager runs at fewer checkpoints than before. For a fuzzer seeking maximum coverage of height-sensitive paths (e.g., HTLC timeout detection), individual block delivery would exercise more code paths.

Contributor Author

You could also argue that skipping exercises different paths. I think in practice, we'll get a bit from both.
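To make the batching trade-off in this thread concrete, here is a sketch that computes only the order of callbacks that the quoted `connect_chain_range` would issue for a given range. `Callback` and `callback_schedule` are hypothetical names introduced for illustration, and `has_txs` stands in for `!chain_state.block_at(h).1.is_empty()`.

```rust
// Hypothetical event type used only to record the order of callbacks.
#[derive(Debug, PartialEq)]
enum Callback {
    TransactionsConfirmed(u32),
    BestBlockUpdated(u32),
}

// `has_txs(h)` reports whether the block at height `h` contains transactions.
fn callback_schedule(from: u32, target: u32, has_txs: impl Fn(u32) -> bool) -> Vec<Callback> {
    let mut out = Vec::new();
    let mut height = from;
    while height < target {
        // Find the next block that contains transactions, if any.
        let mut next = height + 1;
        while next <= target && !has_txs(next) {
            next += 1;
        }
        if next > target {
            // Only empty blocks remain: a single update at the target height.
            out.push(Callback::BestBlockUpdated(target));
            break;
        }
        if next > height + 1 {
            // Compress the empty prefix into one update at its last height.
            out.push(Callback::BestBlockUpdated(next - 1));
        }
        // Confirmations are delivered before the tip moves to that block.
        out.push(Callback::TransactionsConfirmed(next));
        out.push(Callback::BestBlockUpdated(next));
        height = next;
    }
    out
}
```

With transactions at heights 4 and 5 and a target of 10, the schedule has best-block updates at 3, 4, 5, and 10 only. Heights 1, 2, and 6 through 9 never become the tip from LDK's point of view, which is the reduced set of checkpoints the reviewer points out.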

Comment thread fuzz/src/chanmon_consistency.rs
@ldk-claude-review-bot

ldk-claude-review-bot commented Jun 4, 2026

Collaborator

One new issue found this pass.

New issue

  • fuzz/src/chanmon_consistency.rs:2605 — The assert!(self.mine_blocks(ANTI_REORG_DELAY) > 0, ...) in finish can panic on legitimate fuzz input. When a relayed (but unmined) mempool transaction (e.g. a splice) coexists with a pending HTLC sitting exactly one block below its fail-back deadline, safe_mine_block_count clamps the count to 0, so mine_blocks returns 0 and the assert fires even though LDK behaved correctly. This is a distinct failure mode from the previously-noted "no event processing between finish rounds" concern.

Prior issues (still applicable, not re-posted)

  • :344 — off-by-one in locktime maturity check (tip_height() vs tip_height()+1).
  • :2608/finish — no process_events between relay/mine rounds (matters once anchor fee-bumping is fuzzed).
  • :3565 — break 'fuzz_loop on splice commands when !cfg!(splicing) discards remaining input.
  • :1236 — empty-block batching in connect_chain_range reduces height-trigger coverage.
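For the first item in the list above, the distinction between `tip_height()` and `tip_height() + 1` matters because Bitcoin treats a height-based lock time as satisfied when it is strictly below the height of the block that would include the transaction, and the next block has height tip + 1. The sketch below illustrates that rule only. `height_locktime_mature` is a hypothetical helper, it ignores input sequence numbers and time-based lock times, and whether the harness check at :344 is meant to mirror this rule is an assumption.

```rust
// Lock times below this threshold are interpreted as block heights.
const LOCKTIME_THRESHOLD: u32 = 500_000_000;

// Returns true if a height-based lock time allows inclusion in the next block,
// i.e. the block at height `tip_height + 1`.
fn height_locktime_mature(lock_time: u32, tip_height: u32) -> bool {
    assert!(lock_time < LOCKTIME_THRESHOLD, "time-based lock times are not modeled");
    // A transaction is final in a block at height h when lock_time < h.
    lock_time < tip_height + 1
}
```

Comparing against `tip_height` instead of `tip_height + 1` would hold a transaction back for one extra block.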

Note on a prior comment

My earlier claim that claimed_payment_hashes.insert(...) was removed is incorrect — it is present at :2154 and the invariant check is wired (.remove() at :3207). That prior comment should be disregarded.

Comment thread fuzz/src/chanmon_consistency.rs Outdated
@jkczyz jkczyz self-requested a review June 4, 2026 14:06
@joostjager joostjager force-pushed the chanmon-mempool-mining branch from 4036d6a to 57960f1 on June 4, 2026 14:42
@joostjager joostjager self-assigned this Jun 4, 2026
@joostjager

Contributor Author

When we are good with the changes, I can tack on one more commit that moves the chain/mempool code into fuzz/src/chanmon_consistency/chain_mempool.rs. With force-close fuzzing, we are moving towards a 5000-line file, so no harm in already splitting things out if we are making changes anyway.

Comment thread fuzz/src/chanmon_consistency.rs Outdated
const DEFAULT_TX_CONFIRMATION_BLOCKS: u32 = 6;
// Single fuzz bytes can mine more than one block so a corpus entry does not
// need long runs of identical "mine one block" commands to reach CSV or CLTV
// boundaries. Mining is clipped below if unresolved HTLCs are near expiry.

Contributor

What is meant by "clipped below"?

Contributor Author

That mining stops before anything would expire. Changed it to 'capped'

Comment thread fuzz/src/chanmon_consistency.rs Outdated
// A mined transaction is considered deeply confirmed after this many blocks.
// This confirms the transaction in one block and then mines five empty depth
// blocks.
const DEFAULT_TX_CONFIRMATION_BLOCKS: u32 = 6;

Contributor

Use ANTI_REORG_DELAY?

Contributor Author

Done 👍

Comment thread fuzz/src/chanmon_consistency.rs Outdated
Comment on lines +3110 to +3114
events::Event::PaymentClaimed { payment_hash, .. } => {
    if payments.payment_preimages.contains_key(&payment_hash) {
        payments.claimed_payment_hashes.insert(payment_hash);
    }
},

Contributor

Should this (moved from line 1943) be a separate commit?

Contributor Author

Hm yes, that is already living in #4635. Removed code from this PR.

Comment on lines +3381 to +3387
assert!(
    current_tip < timeout_deadline,
    "pending HTLC with expiry {} and timeout deadline {} is already unsafe at tip {}",
    expiry,
    timeout_deadline,
    current_tip
);

Contributor

This fails with the following input:

printf '\x20\xdf\xdf\xb0\x0e\x10\x18\x10\x18\x10\x18\x30\xd8' | ./target/release/chanmon_consistency_target

Claude's succinct summary:

The persisted view can drift arbitrarily behind chain_state.tip because Harness::mine_blocks never re-checkpoints, so after a deferred reload the node builds an HTLC against a stale view with a deadline already below the tip. Fix it by either soft-capping the past-deadline branch in safe_mine_block_count to return 0, or — better — also capping mining against the lowest persisted-manager view so a future reload-then-send can't produce an immediately-unsafe HTLC.

Contributor Author

Thanks, this was the missing startup-sync piece from the FC branch that I forgot to cherry-pick. Reload now catches ChainMonitor up from the oldest monitor height and ChannelManager from its own best block before returning to the fuzz loop.

The repro passes now
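The startup-sync rule described in this reply can be sketched as a small planning function. `ReplayPlan` and `replay_plan` are hypothetical names, and the heights are plain integers rather than LDK best-block values. The sketch only captures the rule that ChainMonitor replays from the oldest persisted monitor height while ChannelManager replays from its own best block.

```rust
// Replay ranges as (already_synced_height, target_tip); None means already at the tip.
#[derive(Debug, PartialEq)]
struct ReplayPlan {
    monitor: Option<(u32, u32)>,
    manager: Option<(u32, u32)>,
}

fn replay_plan(monitor_heights: &[u32], manager_height: u32, tip: u32) -> ReplayPlan {
    let range = |from: u32| if from < tip { Some((from, tip)) } else { None };
    // ChainMonitor must start from the oldest persisted monitor so that no
    // monitor misses blocks; with no monitors there is nothing to replay.
    let oldest_monitor = monitor_heights.iter().copied().min().unwrap_or(tip);
    ReplayPlan {
        monitor: range(oldest_monitor),
        // ChannelManager starts from its own best block, so it is never rewound.
        manager: range(manager_height),
    }
}
```

For example, with monitors persisted at heights 95 and 100, a manager at 98, and a chain tip of 100, the monitors replay from 95 and the manager replays from 98.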

Comment on lines +2676 to +2679
assert!(
    self.mine_blocks(DEFAULT_TX_CONFIRMATION_BLOCKS) > 0,
    "finish cannot mine pending mempool transactions without crossing an unresolved HTLC timeout deadline"
);

Contributor

This can be reached using:

printf '\xc9\xa6\xff\xde\xdf' | ./target/release/chanmon_consistency_target

@joostjager joostjager Jun 8, 2026

Contributor Author

This one and the ones below contain splice opcodes. As splicing is not yet stable, I would prefer not to go into debugging those 😬 Without the splicing cfg flag, those strings pass.

@@ -2818,13 +3128,7 @@
.funding_transaction_signed(&channel_id, &counterparty_node_id, signed_tx)
.unwrap();

Contributor

This is now reachable:

printf '\xa7\xa0\xff\xd5\xd8\xa0\xff' | ./target/release/chanmon_consistency_target

Comment on lines 3162 to 3164
panic!(
    "It may take may iterations to settle the state, but it should not take forever"
);

Contributor

Able to hit this now but looks like it was preexisting, too.

printf '\xa7\xa0\xff\xd5\xd8\xa0\xff' | ./target/release/chanmon_consistency_target

let is_quiescent_msg = msg.data.contains("already sent splice_locked, cannot RBF");
if !msg.data.contains("Disconnecting due to timeout awaiting response") && !is_quiescent_msg
{
    panic!("Unexpected disconnect case: {}", msg.data);

Contributor

Looks like this can be hit prior to your change.

printf '\xc7\x3a\xa2\x32\x3a\xff\xff\xa7\x35\x33\x45\xff' | ./target/release/chanmon_consistency_target

We may need to expand the allowed reason.

@joostjager joostjager force-pushed the chanmon-mempool-mining branch from 57960f1 to 771e8a5 on June 8, 2026 09:24
Comment thread fuzz/src/chanmon_consistency.rs Outdated
@joostjager joostjager force-pushed the chanmon-mempool-mining branch from 771e8a5 to 534bb75 on June 8, 2026 09:36
@joostjager

joostjager commented Jun 8, 2026

Contributor Author

Review comments addressed: https://github.com/lightningdevkit/rust-lightning/compare/57960f1..3b8a78e1d9c61e184e4ccb7ea9279b9f832df580

  • Use ANTI_REORG_DELAY for setup confirmations.
  • Derive node height from ChannelManager instead of cached state.
  • Sync restarted monitors and managers from their own persisted heights.
  • Restore claimed-payment tracking for the final PaymentSent invariant.
  • Move relay/mining opcodes to start at 0xd6, leaving 0xd3..0xd5 free for #4660 (fuzz: add chanmon holder signer fuzz ops).

Comment thread fuzz/src/chanmon_consistency.rs
@joostjager joostjager force-pushed the chanmon-mempool-mining branch 2 times, most recently from a2335d0 to 9458dc7 on June 8, 2026 10:02
Route chanmon broadcasts through an explicit harness mempool so relay,
mining, wallet updates, and chain delivery share one path. This lets
splice, anchor, and claim transactions enter the mempool before mining.

On restart, sync loaded monitors and managers from their own persisted
best blocks so raw monitors catch up without rewinding ChannelManager
state. Cap modeled mining before unresolved HTLC timeout deadlines
and use the LDK anti-reorg depth for setup confirmations.
Comment thread fuzz/src/chanmon_consistency.rs Outdated
Comment on lines +107 to +109
// The fuzz wallet needs enough confirmed inputs to build many splice
// transactions without accidentally exhausting wallet liquidity before the
// transaction-relay logic is what the test is really exercising.

Contributor

This sentence isn't grammatically correct.

Comment thread fuzz/src/chanmon_consistency.rs Outdated
confirmed_txids: HashSet<Txid>,
/// Unconfirmed transactions (e.g., splice txs). Conflicting RBF candidates may coexist;
/// `confirm_pending_txs` determines which one confirms.
/// Unconfirmed transactions admitted by the harness mempool. Admission keeps

Contributor

"by" or "into"?

Comment thread fuzz/src/chanmon_consistency.rs Outdated
/// created by an earlier transaction in this vector.
pending_txs: Vec<(Txid, Transaction)>,
/// Tracks unspent outputs created by confirmed transactions. Admission builds
/// a temporary package view from this set plus earlier mempool transactions,

Contributor

What is meant by "earlier mempool transactions" here? Do you mean the state of pending_txs at time of insertion into utxos?

Comment on lines -240 to -242
/// Confirm pending transactions in a single block, selecting deterministically among
/// conflicting RBF candidates. Sorting by txid ensures the winner is determined by fuzz input
/// content. Transactions that double-spend an already-confirmed outpoint are skipped.

Contributor

We want to retain this behavior for splicing at least. Otherwise, we won't exercise code confirming an RBF that isn't the most recent.

Contributor Author

Which candidate confirms is still fuzz-chosen, just via relay order instead of txid sort: admission keeps whichever conflicting tx was relayed last (no fee-rate policy), and mining confirms whatever is in the mempool at that point. So an input can confirm a non-latest candidate either by mining before the replacement is relayed, or by relaying an older copy from another node's queue after the newer one.
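A small sketch of the "last relayed wins" admission described in this reply, assuming a toy model in which a transaction is only an id plus the outpoints it spends. `Candidate`, `admit_last_wins`, and `mine_mempool` are hypothetical names. The sketch does not model fee rates, RBF signaling, or eviction of descendants.

```rust
// Toy transaction: an id plus the (txid, vout) outpoints it spends.
#[derive(Clone, Debug, PartialEq)]
struct Candidate {
    id: u32,
    inputs: Vec<(u32, u32)>,
}

// Admit `tx`, evicting every mempool transaction that spends one of the same
// outpoints. No fee-rate comparison is made: the most recent relay always wins.
fn admit_last_wins(mempool: &mut Vec<Candidate>, tx: Candidate) {
    mempool.retain(|pending| !pending.inputs.iter().any(|input| tx.inputs.contains(input)));
    mempool.push(tx);
}

// Mining confirms whatever is in the mempool at that moment and empties it.
fn mine_mempool(mempool: &mut Vec<Candidate>) -> Vec<u32> {
    mempool.drain(..).map(|tx| tx.id).collect()
}
```

Under this model the input decides the winner purely through ordering. Relaying candidate 1 and then candidate 2 confirms 2, relaying 2 and then an older copy of 1 confirms 1, and mining between the two relays confirms whichever arrived first.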

Comment thread fuzz/src/chanmon_consistency.rs Outdated
Comment on lines +382 to +384
// that conflicting transaction itself signals RBF. The harness does not
// model fee-rate policy, so fuzz-controlled relay order chooses between
// otherwise valid RBF candidates.

Contributor

I don't know what is meant by "fuzz-controlled relay order chooses between otherwise valid RBF candidates". How can an "order" choose? What criteria is it using for choosing? Is saying "does not model fee-rate policy" supposed to indicate that the fee rate isn't considered during replacement?

Comment thread fuzz/src/chanmon_consistency.rs Outdated
// need this to keep double-spends and unknown-prevout spends from producing
// impossible on-chain state.
fn has_invalid_inputs(tx: &Transaction, utxos: &HashSet<BitcoinOutPoint>) -> bool {
// The tiny UTXO set protects the chain model from two false positives:

Contributor

What is "The tiny UTXO set"?

Comment thread fuzz/src/chanmon_consistency.rs Outdated
Comment on lines +265 to +268
// Coinbase transactions have a null input, and synthetic funding
// transactions have no inputs, but neither consumes a modeled UTXO.
// Normal transactions consume their inputs before exposing outputs to
// later transactions.

Contributor

Honestly, I find the AI-generated comments like this and others jarring to read. I need to parse it multiple times to determine what it's trying to say and why it's worth saying. i.e. Why a "modeled" UTXO and not just a UTXO? What value does the last sentence provide?

@joostjager

Contributor Author

Ouch, so much AI weirdness. Been experimenting with this extensive comment style. It is extra info, but it also needs to be right, which increases the review burden for author and reviewer to get there. Been looking over all kinds of change sets so often, and I did think I reviewed this one thoroughly, but clearly not.

I found the insights obtained via AI-added comments quite useful, to avoid asking it to explain things repeatedly, but perhaps it is better reserved for the dev stage and removed before review.

Will address this and clean up.

Clean up review-comment wording around broadcaster handling and
startup setup while simplifying block range connection.

The connector now skips intermediary empty blocks before transaction
blocks, relying on Confirm's allowance for skipped best-block updates.
    return;
}
assert!(
    self.mine_blocks(ANTI_REORG_DELAY) > 0,

Collaborator

This assert can fire on legitimate fuzz input, producing a false-positive crash. mine_blocks clamps count via safe_mine_block_count, which returns 0 when the chain tip is one block below an HTLC's fail-back deadline (tip == expiry - HTLC_FAIL_BACK_BUFFER - 1).

A reachable sequence (under cfg(splicing)): mine close to an HTLC's deadline (clamped to deadline - 1), then splice_in (broadcasts the splice), then relay it into the mempool with 0xd6/0xd7/0xd8, then end the input. At finish, pending_txs is non-empty but safe_mine_block_count returns 0, so mine_blocks returns 0 and this assert panics — even though nothing went wrong in LDK.

Because relay does not advance the tip and mining always confirms the whole mempool, this is the one window where a mempool tx can coexist with a near-deadline HTLC at end of input. Consider draining/dropping the remaining mempool here, or resolving the HTLC, rather than asserting.
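The clamping behaviour described in this comment can be sketched as follows. `capped_mine_count` is a hypothetical reimplementation and not the harness's `safe_mine_block_count`. The fail-back buffer is passed as a parameter because its actual value is not shown in this thread, and saturating arithmetic replaces the harness's assertion for the case where the tip has already reached the deadline.

```rust
// Returns how many of the `requested` blocks can be mined while keeping the
// tip strictly below every pending HTLC's fail-back deadline.
fn capped_mine_count(
    tip: u32,
    requested: u32,
    htlc_expiries: &[u32],
    fail_back_buffer: u32,
) -> u32 {
    let mut allowed = requested;
    for &expiry in htlc_expiries {
        // Height at which the HTLC would have to be failed back.
        let deadline = expiry.saturating_sub(fail_back_buffer);
        // The tip may reach at most deadline - 1.
        let room = deadline.saturating_sub(1).saturating_sub(tip);
        allowed = allowed.min(room);
    }
    allowed
}
```

When the tip equals expiry minus the buffer minus one, the result is 0 regardless of the requested count. If a transaction was relayed into the mempool at that point, an assertion that mining made progress fires even though nothing misbehaved, which is the window this comment describes.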

4 participants