Feature/async auto commit by JohannesLichtenberger · Pull Request #829 · sirixdb/sirix

JohannesLichtenberger · 2026-01-18T12:07:30Z

No description provided.

When KotlinJsonStreamingShredder (or other external code) calls setBulkInsertion(true), the nodeHashing.autoCommit flag was not being set, causing hashes to not be computed during node insertion. This resulted in nodes having hash=0, which in turn caused the serializer to not output descendantCount in metadata (since it's conditional on hash != 0). The fix updates AbstractNodeTrxImpl.setBulkInsertion() to also set autoCommit on nodeHashing based on isAutoCommitting, matching the behavior of insertSubtreeAsFirstChild() and similar methods. Also removed the now-redundant setBulkInsertion override from XmlNodeTrxImpl since the base class handles it properly.

This commit adds the foundation for asynchronous page persistence during auto-commit in bulk import scenarios. The key changes enable concurrent bulk inserts to continue while a background worker commits pages to disk. Key components: - PageReference: Added activeTilGeneration counter for O(1) TIL membership check - TransactionIntentLog: Added rotation API with generation increment and identity map - CommitSnapshot: New class to hold frozen TIL state for background commit - SnapshotCommitter: New class to execute commits using snapshot's TIL - NodeStorageEngineWriter: - Added 3-step layered lookup (active TIL -> snapshot -> disk) - Added prepareAsyncCommitSnapshot/executeCommitSnapshot flow - Added semaphore-based backpressure - Added copy-on-write for IndirectPages accessed from snapshot - Disk offset propagation via logKey->diskOffset mapping in snapshot The implementation follows TLA+ formal verification patterns and handles 9 critical concurrency issues identified during design review. Note: Full integration with AbstractNodeTrxImpl auto-commit trigger is a follow-up task. This commit provides the low-level infrastructure.

- HOTTrieWriter: Added layered lookup (active TIL -> snapshot -> disk) for async auto-commit support - HOTTrieWriter: Added CoW for HOTIndirectPages found in pending snapshot - Added 18 comprehensive integration tests covering: - TIL rotation and generation tracking - Identity map preservation - Snapshot isolation - Disk offset bounds checking - Commit state propagation - Large entry sets - Backpressure verification - RevisionRootPage deep copy - Empty snapshot handling All tests pass and verify correctness of the async auto-commit infrastructure.

Updated remaining log.get() calls to use getFromActiveOrPending() for complete async auto-commit support: - getHOTLeafPageForWrite(): layered lookup for HOT root - loadHOTPage(): layered lookup for HOT page loading - getLogRecord(): layered lookup for public TIL access - prepareRecordPageViaKeyedTrie() lambda: layered lookup for leaf refs - prepareRecordPage fetch container: layered lookup after tree traversal This ensures all page reads during bulk imports correctly check active TIL, pending snapshot, and then disk.

Added 10 new tests that specifically exercise the layered lookup paths: - testGetLogRecordUsesLayeredLookup: Verifies getLogRecord() finds pages from pending snapshot - testPrepareRecordPageAfterRotation: Tests record page preparation after TIL rotation - testLayeredLookupWithGenerationCheck: Verifies generation-based fast path - testLayeredLookupPrioritizesActiveTil: Confirms active TIL is checked first - testSnapshotLookupFallbackToLogKey: Tests logKey fallback mechanism - testDiskOffsetLazyPropagation: Verifies disk offset propagation after commit - testIndirectPageCowFromSnapshot: Tests IndirectPage CoW from snapshot - testActiveVsSnapshotGenerationIsolation: Verifies generation isolation - testGetFromActiveOrPendingReturnsNullWhenNotFound: Tests null return for unknown refs Total: 27 async auto-commit integration tests now passing.

…mmit

The pendingSnapshot was being cleared in the finally block of executeSnapshotCommit() immediately after the background commit completed. This caused a race condition where: 1. Background commit writes pages and sets keys on CLONED refs 2. pendingSnapshot = null clears the snapshot 3. Insert thread's ORIGINAL refs still have key=NULL_ID_LONG 4. Layered lookup skips snapshot (it's null) and falls to disk 5. Disk load with NULL key creates empty page = DATA LOSS The fix: - Don't clear pendingSnapshot in executeSnapshotCommit's finally block - Let the snapshot persist for lazy disk offset propagation - Clear pendingSnapshot in regular commit path (after log.clear()) - Clear pendingSnapshot in close() to release memory Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…dler The HEAD handler was throwing exceptions (resulting in 500) when checking for non-existent databases or resources. Now it properly returns 404: - Check if database exists before opening it - Check if resource exists before opening session This allows clients to correctly distinguish between 'not found' (404) and 'server error' (500).

- AbstractHeadHandler: return 404 instead of 500 when nodeId doesn't exist - GetHandler: check database existence before getting database type

…mmit

Fix a bug in AbstractNodeTrxImpl.runLocked() where the lock would be held forever if the runnable threw an exception, by wrapping in try/finally. Add 50+ new tests for the async auto-commit infrastructure: - AsyncAutoCommitComprehensiveTest: 35 tests covering TIL rotation, snapshot immutability, layered lookup, backpressure, generation counters, concurrent commits, versioning strategies, and stress tests - AsyncAutoCommitChicagoTest: 15 tests using the chicago-subset dataset with parameterized auto-commit thresholds, serialization identity checks, DeweyIDs, rolling hashes, multiple revisions, and all four versioning strategies (FULL, DIFFERENTIAL, INCREMENTAL, SLIDING_SNAPSHOT) Include a formal correctness proof covering safety properties (serializability, durability, isolation), liveness properties (deadlock freedom, progress guarantees), and 6 core invariants of the async auto-commit mechanism.

…N_ASYNC Add AfterCommitState.KEEP_OPEN_ASYNC for bulk imports that use async TIL rotation without creating intermediate revisions. Only the final explicit commit() creates a visible revision. Key changes: - Add asyncIntermediateCommit() and awaitPendingAsyncCommit() to StorageEngineWriter interface and NodeStorageEngineWriter - Route KEEP_OPEN_ASYNC through async path in intermediateCommitIfRequired(), preserve sync commit("autoCommit") for KEEP_OPEN/CLOSE - Add asyncCommitInFlight volatile flag to prevent deadlocks when rollback()/close() is called without prior async commit - Fix ReferencesPage4 copy constructor: copy activeTilGeneration to cloned PageReferences (root cause of "Cannot retrieve record" errors after TIL rotation + CoW of IndirectPage) - Add flushBufferedWrites() to Writer interface for background commit support - Update correctness proof document to reflect current implementation

Replace the diagnostic stub with actual background KeyValueLeafPage writing. The background thread now serializes and writes leaf pages to disk via Writer.write() while the insert thread continues into the fresh TIL. The sync commit skips already-written pages via removeBackgroundWrittenEntries() which clears logKey on written refs. Full Chicago dataset: 310M nodes in 79s (3.9M nodes/sec).

Parameterized test that runs full Chicago import at 512K, 1M, 2M, 4M thresholds with OS page cache drop between runs for fair comparison. 1M threshold is the sweet spot (84s, 3.7M nodes/sec).

Johannes Lichtenberger and others added 15 commits January 14, 2026 19:30

Merge remote-tracking branch 'origin/main' into feature/async-auto-co…

474e3b7

…mmit

fix: return 404 for non-existent nodeId and deleted database

0ce3a41

- AbstractHeadHandler: return 404 instead of 500 when nodeId doesn't exist - GetHandler: check database existence before getting database type

fix: return 404 for DELETE on non-existent database

d7563d3

Merge remote-tracking branch 'origin/main' into feature/async-auto-co…

cb56b4c

…mmit

test: add threshold sweep benchmark with OS cache clearing

0f1b026

Parameterized test that runs full Chicago import at 512K, 1M, 2M, 4M thresholds with OS page cache drop between runs for fair comparison. 1M threshold is the sweet spot (84s, 3.7M nodes/sec).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature/async auto commit#829

Feature/async auto commit#829
JohannesLichtenberger wants to merge 15 commits intomainfrom
feature/async-auto-commit

JohannesLichtenberger commented Jan 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

JohannesLichtenberger commented Jan 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant