Fix slow throughput for bulk data without CRLFs#581
Conversation
…rrectness Large bulk data without CRLFs is split across multiple LineBlocks via a continuation chain. This commit fixes how those continuation blocks interact with line counting, removal, position conversion, search, and boundary stitching so that fragmented and monolithic buffers produce identical results.
Add testOnlyAppendPartialItems:ofLength:width:metadata:continuation: so tests can control metadata and continuation on bulk items, avoiding last-writer-wins overwriting seed values. Fix testContinuationStyleFieldsPreservedAcrossStitch: use new helper with styledCont on all items; build monolithic reference via single appendLine (independent of bulk path); assert absolute style values on line 0 continuation. Fix testMetadataPreservedAtBoundary: use new helper with metaA on all items; build monolithic reference via single appendLine; assert line 0 timestamp == 1000. Fix testCopyDoesNotRegressionToQuadratic: assert numberOfClients >= 1 on every block after copy() to prove actual COW data sharing. Fix continuation block removal to use exact cell-count removal instead of wrapped-line removal, preventing pCol drift.
The DWC guard in continuationWrappedLineAdjustmentForWidth: used the sticky buffer-level _mayHaveDoubleWidthCharacter flag, which could be true for an all-ASCII continuation block created after a DWC was appended elsewhere. This caused the method to return 0 instead of the correct adjustment (e.g. -1), leading to off-by-one wrapped line counts. Replace the global flag check with boundary-specific DWC detection: - _prefixHasDWC for the virtual prefix characters - Compare DWC-aware naive count with fast-path to detect actual DWC in the first raw line Add regression test that builds an ASCII continuation, retroactively sets the DWC flag, and verifies adjustment=-1 is preserved with full monolithic parity and readback completeness checks.
The implementation was added in 4e79c25 but the header declaration was omitted.
Stitch metadata merge: when stitchedLineFromBlockAtIndex combines tail chars from block A with head chars from block B, merge their metadata using iTermMetadataAppend (matching monolithic append semantics) instead of always using the tail's metadata. Write-path propagation: when a partial append extends the continuation- linked raw line (firstEntry) in a continuation block, walk the predecessor chain backward and propagate the metadata update so that earlier blocks' last raw line reflects the same timestamp/rtlFound as the monolithic single-block path. Gated on numRawLines == 1 to avoid corrupting unrelated lines after a hard EOL inside the block. Also fix removeLastWrappedLines continuation path to use actual wrapped line lengths instead of assuming kept lines are full width, fix nullable annotation on blockContainingLineNumber:blockIndex:, fix weak-self capture in setFirstValueWithBlock:, and remove unused stitched-EOL helper.
Add three parity tests that compare fragmented (multi-block) LineBuffer against a monolithic reference for adj==-1 continuation boundaries: - positionForCoordinate - numberOfCellsUsedInWrappedLineRange - coordinate round-trip These tests currently fail; implementation fixes to follow.
…continuation head stitching positionForCoordinate:width:offset: now detects stitch boundaries (where a continuation block prefix is not width-aligned) and remaps positions that fall in block B head contribution rather than reporting them as extending past block A tail. numberOfCellsUsedInWrappedLineRange:width: is rewritten to use per-wrapped-line summation via wrappedLineAtIndex (already stitch-aware), preserving the existing exclusive-end semantics. stitchedLineFromBlockAtIndex now allows zero-length first raw lines (e.g., immediate hard-EOL appends) so predecessor tail gets correct EOL treatment. Guards the rawHead memcpy for the zero-length case.
The stitch boundary code was deriving headUsed from block B's wrapped line 0, which can be shorter than the actual consumed head segment when continuationWrappedLineAdjustment is 0. Use the raw line length instead, and drop the unnecessary headP guard so the stitch path is taken whenever x falls within the stitched region.
The method was excluding the last line in the range (using length-1), returning 0 for single-line ranges. Fix to include all lines in [location, location+length and add a test covering various ranges. EOF )
Two new tests verify that dropping blocks from a copy (via dropExcessLines or copyWithMinimumLines) does not mutate continuation state in the original.
The per-line wrappedLineAtIndex: loop was O(lines * log(blocks)), which destroyed throughput for large buffers. Replace with block-level enumerateLinesInRange:width:block: which walks blocks directly.
Atomic counters with atexit dump for appendLines, reallyAppendLine, metadata propagation, and appendGang fast/slow paths. Tracks call counts, nanosecond timings, token/byte/item throughput.
The O(chain length) walk on every append was dominating bulk throughput. Disabled to measure impact; metadata consistency will need a cheaper approach.
Instead of walking the continuation chain on every append (O(chain length)), project metadata forward at read time by looking up the successor block metadata. Also fix clearContinuation to go through ModifyLineBlock for proper COW safety.
Perf counters for LineBuffer and append-gang are now opt-in via advanced settings (lineBufferPerfCounters, appendGangPerfCounters) instead of always-on. Zero overhead when disabled.
These local build infrastructure changes (codesign flags, parallel jobs, DEST_DIR, rustup paths) were accidentally included in an earlier commit.
149a827 to
25271e0
Compare
Read lineBufferPerfCounters and appendGangPerfCounters once at first access instead of on every call in the hot append path.
Also cache the setting via dispatch_once and remove dead wasPartialContinuation variable and stale comments.
|
This is ready for review. I should have a PR for Swift-based per-PTY dispatch sources soon. Hopefully tomorrow (which is now today), or Monday. |
|
This is a really scary change because it modifies a fundamental assumption about LineBlocks that has been baked in since its inception. This affects line counting, coordinate conversion, search, and COW semantics: basically everything that touches LineBuffer. That's a large surface area for bugs. I think there are two issues this PR addresses:
The first problem is by far the worse one and there is a simpler mitigation. A LineBlock could remember that it has been appended to only (no truncation from head or tail) and on the next cowCopy, just grow the copy and memcpy what's new. We'd need to deal with metadata as well, but that is not quadratic in this case since it is necessarily just a single line. The second issue could be addressed by the caller finding runs of partial lines and calling a new |
|
So, I did go down the append-only path first. The problem I ran in to is that append-only detaches owner/client links without cloning the buffer, so owner/client is no longer a sufficient test for “is this buffer shared.” At that point, I saw two options: a) create a second sharing state, “shared outside COW graph,” or b) move aliasing truth into storage and treat owner/client as non-authoritative. Option A is a lighter lift, but it changes assumptions behind existing mutation paths and COW tests, and making those changes together felt riskier than I wanted. Option B is cleaner, but still touches all of the mutation paths, and it's a broader architectural changes than continuation blocks. Continuation blocks do touch more code than option A, but they keep a single COW model. All testCopyOnWrite_* tests are preserved as-is, and asserting parity with monolithic vs. fragmented on wrapped line content, EOL, continuation, and metadata gave me (perhaps unwarranted) greater confidence in correctness. Append-only option A touches fewer places, but changes the detach rule in a way that feels even riskier (to me, personally, with limited exposure to copy-on-write, all of which is in usage, not design) and makes the code more brittle to boot. Also if we (or someone) later wants to decouple storage chunks from logical lines (chunk-native COW, non-monolithic buffers, block boundaries as indexing/cache vs. ownership), continuation can be a step in that direction. IDK how much either of these considerations matter in practice -- maybe the mutation paths are unlikely to change, ever, and maybe there's no compelling motivation to ever change the lines-as-segments storage anyway. I could create a PR for option A just for comparison. What do you think? |
|
I took a swing at append-only and it ended up being a tolerable level of complexity (about 300 LOC aside from tests). I landed it as commit 0ee322c. LMK what you think. |
|
Ahh, I see what you mean, just bypass COW. I think we can do even better then, because there's no need to copy anything. See #586 |
Summary
This PR fixes a throughput/memory regression in
LineBuffer appendLines:width:for CRLF-free streams (e.g. large base64 output or, more realistically, minified js or long base64 encoded payloads within otherwise human-readable files).Root cause:
lastBlock.hasPartialstays true), never reaching the bulkinitWithItems:path.Fix:
ModifyLineBlock)Performance
On a 50MB CRLF-free base64 input (
cat /tmp/iterm2_bench_50mb.txt, unlimited scrollback, Maximize Throughput):Instrumentation
Added opt-in perf counters for:
LineBufferappend pathsThese are gated by advanced settings:
lineBufferPerfCountersappendGangPerfCountersValidation
ModernTests/LineBufferTestsexpanded with parity/COW/DWC/regression coverage, including:copy(),dropExcessLines,copyWithMinimumLinesnumberOfCellsUsedInWrappedLineRange