Fix slow throughput for bulk data without CRLFs by chall37 · Pull Request #581 · gnachman/iTerm2

chall37 · 2026-02-13T00:34:57Z

Summary

This PR fixes a throughput/memory regression in LineBuffer appendLines:width: for CRLF-free streams (e.g. large base64 output or, more realistically, minified JS or long base64-encoded payloads within otherwise human-readable files).

Root cause:

When every append item is partial, the one-at-a-time loop could keep consuming indefinitely (lastBlock.hasPartial stays true), never reaching the bulk initWithItems: path.
That caused very large single blocks and poor append complexity.
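
The root cause above can be illustrated with a toy model in plain C. This is only a sketch, not the LineBuffer code: `AppendStats`, `append_old`, and `append_fixed` are hypothetical names, and the "fixed" variant assumes the slow path takes at most one item before handing the rest to the bulk path, which is one plausible reading of the fix rather than a description of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* How many items each path handled during one append call. */
typedef struct {
    size_t one_at_a_time; /* appended individually to the last block */
    size_t bulk;          /* handed to the bulk block-creation path */
} AppendStats;

/* Old behavior (modeled): stay on the slow path for as long as the
 * last block still ends in a partial line. */
AppendStats append_old(const bool *item_is_partial, size_t n,
                       bool last_block_has_partial) {
    assert(item_is_partial != NULL || n == 0);
    AppendStats s = {0, 0};
    size_t i = 0;
    while (i < n && last_block_has_partial) {
        /* Appending a partial item leaves the block partial again. */
        last_block_has_partial = item_is_partial[i];
        s.one_at_a_time++;
        i++;
    }
    s.bulk = n - i;
    return s;
}

/* Assumed fix (modeled): the slow path extends the open partial line
 * with at most one item, then everything else goes to the bulk path. */
AppendStats append_fixed(const bool *item_is_partial, size_t n,
                         bool last_block_has_partial) {
    assert(item_is_partial != NULL || n == 0);
    AppendStats s = {0, 0};
    if (n > 0 && last_block_has_partial) {
        s.one_at_a_time = 1;
    }
    s.bulk = n - s.one_at_a_time;
    return s;
}
```

With five partial items and a last block that already ends in a partial line, the old model sends all five through the slow path and none to the bulk path; the modeled fix sends one through the slow path and four to bulk.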

Fix:

Ensure partial-item appends transition to bulk block creation so continuation chains are distributed across blocks.
Make continuation-chain behavior correct across reads/removals/copy-on-write:
- continuation-aware stitched reads and metadata merge
- COW-safe continuation clearing (ModifyLineBlock)
- continuation-aware wrapped-line/cell-count/removal logic
- boundary-specific DWC handling for continuation adjustment
Replace append-time O(chain) metadata back-propagation with read-time projection from continuation successors.

Performance

On a 50MB CRLF-free base64 input (cat /tmp/iterm2_bench_50mb.txt, unlimited scrollback, Maximize Throughput):

Before: ~7m46s, very high memory growth (~40GB observed)
After: ~0.7s, memory remains bounded

Instrumentation

Added opt-in perf counters for:

LineBuffer append paths
appendGang fast/slow paths

These are gated by advanced settings:

lineBufferPerfCounters
appendGangPerfCounters

Validation

ModernTests/LineBufferTests expanded with parity/COW/DWC/regression coverage, including:

wrapped-line and cell-count parity vs monolithic reference
metadata/search/position round-trip parity across continuation boundaries
COW isolation across copy(), dropExcessLines, copyWithMinimumLines
regression test for wrapped-range off-by-one in numberOfCellsUsedInWrappedLineRange

…rrectness

Large bulk data without CRLFs is split across multiple LineBlocks via a continuation chain. This commit fixes how those continuation blocks interact with line counting, removal, position conversion, search, and boundary stitching so that fragmented and monolithic buffers produce identical results.

Add testOnlyAppendPartialItems:ofLength:width:metadata:continuation: so tests can control metadata and continuation on bulk items, avoiding last-writer-wins overwriting of seed values.
- Fix testContinuationStyleFieldsPreservedAcrossStitch: use new helper with styledCont on all items; build monolithic reference via single appendLine (independent of bulk path); assert absolute style values on line 0 continuation.
- Fix testMetadataPreservedAtBoundary: use new helper with metaA on all items; build monolithic reference via single appendLine; assert line 0 timestamp == 1000.
- Fix testCopyDoesNotRegressionToQuadratic: assert numberOfClients >= 1 on every block after copy() to prove actual COW data sharing.
- Fix continuation block removal to use exact cell-count removal instead of wrapped-line removal, preventing pCol drift.

The DWC guard in continuationWrappedLineAdjustmentForWidth: used the sticky buffer-level _mayHaveDoubleWidthCharacter flag, which could be true for an all-ASCII continuation block created after a DWC was appended elsewhere. This caused the method to return 0 instead of the correct adjustment (e.g. -1), leading to off-by-one wrapped line counts.

Replace the global flag check with boundary-specific DWC detection:
- _prefixHasDWC for the virtual prefix characters
- Compare DWC-aware naive count with fast-path to detect actual DWC in the first raw line

Add regression test that builds an ASCII continuation, retroactively sets the DWC flag, and verifies adjustment=-1 is preserved with full monolithic parity and readback completeness checks.

Stitch metadata merge: when stitchedLineFromBlockAtIndex combines tail chars from block A with head chars from block B, merge their metadata using iTermMetadataAppend (matching monolithic append semantics) instead of always using the tail's metadata.

Write-path propagation: when a partial append extends the continuation-linked raw line (firstEntry) in a continuation block, walk the predecessor chain backward and propagate the metadata update so that earlier blocks' last raw line reflects the same timestamp/rtlFound as the monolithic single-block path. Gated on numRawLines == 1 to avoid corrupting unrelated lines after a hard EOL inside the block.

Also fix removeLastWrappedLines continuation path to use actual wrapped line lengths instead of assuming kept lines are full width, fix nullable annotation on blockContainingLineNumber:blockIndex:, fix weak-self capture in setFirstValueWithBlock:, and remove unused stitched-EOL helper.

Add three parity tests that compare fragmented (multi-block) LineBuffer against a monolithic reference for adj==-1 continuation boundaries:
- positionForCoordinate
- numberOfCellsUsedInWrappedLineRange
- coordinate round-trip

These tests currently fail; implementation fixes to follow.

…continuation head stitching

positionForCoordinate:width:offset: now detects stitch boundaries (where a continuation block prefix is not width-aligned) and remaps positions that fall in block B head contribution rather than reporting them as extending past block A tail.

numberOfCellsUsedInWrappedLineRange:width: is rewritten to use per-wrapped-line summation via wrappedLineAtIndex (already stitch-aware), preserving the existing exclusive-end semantics.

stitchedLineFromBlockAtIndex now allows zero-length first raw lines (e.g., immediate hard-EOL appends) so predecessor tail gets correct EOL treatment. Guards the rawHead memcpy for the zero-length case.

The stitch boundary code was deriving headUsed from block B's wrapped line 0, which can be shorter than the actual consumed head segment when continuationWrappedLineAdjustment is 0. Use the raw line length instead, and drop the unnecessary headP guard so the stitch path is taken whenever x falls within the stitched region.

Instead of walking the continuation chain on every append (O(chain length)), project metadata forward at read time by looking up the successor block metadata. Also fix clearContinuation to go through ModifyLineBlock for proper COW safety.

chall37 · 2026-02-15T11:50:16Z

This is ready for review. I should have a PR for Swift-based per-PTY dispatch sources soon. Hopefully tomorrow (which is now today), or Monday.

gnachman · 2026-02-15T20:05:06Z

This is a really scary change because it modifies a fundamental assumption about LineBlocks that has been baked in since its inception. This affects line counting, coordinate conversion, search, and COW semantics: basically everything that touches LineBuffer. That's a large surface area for bugs.

I think there are two issues this PR addresses:

Very large blocks pay quadratic cost when syncing because the same content gets copied on each sync, not only the stuff that was appended (so we copy n_1 bytes, then n_2>n_1 bytes, then n_3>n_2 bytes, etc.).
Appending is slow when you have a lot of tokens in a gang with no PR because we don't get to use the faster initWithItems: path.

The first problem is by far the worse one and there is a simpler mitigation. A LineBlock could remember that it has been appended to only (no truncation from head or tail) and on the next cowCopy, just grow the copy and memcpy what's new. We'd need to deal with metadata as well, but that is not quadratic in this case since it is necessarily just a single line.
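
A minimal sketch of that mitigation in plain C, assuming the source buffer has only been appended to since the last sync. `Replica`, `replica_sync_append_only`, and `full_copy_cost` are hypothetical names for illustration and do not correspond to LineBlock's real fields or methods; metadata handling is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A copy that remembers how much of the source it already holds. */
typedef struct {
    char *data;
    size_t length;             /* valid bytes in data */
    size_t capacity;           /* allocated bytes */
    size_t bytes_copied_total; /* instrumentation: bytes moved by syncs */
} Replica;

/* Bring the replica up to date with src[0..src_len), copying only the
 * bytes appended since the previous sync.
 * Returns 0 on success, -1 on allocation failure. */
int replica_sync_append_only(Replica *r, const char *src, size_t src_len) {
    assert(src_len >= r->length); /* append-only precondition */
    if (src_len > r->capacity) {
        size_t cap = r->capacity ? r->capacity : 64;
        while (cap < src_len) {
            cap *= 2; /* geometric growth keeps reallocation amortized */
        }
        char *grown = realloc(r->data, cap);
        if (grown == NULL) {
            return -1;
        }
        r->data = grown;
        r->capacity = cap;
    }
    size_t fresh = src_len - r->length;
    if (fresh > 0) {
        memcpy(r->data + r->length, src + r->length, fresh);
    }
    r->length = src_len;
    r->bytes_copied_total += fresh;
    return 0;
}

/* Bytes moved if every sync copied the whole buffer, with syncs taken
 * at lengths step, 2*step, ..., syncs*step. */
size_t full_copy_cost(size_t step, size_t syncs) {
    size_t total = 0;
    for (size_t i = 1; i <= syncs; i++) {
        total += i * step;
    }
    return total;
}
```

For k syncs of a buffer that grows by a fixed step, the full-copy scheme moves step * k * (k + 1) / 2 bytes in total, while the incremental scheme moves step * k (ignoring the amortized cost of reallocation).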

The second issue could be addressed by the caller finding runs of partial lines and calling a new appendPartialItems: with them (the LineBlock would also need to be partial as well, of course).
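
The run-finding step could look roughly like the following plain-C sketch. `Run` and `next_partial_run` are hypothetical names, and the input is reduced to one flag per item; the proposed appendPartialItems: call itself is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A maximal run of consecutive partial items: [start, start + count). */
typedef struct {
    size_t start;
    size_t count;
} Run;

/* Find the next maximal run of partial items at or after index `from`.
 * A returned count of 0 means no partial items remain. */
Run next_partial_run(const bool *is_partial, size_t n, size_t from) {
    assert(is_partial != NULL || n == 0);
    size_t i = from;
    while (i < n && !is_partial[i]) {
        i++; /* skip items that end with a hard EOL */
    }
    Run run = {i, 0};
    while (i < n && is_partial[i]) {
        i++;
        run.count++;
    }
    return run;
}
```

A caller would loop: hand each run to the bulk partial path, append the non-partial items between runs individually, and resume the search at `run.start + run.count`.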

chall37 · 2026-02-16T08:29:58Z

So, I did go down the append-only path first. The problem I ran into is that append-only detaches owner/client links without cloning the buffer, so owner/client is no longer a sufficient test for “is this buffer shared.”

At that point, I saw two options: a) create a second sharing state, “shared outside COW graph,” or b) move aliasing truth into storage and treat owner/client as non-authoritative.

Option A is a lighter lift, but it changes assumptions behind existing mutation paths and COW tests, and making those changes together felt riskier than I wanted.

Option B is cleaner, but still touches all of the mutation paths, and it's a broader architectural change than continuation blocks.

Continuation blocks do touch more code than option A, but they keep a single COW model. All testCopyOnWrite_* tests are preserved as-is, and asserting parity with monolithic vs. fragmented on wrapped line content, EOL, continuation, and metadata gave me (perhaps unwarranted) greater confidence in correctness. Append-only option A touches fewer places, but changes the detach rule in a way that feels even riskier (to me, personally, with limited exposure to copy-on-write, all of which is in usage, not design) and makes the code more brittle to boot.

Also if we (or someone) later wants to decouple storage chunks from logical lines (chunk-native COW, non-monolithic buffers, block boundaries as indexing/cache vs. ownership), continuation can be a step in that direction.

IDK how much either of these considerations matter in practice -- maybe the mutation paths are unlikely to change, ever, and maybe there's no compelling motivation to ever change the lines-as-segments storage anyway. I could create a PR for option A just for comparison. What do you think?

gnachman · 2026-02-19T01:56:41Z

I took a swing at append-only and it ended up being a tolerable level of complexity (about 300 LOC aside from tests). I landed it as commit 0ee322c. LMK what you think.

chall37 · 2026-02-20T19:21:46Z

Ahh, I see what you mean, just bypass COW. I think we can do even better then, because there's no need to copy anything.

See #586

gnachman reviewed Feb 13, 2026

Comment thread sources/LineBuffer.m Outdated

chall37 added 19 commits February 15, 2026 02:17

Fix slow throughput for bulk data without CRLFs

b89789f

Add regression test for bulk partial append block distribution

f265691

Add missing removeLastCells: declaration to LineBlock.h

bd7a497

The implementation was added in 4e79c25 but the header declaration was omitted.

Remove step-numbering from comments

7b85636

Fix off-by-one in numberOfCellsUsedInWrappedLineRange

519cd05

The method was excluding the last line in the range (using length-1), returning 0 for single-line ranges. Fix to include all lines in [location, location+length) and add a test covering various ranges.
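
The off-by-one can be shown on a flat array of per-line cell counts. This is an illustrative plain-C sketch with hypothetical function names; the real method works on wrapped lines across LineBlocks, not on an array.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy variant (modeled): stops one line early, so a single-line
 * range contributes nothing. */
size_t cells_used_in_range_buggy(const size_t *cells_per_line,
                                 size_t line_count,
                                 size_t location, size_t length) {
    if (length == 0) {
        return 0; /* avoid unsigned wraparound in length - 1 */
    }
    size_t total = 0;
    for (size_t i = location; i < location + length - 1 && i < line_count; i++) {
        total += cells_per_line[i];
    }
    return total;
}

/* Fixed variant: sums every line in the half-open range
 * [location, location + length). */
size_t cells_used_in_range(const size_t *cells_per_line,
                           size_t line_count,
                           size_t location, size_t length) {
    assert(cells_per_line != NULL || line_count == 0);
    size_t total = 0;
    for (size_t i = location; i < location + length && i < line_count; i++) {
        total += cells_per_line[i];
    }
    return total;
}
```

With per-line counts {80, 80, 12}, the range (location 0, length 1) yields 0 in the buggy variant and 80 in the fixed one.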

Add copy-on-write tests for continuation state across LineBuffer copies

8ea3b47

Two new tests verify that dropping blocks from a copy (via dropExcessLines or copyWithMinimumLines) does not mutate continuation state in the original.

Fix performance regression in numberOfCellsUsedInWrappedLineRange

125492e

The per-line wrappedLineAtIndex: loop was O(lines * log(blocks)), which destroyed throughput for large buffers. Replace with block-level enumerateLinesInRange:width:block: which walks blocks directly.

Add deterministic perf counters for bulk append pipeline

a1a548a

Atomic counters with atexit dump for appendLines, reallyAppendLine, metadata propagation, and appendGang fast/slow paths. Tracks call counts, nanosecond timings, token/byte/item throughput.

Disable backward metadata propagation for throughput investigation

94cd662

The O(chain length) walk on every append was dominating bulk throughput. Disabled to measure impact; metadata consistency will need a cheaper approach.

Gate perf counters behind advanced settings

7152fc5

Perf counters for LineBuffer and append-gang are now opt-in via advanced settings (lineBufferPerfCounters, appendGangPerfCounters) instead of always-on. Zero overhead when disabled.

Revert out-of-scope Makefile changes

25271e0

These local build infrastructure changes (codesign flags, parallel jobs, DEST_DIR, rustup paths) were accidentally included in an earlier commit.

chall37 force-pushed the fix/bulk-throughput-pr branch from 149a827 to 25271e0 on February 15, 2026 10:18

chall37 added 3 commits February 15, 2026 02:40

Remove task-numbering from comments

30d59b9

Cache perf counter settings via dispatch_once

5180bbf

Read lineBufferPerfCounters and appendGangPerfCounters once at first access instead of on every call in the hot append path.
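
The read-once pattern can be sketched in portable C11. The commit itself uses dispatch_once; this sketch substitutes a C11 atomic as a stand-in, and `read_setting_slow`, `g_setting_reads`, and `perf_counters_enabled` are hypothetical names. Unlike dispatch_once, two threads racing on the first call could each read the setting once, which is harmless only if the setting does not change in between.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Counts how often the slow lookup runs (demo instrumentation only;
 * not thread-safe). */
int g_setting_reads = 0;

/* Hypothetical stand-in for an expensive advanced-settings lookup. */
bool read_setting_slow(void) {
    g_setting_reads++;
    return true;
}

/* Returns the cached setting, reading the slow source on first use.
 * The cache holds -1 (unknown), 0 (disabled), or 1 (enabled). */
bool perf_counters_enabled(void) {
    static atomic_int cached = -1;
    int value = atomic_load_explicit(&cached, memory_order_acquire);
    if (value < 0) {
        value = read_setting_slow() ? 1 : 0;
        atomic_store_explicit(&cached, value, memory_order_release);
    }
    return value == 1;
}
```

After the first call, the hot path costs one atomic load and a comparison instead of a settings lookup.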

Merge perf counter settings into single bulkAppendPerfCounters toggle

f501a4c

Also cache the setting via dispatch_once and remove dead wasPartialContinuation variable and stale comments.

gnachman closed this Feb 21, 2026

chall37 wants to merge 22 commits into gnachman:master from chall37:fix/bulk-throughput-pr
