Skip to content

Fix slow throughput for bulk data without CRLFs#581

Closed
chall37 wants to merge 22 commits into
gnachman:masterfrom
chall37:fix/bulk-throughput-pr
Closed

Fix slow throughput for bulk data without CRLFs#581
chall37 wants to merge 22 commits into
gnachman:masterfrom
chall37:fix/bulk-throughput-pr

Conversation

@chall37
Copy link
Copy Markdown
Contributor

@chall37 chall37 commented Feb 13, 2026

Summary

This PR fixes a throughput/memory regression in LineBuffer appendLines:width: for CRLF-free streams (e.g. large base64 output or, more realistically, minified js or long base64 encoded payloads within otherwise human-readable files).

Root cause:

  • When every append item is partial, the one-at-a-time loop could keep consuming indefinitely (lastBlock.hasPartial stays true), never reaching the bulk initWithItems: path.
  • That caused very large single blocks and poor append complexity.

Fix:

  • Ensure partial-item appends transition to bulk block creation so continuation chains are distributed across blocks.
  • Make continuation-chain behavior correct across reads/removals/copy-on-write:
    • continuation-aware stitched reads and metadata merge
    • COW-safe continuation clearing (ModifyLineBlock)
    • continuation-aware wrapped-line/cell-count/removal logic
    • boundary-specific DWC handling for continuation adjustment
  • Replace append-time O(chain) metadata back-propagation with read-time projection from continuation successors.

Performance

On a 50MB CRLF-free base64 input (cat /tmp/iterm2_bench_50mb.txt, unlimited scrollback, Maximize Throughput):

  • Before: ~7m46s, very high memory growth (~40GB observed)
  • After: ~0.7s, memory remains bounded

Instrumentation

Added opt-in perf counters for:

  • LineBuffer append paths
  • appendGang fast/slow paths

These are gated by advanced settings:

  • lineBufferPerfCounters
  • appendGangPerfCounters

Validation

ModernTests/LineBufferTests expanded with parity/COW/DWC/regression coverage, including:

  • wrapped-line and cell-count parity vs monolithic reference
  • metadata/search/position round-trip parity across continuation boundaries
  • COW isolation across copy(), dropExcessLines, copyWithMinimumLines
  • regression test for wrapped-range off-by-one in numberOfCellsUsedInWrappedLineRange

Comment thread sources/LineBuffer.m Outdated
…rrectness

Large bulk data without CRLFs is split across multiple LineBlocks via
a continuation chain. This commit fixes how those continuation blocks
interact with line counting, removal, position conversion, search,
and boundary stitching so that fragmented and monolithic buffers
produce identical results.
Add testOnlyAppendPartialItems:ofLength:width:metadata:continuation:
so tests can control metadata and continuation on bulk items, avoiding
last-writer-wins overwriting seed values.

Fix testContinuationStyleFieldsPreservedAcrossStitch: use new helper
with styledCont on all items; build monolithic reference via single
appendLine (independent of bulk path); assert absolute style values
on line 0 continuation.

Fix testMetadataPreservedAtBoundary: use new helper with metaA on all
items; build monolithic reference via single appendLine; assert
line 0 timestamp == 1000.

Fix testCopyDoesNotRegressionToQuadratic: assert numberOfClients >= 1
on every block after copy() to prove actual COW data sharing.

Fix continuation block removal to use exact cell-count removal
instead of wrapped-line removal, preventing pCol drift.
The DWC guard in continuationWrappedLineAdjustmentForWidth: used the
sticky buffer-level _mayHaveDoubleWidthCharacter flag, which could be
true for an all-ASCII continuation block created after a DWC was
appended elsewhere. This caused the method to return 0 instead of the
correct adjustment (e.g. -1), leading to off-by-one wrapped line counts.

Replace the global flag check with boundary-specific DWC detection:
- _prefixHasDWC for the virtual prefix characters
- Compare DWC-aware naive count with fast-path to detect actual DWC
  in the first raw line

Add regression test that builds an ASCII continuation, retroactively
sets the DWC flag, and verifies adjustment=-1 is preserved with full
monolithic parity and readback completeness checks.
The implementation was added in 4e79c25 but the header declaration
was omitted.
Stitch metadata merge: when stitchedLineFromBlockAtIndex combines tail
chars from block A with head chars from block B, merge their metadata
using iTermMetadataAppend (matching monolithic append semantics) instead
of always using the tail's metadata.

Write-path propagation: when a partial append extends the continuation-
linked raw line (firstEntry) in a continuation block, walk the
predecessor chain backward and propagate the metadata update so that
earlier blocks' last raw line reflects the same timestamp/rtlFound as
the monolithic single-block path. Gated on numRawLines == 1 to avoid
corrupting unrelated lines after a hard EOL inside the block.

Also fix removeLastWrappedLines continuation path to use actual wrapped
line lengths instead of assuming kept lines are full width, fix nullable
annotation on blockContainingLineNumber:blockIndex:, fix weak-self
capture in setFirstValueWithBlock:, and remove unused stitched-EOL
helper.
Add three parity tests that compare fragmented (multi-block) LineBuffer
against a monolithic reference for adj==-1 continuation boundaries:
- positionForCoordinate
- numberOfCellsUsedInWrappedLineRange
- coordinate round-trip

These tests currently fail; implementation fixes to follow.
…continuation head stitching

positionForCoordinate:width:offset: now detects stitch boundaries
(where a continuation block prefix is not width-aligned) and remaps
positions that fall in block B head contribution rather than reporting
them as extending past block A tail.

numberOfCellsUsedInWrappedLineRange:width: is rewritten to use
per-wrapped-line summation via wrappedLineAtIndex (already
stitch-aware), preserving the existing exclusive-end semantics.

stitchedLineFromBlockAtIndex now allows zero-length first raw lines
(e.g., immediate hard-EOL appends) so predecessor tail gets correct
EOL treatment. Guards the rawHead memcpy for the zero-length case.
The stitch boundary code was deriving headUsed from block B's wrapped
line 0, which can be shorter than the actual consumed head segment when
continuationWrappedLineAdjustment is 0. Use the raw line length instead,
and drop the unnecessary headP guard so the stitch path is taken whenever
x falls within the stitched region.
The method was excluding the last line in the range (using length-1),
returning 0 for single-line ranges. Fix to include all lines in
[location, location+length and add a test covering various ranges.
EOF
)
Two new tests verify that dropping blocks from a copy (via dropExcessLines
or copyWithMinimumLines) does not mutate continuation state in the original.
The per-line wrappedLineAtIndex: loop was O(lines * log(blocks)), which
destroyed throughput for large buffers. Replace with block-level
enumerateLinesInRange:width:block: which walks blocks directly.
Atomic counters with atexit dump for appendLines, reallyAppendLine,
metadata propagation, and appendGang fast/slow paths. Tracks call
counts, nanosecond timings, token/byte/item throughput.
The O(chain length) walk on every append was dominating bulk throughput.
Disabled to measure impact; metadata consistency will need a cheaper
approach.
Instead of walking the continuation chain on every append (O(chain length)),
project metadata forward at read time by looking up the successor block
metadata. Also fix clearContinuation to go through ModifyLineBlock for
proper COW safety.
Perf counters for LineBuffer and append-gang are now opt-in via
advanced settings (lineBufferPerfCounters, appendGangPerfCounters)
instead of always-on. Zero overhead when disabled.
These local build infrastructure changes (codesign flags, parallel
jobs, DEST_DIR, rustup paths) were accidentally included in an
earlier commit.
@chall37 chall37 force-pushed the fix/bulk-throughput-pr branch from 149a827 to 25271e0 Compare February 15, 2026 10:18
Read lineBufferPerfCounters and appendGangPerfCounters once at first
access instead of on every call in the hot append path.
Also cache the setting via dispatch_once and remove dead
wasPartialContinuation variable and stale comments.
@chall37
Copy link
Copy Markdown
Contributor Author

chall37 commented Feb 15, 2026

This is ready for review. I should have a PR for Swift-based per-PTY dispatch sources soon. Hopefully tomorrow (which is now today), or Monday.

@gnachman
Copy link
Copy Markdown
Owner

This is a really scary change because it modifies a fundamental assumption about LineBlocks that has been baked in since its inception. This affects line counting, coordinate conversion, search, and COW semantics: basically everything that touches LineBuffer. That's a large surface area for bugs.

I think there are two issues this PR addresses:

  1. Very large blocks pay quadratic cost when syncing because the same content gets copied on each sync, not only the stuff that was appended (so we copy n_1 bytes, then n_2>n_1 bytes, then n_3>n_2 bytes, etc.).
  2. Appending is slow when you have a lot of tokens in a gang with no PR because we don't get to use the faster initWithItems: path.

The first problem is by far the worse one and there is a simpler mitigation. A LineBlock could remember that it has been appended to only (no truncation from head or tail) and on the next cowCopy, just grow the copy and memcpy what's new. We'd need to deal with metadata as well, but that is not quadratic in this case since it is necessarily just a single line.

The second issue could be addressed by the caller finding runs of partial lines and calling a new appendPartialItems: with them (the LineBlock would also need to be partial as well, of course).

@chall37
Copy link
Copy Markdown
Contributor Author

chall37 commented Feb 16, 2026

So, I did go down the append-only path first. The problem I ran in to is that append-only detaches owner/client links without cloning the buffer, so owner/client is no longer a sufficient test for “is this buffer shared.”

At that point, I saw two options: a) create a second sharing state, “shared outside COW graph,” or b) move aliasing truth into storage and treat owner/client as non-authoritative.

Option A is a lighter lift, but it changes assumptions behind existing mutation paths and COW tests, and making those changes together felt riskier than I wanted.

Option B is cleaner, but still touches all of the mutation paths, and it's a broader architectural changes than continuation blocks.

Continuation blocks do touch more code than option A, but they keep a single COW model. All testCopyOnWrite_* tests are preserved as-is, and asserting parity with monolithic vs. fragmented on wrapped line content, EOL, continuation, and metadata gave me (perhaps unwarranted) greater confidence in correctness. Append-only option A touches fewer places, but changes the detach rule in a way that feels even riskier (to me, personally, with limited exposure to copy-on-write, all of which is in usage, not design) and makes the code more brittle to boot.

Also if we (or someone) later wants to decouple storage chunks from logical lines (chunk-native COW, non-monolithic buffers, block boundaries as indexing/cache vs. ownership), continuation can be a step in that direction.

IDK how much either of these considerations matter in practice -- maybe the mutation paths are unlikely to change, ever, and maybe there's no compelling motivation to ever change the lines-as-segments storage anyway. I could create a PR for option A just for comparison. What do you think?

@gnachman
Copy link
Copy Markdown
Owner

I took a swing at append-only and it ended up being a tolerable level of complexity (about 300 LOC aside from tests). I landed it as commit 0ee322c. LMK what you think.

@chall37
Copy link
Copy Markdown
Contributor Author

chall37 commented Feb 20, 2026

Ahh, I see what you mean, just bypass COW. I think we can do even better then, because there's no need to copy anything.

See #586

@gnachman gnachman closed this Feb 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants