Zero-copy sync for single partial line blocks#586
Conversation
Replace incremental merge (O(delta) memcpy per sync) with zero-copy merge (O(1) metadata-only update). When catting a large file with no newlines, the progenitor and copy now share the same iTermCharacterBuffer permanently. On each sync, only the CLL and metadata are updated — no character data is ever copied. Key changes: - Add _zeroCopyShared flag to LineBlock; cowCopy enables it for single-partial-line blocks - validMutationCertificate skips buffer clone when _zeroCopyShared - Six mutation guards clone the buffer and clear the flag on non-append mutations - canZeroCopyMergeFromProgenitor checks buffer identity, relocation, append-only status, and COW ownership state - zeroCopyMergeFromProgenitor updates CLL + metadata in O(1) - Resize-in-place fast path moved from LineBlock to LineBuffer to preserve append-only eligibility - iTermCharacterBuffer tracks realloc relocation via wasRelocated
|
Actually, I think we can go even further. The only mutations that actually need buffer isolation are the ones that change the logical view destructively (dropLines, removeLastRawLine, popLastLine), and even those don't modify buffer bytes, just CLL/metadata. So the split could be:
That would keep zero-copy alive through line breaks, multi-line output, anything that's fundamentally appending. Which is the overwhelmingly common case during normal terminal output, not just the pathological 50MB-no-newline scenario. But I think that scope should be its own PR (assuming this one is compelling). Edit: On second thought, I think actually modifying the existing COW to be append-aware is simpler and avoids all of the complexity of bypassing legacy cow to basically do parallel cow under a different name. |
|
If you really want to take it to the limits, you don't need to copy the character buffer until characters would be overwritten. That means dropping from the start of a block is always safe (it just updates offsets, doesn't touch actual characters). Popping from the end is safe until a subsequent append. But, yes please in another PR :) |
| - (void)changeBufferSize:(int)capacity cert:(id<iTermLineBlockMutationCertificate>)cert { | ||
| ITAssertWithMessage(capacity >= [self rawSpaceUsed], @"Truncating used space"); | ||
| capacity = MAX(1, capacity); | ||
| [cert setRawBufferCapacity:capacity]; |
There was a problem hiding this comment.
Related to my comment about wasRelocated, this is only safe if we've already cloned the character buffer.
There was a problem hiding this comment.
Addressed as a consequence of removing wasRelocated.
| if (partial == is_partial) { | ||
| return; | ||
| } | ||
| if (_zeroCopyShared) { |
There was a problem hiding this comment.
Factor out this if statement, which is repeated in many places.
There was a problem hiding this comment.
factored into -cloneCharacterBufferIfZeroCopyShared
| // take gLineBlockMutex. The mutex protects the COW ownership graph | ||
| // (cowCopy/validMutationCertificate on the mutation thread); the merge | ||
| // path is serialized with mutations at a higher level (screen lock). | ||
| ITAssertWithMessage(NSThread.isMainThread, @"Zero-copy merge must run on the main thread"); |
There was a problem hiding this comment.
I think we should take the lock. I don't believe there is much lock contention (since lock-holding mutations would happen concurrentlyvery rarely while in this code path). Not assuming this runs on the main thread frees us up to use LineBlock across threads in other contexts without limitation.
There was a problem hiding this comment.
Done — canZeroCopyMergeFromProgenitor acquires gLineBlockMutex at entry
| screen_char_t *oldBuffer = _buffer; | ||
| _buffer = iTermRealloc(_buffer, newSize, sizeof(screen_char_t)); | ||
| if (_buffer != oldBuffer) { | ||
| _wasRelocated = YES; |
There was a problem hiding this comment.
I don't think wasRelocated can be done correctly. It's never safe to resize a shared character buffer. You can't guarantee that the other reference isn't in use (for example, search runs concurrenctly in its own thread). The only way to safely realloc is to first make a copy of the character buffer so you know you're only copying your own instance and no other LineBlock could be affected by it.
There was a problem hiding this comment.
Removed wasRelocated. The approach is now clone-before-resize in changeBufferSize:cert:
| if (!partial) { | ||
| if (_zeroCopyShared) { | ||
| _characterBuffer = [_characterBuffer clone]; | ||
| _zeroCopyShared = NO; |
There was a problem hiding this comment.
It's odd that zeroCopyShared remains true in one line block after a copy clones the character buffer and sets _zeroCopyShared=NO. That leads to unnecessary copies:
a = LineBlock()
a.append("partial line")
b = a.cowCopy()
// a and b both have _zeroCopyShared=YES
b.pop() // b copies characterBuffer, resets _zeroCopyShared to NO
a.pop() // a copies characterBuffer, resets _zeroCopyShared to NO
Instead of holding the zeroCopyShared flag, it might make more sense for _characterBuffer to carry a reference count. Prior to making a destructive change, if the refcount > 1, clone it (and all of that should happen atomically).
There was a problem hiding this comment.
_zeroCopyShared boolean is completely gone, iTermCharacterBuffer now carries _shareCount and isShared checks _shareCount > 1.
Replace the _zeroCopyShared boolean flag with a reference count in iTermCharacterBuffer. With a boolean, both sides of a shared pair would clone the buffer even when only one needed to; with a ref count, once one side clones and decrements to 1, the other side sees exclusive ownership and skips the clone. cloneCharacterBufferIfZeroCopyShared gates on _shareCount > 1. All mutation sites that must not write to a shared buffer call this before writing. Resize goes through the same guard (clone-before-realloc) to prevent use-after-free for concurrent readers holding raw pointers. Add LineBlockZeroCopyMergeTests (56 tests) covering: eligibility conditions for zero-copy merge, merge correctness (CLL, metadata, content, EOL, cache invalidation), shareCount invariants across all mutation paths, multi-cycle and stress tests, and fallback to standard merge on non-append mutations.
Evolves the incremental merge from 0ee322c into zero-copy sync for the eligible single-partial-line case. Instead of memcpy'ing deltas on each sync, progenitor and copy share the character buffer across sync cycles; sync updates only CLL and metadata.
The buffer is append-only: progenitor writes past the copy's CLL boundary, and the copy reads only within its current boundary. Merge advances the copy's CLL to include newly appended data, making sync O(1) instead of O(delta).
Non-append mutations (line completion, removal, drop, setPartial) break sharing via existing mutation guards and fall back to COW behavior. Buffer relocation on resize also disables zero-copy and uses the existing fallback path.