refactor: replace PositionMapper with free-function line_starts API#466
refactor: replace PositionMapper with free-function line_starts API#46616bit-ykiko wants to merge 8 commits into
Conversation
Adopt kotatsu's decomposed position mapping API (build_line_starts, to_position, to_offset) to eliminate the PositionMapper class abstraction. - Store line_starts in Session, rebuilt on didOpen/didChange - Replace OpenFileIndex with lightweight FileOverlay view struct - Cache line_starts lazily in MergedIndexShard for index queries - Update all feature files, compiler, indexer, and agent client - Bump kotatsu to 980c178 (merged PR #171)
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRemoves ChangesPositionMapper → line_starts Refactor
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e1922fa12
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| session->text, | ||
| session->line_starts}; |
There was a problem hiding this comment.
Keep overlay ranges tied to indexed text
When an open file has a compiled file_index and then receives a didChange before the next compile finishes, this overlay now pairs the old index offsets with the current session->text/line_starts. The index ranges were produced for the compile-time buffer, so overlay consumers such as references, workspace symbols, and agent document-symbol reads can convert old byte offsets against a different buffer and return wrong or empty ranges; before this refactor OpenFileIndex kept the indexed content/mapper together with the index.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/feature/code_completion.cpp`:
- Around line 258-262: In the code_completion.cpp file around the Range
construction, replace the direct dereferencing of lsp::to_position calls with a
defensive pattern. First, assign the results of lsp::to_position for both
prefix.range.begin and prefix.range.end to separate variables (e.g., start_pos
and end_pos), then add null checks (if(!start_pos || !end_pos)) before
dereferencing them to initialize the protocol::Range fields. This matches the
defensive null-checking pattern already used in workspace.cpp and
tu_index_tests.cpp.
In `@src/feature/feature.h`:
- Around line 26-33: The to_range function dereferences the results of
lsp::to_position() calls without validating that the returned optional values
are valid, which can cause undefined behavior if the optional is empty. Fix this
by either adding assert statements to validate both optional results before
dereferencing them (documenting the precondition), or change the return type
from protocol::Range to std::optional<protocol::Range> and check both
lsp::to_position() results, returning an empty optional if either call fails.
In `@src/server/workspace/workspace.h`:
- Around line 98-117: The `line_starts()` const method in the `MergedIndexShard`
struct has a data race when modifying the `mutable` member `cached_line_starts`
without synchronization. Add a `mutable` mutex member variable to
`MergedIndexShard` and use it to synchronize access to `cached_line_starts` in
both the `line_starts()` method (where the check and assignment occur) and the
`invalidate()` method. This ensures that the check-then-assign operation on the
cache is atomic and prevents concurrent threads from simultaneously accessing or
modifying the vector.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1d3703fb-89a5-4812-a5c8-213002334cf7
📒 Files selected for processing (26)
cmake/package.cmakesrc/feature/code_completion.cppsrc/feature/diagnostics.cppsrc/feature/document_links.cppsrc/feature/document_symbols.cppsrc/feature/feature.hsrc/feature/folding_ranges.cppsrc/feature/formatting.cppsrc/feature/hover.cppsrc/feature/inlay_hints.cppsrc/feature/semantic_tokens.cppsrc/server/compiler/compiler.cppsrc/server/compiler/indexer.cppsrc/server/compiler/indexer.hsrc/server/service/agent_client.cppsrc/server/service/lsp_client.cppsrc/server/service/master_server.cppsrc/server/service/session.hsrc/server/workspace/workspace.cppsrc/server/workspace/workspace.htests/unit/feature/document_link_tests.cpptests/unit/feature/document_symbol_tests.cpptests/unit/feature/folding_range_tests.cpptests/unit/feature/inlay_hint_tests.cpptests/unit/feature/semantic_tokens_tests.cpptests/unit/index/tu_index_tests.cpp
| inline auto to_range(std::string_view content, | ||
| std::span<const std::uint32_t> line_starts, | ||
| lsp::PositionEncoding encoding, | ||
| LocalSourceRange range) -> protocol::Range { | ||
| return protocol::Range{ | ||
| .start = *converter.to_position(range.begin), | ||
| .end = *converter.to_position(range.end), | ||
| .start = *lsp::to_position(content, line_starts, encoding, range.begin), | ||
| .end = *lsp::to_position(content, line_starts, encoding, range.end), | ||
| }; |
There was a problem hiding this comment.
Unchecked optional dereference may cause undefined behavior.
lsp::to_position returns std::optional<protocol::Position>. Lines 31-32 dereference the result without checking validity. If an offset is out of bounds or line_starts is malformed, this will cause undefined behavior.
Consider either:
- Adding a precondition assert to document the caller's responsibility
- Returning
std::optional<protocol::Range>for safe handling
🛡️ Option 1: Add defensive assert
inline auto to_range(std::string_view content,
std::span<const std::uint32_t> line_starts,
lsp::PositionEncoding encoding,
LocalSourceRange range) -> protocol::Range {
+ assert(range.end <= content.size() && "range extends beyond content");
+ auto start = lsp::to_position(content, line_starts, encoding, range.begin);
+ auto end = lsp::to_position(content, line_starts, encoding, range.end);
+ assert(start && end && "to_position failed for valid range");
return protocol::Range{
- .start = *lsp::to_position(content, line_starts, encoding, range.begin),
- .end = *lsp::to_position(content, line_starts, encoding, range.end),
+ .start = *start,
+ .end = *end,
};
}🛡️ Option 2: Return optional for caller handling
inline auto to_range(std::string_view content,
std::span<const std::uint32_t> line_starts,
lsp::PositionEncoding encoding,
- LocalSourceRange range) -> protocol::Range {
+ LocalSourceRange range) -> std::optional<protocol::Range> {
+ auto start = lsp::to_position(content, line_starts, encoding, range.begin);
+ auto end = lsp::to_position(content, line_starts, encoding, range.end);
+ if (!start || !end)
+ return std::nullopt;
return protocol::Range{
- .start = *lsp::to_position(content, line_starts, encoding, range.begin),
- .end = *lsp::to_position(content, line_starts, encoding, range.end),
+ .start = *start,
+ .end = *end,
};
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| inline auto to_range(std::string_view content, | |
| std::span<const std::uint32_t> line_starts, | |
| lsp::PositionEncoding encoding, | |
| LocalSourceRange range) -> protocol::Range { | |
| return protocol::Range{ | |
| .start = *converter.to_position(range.begin), | |
| .end = *converter.to_position(range.end), | |
| .start = *lsp::to_position(content, line_starts, encoding, range.begin), | |
| .end = *lsp::to_position(content, line_starts, encoding, range.end), | |
| }; | |
| inline auto to_range(std::string_view content, | |
| std::span<const std::uint32_t> line_starts, | |
| lsp::PositionEncoding encoding, | |
| LocalSourceRange range) -> protocol::Range { | |
| assert(range.end <= content.size() && "range extends beyond content"); | |
| auto start = lsp::to_position(content, line_starts, encoding, range.begin); | |
| auto end = lsp::to_position(content, line_starts, encoding, range.end); | |
| assert(start && end && "to_position failed for valid range"); | |
| return protocol::Range{ | |
| .start = *start, | |
| .end = *end, | |
| }; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/feature/feature.h` around lines 26 - 33, The to_range function
dereferences the results of lsp::to_position() calls without validating that the
returned optional values are valid, which can cause undefined behavior if the
optional is empty. Fix this by either adding assert statements to validate both
optional results before dereferencing them (documenting the precondition), or
change the return type from protocol::Range to std::optional<protocol::Range>
and check both lsp::to_position() results, returning an empty optional if either
call fails.
Lazily compute and cache line start offsets in CompilationUnitRef, avoiding repeated scans of interested_content() across feature calls.
Remove the intermediate FileOverlay view type — Indexer now receives const Session& directly. Add to_range/to_position overloads taking CompilationUnitRef to simplify feature-layer call sites.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/server/compiler/indexer.h (1)
285-286: 💤 Low valueConsider renaming member variable to match new semantics.
The documentation now says "Iterates all open Sessions" but the member is still named
each_overlay. This could cause confusion for future maintainers.- /// Iterates all open Sessions with valid file indices. - std::function<void(OverlayVisitor)> each_overlay; + /// Iterates all open Sessions with valid file indices. + std::function<void(OverlayVisitor)> each_session;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/server/compiler/indexer.h` around lines 285 - 286, The member variable `each_overlay` has documentation that describes it as iterating all open Sessions, but the name no longer reflects this semantics. Rename the member variable from `each_overlay` to a name that better describes its purpose of iterating sessions (such as `each_session`), and update all references to this variable throughout the codebase to use the new name consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/feature/inlay_hints.cpp`:
- Around line 935-939: The call to `to_position` returns a
`std::optional<protocol::Position>`, but the code unconditionally dereferences
it with `*` in the loop over `collected` without checking if the optional
contains a value. Add a guard before dereferencing: check if the optional
returned by `to_position(unit, encoding, hint.offset)` has a value using a
conditional statement, and only construct the `protocol::InlayHint` out object
and proceed with the position assignment if the optional is valid. If the
conversion fails, skip adding that particular hint to the output.
---
Nitpick comments:
In `@src/server/compiler/indexer.h`:
- Around line 285-286: The member variable `each_overlay` has documentation that
describes it as iterating all open Sessions, but the name no longer reflects
this semantics. Rename the member variable from `each_overlay` to a name that
better describes its purpose of iterating sessions (such as `each_session`), and
update all references to this variable throughout the codebase to use the new
name consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 69385ca5-3ff8-4124-aafc-b0bd7c0715a4
📒 Files selected for processing (13)
src/feature/diagnostics.cppsrc/feature/document_links.cppsrc/feature/document_symbols.cppsrc/feature/feature.hsrc/feature/folding_ranges.cppsrc/feature/hover.cppsrc/feature/inlay_hints.cppsrc/feature/semantic_tokens.cppsrc/server/compiler/indexer.cppsrc/server/compiler/indexer.hsrc/server/service/agent_client.cppsrc/server/service/master_server.cppsrc/server/workspace/workspace.h
💤 Files with no reviewable changes (1)
- src/server/workspace/workspace.h
🚧 Files skipped from review as they are similar to previous changes (5)
- src/feature/hover.cpp
- src/feature/document_links.cpp
- src/server/service/master_server.cpp
- src/feature/feature.h
- src/feature/semantic_tokens.cpp
| for(const auto& hint: collected) { | ||
| protocol::InlayHint out{ | ||
| .position = *converter.to_position(hint.offset), | ||
| .position = *to_position(unit, encoding, hint.offset), | ||
| .label = hint.label, | ||
| }; |
There was a problem hiding this comment.
Unchecked optional dereference could cause undefined behavior.
to_position returns std::optional<protocol::Position>. While the hint collection filters to valid source ranges, unconditionally dereferencing with * is unsafe if the conversion ever fails (e.g., corrupted line_starts or offset mismatch).
Consider adding a guard:
🛡️ Proposed defensive fix
for(const auto& hint: collected) {
+ auto pos = to_position(unit, encoding, hint.offset);
+ if(!pos) {
+ continue;
+ }
protocol::InlayHint out{
- .position = *to_position(unit, encoding, hint.offset),
+ .position = *pos,
.label = hint.label,
};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for(const auto& hint: collected) { | |
| protocol::InlayHint out{ | |
| .position = *converter.to_position(hint.offset), | |
| .position = *to_position(unit, encoding, hint.offset), | |
| .label = hint.label, | |
| }; | |
| for(const auto& hint: collected) { | |
| auto pos = to_position(unit, encoding, hint.offset); | |
| if(!pos) { | |
| continue; | |
| } | |
| protocol::InlayHint out{ | |
| .position = *pos, | |
| .label = hint.label, | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/feature/inlay_hints.cpp` around lines 935 - 939, The call to
`to_position` returns a `std::optional<protocol::Position>`, but the code
unconditionally dereferences it with `*` in the loop over `collected` without
checking if the optional contains a value. Add a guard before dereferencing:
check if the optional returned by `to_position(unit, encoding, hint.offset)` has
a value using a conditional statement, and only construct the
`protocol::InlayHint` out object and proceed with the position assignment if the
optional is valid. If the conversion fails, skip adding that particular hint to
the output.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/index/merged_index.h (1)
71-72:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftMake the lazy
line_startscache safe under shared queries.
line_starts()is a const API backed by a mutable vector, and the implementation lazily writescached_line_startsbefore returning a span. Downstream query paths call this throughworkspace.merged_indices, so parallel requests can race on the cache initialization. Carry the previousMergedIndexShardcache-race fix over to the newMergedIndexowner; becauseMergedIndexis movable and returns spans into the cache, prefer a move-safe synchronized cache state or eager immutable cache refresh when content changes.Also applies to: 100-100
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/index/merged_index.h` around lines 71 - 72, The line_starts() const method in MergedIndex has a thread-safety issue where it lazily initializes the mutable cached_line_starts vector without synchronization, causing races when parallel queries call this method. Apply the same cache-race fix previously done in MergedIndexShard by either protecting cached_line_starts access with a mutex for move-safe synchronized state, or by eagerly computing and caching line_starts whenever the content changes rather than computing it lazily on first access.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/server/compiler/indexer.cpp`:
- Around line 57-88: Both the find_occurrence and find_relations functions
dereference session.file_index without first checking whether the optional has a
value. Add a guard at the beginning of each function to check if
session.file_index is valid before dereferencing it. In find_occurrence, return
std::nullopt if the file_index is not set. In find_relations, return early if
the file_index is not set. This prevents undefined behavior when these helpers
are called with sessions that haven't completed compilation yet.
- Line 676: The relation filtering at the condition checking `if(rel.kind !=
kind)` uses exact equality comparison but should use bitwise AND operation like
`rel.kind & kind` to properly handle cases where rel.kind has multiple flags
set. Change the comparison from `!=` (not equal) to check `if(!(rel.kind &
kind))` to ensure relations with multiple flags are correctly matched. This
makes the logic consistent with other relation filtering in the file such as at
line 523 where bitwise AND is already used.
---
Duplicate comments:
In `@src/index/merged_index.h`:
- Around line 71-72: The line_starts() const method in MergedIndex has a
thread-safety issue where it lazily initializes the mutable cached_line_starts
vector without synchronization, causing races when parallel queries call this
method. Apply the same cache-race fix previously done in MergedIndexShard by
either protecting cached_line_starts access with a mutex for move-safe
synchronized state, or by eagerly computing and caching line_starts whenever the
content changes rather than computing it lazily on first access.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eb5bbf7c-b3d8-48e1-975c-5ac4fb0380b6
📒 Files selected for processing (6)
src/index/merged_index.cppsrc/index/merged_index.hsrc/server/compiler/indexer.cppsrc/server/service/agent_client.cppsrc/server/workspace/workspace.cppsrc/server/workspace/workspace.h
💤 Files with no reviewable changes (1)
- src/server/workspace/workspace.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- src/server/service/agent_client.cpp
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b8e68e7394
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if(root->line_starts() && root->line_starts()->size() > 0) { | ||
| return {root->line_starts()->data(), root->line_starts()->size()}; | ||
| } | ||
| } |
There was a problem hiding this comment.
Fall back to content for legacy shard line starts
When a user loads an index shard written by the previous schema, root->line_starts() is null even though content() is still present. Returning an empty span here makes the merged-index query paths that guard on ls.empty() skip the shard entirely, so definitions/references/workspace symbols from an otherwise up-to-date cache disappear until a source change forces reindexing; the old PositionMapper path built positions from the stored content for these same shards.
Useful? React with 👍 / 👎.
Summary
build_line_starts,to_position,to_offset) to eliminate thePositionMapperclass abstraction that bundled line-boundary scanning with encoding conversionline_startsinSession(rebuilt ondidOpen/didChange), replacing per-request re-scanningOpenFileIndextype, replace with lightweight non-owningFileOverlayview structline_startslazily inMergedIndexShard(replacingcached_mapper) for index queriesTest plan
Summary by CodeRabbit
kotatsudependency to a newer revision.