Skip to content

refactor: replace PositionMapper with free-function line_starts API#466

Open
16bit-ykiko wants to merge 8 commits into
mainfrom
refactor/line-starts
Open

refactor: replace PositionMapper with free-function line_starts API#466
16bit-ykiko wants to merge 8 commits into
mainfrom
refactor/line-starts

Conversation

@16bit-ykiko

@16bit-ykiko 16bit-ykiko commented Jun 19, 2026

Copy link
Copy Markdown
Member

Summary

  • Adopt kotatsu's decomposed position mapping API (build_line_starts, to_position, to_offset) to eliminate the PositionMapper class abstraction that bundled line-boundary scanning with encoding conversion
  • Store line_starts in Session (rebuilt on didOpen/didChange), replacing per-request re-scanning
  • Remove OpenFileIndex type, replace with lightweight non-owning FileOverlay view struct
  • Cache line_starts lazily in MergedIndexShard (replacing cached_mapper) for index queries
  • Update all feature files, compiler, indexer, and agent client to use free functions
  • Bump kotatsu to 980c178 (merged PR fix: misc #171)

Test plan

  • Unit tests: 621 passed, 12 skipped
  • Smoke tests: 3/3 passed
  • Integration tests: 175/175 passed

Summary by CodeRabbit

  • Chores
    • Updated the bundled kotatsu dependency to a newer revision.
  • Refactor
    • Improved LSP position/range conversions by using precomputed per-document line-starts instead of position mappers across editor features and compiler/indexing flows.
    • Added session and merged-index line-start caching, and shifted index lookups to session-backed paths.
  • Tests
    • Updated unit test helpers and snapshots to match the revised conversion logic.

Adopt kotatsu's decomposed position mapping API (build_line_starts,
to_position, to_offset) to eliminate the PositionMapper class abstraction.

- Store line_starts in Session, rebuilt on didOpen/didChange
- Replace OpenFileIndex with lightweight FileOverlay view struct
- Cache line_starts lazily in MergedIndexShard for index queries
- Update all feature files, compiler, indexer, and agent client
- Bump kotatsu to 980c178 (merged PR #171)
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Removes PositionMapper throughout the codebase and replaces it with lsp::build_line_starts + lsp::to_position/to_offset. Extends Session to store precomputed line-start offsets and index data directly; adds CompilationUnitRef::line_starts() lazy caching; updates all feature implementations, server compiler request handlers, indexer query paths, workspace helpers, agent client, and tests to use explicit content + line_starts parameters instead of mapper objects.

Changes

PositionMapper → line_starts Refactor

Layer / File(s) Summary
Core contracts: feature.h, workspace.h, MergedIndex
src/feature/feature.h, src/server/workspace/workspace.h, src/index/merged_index.h
feature.h replaces PositionMapper-based to_range with content/line_starts/encoding signature and adds to_position/to_range overloads for CompilationUnitRef; workspace.h removes OpenFileIndex and MergedIndexShard helper structs and changes merged_indices to store index::MergedIndex directly; MergedIndex adds line_starts() caching accessor and cached_line_starts field.
CompilationUnitRef: add line_starts() lazy caching
src/compile/compilation_unit.h, src/compile/compilation_unit.cpp, src/compile/implement.h
Adds <span> header include, introduces line_starts_cache field to CompilationUnitRef::Self, implements line_starts() method that lazily populates and returns cached std::span<const std::uint32_t> of line-start byte offsets.
Session data model: line_starts, FileIndex, SymbolTable
src/server/service/session.h
Extends Session with std::vector<std::uint32_t> line_starts for per-line byte offsets; changes file_index from optional<OpenFileIndex> to optional<index::FileIndex>; adds optional<index::SymbolTable> symbols for latest compilation results; includes required headers.
LSP client: maintain line_starts on open and edit
src/server/service/lsp_client.cpp
Rebuilds session->line_starts after didOpen and after each incremental didChange edit; refactors edit offset mapping to use lsp::to_offset with session->line_starts and UTF16 instead of PositionMapper.
Compiler: store and query FileIndex/SymbolTable via session
src/server/compiler/compiler.cpp, src/server/service/master_server.cpp
Moves tu_index.main_file_index and tu_index.symbols directly into session fields after compilation; all request handlers (forward_query, forward_build, forward_format, handle_completion) capture session->line_starts and convert positions/ranges via lsp::to_offset (UTF16).
Indexer API: session iteration helpers
src/server/compiler/indexer.h
Changes OverlayVisitor callback from const OpenFileIndex& to const Session&; adds for_each_session to iterate open sessions and with_session(server_path_id, fn) template replacing for_each_overlay and get_overlay.
Indexer merge/save/load and helpers
src/server/compiler/indexer.cpp
Adds local UTF16 conversion helpers and relation/occurrence lookup utilities for both Session and MergedIndex; refactors merge/save/load to use shard.merge(), shard.need_rewrite(), and MergedIndex::load() directly; clears cached_line_starts on merge.
Indexer query methods: migrate all to session-based
src/server/compiler/indexer.cpp
Migrates find_symbol_info, resolve_cursor, query_relations, find_definition_location, collect_grouped_relations, collect_unique_targets, get_definition_text, collect_references, and search_symbols to iterate sessions and use lsp::to_position/to_offset instead of PositionMapper.
Workspace: remove occurrence helper implementations
src/server/workspace/workspace.cpp, src/server/workspace/workspace.h
Removes OpenFileIndex::find_occurrence and MergedIndexShard::find_occurrence implementations that relied on PositionMapper; drops LSP position/protocol headers.
AgentClient: session-based symbol and definition lookup
src/server/service/agent_client.cpp
Adds find_relations template helper for mapping merged-index relations to protocol::Range; updates resolve_locator, SymbolSearchParams, and DocumentSymbolsParams to iterate sessions via for_each_session/with_session and convert positions using lsp::to_position (UTF16).
All LSP features: precompute line_starts
src/feature/code_completion.cpp, src/feature/diagnostics.cpp, src/feature/document_links.cpp, src/feature/document_symbols.cpp, src/feature/folding_ranges.cpp, src/feature/formatting.cpp, src/feature/hover.cpp, src/feature/inlay_hints.cpp, src/feature/semantic_tokens.cpp
Each feature precomputes line_starts once and derives all LSP positions via lsp::to_position and lsp::to_offset; SemanticTokenEncoder stores line_starts and encoding instead of PositionMapper.
Tests: migrate to lsp helpers
tests/unit/feature/document_link_tests.cpp, tests/unit/feature/document_symbol_tests.cpp, tests/unit/feature/folding_range_tests.cpp, tests/unit/feature/inlay_hint_tests.cpp, tests/unit/feature/semantic_tokens_tests.cpp, tests/unit/index/tu_index_tests.cpp
All test helpers replace PositionMapper with lsp::build_line_starts + lsp::to_position/to_offset; adds required <cstdint>, <span>, <string_view> includes.
kotatsu dependency update
cmake/package.cmake
Updates kotatsu GIT_TAG to 980c178.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hoppity-hop through the code I go,
No more PositionMapper stealing the show!
build_line_starts precomputes with delight,
to_position gleams in the LSP light.
With line_starts cached, efficiency's complete,
The rabbit declares: this refactor's sweet! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately and concisely describes the main refactoring: replacing PositionMapper with free-function line_starts API, which is the core change across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/line-starts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e1922fa12

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/server/service/master_server.cpp Outdated
Comment on lines +52 to +53
session->text,
session->line_starts};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep overlay ranges tied to indexed text

When an open file has a compiled file_index and then receives a didChange before the next compile finishes, this overlay now pairs the old index offsets with the current session->text/line_starts. The index ranges were produced for the compile-time buffer, so overlay consumers such as references, workspace symbols, and agent document-symbol reads can convert old byte offsets against a different buffer and return wrong or empty ranges; before this refactor OpenFileIndex kept the indexed content/mapper together with the index.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/feature/code_completion.cpp`:
- Around line 258-262: In the code_completion.cpp file around the Range
construction, replace the direct dereferencing of lsp::to_position calls with a
defensive pattern. First, assign the results of lsp::to_position for both
prefix.range.begin and prefix.range.end to separate variables (e.g., start_pos
and end_pos), then add null checks (if(!start_pos || !end_pos)) before
dereferencing them to initialize the protocol::Range fields. This matches the
defensive null-checking pattern already used in workspace.cpp and
tu_index_tests.cpp.

In `@src/feature/feature.h`:
- Around line 26-33: The to_range function dereferences the results of
lsp::to_position() calls without validating that the returned optional values
are valid, which can cause undefined behavior if the optional is empty. Fix this
by either adding assert statements to validate both optional results before
dereferencing them (documenting the precondition), or change the return type
from protocol::Range to std::optional<protocol::Range> and check both
lsp::to_position() results, returning an empty optional if either call fails.

In `@src/server/workspace/workspace.h`:
- Around line 98-117: The `line_starts()` const method in the `MergedIndexShard`
struct has a data race when modifying the `mutable` member `cached_line_starts`
without synchronization. Add a `mutable` mutex member variable to
`MergedIndexShard` and use it to synchronize access to `cached_line_starts` in
both the `line_starts()` method (where the check and assignment occur) and the
`invalidate()` method. This ensures that the check-then-assign operation on the
cache is atomic and prevents concurrent threads from simultaneously accessing or
modifying the vector.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1d3703fb-89a5-4812-a5c8-213002334cf7

📥 Commits

Reviewing files that changed from the base of the PR and between 8cba654 and 9e1922f.

📒 Files selected for processing (26)
  • cmake/package.cmake
  • src/feature/code_completion.cpp
  • src/feature/diagnostics.cpp
  • src/feature/document_links.cpp
  • src/feature/document_symbols.cpp
  • src/feature/feature.h
  • src/feature/folding_ranges.cpp
  • src/feature/formatting.cpp
  • src/feature/hover.cpp
  • src/feature/inlay_hints.cpp
  • src/feature/semantic_tokens.cpp
  • src/server/compiler/compiler.cpp
  • src/server/compiler/indexer.cpp
  • src/server/compiler/indexer.h
  • src/server/service/agent_client.cpp
  • src/server/service/lsp_client.cpp
  • src/server/service/master_server.cpp
  • src/server/service/session.h
  • src/server/workspace/workspace.cpp
  • src/server/workspace/workspace.h
  • tests/unit/feature/document_link_tests.cpp
  • tests/unit/feature/document_symbol_tests.cpp
  • tests/unit/feature/folding_range_tests.cpp
  • tests/unit/feature/inlay_hint_tests.cpp
  • tests/unit/feature/semantic_tokens_tests.cpp
  • tests/unit/index/tu_index_tests.cpp

Comment thread src/feature/code_completion.cpp Outdated
Comment thread src/feature/feature.h
Comment on lines +26 to 33
inline auto to_range(std::string_view content,
std::span<const std::uint32_t> line_starts,
lsp::PositionEncoding encoding,
LocalSourceRange range) -> protocol::Range {
return protocol::Range{
.start = *converter.to_position(range.begin),
.end = *converter.to_position(range.end),
.start = *lsp::to_position(content, line_starts, encoding, range.begin),
.end = *lsp::to_position(content, line_starts, encoding, range.end),
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unchecked optional dereference may cause undefined behavior.

lsp::to_position returns std::optional<protocol::Position>. Lines 31-32 dereference the result without checking validity. If an offset is out of bounds or line_starts is malformed, this will cause undefined behavior.

Consider either:

  1. Adding a precondition assert to document the caller's responsibility
  2. Returning std::optional<protocol::Range> for safe handling
🛡️ Option 1: Add defensive assert
 inline auto to_range(std::string_view content,
                      std::span<const std::uint32_t> line_starts,
                      lsp::PositionEncoding encoding,
                      LocalSourceRange range) -> protocol::Range {
+    assert(range.end <= content.size() && "range extends beyond content");
+    auto start = lsp::to_position(content, line_starts, encoding, range.begin);
+    auto end = lsp::to_position(content, line_starts, encoding, range.end);
+    assert(start && end && "to_position failed for valid range");
     return protocol::Range{
-        .start = *lsp::to_position(content, line_starts, encoding, range.begin),
-        .end = *lsp::to_position(content, line_starts, encoding, range.end),
+        .start = *start,
+        .end = *end,
     };
 }
🛡️ Option 2: Return optional for caller handling
 inline auto to_range(std::string_view content,
                      std::span<const std::uint32_t> line_starts,
                      lsp::PositionEncoding encoding,
-                     LocalSourceRange range) -> protocol::Range {
+                     LocalSourceRange range) -> std::optional<protocol::Range> {
+    auto start = lsp::to_position(content, line_starts, encoding, range.begin);
+    auto end = lsp::to_position(content, line_starts, encoding, range.end);
+    if (!start || !end)
+        return std::nullopt;
     return protocol::Range{
-        .start = *lsp::to_position(content, line_starts, encoding, range.begin),
-        .end = *lsp::to_position(content, line_starts, encoding, range.end),
+        .start = *start,
+        .end = *end,
     };
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
inline auto to_range(std::string_view content,
std::span<const std::uint32_t> line_starts,
lsp::PositionEncoding encoding,
LocalSourceRange range) -> protocol::Range {
return protocol::Range{
.start = *converter.to_position(range.begin),
.end = *converter.to_position(range.end),
.start = *lsp::to_position(content, line_starts, encoding, range.begin),
.end = *lsp::to_position(content, line_starts, encoding, range.end),
};
inline auto to_range(std::string_view content,
std::span<const std::uint32_t> line_starts,
lsp::PositionEncoding encoding,
LocalSourceRange range) -> protocol::Range {
assert(range.end <= content.size() && "range extends beyond content");
auto start = lsp::to_position(content, line_starts, encoding, range.begin);
auto end = lsp::to_position(content, line_starts, encoding, range.end);
assert(start && end && "to_position failed for valid range");
return protocol::Range{
.start = *start,
.end = *end,
};
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/feature/feature.h` around lines 26 - 33, The to_range function
dereferences the results of lsp::to_position() calls without validating that the
returned optional values are valid, which can cause undefined behavior if the
optional is empty. Fix this by either adding assert statements to validate both
optional results before dereferencing them (documenting the precondition), or
change the return type from protocol::Range to std::optional<protocol::Range>
and check both lsp::to_position() results, returning an empty optional if either
call fails.

Comment thread src/server/workspace/workspace.h Outdated
Lazily compute and cache line start offsets in CompilationUnitRef,
avoiding repeated scans of interested_content() across feature calls.
Remove the intermediate FileOverlay view type — Indexer now receives
const Session& directly.  Add to_range/to_position overloads taking
CompilationUnitRef to simplify feature-layer call sites.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/server/compiler/indexer.h (1)

285-286: 💤 Low value

Consider renaming member variable to match new semantics.

The documentation now says "Iterates all open Sessions" but the member is still named each_overlay. This could cause confusion for future maintainers.

-    /// Iterates all open Sessions with valid file indices.
-    std::function<void(OverlayVisitor)> each_overlay;
+    /// Iterates all open Sessions with valid file indices.
+    std::function<void(OverlayVisitor)> each_session;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/server/compiler/indexer.h` around lines 285 - 286, The member variable
`each_overlay` has documentation that describes it as iterating all open
Sessions, but the name no longer reflects this semantics. Rename the member
variable from `each_overlay` to a name that better describes its purpose of
iterating sessions (such as `each_session`), and update all references to this
variable throughout the codebase to use the new name consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/feature/inlay_hints.cpp`:
- Around line 935-939: The call to `to_position` returns a
`std::optional<protocol::Position>`, but the code unconditionally dereferences
it with `*` in the loop over `collected` without checking if the optional
contains a value. Add a guard before dereferencing: check if the optional
returned by `to_position(unit, encoding, hint.offset)` has a value using a
conditional statement, and only construct the `protocol::InlayHint` out object
and proceed with the position assignment if the optional is valid. If the
conversion fails, skip adding that particular hint to the output.

---

Nitpick comments:
In `@src/server/compiler/indexer.h`:
- Around line 285-286: The member variable `each_overlay` has documentation that
describes it as iterating all open Sessions, but the name no longer reflects
this semantics. Rename the member variable from `each_overlay` to a name that
better describes its purpose of iterating sessions (such as `each_session`), and
update all references to this variable throughout the codebase to use the new
name consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 69385ca5-3ff8-4124-aafc-b0bd7c0715a4

📥 Commits

Reviewing files that changed from the base of the PR and between d583e96 and cc82e7a.

📒 Files selected for processing (13)
  • src/feature/diagnostics.cpp
  • src/feature/document_links.cpp
  • src/feature/document_symbols.cpp
  • src/feature/feature.h
  • src/feature/folding_ranges.cpp
  • src/feature/hover.cpp
  • src/feature/inlay_hints.cpp
  • src/feature/semantic_tokens.cpp
  • src/server/compiler/indexer.cpp
  • src/server/compiler/indexer.h
  • src/server/service/agent_client.cpp
  • src/server/service/master_server.cpp
  • src/server/workspace/workspace.h
💤 Files with no reviewable changes (1)
  • src/server/workspace/workspace.h
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/feature/hover.cpp
  • src/feature/document_links.cpp
  • src/server/service/master_server.cpp
  • src/feature/feature.h
  • src/feature/semantic_tokens.cpp

Comment on lines 935 to 939
for(const auto& hint: collected) {
protocol::InlayHint out{
.position = *converter.to_position(hint.offset),
.position = *to_position(unit, encoding, hint.offset),
.label = hint.label,
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unchecked optional dereference could cause undefined behavior.

to_position returns std::optional<protocol::Position>. While the hint collection filters to valid source ranges, unconditionally dereferencing with * is unsafe if the conversion ever fails (e.g., corrupted line_starts or offset mismatch).

Consider adding a guard:

🛡️ Proposed defensive fix
     for(const auto& hint: collected) {
+        auto pos = to_position(unit, encoding, hint.offset);
+        if(!pos) {
+            continue;
+        }
         protocol::InlayHint out{
-            .position = *to_position(unit, encoding, hint.offset),
+            .position = *pos,
             .label = hint.label,
         };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for(const auto& hint: collected) {
protocol::InlayHint out{
.position = *converter.to_position(hint.offset),
.position = *to_position(unit, encoding, hint.offset),
.label = hint.label,
};
for(const auto& hint: collected) {
auto pos = to_position(unit, encoding, hint.offset);
if(!pos) {
continue;
}
protocol::InlayHint out{
.position = *pos,
.label = hint.label,
};
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/feature/inlay_hints.cpp` around lines 935 - 939, The call to
`to_position` returns a `std::optional<protocol::Position>`, but the code
unconditionally dereferences it with `*` in the loop over `collected` without
checking if the optional contains a value. Add a guard before dereferencing:
check if the optional returned by `to_position(unit, encoding, hint.offset)` has
a value using a conditional statement, and only construct the
`protocol::InlayHint` out object and proceed with the position assignment if the
optional is valid. If the conversion fails, skip adding that particular hint to
the output.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/index/merged_index.h (1)

71-72: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Make the lazy line_starts cache safe under shared queries.

line_starts() is a const API backed by a mutable vector, and the implementation lazily writes cached_line_starts before returning a span. Downstream query paths call this through workspace.merged_indices, so parallel requests can race on the cache initialization. Carry the previous MergedIndexShard cache-race fix over to the new MergedIndex owner; because MergedIndex is movable and returns spans into the cache, prefer a move-safe synchronized cache state or eager immutable cache refresh when content changes.

Also applies to: 100-100

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/index/merged_index.h` around lines 71 - 72, The line_starts() const
method in MergedIndex has a thread-safety issue where it lazily initializes the
mutable cached_line_starts vector without synchronization, causing races when
parallel queries call this method. Apply the same cache-race fix previously done
in MergedIndexShard by either protecting cached_line_starts access with a mutex
for move-safe synchronized state, or by eagerly computing and caching
line_starts whenever the content changes rather than computing it lazily on
first access.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/server/compiler/indexer.cpp`:
- Around line 57-88: Both the find_occurrence and find_relations functions
dereference session.file_index without first checking whether the optional has a
value. Add a guard at the beginning of each function to check if
session.file_index is valid before dereferencing it. In find_occurrence, return
std::nullopt if the file_index is not set. In find_relations, return early if
the file_index is not set. This prevents undefined behavior when these helpers
are called with sessions that haven't completed compilation yet.
- Line 676: The relation filtering at the condition checking `if(rel.kind !=
kind)` uses exact equality comparison but should use bitwise AND operation like
`rel.kind & kind` to properly handle cases where rel.kind has multiple flags
set. Change the comparison from `!=` (not equal) to check `if(!(rel.kind &
kind))` to ensure relations with multiple flags are correctly matched. This
makes the logic consistent with other relation filtering in the file such as at
line 523 where bitwise AND is already used.

---

Duplicate comments:
In `@src/index/merged_index.h`:
- Around line 71-72: The line_starts() const method in MergedIndex has a
thread-safety issue where it lazily initializes the mutable cached_line_starts
vector without synchronization, causing races when parallel queries call this
method. Apply the same cache-race fix previously done in MergedIndexShard by
either protecting cached_line_starts access with a mutex for move-safe
synchronized state, or by eagerly computing and caching line_starts whenever the
content changes rather than computing it lazily on first access.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eb5bbf7c-b3d8-48e1-975c-5ac4fb0380b6

📥 Commits

Reviewing files that changed from the base of the PR and between cc82e7a and 3ff7b4b.

📒 Files selected for processing (6)
  • src/index/merged_index.cpp
  • src/index/merged_index.h
  • src/server/compiler/indexer.cpp
  • src/server/service/agent_client.cpp
  • src/server/workspace/workspace.cpp
  • src/server/workspace/workspace.h
💤 Files with no reviewable changes (1)
  • src/server/workspace/workspace.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/server/service/agent_client.cpp

Comment thread src/server/compiler/indexer.cpp
Comment thread src/server/compiler/indexer.cpp

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8e68e7394

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +646 to +649
if(root->line_starts() && root->line_starts()->size() > 0) {
return {root->line_starts()->data(), root->line_starts()->size()};
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fall back to content for legacy shard line starts

When a user loads an index shard written by the previous schema, root->line_starts() is null even though content() is still present. Returning an empty span here makes the merged-index query paths that guard on ls.empty() skip the shard entirely, so definitions/references/workspace symbols from an otherwise up-to-date cache disappear until a source change forces reindexing; the old PositionMapper path built positions from the stored content for these same shards.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant