Skip to content

feat(server): error feedback and logging channels (D9)#456

Open
16bit-ykiko wants to merge 6 commits into
mainfrom
feat/server-feedback
Open

feat(server): error feedback and logging channels (D9)#456
16bit-ykiko wants to merge 6 commits into
mainfrom
feat/server-feedback

Conversation

@16bit-ykiko

@16bit-ykiko 16bit-ykiko commented Jun 12, 2026

Copy link
Copy Markdown
Member

Summary

Implements the server-side error feedback plan (temp/plan-server-feedback.md), supersedes #436 (reimplemented on current main rather than rebased — it predates the #437/#447/#449 restructures).

Channels

Anomaly (soft assertions, src/support/anomaly.h)LOG_ANOMALY(id, fmt, ...) for internal states that must be unreachable when clice works correctly. Debug builds abort after logging (the CI Debug matrix and local dev catch bugs earliest; CLICE_ANOMALY_NO_TRAP exists for tests); Release logs [anomaly:<id>], pushes window/logMessage (master) and continues. Per-ID rate limit; the gate runs before format arguments are evaluated (lazy contract locked by unit tests with side-effecting args). IDs: pch_build_fail, pcm_build_fail, compile_fail, worker_request_fail, worker_crash, worker_spawn_fail, position_map_fail. Situations reachable by user input/user code are deliberately not anomalies — PCH/PCM failures carry BuildResult.has_user_errors to make that split.

GuidanceLOG_GUIDANCE(...) ([guidance] + Warning logMessage) for user-actionable situations: no compile_commands.json, invalid initializationOptions, config problems.

LSP errors instead of silent null — feature requests on closed documents → Document not open; unresolvable call/type hierarchy items → error; client positions/ranges that fail to map → InvalidParams; empty hierarchy/workspace-symbol results now return [] instead of null.

Guidance diagnosticsfill_compile_args() returns a CommandSource (CdbExact/IncludeGraph/Inferred (reserved)/Fallback) and emits a per-file decision log (tiers tried, tier hit, args hash). When a guessed command produces file-not-found errors, the publish merges a file-top Warning explaining it (code inferred-compile-command, linking the quick-start guide — resolves the DiagnosticSource::Clice TODO). Exact CDB matches never get it.

clice.toml diagnostics — parse/type errors (Error, defaults apply) and unknown keys (Warning via a strict second decode pass, config still applies) are published on the config file's URI with line/column from the TOML decoder (needs clice-io/kotatsu#168, pinned). Fixing the file and restarting clears them.

Notable fixes uncovered along the way

  • CompilationDatabase::lookup() synthesizes a command for unknown files, so the old results.empty() checks were dead: the automatic include-graph header-context tier was unreachable, and headers without entries silently compiled as bare clang. Tier selection now uses has_entry(); background indexing skips Fallback-sourced files.
  • initializationOptions now overlay before apply_defaults(): a client-provided cache_dir previously didn't propagate into the derived logging_dir/index_dir.

Test infrastructure

  • CliceClient records window/logMessage; assert_no_anomaly() runs in every integration teardown (client + agentic fixtures, agentic local fixtures, make_client tests), scanning notifications and master/worker log files. tests/replay.py fails a smoke trace on any anomaly push.
  • E2E: guidance-diagnostic lifecycle (no CDB → appears; add CDB + restart → gone while the include error stays), clice.toml type/unknown-key/clear scenarios with line+column, worker-crash anomaly reporting (Linux, @allow_anomaly), pytest.raises for closed/unknown documents, out-of-range positions, bogus hierarchy items.
  • Named timing constants (MTIME_GRANULARITY/SETTLE_TIME/IDLE_TIMEOUT) replace hardcoded sleeps; yield-based workspace cleanup; log dump on test failure; Tester::to_local_range() dedup.

Out of scope (per plan)

Module-cycle guidance diagnostics (4c) wait for the module refactor; CacheStore hit/miss logs wait for #453; worker logMessage forwarding and request-correlation IDs belong to the WorkerPool task.

Test plan

  • RelWithDebInfo: 611 unit / 180 integration / 2-2 smoke — all green locally
  • Debug: 611 unit / 180 integration / 2-2 smoke — all green locally (anomaly = abort in Debug, so this run is the "zero anomalies on regular paths" acceptance)
  • pixi run format clean; 3 parallel review subagents (correctness / style / tests) — findings fixed
  • After feat(codec): attach source location to TOML syntax and unknown-field errors kotatsu#168 merges, re-pin to the kotatsu main SHA (currently pinned to the PR head)

Closes #436 (superseded).

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added diagnostic reporting for configuration file issues with precise error locations
    • Introduced guidance diagnostics explaining inferred compile commands
    • Enhanced error messages for invalid LSP requests with detailed feedback
  • Bug Fixes

    • Fixed position mapping in code completion, formatting, and semantic token features
    • Improved worker process crash detection and reporting
  • Improvements

    • Better visibility into which compile command source is used (database, include-graph, inferred, or fallback)
    • Refined error handling to distinguish user code errors from server infrastructure failures
    • Added internal anomaly tracking with rate-limiting and logging

Pulls clice-io/kotatsu#168: TOML syntax errors carry description and
line/column in no-exceptions builds, and unknown-field errors get the
offending node's location.
LOG_ANOMALY(id, ...) is a soft assertion for internal invariants: Debug
builds abort after logging (CLICE_ANOMALY_NO_TRAP escapes for tests),
Release builds log with a stable [anomaly:<id>] marker, forward it to a
notify hook (master: window/logMessage) and continue. Reports are
rate-limited per ID; the gate runs before format arguments are
evaluated, so suppressed reports cost nothing.

LOG_GUIDANCE(...) is the user-actionable channel ([guidance] marker,
Warning logMessage) for situations that are not clice bugs.

Unit tests lock the lazy-evaluation contract, rate limiting, markers
and the trap mock point.
Reimplements the intent of #436 on current main (superseding it):

- fill_compile_args() reports a CommandSource (CdbExact / IncludeGraph /
  Inferred / Fallback) and emits a per-file decision log (tiers tried,
  tier hit, command hash). Real CDB entries are now distinguished from
  the synthesized default command via has_entry(), which also revives
  the automatic include-graph header context tier that the synthesis
  had shadowed.
- Guessed commands whose diagnostics contain file-not-found errors get
  a file-top Warning guidance diagnostic (source: clice, code links the
  setup guide, resolving the DiagnosticSource::Clice TODO). Exact CDB
  matches never get it.
- clice.toml problems become diagnostics on the config file's URI with
  line/column from the TOML decoder: decode failures as Error (defaults
  apply), unknown keys as Warning via a strict second decode pass.
  initializationOptions now overlay BEFORE apply_defaults so derived
  fields (logging_dir, index_dir) follow overridden values.
- Feature requests on closed documents and unresolvable hierarchy items
  return LSP errors instead of silent null; client-provided positions
  and ranges that fail to map return InvalidParams. Empty hierarchy and
  workspace-symbol results return empty arrays instead of null.
- Internal failures report anomalies: unexpected PCH/PCM build failures
  (user-code errors are distinguished via BuildResult.has_user_errors),
  compile/query IPC failures, worker crash/spawn failures, and
  internally-produced offsets failing to map (feature::to_position now
  clamps instead of dereferencing an empty optional).
- Master pushes anomaly/guidance to the client via window/logMessage;
  workers only log (their files are scanned by tests).
- CliceClient records window/logMessage; assert_no_anomaly() runs in
  every client/agentic fixture teardown, scanning both notifications
  and master/worker log files (opt-out: @pytest.mark.allow_anomaly).
- replay.py fails a trace when an [anomaly:] message is pushed.
- E2E: fallback-command guidance diagnostic lifecycle (appears without
  CDB, gone after providing one), clice.toml type/unknown-key
  diagnostics with line/column and clearing after a fix, worker-crash
  anomaly reporting (Linux), error responses for requests on closed or
  never-opened documents.
- Named timing constants (MTIME_GRANULARITY/SETTLE_TIME/IDLE_TIMEOUT)
  replace hardcoded sleeps; workspace fixture cleans up after tests;
  log messages are dumped when a test fails.
- Tester::to_local_range() replaces per-file duplicates.
Pre-PR review findings:
- publish_config_diagnostics groups issues by their own file URI; a
  malformed clice.toml no longer renders its error on the next config
  candidate that loaded.
- A failed switchContext header context now falls through to the real
  CDB entry before reaching the synthesized fallback.
- Header-context host selection filters hosts through has_entry();
  lookup() never returns empty (it synthesizes), so the old emptiness
  checks were dead and hosts without real entries were mislabeled as
  IncludeGraph.
- Style: string_view message parameters, llvm::join, doc comments.
- All seven AnomalyIds fire through the macro once (marker names are
  wire-stable for the integration greps).
- InvalidParams for an out-of-range hover position; error response for
  an unresolvable call hierarchy item.
- Column assertions for config issue locations.
- assert_no_anomaly in the agentic local fixtures and make_client-based
  tests that bypass the client fixture teardown.
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements a comprehensive internal anomaly reporting infrastructure with rate limiting and notifications, refactors feature position conversions to use safe helpers instead of dereferencing optionals, classifies compile command sources with per-file decision logging, extends configuration loading to capture and report issues as LSP diagnostics, and updates LSP error handling to return structured errors with guidance diagnostics for inferred compile commands.

Changes

Anomaly Reporting Infrastructure

Layer / File(s) Summary
Anomaly public contract and logging APIs
src/support/anomaly.h
Introduces AnomalyId enum with compile-time count, NotifyLevel severity enum mirroring LSP types, stable anomaly_name() mapping, pre-format gates for lazy suppression, core report_anomaly() and report_guidance() functions, process-wide notify hook setter, and testing trap/reset functions; adds LOG_ANOMALY and LOG_GUIDANCE macros with gating-before-format behavior.
Anomaly reporting implementation
src/support/anomaly.cpp
Implements atomic per-anomaly counters with configurable report limit, stable name mapping with fallback, debug trap behavior aborting unless CLICE_ANOMALY_NO_TRAP env set, notification hook forwarding, and test-only trap/reset for deterministic testing.
Anomaly reporting unit tests
tests/unit/support/anomaly_tests.cpp
Validates anomaly marker formatting, trap invocation, per-anomaly rate limiting with suppression, lazy argument evaluation when gated, guidance logging with warning level, and stable anomaly ID name strings.

Safe Position Mapping Refactoring

Layer / File(s) Summary
Safe position conversion helper
src/feature/feature.h
Adds to_position(converter, offset) -> protocol::Position helper that safely converts offsets via PositionMapper, emits anomalies and returns clamped {0, 0} on failure instead of dereferencing optionals.
Feature implementations using safe mapping
src/feature/code_completion.cpp, src/feature/diagnostics.cpp, src/feature/folding_ranges.cpp, src/feature/formatting.cpp, src/feature/inlay_hints.cpp, src/feature/semantic_tokens.cpp
Updates all feature implementations to use to_position(converter, ...) helper instead of dereferencing converter.to_position(...) results, handling partial/invalid mappings gracefully.
Test infrastructure for position mapping
tests/unit/test/tester.h
Adds Tester::to_local_range(protocol::Range) method to centralize protocol-to-local range conversion using PositionMapper, enabling consistent and reusable position mapping across unit tests.
Unit test refactors to use centralized mapping
tests/unit/feature/folding_range_tests.cpp, tests/unit/feature/document_link_tests.cpp
Updates unit tests to delegate range-to-local conversion to Tester::to_local_range() instead of maintaining local PositionMapper instances.

Compile Command Source Tracking

Layer / File(s) Summary
CommandSource enum and logging contract
src/server/compiler/compiler.h
Defines CommandSource enum (CdbExact, IncludeGraph, Inferred, Fallback) and declares command_source_name() helper for stable logging identifiers.
fill_compile_args return type and decision logging
src/server/compiler/compiler.cpp (lines 162–246)
Rewrites Compiler::fill_compile_args() to return CommandSource instead of bool, adds log_command_decision() helper for per-file tier-selection logs, implements explicit tiered resolution with fallback as last resort.
Header context resolution tightening
src/server/compiler/compiler.cpp (lines 286–351)
Checks workspace.cdb.has_entry() before accepting candidate host paths in header-context resolution, preventing reliance on synthesized CDB entries and simplifying fallback host selection.
CommandSource publication to diagnostics
src/server/compiler/compiler.cpp (lines 479–536)
Updates Compiler::publish_diagnostics() signature to accept CommandSource source, implements guidance-diagnostic injection when source is not exact CDB and diagnostics contain file-not-found codes.
Compiler implementation updates
src/server/compiler/compiler.cpp (lines 791–848)
Captures returned CommandSource in run_compile(), passes it to publish_diagnostics(), changes worker failure logging to LOG_ANOMALY, updates PCH/PCM build failure handling to use has_user_errors flag.
LSP request handler error handling
src/server/compiler/compiler.cpp (lines 932–1067)
Updates forward_query(), forward_build(), and forward_format() to return structured outcome_error for invalid parameters and worker failures, adds LOG_ANOMALY records for failures.
Indexer fallback command avoidance
src/server/compiler/indexer.cpp
Updates Indexer::index_one to return early only on CommandSource::Fallback, avoiding synthesized fallback commands in bulk indexing.
Agent client compile command handling
src/server/service/agent_client.cpp
Removes conditional error handling so agent client always returns compile command result, adapting to new CommandSource return type.
BuildResult user_errors classification
src/server/protocol/worker.h, src/server/worker/stateless_worker.cpp
Adds has_user_errors boolean field to BuildResult to distinguish user-code failures from infrastructure failures, populated by PCH/PCM handlers based on diagnostic presence.
Worker pool anomaly logging
src/server/worker/worker_pool.cpp
Emits WorkerSpawnFail and WorkerCrash anomaly logs with worker identity and error details instead of generic error logs.
Diagnostic URI for inferred-compile-command
src/compile/diagnostic.cpp
Returns Clice quick-start documentation URL for the "inferred-compile-command" diagnostic code.
Kotatsu dependency update
cmake/package.cmake
Updates kotatsu FetchContent_Declare GIT_TAG to new commit hash.

Configuration Issue Reporting

Layer / File(s) Summary
ConfigIssue struct and load signature updates
src/server/workspace/config.h
Defines ConfigIssue struct with severity, file path, message, and optional 1-based line/column; extends Config::load() and Config::load_from_workspace() with optional issue capture and default-control parameters.
Config loading implementation with strict validation
src/server/workspace/config.cpp
Implements strict TOML validation (DenyUnknownKeys) to surface unknown keys as warnings, populates issues vector with precise location data, and conditionally applies defaults based on with_defaults flag.
MasterServer config storage and initialization
src/server/service/master_server.h, src/server/service/master_server.cpp
Stores config_issues and config_path in MasterServer, updates initialization to load config without defaults, overlay initializationOptions, then apply defaults; changes logging: guidance for initializationOptions/compile_commands.json missing, anomaly for worker startup failure.
Config loading unit tests
tests/unit/server/config_tests.cpp
Validates Config::load(..., &issues) captures exact source location for syntax errors, type errors, and unknown keys with precise message content.

LSP Error Handling and Diagnostics Publishing

Layer / File(s) Summary
LSP request handler error responses
src/server/service/lsp_client.cpp (lines 40–592)
Introduces document_not_open() helper returning InvalidParams error; updates hover, semantic tokens, inlay hints, folding ranges, document symbols, definition, completion, formatting, call hierarchy, and type hierarchy handlers to return structured errors instead of null for missing sessions and failures.
Configuration diagnostics publishing
src/server/service/lsp_client.h, src/server/service/lsp_client.cpp (lines 723–771)
Implements LSPClient::publish_config_diagnostics() converting MasterServer::config_issues into per-file LSP diagnostics, called during initialization; clears logging notify hook in destructor.
LSP notify hook for guidance messages
src/server/service/lsp_client.cpp (lines 13, 40–57)
Adds logging notify hook during LSPClient construction to forward guidance messages as LSP window/logMessage notifications.
Configuration diagnostics integration tests
tests/integration/lifecycle/test_config.py
Validates type mismatch produces error diagnostic with correct location/message, unknown key produces warning, and fixed config clears diagnostics.
Guidance diagnostic integration tests
tests/integration/features/test_guidance_diagnostics.py
Validates guidance appears when no CDB exists with file-not-found, disappears after adding matching CDB, and no guidance when fallback succeeds; includes diagnostic filtering helpers.

Test Infrastructure and Reliability

Layer / File(s) Summary
Standardized test timing constants
tests/integration/utils/wait.py
Introduces MTIME_GRANULARITY, SETTLE_TIME, IDLE_TIMEOUT constants for centralized test synchronization across filesystem operations and event settling.
Test client log message capture
tests/integration/utils/client.py
Extends CliceClient to capture server window/logMessage notifications, tracks workspace path during initialization.
Test anomaly detection and assertions
tests/integration/utils/assertions.py
Adds anomaly extraction from client log messages and workspace log files, provides assert_no_anomaly() to validate no anomalies occurred, and guidance_messages() to filter guidance entries.
Test fixture anomaly validation
tests/conftest.py
Adds test result recording, converts workspace to generator-based with post-test cleanup, updates client and agentic fixtures to assert no anomalies unless @pytest.mark.allow_anomaly is set; adds verbose shutdown logging.
Test fixture anomaly check imports
tests/integration/agentic/test_agentic.py, tests/integration/agentic/test_cli.py
Imports _check_no_anomaly to perform post-test anomaly assertions during fixture teardown.
Test timing updates
tests/integration/compilation/test_persistent_cache.py, tests/integration/compilation/test_staleness.py, tests/integration/features/test_server.py, tests/integration/lifecycle/test_file_operation.py, tests/integration/modules/test_modules.py, tests/integration/stress/test_rapid_edit.py
Replaces hardcoded sleep values with standardized timing constants to reduce environment-specific flakiness.
Test expectations for new error handling
tests/integration/features/test_index.py, tests/integration/features/test_server.py, tests/integration/lifecycle/test_file_operation.py
Updates test expectations for new error handling: closed-document hover throws "Document not open" exception, invalid position throws "Invalid position" exception, validates call-hierarchy and type-hierarchy resolution failures.
Worker crash anomaly integration test
tests/integration/lifecycle/test_anomaly.py
End-to-end test validating worker crash anomaly reporting via /proc worker discovery, SIGKILL, and anomaly presence verification in logs.
Test replay anomaly detection
tests/replay.py
Detects [anomaly:] markers in replay logs, accumulates anomalies, and fails session if any are recorded.
Pytest allow_anomaly marker
tests/pytest.ini
Adds pytest marker to let tests opt out of automatic anomaly assertion checks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • clice-io/clice#377: Both PRs modify the shared clice::testing::Tester test fixture (tests/unit/test/tester.h); this PR adds a Tester::to_local_range helper and feature include, while the related PR refactors the fixture to add VFS and driver entry points.

Poem

🐰 Anomalies now whisper their names,
Position maps safely without daming flames,
Commands announce their source with flair,
Configurations show their issues fair,
Tests rest assured—no crashes in the night!

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/server-feedback

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/server/compiler/indexer.cpp (1)

821-834: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move the fallback-source gate ahead of the module prebuild.

Lines 823-825 still run workspace.compile_graph->compile(server_path_id) before the new CommandSource::Fallback check. For module interface units with no real compile command, background indexing now skips the final worker request on Line 833, but it still pays for the PCM build first. That undermines the new “skip fallback-sourced files” rule on the heaviest path.

Suggested reorder
-    // For module interface units, compile their PCM (and transitive deps)
-    // first so the stateless worker has the artifacts it needs.
-    if(workspace.compile_graph && workspace.path_to_module.contains(server_path_id)) {
-        co_await workspace.compile_graph->compile(server_path_id);
-    }
-
     worker::BuildParams params;
     params.kind = worker::BuildKind::Index;
     params.file = file_path;
-    // Bulk background indexing sticks to real commands; synthesized fallback
-    // commands would fill the index with guesses.
-    if(compiler.fill_compile_args(file_path, params.directory, params.arguments, nullptr) ==
-       CommandSource::Fallback)
+    auto source =
+        compiler.fill_compile_args(file_path, params.directory, params.arguments, nullptr);
+    // Bulk background indexing sticks to real commands; synthesized fallback
+    // commands would fill the index with guesses.
+    if(source == CommandSource::Fallback)
         co_return;
+
+    // For module interface units, compile their PCM (and transitive deps)
+    // first so the stateless worker has the artifacts it needs.
+    if(workspace.compile_graph && workspace.path_to_module.contains(server_path_id)) {
+        co_await workspace.compile_graph->compile(server_path_id);
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/server/compiler/indexer.cpp` around lines 821 - 834, The fallback-source
gate should run before triggering module prebuild to avoid building PCMs for
files that will be skipped; move the call that checks
compiler.fill_compile_args(file_path, params.directory, params.arguments,
nullptr) against CommandSource::Fallback to precede the
workspace.compile_graph->compile(server_path_id) call so that if the command
source is Fallback you co_return early and do not invoke
workspace.compile_graph->compile(server_path_id); keep the existing
worker::BuildParams setup (params.kind, params.file) but only create/use them
after the fallback check or ensure the fallback check uses
file_path/server_path_id as it does now.
src/server/service/agent_client.cpp (1)

195-206: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don’t drop compile-command provenance at the agent boundary.

This handler now always succeeds, but it discards the CommandSource returned by fill_compile_args(...). That means agentic callers cannot distinguish an exact CDB command from an inferred/fallback one anymore, even though this PR adds provenance specifically to track that difference. If the old error-on-miss behavior is intentionally gone, CompileCommandResult needs to carry the source; otherwise keep rejecting Fallback here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/server/service/agent_client.cpp` around lines 195 - 206, The handler
passed to peer.on_request currently calls
srv.compiler.fill_compile_args(params.path, directory, arguments) and drops the
returned CommandSource, so CompileCommandResult loses provenance; update the
handler to capture the CommandSource from fill_compile_args and include it in
the response (e.g., add a source field to CompileCommandResult and populate it
from the returned CommandSource), or if the original behavior should still
reject inferred commands, check the returned CommandSource and return a failure
for CommandSource::Fallback instead of always succeeding; adjust the code paths
around CompileCommandResult, CompileCommandParams, and
srv.compiler.fill_compile_args accordingly.
src/server/worker/stateless_worker.cpp (1)

218-221: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate has_user_errors for index compile failures too.

handle_build_pch() and handle_build_pcm() now distinguish user diagnostics from infrastructure failures, but handle_index() still returns a bare failure result. That leaves has_user_errors false for normal source-level index failures, so broken user code can be reported downstream as an anomaly instead of a regular compile problem.

Suggested fix
 static worker::BuildResult handle_index(const worker::BuildParams& params) {
     ScopedTimer timer;
@@
     auto unit = compile(cp);
     if(!unit.completed()) {
-        LOG_WARN("Index failed: file={}, {}ms", params.file, timer.ms());
-        return {false, "Index compilation failed"};
+        auto errors = collect_errors(unit);
+        LOG_WARN("Index failed: file={}, {}ms, errors=[{}]", params.file, timer.ms(), errors);
+        worker::BuildResult result;
+        result.success = false;
+        result.error = errors.empty() ? "Index compilation failed" : errors;
+        result.has_user_errors = !errors.empty();
+        return result;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/server/worker/stateless_worker.cpp` around lines 218 - 221, handle_index
currently returns a plain failure when compile(cp) yields !unit.completed(),
which loses the user-vs-infrastructure distinction; update the failure path to
propagate the unit's user-error flag (as done in
handle_build_pch()/handle_build_pcm()) by checking unit.has_user_errors() (or
the equivalent user-diagnostics accessor) and returning a result that sets
has_user_errors accordingly instead of always leaving it false, and log/include
that flag in the returned failure from handle_index and in the LOG_WARN message
to mirror the other handlers.
🧹 Nitpick comments (1)
src/server/compiler/compiler.cpp (1)

99-99: 💤 Low value

Discarded CommandSource return value.

fill_compile_args now returns a CommandSource to indicate how the compile command was resolved, but this call site ignores the return value. While the PCM build path may not need to inject guidance diagnostics (PCMs come from module sources that likely have CDB entries), silently discarding the result loses potentially useful provenance information for debugging or future use.

Consider capturing the return value for logging consistency with other call sites, or add a comment explaining why it's intentionally ignored here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/server/compiler/compiler.cpp` at line 99, The call to
fill_compile_args(file_path, bp.directory, bp.arguments) discards its
CommandSource return value; capture and use that return (e.g., auto src =
fill_compile_args(...)) or explicitly document why it's ignored to preserve
provenance; update the call site in compiler.cpp to store the returned
CommandSource and either log it (consistent with other sites) or add a clear
comment referencing fill_compile_args and CommandSource explaining why the value
is intentionally unused.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/feature/feature.h`:
- Around line 29-35: The to_position function currently returns a default (0,0)
on mapping failure which breaks downstream unsigned delta math; change its
contract to return std::optional<protocol::Position> (i.e., inline auto
to_position(...) -> std::optional<protocol::Position>), log the anomaly via
LOG_ANOMALY(PositionMapFail, ...) and return std::nullopt on failure instead of
protocol::Position{.line=0,.character=0}; then update callers (notably the
consumer in semantic_tokens.cpp around the relative encoder) to check the
optional, drop or skip the affected artifact when mapping fails, or otherwise
handle the error path so no unsigned underflow occurs.

In `@src/server/workspace/config.cpp`:
- Around line 207-210: The function Config::load_from_workspace should clear the
out-parameter loaded_path on entry to satisfy the header contract that it “stays
empty” when no config is found; update the start of Config::load_from_workspace
to check if loaded_path is non-null and call loaded_path->clear() so any
previous value is removed before the function searches for candidates and only
set when a candidate is found.

In `@src/support/anomaly.cpp`:
- Around line 17-19: notify_hook and testing_trap are accessed concurrently
(written by set_notify_hook(), set_anomaly_trap_for_testing(),
reset_anomaly_for_testing() and read by report_anomaly()/report_guidance()),
causing a data race; protect these process-wide hooks with a mutex: add a
std::mutex (or std::shared_mutex) guarding the hook variables, acquire the lock
when mutating them in
set_notify_hook()/set_anomaly_trap_for_testing()/reset_anomaly_for_testing(),
and in report_anomaly()/report_guidance() take the lock, copy the std::function
into a local variable, release the lock, then invoke the copied callable outside
the lock to avoid holding the mutex during callback execution.

In `@tests/integration/compilation/test_staleness.py`:
- Around line 343-346: The test currently awaits
client.text_document_hover_async(...) inside pytest.raises without a timeout and
can hang; wrap the awaited call in asyncio.wait_for(...) with a short timeout
(e.g., 1s) so the assertion fails fast if no response is returned. Update the
block around client.text_document_hover_async(HoverParams(...)) to await
asyncio.wait_for(client.text_document_hover_async(...), timeout=1) and add an
asyncio import if not present.

In `@tests/replay.py`:
- Around line 150-163: The reader_loop currently only captures
"window/logMessage" entries and misses worker-only anomalies written to the
workspace logs; update the replay flow to, when display_ws resolves to a
workspace path, perform the same scan of the workspace ".clice/logs" files
(reuse the existing log-file scan logic) before making the final PASS/FAIL
decision so worker-side anomalies are collected into the anomalies list;
specifically, after or alongside reader_loop (and in the analogous spot
referenced by the other occurrence), call the log-file scanner and append any
found anomalies to the anomalies list so they are considered when deciding
PASS/FAIL.

In `@tests/unit/support/anomaly_tests.cpp`:
- Around line 14-32: The destructor of AnomalyCapture currently unconditionally
clears globals instead of restoring previous state; update the constructor to
capture the existing notify hook and anomaly trap (via logging::set_notify_hook
and logging::set_anomaly_trap_for_testing accessors) into members (e.g.,
prev_notify_hook, prev_anomaly_trap) and then in ~AnomalyCapture restore them
(call logging::set_notify_hook(prev_notify_hook) and
logging::set_anomaly_trap_for_testing(prev_anomaly_trap)) while still restoring
logging::options.level and calling logging::reset_anomaly_for_testing as
appropriate; alternatively, if restoring previous callbacks is impossible,
change the struct comment and name to clearly document that it overwrites global
hooks for the test duration.

In `@tests/unit/test/tester.h`:
- Around line 100-103: The helper to_local_range currently dereferences unit and
the results of converter.to_offset(range.start/end) unconditionally; change it
to fail tests explicitly instead of crashing by asserting unit is non-null and
that converter.to_offset(range.start) and converter.to_offset(range.end) return
values before dereferencing (e.g., use test assertions like
ASSERT_TRUE/EXPECT_TRUE or return std::optional<LocalSourceRange>), then
construct and return LocalSourceRange from the validated offsets; reference the
to_local_range function, the unit pointer, and
converter.to_offset(range.start)/converter.to_offset(range.end) locations when
making the checks.

---

Outside diff comments:
In `@src/server/compiler/indexer.cpp`:
- Around line 821-834: The fallback-source gate should run before triggering
module prebuild to avoid building PCMs for files that will be skipped; move the
call that checks compiler.fill_compile_args(file_path, params.directory,
params.arguments, nullptr) against CommandSource::Fallback to precede the
workspace.compile_graph->compile(server_path_id) call so that if the command
source is Fallback you co_return early and do not invoke
workspace.compile_graph->compile(server_path_id); keep the existing
worker::BuildParams setup (params.kind, params.file) but only create/use them
after the fallback check or ensure the fallback check uses
file_path/server_path_id as it does now.

In `@src/server/service/agent_client.cpp`:
- Around line 195-206: The handler passed to peer.on_request currently calls
srv.compiler.fill_compile_args(params.path, directory, arguments) and drops the
returned CommandSource, so CompileCommandResult loses provenance; update the
handler to capture the CommandSource from fill_compile_args and include it in
the response (e.g., add a source field to CompileCommandResult and populate it
from the returned CommandSource), or if the original behavior should still
reject inferred commands, check the returned CommandSource and return a failure
for CommandSource::Fallback instead of always succeeding; adjust the code paths
around CompileCommandResult, CompileCommandParams, and
srv.compiler.fill_compile_args accordingly.

In `@src/server/worker/stateless_worker.cpp`:
- Around line 218-221: handle_index currently returns a plain failure when
compile(cp) yields !unit.completed(), which loses the user-vs-infrastructure
distinction; update the failure path to propagate the unit's user-error flag (as
done in handle_build_pch()/handle_build_pcm()) by checking
unit.has_user_errors() (or the equivalent user-diagnostics accessor) and
returning a result that sets has_user_errors accordingly instead of always
leaving it false, and log/include that flag in the returned failure from
handle_index and in the LOG_WARN message to mirror the other handlers.

---

Nitpick comments:
In `@src/server/compiler/compiler.cpp`:
- Line 99: The call to fill_compile_args(file_path, bp.directory, bp.arguments)
discards its CommandSource return value; capture and use that return (e.g., auto
src = fill_compile_args(...)) or explicitly document why it's ignored to
preserve provenance; update the call site in compiler.cpp to store the returned
CommandSource and either log it (consistent with other sites) or add a clear
comment referencing fill_compile_args and CommandSource explaining why the value
is intentionally unused.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 017bb9e9-2234-408b-a1dc-fde1bdc1e898

📥 Commits

Reviewing files that changed from the base of the PR and between d20e897 and 340f9dd.

📒 Files selected for processing (47)
  • cmake/package.cmake
  • src/compile/diagnostic.cpp
  • src/feature/code_completion.cpp
  • src/feature/diagnostics.cpp
  • src/feature/feature.h
  • src/feature/folding_ranges.cpp
  • src/feature/formatting.cpp
  • src/feature/inlay_hints.cpp
  • src/feature/semantic_tokens.cpp
  • src/server/compiler/compiler.cpp
  • src/server/compiler/compiler.h
  • src/server/compiler/indexer.cpp
  • src/server/protocol/worker.h
  • src/server/service/agent_client.cpp
  • src/server/service/lsp_client.cpp
  • src/server/service/lsp_client.h
  • src/server/service/master_server.cpp
  • src/server/service/master_server.h
  • src/server/worker/stateless_worker.cpp
  • src/server/worker/worker_pool.cpp
  • src/server/workspace/config.cpp
  • src/server/workspace/config.h
  • src/support/anomaly.cpp
  • src/support/anomaly.h
  • tests/conftest.py
  • tests/integration/agentic/test_agentic.py
  • tests/integration/agentic/test_cli.py
  • tests/integration/compilation/test_persistent_cache.py
  • tests/integration/compilation/test_staleness.py
  • tests/integration/features/test_guidance_diagnostics.py
  • tests/integration/features/test_index.py
  • tests/integration/features/test_server.py
  • tests/integration/lifecycle/test_anomaly.py
  • tests/integration/lifecycle/test_config.py
  • tests/integration/lifecycle/test_file_operation.py
  • tests/integration/modules/test_modules.py
  • tests/integration/stress/test_rapid_edit.py
  • tests/integration/utils/assertions.py
  • tests/integration/utils/client.py
  • tests/integration/utils/wait.py
  • tests/pytest.ini
  • tests/replay.py
  • tests/unit/feature/document_link_tests.cpp
  • tests/unit/feature/folding_range_tests.cpp
  • tests/unit/server/config_tests.cpp
  • tests/unit/support/anomaly_tests.cpp
  • tests/unit/test/tester.h
💤 Files with no reviewable changes (1)
  • tests/unit/feature/document_link_tests.cpp

Comment thread src/feature/feature.h
Comment on lines +29 to +35
inline auto to_position(const PositionMapper& converter, std::uint32_t offset)
-> protocol::Position {
if(auto position = converter.to_position(offset)) {
return *position;
}
LOG_ANOMALY(PositionMapFail, "offset {} cannot be mapped to a position", offset);
return protocol::Position{.line = 0, .character = 0};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don’t clamp failed mappings to document start.

Line 35 turns every mapping failure into (0, 0), which breaks downstream ordering assumptions in release mode. A concrete case is src/feature/semantic_tokens.cpp Lines 503-506: the relative encoder subtracts unsigned line/column deltas from the previous token, so a late failure here underflows and corrupts the semantic-token payload instead of degrading one item cleanly. Please return an optional/error and let callers drop the affected artifact after logging, or clamp to a monotonic boundary such as EOF rather than the document start.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/feature/feature.h` around lines 29 - 35, The to_position function
currently returns a default (0,0) on mapping failure which breaks downstream
unsigned delta math; change its contract to return
std::optional<protocol::Position> (i.e., inline auto to_position(...) ->
std::optional<protocol::Position>), log the anomaly via
LOG_ANOMALY(PositionMapFail, ...) and return std::nullopt on failure instead of
protocol::Position{.line=0,.character=0}; then update callers (notably the
consumer in semantic_tokens.cpp around the relative encoder) to check the
optional, drop or skip the affected artifact when mapping fails, or otherwise
handle the error path so no unsigned underflow occurs.

Comment on lines +207 to +210
Config Config::load_from_workspace(llvm::StringRef workspace_root,
std::vector<ConfigIssue>* issues,
std::string* loaded_path,
bool with_defaults) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clear loaded_path on entry to preserve the documented contract.

load_from_workspace() only writes loaded_path when it finds a candidate, so callers that reuse a non-empty string keep a stale path after a no-config lookup. That contradicts the header contract in src/server/workspace/config.h ("stays empty") and can leak old config URIs into downstream state.

Proposed fix
 Config Config::load_from_workspace(llvm::StringRef workspace_root,
                                    std::vector<ConfigIssue>* issues,
                                    std::string* loaded_path,
                                    bool with_defaults) {
+    if(loaded_path)
+        loaded_path->clear();
+
     if(!workspace_root.empty()) {
         for(auto* name: {"clice.toml", ".clice/config.toml"}) {
             auto config_path = path::join(workspace_root, name);
             if(!llvm::sys::fs::exists(config_path))
                 continue;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/server/workspace/config.cpp` around lines 207 - 210, The function
Config::load_from_workspace should clear the out-parameter loaded_path on entry
to satisfy the header contract that it “stays empty” when no config is found;
update the start of Config::load_from_workspace to check if loaded_path is
non-null and call loaded_path->clear() so any previous value is removed before
the function searches for candidates and only set when a candidate is found.

Comment thread src/support/anomaly.cpp
Comment on lines +17 to +19
std::function<void(NotifyLevel, std::string_view)> notify_hook;

std::function<void(AnomalyId)> testing_trap;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Synchronize the process-wide hooks before this ships.

notify_hook and testing_trap are mutated in set_notify_hook(), set_anomaly_trap_for_testing(), and reset_anomaly_for_testing(), then read from report_anomaly() / report_guidance() with no synchronization. In a server that already models worker crashes/spawn failures, that is a real cross-thread data race on std::function, so a report during hook install/reset is UB and can crash or invoke torn state. Guard the hook state with a mutex (copy the callable under the lock, then invoke it after unlocking) or move to another thread-safe ownership model.

Also applies to: 75-105

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/support/anomaly.cpp` around lines 17 - 19, notify_hook and testing_trap
are accessed concurrently (written by set_notify_hook(),
set_anomaly_trap_for_testing(), reset_anomaly_for_testing() and read by
report_anomaly()/report_guidance()), causing a data race; protect these
process-wide hooks with a mutex: add a std::mutex (or std::shared_mutex)
guarding the hook variables, acquire the lock when mutating them in
set_notify_hook()/set_anomaly_trap_for_testing()/reset_anomaly_for_testing(),
and in report_anomaly()/report_guidance() take the lock, copy the std::function
into a local variable, release the lock, then invoke the copied callable outside
the lock to avoid holding the mutex during callback execution.

Comment on lines +343 to +346
with pytest.raises(Exception, match="Document not open"):
await client.text_document_hover_async(
HoverParams(text_document=doc(uri), position=Position(line=0, character=4))
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep a timeout on the closed-document hover assertion.

This request is now validating an error path, but it no longer has a local timeout. If the server regresses from “returns LSP error” to “never responds”, this test will hang the suite instead of failing fast.

Suggested fix
-    with pytest.raises(Exception, match="Document not open"):
-        await client.text_document_hover_async(
-            HoverParams(text_document=doc(uri), position=Position(line=0, character=4))
-        )
+    with pytest.raises(Exception, match="Document not open"):
+        await asyncio.wait_for(
+            client.text_document_hover_async(
+                HoverParams(
+                    text_document=doc(uri), position=Position(line=0, character=4)
+                )
+            ),
+            timeout=5.0,
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/compilation/test_staleness.py` around lines 343 - 346, The
test currently awaits client.text_document_hover_async(...) inside pytest.raises
without a timeout and can hang; wrap the awaited call in asyncio.wait_for(...)
with a short timeout (e.g., 1s) so the assertion fails fast if no response is
returned. Update the block around
client.text_document_hover_async(HoverParams(...)) to await
asyncio.wait_for(client.text_document_hover_async(...), timeout=1) and add an
asyncio import if not present.

Comment thread tests/replay.py
Comment on lines +150 to +163
anomalies: list[str] = []

async def reader_loop():
try:
while True:
msg = await read_lsp_message(proc.stdout)
if msg is None:
break
msg_id, method = msg.get("id"), msg.get("method")
if method == "window/logMessage":
text = msg.get("params", {}).get("message", "")
if "[anomaly:" in text:
anomalies.append(text)
print(f" ANOMALY: {text}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Detect worker-only anomalies here too.

This replay path only watches window/logMessage, but the new test helpers explicitly need a second pass over <workspace>/.clice/logs because worker anomalies are not forwarded there in v1.0. As written, a replay can still report PASS after a worker-side anomaly if the master never echoed it through logMessage. Reuse the same log-file scan before the final PASS/FAIL decision when display_ws resolves to a workspace path.

Suggested fix
 import argparse
 import asyncio
 import json
 import os
 import re
 import signal
 import sys
 import time

 # Force line-buffered stdout so CI sees output immediately.
 sys.stdout.reconfigure(line_buffering=True)
 from pathlib import Path
 from urllib.parse import quote, unquote
+
+from tests.integration.utils.assertions import anomalies_in_log_files
@@
-    if anomalies:
+    if display_ws:
+        anomalies.extend(anomalies_in_log_files(Path(display_ws)))
+
+    if anomalies:
         # Anomalies are internal clice bugs; a replay session must be clean.
         print(f"  result: FAIL ({len(anomalies)} anomaly report(s), {elapsed:.1f}s)")
         print_stderr()
         return False

Also applies to: 322-326

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/replay.py` around lines 150 - 163, The reader_loop currently only
captures "window/logMessage" entries and misses worker-only anomalies written to
the workspace logs; update the replay flow to, when display_ws resolves to a
workspace path, perform the same scan of the workspace ".clice/logs" files
(reuse the existing log-file scan logic) before making the final PASS/FAIL
decision so worker-side anomalies are collected into the anomalies list;
specifically, after or alongside reader_loop (and in the analogous spot
referenced by the other occurrence), call the log-file scanner and append any
found anomalies to the anomalies list so they are considered when deciding
PASS/FAIL.

Comment on lines +14 to +32
/// RAII fixture: capture the notify hook and trap, restore globals afterwards.
struct AnomalyCapture {
std::vector<std::pair<NotifyLevel, std::string>> notified;
std::vector<AnomalyId> trapped;
logging::Level saved_level;

AnomalyCapture() : saved_level(logging::options.level) {
logging::reset_anomaly_for_testing();
logging::set_notify_hook([this](NotifyLevel level, std::string_view message) {
notified.emplace_back(level, std::string(message));
});
logging::set_anomaly_trap_for_testing([this](AnomalyId id) { trapped.push_back(id); });
}

~AnomalyCapture() {
logging::set_notify_hook(nullptr);
logging::reset_anomaly_for_testing();
logging::options.level = saved_level;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This fixture does not actually restore the prior hook state.

The comment says AnomalyCapture “restore[s] globals afterwards”, but the destructor always forces notify_hook to nullptr and clears the trap instead of restoring whatever was installed before construction. That makes the helper order-dependent if another test or fixture already configured anomaly delivery. Either capture/restore the previous callbacks via test-only accessors, or narrow the fixture contract/comments so it is explicit that it overwrites global state for the duration of the test.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/support/anomaly_tests.cpp` around lines 14 - 32, The destructor of
AnomalyCapture currently unconditionally clears globals instead of restoring
previous state; update the constructor to capture the existing notify hook and
anomaly trap (via logging::set_notify_hook and
logging::set_anomaly_trap_for_testing accessors) into members (e.g.,
prev_notify_hook, prev_anomaly_trap) and then in ~AnomalyCapture restore them
(call logging::set_notify_hook(prev_notify_hook) and
logging::set_anomaly_trap_for_testing(prev_anomaly_trap)) while still restoring
logging::options.level and calling logging::reset_anomaly_for_testing as
appropriate; alternatively, if restoring previous callbacks is impossible,
change the struct comment and name to clearly document that it overwrites global
hooks for the test duration.

Comment thread tests/unit/test/tester.h
Comment on lines +100 to +103
LocalSourceRange to_local_range(const kota::ipc::protocol::Range& range) {
feature::PositionMapper converter(unit->interested_content(),
feature::PositionEncoding::UTF8);
return LocalSourceRange(*converter.to_offset(range.start), *converter.to_offset(range.end));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid hard-crashing the shared range helper on unmappable positions.

This helper centralizes protocol→local conversion for multiple feature tests, but it still dereferences unit, converter.to_offset(range.start), and converter.to_offset(range.end) unconditionally. If a feature regression produces an invalid/clamped protocol range, the test process dies here instead of failing with a targeted assertion. Please turn these into explicit test assertions (or return an std::optional<LocalSourceRange>) so mapping failures surface as actionable test failures.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test/tester.h` around lines 100 - 103, The helper to_local_range
currently dereferences unit and the results of
converter.to_offset(range.start/end) unconditionally; change it to fail tests
explicitly instead of crashing by asserting unit is non-null and that
converter.to_offset(range.start) and converter.to_offset(range.end) return
values before dereferencing (e.g., use test assertions like
ASSERT_TRUE/EXPECT_TRUE or return std::optional<LocalSourceRange>), then
construct and return LocalSourceRange from the validated offsets; reference the
to_local_range function, the unit pointer, and
converter.to_offset(range.start)/converter.to_offset(range.end) locations when
making the checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant