feat(server): error feedback and logging channels (D9)#456
Conversation
Pulls clice-io/kotatsu#168: TOML syntax errors carry description and line/column in no-exceptions builds, and unknown-field errors get the offending node's location.
LOG_ANOMALY(id, ...) is a soft assertion for internal invariants: Debug builds abort after logging (CLICE_ANOMALY_NO_TRAP escapes for tests), Release builds log with a stable [anomaly:<id>] marker, forward it to a notify hook (master: window/logMessage) and continue. Reports are rate-limited per ID; the gate runs before format arguments are evaluated, so suppressed reports cost nothing. LOG_GUIDANCE(...) is the user-actionable channel ([guidance] marker, Warning logMessage) for situations that are not clice bugs. Unit tests lock the lazy-evaluation contract, rate limiting, markers and the trap mock point.
Reimplements the intent of #436 on current main (superseding it): - fill_compile_args() reports a CommandSource (CdbExact / IncludeGraph / Inferred / Fallback) and emits a per-file decision log (tiers tried, tier hit, command hash). Real CDB entries are now distinguished from the synthesized default command via has_entry(), which also revives the automatic include-graph header context tier that the synthesis had shadowed. - Guessed commands whose diagnostics contain file-not-found errors get a file-top Warning guidance diagnostic (source: clice, code links the setup guide, resolving the DiagnosticSource::Clice TODO). Exact CDB matches never get it. - clice.toml problems become diagnostics on the config file's URI with line/column from the TOML decoder: decode failures as Error (defaults apply), unknown keys as Warning via a strict second decode pass. initializationOptions now overlay BEFORE apply_defaults so derived fields (logging_dir, index_dir) follow overridden values. - Feature requests on closed documents and unresolvable hierarchy items return LSP errors instead of silent null; client-provided positions and ranges that fail to map return InvalidParams. Empty hierarchy and workspace-symbol results return empty arrays instead of null. - Internal failures report anomalies: unexpected PCH/PCM build failures (user-code errors are distinguished via BuildResult.has_user_errors), compile/query IPC failures, worker crash/spawn failures, and internally-produced offsets failing to map (feature::to_position now clamps instead of dereferencing an empty optional). - Master pushes anomaly/guidance to the client via window/logMessage; workers only log (their files are scanned by tests).
- CliceClient records window/logMessage; assert_no_anomaly() runs in every client/agentic fixture teardown, scanning both notifications and master/worker log files (opt-out: @pytest.mark.allow_anomaly). - replay.py fails a trace when an [anomaly:] message is pushed. - E2E: fallback-command guidance diagnostic lifecycle (appears without CDB, gone after providing one), clice.toml type/unknown-key diagnostics with line/column and clearing after a fix, worker-crash anomaly reporting (Linux), error responses for requests on closed or never-opened documents. - Named timing constants (MTIME_GRANULARITY/SETTLE_TIME/IDLE_TIMEOUT) replace hardcoded sleeps; workspace fixture cleans up after tests; log messages are dumped when a test fails. - Tester::to_local_range() replaces per-file duplicates.
Pre-PR review findings: - publish_config_diagnostics groups issues by their own file URI; a malformed clice.toml no longer renders its error on the next config candidate that loaded. - A failed switchContext header context now falls through to the real CDB entry before reaching the synthesized fallback. - Header-context host selection filters hosts through has_entry(); lookup() never returns empty (it synthesizes), so the old emptiness checks were dead and hosts without real entries were mislabeled as IncludeGraph. - Style: string_view message parameters, llvm::join, doc comments.
- All seven AnomalyIds fire through the macro once (marker names are wire-stable for the integration greps). - InvalidParams for an out-of-range hover position; error response for an unresolvable call hierarchy item. - Column assertions for config issue locations. - assert_no_anomaly in the agentic local fixtures and make_client-based tests that bypass the client fixture teardown.
📝 WalkthroughWalkthroughThis PR implements a comprehensive internal anomaly reporting infrastructure with rate limiting and notifications, refactors feature position conversions to use safe helpers instead of dereferencing optionals, classifies compile command sources with per-file decision logging, extends configuration loading to capture and report issues as LSP diagnostics, and updates LSP error handling to return structured errors with guidance diagnostics for inferred compile commands. ChangesAnomaly Reporting Infrastructure
Safe Position Mapping Refactoring
Compile Command Source Tracking
Configuration Issue Reporting
LSP Error Handling and Diagnostics Publishing
Test Infrastructure and Reliability
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/server/compiler/indexer.cpp (1)
821-834:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove the fallback-source gate ahead of the module prebuild.
Lines 823-825 still run
workspace.compile_graph->compile(server_path_id)before the newCommandSource::Fallbackcheck. For module interface units with no real compile command, background indexing now skips the final worker request on Line 833, but it still pays for the PCM build first. That undermines the new “skip fallback-sourced files” rule on the heaviest path.Suggested reorder
- // For module interface units, compile their PCM (and transitive deps) - // first so the stateless worker has the artifacts it needs. - if(workspace.compile_graph && workspace.path_to_module.contains(server_path_id)) { - co_await workspace.compile_graph->compile(server_path_id); - } - worker::BuildParams params; params.kind = worker::BuildKind::Index; params.file = file_path; - // Bulk background indexing sticks to real commands; synthesized fallback - // commands would fill the index with guesses. - if(compiler.fill_compile_args(file_path, params.directory, params.arguments, nullptr) == - CommandSource::Fallback) + auto source = + compiler.fill_compile_args(file_path, params.directory, params.arguments, nullptr); + // Bulk background indexing sticks to real commands; synthesized fallback + // commands would fill the index with guesses. + if(source == CommandSource::Fallback) co_return; + + // For module interface units, compile their PCM (and transitive deps) + // first so the stateless worker has the artifacts it needs. + if(workspace.compile_graph && workspace.path_to_module.contains(server_path_id)) { + co_await workspace.compile_graph->compile(server_path_id); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/server/compiler/indexer.cpp` around lines 821 - 834, The fallback-source gate should run before triggering module prebuild to avoid building PCMs for files that will be skipped; move the call that checks compiler.fill_compile_args(file_path, params.directory, params.arguments, nullptr) against CommandSource::Fallback to precede the workspace.compile_graph->compile(server_path_id) call so that if the command source is Fallback you co_return early and do not invoke workspace.compile_graph->compile(server_path_id); keep the existing worker::BuildParams setup (params.kind, params.file) but only create/use them after the fallback check or ensure the fallback check uses file_path/server_path_id as it does now.src/server/service/agent_client.cpp (1)
195-206:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftDon’t drop compile-command provenance at the agent boundary.
This handler now always succeeds, but it discards the
CommandSourcereturned byfill_compile_args(...). That means agentic callers cannot distinguish an exact CDB command from an inferred/fallback one anymore, even though this PR adds provenance specifically to track that difference. If the old error-on-miss behavior is intentionally gone,CompileCommandResultneeds to carry the source; otherwise keep rejectingFallbackhere.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/server/service/agent_client.cpp` around lines 195 - 206, The handler passed to peer.on_request currently calls srv.compiler.fill_compile_args(params.path, directory, arguments) and drops the returned CommandSource, so CompileCommandResult loses provenance; update the handler to capture the CommandSource from fill_compile_args and include it in the response (e.g., add a source field to CompileCommandResult and populate it from the returned CommandSource), or if the original behavior should still reject inferred commands, check the returned CommandSource and return a failure for CommandSource::Fallback instead of always succeeding; adjust the code paths around CompileCommandResult, CompileCommandParams, and srv.compiler.fill_compile_args accordingly.src/server/worker/stateless_worker.cpp (1)
218-221:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate
has_user_errorsfor index compile failures too.
handle_build_pch()andhandle_build_pcm()now distinguish user diagnostics from infrastructure failures, buthandle_index()still returns a bare failure result. That leaveshas_user_errorsfalse for normal source-level index failures, so broken user code can be reported downstream as an anomaly instead of a regular compile problem.Suggested fix
static worker::BuildResult handle_index(const worker::BuildParams& params) { ScopedTimer timer; @@ auto unit = compile(cp); if(!unit.completed()) { - LOG_WARN("Index failed: file={}, {}ms", params.file, timer.ms()); - return {false, "Index compilation failed"}; + auto errors = collect_errors(unit); + LOG_WARN("Index failed: file={}, {}ms, errors=[{}]", params.file, timer.ms(), errors); + worker::BuildResult result; + result.success = false; + result.error = errors.empty() ? "Index compilation failed" : errors; + result.has_user_errors = !errors.empty(); + return result; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/server/worker/stateless_worker.cpp` around lines 218 - 221, handle_index currently returns a plain failure when compile(cp) yields !unit.completed(), which loses the user-vs-infrastructure distinction; update the failure path to propagate the unit's user-error flag (as done in handle_build_pch()/handle_build_pcm()) by checking unit.has_user_errors() (or the equivalent user-diagnostics accessor) and returning a result that sets has_user_errors accordingly instead of always leaving it false, and log/include that flag in the returned failure from handle_index and in the LOG_WARN message to mirror the other handlers.
🧹 Nitpick comments (1)
src/server/compiler/compiler.cpp (1)
99-99: 💤 Low valueDiscarded
CommandSourcereturn value.
fill_compile_argsnow returns aCommandSourceto indicate how the compile command was resolved, but this call site ignores the return value. While the PCM build path may not need to inject guidance diagnostics (PCMs come from module sources that likely have CDB entries), silently discarding the result loses potentially useful provenance information for debugging or future use.Consider capturing the return value for logging consistency with other call sites, or add a comment explaining why it's intentionally ignored here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/server/compiler/compiler.cpp` at line 99, The call to fill_compile_args(file_path, bp.directory, bp.arguments) discards its CommandSource return value; capture and use that return (e.g., auto src = fill_compile_args(...)) or explicitly document why it's ignored to preserve provenance; update the call site in compiler.cpp to store the returned CommandSource and either log it (consistent with other sites) or add a clear comment referencing fill_compile_args and CommandSource explaining why the value is intentionally unused.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/feature/feature.h`:
- Around line 29-35: The to_position function currently returns a default (0,0)
on mapping failure which breaks downstream unsigned delta math; change its
contract to return std::optional<protocol::Position> (i.e., inline auto
to_position(...) -> std::optional<protocol::Position>), log the anomaly via
LOG_ANOMALY(PositionMapFail, ...) and return std::nullopt on failure instead of
protocol::Position{.line=0,.character=0}; then update callers (notably the
consumer in semantic_tokens.cpp around the relative encoder) to check the
optional, drop or skip the affected artifact when mapping fails, or otherwise
handle the error path so no unsigned underflow occurs.
In `@src/server/workspace/config.cpp`:
- Around line 207-210: The function Config::load_from_workspace should clear the
out-parameter loaded_path on entry to satisfy the header contract that it “stays
empty” when no config is found; update the start of Config::load_from_workspace
to check if loaded_path is non-null and call loaded_path->clear() so any
previous value is removed before the function searches for candidates and only
set when a candidate is found.
In `@src/support/anomaly.cpp`:
- Around line 17-19: notify_hook and testing_trap are accessed concurrently
(written by set_notify_hook(), set_anomaly_trap_for_testing(),
reset_anomaly_for_testing() and read by report_anomaly()/report_guidance()),
causing a data race; protect these process-wide hooks with a mutex: add a
std::mutex (or std::shared_mutex) guarding the hook variables, acquire the lock
when mutating them in
set_notify_hook()/set_anomaly_trap_for_testing()/reset_anomaly_for_testing(),
and in report_anomaly()/report_guidance() take the lock, copy the std::function
into a local variable, release the lock, then invoke the copied callable outside
the lock to avoid holding the mutex during callback execution.
In `@tests/integration/compilation/test_staleness.py`:
- Around line 343-346: The test currently awaits
client.text_document_hover_async(...) inside pytest.raises without a timeout and
can hang; wrap the awaited call in asyncio.wait_for(...) with a short timeout
(e.g., 1s) so the assertion fails fast if no response is returned. Update the
block around client.text_document_hover_async(HoverParams(...)) to await
asyncio.wait_for(client.text_document_hover_async(...), timeout=1) and add an
asyncio import if not present.
In `@tests/replay.py`:
- Around line 150-163: The reader_loop currently only captures
"window/logMessage" entries and misses worker-only anomalies written to the
workspace logs; update the replay flow to, when display_ws resolves to a
workspace path, perform the same scan of the workspace ".clice/logs" files
(reuse the existing log-file scan logic) before making the final PASS/FAIL
decision so worker-side anomalies are collected into the anomalies list;
specifically, after or alongside reader_loop (and in the analogous spot
referenced by the other occurrence), call the log-file scanner and append any
found anomalies to the anomalies list so they are considered when deciding
PASS/FAIL.
In `@tests/unit/support/anomaly_tests.cpp`:
- Around line 14-32: The destructor of AnomalyCapture currently unconditionally
clears globals instead of restoring previous state; update the constructor to
capture the existing notify hook and anomaly trap (via logging::set_notify_hook
and logging::set_anomaly_trap_for_testing accessors) into members (e.g.,
prev_notify_hook, prev_anomaly_trap) and then in ~AnomalyCapture restore them
(call logging::set_notify_hook(prev_notify_hook) and
logging::set_anomaly_trap_for_testing(prev_anomaly_trap)) while still restoring
logging::options.level and calling logging::reset_anomaly_for_testing as
appropriate; alternatively, if restoring previous callbacks is impossible,
change the struct comment and name to clearly document that it overwrites global
hooks for the test duration.
In `@tests/unit/test/tester.h`:
- Around line 100-103: The helper to_local_range currently dereferences unit and
the results of converter.to_offset(range.start/end) unconditionally; change it
to fail tests explicitly instead of crashing by asserting unit is non-null and
that converter.to_offset(range.start) and converter.to_offset(range.end) return
values before dereferencing (e.g., use test assertions like
ASSERT_TRUE/EXPECT_TRUE or return std::optional<LocalSourceRange>), then
construct and return LocalSourceRange from the validated offsets; reference the
to_local_range function, the unit pointer, and
converter.to_offset(range.start)/converter.to_offset(range.end) locations when
making the checks.
---
Outside diff comments:
In `@src/server/compiler/indexer.cpp`:
- Around line 821-834: The fallback-source gate should run before triggering
module prebuild to avoid building PCMs for files that will be skipped; move the
call that checks compiler.fill_compile_args(file_path, params.directory,
params.arguments, nullptr) against CommandSource::Fallback to precede the
workspace.compile_graph->compile(server_path_id) call so that if the command
source is Fallback you co_return early and do not invoke
workspace.compile_graph->compile(server_path_id); keep the existing
worker::BuildParams setup (params.kind, params.file) but only create/use them
after the fallback check or ensure the fallback check uses
file_path/server_path_id as it does now.
In `@src/server/service/agent_client.cpp`:
- Around line 195-206: The handler passed to peer.on_request currently calls
srv.compiler.fill_compile_args(params.path, directory, arguments) and drops the
returned CommandSource, so CompileCommandResult loses provenance; update the
handler to capture the CommandSource from fill_compile_args and include it in
the response (e.g., add a source field to CompileCommandResult and populate it
from the returned CommandSource), or if the original behavior should still
reject inferred commands, check the returned CommandSource and return a failure
for CommandSource::Fallback instead of always succeeding; adjust the code paths
around CompileCommandResult, CompileCommandParams, and
srv.compiler.fill_compile_args accordingly.
In `@src/server/worker/stateless_worker.cpp`:
- Around line 218-221: handle_index currently returns a plain failure when
compile(cp) yields !unit.completed(), which loses the user-vs-infrastructure
distinction; update the failure path to propagate the unit's user-error flag (as
done in handle_build_pch()/handle_build_pcm()) by checking
unit.has_user_errors() (or the equivalent user-diagnostics accessor) and
returning a result that sets has_user_errors accordingly instead of always
leaving it false, and log/include that flag in the returned failure from
handle_index and in the LOG_WARN message to mirror the other handlers.
---
Nitpick comments:
In `@src/server/compiler/compiler.cpp`:
- Line 99: The call to fill_compile_args(file_path, bp.directory, bp.arguments)
discards its CommandSource return value; capture and use that return (e.g., auto
src = fill_compile_args(...)) or explicitly document why it's ignored to
preserve provenance; update the call site in compiler.cpp to store the returned
CommandSource and either log it (consistent with other sites) or add a clear
comment referencing fill_compile_args and CommandSource explaining why the value
is intentionally unused.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 017bb9e9-2234-408b-a1dc-fde1bdc1e898
📒 Files selected for processing (47)
cmake/package.cmakesrc/compile/diagnostic.cppsrc/feature/code_completion.cppsrc/feature/diagnostics.cppsrc/feature/feature.hsrc/feature/folding_ranges.cppsrc/feature/formatting.cppsrc/feature/inlay_hints.cppsrc/feature/semantic_tokens.cppsrc/server/compiler/compiler.cppsrc/server/compiler/compiler.hsrc/server/compiler/indexer.cppsrc/server/protocol/worker.hsrc/server/service/agent_client.cppsrc/server/service/lsp_client.cppsrc/server/service/lsp_client.hsrc/server/service/master_server.cppsrc/server/service/master_server.hsrc/server/worker/stateless_worker.cppsrc/server/worker/worker_pool.cppsrc/server/workspace/config.cppsrc/server/workspace/config.hsrc/support/anomaly.cppsrc/support/anomaly.htests/conftest.pytests/integration/agentic/test_agentic.pytests/integration/agentic/test_cli.pytests/integration/compilation/test_persistent_cache.pytests/integration/compilation/test_staleness.pytests/integration/features/test_guidance_diagnostics.pytests/integration/features/test_index.pytests/integration/features/test_server.pytests/integration/lifecycle/test_anomaly.pytests/integration/lifecycle/test_config.pytests/integration/lifecycle/test_file_operation.pytests/integration/modules/test_modules.pytests/integration/stress/test_rapid_edit.pytests/integration/utils/assertions.pytests/integration/utils/client.pytests/integration/utils/wait.pytests/pytest.initests/replay.pytests/unit/feature/document_link_tests.cpptests/unit/feature/folding_range_tests.cpptests/unit/server/config_tests.cpptests/unit/support/anomaly_tests.cpptests/unit/test/tester.h
💤 Files with no reviewable changes (1)
- tests/unit/feature/document_link_tests.cpp
| inline auto to_position(const PositionMapper& converter, std::uint32_t offset) | ||
| -> protocol::Position { | ||
| if(auto position = converter.to_position(offset)) { | ||
| return *position; | ||
| } | ||
| LOG_ANOMALY(PositionMapFail, "offset {} cannot be mapped to a position", offset); | ||
| return protocol::Position{.line = 0, .character = 0}; |
There was a problem hiding this comment.
Don’t clamp failed mappings to document start.
Line 35 turns every mapping failure into (0, 0), which breaks downstream ordering assumptions in release mode. A concrete case is src/feature/semantic_tokens.cpp Lines 503-506: the relative encoder subtracts unsigned line/column deltas from the previous token, so a late failure here underflows and corrupts the semantic-token payload instead of degrading one item cleanly. Please return an optional/error and let callers drop the affected artifact after logging, or clamp to a monotonic boundary such as EOF rather than the document start.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/feature/feature.h` around lines 29 - 35, The to_position function
currently returns a default (0,0) on mapping failure which breaks downstream
unsigned delta math; change its contract to return
std::optional<protocol::Position> (i.e., inline auto to_position(...) ->
std::optional<protocol::Position>), log the anomaly via
LOG_ANOMALY(PositionMapFail, ...) and return std::nullopt on failure instead of
protocol::Position{.line=0,.character=0}; then update callers (notably the
consumer in semantic_tokens.cpp around the relative encoder) to check the
optional, drop or skip the affected artifact when mapping fails, or otherwise
handle the error path so no unsigned underflow occurs.
| Config Config::load_from_workspace(llvm::StringRef workspace_root, | ||
| std::vector<ConfigIssue>* issues, | ||
| std::string* loaded_path, | ||
| bool with_defaults) { |
There was a problem hiding this comment.
Clear loaded_path on entry to preserve the documented contract.
load_from_workspace() only writes loaded_path when it finds a candidate, so callers that reuse a non-empty string keep a stale path after a no-config lookup. That contradicts the header contract in src/server/workspace/config.h ("stays empty") and can leak old config URIs into downstream state.
Proposed fix
Config Config::load_from_workspace(llvm::StringRef workspace_root,
std::vector<ConfigIssue>* issues,
std::string* loaded_path,
bool with_defaults) {
+ if(loaded_path)
+ loaded_path->clear();
+
if(!workspace_root.empty()) {
for(auto* name: {"clice.toml", ".clice/config.toml"}) {
auto config_path = path::join(workspace_root, name);
if(!llvm::sys::fs::exists(config_path))
continue;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/server/workspace/config.cpp` around lines 207 - 210, The function
Config::load_from_workspace should clear the out-parameter loaded_path on entry
to satisfy the header contract that it “stays empty” when no config is found;
update the start of Config::load_from_workspace to check if loaded_path is
non-null and call loaded_path->clear() so any previous value is removed before
the function searches for candidates and only set when a candidate is found.
| std::function<void(NotifyLevel, std::string_view)> notify_hook; | ||
|
|
||
| std::function<void(AnomalyId)> testing_trap; |
There was a problem hiding this comment.
Synchronize the process-wide hooks before this ships.
notify_hook and testing_trap are mutated in set_notify_hook(), set_anomaly_trap_for_testing(), and reset_anomaly_for_testing(), then read from report_anomaly() / report_guidance() with no synchronization. In a server that already models worker crashes/spawn failures, that is a real cross-thread data race on std::function, so a report during hook install/reset is UB and can crash or invoke torn state. Guard the hook state with a mutex (copy the callable under the lock, then invoke it after unlocking) or move to another thread-safe ownership model.
Also applies to: 75-105
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/support/anomaly.cpp` around lines 17 - 19, notify_hook and testing_trap
are accessed concurrently (written by set_notify_hook(),
set_anomaly_trap_for_testing(), reset_anomaly_for_testing() and read by
report_anomaly()/report_guidance()), causing a data race; protect these
process-wide hooks with a mutex: add a std::mutex (or std::shared_mutex)
guarding the hook variables, acquire the lock when mutating them in
set_notify_hook()/set_anomaly_trap_for_testing()/reset_anomaly_for_testing(),
and in report_anomaly()/report_guidance() take the lock, copy the std::function
into a local variable, release the lock, then invoke the copied callable outside
the lock to avoid holding the mutex during callback execution.
| with pytest.raises(Exception, match="Document not open"): | ||
| await client.text_document_hover_async( | ||
| HoverParams(text_document=doc(uri), position=Position(line=0, character=4)) | ||
| ) |
There was a problem hiding this comment.
Keep a timeout on the closed-document hover assertion.
This request is now validating an error path, but it no longer has a local timeout. If the server regresses from “returns LSP error” to “never responds”, this test will hang the suite instead of failing fast.
Suggested fix
- with pytest.raises(Exception, match="Document not open"):
- await client.text_document_hover_async(
- HoverParams(text_document=doc(uri), position=Position(line=0, character=4))
- )
+ with pytest.raises(Exception, match="Document not open"):
+ await asyncio.wait_for(
+ client.text_document_hover_async(
+ HoverParams(
+ text_document=doc(uri), position=Position(line=0, character=4)
+ )
+ ),
+ timeout=5.0,
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration/compilation/test_staleness.py` around lines 343 - 346, The
test currently awaits client.text_document_hover_async(...) inside pytest.raises
without a timeout and can hang; wrap the awaited call in asyncio.wait_for(...)
with a short timeout (e.g., 1s) so the assertion fails fast if no response is
returned. Update the block around
client.text_document_hover_async(HoverParams(...)) to await
asyncio.wait_for(client.text_document_hover_async(...), timeout=1) and add an
asyncio import if not present.
| anomalies: list[str] = [] | ||
|
|
||
| async def reader_loop(): | ||
| try: | ||
| while True: | ||
| msg = await read_lsp_message(proc.stdout) | ||
| if msg is None: | ||
| break | ||
| msg_id, method = msg.get("id"), msg.get("method") | ||
| if method == "window/logMessage": | ||
| text = msg.get("params", {}).get("message", "") | ||
| if "[anomaly:" in text: | ||
| anomalies.append(text) | ||
| print(f" ANOMALY: {text}") |
There was a problem hiding this comment.
Detect worker-only anomalies here too.
This replay path only watches window/logMessage, but the new test helpers explicitly need a second pass over <workspace>/.clice/logs because worker anomalies are not forwarded there in v1.0. As written, a replay can still report PASS after a worker-side anomaly if the master never echoed it through logMessage. Reuse the same log-file scan before the final PASS/FAIL decision when display_ws resolves to a workspace path.
Suggested fix
import argparse
import asyncio
import json
import os
import re
import signal
import sys
import time
# Force line-buffered stdout so CI sees output immediately.
sys.stdout.reconfigure(line_buffering=True)
from pathlib import Path
from urllib.parse import quote, unquote
+
+from tests.integration.utils.assertions import anomalies_in_log_files
@@
- if anomalies:
+ if display_ws:
+ anomalies.extend(anomalies_in_log_files(Path(display_ws)))
+
+ if anomalies:
# Anomalies are internal clice bugs; a replay session must be clean.
print(f" result: FAIL ({len(anomalies)} anomaly report(s), {elapsed:.1f}s)")
print_stderr()
return FalseAlso applies to: 322-326
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/replay.py` around lines 150 - 163, The reader_loop currently only
captures "window/logMessage" entries and misses worker-only anomalies written to
the workspace logs; update the replay flow to, when display_ws resolves to a
workspace path, perform the same scan of the workspace ".clice/logs" files
(reuse the existing log-file scan logic) before making the final PASS/FAIL
decision so worker-side anomalies are collected into the anomalies list;
specifically, after or alongside reader_loop (and in the analogous spot
referenced by the other occurrence), call the log-file scanner and append any
found anomalies to the anomalies list so they are considered when deciding
PASS/FAIL.
| /// RAII fixture: capture the notify hook and trap, restore globals afterwards. | ||
| struct AnomalyCapture { | ||
| std::vector<std::pair<NotifyLevel, std::string>> notified; | ||
| std::vector<AnomalyId> trapped; | ||
| logging::Level saved_level; | ||
|
|
||
| AnomalyCapture() : saved_level(logging::options.level) { | ||
| logging::reset_anomaly_for_testing(); | ||
| logging::set_notify_hook([this](NotifyLevel level, std::string_view message) { | ||
| notified.emplace_back(level, std::string(message)); | ||
| }); | ||
| logging::set_anomaly_trap_for_testing([this](AnomalyId id) { trapped.push_back(id); }); | ||
| } | ||
|
|
||
| ~AnomalyCapture() { | ||
| logging::set_notify_hook(nullptr); | ||
| logging::reset_anomaly_for_testing(); | ||
| logging::options.level = saved_level; | ||
| } |
There was a problem hiding this comment.
This fixture does not actually restore the prior hook state.
The comment says AnomalyCapture “restore[s] globals afterwards”, but the destructor always forces notify_hook to nullptr and clears the trap instead of restoring whatever was installed before construction. That makes the helper order-dependent if another test or fixture already configured anomaly delivery. Either capture/restore the previous callbacks via test-only accessors, or narrow the fixture contract/comments so it is explicit that it overwrites global state for the duration of the test.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/support/anomaly_tests.cpp` around lines 14 - 32, The destructor of
AnomalyCapture currently unconditionally clears globals instead of restoring
previous state; update the constructor to capture the existing notify hook and
anomaly trap (via logging::set_notify_hook and
logging::set_anomaly_trap_for_testing accessors) into members (e.g.,
prev_notify_hook, prev_anomaly_trap) and then in ~AnomalyCapture restore them
(call logging::set_notify_hook(prev_notify_hook) and
logging::set_anomaly_trap_for_testing(prev_anomaly_trap)) while still restoring
logging::options.level and calling logging::reset_anomaly_for_testing as
appropriate; alternatively, if restoring previous callbacks is impossible,
change the struct comment and name to clearly document that it overwrites global
hooks for the test duration.
| LocalSourceRange to_local_range(const kota::ipc::protocol::Range& range) { | ||
| feature::PositionMapper converter(unit->interested_content(), | ||
| feature::PositionEncoding::UTF8); | ||
| return LocalSourceRange(*converter.to_offset(range.start), *converter.to_offset(range.end)); |
There was a problem hiding this comment.
Avoid hard-crashing the shared range helper on unmappable positions.
This helper centralizes protocol→local conversion for multiple feature tests, but it still dereferences unit, converter.to_offset(range.start), and converter.to_offset(range.end) unconditionally. If a feature regression produces an invalid/clamped protocol range, the test process dies here instead of failing with a targeted assertion. Please turn these into explicit test assertions (or return an std::optional<LocalSourceRange>) so mapping failures surface as actionable test failures.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/test/tester.h` around lines 100 - 103, The helper to_local_range
currently dereferences unit and the results of
converter.to_offset(range.start/end) unconditionally; change it to fail tests
explicitly instead of crashing by asserting unit is non-null and that
converter.to_offset(range.start) and converter.to_offset(range.end) return
values before dereferencing (e.g., use test assertions like
ASSERT_TRUE/EXPECT_TRUE or return std::optional<LocalSourceRange>), then
construct and return LocalSourceRange from the validated offsets; reference the
to_local_range function, the unit pointer, and
converter.to_offset(range.start)/converter.to_offset(range.end) locations when
making the checks.
Summary
Implements the server-side error feedback plan (
temp/plan-server-feedback.md), supersedes #436 (reimplemented on current main rather than rebased — it predates the #437/#447/#449 restructures).Channels
Anomaly (soft assertions,
src/support/anomaly.h) —LOG_ANOMALY(id, fmt, ...)for internal states that must be unreachable when clice works correctly. Debug builds abort after logging (the CI Debug matrix and local dev catch bugs earliest;CLICE_ANOMALY_NO_TRAPexists for tests); Release logs[anomaly:<id>], pusheswindow/logMessage(master) and continues. Per-ID rate limit; the gate runs before format arguments are evaluated (lazy contract locked by unit tests with side-effecting args). IDs:pch_build_fail,pcm_build_fail,compile_fail,worker_request_fail,worker_crash,worker_spawn_fail,position_map_fail. Situations reachable by user input/user code are deliberately not anomalies — PCH/PCM failures carryBuildResult.has_user_errorsto make that split.Guidance —
LOG_GUIDANCE(...)([guidance]+ Warning logMessage) for user-actionable situations: no compile_commands.json, invalid initializationOptions, config problems.LSP errors instead of silent null — feature requests on closed documents →
Document not open; unresolvable call/type hierarchy items → error; client positions/ranges that fail to map →InvalidParams; empty hierarchy/workspace-symbol results now return[]instead ofnull.Guidance diagnostics —
fill_compile_args()returns aCommandSource(CdbExact/IncludeGraph/Inferred(reserved)/Fallback) and emits a per-file decision log (tiers tried, tier hit, args hash). When a guessed command produces file-not-found errors, the publish merges a file-top Warning explaining it (codeinferred-compile-command, linking the quick-start guide — resolves theDiagnosticSource::CliceTODO). Exact CDB matches never get it.clice.toml diagnostics — parse/type errors (Error, defaults apply) and unknown keys (Warning via a strict second decode pass, config still applies) are published on the config file's URI with line/column from the TOML decoder (needs clice-io/kotatsu#168, pinned). Fixing the file and restarting clears them.
Notable fixes uncovered along the way
CompilationDatabase::lookup()synthesizes a command for unknown files, so the oldresults.empty()checks were dead: the automatic include-graph header-context tier was unreachable, and headers without entries silently compiled as bareclang. Tier selection now useshas_entry(); background indexing skips Fallback-sourced files.apply_defaults(): a client-providedcache_dirpreviously didn't propagate into the derivedlogging_dir/index_dir.Test infrastructure
CliceClientrecordswindow/logMessage;assert_no_anomaly()runs in every integration teardown (client + agentic fixtures, agentic local fixtures, make_client tests), scanning notifications and master/worker log files.tests/replay.pyfails a smoke trace on any anomaly push.@allow_anomaly),pytest.raisesfor closed/unknown documents, out-of-range positions, bogus hierarchy items.MTIME_GRANULARITY/SETTLE_TIME/IDLE_TIMEOUT) replace hardcoded sleeps; yield-based workspace cleanup; log dump on test failure;Tester::to_local_range()dedup.Out of scope (per plan)
Module-cycle guidance diagnostics (4c) wait for the module refactor; CacheStore hit/miss logs wait for #453; worker logMessage forwarding and request-correlation IDs belong to the WorkerPool task.
Test plan
pixi run formatclean; 3 parallel review subagents (correctness / style / tests) — findings fixedCloses #436 (superseded).
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Improvements