Merged (changes from 4 commits)
49 changes: 49 additions & 0 deletions .claude/commands/README.md
@@ -0,0 +1,49 @@
# Claude Code slash commands for the mcp-server test suite

Three AI-assisted workflows wrapping `mcp-server/run-tests.sh` and the meshtastic MCP tools. Each one has a twin in `.github/prompts/` for Copilot users.

| Slash command | What it does | Copilot equivalent |
| --------------------- | ------------------------------------------------------------------------- | ---------------------------------------- |
| `/test [args]` | Runs the test suite (auto-detects hardware) and interprets failures | `.github/prompts/mcp-test.prompt.md` |
| `/diagnose [role]` | Read-only device health report via the meshtastic MCP tools | `.github/prompts/mcp-diagnose.prompt.md` |
| `/repro <test> [n=5]` | Re-runs one test N times, diffs firmware logs between passes and failures | `.github/prompts/mcp-repro.prompt.md` |

## Why two surfaces

The Claude Code commands and Copilot prompts cover the same three workflows but each speaks its host's idiom:

- **Claude Code** (`/test`) uses `$ARGUMENTS` for pass-through, has direct access to Bash + all MCP tools registered in the user's settings, and runs in the terminal context.
- **Copilot** (`/mcp-test`) runs in VS Code's agent mode; it has terminal + MCP access too but typically asks the operator to confirm inputs interactively.

A contributor using either surface gets equivalent assistance. Keep the two in sync when behavior changes — the diff of intent should be minimal.

## House rules

- **No destructive writes without explicit operator approval.** Skills that could reflash, factory-reset, or reboot a device must describe the action and stop — the operator authorizes.
- **Interpret failures, don't just echo them.** The skill body should pull firmware log lines from `mcp-server/tests/report.html` (the `Meshtastic debug` section, attached by `tests/conftest.py::pytest_runtest_makereport`) and classify the failure.
- **Keep MCP tool calls sequential per port.** SerialInterface holds an exclusive port lock; two parallel tool calls on the same port deadlock.
- **Never speculate about root cause.** If the evidence doesn't support a classification, say "unknown" and list what you'd need to disambiguate.

## Adding a new command

1. Write the Claude Code version at `.claude/commands/<name>.md` with YAML frontmatter:

```yaml
---
description: one-line purpose (used for auto-invocation by the model)
argument-hint: [optional-hint]
---
```

2. Write the Copilot equivalent at `.github/prompts/mcp-<name>.prompt.md` with:

```yaml
---
mode: agent
description: ...
---
```

3. Add the row to the table above. Cross-link in both bodies.

4. Smoke-test on Claude Code first (`/<name>` should appear in autocomplete), then in VS Code Copilot (`/mcp-<name>` in Chat).
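
Put together, a hypothetical new pair might start from something like this (the `/flake-stats` name and body are purely illustrative, not an existing command):

```markdown
---
description: summarise per-test pass rates from tests/reportlog.jsonl
argument-hint: [reportlog-path]
---

# `/flake-stats` — report-log pass-rate summary

Read `tests/reportlog.jsonl` (or the path in `$ARGUMENTS`), tally outcomes
per test node id, and list the least stable tests first.
```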
55 changes: 55 additions & 0 deletions .claude/commands/diagnose.md
@@ -0,0 +1,55 @@
---
description: Produce a device health report using the meshtastic MCP tools (device_info, list_nodes, get_config, short serial log capture)
argument-hint: [role=all|nrf52|esp32s3|<port>]
---

# `/diagnose` — device health report

Call the meshtastic MCP tool bundle and format a structured health report for one or all detected devices. Zero guesswork for the operator.

## What to do

1. **Enumerate hardware.** Call `mcp__meshtastic__list_devices(include_unknown=True)`. For each entry where `likely_meshtastic=True`, capture `port`, `vid`, `pid`, `description`.

2. **Filter by `$ARGUMENTS`**:
- No args, `all` → every likely-meshtastic device.
- `nrf52` → only devices with `vid == 0x239a`.
- `esp32s3` → only devices with `vid == 0x303a` or `vid == 0x10c4`.
- A `/dev/cu.*` path → only that one port.
- Anything else → treat as a substring match against the `port` string.

3. **For each selected device, in sequence (NOT parallel — SerialInterface holds an exclusive port lock):**
- `mcp__meshtastic__device_info(port=<p>)` — captures `my_node_num`, `long_name`, `short_name`, `firmware_version`, `hw_model`, `region`, `num_nodes`, `primary_channel`.
- `mcp__meshtastic__list_nodes(port=<p>)` — count of peers, which ones have `publicKey` set, SNR/RSSI distribution.
- `mcp__meshtastic__get_config(section="lora", port=<p>)` — region, preset, channel_num, tx_power, hop_limit.
   - Optionally, if the device seems unhappy (fails to connect, `num_nodes==1` when ≥2 are plugged in, missing `firmware_version`), open a short firmware log window: `mcp__meshtastic__serial_open(port=<p>, env=<inferred-env>)`, wait 3s, `serial_read(session_id=<s>, max_lines=100)`, `serial_close(session_id=<s>)`. The env should be inferred from the VID map in `mcp-server/run-tests.sh` (nrf52 → rak4631, esp32s3 → heltec-v3) unless `MESHTASTIC_MCP_ENV_<ROLE>` is set.

4. **Render per-device report** as:

```
[nrf52 @ /dev/cu.usbmodem1101] fw=2.7.23.bce2825, hw=RAK4631
owner : Meshtastic 40eb / 40eb
region/band : US, channel 88, LONG_FAST
tx_power : 30 dBm, hop_limit=3
peers : 1 (esp32s3 0x433c2428, pubkey ✓, SNR 6.0 / RSSI -24 dBm)
primary ch : McpTest
firmware : no panics in last 3s; NodeInfoModule emitted 2 broadcasts
```

Keep it scannable. If a field is missing or abnormal (no pubkey for a known peer, region=UNSET, num_nodes inconsistent with the hub), flag it inline with a short `⚠︎ <one-line reason>`.

5. **Cross-device correlation** (only when >1 device is inspected):
- Do both sides see each other in `nodesByNum`? If one does and the other doesn't, that's asymmetric NodeInfo — flag it.
- Do the LoRa configs match? (region, channel_num, modem_preset should all agree; mismatch = no mesh)
- Do the primary channel NAMES match? Mismatch = different PSK = no decode.

6. **Suggest next actions only for specific, recognisable failure modes**:
- Stale PKI pubkey one-way → "run `/test tests/mesh/test_direct_with_ack.py` — the retry + nodeinfo-ping heals this in the test path."
- Region mismatch → "re-bake one side via `./mcp-server/run-tests.sh --force-bake`."
- Device unreachable → point at touch_1200bps + the CP2102-wedged-driver note in run-tests.sh.
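
The step-2 filter is mechanical enough to sketch in shell (the `filter_role` helper name is ours, not part of the MCP server; the VID values are the ones listed in step 2):

```shell
#!/bin/sh
# Sketch of the step-2 role filter. `filter_role ROLE VID PORT` echoes PORT
# when the device matches ROLE. Helper name is illustrative; VIDs mirror the
# mapping documented above.
filter_role() {
  role="$1"; vid="$2"; port="$3"
  case "$role" in
    ""|all)    echo "$port" ;;                                   # no filter
    nrf52)     [ "$vid" = "0x239a" ] && echo "$port" ;;
    esp32s3)   { [ "$vid" = "0x303a" ] || [ "$vid" = "0x10c4" ]; } && echo "$port" ;;
    /dev/cu.*) [ "$port" = "$role" ] && echo "$port" ;;          # exact port
    *)         case "$port" in *"$role"*) echo "$port" ;; esac ;; # substring
  esac
}
```

For example, `filter_role nrf52 0x239a /dev/cu.usbmodem1101` echoes the port, while `filter_role nrf52 0x303a /dev/cu.usbmodem2101` echoes nothing.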

## What NOT to do

- No writes. No `set_config`, no `reboot`, no `factory_reset`. This is a read-only diagnostic skill — if the operator wants to change state, they'll ask explicitly.
- No `flash` / `erase_and_flash`. Those are separate escalations.
- No holding SerialInterface across tool calls — open, query, close; next device. The port lock is exclusive.
65 changes: 65 additions & 0 deletions .claude/commands/repro.md
@@ -0,0 +1,65 @@
---
description: Re-run a specific test N times in isolation to triage flakes, diff firmware logs between passes and failures
argument-hint: <test-node-id> [count=5]
---

# `/repro` — flakiness triage for one test

Re-run a single pytest node ID N times in isolation, track pass rate, and surface what's _different_ in the firmware logs between the passing attempts and the failing ones. Turns "it's flaky, I guess" into "it fails when X, passes when Y."

## What to do

1. **Parse `$ARGUMENTS`**: first token is the pytest node id (e.g. `tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[nrf52->esp32s3]`); second token is an integer count (default `5`, cap at `20`). If the first token doesn't look like a test path (no `::` and no `tests/` prefix), treat the whole `$ARGUMENTS` as a `-k` filter instead.

2. **Sanity-check the hub first** (so we're not measuring "nothing plugged in" N times): call `mcp__meshtastic__list_devices`. If the test name contains `nrf52` or `esp32s3` and the matching VID isn't present, stop and report — re-running won't help.

3. **Loop N times**. For each iteration:

```bash
./mcp-server/run-tests.sh <test-id> --tb=short -p no:cacheprovider
```

Capture: exit code, duration, and (on failure) the `Meshtastic debug` firmware log section from `mcp-server/tests/report.html`. `-p no:cacheprovider` suppresses pytest's `.pytest_cache` writes so iterations don't influence each other.

4. **Track a small structured tally**:

```
attempt 1: PASS (42s)
attempt 2: FAIL (128s) ← firmware log 200-line tail captured
attempt 3: PASS (39s)
attempt 4: FAIL (121s)
attempt 5: PASS (41s)
--------------------------------------
pass rate: 3/5 (60%) | mean duration: 74s
```

5. **On mixed outcomes**: diff the firmware log tails between a representative passing attempt and a representative failing attempt. Focus on:
- Error-level lines only present in failures (`PKI_UNKNOWN_PUBKEY`, `Alloc an err=`, `Skip send`, `No suitable channel`)
- Timing around the assertion event — did a broadcast go out, was there an ACK, did NAK fire?
- Device state fields that changed (nodesByNum entries, region/preset, channel_num)

Surface the top 3 differences as a "passes when / fails when" table. Don't dump full logs — pull specific lines with uptime timestamps.

6. **Classify the flake** into one of:
- **LoRa airtime collision** → pass rate improves with fewer concurrent transmitters; propose a `time.sleep` gap or retry bump in the test body.
- **PKI key staleness** → fails on first attempt, passes after self-heal; existing retry loop in `test_direct_with_ack.py` handles this.
- **NodeInfo cooldown** → `Skip send NodeInfo since we sent it <600s ago` in fail-only logs; needs `broadcast_nodeinfo_ping()` warmup.
- **Hardware-specific** (one direction fails, other passes; one device's firmware is older; driver wedged) → specific recovery pointer.
- **Genuinely unknown** → say so; don't invent a root cause.

7. **Report back** with:
- Pass rate and mean duration.
- Classification + evidence (the specific log lines that support it).
- A suggested next step (re-run with specific args, open `/diagnose`, edit a specific test file, nothing).
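
The run loop in steps 3 and 4 amounts to a few lines of shell. A sketch (the `RUN` override exists only so the sketch can be dry-run without hardware; the report.html log capture on failure is elided):

```shell
#!/bin/sh
# Sketch of the step-3/4 loop: run one pytest node id N times and tally
# outcomes. RUN defaults to the real wrapper but can be overridden for a
# dry run; the firmware-log capture on failure is left out of this sketch.
RUN="${RUN:-./mcp-server/run-tests.sh}"

repro_loop() {
  test_id="$1"
  count="${2:-5}"
  [ "$count" -gt 20 ] && count=20        # hard cap (see Constraints)
  pass=0 i=1
  while [ "$i" -le "$count" ]; do
    start=$(date +%s)
    if $RUN "$test_id" --tb=short -p no:cacheprovider >/dev/null 2>&1; then
      outcome=PASS; pass=$((pass + 1))
    else
      outcome=FAIL                       # real skill: capture the log tail here
    fi
    echo "attempt $i: $outcome ($(( $(date +%s) - start ))s)"
    i=$((i + 1))
  done
  echo "pass rate: $pass/$count"
}
```

For a dry run, `RUN=true repro_loop tests/mesh/test_direct_with_ack.py 3` prints three PASS attempts without touching hardware.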

## Examples

- `/repro tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[esp32s3->nrf52] 10` — runs 10 times, diffs firmware logs.
- `/repro broadcast_delivers` — no `::`, no `tests/`, so interpreted as `-k broadcast_delivers`; runs every matching test the default 5 times.
- `/repro tests/telemetry/test_device_telemetry_broadcast.py 3` — shorter run for a slow test.

## Constraints

- Don't exceed `count=20` per invocation — airtime and USB wear add up. If the user asks for 50, negotiate down.
- Don't rebuild firmware as part of triage; flakes that only reproduce under different firmware belong in a separate session.
- If the FIRST attempt fails AND the rest all pass, that's a classic "state leak from a prior test" → say so and suggest running with `--force-bake` or starting from a clean state rather than chasing the first failure.
42 changes: 42 additions & 0 deletions .claude/commands/test.md
@@ -0,0 +1,42 @@
---
description: Run the mcp-server test suite (auto-detects devices) and interpret the results
argument-hint: [pytest-args]
---

# `/test` — mcp-server test runner with interpretation

Run `mcp-server/run-tests.sh` and make sense of the output so the operator doesn't have to.

## What to do

1. **Invoke the wrapper.** From the firmware repo root, run:

```bash
./mcp-server/run-tests.sh $ARGUMENTS
```

The wrapper auto-detects connected Meshtastic devices, maps each to its PlatformIO env, exports the required `MESHTASTIC_MCP_ENV_*` env vars, and invokes pytest. If the user passed no arguments, the wrapper supplies a sensible default set (`tests/ --html=tests/report.html --self-contained-html --junitxml=tests/junit.xml -v --tb=short`). A `--report-log=tests/reportlog.jsonl` arg is always appended (unless the operator passed their own). `--assume-baked` is deliberately NOT in the defaults — `test_00_bake.py` has its own skip-if-already-baked check and runs the ~8 s verification by default. Operators can opt into the fast path with `--assume-baked`, or force a reflash with `--force-bake`.

2. **Read the pre-flight header.** First ~6 lines print the detected hub (role → port → env). If that line reads `detected hub : (none)`, the wrapper will narrow to `tests/unit` only — say so explicitly in your summary so the operator knows hardware tiers were skipped.

3. **On pass**: one-line summary of the form `N passed, M skipped in <duration>`. Don't enumerate the 52 test names — the user can read those. Do mention if any test was SKIPPED for a NON-placeholder reason (e.g. "role not present on hub" is worth flagging).

4. **On failure**: for every FAILED test, open `mcp-server/tests/report.html` and extract the `Meshtastic debug` section for that test. pytest-html embeds the firmware log stream + device state dump there; the 200-line firmware log tail is usually enough to explain the failure. Summarise: which test, one-line assertion message, the firmware log lines that matter (things like `PKI_UNKNOWN_PUBKEY`, `Skip send NodeInfo`, `Error=`, `Guru Meditation`, `assertion failed`).

5. **Classify the failure** as one of:
- **Transient/flake**: LoRa collision, timing-sensitive assertion, first-attempt NAK + successful retry pattern. Propose `/repro <test_node_id>` to confirm.
- **Environmental**: device unreachable, port busy, CP2102 driver wedged. Suggest the specific recovery (replug USB, `touch_1200bps`, check `git status userPrefs.jsonc`).
- **Regression**: same assertion fails repeatedly, firmware log shows a new/unusual error. Surface the diff between expected and observed, identify the module likely responsible.

6. **Never run destructive recovery automatically.** If a failure looks like it needs a reflash, factory_reset, or USB replug, _describe what to do_ — don't execute. The operator decides.
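
The step-4 extraction can be approximated with a one-function shell sketch. pytest-html's exact markup varies by version, so treat this as a heuristic starting point, not a parser:

```shell
#!/bin/sh
# Heuristic sketch for step 4: print up to 200 lines following the first
# "Meshtastic debug" marker in the self-contained HTML report. The real
# markup depends on the pytest-html version in use.
firmware_log_tail() {
  report="${1:-mcp-server/tests/report.html}"
  grep -A 200 -m 1 'Meshtastic debug' "$report"
}
```

Run it against `mcp-server/tests/report.html` after a failing run; piping through `grep -E 'PKI_UNKNOWN_PUBKEY|Skip send|Error='` jumps straight to the interesting lines.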

## Arguments handling

- No args → wrapper's defaults (full suite).
- `$ARGUMENTS` passed verbatim to the wrapper, which passes them to pytest.
- Common operator invocations: `/test tests/mesh`, `/test tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip`, `/test --force-bake`, `/test -k telemetry`.

## Side-effects to mention in summary

- The session fixture snapshots `userPrefs.jsonc` at session start and restores it at teardown (plus on `atexit`). After a clean run, `git status userPrefs.jsonc` should report no changes. If the wrapper's pre-flight printed a warning about a stale sidecar, call that out — it means a prior session crashed.
- `mcp-server/tests/report.html` and `junit.xml` are regenerated on every run; the HTML is self-contained (shareable).
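
The clean-run check from the first bullet is scriptable; a sketch (the `check_prefs` name is ours):

```shell
#!/bin/sh
# Sketch of the post-run hygiene check: after a clean session the porcelain
# status for userPrefs.jsonc should be empty, because the session fixture
# restored its snapshot. Pass in $(git status --porcelain -- userPrefs.jsonc).
check_prefs() {
  if [ -z "$1" ]; then
    echo "clean: snapshot restore worked"
  else
    echo "dirty: a prior session may have crashed (status: $1)"
  fi
}
```

From the firmware repo root: `check_prefs "$(git status --porcelain -- userPrefs.jsonc)"`.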