Merged (changes from 4 commits)
49 changes: 49 additions & 0 deletions .claude/commands/README.md
@@ -0,0 +1,49 @@
# Claude Code slash commands for the mcp-server test suite

Three AI-assisted workflows wrapping `mcp-server/run-tests.sh` and the meshtastic MCP tools. Each one has a twin in `.github/prompts/` for Copilot users.

| Slash command | What it does | Copilot equivalent |
| --------------------- | ------------------------------------------------------------------------- | ---------------------------------------- |
| `/test [args]` | Runs the test suite (auto-detects hardware) and interprets failures | `.github/prompts/mcp-test.prompt.md` |
| `/diagnose [role]` | Read-only device health report via the meshtastic MCP tools | `.github/prompts/mcp-diagnose.prompt.md` |
| `/repro <test> [n=5]` | Re-runs one test N times, diffs firmware logs between passes and failures | `.github/prompts/mcp-repro.prompt.md` |

## Why two surfaces

The Claude Code commands and Copilot prompts cover the same three workflows but each speaks its host's idiom:

- **Claude Code** (`/test`) uses `$ARGUMENTS` for pass-through, has direct access to Bash + all MCP tools registered in the user's settings, and runs in the terminal context.
- **Copilot** (`/mcp-test`) runs in VS Code's agent mode; it has terminal + MCP access too but typically asks the operator to confirm inputs interactively.

A contributor using either surface gets equivalent assistance. Keep the two in sync when behavior changes — the diff of intent should be minimal.

## House rules

- **No destructive writes without explicit operator approval.** Skills that could reflash, factory-reset, or reboot a device must describe the action and stop — the operator authorizes.
- **Interpret failures, don't just echo them.** The skill body should pull firmware log lines from `mcp-server/tests/report.html` (the `Meshtastic debug` section, attached by `tests/conftest.py::pytest_runtest_makereport`) and classify the failure.
- **Keep MCP tool calls sequential per port.** SerialInterface holds an exclusive port lock; two parallel tool calls on the same port deadlock.
- **Never speculate about root cause.** If the evidence doesn't support a classification, say "unknown" and list what you'd need to disambiguate.

## Adding a new command

1. Write the Claude Code version at `.claude/commands/<name>.md` with YAML frontmatter:

```yaml
---
description: one-line purpose (used for auto-invocation by the model)
argument-hint: [optional-hint]
---
```

2. Write the Copilot equivalent at `.github/prompts/mcp-<name>.prompt.md` with:

```yaml
---
mode: agent
description: ...
---
```

3. Add the row to the table above. Cross-link in both bodies.

4. Smoke-test on Claude Code first (`/<name>` should appear in autocomplete), then in VS Code Copilot (`/mcp-<name>` in Chat).
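
Put together, a hypothetical new pair might start from something like this (the `/flake-stats` name and body are purely illustrative, not an existing command):

```markdown
---
description: summarise per-test pass rates from tests/reportlog.jsonl
argument-hint: [reportlog-path]
---

# `/flake-stats` — report-log pass-rate summary

Read `tests/reportlog.jsonl` (or the path in `$ARGUMENTS`), tally outcomes
per test node id, and list the least stable tests first.
```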
55 changes: 55 additions & 0 deletions .claude/commands/diagnose.md
@@ -0,0 +1,55 @@
---
description: Produce a device health report using the meshtastic MCP tools (device_info, list_nodes, get_config, short serial log capture)
argument-hint: [role=all|nrf52|esp32s3|<port>]
---

# `/diagnose` — device health report

Call the meshtastic MCP tool bundle and format a structured health report for one or all detected devices. Zero guesswork for the operator.

## What to do

1. **Enumerate hardware.** Call `mcp__meshtastic__list_devices(include_unknown=True)`. For each entry where `likely_meshtastic=True`, capture `port`, `vid`, `pid`, `description`.

2. **Filter by `$ARGUMENTS`**:
- No args, `all` → every likely-meshtastic device.
- `nrf52` → only devices with `vid == 0x239a`.
- `esp32s3` → only devices with `vid == 0x303a` or `vid == 0x10c4`.
- A `/dev/cu.*` path → only that one port.
- Anything else → treat as a substring match against the `port` string.

3. **For each selected device, in sequence (NOT parallel — SerialInterface holds an exclusive port lock):**
- `mcp__meshtastic__device_info(port=<p>)` — captures `my_node_num`, `long_name`, `short_name`, `firmware_version`, `hw_model`, `region`, `num_nodes`, `primary_channel`.
- `mcp__meshtastic__list_nodes(port=<p>)` — count of peers, which ones have `publicKey` set, SNR/RSSI distribution.
- `mcp__meshtastic__get_config(section="lora", port=<p>)` — region, preset, channel_num, tx_power, hop_limit.
   - Optionally, if the device seems unhappy (fails to connect, `num_nodes==1` when ≥2 are plugged in, missing `firmware_version`), open a short firmware log window: `mcp__meshtastic__serial_open(port=<p>, env=<inferred-env>)`, wait 3s, `serial_read(session_id=<s>, max_lines=100)`, `serial_close(session_id=<s>)`. The env should be inferred from the VID map in `mcp-server/run-tests.sh` (nrf52 → rak4631, esp32s3 → heltec-v3) unless `MESHTASTIC_MCP_ENV_<ROLE>` is set.

4. **Render per-device report** as:

```
[nrf52 @ /dev/cu.usbmodem1101] fw=2.7.23.bce2825, hw=RAK4631
owner : Meshtastic 40eb / 40eb
region/band : US, channel 88, LONG_FAST
tx_power : 30 dBm, hop_limit=3
peers : 1 (esp32s3 0x433c2428, pubkey ✓, SNR 6.0 / RSSI -24 dBm)
primary ch : McpTest
firmware : no panics in last 3s; NodeInfoModule emitted 2 broadcasts
```

Keep it scannable. If a field is missing or abnormal (no pubkey for a known peer, region=UNSET, num_nodes inconsistent with the hub), flag it inline with a short `⚠︎ <one-line reason>`.

5. **Cross-device correlation** (only when >1 device is inspected):
- Do both sides see each other in `nodesByNum`? If one does and the other doesn't, that's asymmetric NodeInfo — flag it.
- Do the LoRa configs match? (region, channel_num, modem_preset should all agree; mismatch = no mesh)
- Do the primary channel NAMES match? Mismatch = different PSK = no decode.

6. **Suggest next actions only for specific, recognisable failure modes**:
- Stale PKI pubkey one-way → "run `/test tests/mesh/test_direct_with_ack.py` — the retry + nodeinfo-ping heals this in the test path."
- Region mismatch → "re-bake one side via `./mcp-server/run-tests.sh --force-bake`."
- Device unreachable → point at touch_1200bps + the CP2102-wedged-driver note in run-tests.sh.
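
The step-2 filter is mechanical enough to sketch in shell (the `filter_role` helper name is ours, not part of the MCP server; the VID values are the ones listed in step 2):

```shell
#!/bin/sh
# Sketch of the step-2 role filter. `filter_role ROLE VID PORT` echoes PORT
# when the device matches ROLE. Helper name is illustrative; VIDs mirror the
# mapping documented above.
filter_role() {
  role="$1"; vid="$2"; port="$3"
  case "$role" in
    ""|all)    echo "$port" ;;                                   # no filter
    nrf52)     [ "$vid" = "0x239a" ] && echo "$port" ;;
    esp32s3)   { [ "$vid" = "0x303a" ] || [ "$vid" = "0x10c4" ]; } && echo "$port" ;;
    /dev/cu.*) [ "$port" = "$role" ] && echo "$port" ;;          # exact port
    *)         case "$port" in *"$role"*) echo "$port" ;; esac ;; # substring
  esac
}
```

For example, `filter_role nrf52 0x239a /dev/cu.usbmodem1101` echoes the port, while `filter_role nrf52 0x303a /dev/cu.usbmodem2101` echoes nothing.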

## What NOT to do

- No writes. No `set_config`, no `reboot`, no `factory_reset`. This is a read-only diagnostic skill — if the operator wants to change state, they'll ask explicitly.
- No `flash` / `erase_and_flash`. Those are separate escalations.
- No holding SerialInterface across tool calls — open, query, close; next device. The port lock is exclusive.
65 changes: 65 additions & 0 deletions .claude/commands/repro.md
@@ -0,0 +1,65 @@
---
description: Re-run a specific test N times in isolation to triage flakes, diff firmware logs between passes and failures
argument-hint: <test-node-id> [count=5]
---

# `/repro` — flakiness triage for one test

Re-run a single pytest node ID N times in isolation, track pass rate, and surface what's _different_ in the firmware logs between the passing attempts and the failing ones. Turns "it's flaky, I guess" into "it fails when X, passes when Y."

## What to do

1. **Parse `$ARGUMENTS`**: first token is the pytest node id (e.g. `tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[nrf52->esp32s3]`); second token is an integer count (default `5`, cap at `20`). If the first token doesn't look like a test path (no `::` and no `tests/` prefix), treat the whole `$ARGUMENTS` as a `-k` filter instead.

2. **Sanity-check the hub first** (so we're not measuring "nothing plugged in" N times): call `mcp__meshtastic__list_devices`. If the test name contains `nrf52` or `esp32s3` and the matching VID isn't present, stop and report — re-running won't help.

3. **Loop N times**. For each iteration:

```bash
./mcp-server/run-tests.sh <test-id> --tb=short -p no:cacheprovider
```

Capture: exit code, duration, and (on failure) the `Meshtastic debug` firmware log section from `mcp-server/tests/report.html`. `-p no:cacheprovider` suppresses pytest's `.pytest_cache` writes so iterations don't influence each other.

4. **Track a small structured tally**:

```
attempt 1: PASS (42s)
attempt 2: FAIL (128s) ← firmware log 200-line tail captured
attempt 3: PASS (39s)
attempt 4: FAIL (121s)
attempt 5: PASS (41s)
--------------------------------------
pass rate: 3/5 (60%) | mean duration: 74s
```

5. **On mixed outcomes**: diff the firmware log tails between a representative passing attempt and a representative failing attempt. Focus on:
- Error-level lines only present in failures (`PKI_UNKNOWN_PUBKEY`, `Alloc an err=`, `Skip send`, `No suitable channel`)
- Timing around the assertion event — did a broadcast go out, was there an ACK, did NAK fire?
- Device state fields that changed (nodesByNum entries, region/preset, channel_num)

Surface the top 3 differences as a "passes when / fails when" table. Don't dump full logs — pull specific lines with uptime timestamps.

6. **Classify the flake** into one of:
- **LoRa airtime collision** → pass rate improves with fewer concurrent transmitters; propose a `time.sleep` gap or retry bump in the test body.
- **PKI key staleness** → fails on first attempt, passes after self-heal; existing retry loop in `test_direct_with_ack.py` handles this.
- **NodeInfo cooldown** → `Skip send NodeInfo since we sent it <600s ago` in fail-only logs; needs `broadcast_nodeinfo_ping()` warmup.
- **Hardware-specific** (one direction fails, other passes; one device's firmware is older; driver wedged) → specific recovery pointer.
- **Genuinely unknown** → say so; don't invent a root cause.

7. **Report back** with:
- Pass rate and mean duration.
- Classification + evidence (the specific log lines that support it).
- A suggested next step (re-run with specific args, open `/diagnose`, edit a specific test file, nothing).
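
The run loop in steps 3 and 4 amounts to a few lines of shell. A sketch (the `RUN` override exists only so the sketch can be dry-run without hardware; the report.html log capture on failure is elided):

```shell
#!/bin/sh
# Sketch of the step-3/4 loop: run one pytest node id N times and tally
# outcomes. RUN defaults to the real wrapper but can be overridden for a
# dry run; the firmware-log capture on failure is left out of this sketch.
RUN="${RUN:-./mcp-server/run-tests.sh}"

repro_loop() {
  test_id="$1"
  count="${2:-5}"
  [ "$count" -gt 20 ] && count=20        # hard cap (see Constraints)
  pass=0 i=1
  while [ "$i" -le "$count" ]; do
    start=$(date +%s)
    if $RUN "$test_id" --tb=short -p no:cacheprovider >/dev/null 2>&1; then
      outcome=PASS; pass=$((pass + 1))
    else
      outcome=FAIL                       # real skill: capture the log tail here
    fi
    echo "attempt $i: $outcome ($(( $(date +%s) - start ))s)"
    i=$((i + 1))
  done
  echo "pass rate: $pass/$count"
}
```

For a dry run, `RUN=true repro_loop tests/mesh/test_direct_with_ack.py 3` prints three PASS attempts without touching hardware.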

## Examples

- `/repro tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[esp32s3->nrf52] 10` — runs 10 times, diffs firmware logs.
- `/repro broadcast_delivers` — no `::`, no `tests/`, so interpreted as `-k broadcast_delivers`; runs every matching test the default 5 times.
- `/repro tests/telemetry/test_device_telemetry_broadcast.py 3` — shorter run for a slow test.

## Constraints

- Don't exceed `count=20` per invocation — airtime and USB wear add up. If the user asks for 50, negotiate down.
- Don't rebuild firmware as part of triage; flakes that only reproduce under different firmware belong in a separate session.
- If the FIRST attempt fails AND the rest all pass, that's a classic "state leak from a prior test" → say so and suggest running with `--force-bake` or starting from a clean state rather than chasing the first failure.
42 changes: 42 additions & 0 deletions .claude/commands/test.md
@@ -0,0 +1,42 @@
---
description: Run the mcp-server test suite (auto-detects devices) and interpret the results
argument-hint: [pytest-args]
---

# `/test` — mcp-server test runner with interpretation

Run `mcp-server/run-tests.sh` and make sense of the output so the operator doesn't have to.

## What to do

1. **Invoke the wrapper.** From the firmware repo root, run:

```bash
./mcp-server/run-tests.sh $ARGUMENTS
```

The wrapper auto-detects connected Meshtastic devices, maps each to its PlatformIO env, exports the required `MESHTASTIC_MCP_ENV_*` env vars, and invokes pytest. If the user passed no arguments, the wrapper supplies a sensible default set (`tests/ --html=tests/report.html --self-contained-html --junitxml=tests/junit.xml -v --tb=short`). A `--report-log=tests/reportlog.jsonl` arg is always appended (unless the operator passed their own). `--assume-baked` is deliberately NOT in the defaults — `test_00_bake.py` has its own skip-if-already-baked check and runs the ~8 s verification by default. Operators can opt into the fast path with `--assume-baked`, or force a reflash with `--force-bake`.

2. **Read the pre-flight header.** First ~6 lines print the detected hub (role → port → env). If that line reads `detected hub : (none)`, the wrapper will narrow to `tests/unit` only — say so explicitly in your summary so the operator knows hardware tiers were skipped.

3. **On pass**: one-line summary of the form `N passed, M skipped in <duration>`. Don't enumerate the 52 test names — the user can read those. Do mention if any test was SKIPPED for a NON-placeholder reason (e.g. "role not present on hub" is worth flagging).

4. **On failure**: for every FAILED test, open `mcp-server/tests/report.html` and extract the `Meshtastic debug` section for that test. pytest-html embeds the firmware log stream + device state dump there; the 200-line firmware log tail is usually enough to explain the failure. Summarise: which test, one-line assertion message, the firmware log lines that matter (things like `PKI_UNKNOWN_PUBKEY`, `Skip send NodeInfo`, `Error=`, `Guru Meditation`, `assertion failed`).

5. **Classify the failure** as one of:
- **Transient/flake**: LoRa collision, timing-sensitive assertion, first-attempt NAK + successful retry pattern. Propose `/repro <test_node_id>` to confirm.
- **Environmental**: device unreachable, port busy, CP2102 driver wedged. Suggest the specific recovery (replug USB, `touch_1200bps`, check `git status userPrefs.jsonc`).
- **Regression**: same assertion fails repeatedly, firmware log shows a new/unusual error. Surface the diff between expected and observed, identify the module likely responsible.

6. **Never run destructive recovery automatically.** If a failure looks like it needs a reflash, factory_reset, or USB replug, _describe what to do_ — don't execute. The operator decides.
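
The step-4 extraction can be approximated with a one-function shell sketch. pytest-html's exact markup varies by version, so treat this as a heuristic starting point, not a parser:

```shell
#!/bin/sh
# Heuristic sketch for step 4: print up to 200 lines following the first
# "Meshtastic debug" marker in the self-contained HTML report. The real
# markup depends on the pytest-html version in use.
firmware_log_tail() {
  report="${1:-mcp-server/tests/report.html}"
  grep -A 200 -m 1 'Meshtastic debug' "$report"
}
```

Run it against `mcp-server/tests/report.html` after a failing run; piping through `grep -E 'PKI_UNKNOWN_PUBKEY|Skip send|Error='` jumps straight to the interesting lines.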

## Arguments handling

- No args → wrapper's defaults (full suite).
- `$ARGUMENTS` passed verbatim to the wrapper, which passes them to pytest.
- Common operator invocations: `/test tests/mesh`, `/test tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip`, `/test --force-bake`, `/test -k telemetry`.

## Side-effects to mention in summary

- The session fixture snapshots `userPrefs.jsonc` at session start and restores it at teardown (plus on `atexit`). After a clean run, `git status userPrefs.jsonc` should report no changes. If the wrapper's pre-flight printed a warning about a stale sidecar, call that out — it means a prior session crashed.
- `mcp-server/tests/report.html` and `junit.xml` are regenerated on every run; the HTML is self-contained (shareable).
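
The clean-run check from the first bullet is scriptable; a sketch (the `check_prefs` name is ours):

```shell
#!/bin/sh
# Sketch of the post-run hygiene check: after a clean session the porcelain
# status for userPrefs.jsonc should be empty, because the session fixture
# restored its snapshot. Pass in $(git status --porcelain -- userPrefs.jsonc).
check_prefs() {
  if [ -z "$1" ]; then
    echo "clean: snapshot restore worked"
  else
    echo "dirty: a prior session may have crashed (status: $1)"
  fi
}
```

From the firmware repo root: `check_prefs "$(git status --porcelain -- userPrefs.jsonc)"`.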