Commit c8dac10

thebentern and Copilot authored
Add MCP server for interacting with meshtastic devices and testing framework / TUI (#10194)
* Start of MCP server and test suite
* Add MCP server for interacting with meshtastic devices and testing framework / TUI
* Update mcp-server/README.md
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix mcp-server review feedback from thread
  Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/91dc128a-ed50-4d07-8bb2-3dc6623a05f7
  Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
* Enhance StreamAPI and PhoneAPI for improved log record handling and concurrency control
* Semgrep fixes
* Trunk and semgrep fixes
* optimize pio streaming tee file writes
  Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/04e26c6b-6a2b-45be-bbeb-79ae4d0be633
  Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
* chore: remove redundant log handle assignment
  Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/04e26c6b-6a2b-45be-bbeb-79ae4d0be633
  Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
* Consolidate type imports and remove placeholder test files
* Add tests for config persistence and more exchange messages
* Refactor position test to validate on-demand request/reply behavior
* Remove position request/reply test and update README for telemetry behavior
* Fix transmit history file to get removed on factory reset

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
1 parent aab4cd0 commit c8dac10

77 files changed: +10701 -13 lines changed

.claude/commands/README.md

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
# Claude Code slash commands for the mcp-server test suite
2+
3+
Three AI-assisted workflows wrapping `mcp-server/run-tests.sh` and the meshtastic MCP tools. Each one has a twin in `.github/prompts/` for Copilot users.
4+
5+
| Slash command | What it does | Copilot equivalent |
6+
| --------------------- | ------------------------------------------------------------------------- | ---------------------------------------- |
7+
| `/test [args]` | Runs the test suite (auto-detects hardware) and interprets failures | `.github/prompts/mcp-test.prompt.md` |
8+
| `/diagnose [role]` | Read-only device health report via the meshtastic MCP tools | `.github/prompts/mcp-diagnose.prompt.md` |
9+
| `/repro <test> [n=5]` | Re-runs one test N times, diffs firmware logs between passes and failures | `.github/prompts/mcp-repro.prompt.md` |
10+
11+
## Why two surfaces
12+
13+
The Claude Code commands and Copilot prompts cover the same three workflows but each speaks its host's idiom:
14+
15+
- **Claude Code** (`/test`) uses `$ARGUMENTS` for pass-through, has direct access to Bash + all MCP tools registered in the user's settings, and runs in the terminal context.
16+
- **Copilot** (`/mcp-test`) runs in VS Code's agent mode; it has terminal + MCP access too but typically asks the operator to confirm inputs interactively.
17+
18+
A contributor using either IDE gets equivalent assistance. Keep the two in sync when behavior changes — the diff of intent should be minimal.
19+
20+
## House rules
21+
22+
- **No destructive writes without explicit operator approval.** Skills that could reflash, factory-reset, or reboot a device must describe the action and stop — the operator authorizes.
23+
- **Interpret failures, don't just echo them.** The skill body should pull firmware log lines from `mcp-server/tests/report.html` (the `Meshtastic debug` section, attached by `tests/conftest.py::pytest_runtest_makereport`) and classify the failure.
24+
- **Keep MCP tool calls sequential per port.** SerialInterface holds an exclusive port lock; two parallel tool calls on the same port deadlock.
25+
- **Never speculate about root cause.** If the evidence doesn't support a classification, say "unknown" and list what you'd need to disambiguate.
26+
27+
## Adding a new command
28+
29+
1. Write the Claude Code version at `.claude/commands/<name>.md` with YAML frontmatter:
30+
31+
```yaml
---
description: one-line purpose (used for auto-invocation by the model)
argument-hint: [optional-hint]
---
```
37+
38+
2. Write the Copilot equivalent at `.github/prompts/mcp-<name>.prompt.md` with:
39+
40+
```yaml
---
mode: agent
description: ...
---
```
46+
47+
3. Add the row to the table above. Cross-link in both bodies.
48+
49+
4. Smoke-test on Claude Code first (`/<name>` should appear in autocomplete), then in VS Code Copilot (`/mcp-<name>` in Chat).
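
The "keep the two in sync" rule can be partly checked mechanically. The sketch below assumes only the naming convention from steps 1 and 2 (`.claude/commands/<name>.md` paired with `.github/prompts/mcp-<name>.prompt.md`). The helper name `missing_twins` is hypothetical and is not part of the repository.

```python
from pathlib import Path


def missing_twins(repo_root: Path) -> list[str]:
    """Return command names that have no Copilot prompt twin.

    Hypothetical helper: it assumes the naming convention described above
    and treats README.md in the commands directory as documentation.
    """
    commands_dir = repo_root / ".claude" / "commands"
    prompts_dir = repo_root / ".github" / "prompts"
    missing = []
    for command_file in sorted(commands_dir.glob("*.md")):
        if command_file.name == "README.md":
            continue
        twin = prompts_dir / f"mcp-{command_file.stem}.prompt.md"
        if not twin.is_file():
            missing.append(command_file.stem)
    return missing
```

It only verifies that the twin file exists; whether the two bodies describe the same behavior still needs a human read.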

.claude/commands/diagnose.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
---
2+
description: Produce a device health report using the meshtastic MCP tools (device_info, list_nodes, get_config, short serial log capture)
3+
argument-hint: [role=all|nrf52|esp32s3|<port>]
4+
---
5+
6+
# `/diagnose` — device health report
7+
8+
Call the meshtastic MCP tool bundle and format a structured health report for one or all detected devices. Zero guesswork for the operator.
9+
10+
## What to do
11+
12+
1. **Enumerate hardware.** Call `mcp__meshtastic__list_devices(include_unknown=True)`. For each entry where `likely_meshtastic=True`, capture `port`, `vid`, `pid`, `description`.
13+
14+
2. **Filter by `$ARGUMENTS`**:
15+
- No args, `all` → every likely-meshtastic device.
16+
- `nrf52` → only devices with `vid == 0x239a`.
17+
- `esp32s3` → only devices with `vid == 0x303a` or `vid == 0x10c4`.
18+
- A `/dev/cu.*` path → only that one port.
19+
- Anything else → treat as a substring match against the `port` string.
20+
21+
3. **For each selected device, in sequence (NOT parallel — SerialInterface holds an exclusive port lock):**
22+
- `mcp__meshtastic__device_info(port=<p>)` — captures `my_node_num`, `long_name`, `short_name`, `firmware_version`, `hw_model`, `region`, `num_nodes`, `primary_channel`.
23+
- `mcp__meshtastic__list_nodes(port=<p>)` — count of peers, which ones have `publicKey` set, SNR/RSSI distribution.
24+
- `mcp__meshtastic__get_config(section="lora", port=<p>)` — region, preset, channel_num, tx_power, hop_limit.
25+
- Optionally, if the device seems unhappy (fails to connect, `num_nodes==1` when ≥2 are plugged in, missing `firmware_version`), open a short firmware log window: `mcp__meshtastic__serial_open(port=<p>, env=<inferred-env>)`, wait 3s, `serial_read(session_id=<s>, max_lines=100)`, `serial_close(session_id=<s>)`. The env should be inferred from the VID map in `mcp-server/run-tests.sh` (nrf52 → rak4631, esp32s3 → heltec-v3) unless `MESHTASTIC_MCP_ENV_<ROLE>` is set.
26+
27+
4. **Render per-device report** as:
28+
29+
```text
[nrf52 @ /dev/cu.usbmodem1101] fw=2.7.23.bce2825, hw=RAK4631
  owner       : Meshtastic 40eb / 40eb
  region/band : US, channel 88, LONG_FAST
  tx_power    : 30 dBm, hop_limit=3
  peers       : 1 (esp32s3 0x433c2428, pubkey ✓, SNR 6.0 / RSSI -24 dBm)
  primary ch  : McpTest
  firmware    : no panics in last 3s; NodeInfoModule emitted 2 broadcasts
```
38+
39+
Keep it scannable. If a field is missing or abnormal (no pubkey for a known peer, region=UNSET, num_nodes inconsistent with the hub), flag it inline with a short `⚠︎ <one-line reason>`.
40+
41+
5. **Cross-device correlation** (only when >1 device is inspected):
42+
- Do both sides see each other in `nodesByNum`? If one does and the other doesn't, that's asymmetric NodeInfo — flag it.
43+
- Do the LoRa configs match? (region, channel_num, modem_preset should all agree; mismatch = no mesh)
44+
- Do the primary channel NAMES match? Mismatch = different PSK = no decode.
45+
46+
6. **Suggest next actions only for specific, recognisable failure modes**:
47+
- Stale PKI pubkey one-way → "run `/test tests/mesh/test_direct_with_ack.py` — the retry + nodeinfo-ping heals this in the test path."
48+
- Region mismatch → "re-bake one side via `./mcp-server/run-tests.sh --force-bake`."
49+
- Device unreachable → point at touch_1200bps + the CP2102-wedged-driver note in run-tests.sh.
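
The `$ARGUMENTS` filter in step 2 can be sketched as a small pure function. This is an illustration under assumptions: `select_devices` is a hypothetical name, and the device entries are assumed to be dicts whose `vid` is an integer (the real tool output may encode it differently). The VID values come from the rules in step 2.

```python
NRF52_VIDS = {0x239A}
ESP32S3_VIDS = {0x303A, 0x10C4}


def select_devices(devices: list[dict], arguments: str) -> list[dict]:
    """Filter device entries by the /diagnose argument (hypothetical helper)."""
    # Only entries flagged as likely Meshtastic hardware are considered at all.
    candidates = [d for d in devices if d.get("likely_meshtastic")]
    arg = arguments.strip()
    if arg in ("", "all"):
        return candidates
    if arg == "nrf52":
        return [d for d in candidates if d.get("vid") in NRF52_VIDS]
    if arg == "esp32s3":
        return [d for d in candidates if d.get("vid") in ESP32S3_VIDS]
    if arg.startswith("/dev/cu."):
        # An explicit device path selects exactly that port.
        return [d for d in candidates if d.get("port") == arg]
    # Anything else is a substring match against the port string.
    return [d for d in candidates if arg in d.get("port", "")]
```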
50+
51+
## What NOT to do
52+
53+
- No writes. No `set_config`, no `reboot`, no `factory_reset`. This is a read-only diagnostic skill — if the operator wants to change state, they'll ask explicitly.
54+
- No `flash` / `erase_and_flash`. Those are separate escalations.
55+
- No holding SerialInterface across tool calls — open, query, close; next device. The port lock is exclusive.
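
The LoRa comparison from the cross-device correlation step is read-only and easy to express once each device's config has been fetched one port at a time. The sketch assumes each `get_config(section="lora")` result has already been flattened into a dict with `region`, `channel_num`, and `modem_preset` keys; that shape and the name `lora_mismatches` are assumptions for illustration.

```python
LORA_KEYS = ("region", "channel_num", "modem_preset")


def lora_mismatches(configs: dict[str, dict]) -> dict[str, dict[str, object]]:
    """Return {key: {port: value}} for every LoRa key whose values disagree.

    `configs` maps a port string to that device's (assumed flat) lora config.
    """
    mismatches = {}
    for key in LORA_KEYS:
        values = {port: cfg.get(key) for port, cfg in configs.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches
```

An empty result means the three fields agree across all inspected devices; any entry is worth an inline `⚠︎` flag in the report.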

.claude/commands/repro.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
---
2+
description: Re-run a specific test N times in isolation to triage flakes, diff firmware logs between passes and failures
3+
argument-hint: <test-node-id> [count=5]
4+
---
5+
6+
# `/repro` — flakiness triage for one test
7+
8+
Re-run a single pytest node ID N times in isolation, track pass rate, and surface what's _different_ in the firmware logs between the passing attempts and the failing ones. Turns "it's flaky, I guess" into "it fails when X, passes when Y."
9+
10+
## What to do
11+
12+
1. **Parse `$ARGUMENTS`**: first token is the pytest node id (e.g. `tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[nrf52->esp32s3]`); second token is an integer count (default `5`, cap at `20`). If the first token doesn't look like a test path (no `::` and no `tests/` prefix), treat the whole `$ARGUMENTS` as a `-k` filter instead.
13+
14+
2. **Sanity-check the hub first** (so we're not measuring "nothing plugged in" N times): call `mcp__meshtastic__list_devices`. If the test name contains `nrf52` or `esp32s3` and the matching VID isn't present, stop and report — re-running won't help.
15+
16+
3. **Loop N times**. For each iteration:
17+
18+
```bash
./mcp-server/run-tests.sh <test-id> --tb=short -p no:cacheprovider
```
21+
22+
Capture: exit code, duration, and (on failure) the `Meshtastic debug` firmware log section from `mcp-server/tests/report.html`. `-p no:cacheprovider` suppresses pytest's `.pytest_cache` writes so iterations don't influence each other.
23+
24+
4. **Track a small structured tally**:
25+
26+
```text
attempt 1: PASS (42s)
attempt 2: FAIL (128s) ← firmware log 200-line tail captured
attempt 3: PASS (39s)
attempt 4: FAIL (121s)
attempt 5: PASS (41s)
--------------------------------------
pass rate: 3/5 (60%) | mean duration: 74s
```
35+
36+
5. **On mixed outcomes**: diff the firmware log tails between a representative passing attempt and a representative failing attempt. Focus on:
37+
- Error-level lines only present in failures (`PKI_UNKNOWN_PUBKEY`, `Alloc an err=`, `Skip send`, `No suitable channel`)
38+
- Timing around the assertion event — did a broadcast go out, was there an ACK, did NAK fire?
39+
- Device state fields that changed (nodesByNum entries, region/preset, channel_num)
40+
41+
Surface the top 3 differences as a "passes when / fails when" table. Don't dump full logs — pull specific lines with uptime timestamps.
42+
43+
6. **Classify the flake** into one of:
44+
- **LoRa airtime collision** → pass rate improves with fewer concurrent transmitters; propose a `time.sleep` gap or retry bump in the test body.
45+
- **PKI key staleness** → fails on first attempt, passes after self-heal; existing retry loop in `test_direct_with_ack.py` handles this.
46+
- **NodeInfo cooldown** → `Skip send NodeInfo since we sent it <600s ago` in fail-only logs; needs `broadcast_nodeinfo_ping()` warmup.
47+
- **Hardware-specific** (one direction fails, other passes; one device's firmware is older; driver wedged) → specific recovery pointer.
48+
- **Genuinely unknown** → say so; don't invent a root cause.
49+
50+
7. **Report back** with:
51+
- Pass rate and mean duration.
52+
- Classification + evidence (the specific log lines that support it).
53+
- A suggested next step (re-run with specific args, open `/diagnose`, edit a specific test file, nothing).
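
The summary line of the tally in step 4 is simple arithmetic over the recorded attempts. A minimal sketch, assuming attempts are kept as (passed, duration) records; the `Attempt` type and `summarize` function are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    passed: bool
    seconds: float


def summarize(attempts: list[Attempt]) -> str:
    """Format the pass-rate line in the same shape as the example tally."""
    if not attempts:
        raise ValueError("no attempts recorded")
    total = len(attempts)
    passes = sum(a.passed for a in attempts)
    mean = sum(a.seconds for a in attempts) / total
    return f"pass rate: {passes}/{total} ({passes / total:.0%}) | mean duration: {mean:.0f}s"
```

Fed the five durations from the example tally (42, 128, 39, 121, 41 seconds with two failures), it yields `pass rate: 3/5 (60%) | mean duration: 74s`.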
54+
55+
## Examples
56+
57+
- `/repro tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[esp32s3->nrf52] 10` — runs 10 times, diffs firmware logs.
58+
- `/repro broadcast_delivers` — no `::`, no `tests/`, so interpreted as `-k broadcast_delivers`; runs every matching test the default 5 times.
59+
- `/repro tests/telemetry/test_device_telemetry_broadcast.py 3` — shorter run for a slow test.
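
The argument rules from step 1 can be sketched as follows. `parse_repro_arguments` is a hypothetical helper written for illustration; it encodes the default of 5 and the cap of 20 stated above, and returns whether the input was read as a pytest node id or as a `-k` filter.

```python
DEFAULT_COUNT = 5
MAX_COUNT = 20


def parse_repro_arguments(arguments: str) -> tuple[str, str, int]:
    """Return (mode, target, count), where mode is "node" or "keyword"."""
    tokens = arguments.split()
    if not tokens:
        raise ValueError("expected a test node id or a -k filter")
    first = tokens[0]
    if "::" not in first and not first.startswith("tests/"):
        # Not a test path: the whole argument string becomes the -k expression.
        return ("keyword", arguments.strip(), DEFAULT_COUNT)
    count = DEFAULT_COUNT
    if len(tokens) > 1:
        count = int(tokens[1])  # raises ValueError on a non-integer count
    # Clamp into [1, MAX_COUNT].
    return ("node", first, max(1, min(count, MAX_COUNT)))
```

The sketch clamps silently; in the interactive workflow the skill would tell the operator when a requested count was reduced.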
60+
61+
## Constraints
62+
63+
- Don't exceed `count=20` per invocation — airtime and USB wear add up. If the user asks for 50, negotiate down.
64+
- Don't rebuild firmware as part of triage; flakes that only reproduce under different firmware belong in a separate session.
65+
- If the FIRST attempt fails AND the rest all pass, that's a classic "state leak from a prior test" → say so and suggest running with `--force-bake` or starting from a clean state rather than chasing the first failure.
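
The last constraint describes a pattern that can be detected from the outcome list alone. A minimal sketch, assuming outcomes are recorded as booleans in attempt order (`looks_like_state_leak` is a hypothetical name):

```python
def looks_like_state_leak(outcomes: list[bool]) -> bool:
    """True when the first attempt failed and every later attempt passed."""
    return len(outcomes) >= 2 and not outcomes[0] and all(outcomes[1:])
```

A `True` result is only a hint that points at `--force-bake` or a clean start; it does not establish a root cause.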

.claude/commands/test.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
---
2+
description: Run the mcp-server test suite (auto-detects devices) and interpret the results
3+
argument-hint: [pytest-args]
4+
---
5+
6+
# `/test` — mcp-server test runner with interpretation
7+
8+
Run `mcp-server/run-tests.sh` and make sense of the output so the operator doesn't have to.
9+
10+
## What to do
11+
12+
1. **Invoke the wrapper.** From the firmware repo root, run:
13+
14+
```bash
./mcp-server/run-tests.sh $ARGUMENTS
```
17+
18+
The wrapper auto-detects connected Meshtastic devices, maps each to its PlatformIO env, exports the required `MESHTASTIC_MCP_ENV_*` env vars, and invokes pytest. If the user passed no arguments, the wrapper supplies a sensible default set (`tests/ --html=tests/report.html --self-contained-html --junitxml=tests/junit.xml -v --tb=short`). A `--report-log=tests/reportlog.jsonl` arg is always appended (unless the operator passed their own). `--assume-baked` is deliberately NOT in the defaults — `test_00_bake.py` has its own skip-if-already-baked check and runs the ~8 s verification by default. Operators can opt into the fast path with `--assume-baked`, or force a reflash with `--force-bake`.
19+
20+
2. **Read the pre-flight header.** First ~6 lines print the detected hub (role → port → env). If that line reads `detected hub : (none)`, the wrapper will narrow to `tests/unit` only — say so explicitly in your summary so the operator knows hardware tiers were skipped.
21+
22+
3. **On pass**: one-line summary of the form `N passed, M skipped in <duration>`. Don't enumerate the 52 test names — the user can read those. Do mention if any test was SKIPPED for a NON-placeholder reason (e.g. "role not present on hub" is worth flagging).
23+
24+
4. **On failure**: for every FAILED test, open `mcp-server/tests/report.html` and extract the `Meshtastic debug` section for that test. pytest-html embeds the firmware log stream + device state dump there; the 200-line firmware log tail is usually enough to explain the failure. Summarise: which test, one-line assertion message, the firmware log lines that matter (things like `PKI_UNKNOWN_PUBKEY`, `Skip send NodeInfo`, `Error=`, `Guru Meditation`, `assertion failed`).
25+
26+
5. **Classify the failure** as one of:
27+
- **Transient/flake**: LoRa collision, timing-sensitive assertion, first-attempt NAK + successful retry pattern. Propose `/repro <test_node_id>` to confirm.
28+
- **Environmental**: device unreachable, port busy, CP2102 driver wedged. Suggest the specific recovery (replug USB, `touch_1200bps`, check `git status userPrefs.jsonc`).
29+
- **Regression**: same assertion fails repeatedly, firmware log shows a new/unusual error. Surface the diff between expected and observed, identify the module likely responsible.
30+
31+
6. **Never run destructive recovery automatically.** If a failure looks like it needs a reflash, `factory_reset`, or USB replug, _describe what to do_; don't execute. The operator decides.
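
Pulling "the firmware log lines that matter" in step 4 amounts to scanning the log tail for known markers. The sketch assumes the `Meshtastic debug` text has already been extracted from `report.html` into a string; the marker list is copied from step 4 and `interesting_lines` is a hypothetical name.

```python
FAILURE_MARKERS = (
    "PKI_UNKNOWN_PUBKEY",
    "Skip send NodeInfo",
    "Error=",
    "Guru Meditation",
    "assertion failed",
)


def interesting_lines(log_text: str, tail: int = 200) -> list[str]:
    """Return lines from the last `tail` log lines that contain a known marker."""
    lines = log_text.splitlines()[-tail:]
    return [line for line in lines if any(marker in line for marker in FAILURE_MARKERS)]
```

An empty result does not mean the firmware was healthy; it only means none of the listed markers appeared, which is itself worth reporting as "unknown".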
32+
33+
## Arguments handling
34+
35+
- No args → wrapper's defaults (full suite).
36+
- `$ARGUMENTS` passed verbatim to the wrapper, which passes them to pytest.
37+
- Common operator invocations: `/test tests/mesh`, `/test tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip`, `/test --force-bake`, `/test -k telemetry`.
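
The default-argument behavior described in step 1 can be modeled in a few lines. This is a Python illustration of the documented behavior, not the wrapper itself: the real logic lives in the `run-tests.sh` shell script and also handles `--force-bake`, `--assume-baked`, and the no-hardware narrowing, all of which this sketch ignores. `effective_pytest_args` is a hypothetical name.

```python
DEFAULT_ARGS = [
    "tests/",
    "--html=tests/report.html",
    "--self-contained-html",
    "--junitxml=tests/junit.xml",
    "-v",
    "--tb=short",
]
REPORT_LOG_ARG = "--report-log=tests/reportlog.jsonl"


def effective_pytest_args(user_args: list[str]) -> list[str]:
    """Model the documented defaults: fall back when empty, always add a report log."""
    args = list(user_args) if user_args else list(DEFAULT_ARGS)
    has_report_log = any(a == "--report-log" or a.startswith("--report-log=") for a in args)
    if not has_report_log:
        args.append(REPORT_LOG_ARG)
    return args
```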
38+
39+
## Side-effects to mention in summary
40+
41+
- The session fixture snapshots `userPrefs.jsonc` at session start and restores at teardown (plus on `atexit`). After a clean run, `git status userPrefs.jsonc` should be empty. If the wrapper's pre-flight printed a warning about a stale sidecar, call that out — means a prior session crashed.
42+
- `mcp-server/tests/report.html` and `junit.xml` are regenerated on every run; the HTML is self-contained (shareable).
