Skip to content

Batman V support for gateway respondd#291

Open
grische wants to merge 6 commits into
freifunkMUC:mainfrom
grische:batman-v-respondd
Open

Batman V support for gateway respondd#291
grische wants to merge 6 commits into
freifunkMUC:mainfrom
grische:batman-v-respondd

Conversation

@grische
Copy link
Copy Markdown
Contributor

@grische grische commented May 3, 2026

This fixes a few things in the respondd on the gateway, drops WiFi support, drops Batman IV support and adds Batman V support.

Tested on ffwert-events on gw05.

This is part of the Batman V series: freifunkMUC/site-ffm#842

grische and others added 5 commits May 3, 2026 21:38
The file.recurse was wrapped in a file.directory_exists guard, so
changes under lib/ never propagated to minions that already had
/opt/respondd-<site>/. Drop the guard (file.recurse is idempotent),
exclude the jinja-templated respondd_client.py and *.example files
from the recurse, and add the templated files to the service.running
require list so ordering is explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Required by the upcoming batman-adv generic-netlink client. Ships
in Ubuntu 24.04 noble/main as 0.7.11; no pip required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
helper.merge() referenced itertools.izip_longest — the Python 2
name — and itertools was not imported at all. Any list-merge path
would have raised NameError. Import itertools and use zip_longest.

Also narrow the bare except in call() to Exception so it no longer
swallows KeyboardInterrupt or SystemExit, route the error through
logging instead of print(), and narrow the getInterfaceMAC catch
to (ValueError, KeyError, IndexError).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three unrelated style fixes that landed together:

  * respondd.py: getStruct() had a rootName parameter no caller ever
    passed. Remove it. Document that the returned dict is shared with
    the cache so callers must not mutate it. Drop the misleading
    "prevent loading more than once" TODO on alias.json — each
    subclass loads it exactly once at startup, which is correct given
    each site runs from its own working directory.
  * nodeinfo.py: narrow the bare except in getInterfaceAddresses to
    (ValueError, KeyError).
  * statistics.py: replace `gateway != None` with `is not None` (PEP 8
    / ruff E711).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Convert print() calls to the logging module. ext-respondd.py
configures basicConfig at startup; -v raises the level to DEBUG.
systemd routes stderr to journald, so journalctl filters cleanly by
priority. Also set PYTHONUNBUFFERED=1 in the unit so logging output
is not stuck in the stdio buffer.

The UDP receive loop in respondd_client.start() was a bare
recvfrom/decode/dispatch chain — any host on the mesh bridge could
flap the service with a non-UTF-8 probe. Wrap each step in
try/except so probes that fail to decode or to dispatch are logged
and skipped, not fatal. The dry-run path collapses too: the
verbose/dry_run print block becomes a single log.debug from
sendStruct, dropping the second json.dumps pass.

ext-respondd.py -d also exits 0 — the test dump is a diagnostic,
not a failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
continue
if mac == macBridge:
continue
if flags & BATADV_TT_CLIENT_ROAM:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the bitwise operatore needed here? I would generally discourage using it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using anything but bitwise operators sound like a very complicated way of handling these flags.

BATADV_ATTR_TT_FLAGS is a u32 whose bits are defined by enum batadv_tt_client_flags in https://github.com/torvalds/linux/blob/v6.12/include/uapi/linux/batman_adv.h#L26-L75

Bits from 0 to 7 are called remote flags because they are sent on the wire.
Bits from 8 to 15 are called local flags because they are used for local
computations only.

What alternative would you propose?

Copy link
Copy Markdown

@maurerle maurerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good in general, but at the size of such a larger rewrite, I would think that it is more sustainable to add BATMAN_V support to:
https://github.com/ffnord/mesh-announce

which is use by various communities and not just FFMUC.

Comment thread respondd/respondd-tmpl/lib/respondd.py Outdated
# Shared when injected by ResponddClient; private per-instance
# otherwise (e.g. ext-respondd.py -d test mode).
if self._nl is None:
self._nl = BatadvNetlinkClient()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the Respondd has self._nl - a BatadvNetlinkClient.
This has self._nl - a BatadvNetlink, which is a pyroute2 GenericNetlinkSocket.

I think this encapsulation with the same name is quite hard to read.
I also think that it would be better to not get rid of one layer here, by having the BatadvNetlinkClient directly take care of the reconnection of its GenericNetlinkSocket

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I renamed Respondd._nl to Respondd._batadv and _get_nl() to _get_batadv().
I left BatadvNetlinkClient._nl as-is.

Regarding encapsulation:
pyroute2.GenericNetlinkSocket binds its socket at construction time in __init__, so reconnection in BatadvNetlinkClient._reset requires close() followed by re-instantiation. Separating the lifecycle owner BatadvNetlinkClient from the socket itself (BatadvNetlink) keeps that swap localised, rather than requiring a single class to replace its own socket during a request.

Do you have a better alternative in mind?

@grische grische force-pushed the batman-v-respondd branch from 9b95468 to a5780e5 Compare May 12, 2026 17:22
Replace every `batctl meshif <bat> …` subprocess plus its text-output
regex parser with a generic-netlink client that talks to batman-adv
directly. The new lib/batadv_netlink.py exposes BatadvNetlink (raw
genlmsg wrapper) and BatadvNetlinkClient (lazy-bind + rebind-on-error
facade); ResponddClient owns one client and injects it into Nodeinfo,
Statistics and Neighbours so the daemon runs a single batman-adv
socket.

Migrated call sites:

  * Neighbours._get: BATADV_CMD_GET_NEIGHBORS — direct ELP neighbours
    keyed by BATADV_ATTR_NEIGH_ADDRESS, throughput as raw kbit/s,
    mirroring gluon-mesh-batman-adv. Drops the old originator-table
    walk that required `origin == nexthop` filtering, which on
    Freifunk gateways with a dummy primary plus a VXLAN backbone
    rejected every row and emitted {"batadv": {}}.
  * Nodeinfo.getBatmanInterfaces / Statistics.getClients /
    Statistics.getGateway / Nodeinfo.getVPNFlag /
    ResponddClient.start (mcast joins): BATADV_CMD_GET_HARDIF,
    BATADV_CMD_GET_TRANSTABLE_LOCAL, BATADV_CMD_GET_GATEWAYS,
    BATADV_CMD_GET_MESH respectively. TT flags become a u32 bitfield
    check (BATADV_TT_CLIENT_ROAM / WIFI), independent of batctl's
    print format.

Also drops the dead Batman IV and wifi-station code paths
(getStationDump, "wireless" interface type, mesh-wlan config),
which the netlink-only path makes unnecessary: deployment targets
run batman V exclusively and have no wifi.

Visible behaviour change worth calling out: getGateway now reports
strictly the BATADV_ATTR_FLAG_BEST gateway. The old regex matched
both "*" (selected) and "=>" (announced) markers and overwrote its
return value on each match — so it returned whichever batctl printed
last, typically the "=>" row during a transition. yanic's struct.go
does not distinguish the two and IsGateway() derives from
nodeinfo.vpn instead, so the change is invisible downstream and
stabilises meshviewer output during transitions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@grische grische force-pushed the batman-v-respondd branch from a5780e5 to 4b7b041 Compare May 12, 2026 17:23
@grische
Copy link
Copy Markdown
Contributor Author

grische commented May 12, 2026

@maurerle I went ahead and made sure that the changes here are in sync with freifunk-gluon/gluon#3736 , like the BATADV_TT_CLIENT_NOPURGE changes and emitting best.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants