Batman V support for gateway respondd#291
Conversation
The file.recurse was wrapped in a file.directory_exists guard, so changes under lib/ never propagated to minions that already had /opt/respondd-<site>/. Drop the guard (file.recurse is idempotent), exclude the jinja-templated respondd_client.py and *.example files from the recurse, and add the templated files to the service.running require list so ordering is explicit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Required by the upcoming batman-adv generic-netlink client. Ships in Ubuntu 24.04 noble/main as 0.7.11; no pip required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
helper.merge() referenced itertools.izip_longest — the Python 2 name — and itertools was not imported at all. Any list-merge path would have raised NameError. Import itertools and use zip_longest. Also narrow the bare except in call() to Exception so it no longer swallows KeyboardInterrupt or SystemExit, route the error through logging instead of print(), and narrow the getInterfaceMAC catch to (ValueError, KeyError, IndexError). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three unrelated style fixes that landed together:
* respondd.py: getStruct() had a rootName parameter no caller ever
passed. Remove it. Document that the returned dict is shared with
the cache so callers must not mutate it. Drop the misleading
"prevent loading more than once" TODO on alias.json — each
subclass loads it exactly once at startup, which is correct given
each site runs from its own working directory.
* nodeinfo.py: narrow the bare except in getInterfaceAddresses to
(ValueError, KeyError).
* statistics.py: replace `gateway != None` with `is not None` (PEP 8
/ ruff E711).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Convert print() calls to the logging module. ext-respondd.py configures basicConfig at startup; -v raises the level to DEBUG. systemd routes stderr to journald, so journalctl filters cleanly by priority. Also set PYTHONUNBUFFERED=1 in the unit so logging output is not stuck in the stdio buffer. The UDP receive loop in respondd_client.start() was a bare recvfrom/decode/dispatch chain — any host on the mesh bridge could flap the service with a non-UTF-8 probe. Wrap each step in try/except so probes that fail to decode or to dispatch are logged and skipped, not fatal. The dry-run path collapses too: the verbose/dry_run print block becomes a single log.debug from sendStruct, dropping the second json.dumps pass. ext-respondd.py -d also exits 0 — the test dump is a diagnostic, not a failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| continue | ||
| if mac == macBridge: | ||
| continue | ||
| if flags & BATADV_TT_CLIENT_ROAM: |
There was a problem hiding this comment.
why is the bitwise operatore needed here? I would generally discourage using it.
There was a problem hiding this comment.
Using anything but bitwise operators sound like a very complicated way of handling these flags.
BATADV_ATTR_TT_FLAGS is a u32 whose bits are defined by enum batadv_tt_client_flags in https://github.com/torvalds/linux/blob/v6.12/include/uapi/linux/batman_adv.h#L26-L75
Bits from 0 to 7 are called remote flags because they are sent on the wire.
Bits from 8 to 15 are called local flags because they are used for local
computations only.
What alternative would you propose?
maurerle
left a comment
There was a problem hiding this comment.
I think this looks good in general, but at the size of such a larger rewrite, I would think that it is more sustainable to add BATMAN_V support to:
https://github.com/ffnord/mesh-announce
which is use by various communities and not just FFMUC.
| # Shared when injected by ResponddClient; private per-instance | ||
| # otherwise (e.g. ext-respondd.py -d test mode). | ||
| if self._nl is None: | ||
| self._nl = BatadvNetlinkClient() |
There was a problem hiding this comment.
So the Respondd has self._nl - a BatadvNetlinkClient.
This has self._nl - a BatadvNetlink, which is a pyroute2 GenericNetlinkSocket.
I think this encapsulation with the same name is quite hard to read.
I also think that it would be better to not get rid of one layer here, by having the BatadvNetlinkClient directly take care of the reconnection of its GenericNetlinkSocket
There was a problem hiding this comment.
Good catch. I renamed Respondd._nl to Respondd._batadv and _get_nl() to _get_batadv().
I left BatadvNetlinkClient._nl as-is.
Regarding encapsulation:
pyroute2.GenericNetlinkSocket binds its socket at construction time in __init__, so reconnection in BatadvNetlinkClient._reset requires close() followed by re-instantiation. Separating the lifecycle owner BatadvNetlinkClient from the socket itself (BatadvNetlink) keeps that swap localised, rather than requiring a single class to replace its own socket during a request.
Do you have a better alternative in mind?
9b95468 to
a5780e5
Compare
Replace every `batctl meshif <bat> …` subprocess plus its text-output
regex parser with a generic-netlink client that talks to batman-adv
directly. The new lib/batadv_netlink.py exposes BatadvNetlink (raw
genlmsg wrapper) and BatadvNetlinkClient (lazy-bind + rebind-on-error
facade); ResponddClient owns one client and injects it into Nodeinfo,
Statistics and Neighbours so the daemon runs a single batman-adv
socket.
Migrated call sites:
* Neighbours._get: BATADV_CMD_GET_NEIGHBORS — direct ELP neighbours
keyed by BATADV_ATTR_NEIGH_ADDRESS, throughput as raw kbit/s,
mirroring gluon-mesh-batman-adv. Drops the old originator-table
walk that required `origin == nexthop` filtering, which on
Freifunk gateways with a dummy primary plus a VXLAN backbone
rejected every row and emitted {"batadv": {}}.
* Nodeinfo.getBatmanInterfaces / Statistics.getClients /
Statistics.getGateway / Nodeinfo.getVPNFlag /
ResponddClient.start (mcast joins): BATADV_CMD_GET_HARDIF,
BATADV_CMD_GET_TRANSTABLE_LOCAL, BATADV_CMD_GET_GATEWAYS,
BATADV_CMD_GET_MESH respectively. TT flags become a u32 bitfield
check (BATADV_TT_CLIENT_ROAM / WIFI), independent of batctl's
print format.
Also drops the dead Batman IV and wifi-station code paths
(getStationDump, "wireless" interface type, mesh-wlan config),
which the netlink-only path makes unnecessary: deployment targets
run batman V exclusively and have no wifi.
Visible behaviour change worth calling out: getGateway now reports
strictly the BATADV_ATTR_FLAG_BEST gateway. The old regex matched
both "*" (selected) and "=>" (announced) markers and overwrote its
return value on each match — so it returned whichever batctl printed
last, typically the "=>" row during a transition. yanic's struct.go
does not distinguish the two and IsGateway() derives from
nodeinfo.vpn instead, so the change is invisible downstream and
stabilises meshviewer output during transitions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a5780e5 to
4b7b041
Compare
|
@maurerle I went ahead and made sure that the changes here are in sync with freifunk-gluon/gluon#3736 , like the |
This fixes a few things in the respondd on the gateway, drops WiFi support, drops Batman IV support and adds Batman V support.
Tested on ffwert-events on gw05.
This is part of the Batman V series: freifunkMUC/site-ffm#842