Feature: Web UI #556

Draft
AltayAkkus wants to merge 5 commits into internetarchive:main from AltayAkkus:feat-web-ui

AltayAkkus (Contributor) commented Feb 3, 2026

#74 proposed adding a UI to manage Zeno. I came up with the following:

General architecture

My proposal is to heavily extend the existing API (currently only used by Prometheus) and expose as many functions as possible through it.

  1. The API shall act as a general control interface, not only for our Web UI but also for use cases like automation.

(For example: a seed was unsuccessful and we did not find any other pages. A script automatically performs a Google dork search for site:example.com and adds the newly found URLs back to the Reactor. Or you brute-force known paths, or combine robots.txt with brute-forcing.)

  2. The Web UI shall be interchangeable, because every org has different requirements and tastes.

The API can now be configured via api-static-dir to serve a local directory. You can point it at the dist output from Vite, or write bare-bones .html yourself.

  3. The API and Web UI shall allow easy, authenticated access without turning Zeno into a half-webserver, half-webcrawler.

All API endpoints (and the served files) are unauthenticated; you can put an authenticating proxy in front of them.
(I used Cloudflare Tunnels: the daemon can be installed on Linux, and you can configure access via SSO, a password, or whitelisted e-mail addresses. You can also use AWS Cognito with CloudFront, or whatever floats your boat.)

I propose avoiding RBAC, SSO, et cetera to keep this as simple as possible. With just an API plus a directory file server, everybody can get this up and running fast.

API feature list (to be extended)

  • Pause/Unpause
    Implemented and working. I had to fix a bug in a55719: pausing did not stop the crawl in progress and only took effect when a new seed began.

  • Live-tailed Frontier
    You can stream the Reactor StateTable via WebSocket now 🎉
    There is a poller which polls the Reactor, computes the delta, and fans out the StateTable changes in real time. More details below.

  • Add seeds to Reactor

  • Live-tail logs
    Similar to the Frontier (WebSocket), preserving the full schema of the logs (level and fields) so we can filter them.

  • WebHook
    If you could register a WebHook on e.g. ERROR level logs, you could hook that up to Slack and get notifications whenever there is something wrong with your Zeno instance.

@AltayAkkus AltayAkkus marked this pull request as draft February 3, 2026 22:53
AltayAkkus (Contributor, Author) commented Feb 3, 2026

Frontier errata

I thought that just returning the StateTable via HTTP would lead to performance issues, especially when you want high-fidelity data.
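The poll, delta, and fan-out steps could be sketched roughly as follows. The `item`, `delta`, and `hub` types are simplified stand-ins invented for illustration, not the actual Reactor or StateTable types, and the WebSocket layer is left out: each subscriber channel would feed one connection.

```go
package main

import (
	"fmt"
	"sync"
)

// item is a simplified stand-in for one StateTable row.
type item struct {
	ID     string
	Status string
}

// delta lists what changed between two polls.
type delta struct {
	Upserted []item   // new items and items whose status changed
	Removed  []string // IDs that disappeared from the table
}

// diff compares two snapshots keyed by item ID.
func diff(prev, cur map[string]item) delta {
	var d delta
	for id, it := range cur {
		if old, ok := prev[id]; !ok || old != it {
			d.Upserted = append(d.Upserted, it)
		}
	}
	for id := range prev {
		if _, ok := cur[id]; !ok {
			d.Removed = append(d.Removed, id)
		}
	}
	return d
}

// hub fans each delta out to every subscriber (one per client).
type hub struct {
	mu   sync.Mutex
	subs map[chan delta]struct{}
}

func newHub() *hub { return &hub{subs: map[chan delta]struct{}{}} }

func (h *hub) subscribe() chan delta {
	ch := make(chan delta, 16)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

func (h *hub) publish(d delta) {
	if len(d.Upserted) == 0 && len(d.Removed) == 0 {
		return // nothing changed since the last poll
	}
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		select {
		case ch <- d:
		default: // slow client: drop rather than stall the poller
		}
	}
}

func main() {
	h := newHub()
	client := h.subscribe()

	// Two fake polls; a real poller would read the Reactor on a time.Ticker.
	prev := map[string]item{}
	polls := []map[string]item{
		{"a": {ID: "a", Status: "fresh"}},
		{"a": {ID: "a", Status: "archived"}, "b": {ID: "b", Status: "fresh"}},
	}
	for _, cur := range polls {
		h.publish(diff(prev, cur))
		prev = cur
	}

	close(client) // no further publishes happen after this point
	for d := range client {
		fmt.Printf("upserted=%d removed=%d\n", len(d.Upserted), len(d.Removed))
	}
}
```

Dropping deltas for slow clients keeps the poller responsive, at the cost of that client's view drifting until it reconnects.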

Caveats

The streamed items can be inconsistent depending on when you opened the WebSocket channel (for example, they may reference parent IDs that were fanned out before your connection started). For now this is a necessary evil; if anyone has a nice idea for solving it, I would be really glad :)

The delta is calculated at item level only, so the whole item is normally retransmitted whenever its status changes. A finer-grained delta would reduce the bandwidth of each WebSocket.
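As an illustration of what a finer-grained delta might mean, the sketch below sends only the fields that changed, plus the ID. The `item` fields (`URL`, `Status`, `Hops`) and the `patch` helper are hypothetical and chosen for the example, not taken from the real StateTable schema.

```go
package main

import "fmt"

// item is a hypothetical, flattened view of a StateTable entry.
type item struct {
	ID     string
	URL    string
	Status string
	Hops   int
}

// patch returns only the fields of cur that differ from prev, plus the ID
// so the client knows which row to update. It returns nil if nothing changed.
func patch(prev, cur item) map[string]any {
	p := map[string]any{}
	if prev.URL != cur.URL {
		p["url"] = cur.URL
	}
	if prev.Status != cur.Status {
		p["status"] = cur.Status
	}
	if prev.Hops != cur.Hops {
		p["hops"] = cur.Hops
	}
	if len(p) == 0 {
		return nil
	}
	p["id"] = cur.ID
	return p
}

func main() {
	before := item{ID: "42", URL: "https://example.com/", Status: "fresh", Hops: 1}
	after := before
	after.Status = "archived"

	// Only the status and the ID go on the wire, not the full item.
	fmt.Println(patch(before, after))
}
```

Whether this is worth the extra client-side merge logic probably depends on how large the real items are relative to a status string.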

We should also use something more efficient than JSON on the wire; CBOR seems like a good choice.

Demo

frontierdemo-720.mp4

(Note: the entire Web UI is vibe-coded for now to validate the API. I did not include it in this branch.)
