GitHub - Babyhamsta/Fenceline: Free, self-hosted, on-device CIPA web filtering for managed Chrome — enforces on every network, no server, no subscription.

Self-hosted, on-device CIPA web filtering for managed Chrome.

No subscription · no server to run · no per-request API calls · no backhaul.
The filter lives on the device and enforces on every network — school, home, hotspot, or offline.

Fenceline is a content filter you actually own. A GitHub Action compiles free categorized blocklists into static files every couple of days; a Manifest V3 extension — force-installed through your admin console — syncs them and matches every navigation on the device. Nothing is proxied, nothing is logged off-box, and there is no recurring cost. It runs on managed Chromebooks (the strongest story — Chrome is the device) and equally on managed Chrome for Windows, macOS, and Linux via Chrome Browser Cloud Management or OS policy.

Blocklists are compiled every two days by a GitHub Action from free categorized sources (UT1, HaGeZi, hosts-format lists, your own) and published as static files on GitHub Pages. "Updating the filter" means the Action commits new artifacts; to force a change, edit lists/block.txt or lists/allow.txt and push.
The extension is force-installed via managed policy on the student OU and matches every navigation on-device — ~6 µs per check at 2M domains in benchmarks. It polls a tiny version file every 12 hours (ETag/304, ~1 KB) and downloads the full list at most once every 7 days, so fleet bandwidth stays trivial.
Logging records blocked attempts only — domain, category, timestamp, and which layer blocked it. No browsing history, nothing leaves the device. The report page shows lifetime counts by category/domain/day and exports CSV/JSON.
The block page is plain HTML/CSS/JS — restyle it freely (docs/CUSTOMIZING.md), with district branding injectable via managed policy without forking.
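
The polling schedule described above (a cheap conditional check every 12 hours, a full pull at most once every 7 days) could look roughly like the following. This is a minimal sketch, not the extension's actual sync code: `checkVersion`, the `fetchImpl` parameter, the state object's shape, and the `version` field in the version file are all assumptions made for illustration.

```javascript
// Sketch of a conditional version check (ETag / 304). All names are illustrative.
const VERSION_CHECK_MS = 12 * 60 * 60 * 1000;  // 12 h, from the text
const FULL_PULL_MS = 7 * 24 * 60 * 60 * 1000;  // 7 days, from the text

// state: { etag, version, lastCheck, lastFullPull } (hypothetical shape)
// Resolves to { state, action }, where action is one of
// 'skip' | 'unchanged' | 'deferred' | 'download'.
async function checkVersion(metaUrl, state, now, fetchImpl = fetch) {
  if (now - (state.lastCheck ?? 0) < VERSION_CHECK_MS) {
    return { state, action: 'skip' };
  }
  // Send the stored ETag so an unchanged file costs only a 304 with no body.
  const headers = state.etag ? { 'If-None-Match': state.etag } : {};
  const res = await fetchImpl(metaUrl, { headers });
  const checked = { ...state, lastCheck: now };

  // On 304 or any error, keep enforcing with the lists already on the device.
  if (res.status === 304 || !res.ok) return { state: checked, action: 'unchanged' };

  const meta = await res.json();
  if (meta.version === state.version) {
    return { state: { ...checked, etag: res.headers.get('ETag') }, action: 'unchanged' };
  }
  if (now - (state.lastFullPull ?? 0) < FULL_PULL_MS) {
    // New version exists, but the weekly budget is spent. The old ETag is kept
    // on purpose so the next poll sees the change again instead of a 304.
    return { state: checked, action: 'deferred' };
  }
  return {
    state: { ...checked, etag: res.headers.get('ETag'), version: meta.version, lastFullPull: now },
    action: 'download',
  };
}
```

Passing `fetchImpl` explicitly keeps the function testable without a network; in a service worker the default global `fetch` would be used.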

How it blocks — layered, all on-device

Each layer is cheaper and earlier than the one below it, so the common case never reaches the expensive checks. Every block is attributed to the layer that caught it, visible on the block page and in the report.

Layer	Catches	How
1 · Network rules	the ~500k most-popular blocked domains	`declarativeNetRequest` blocks in the network stack before the request leaves — zero flash, enforced even while the service worker is asleep
2 · Tail engine	every other listed domain (millions)	the full list as a sorted array of u64 hashes, checked on `webNavigation.onBeforeNavigate` and redirected
3 · Content model	pages the lists miss — judged by what the page actually says	a tiny on-device text classifier (below)
3b · Glyph-cipher guard	pages that scramble their own text to blind the model	a statistical fingerprint of substitution-cipher fonts (below)
4 · Proxy & evasion detection	web-proxies / "unblockers" that tunnel past everything above	behavioural signals — not names (below)

Tier 1 exists because webNavigation cannot cancel a navigation; it can only observe it and redirect after the fact. Putting the domains students actually hit into network rules means the common case blocks atomically with no flash. The tail engine (Tier 2) covers the remaining long-tail domains and also resolves the category when a Tier-1 block (net::ERR_BLOCKED_BY_CLIENT) fires. Subdomains of every listed domain are blocked automatically in both tiers.
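
To make the Tier 2 lookup concrete, here is a minimal sketch of a sorted-hash table with binary search and a parent-domain walk for subdomain matching. It is an illustration under assumptions, not the contents of lib/hash.js: the function names are hypothetical, and BigInt is used for clarity where a production implementation would more likely use a faster 64-bit representation.

```javascript
// FNV-1a 64-bit over UTF-8 bytes, using BigInt for exact 64-bit arithmetic.
const FNV_OFFSET = 0xcbf29ce484222325n;
const FNV_PRIME = 0x100000001b3n;
const MASK64 = 0xffffffffffffffffn;

function fnv1a64(str) {
  let h = FNV_OFFSET;
  for (const byte of new TextEncoder().encode(str)) {
    h ^= BigInt(byte);
    h = (h * FNV_PRIME) & MASK64;
  }
  return h;
}

// Build the sorted table (the compiler's job in the real pipeline).
function buildTail(domains) {
  const hashes = BigUint64Array.from(domains.map((d) => fnv1a64(d)));
  hashes.sort(); // typed arrays sort numerically by default
  return hashes;
}

// Plain binary search over the sorted u64 array.
function hasHash(sorted, h) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const v = sorted[mid];
    if (v === h) return true;
    if (v < h) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}

// Check the host and each parent domain, so a.b.example matches a listed b.example.
// The walk stops before the bare top-level label.
function isBlocked(sorted, hostname) {
  const labels = hostname.toLowerCase().split('.');
  for (let i = 0; i < labels.length - 1; i++) {
    if (hasHash(sorted, fnv1a64(labels.slice(i).join('.')))) return true;
  }
  return false;
}
```

Usage: `isBlocked(buildTail(['bad.example']), 'cdn.bad.example')` walks `cdn.bad.example`, then `bad.example`, and matches on the second probe. Each check costs one hash plus one binary search per label, which is why the table can hold millions of domains without storing any strings.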

The list pipeline

flowchart LR
    A["<b>GitHub Action</b> — every 2 days<br/><br/>compiler/compile.mjs<br/>pull UT1 · HaGeZi · …<br/>categorize · dedupe<br/>apply allow / block<br/>emit lists + model"]
    P["<b>GitHub Pages</b> — static, CDN<br/><br/>meta.json<br/>dnr/*.json<br/>tail.bin<br/>model.bin"]
    E["<b>Managed Chrome</b> — MV3, force-installed<br/><br/>Tier 1 · network rules<br/>Tier 2 · tail engine<br/>Tier 3 · content model"]
    A -->|publish| P
    P -->|"device sync<br/>12 h version check · full pull ≤ 7 days"| E
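
The compile stage in the diagram (categorize, dedupe, apply allow / block, emit lists) can be sketched as follows. This is an illustration only, not compiler/compile.mjs: `compileLists`, the `popularity` map used to pick Tier 1, the `district-block` category label, and the choice to let the allowlist win over every other list are assumptions. The rule objects use the `declarativeNetRequest` rule shape (`id`, `priority`, `action`, `condition`), one rule per category.

```javascript
// Accept "example.com" or hosts-format "0.0.0.0 example.com"; drop comments.
function normalize(line) {
  const text = line.split('#')[0].trim().toLowerCase();
  if (!text) return null;
  const parts = text.split(/\s+/);
  const host = parts[parts.length - 1];
  return /^[a-z0-9.-]+\.[a-z0-9-]+$/.test(host) ? host : null;
}

// sources: { category: [lines] }, allow/block: [lines],
// popularity: Map(domain -> score) (hypothetical ranking input),
// tier1Size: how many of the most popular domains become network rules.
function compileLists({ sources, allow, block, popularity, tier1Size }) {
  const category = new Map(); // domain -> first category that listed it (dedupe)
  for (const [cat, lines] of Object.entries(sources)) {
    for (const line of lines) {
      const d = normalize(line);
      if (d && !category.has(d)) category.set(d, cat);
    }
  }
  for (const line of block) {
    const d = normalize(line);
    if (d) category.set(d, 'district-block');
  }
  // Assumption: an allowlisted domain is removed no matter which list added it.
  for (const d of allow.map(normalize).filter(Boolean)) category.delete(d);

  const ranked = [...category.keys()].sort(
    (a, b) => (popularity.get(b) ?? 0) - (popularity.get(a) ?? 0) || a.localeCompare(b),
  );

  // Tier 1: group the most popular domains into one block rule per category.
  const byCategory = new Map();
  for (const d of ranked.slice(0, tier1Size)) {
    const cat = category.get(d);
    if (!byCategory.has(cat)) byCategory.set(cat, []);
    byCategory.get(cat).push(d);
  }
  let id = 1;
  const rules = [...byCategory.values()].map((domains) => ({
    id: id++,
    priority: 1,
    action: { type: 'block' },
    condition: { requestDomains: domains, resourceTypes: ['main_frame'] },
  }));

  // Tier 2: everything else goes to the tail engine.
  return { rules, tail: ranked.slice(tier1Size), category };
}
```

Keeping the domain-to-category map alongside the rules is what would let the device attribute a Tier 1 block to a category after the fact.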

The content model (Tier 3)

A hybrid classifier that reads a page after it loads and blocks it if it confidently matches a filtered category (adult, gambling, games, proxy-bypass) the lists missed. The lists stay primary; this is the async backstop that generalises to sites nobody has listed yet. It pairs a fast text model with a small tree model over the page's structure, because text alone can't tell a page that IS a proxy from a page ABOUT proxies — same words, same score. A Wikipedia "Proxy server" article and a working web proxy are identical to a bag-of-words; what separates them is that the article never instantiates a proxy's functional element (a URL box, an embedded-URL path).

What it sees. A content script (extension/content/structural-features.js, the one extractor shared with the offline scraper so train/infer vectors are identical by construction) derives a fixed numeric vector locally — nothing leaves the device:

Text: title, meta description, and the first few hundred words of visible body text (data URIs stripped, capped). Also reads in-page "browser" proxies that draw a real site into an about:blank document, which URL filtering can't see.
URL/host: length, path depth, host/path entropy, digit ratio, cheap-TLD flag, embedded-URL-in-path, per-category keyword hits.
Structure: tag histogram, DOM depth, link density, paragraph count, third-party-script ratio, iframe/canvas/video composition, payment/password fields, and the "is-vs-about" functional-element flags (a URL box, a dominant game canvas, a casino iframe, an age gate).
Resource fingerprints: known adult-ad / gambling-affiliate / crypto-widget hosts, CGI-proxy software markers (Glype/CGIProxy/PHProxy), gambling license seals.
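
As a rough illustration of the URL/host scalars listed above, the sketch below computes length, path depth, host/path entropy, and digit ratio. It is not the code in structural-features.js; the function and field names are hypothetical, and the real extractor's exact definitions (for example, which characters count toward entropy) are not stated in this README.

```javascript
// Shannon entropy in bits per character: high for random-looking strings,
// low for repetitive ones.
function shannonEntropy(s) {
  if (!s) return 0;
  const counts = new Map();
  for (const ch of s) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  const n = [...s].length;
  let h = 0;
  for (const c of counts.values()) {
    const p = c / n;
    h -= p * Math.log2(p);
  }
  return h;
}

// A few of the URL/host features as plain numbers (field names are illustrative).
function urlFeatures(href) {
  const u = new URL(href);
  const segments = u.pathname.split('/').filter(Boolean);
  const digits = (u.hostname.match(/\d/g) ?? []).length;
  return {
    urlLength: href.length,
    pathDepth: segments.length,
    hostEntropy: shannonEntropy(u.hostname),
    pathEntropy: shannonEntropy(u.pathname),
    hostDigitRatio: u.hostname.length ? digits / u.hostname.length : 0,
  };
}
```

Because every feature is a fixed-position number, the same function can run in the content script and in an offline scraper and produce identical vectors for the same URL.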

How it scores (two stages, then a hybrid rule).

Text model — words and char 3-/4-grams hashed with FNV-1a into 65,536 signed buckets (the "hashing trick" — no vocabulary file, fixed memory), run through a 5-class multinomial logistic regression. ~1.3 MB, microseconds.
Fusion model — a gradient-boosted decision tree (fusion.json) over the 5 text scores plus the ~60 structural/URL/fingerprint scalars. A tree natively models the conjunction text can't: proxy words AND a URL box AND not prose ⇒ block; proxy words AND article structure ⇒ clean.
Hybrid decision — the fusion model is the primary call (it learned is-vs-about, so it cleans articles the text model would block). The text model is a high-recall backstop for true positives the tree misses (a logged-out casino landing page, an atypical games portal) — but a structural article guard (prose-rescue.js: low link-density + real paragraphs + no functional element) suppresses the backstop on genuine articles, so the text model's vocabulary false-positives never leak through. Search-result pages are exempt upstream (a SERP is structurally a link hub and scores on whatever the student typed). Per-category thresholds tuned for ~1.5% clean false-positives.
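
The text stage can be sketched as a hashed bag of features fed to a softmax. This is a simplified illustration, not lib/model.js: the 32-bit FNV-1a variant, the token prefixes, the use of bit 16 for the sign, and the weight layout are assumptions. One detail does line up with the stated size: 5 classes × 65,536 buckets of 32-bit floats is about 1.3 MB.

```javascript
const BUCKETS = 65536;
const CLASSES = 5;

// FNV-1a 32-bit over UTF-8 bytes.
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (const byte of new TextEncoder().encode(str)) {
    h ^= byte;
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Words plus character 3- and 4-grams taken inside each word.
function tokens(text) {
  const words = text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
  const out = words.map((w) => 'w:' + w);
  for (const w of words) {
    for (const n of [3, 4]) {
      for (let i = 0; i + n <= w.length; i++) out.push('c' + n + ':' + w.slice(i, i + n));
    }
  }
  return out;
}

// Hashing trick: the low 16 bits pick the bucket, one higher bit picks the sign,
// so colliding features tend to cancel rather than pile up.
function vectorize(text) {
  const x = new Float32Array(BUCKETS);
  for (const t of tokens(text)) {
    const h = fnv1a32(t);
    const sign = (h >>> 16) & 1 ? 1 : -1;
    x[h & (BUCKETS - 1)] += sign;
  }
  return x;
}

// Multinomial logistic regression: weights is row-major CLASSES x BUCKETS.
function predictProba(x, weights, bias) {
  const logits = new Float64Array(CLASSES);
  for (let c = 0; c < CLASSES; c++) {
    let z = bias[c];
    const row = c * BUCKETS;
    for (let j = 0; j < BUCKETS; j++) {
      if (x[j] !== 0) z += weights[row + j] * x[j];
    }
    logits[c] = z;
  }
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = Array.from(logits, (z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

No vocabulary file is needed: any token, including one never seen in training, lands in some bucket, and memory stays fixed regardless of corpus size.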

Why it's trustworthy. The whole pipeline is parity-checked end to end: the vectorizer and the tree walk are byte-identical in Python (training) and JavaScript (device) — export_fusion.py asserts the exported trees reproduce sklearn's predict_proba exactly, and test_fusion_parity.mjs asserts the JS interpreter matches the Python reference, so the chain sklearn ≡ Python ≡ JS holds and the eval numbers are exactly what runs on the Chromebook. Cost is a few ms per page; assets (~1.3 MB text + ~3 MB tree) are pulled on version change with a SHA-256 check, and a baseline ships inside the extension so a fresh install is never unprotected. If the tree fails to load, decide() degrades to text-only.
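
The hybrid rule and the text-only fallback can be sketched as one small function. This is a hedged reading of the description above, not the project's `decide()`: the argument shape, the `reason` strings, and the numeric cut-offs inside `isArticleLike` are assumptions standing in for prose-rescue.js.

```javascript
const CLEAN = 'clean';

// scores: { category: probability }. Returns the [category, probability] pair
// with the highest probability.
function argmax(scores) {
  return Object.entries(scores).reduce((best, cur) => (cur[1] > best[1] ? cur : best));
}

// Stand-in for the structural article guard: low link density, real paragraphs,
// and no functional element. The cut-offs 0.3 and 5 are made up for illustration.
function isArticleLike(structure) {
  return (
    structure.linkDensity < 0.3 &&
    structure.paragraphCount >= 5 &&
    !structure.hasFunctionalElement
  );
}

function decide({ textScores, fusionScores, structure, isSearchResults, thresholds }) {
  if (isSearchResults) return { block: false, category: CLEAN, reason: 'serp-exempt' };

  const [textCat, textP] = argmax(textScores);
  const textHit = textCat !== CLEAN && textP >= thresholds[textCat];

  // Tree unavailable: degrade to the text model alone.
  if (!fusionScores) {
    return { block: textHit, category: textHit ? textCat : CLEAN, reason: 'text-only' };
  }

  // Primary call: the fusion model.
  const [fusCat, fusP] = argmax(fusionScores);
  if (fusCat !== CLEAN && fusP >= thresholds[fusCat]) {
    return { block: true, category: fusCat, reason: 'fusion' };
  }

  // Backstop: a confident text hit, unless the page is structurally an article.
  if (textHit && !isArticleLike(structure)) {
    return { block: true, category: textCat, reason: 'text-backstop' };
  }
  return { block: false, category: CLEAN, reason: 'clean' };
}
```

With a 0.90 threshold per category, an encyclopedia article about proxies that the text model scores at 0.97 still comes back clean, because the fusion model does not fire and the article guard suppresses the backstop.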

How it's built. Trained on rendered text + structure from ~50k live sites (blocklist domains for the filtered classes, Tranco for clean), split leak-free by registrable domain. The decisive technique is hard-negative mining: pages with a category's vocabulary but a clean page's structure — Wikipedia topic articles, news, VPN/gambling coverage, sex-ed, interactive education — are mined into the training set so the tree learns to lean on structure. Policy is block-by-category-aggressively, allowlist the exceptions: VPN vendor/download pages and game portals are blocked (they're bypass tools / games), with the allowlist as the escape hatch. We publish the scraper, training/eval/export scripts, and the weights; we do not publish the scraped pages (third-party content). See classifier/README.md to reproduce.

Proxy & evasion detection (Tiers 3b + 4)

Web-proxies ("unblockers") are the hard case: a student loads one site that fetches and re-renders any other, slipping every domain list. They can rename every file, shuffle every script name, and obfuscate every line — so Fenceline keys on the behaviour a proxy can't avoid, never on names.

URL-in-path — every web proxy loads its target by embedding the destination URL in its own path, percent-encoded (/…/https%3A%2F%2F…) or base64. Legitimate sites usually pass a URL as a query parameter rather than as a path segment.
Bare wire-protocol — most modern proxies tunnel through x-bare-* request headers; no legitimate site sends those.
App-as-image (SVG) — some proxies ship their whole UI inside an <svg><foreignObject> served under an .svg extension on a public CDN. Fenceline blocks a top-level SVG that smuggles an executable HTML app, while leaving real vector art and diagram exports (which carry no script) alone.
Glyph-cipher fonts — the toughest: a page renders its text through a substitution-cipher font, so it looks normal but the DOM text is gibberish the model can't read. Fenceline catches it by the statistical fingerprint — a long page drawn from a tiny fixed alphabet, which real language never is — regardless of the script, the declared language, or the font's name.
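
Two of the behavioural signals above, URL-in-path and the bare wire-protocol headers, are simple enough to sketch. This is an illustration, not the detectors in lib/detect/: the function names, the 12-character minimum for base64 candidates, and the acceptance of URL-safe base64 are assumptions.

```javascript
const URL_START = /^https?:\/\//i;

// Try to decode one path segment as base64 (standard or URL-safe alphabet).
// Returns the decoded string, or null if the segment is not plausible base64.
function tryBase64(segment) {
  const s = segment.replace(/-/g, '+').replace(/_/g, '/');
  if (s.length < 12 || !/^[A-Za-z0-9+/]+={0,2}$/.test(s)) return null;
  try {
    return atob(s);
  } catch {
    return null;
  }
}

// True when the path (not the query string) carries an absolute http(s) URL,
// either literally, percent-encoded, or base64-encoded in a single segment.
function hasUrlInPath(href) {
  const { pathname } = new URL(href);
  let decoded = pathname;
  try {
    decoded = decodeURIComponent(pathname);
  } catch {
    // Malformed percent-escapes: fall back to the raw path.
  }
  if (/(^|\/)https?:\/\//i.test(decoded)) return true;
  return pathname.split('/').some((seg) => {
    const plain = tryBase64(seg);
    return plain !== null && URL_START.test(plain);
  });
}

// True when any request header name uses the x-bare-* prefix.
function hasBareHeaders(headerNames) {
  return headerNames.some((name) => name.toLowerCase().startsWith('x-bare-'));
}
```

Neither check looks at a hostname, a file name, or a script name, which is the point: renaming everything in the proxy leaves both signals intact. A signal like this still needs the page-level (not origin-level) blocking described in this section, since archive and cache services also carry URLs in their paths.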

These detections block the page, not the origin. A blocked site reached via a shared service (web.archive.org, Google Translate, a public CDN) is blocked on that visit without permanently pinning the service, so legitimate use of those hosts keeps working.

Repo layout

extension/          the MV3 extension (load unpacked to dev-test)
  lib/hash.js       FNV-1a 64 + binary search — shared with the compiler
  lib/model.js      on-device classifier (mirrors classifier/infer.mjs)
  lib/detect/       behaviour-based evasion detectors (proxy-url, glyph-cipher, svg-app)
  lib/pins.js       block-but-never-pin store + no-pin host set
  content/          content scan + evasion guards
  block/            customizable block page
  report/           on-device report: stats, exports, force-sync
  policy/           managed-storage schema + example admin policy
  model/            bundled baseline model.bin + model-meta.json
compiler/           list compiler + sources.json (categories, tier sizing)
  no-pin-hosts.txt  synced block-but-never-pin baseline
classifier/         scraper, training/eval, model export (Python + JS parity)
tools/              build-time helpers (render-icons.py: SVG → manifest PNGs, needs Pillow)
lists/              allow.txt / block.txt district overrides
.github/workflows/  build + publish to gh-pages (every 2 days)
test/               selftest.mjs (engine e2e), detect.mjs (evasion detectors), popup.mjs
docs/               DEPLOYMENT.md (admin console, hardening), CUSTOMIZING.md

Quick start

node test/selftest.mjs        # sanity-check the toolchain
node compiler/compile.mjs     # full build into dist/ (downloads upstream lists)

Then follow docs/DEPLOYMENT.md: enable Pages, run the Action, publish the extension (Web Store unlisted), force-install on the student OU with the managed policy, and work the hardening checklist — the checklist is most of the real security; the extension can't compensate for an enabled guest mode or a Linux container.

Admin controls (managed policy, students can't touch)

listBaseUrl, sync intervals, allowDomains / extraBlockDomains overrides, the content model on/off and threshold, block-page branding, and whether the report page's Clear/Export buttons function. See extension/policy/example_admin_policy.json.

Privacy / FERPA posture

Only blocked attempts are logged, on-device, capped (5k distinct domains, 2k recent events, 400 days of daily counts). There is no telemetry, no remote logging, and the extension makes exactly two kinds of network request: fetching its own list artifacts and its own model file. The content model runs entirely on-device — page text is never sent anywhere.
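
The caps quoted above (5k distinct domains, 2k recent events, 400 days of daily counts) imply a bounded store. One way to sketch it is shown below; this is not the extension's storage code, and the function names, the record shape, and the oldest-first eviction policy are assumptions.

```javascript
const MAX_DOMAINS = 5000; // distinct domains, from the text
const MAX_EVENTS = 2000;  // recent events, from the text
const MAX_DAYS = 400;     // daily counters, from the text

function createStats() {
  return { domains: new Map(), events: [], days: new Map() };
}

// Maps iterate in insertion order, so the first key is the oldest entry.
function trimOldest(map, max) {
  while (map.size > max) map.delete(map.keys().next().value);
}

// Record one blocked attempt. Only the fields named in the README are kept:
// domain, category, timestamp, and the layer that blocked it.
function recordBlock(stats, { domain, category, layer, timestamp }) {
  const prev = stats.domains.get(domain);
  stats.domains.delete(domain); // re-insert so recently seen domains are evicted last
  stats.domains.set(domain, { category, count: (prev?.count ?? 0) + 1 });
  trimOldest(stats.domains, MAX_DOMAINS);

  stats.events.push({ domain, category, layer, timestamp });
  if (stats.events.length > MAX_EVENTS) {
    stats.events.splice(0, stats.events.length - MAX_EVENTS);
  }

  const day = new Date(timestamp).toISOString().slice(0, 10); // UTC day, YYYY-MM-DD
  stats.days.set(day, (stats.days.get(day) ?? 0) + 1);
  trimOldest(stats.days, MAX_DAYS);
}
```

Fixed caps keep storage use predictable on low-end Chromebooks and put an upper bound on how much history the device can ever hold.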

Known limits and caveats (read before trusting your fleet to it)

It filters Chrome, not the machine. Guest mode, other browsers, and other devices are out of scope — that's admin-console policy (see the hardening checklist) and your network-edge filter. On a managed Chromebook this gap closes (Chrome is the device); on Windows/macOS, pair it with device policy that blocks other browsers.
Tier 2 flash: a long-tail block redirects after navigation starts; on a slow device, blocked content can render briefly. Popular domains live in Tier 1 specifically to avoid this where it matters most.
The content model is a backstop, not an oracle. It blocks confident matches only (0.90 threshold) to keep false positives low, which means it will miss borderline pages. It catches what the lists don't; it doesn't replace them.
Glyph-cipher edge: a cipher mapping into a small alphabet (e.g. Cyrillic) on a page that also spoofs its language tag is the one residual gap — defeating it needs font-coverage inspection, which Fenceline doesn't yet do.
CIPA scope: student-OU filtering alone doesn't complete CIPA/E-Rate certification (staff filtering, monitoring, and a board-adopted Internet Safety Policy are also required).
Log loss window: stats writes are debounced; if Chrome kills the service worker in that window, the last event can be lost. Counts are operational telemetry, not forensic evidence.

Blocklist sources & credits

Fenceline compiles only free, publicly maintained domain lists, fetched fresh at build time so coverage tracks upstream automatically. Huge thanks to the maintainers below — please review and honor each project's license before redistributing (UT1 and its mirrors are CC BY-SA, which requires attribution and share-alike). The authoritative list lives in compiler/sources.json.

Project	Feeds these categories	License
UT1 — Université Toulouse 1 Capitole (Fabrice Prigent)	adult, gambling, drugs, hate-violence, malware-phishing, proxy-bypass, social, games	CC BY-SA
HaGeZi DNS Blocklists	adult (NSFW), gambling, malware-phishing (TIF), proxy-bypass (DoH/VPN)	see repo
StevenBlack/hosts	adult (porn extensions), gambling	MIT
Sinfonietta/hostfiles	hate-violence	MIT
arkynx/blocklists	gambling	see repo
olbat/ut1-blacklists	drugs, games (daily UT1 mirror)	CC BY-SA
nickoppen/pihole-blocklists	social (TikTok, Snapchat, Reddit, Discord, Telegram, Meta), proxy-bypass (VPN), games (Roblox, Steam, Epic, Minecraft, Nintendo)	see repo
Mafraysse/AdGuard_GameList-Filter	games (browser-game portals)	see repo
oisd (sjhgvr/oisd)	adult (NSFW)	see repo
4skinSkywalker/Anti-Porn-HOSTS-File	adult	see repo
dibdot/DoH-IP-blocklists	proxy-bypass (DoH servers)	see repo

UT1 and HaGeZi refresh daily; the build Action re-pulls every 2 days, so the fleet's filter stays current with no manual list maintenance.

License

Fenceline's own code is MIT — see LICENSE. Upstream blocklists carry their own licenses; see Blocklist sources & credits above and review the terms of any source you add to compiler/sources.json.
