frit

GPU reliability engineering at homelab scale. One GPU, the full inference stack, and the reliability practices that hold at 1000.

M0 shipped · M2 shipped · M3 active -- live status at 75asu.github.io/frit

What it is

A public lab that runs a frontier-lab-style inference stack on a single NVIDIA GPU, then practices the work that actually matters at fleet scale: GPU observability, SLOs, load testing, chaos, and postmortems. Every milestone ships a real artifact.

Production platform	frit equivalent
Chat UI (Claude.ai, ChatGPT)	Open WebUI
Model API gateway	LiteLLM
Model serving	vLLM + Qwen3-4B on a Tesla T4
GPU observability	DCGM + Prometheus + Grafana

Architecture

One GCP Spot Tesla T4, single-node k3s, reconciled by Flux (GitOps). The request path mirrors a production serving stack -- Open WebUI to LiteLLM to vLLM to Qwen3-4B -- with a CloudNativePG data tier, Vault/ESO secrets, and the kube-prometheus stack.

_{Source: docs/architecture.drawio -- regenerate both themes with make diagram.}

Milestones

#	Milestone	Status
M0	GPU foundation -- driver, DCGM, k3s, GPU-in-k8s	shipped
M1	GPU metrics exporter -- NVML to Prometheus, Go	queued
M2	Observability stack -- GPU Operator + kube-prometheus via Flux	shipped
M3	Inference layer -- vLLM + LiteLLM + Open WebUI, TTFT dashboard	active
M4	SLOs + alerting -- error budgets, burn-rate alerts	queued
M5	Multi-platform simulation -- canary routing, equivalence checks	queued
M6	Load testing -- ramp / soak / spike, breaking point	queued
M7	Chaos + postmortems -- experiments and blameless writeups	queued
M8	OSS cadence -- ops reviews, merged upstream PRs	queued

Stack

Inference -- vLLM, LiteLLM, Open WebUI
Observability -- DCGM, Prometheus, Grafana, Alertmanager
Platform -- k3s + Flux (GitOps), Vault + External Secrets, CloudNativePG
GPU -- NVIDIA Tesla T4 (16 GB); any NVIDIA GPU works

Quick start -- bare VM to a running, observable stack, one command per step

Bring any NVIDIA GPU VM (a GCP Spot T4 is the reference; any Ubuntu host with a GPU works). No secrets touch git -- .env and the rendered inventory are gitignored.

git clone https://github.com/75asu/frit.git && cd frit
cp .env.example .env        # GCP coords + TARGET_USER/SSH_KEY_PATH + secrets

make up                     # connect + gpu + k3s + bootstrap (Flux applies gitops/)
make tunnel                 # forward the k3s API, then:
make kubectl CMD="get pods -A"
make grafana                # open Grafana / Open WebUI over SSH (also: make webui)

make down                   # stop the VM -- disk + cluster persist (make up restores)
make teardown               # wipe back to bare Ubuntu, no residue

Run make help for the full command list.

MIT License -- by @75asu

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
ansible		ansible
bin		bin
docs		docs
gitops		gitops
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

frit

What it is

Architecture

Milestones

Stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

frit

What it is

Architecture

Milestones

Stack

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages