A GPT-style autoregressive language model built from scratch using PyTorch. No pre-built transformer classes. No model hubs. Every component — attention, tokenization, training, and inference — written from first principles.
Fornax is a decoder-only transformer that you train on your own corpus and run inference against through a web interface. It implements modern architectural choices including RoPE positional encoding, SwiGLU activations, RMSNorm, and KV-cached autoregressive generation.
It is not a fine-tuned checkpoint. It is not a wrapper. It is a full language model pipeline from raw text to generated output.
| Layer | Technology |
|---|---|
| Model | PyTorch 2.x |
| Tokenizer | HuggingFace Tokenizers (BPE) |
| Backend | FastAPI + SQLAlchemy + PostgreSQL |
| Migrations | Alembic |
| Frontend | React + Vite + Framer Motion |
| Tracking | Weights and Biases |
| Infrastructure | Docker + Docker Compose |
Decoder-only transformer with the following design decisions:
- Rotary Positional Embeddings (RoPE) for relative position awareness
- Multi-head causal self-attention with KV cache at inference
- SwiGLU feed-forward networks
- Pre-normalization (norm applied before each sub-layer) using RMSNorm
- BPE tokenizer trained on your corpus
- AdamW with cosine decay and linear warmup
- Top-k and nucleus (top-p) sampling at inference
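To make the last bullet concrete, here is a minimal, dependency-free sketch of top-k and nucleus filtering over a single next-token distribution. It is an illustration of the general technique, not Fornax's actual sampling code: the function names (`softmax`, `filter_top_k_top_p`, `sample`) are hypothetical, and it assumes top-k is applied first and the nucleus cutoff is then computed on the original (not renormalized) probabilities.

```python
import math
import random


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Return {token_id: prob} after top-k then nucleus filtering, renormalized.

    top_k=0 disables top-k; top_p=1.0 disables nucleus filtering.
    """
    # Rank token ids from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Nucleus: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, top_k=0, top_p=1.0, rng=random):
    """Draw one token id from the filtered distribution."""
    filtered = filter_top_k_top_p(softmax(logits), top_k, top_p)
    ids = list(filtered)
    weights = [filtered[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

With probabilities `[0.5, 0.3, 0.15, 0.05]`, `top_k=2` keeps only the first two tokens, while `top_p=0.9` keeps the first three, because the first two cover only 0.8 of the mass.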
Prerequisites: Docker and Docker Compose installed.
Clone the repository and create your environment file:

    cp .env.example .env

Start the database and API:

    docker compose up -d

Run database migrations:

    docker compose exec api alembic upgrade head

Download a corpus and begin training:

    python scripts/download_corpus.py
    python train.py

Once a run reaches completed status, open the frontend:

    cd frontend
    npm install
    npm run dev

Navigate to http://localhost:5173 to start generating.
All hyperparameters are configured through the Pydantic models in config/.
Key defaults:
| Parameter | Default |
|---|---|
| d_model | 256 |
| n_layers | 6 |
| n_heads | 8 |
| vocab_size | 8000 |
| max_seq_len | 256 |
| batch_size | 32 |
| max_steps | 5000 |
| learning rate | 3e-4 |
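
As a rough illustration of what these defaults imply, the sketch below derives the per-head dimension and tokens per step, and shows one common form of linear warmup followed by cosine decay. It uses a standard-library dataclass rather than the project's Pydantic models to stay dependency-free. The class name `Defaults`, the field name `learning_rate`, and the `warmup_steps` and `min_lr` values are assumptions for illustration and do not appear in this README.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Defaults:
    # Values copied from the table above.
    d_model: int = 256
    n_layers: int = 6
    n_heads: int = 8
    vocab_size: int = 8000
    max_seq_len: int = 256
    batch_size: int = 32
    max_steps: int = 5000
    learning_rate: float = 3e-4
    # Hypothetical values, not stated in the README.
    warmup_steps: int = 200
    min_lr: float = 0.0

    @property
    def head_dim(self) -> int:
        """Width of each attention head; d_model must divide evenly."""
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    @property
    def tokens_per_step(self) -> int:
        """Upper bound on tokens per optimizer step (full-length sequences)."""
        return self.batch_size * self.max_seq_len


def lr_at(step: int, cfg: Defaults) -> float:
    """Linear warmup to the peak rate, then cosine decay to min_lr."""
    if step < cfg.warmup_steps:
        return cfg.learning_rate * (step + 1) / cfg.warmup_steps
    decay_steps = max(1, cfg.max_steps - cfg.warmup_steps)
    progress = min((step - cfg.warmup_steps) / decay_steps, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return cfg.min_lr + (cfg.learning_rate - cfg.min_lr) * cosine
```

With the defaults, each head is 256 / 8 = 32 dimensions wide, and a step covers at most 32 × 256 = 8,192 tokens, or about 41 million tokens over 5,000 steps if every sequence is full length.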
To resume a stopped run, set RESUME_CHECKPOINT in .env to the checkpoint path.
Set WANDB_API_KEY in .env to enable Weights and Biases logging.
If the key is absent, training continues without remote logging.
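
The behaviour of these two variables can be sketched as follows. This is an assumption about how a training script might read them, not the project's actual code: the function name `load_run_settings` and the returned keys are hypothetical, and the sketch assumes the values from `.env` have already been exported into the process environment.

```python
import os


def load_run_settings(env=None):
    """Read optional run settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    # An empty or missing RESUME_CHECKPOINT means "start a fresh run".
    resume = env.get("RESUME_CHECKPOINT", "").strip() or None

    # Remote logging is enabled only when a non-empty API key is present.
    api_key = env.get("WANDB_API_KEY", "").strip()

    return {
        "resume_checkpoint": resume,
        "wandb_enabled": bool(api_key),
    }
```

For example, `load_run_settings({})` yields a fresh run with remote logging disabled, while passing both variables enables resuming and logging together.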
MIT