This guide walks through preparing a local development environment, running the automated checks, and working with the Rust acceleration layer that now powers the core runtime. It is the source of truth for contributor workflow details.
- Python 3.10+
- `pip` and a virtual environment tool of your choice (the examples below use `python -m venv`)
- A Rust toolchain (`rustup` or system packages) and `maturin` for compiling the PyO3 extensions
- Clone the repository and create an isolated environment:
  git clone https://github.com/osoleve/glitchlings.git
  cd glitchlings
  python -m venv .venv
  source .venv/bin/activate
- Install the package in editable mode with the development dependencies:
  pip install -e .[dev]
  Add the `prime` extra (`pip install -e .[dev,prime]`) when you need the Prime Intellect integration and its `verifiers` dependency.
- Install the git hooks so the shared formatting, linting, and type checks run automatically:
  pre-commit install
Execute the automated tests from the repository root:
pytest
The suite covers determinism guarantees, dataset integrations, and the compiled Rust implementation that now backs orchestration.
Run the shared quality gates before opening a pull request:
ruff check .
python -m mypy --config-file pyproject.toml src
uv build
pytest
- Rebuild the Rust extension after editing files under `rust/zoo/`:
  uv build -Uq
- Regenerate the CLI reference page, Monster Manual (both repo root and docs site copies), and glitchling gallery together with:
  python -m glitchlings.dev.docs
  # or, once installed:
  glitchlings-refresh-docs
All glitchlings support two base-class parameters for controlling which regions of text are corrupted:
`exclude_patterns`: A list of regex patterns marking text that must not be modified. Matched regions are treated as immutable and passed through unchanged.
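To make the "immutable region" behaviour concrete, here is a minimal sketch of exclusion masking in plain Python. It is not the library's implementation: `apply_outside_matches` and its `corrupt` callback are hypothetical names chosen for illustration, and `str.upper` stands in for a real corruption so the effect is easy to see.

```python
import re
from typing import Callable


def apply_outside_matches(
    text: str, patterns: list[str], corrupt: Callable[[str], str]
) -> str:
    """Hypothetical helper: run `corrupt` only on text no pattern matches."""
    if not patterns:
        # Nothing is protected, so the whole string is eligible.
        return corrupt(text)

    # Combine all patterns into one alternation of non-capturing groups.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))

    pieces: list[str] = []
    cursor = 0
    for match in combined.finditer(text):
        pieces.append(corrupt(text[cursor:match.start()]))  # mutable gap
        pieces.append(match.group(0))  # protected region, copied verbatim
        cursor = match.end()
    pieces.append(corrupt(text[cursor:]))  # trailing mutable text
    return "".join(pieces)


result = apply_outside_matches(
    "<h1>Welcome</h1> to the show!", [r"<[^>]+>"], str.upper
)
print(result)  # <h1>WELCOME</h1> TO THE SHOW!
```

The tags survive byte for byte while everything between them is transformed, which is the contract described for `exclude_patterns`.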
from glitchlings import Typogre
# Preserve HTML tags while corrupting surrounding text
typo = Typogre(rate=0.1, exclude_patterns=[r"<[^>]+>"])
typo("<h1>Welcome</h1> to the show!")
# -> "<h1>Welcoem</h1> to teh shwo!"
# Protect code blocks in Markdown
from glitchlings import Gaggle, Mim1c, Rushmore
gaggle = Gaggle(
[Mim1c(rate=0.02), Rushmore(rate=0.01)],
seed=404,
exclude_patterns=[r"```[\s\S]*?```", r"`[^`]+`"],
)
`include_only_patterns`: A list of regex patterns restricting corruption to matched regions only. Text outside these matches is treated as immutable.
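The inverse restriction can be sketched with `re.sub` and a callback. This is an illustration under assumptions rather than the library's code: `apply_inside_matches` is a hypothetical name, `str.upper` stands in for a real corruption, and treating an empty pattern list as "no restriction" is a guess about the intended semantics.

```python
import re
from typing import Callable


def apply_inside_matches(
    text: str, patterns: list[str], corrupt: Callable[[str], str]
) -> str:
    """Hypothetical helper: run `corrupt` only on text that a pattern matches."""
    if not patterns:
        # Assumption: with no patterns, the whole string is eligible.
        return corrupt(text)

    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    # re.sub leaves unmatched text untouched and rewrites each match
    # with whatever the callback returns.
    return combined.sub(lambda match: corrupt(match.group(0)), text)


restricted = apply_inside_matches(
    "Run `echo hello` to test", [r"`[^`]+`"], str.upper
)
print(restricted)  # Run `ECHO HELLO` to test
```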
from glitchlings import Typogre
# Only corrupt text inside backticks
typo = Typogre(rate=0.5, include_only_patterns=[r"`[^`]+`"])
typo("Run `echo hello` to test")
# -> "Run `ecoh helo` to test"
When patterns are set on a `Gaggle`, they apply to all member glitchlings and merge with any patterns set on individual glitchlings:
from glitchlings import Gaggle, Typogre, Mim1c
# Protect system tags for the entire roster
gaggle = Gaggle(
[Typogre(rate=0.02), Mim1c(rate=0.01)],
seed=404,
exclude_patterns=[r"<system>.*?</system>"],
)
On the command line, pass patterns directly in the glitchling specification:
glitchlings -g "Typogre(rate=0.1, exclude_patterns=['<[^>]+>'])" "<b>Bold</b> text"
The codebase follows a layered architecture that separates pure (deterministic, side-effect-free) code from impure (stateful, side-effectful) code and confines defensive coding to module boundaries rather than scattering it throughout. This pattern improves maintainability, testability, and clarity, especially when working with AI coding agents that tend to add defensive checks everywhere.
Pure functions:
- Return the same output given the same inputs
- Have no side effects (no IO, logging, or external state mutation)
- Do not manipulate RNG objects directly; they accept pre-computed random values instead
Impure code includes:
- File IO (configuration loading, cache reading/writing)
- Rust FFI calls via `get_rust_operation()`
- RNG state management (`random.Random` instantiation, seeding)
- Optional dependency imports (`compat.py` loaders)
- Global state access (`get_config()`, cached singletons)
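The split between the two categories can be sketched as a pure core plus a thin impure shell. The function names below (`swap_adjacent`, `swap_adjacent_seeded`) are hypothetical and do not exist in the codebase; the point is only that the pure function receives its random draws as data, while the shell owns the `random.Random` instance.

```python
import random


def swap_adjacent(text: str, draws: list[float], rate: float) -> str:
    """Pure core: swap neighbouring characters wherever a draw falls below rate.

    `draws` must contain one pre-computed value in [0, 1) per character.
    """
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if draws[i] < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the character that was just moved
        else:
            i += 1
    return "".join(chars)


def swap_adjacent_seeded(text: str, rate: float, seed: int) -> str:
    """Impure shell: creates and consumes the RNG, then delegates."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in text]
    return swap_adjacent(text, draws, rate)


# The pure core is trivially testable with hand-written draws.
print(swap_adjacent("abcd", [0.0, 0.9, 0.9, 0.9], rate=0.5))  # bacd
```

Because the core never touches an RNG object, its tests need no seeding or mocking, and the same inputs always produce the same output.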
The zoo subpackage organizes code by purity:
| Module | Type | Purpose |
|---|---|---|
| `zoo/validation.py` | Pure | Boundary validation, rate clamping, parameter normalization |
| `zoo/transforms.py` | Pure | Text tokenization, keyboard processing, string diffs, word splitting |
| `zoo/rng.py` | Pure | Seed resolution, hierarchical derivation |
| `compat/types.py` | Pure | Type definitions for optional dependency loading |
| `conf/types.py` | Pure | Configuration dataclasses (`RuntimeConfig`, `AttackConfig`) |
| `constants.py` | Pure | Centralized default values and constants |
| `attack/compose.py` | Pure | Result assembly helpers |
| `attack/encode.py` | Pure | Tokenization helpers |
| `attack/metrics_dispatch.py` | Pure | Metric dispatch logic |
| `internal/rust.py` | Impure | Low-level Rust FFI loader and primitives |
| `internal/rust_ffi.py` | Impure | Centralized Rust operation wrappers (preferred) |
| `compat/loaders.py` | Impure | Optional dependency lazy loading machinery |
| `conf/loaders.py` | Impure | Configuration file loading, caching, Gaggle construction |
Validation and defensive code belong at module boundaries where untrusted input enters:
- CLI argument parsing (`main.py`)
- Public API entry points (`Glitchling.__init__`, `Attack.__init__`)
- Configuration loaders (`conf/` module)
Core transformation functions inside these boundaries should:
- Trust that inputs are already validated
- NOT check for `None` on required parameters
- NOT re-validate types that the boundary already checked
- NOT add defensive `try`/`except` around trusted calls
# In validation.py (boundary layer)
import math

def validate_rate(rate: float | None) -> float:
    if rate is None:
        raise ValueError("rate cannot be None")
    if not isinstance(rate, (int, float)):
        raise TypeError("rate must be numeric")
    if math.isnan(rate):
        return 0.0
    # Clamp to the valid [0.0, 1.0] range
    return max(0.0, min(1.0, float(rate)))

# In typogre.py (uses boundary layer, trusts result)
def fatfinger(text: str, rate: float, ...) -> str:
    # rate is already validated - just use it
    return keyboard_typo_rust(
        text,
        rate,
        layout,
        seed,
        shift_slip_rate=slip_rate,
        shift_slip_exit_rate=slip_exit_rate,
        shift_map=shift_map,
    )

# DON'T: Re-validate everywhere
def some_pure_transform(text: str, rate: float) -> str:
    # Bad: re-validating what boundary should have checked
    if rate is None:
        raise ValueError("rate cannot be None")
    if not isinstance(rate, (int, float)):
        raise TypeError("rate must be numeric")
    if math.isnan(rate):
        rate = 0.0
    # ... actual logic
Pure modules must follow strict import rules:
- Pure modules can only import from:
  - Python standard library
  - Other pure modules
- Pure modules must NOT import:
  - `glitchlings.internal.rust`
  - `glitchlings.compat`
  - Any module that triggers side effects at import time
- Use `TYPE_CHECKING` guards for type-only imports:
  from typing import TYPE_CHECKING

  if TYPE_CHECKING:
      from glitchlings.zoo.core import Glitchling
The architecture is enforced by automated tests in tests/test_purity_architecture.py:
pytest tests/test_purity_architecture.py -v
These tests verify:
- Pure modules don't import impure modules
- Pure modules import only from the standard library and other pure modules
- All pure modules have docstrings documenting their purity guarantees
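A check of the first kind can be approximated with the standard `ast` module. The sketch below is not taken from `tests/test_purity_architecture.py`; the helper names are hypothetical, and the forbidden prefixes are simply the two modules named in the import rules. It ignores relative imports and skips anything guarded by `if TYPE_CHECKING:`.

```python
import ast

# Assumption: the two modules listed in the import rules are the forbidden set.
FORBIDDEN_PREFIXES = ("glitchlings.internal.rust", "glitchlings.compat")


def _is_type_checking_guard(node: ast.If) -> bool:
    """True for `if TYPE_CHECKING:` and `if typing.TYPE_CHECKING:`."""
    test = node.test
    if isinstance(test, ast.Name):
        return test.id == "TYPE_CHECKING"
    if isinstance(test, ast.Attribute):
        return test.attr == "TYPE_CHECKING"
    return False


def runtime_imports(source: str) -> set[str]:
    """Collect absolute module names imported outside TYPE_CHECKING blocks."""
    found: set[str] = set()

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.If) and _is_type_checking_guard(node):
            for child in node.orelse:  # the else branch still runs at runtime
                visit(child)
            return
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
            # `from pkg import sub` may import the submodule `pkg.sub`.
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return found


def forbidden_imports(source: str) -> set[str]:
    """Return the runtime imports that fall under a forbidden prefix."""
    return {
        name
        for name in runtime_imports(source)
        if any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES)
    }


SAMPLE = """
import math
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import glitchlings.compat

from glitchlings.internal import rust
"""
print(sorted(forbidden_imports(SAMPLE)))  # ['glitchlings.internal.rust']
```

The guarded `glitchlings.compat` import is ignored because it never executes at runtime, while the unguarded import of `glitchlings.internal.rust` is flagged.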
AI coding agents tend to add defensive checks everywhere. This architecture makes it explicit:
- If you're in a `pure/` or `transforms/` module: trust your inputs
- If you're at a boundary: validate thoroughly once
- If you're unsure: check which layer the file belongs to
This reduces noise in the codebase and keeps agent-written code consistent with human-written code.