NOMOS — AI Code Governance & Benchmark · v0.12

17 deterministic rules | 53K+ Reference Channel (RC) events | 2 languages | self-scanning

"Every benchmark tells you which model is strongest. NOMOS tells you which one can survive reality."

What NOMOS Is

NOMOS is a code governance engine. Its first domain is source code in C# and Python, but the architecture is domain-agnostic: any field where AI generates output and a deterministic verifier exists can plug into the same pipeline.
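To make the domain-agnostic claim concrete, here is a minimal sketch of what a pluggable verifier could look like, assuming a verifier is simply an object with a deterministic `check` method. `Finding`, `Verifier`, `JsonVerifier`, and `govern` are hypothetical names for illustration, not NOMOS's actual API; the JSON check stands in for an arbitrary non-code domain.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Finding:
    """One violation reported by a verifier (hypothetical schema)."""
    rule_id: str
    message: str


class Verifier(Protocol):
    """Hypothetical plug-in interface: same output in, same findings out."""

    def check(self, output: str) -> list[Finding]: ...


class JsonVerifier:
    """Example non-code domain: the AI output must be valid JSON."""

    def check(self, output: str) -> list[Finding]:
        try:
            json.loads(output)
        except json.JSONDecodeError as exc:
            return [Finding("json.valid", str(exc))]
        return []


def govern(outputs: Iterable[str], verifiers: list[Verifier]) -> list[list[Finding]]:
    """Run every verifier over every output, keeping results in input order."""
    return [[f for v in verifiers for f in v.check(out)] for out in outputs]


results = govern(['{"ok": true}', "{ok: true}"], [JsonVerifier()])
print(results[0])             # []
print(results[1][0].rule_id)  # json.valid
```

Because the pipeline only depends on the `check` contract, swapping the JSON check for a C# or Python rule set changes nothing upstream.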

Typical tool	NOMOS
Rules written once, never change	Rules evolve from Reference Channel data
Runs on your code, never on itself	`run.py` scans `dev/` — governance governed
Tests models, reports scores	RC events feed QLoRA fine-tuning
Single-language	Language profiles (C# via tree-sitter, Python via regex)
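The table notes that the Python profile is regex-based while C# goes through tree-sitter. As a rough illustration of a regex-level rule, the sketch below flags bare `except:` clauses line by line. The rule, its ID `PY-BARE-EXCEPT`, and the `Violation` record are assumptions, not one of NOMOS's 17 rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    rule_id: str
    line: int  # 1-based line number
    text: str


# Hypothetical rule for illustration: flag bare "except:" clauses.
BARE_EXCEPT = re.compile(r"^\s*except\s*:")


def scan_python(source: str, rule_id: str = "PY-BARE-EXCEPT") -> list[Violation]:
    """Check each line independently against the pattern."""
    return [
        Violation(rule_id, lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if BARE_EXCEPT.match(line)
    ]


src = "try:\n    risky()\nexcept:\n    pass\n\ntry:\n    risky()\nexcept ValueError:\n    pass\n"
for v in scan_python(src):
    print(v)  # Violation(rule_id='PY-BARE-EXCEPT', line=3, text='except:')
```

A line regex is fast and dependency-free, but it cannot tell code from text inside a multi-line string; that blind spot is the usual argument for an AST-based profile like the C# one.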

The Governance Engine (v0.1/dev/)

L0  Infrastructure    —  tree-sitter AST, RC storage (JSONL + SHA-256)
L1  LLM Gateway       —  multi-model routing, trace
L2  Rule Engine       —  17 rules (AST + regex + multi-file), multi-language
L3  Planning          —  scan strategy
L4  Analysis          —  constitution extraction, self-inspection
L5  Competition       —  shadow evaluation, threshold auto-tuning
L6  Builder           —  Agent instruction injection

Architecture design → | Implementation details →

The Benchmark Matrix (5 Dimensions)

Model	Python (80)	C# (30)	C# multi-file (8)	Fuzzy (20)	Pollution (10)
coder:6.7b	72%	70%	62%	100%*	80%
qwen:7b	80%	87%	88%	95%*	30%
v4-flash (cloud)	81%	100%	88%	100%*	100%

Pollution: 7 cross-domain instruction layers plus a 200-turn VS Code Agent context are injected into the prompt.
* Fuzzy prompts: all models pass, but they default to a single-file solution (f=1/N), which makes the output unusable for real engineering work.
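The pollution setup can be pictured as stacking unrelated instructions and a long agent transcript in front of the real task. The README does not say how the 7 layers and 200 turns are ordered or formatted, so `build_polluted_prompt` and its layout are a hypothetical sketch, and the layer and turn contents are placeholders.

```python
def build_polluted_prompt(
    task: str,
    instruction_layers: list[str],
    agent_turns: list[tuple[str, str]],
) -> str:
    """Place instruction layers, then prior agent turns, then the real task last."""
    parts = [f"[instructions {i}]\n{layer}" for i, layer in enumerate(instruction_layers, start=1)]
    parts += [f"{role}: {text}" for role, text in agent_turns]
    parts.append(f"user: {task}")
    return "\n\n".join(parts)


# Placeholder content standing in for the real cross-domain layers and agent history.
layers = [f"Cross-domain rule set {i}" for i in range(1, 8)]
turns = [("user" if i % 2 == 0 else "assistant", f"turn {i}") for i in range(200)]
prompt = build_polluted_prompt("Implement IRepository in C#.", layers, turns)
print(prompt.count("[instructions "))  # 7
```

Scoring the same task set with and without this prefix is what turns a clean score into a polluted one.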

Full leaderboard: v0.1/LEADERBOARD.md

Key Discoveries

Clean benchmarks lie. qwen: 100% clean → 30% polluted. You'd ship it. It'd crash.
Size ≠ resilience. The 6.7B coder holds 80% under pollution, while the slightly larger qwen:7b collapses to 30%.
Pollution quality > quantity. 100K same-domain noise = no effect. 15K cross-domain instructions = 0-30% destruction.
Multi-file is the real cliff. L2 two-file: all pass. L3 three-file+interface: coder drops to 0%.
Even a 1.3T model fails. v4-pro needed 13 human corrections across 4 tasks with a 392K context.
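The gap between clean and polluted scores can be summarized two ways: the absolute drop in percentage points and the share of the clean score that survives. The helper below is a hypothetical convenience, not part of the benchmark suite; it is applied to the qwen figures quoted above.

```python
def degradation(clean_pct: float, polluted_pct: float) -> dict[str, float]:
    """Absolute drop (percentage points) and retention (polluted / clean)."""
    if clean_pct <= 0:
        raise ValueError("clean score must be positive")
    return {
        "drop_pts": clean_pct - polluted_pct,
        "retention": polluted_pct / clean_pct,
    }


print(degradation(100, 30))  # {'drop_pts': 70, 'retention': 0.3}
```

Retention lets a model that starts lower but holds steady rank above one that starts high and collapses, which is exactly what a clean-only leaderboard hides.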

Quick Start

git clone https://github.com/guilingzhouyi-creator/NOMOS.git
cd NOMOS/v0.1/dev

# Scan Small_WarThunder (C#, auto-detect)
python run.py

# Scan Python codebase
python run.py --lang py

# Run benchmarks
cd .. && python _long_prompt_bench.py     # Python 80 problems
python _cs_bench.py                       # C# 30 problems
python _cs_multifile_bench.py             # C# multi-file L2-L4
python _pollution_bench.py                # Context pollution
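The first command scans a C# project with language auto-detection. The README does not document the detection logic; one simple approach, sketched here as an assumption, is to count source files by extension and pick the dominant profile. `detect_language`, the `PROFILES` mapping, and the `cs` profile name are hypothetical (`py` mirrors the `--lang py` flag above).

```python
from collections import Counter
from pathlib import Path

# Hypothetical extension-to-profile mapping.
PROFILES = {".cs": "cs", ".py": "py"}


def detect_language(root: Path, default: str = "cs") -> str:
    """Return the profile whose files are most common under root."""
    counts = Counter(
        PROFILES[p.suffix]
        for p in root.rglob("*")
        if p.is_file() and p.suffix in PROFILES
    )
    if not counts:
        return default
    return counts.most_common(1)[0][0]
```

Ties fall back to directory traversal order, so an explicit `--lang` flag should always override the guess.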

Repository Structure

v0.1/
├── dev/                        —  Governance engine (102 modules, 8.3K lines)
│   ├── run.py                  —  Entry point (@step pipeline, multi-language)
│   ├── l0/                     —  Infrastructure (AST, RC, config, MCP)
│   ├── l1/                     —  LLM gateway (router, client, trace)
│   ├── l2/                     —  Rule engine (17 rules, pipeline, constants)
│   ├── l3/ l4/ l5/ l6/         —  Higher-order governance layers
│   ├── rc/                     —  Batch scanners, GitHub integration
│   └── tools/                  —  Fingerprint, doc check, MCP server
├── .reference_channel/         —  53,000+ SHA-256 verified events
├── _*_bench.py                 —  5-dimension benchmark suite
├── _qlora_train.py             —  QLoRA fine-tuning pipeline
├── output/ train_data/         —  Generated artifacts & training pairs
└── LEADERBOARD.md              —  Full multi-model matrix
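`run.py` is described as an `@step` pipeline. The decorator's real signature is not shown in this README; the sketch below assumes the common registry pattern, where `@step` records functions in definition order and a runner threads a shared context dict through them. `step`, `STEPS`, `run_pipeline`, and the two example steps are hypothetical.

```python
from typing import Any, Callable

Context = dict[str, Any]
StepFn = Callable[[Context], Context]

# Registry of (name, function) pairs, filled at import time by @step.
STEPS: list[tuple[str, StepFn]] = []


def step(name: str) -> Callable[[StepFn], StepFn]:
    """Decorator that registers a function as a named pipeline stage."""
    def register(fn: StepFn) -> StepFn:
        STEPS.append((name, fn))
        return fn
    return register


def run_pipeline(ctx: Context) -> Context:
    """Pass the context through every registered step, recording a trace."""
    for name, fn in STEPS:
        ctx = fn(ctx)
        ctx.setdefault("trace", []).append(name)
    return ctx


@step("collect")
def collect(ctx: Context) -> Context:
    ctx["files"] = ["Player.cs", "Weapon.cs"]  # placeholder file list
    return ctx


@step("scan")
def scan(ctx: Context) -> Context:
    ctx["scanned"] = len(ctx["files"])  # stand-in for invoking the rule engine
    return ctx


result = run_pipeline({})
print(result["trace"])  # ['collect', 'scan']
```

Registering at decoration time keeps the pipeline order in one place, and the trace records which stages completed.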

Research Questions

How do models degrade under Agent-style context pollution?
Can a 7B local model beat cloud models when fine-tuned with compliance data?
What's the multi-file decoupling cliff — and which models survive it?
Can governance rules self-evolve from accumulated Reference Channel data?
Does the architecture generalize beyond code to other AI-output domains?

Citation

@software{NOMOS2026,
  author = {guilingzhouyi},
  title = {NOMOS: A Self-Improving AI Code Governance \& Benchmark System},
  year = {2026},
  url = {https://github.com/guilingzhouyi-creator/NOMOS}
}

License

MIT — use it, fork it, build on it.

Built by a university student and an LLM agent, who first discovered the problem and then built the solution.
