Changelog

All notable changes to this project are documented here. Format follows Keep a Changelog.

[1.0.0] - 2026-04-05

Added

Models: Complete Pydantic v2 models (TaskId, Action, Scenario, EpisodeResult, etc.)
Scenarios: 30 synthetic PR scenarios (10 per task) with realistic Python diffs
Env: Full episode state machine with noise budget, reward calculation, and history tracking
Graders:
- bug_grader.py: Coverage + precision + severity-weighted scoring
- security_grader.py: Severity-accuracy-weighted scoring (CRITICAL misclassification penalized)
- arch_grader.py: Binary issue detection + verdict scoring + detail quality bonus
Config: Pydantic-settings config with all options documented in .env.example
Database: SQLModel persistence (EpisodeRecord, LeaderboardRecord, helpers)
API Endpoints:
- GET /stats: Aggregate metrics across all recorded episodes
- GET /episodes/{id}/replay: Full action-by-action replay for completed episodes
- GET /episodes: List active episodes with metadata
- GET /dashboard: Web dashboard (dark theme, live leaderboard, WebSocket event feed, stats cards)
Security:
- Rate limiting via slowapi: 60 req/min per IP (configurable)
- API key authentication: optional, off by default, enabled via API_KEY_ENABLED=true
- TrustedHostMiddleware and security headers (XSS and frame protection)
Episode Lifecycle: Auto-cleanup of expired episodes, run every 5 minutes (default episode expiry: 1 hour)
Leaderboard: Paginated /leaderboard?limit=N&offset=M&task_id=X
Baseline Agent: Full rewrite with argparse CLI, KeywordAgent (35 rules), LLMAgent (Claude)
Evaluation: scripts/evaluate.py for batch evaluation of all 30 scenarios with summary report and progress bars
Testing: 155+ parametrized tests with full coverage reporting.
Dockerization: Multi-stage builder + production builds with non-root user security.
CI/CD: Unified 5-job pipeline (lint, test, validate, docker-build, publish to GHCR).
Branding: Full rebrand to CodeLens, including signature iconography.
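
The paginated leaderboard query above (limit, offset, task_id) could be sketched roughly as follows. This is a minimal illustration, not the project's actual handler: the Entry class, its fields, the paginate function, and the task names "bug" and "security" are all hypothetical stand-ins, and the real LeaderboardRecord may look different.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    # Hypothetical leaderboard row; the real LeaderboardRecord fields may differ.
    agent: str
    task_id: str
    score: float


def paginate(
    entries: List[Entry],
    limit: int = 10,
    offset: int = 0,
    task_id: Optional[str] = None,
) -> List[Entry]:
    """Filter by task_id (if given), sort by score descending, then slice."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    rows = [e for e in entries if task_id is None or e.task_id == task_id]
    rows.sort(key=lambda e: e.score, reverse=True)
    # Slicing past the end of the list simply yields fewer (or zero) rows.
    return rows[offset:offset + limit]


# Usage: second and third best entries for the (assumed) "bug" task.
entries = [
    Entry("a", "bug", 0.9),
    Entry("b", "bug", 0.5),
    Entry("c", "security", 0.7),
    Entry("d", "bug", 0.8),
]
page = paginate(entries, limit=2, offset=1, task_id="bug")
```

Filtering before slicing keeps offset and limit relative to the filtered view, which is usually what a caller of /leaderboard?task_id=X would expect.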

Fixed

CLI: Fixed port mismatch in baseline.py (8000 → 7860); added --url, --task, and --seed flags.
Crash Fixes: Leaderboard submit crashed after list slicing; the rank is now captured before the slice.
WebSocket: Disconnect now handled with typed WebSocketDisconnect and clients.discard().
Metadata: Incoherent weight structure in openenv.yaml replaced with named, accurate pairs.
Security: Implemented TrustedHostMiddleware and hardened headers.
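
The leaderboard fix in this list (capturing the rank before the slice) can be illustrated with a small sketch. It shows one plausible shape of the bug, assuming the board is trimmed to a top-N list after each submit. The submit function, the dict layout, and top_n are hypothetical and not taken from the project's code.

```python
from typing import Dict, List, Tuple


def submit(
    board: List[Dict], name: str, score: float, top_n: int = 10
) -> Tuple[int, List[Dict]]:
    """Add an entry and return (rank, trimmed_board)."""
    entry = {"name": name, "score": score}
    ranked = sorted(board + [entry], key=lambda e: e["score"], reverse=True)
    # Capture the 1-based rank BEFORE trimming. If the lookup ran on the
    # sliced list instead, a low-scoring entry would already be gone and
    # the lookup would fail.
    rank = next(i for i, e in enumerate(ranked) if e is entry) + 1
    return rank, ranked[:top_n]


# Usage: the new entry falls outside the top 2 but still gets a rank.
board = [{"name": "a", "score": 0.9}, {"name": "b", "score": 0.8}]
rank, top = submit(board, "c", 0.1, top_n=2)
```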

[0.1.0] - Initial Baseline Fork

Initial FastAPI skeleton.
In-memory episode storage.
Basic Dockerfile and Pylint-only CI.
