Learner and developer focused on AI safety and agent reliability. I'm teaching myself to build tools that hold AI systems to the same bar we'd hold an engineer to: verify before you claim "done."
Founder & Operator, Onslaught Gaming LLC.
What I'm building: Nemesis
A Python evaluation harness that turns real, observed AI-agent failure modes into automated detectors. When an agent reports success, Nemesis checks whether it actually verified the work (the tests, the files, the repository state) and reports the truth, with evidence.
- 20 detectors grounded in a documented failure-mode catalog
- Ships three ways: a CLI, a GitHub Action, and a pip package
- Tokenless OIDC publishing, CodeQL + dependency scanning, full test suite, green CI
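
To illustrate the idea of checking a claim against real state, here is a minimal sketch of what one detector could look like. It is not Nemesis's actual API: the names `Finding` and `check_claimed_files` are hypothetical, and the sketch assumes the agent's claim arrives as a list of relative file paths it says it wrote.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Finding:
    """One detector result (hypothetical shape, for illustration only)."""
    detector: str
    passed: bool
    evidence: str


def check_claimed_files(claimed_files, root):
    """Compare files an agent claims to have written against the disk.

    A claim passes only if the file exists under `root` and is non-empty.
    The evidence string records what was actually observed.
    """
    findings = []
    for rel in claimed_files:
        path = Path(root) / rel
        if not path.is_file():
            findings.append(Finding("claimed_file", False, f"{rel}: not found"))
            continue
        size = path.stat().st_size
        if size == 0:
            findings.append(Finding("claimed_file", False, f"{rel}: empty file"))
        else:
            findings.append(Finding("claimed_file", True, f"{rel}: {size} bytes"))
    return findings
```

Calling `check_claimed_files(["src/app.py"], ".")` returns one `Finding` per claimed path, so a report can show the evidence next to each pass or fail instead of repeating the agent's own summary.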
I'm building Nemesis in the open partly to learn: the best way to understand how an AI-safety harness works is to build one that actually runs.
- Evidence over assertion: verify the real state, never trust the transcript
- Test-driven, small reviewable PRs, CI green before anything ships
- Learning in public, with clean docs and reproducible builds
Python · pytest · pre-commit · ruff · black · GitHub Actions · hatchling / packaging · Git
Going deep on Python and AI-safety tooling, and open to opportunities where rigor and correctness matter.
- LinkedIn: https://linkedin.com/in/luis-betancourt-39377b302
- Site: https://onslaughtgaming.carrd.co



