PITHOS is a Python/uv port of Anthropic's Mythos reference harness for Pi. It runs
authorized repository security research through the pi coding agent, combines
deterministic dependency advisory lookup with static exploit research, and can
optionally replay findings against a disposable live runtime.
dependency advisories -> threat model -> static scan -> autoresearch -> triage -> verification
Static review is read-only. PITHOS does not build, test, start services, run
migrations, or execute target application code unless live verification is
explicitly enabled with --execute-app and a safe runtime profile.
- Python 3.11+
- uv
- pi coding agent access and provider credentials
- Docker for the default static/runtime sandbox mode
- Git for local or GitHub repository sources
Optional integrations:
- GH_TOKEN or GITHUB_TOKEN for private HTTPS GitHub repositories
- FIRECRAWL_API_KEY for public web/advisory context with --web
- Superagent-started Daytona Computer Use for --computer-use daytona
uv tool install git+https://github.com/superagent-ai/PITHOS.git

# default provider is azure-openai-responses
export AZURE_OPENAI_API_KEY=...
export AZURE_OPENAI_BASE_URL=https://<resource>.openai.azure.com

Optional:
export GH_TOKEN=... # private HTTPS GitHub repos
export FIRECRAWL_API_KEY=... # optional web search
export PITHOS_PROVIDER=openai # any Pi provider
export PITHOS_MODEL=gpt-5.1
export PITHOS_SANDBOX_MODE=docker # docker (default) or local
export PITHOS_COMPUTER_USE=none # none (default) or daytona

pithos doctor
pithos doctor --provider openai
pithos run /path/to/repo --model gpt-5.5
pithos run git@github.com:owner/repo.git --model gpt-5.5
pithos run /path/to/repo --provider anthropic --model claude-sonnet-4-5
pithos run /path/to/repo --provider openai --model gpt-5.1
pithos run /path/to/repo --provider amazon-bedrock --model us.anthropic.claude-sonnet-4-20250514-v1:0
pithos run /path/to/repo --provider custom-proxy --model my-model --pi-config-dir ~/.pi/agent
GH_TOKEN=... pithos run https://github.com/owner/private-repo --model gpt-5.5
pithos run /path/to/repo --model gpt-5.5 --no-advisories
FIRECRAWL_API_KEY=... pithos run git@github.com:owner/repo.git --model gpt-5.5 --web
pithos run /path/to/repo --model gpt-5.5 --sandbox-mode local

Static scan agents use --sandbox-mode docker by default. Use --sandbox-mode local only inside a disposable outer environment such as Daytona, an ephemeral
CI worker, or a throwaway VM; local mode runs pi directly and does not provide
PITHOS-managed process isolation.
PITHOS static runs are designed to preserve evidence while staying read-only:
- DEPENDENCY-ADVISORIES.json/.md record exact-version OSV matches.
- THREAT_MODEL.md maps trust boundaries, auth surfaces, data stores, and risky integrations.
- RESEARCH-BRIEF.md and RESEARCH-SEEDS.json seed autoresearch with entry points, invariants, prior art, research targets, and negative space.
- INVARIANTS.json, EXECUTION-PATHS.json, and INVARIANT-ATTEMPTS.json model deeper security properties and supported or blocked evidence paths.
- EXPLOIT-PRIMITIVES.json, DAISY-CHAINS.json, and EXPLOIT-CHAINS.json capture supported attacker capabilities and possible chains.
- run-summary.json includes additive autoresearch.coverage accounting for covered targets, uncovered targets, blocked attempts, and structured harness plans.
Findings may include a structured recommended_verification object. This is a
static replay plan, not execution: it names the persona, setup, entrypoint,
payload ideas, expected vulnerable/safe signals, artifacts to capture, and
blockers for later runtime verification.
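To make the shape of such a plan concrete, here is a minimal sketch of how a consumer might model it. The field names below are guesses derived from the prose description, not the actual JSON keys PITHOS emits, and `is_replayable` is a hypothetical convenience method.

```python
from dataclasses import dataclass, field


@dataclass
class RecommendedVerification:
    """Static replay plan attached to a finding (field names are assumed)."""

    persona: str
    setup: list[str]
    entrypoint: str
    payload_ideas: list[str]
    vulnerable_signals: list[str]
    safe_signals: list[str]
    artifacts_to_capture: list[str]
    blockers: list[str] = field(default_factory=list)

    def is_replayable(self) -> bool:
        # A plan with outstanding blockers cannot be replayed yet.
        return not self.blockers
```

Keeping blockers as data instead of dropping blocked plans lets a later runtime pass explain why a finding was left unverified.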
Live verification is opt-in. PITHOS first needs a runtime profile that describes
how to install/start/seed the target app and which env vars, personas, services,
mocks, and replay harnesses are safe to use. Local live verification defaults to
environment.sandbox: docker, so install/start/healthcheck and profile-declared
verifier commands run inside a Docker container, not on the host checkout.
Generate a starter profile:
pithos runtime init /path/to/repo
pithos runtime init /path/to/repo --dry-run

The initializer inspects package scripts, lockfiles, .cursor/environment.json,
.env.example, Supabase migrations, Dockerfile hints, seed scripts, and common
integrations. It writes variable names only. It intentionally ignores
.env.local.
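The "variable names only" behavior can be illustrated with a short sketch that extracts names from a .env.example-style file and discards the values. This is not the initializer's actual code; `env_var_names` and the parsing rules are assumptions made for illustration.

```python
import re

# Matches "NAME=..." or "export NAME=..." at the start of a line.
_ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def env_var_names(text: str) -> list[str]:
    """Return variable names in first-seen order, never their values."""
    names: list[str] = []
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names
```

Comment lines and blank lines never match the pattern, so only assignment names survive, and nothing to the right of `=` is retained.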
Review .pithos/runtime.yaml, then export every required variable explicitly:
export NEXT_PUBLIC_SUPABASE_URL=...
export NEXT_PUBLIC_SUPABASE_ANON_KEY=...
export SUPABASE_SERVICE_ROLE_KEY=...
export PITHOS_ATTACKER_TOKEN=...
export PITHOS_VICTIM_TOKEN=...

Run with live verification enabled:
pithos run /path/to/repo \
  --provider google \
  --model gemini-2.5-pro \
  --execute-app \
  --runtime-profile /path/to/repo/.pithos/runtime.yaml

When live verification is enabled, PITHOS can:
- run deterministic source oracles for supported finding classes,
- probe route-shaped findings with an anonymous and persona-aware HTTP matrix,
- replay structured recommended_verification harness plans through the live agent or a profile-declared verification.agent_command,
- write per-finding replay artifacts such as harness-plan.json, http-matrix.json, coverage.json, and replay.sh,
- record runtime evidence coverage in verify/runtime-summary.json.
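The anonymous and persona-aware HTTP matrix can be pictured as one request row per identity for the same route. The sketch below only builds the rows and sends nothing. It assumes that each persona's `auth_header_env` variable holds a complete `Authorization` header value, which the profile format does not confirm, and `build_http_matrix` is a hypothetical name.

```python
from collections.abc import Mapping


def build_http_matrix(
    route: str,
    personas: Mapping[str, Mapping[str, str]],
    env: Mapping[str, str],
) -> list[dict]:
    """Return one row per identity: anonymous first, then each persona."""
    rows = [{"persona": "anonymous", "route": route, "headers": {}}]
    for name, config in personas.items():
        token = env.get(config["auth_header_env"])
        if not token:
            # Skip personas whose credential was not exported.
            continue
        rows.append(
            {"persona": name, "route": route, "headers": {"Authorization": token}}
        )
    return rows
```

Comparing responses across rows is what exposes access-control gaps: a route that returns the same data to the anonymous, attacker, and victim rows deserves a closer look.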
When Superagent has already started Daytona Computer Use and enabled recordings for the outer sandbox, PITHOS can require the live Pi verification agent to use the local Daytona Toolbox Computer Use API through a bundled Pi skill:
pithos run /path/to/repo \
  --execute-app \
  --sandbox-mode local \
  --computer-use daytona

PITHOS loads pithos/skills/daytona-computer-use for the live Pi agent. The
skill documents http://127.0.0.1:2280/computeruse/... endpoints for status,
display/window inspection, screenshots, mouse/keyboard control, accessibility,
and recordings. PITHOS does not start Daytona, Xvfb, XFCE, x11vnc, or noVNC,
and it does not create wrappers, run /usr/local/bin/daytona, kill Daytona
processes, or restart desktop/computer-use services. The skill tells live agents
to avoid Daytona lifecycle operations, raw VNC/X11 tools, and stop/start/restart
Computer Use endpoints. Normal repo/app commands are still allowed, but desktop
evidence must come from the Toolbox API. Multimodal Pi models such as GPT-5.5
can inspect saved screenshot artifacts directly; agents should fall back to
textual UI evidence such as /computeruse/a11y/tree, DOM output, terminal logs,
and app responses when image inspection is unavailable or ambiguous.
If setup is incomplete, PITHOS writes verify/RUNTIME-SETUP.md with the missing
env vars, personas, mocks, or profile fields. Live verification does not load
.env.local or other env files; secrets must come from the process environment
or your secret manager.
stack:
  - node
environment:
  sandbox: docker
install: pnpm install
start: pnpm dev --hostname 0.0.0.0
healthcheck: http://127.0.0.1:3000/
seed: pnpm run seed
env:
  EXAMPLE_API_KEY: env:EXAMPLE_API_KEY
required_env:
  - EXAMPLE_API_KEY
  - PITHOS_ATTACKER_TOKEN
  - PITHOS_VICTIM_TOKEN
personas:
  attacker:
    auth_header_env: PITHOS_ATTACKER_TOKEN
  victim:
    auth_header_env: PITHOS_VICTIM_TOKEN
mocks:
  stripe: true
verification:
  execute_app: true
  coverage:
    record_unverified_surfaces: true
    capture_request_response_artifacts: true
  harness:
    write_replay_artifacts: true
    prefer_persona_matrix: true
notes: Use mocked external services only.

Artifacts are written under results/<repo>/<timestamp>/:
- THREAT_MODEL.md
- DEPENDENCY-ADVISORIES.json
- DEPENDENCY-ADVISORIES.md
- RESEARCH-BRIEF.md
- RESEARCH-SEEDS.json
- INVARIANTS.json
- EXECUTION-PATHS.json
- INVARIANT-ATTEMPTS.json
- RESEARCH-HYPOTHESES.json
- RESEARCH-ATTEMPTS.json
- EXPLOIT-PRIMITIVES.json
- DAISY-PRIMITIVES.json
- CHAIN-CANDIDATES.json
- DAISY-CHAINS.json
- EXPLOIT-CHAINS.json
- RESEARCH-FINDINGS.json
- VULN-FINDINGS.json
- VULN-FINDINGS.md
- TRIAGE.json
- TRIAGE.md
- RUN.md
- run-summary.json
- verify/environment-summary.json
- verify/RUNTIME-SETUP.md when live setup is incomplete
- verify/runtime-summary.json
- verify/<finding-id>/plan.json
- verify/<finding-id>/probe.json
- verify/<finding-id>/verdict.json
- verify/<finding-id>/VERDICT.md
- verify/<finding-id>/http-matrix.json for HTTP/API matrix probes
- verify/<finding-id>/harness-plan.json, coverage.json, and replay.sh when structured harness replay is available
- verify/<finding-id>/live-agent-transcript.jsonl when the Pi live-agent path is used
- verify/<finding-id>/agent-command.json when verification.agent_command is used
Each VULN-FINDINGS.json finding includes a github_advisory object shaped as
a GitHub repository security advisory draft, with summary, description,
severity, cwe_ids, vulnerabilities, and optional CVE/CVSS/credit fields.
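Before filing a draft, it can help to confirm that a finding's advisory carries the non-optional fields named above. This is a minimal sketch, assuming a finding is a plain dictionary loaded from VULN-FINDINGS.json; `missing_advisory_fields` is a hypothetical helper and does not validate field contents.

```python
REQUIRED_ADVISORY_FIELDS = (
    "summary",
    "description",
    "severity",
    "cwe_ids",
    "vulnerabilities",
)


def missing_advisory_fields(finding: dict) -> list[str]:
    """Return required github_advisory fields absent from a finding."""
    advisory = finding.get("github_advisory") or {}
    return [name for name in REQUIRED_ADVISORY_FIELDS if name not in advisory]
```

An empty list means the draft has every required key. The optional CVE, CVSS, and credit fields are deliberately not checked.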
- Target repositories are mounted read-only in Docker mode. Local sandbox mode uses a disposable copied workspace and relies on outer isolation.
- GitHub URL sources use full clones, not shallow clones.
- PITHOS passes --provider and --model through to Pi.
- Private GitHub tokens are not written into remotes or summaries.
- Dependency advisories are checked by default through OSV for exact versions PITHOS can identify; use --no-advisories to skip this stage.
- Firecrawl is installed only for --web runs and is optional model web context. Advisory lookup does not depend on Firecrawl.
- Static-only runs report verification as completed_static_only when runtime probes are intentionally disabled.
- Live verification runs target code only in a disposable copied workspace, never in the original checkout.