superagent-ai/PITHOS

PITHOS

PITHOS is a Python/uv port of Anthropic's Mythos reference harness for Pi. It runs authorized repository security research through the pi coding agent, combines deterministic dependency advisory lookup with static exploit research, and can optionally replay findings against a disposable live runtime.

dependency advisories -> threat model -> static scan -> autoresearch -> triage -> verification

Static review is read-only. PITHOS does not build, test, start services, run migrations, or execute target application code unless live verification is explicitly enabled with --execute-app and a safe runtime profile.

Prerequisites

  • Python 3.11+
  • uv
  • pi coding agent access and provider credentials
  • Docker for the default static/runtime sandbox mode
  • Git for local or GitHub repository sources

Optional integrations:

  • GH_TOKEN or GITHUB_TOKEN for private HTTPS GitHub repositories
  • FIRECRAWL_API_KEY for public web/advisory context with --web
  • Superagent-started Daytona Computer Use for --computer-use daytona

Install

uv tool install git+https://github.com/superagent-ai/PITHOS.git

Configure

# default provider is azure-openai-responses
export AZURE_OPENAI_API_KEY=...
export AZURE_OPENAI_BASE_URL=https://<resource>.openai.azure.com

Optional:

export GH_TOKEN=...              # private HTTPS GitHub repos
export FIRECRAWL_API_KEY=...     # optional web search
export PITHOS_PROVIDER=openai    # any Pi provider
export PITHOS_MODEL=gpt-5.1
export PITHOS_SANDBOX_MODE=docker # docker (default) or local
export PITHOS_COMPUTER_USE=none   # none (default) or daytona
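
The defaults noted in the comments above can be mirrored in a small resolver. This is a minimal sketch, not PITHOS's actual configuration code: the function name resolve_settings and the returned dictionary keys are assumptions made for illustration, and because no built-in default model is documented here, the model stays unset unless PITHOS_MODEL is exported.

```python
import os
from typing import Mapping, Optional

# Defaults taken from the comments in the Configure section.
# The dictionary keys are illustrative, not PITHOS's internal names.
DEFAULTS = {
    "provider": "azure-openai-responses",
    "sandbox_mode": "docker",
    "computer_use": "none",
}

# Values the Configure section lists for each option.
ALLOWED = {
    "sandbox_mode": {"docker", "local"},
    "computer_use": {"none", "daytona"},
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve PITHOS_* variables against the documented defaults."""
    env = os.environ if env is None else env
    settings = {
        "provider": env.get("PITHOS_PROVIDER", DEFAULTS["provider"]),
        "model": env.get("PITHOS_MODEL"),  # no default assumed
        "sandbox_mode": env.get("PITHOS_SANDBOX_MODE", DEFAULTS["sandbox_mode"]),
        "computer_use": env.get("PITHOS_COMPUTER_USE", DEFAULTS["computer_use"]),
    }
    for key, allowed in ALLOWED.items():
        if settings[key] not in allowed:
            raise ValueError(
                f"{key} must be one of {sorted(allowed)}, got {settings[key]!r}"
            )
    return settings
```

Passing an explicit mapping instead of relying on os.environ keeps the sketch easy to exercise without touching the real process environment.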

Run

pithos doctor
pithos doctor --provider openai
pithos run /path/to/repo --model gpt-5.5
pithos run git@github.com:owner/repo.git --model gpt-5.5
pithos run /path/to/repo --provider anthropic --model claude-sonnet-4-5
pithos run /path/to/repo --provider openai --model gpt-5.1
pithos run /path/to/repo --provider amazon-bedrock --model us.anthropic.claude-sonnet-4-20250514-v1:0
pithos run /path/to/repo --provider custom-proxy --model my-model --pi-config-dir ~/.pi/agent
GH_TOKEN=... pithos run https://github.com/owner/private-repo --model gpt-5.5
pithos run /path/to/repo --model gpt-5.5 --no-advisories
FIRECRAWL_API_KEY=... pithos run git@github.com:owner/repo.git --model gpt-5.5 --web
pithos run /path/to/repo --model gpt-5.5 --sandbox-mode local

Static scan agents use --sandbox-mode docker by default. Use --sandbox-mode local only inside a disposable outer environment such as Daytona, an ephemeral CI worker, or a throwaway VM; local mode runs pi directly and does not provide PITHOS-managed process isolation.

Static Research

PITHOS static runs are designed to preserve evidence while staying read-only:

  • DEPENDENCY-ADVISORIES.json / .md record exact-version OSV matches.
  • THREAT_MODEL.md maps trust boundaries, auth surfaces, data stores, and risky integrations.
  • RESEARCH-BRIEF.md and RESEARCH-SEEDS.json seed autoresearch with entry points, invariants, prior art, research targets, and negative space.
  • INVARIANTS.json, EXECUTION-PATHS.json, and INVARIANT-ATTEMPTS.json model deeper security properties and supported or blocked evidence paths.
  • EXPLOIT-PRIMITIVES.json, DAISY-CHAINS.json, and EXPLOIT-CHAINS.json capture supported attacker capabilities and possible chains.
  • run-summary.json includes additive autoresearch.coverage accounting for covered targets, uncovered targets, blocked attempts, and structured harness plans.

Findings may include a structured recommended_verification object. This is a static replay plan, not execution: it names the persona, setup, entrypoint, payload ideas, expected vulnerable/safe signals, artifacts to capture, and blockers for later runtime verification.
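
To make the shape of such a plan concrete, here is a hedged sketch of how a consumer might model it. The class name, every field name, and the ready_for_runtime helper are hypothetical; only the categories of information (persona, setup, entrypoint, payload ideas, vulnerable/safe signals, artifacts, blockers) come from the description above, and the real JSON keys may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RecommendedVerification:
    """Static replay plan attached to a finding (field names are hypothetical)."""

    persona: str
    entrypoint: str
    setup: List[str] = field(default_factory=list)
    payload_ideas: List[str] = field(default_factory=list)
    vulnerable_signals: List[str] = field(default_factory=list)
    safe_signals: List[str] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)
    blockers: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "RecommendedVerification":
        # Ignore unknown keys so the sketch tolerates schema differences.
        known = set(cls.__dataclass_fields__)
        return cls(**{k: v for k, v in raw.items() if k in known})

    @property
    def ready_for_runtime(self) -> bool:
        # A plan that lists blockers still needs setup work before replay.
        return not self.blockers
```

Nothing in this model executes anything, which matches the point of the object: it describes a later replay rather than performing one.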

Live Verification

Live verification is opt-in. PITHOS first needs a runtime profile that describes how to install/start/seed the target app and which env vars, personas, services, mocks, and replay harnesses are safe to use. Local live verification defaults to environment.sandbox: docker, so install/start/healthcheck and profile-declared verifier commands run inside a Docker container, not on the host checkout.

Generate a starter profile:

pithos runtime init /path/to/repo
pithos runtime init /path/to/repo --dry-run

The initializer inspects package scripts, lockfiles, .cursor/environment.json, .env.example, Supabase migrations, Dockerfile hints, seed scripts, and common integrations. It writes variable names only, never values, and it intentionally ignores .env.local.

Review .pithos/runtime.yaml, then export every required variable explicitly:

export NEXT_PUBLIC_SUPABASE_URL=...
export NEXT_PUBLIC_SUPABASE_ANON_KEY=...
export SUPABASE_SERVICE_ROLE_KEY=...
export PITHOS_ATTACKER_TOKEN=...
export PITHOS_VICTIM_TOKEN=...

Run with live verification enabled:

pithos run /path/to/repo \
  --provider google \
  --model gemini-2.5-pro \
  --execute-app \
  --runtime-profile /path/to/repo/.pithos/runtime.yaml

When live verification is enabled, PITHOS can:

  • run deterministic source oracles for supported finding classes,
  • probe route-shaped findings with an anonymous and persona-aware HTTP matrix,
  • replay structured recommended_verification harness plans through the live agent or a profile-declared verification.agent_command,
  • write per-finding replay artifacts such as harness-plan.json, http-matrix.json, coverage.json, and replay.sh,
  • record runtime evidence coverage in verify/runtime-summary.json.
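
The anonymous and persona-aware HTTP matrix mentioned above can be pictured as one request per persona plus an unauthenticated baseline. The following is an illustrative sketch only: build_http_matrix, the row layout, and the default base URL are assumptions, and it further assumes that each persona's auth_header_env variable holds a complete Authorization header value. It builds request descriptions and sends nothing.

```python
from typing import Dict, List, Mapping, Optional


def build_http_matrix(
    method: str,
    path: str,
    personas: Mapping[str, Mapping[str, str]],
    env: Mapping[str, str],
    base_url: str = "http://127.0.0.1:3000",
) -> List[Dict[str, object]]:
    """Return one request spec per persona, plus an anonymous baseline."""
    rows: List[Dict[str, object]] = [
        {"persona": "anonymous", "method": method, "url": base_url + path, "headers": {}}
    ]
    for name, cfg in personas.items():
        token: Optional[str] = env.get(cfg["auth_header_env"])
        if token is None:
            # Skip personas whose credentials were not exported.
            continue
        rows.append(
            {
                "persona": name,
                "method": method,
                "url": base_url + path,
                # Assumption: the variable holds the full Authorization value.
                "headers": {"Authorization": token},
            }
        )
    return rows
```

Comparing the responses across rows (for example, an attacker row that receives the same body as the victim row) is what turns a route-shaped finding into runtime evidence.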

When Superagent has already started Daytona Computer Use and enabled recordings for the outer sandbox, PITHOS can require the live Pi verification agent to use the local Daytona Toolbox Computer Use API through a bundled Pi skill:

pithos run /path/to/repo \
  --execute-app \
  --sandbox-mode local \
  --computer-use daytona

PITHOS loads pithos/skills/daytona-computer-use for the live Pi agent. The skill documents http://127.0.0.1:2280/computeruse/... endpoints for status, display/window inspection, screenshots, mouse/keyboard control, accessibility, and recordings.

PITHOS does not start Daytona, Xvfb, XFCE, x11vnc, or noVNC, and it does not create wrappers, run /usr/local/bin/daytona, kill Daytona processes, or restart desktop/computer-use services. The skill tells live agents to avoid Daytona lifecycle operations, raw VNC/X11 tools, and stop/start/restart Computer Use endpoints. Normal repo/app commands are still allowed, but desktop evidence must come from the Toolbox API.

Multimodal Pi models such as GPT-5.5 can inspect saved screenshot artifacts directly; agents should fall back to textual UI evidence such as /computeruse/a11y/tree, DOM output, terminal logs, and app responses when image inspection is unavailable or ambiguous.

If setup is incomplete, PITHOS writes verify/RUNTIME-SETUP.md with the missing env vars, personas, mocks, or profile fields. Live verification does not load .env.local or other env files; secrets must come from the process environment or your secret manager.

Runtime Profile Example

stack:
  - node
environment:
  sandbox: docker
  install: pnpm install
  start: pnpm dev --hostname 0.0.0.0
  healthcheck: http://127.0.0.1:3000/
seed: pnpm run seed
env:
  EXAMPLE_API_KEY: env:EXAMPLE_API_KEY
required_env:
  - EXAMPLE_API_KEY
  - PITHOS_ATTACKER_TOKEN
  - PITHOS_VICTIM_TOKEN
personas:
  attacker:
    auth_header_env: PITHOS_ATTACKER_TOKEN
  victim:
    auth_header_env: PITHOS_VICTIM_TOKEN
mocks:
  stripe: true
verification:
  execute_app: true
  coverage:
    record_unverified_surfaces: true
    capture_request_response_artifacts: true
  harness:
    write_replay_artifacts: true
    prefer_persona_matrix: true
  notes: Use mocked external services only.
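
Because live verification reads secrets only from the process environment, it can help to check a profile against the environment before a run. This is a rough sketch under stated assumptions: missing_env and the PROFILE dictionary are hypothetical, PROFILE simply mirrors the required_env and personas keys from the example above as an already-parsed mapping, and PITHOS performs its own check when it writes verify/RUNTIME-SETUP.md.

```python
import os
from typing import Any, List, Mapping, Optional

# Parsed form of the relevant keys from the example profile above.
PROFILE = {
    "required_env": ["EXAMPLE_API_KEY", "PITHOS_ATTACKER_TOKEN", "PITHOS_VICTIM_TOKEN"],
    "personas": {
        "attacker": {"auth_header_env": "PITHOS_ATTACKER_TOKEN"},
        "victim": {"auth_header_env": "PITHOS_VICTIM_TOKEN"},
    },
}


def missing_env(
    profile: Mapping[str, Any], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """List variables the profile needs that are unset or empty."""
    env = os.environ if env is None else env
    needed = list(profile.get("required_env", []))
    # Persona credentials are also read from the environment.
    for persona in profile.get("personas", {}).values():
        name = persona.get("auth_header_env")
        if name and name not in needed:
            needed.append(name)
    return [name for name in needed if not env.get(name)]
```

Calling missing_env(PROFILE) with no second argument checks the current process environment; an empty result means every variable named in the profile is exported and non-empty.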

Output

Artifacts are written under results/<repo>/<timestamp>/:

  • THREAT_MODEL.md
  • DEPENDENCY-ADVISORIES.json
  • DEPENDENCY-ADVISORIES.md
  • RESEARCH-BRIEF.md
  • RESEARCH-SEEDS.json
  • INVARIANTS.json
  • EXECUTION-PATHS.json
  • INVARIANT-ATTEMPTS.json
  • RESEARCH-HYPOTHESES.json
  • RESEARCH-ATTEMPTS.json
  • EXPLOIT-PRIMITIVES.json
  • DAISY-PRIMITIVES.json
  • CHAIN-CANDIDATES.json
  • DAISY-CHAINS.json
  • EXPLOIT-CHAINS.json
  • RESEARCH-FINDINGS.json
  • VULN-FINDINGS.json
  • VULN-FINDINGS.md
  • TRIAGE.json
  • TRIAGE.md
  • RUN.md
  • run-summary.json
  • verify/environment-summary.json
  • verify/RUNTIME-SETUP.md when live setup is incomplete
  • verify/runtime-summary.json
  • verify/<finding-id>/plan.json
  • verify/<finding-id>/probe.json
  • verify/<finding-id>/verdict.json
  • verify/<finding-id>/VERDICT.md
  • verify/<finding-id>/http-matrix.json for HTTP/API matrix probes
  • verify/<finding-id>/harness-plan.json, coverage.json, and replay.sh when structured harness replay is available
  • verify/<finding-id>/live-agent-transcript.jsonl when the Pi live-agent path is used
  • verify/<finding-id>/agent-command.json when verification.agent_command is used

Each VULN-FINDINGS.json finding includes a github_advisory object shaped as a GitHub repository security advisory draft, with summary, description, severity, cwe_ids, vulnerabilities, and optional CVE/CVSS/credit fields.
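
As an illustration of consuming that object, the sketch below reports which of the non-optional advisory fields are absent from a single finding. The helper name advisory_gaps is hypothetical, and the sketch works on one already-loaded finding dictionary because the top-level layout of VULN-FINDINGS.json is not described here.

```python
from typing import Any, List, Mapping

# Fields listed above as always present; CVE/CVSS/credit fields are optional.
REQUIRED_ADVISORY_FIELDS = (
    "summary",
    "description",
    "severity",
    "cwe_ids",
    "vulnerabilities",
)


def advisory_gaps(finding: Mapping[str, Any]) -> List[str]:
    """Return the required github_advisory fields missing from one finding."""
    advisory = finding.get("github_advisory") or {}
    return [name for name in REQUIRED_ADVISORY_FIELDS if name not in advisory]
```

A check like this could run before a draft is copied into a GitHub repository security advisory, so that incomplete drafts are caught early.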

Notes

  • Target repositories are mounted read-only in Docker mode. Local sandbox mode uses a disposable copied workspace and relies on outer isolation.
  • GitHub URL sources use full clones, not shallow clones.
  • PITHOS passes --provider and --model through to Pi.
  • Private GitHub tokens are not written into remotes or summaries.
  • Dependency advisories are checked by default through OSV for exact versions PITHOS can identify; use --no-advisories to skip this stage.
  • Firecrawl is installed only for --web runs and provides optional web context for the model. Advisory lookup does not depend on Firecrawl.
  • Static-only runs report verification as completed_static_only when runtime probes are intentionally disabled.
  • Live verification runs target code only in a disposable copied workspace, never in the original checkout.
