Security Sweep #6

Workflow file for this run

.github/workflows/security-sweep.yml at de3cac3

name: Security Sweep

on:
  schedule:
    # Every Saturday at 6:00 UTC
    - cron: "0 6 * * 6"
  workflow_dispatch:
    inputs:
      full_sweep:
        description: "Run full sweep (ignores baseline, reports everything)"
        type: boolean
        default: false

concurrency:
  group: security-sweep
  cancel-in-progress: false

jobs:
  sweep:
    runs-on: blacksmith-4vcpu-ubuntu-2404-arm
    timeout-minutes: 30
    permissions:
      contents: read
      issues: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install -e ".[dev]"

      - name: Generate app token
        id: app-token
        uses: actions/create-github-app-token@d72941d797fd3113feb6b93fd0dec494b13a2547 # v1
        with:
          app-id: ${{ secrets.APP_ID }}
          private-key: ${{ secrets.APP_PRIVATE_KEY }}

      - name: Run Security Sweep
        uses: anthropics/claude-code-action@9469d113c6afd29550c402740f22d1a97dd1209b # v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          github_token: ${{ steps.app-token.outputs.token }}
          model: claude-sonnet-4-6
          claude_args: '--max-turns 80 --allowedTools "Bash(git diff:*),Bash(git log:*),Bash(git show:*),Bash(git checkout:*),Bash(git add:*),Bash(git commit:*),Bash(git push:*),Bash(git rev-parse:*),Bash(git branch:*),Bash(gh pr:*),Bash(gh issue:*),Bash(gh api:*),Bash(python3 security/*),Bash(cat:*),Bash(grep:*),Bash(wc:*),Read,Glob,Grep,Write"'
          prompt: |
            You are a security auditor performing a full adversarial sweep of the Edictum Console codebase. This is a security product — a self-hostable agent operations console. A single vulnerability doesn't create a bug; it destroys the credibility of a startup that sells trust.

            Think like an attacker. Be paranoid. Be thorough.

            ## Context

            Read these files first:
            - `CLAUDE.md` — architecture, security boundaries S1-S8, coding standards
            - `security/baseline.json` — known findings from previous audits
            - `SDK_COMPAT.md` — API contract

            Full sweep mode: ${{ inputs.full_sweep || 'false' }}
            If full sweep is false, only report NEW findings not already in baseline.json.

            ## Audit scope — 8 attack surfaces

            Systematically audit each attack surface. For each, read ALL relevant source files (not just samples).

            ### 1. Authentication & Session Security (S1, S2, S7, S8)

            Read: `auth/local.py`, `auth/api_keys.py`, `routes/auth.py`, `routes/setup.py`, `config.py`

            Attack questions:
            - Can I forge, replay, or extend a session token?
            - Is session data integrity-protected (signed/HMAC'd) or plain JSON in Redis?
            - Can I brute-force login? What are the rate limits? Can I bypass them with IP spoofing?
            - Can I re-run bootstrap after an admin exists? From multiple concurrent requests?
            - Are API key hashes timing-safe? Can I enumerate valid key prefixes?
            - Is `EDICTUM_SECRET_KEY` actually used for anything?

            ### 2. Tenant Isolation (S3) — highest priority

            Read: ALL files in `routes/` and `services/`. Every single one.

            Attack questions:
            - Does every `select()`, `update()`, `delete()` have a `tenant_id` filter?
            - Can I access tenant B's resources by manipulating IDs in requests?
            - Do error messages (404 vs 403) reveal resource existence in other tenants?
            - Do list endpoints leak cross-tenant counts or metadata?
            - Do webhook handlers scope queries by `tenant_id`?
            - Can I manipulate headers (`X-Tenant-Id`, `X-Forwarded-For`) to switch context?

            ### 3. Approval & Governance Integrity (S4)

            Read: `services/approval_service.py`, `routes/approvals.py`

            Attack questions:
            - Can I approve a request after it has logically expired (race the timeout worker)?
            - Can I spoof `decided_by` or `agent_id` via request body?
            - Can I replay an old approval decision?
            - Are state transitions atomic (UPDATE WHERE status='pending' RETURNING)?
            - Can I create an approval for another tenant's agent?

            ### 4. Input Validation & DoS

            Read: ALL files in `schemas/`. Check every `str` and `list` field.

            Attack questions:
            - Which `str` fields lack `max_length`? (each is a DoS vector)
            - Which `list` fields lack `max_length`? (batch endpoint memory bomb)
            - Is there a global request body size limit?
            - Can I send null bytes, YAML bombs, deeply nested JSON?
            - Are email fields validated with `EmailStr`?

            ### 5. Secrets & Cryptography

            Read: `auth/local.py`, `services/signing_service.py`, `services/notification_service.py`, `config.py`

            Attack questions:
            - Are all secret comparisons timing-safe (`hmac.compare_digest`)?
            - Is session data HMAC-signed or plain text in Redis?
            - Is the same encryption key used for multiple purposes (signing keys, notification configs, AI keys)?
            - Are there plaintext fallback paths for encrypted data?
            - Is the secret key minimum length enforced?

            ### 6. SSRF & Outbound Requests

            Read: `services/notification_service.py`, `services/ai_service.py`, `services/channel_test_helpers.py`, `notifications/*.py`

            Attack questions:
            - Do all outbound HTTP calls use `SafeTransport`?
            - Can I set `base_url` to `http://169.254.169.254` (cloud metadata)?
            - Is there a DNS rebinding gap between URL validation and actual request?
            - Can I use the channel test endpoint as an SSRF proxy?

            ### 7. SSE & Real-time (S5)

            Read: `routes/stream.py`, `push/manager.py`

            Attack questions:
            - Can I connect without valid auth? With a revoked API key?
            - After reconnection, could I receive events from a different tenant?
            - Is there a per-tenant connection limit? Can I exhaust server resources?
            - Does the SSE endpoint return 401 (not 422) for missing auth?

            ### 8. Infrastructure & Configuration

            Read: `docker-compose.yml`, `docker-entrypoint.sh`, `Dockerfile`, `main.py`, `db/engine.py`

            Attack questions:
            - Does Redis require a password?
            - Are security headers set (HSTS, CSP, X-Frame-Options)?
            - Does uvicorn have concurrency/timeout limits?
            - Is the DB connection pool configured or using defaults?
            - Does Postgres use a non-superuser account?
            - Are Docker networks isolated?
            - Are Python dependencies pinned?

            ## Output

            ### Step 1: Write findings to /tmp/sweep-report.md

            Format:
            ```
            # Security Sweep Report — [date]

            ## New Findings (not in baseline)

            ### [SEVERITY] Finding ID — Short title
            File: path:line
            Attack: How an attacker would exploit this
            Fix: Concrete code change
            Effort: estimated minutes

            ## Regressions (fixed findings that reappeared)

            ## Baseline Findings Still Open

            | ID | Severity | Status | Issue | Description |
            |...

            ## What's Working Well
            - List of security controls verified as correct
            ```

            ### Step 2: Create a GitHub issue with the report

            Only if there are NEW findings or regressions:

            ```bash
            gh issue create \
              --title "Security Sweep [date]: [N] new findings" \
              --label "security" \
              --body-file /tmp/sweep-report.md
            ```

            If clean (no new findings, no regressions):
            ```bash
            echo "Security sweep clean — no new findings or regressions."
            ```

            ### Step 3: Update baseline.json

            For any finding in baseline.json with status "fix-planned" that you verified is actually fixed in the current code:
            ```bash
            python3 security/manage-baseline.py fix <FINDING_ID> --commit $(git rev-parse HEAD)
            ```

            For any new finding not in baseline.json, add it:
            ```bash
            python3 security/manage-baseline.py add <ID> --severity <level> --file <path> --description "<desc>"
            ```

            If baseline.json was modified, create a PR (never push directly to main):
            ```bash
            git checkout -b chore/security-sweep-$(date +%Y%m%d)
            git add security/baseline.json
            git commit -m "chore: update security baseline from weekly sweep"
            git push -u origin HEAD
            gh pr create --title "chore: update security baseline" --body "Automated update from weekly security sweep." --label "security"
            ```

            ## Rules
            - Read EVERY file in the relevant directories, not just samples
            - Compare every finding against baseline.json before reporting
            - A finding already in baseline with status "fix-planned" is NOT new — skip it
            - A finding in baseline with status "fixed" that still exists IS a regression — flag it loudly
            - Be thorough but don't manufacture findings — if the code is secure, say so
            - Include concrete exploit scenarios, not just "this could be a problem"
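
Attack surface 2 asks whether every query carries a `tenant_id` filter and whether a 404 vs 403 distinction reveals other tenants' resources. The following is a minimal sketch of the safe shape. It uses the standard-library `sqlite3` module as a stand-in for the project's database layer, and the `agents` table and `get_agent` helper are hypothetical:

```python
import sqlite3


class NotFound(Exception):
    """One error for both 'no such row' and 'row belongs to another tenant'."""


def get_agent(conn: sqlite3.Connection, tenant_id: str, agent_id: str) -> dict:
    # tenant_id is part of the WHERE clause, so another tenant's row is never
    # loaded and the caller cannot tell "foreign" from "missing".
    row = conn.execute(
        "SELECT id, name FROM agents WHERE id = ? AND tenant_id = ?",
        (agent_id, tenant_id),
    ).fetchone()
    if row is None:
        raise NotFound(agent_id)  # a route would map this to 404, never 403
    return {"id": row[0], "name": row[1]}
```

Filtering in SQL, rather than loading the row and comparing `tenant_id` afterwards, leaves the handler with no code path that could answer 403 for a foreign row. As a result, missing and foreign IDs look identical to the caller.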
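
Attack surface 5 asks whether one key encrypts several kinds of data and whether a minimum secret length is enforced. A common remedy is to derive an independent subkey per purpose from the master secret with HKDF (RFC 5869). The sketch below uses only the standard library. The 32-byte minimum and the purpose labels are assumptions, and in production the `cryptography` package's `HKDF` class is the more usual choice:

```python
import hashlib
import hmac

MIN_SECRET_BYTES = 32  # assumption: require at least 256 bits of key material
HASH_LEN = hashlib.sha256().digest_size  # 32


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF-Extract then HKDF-Expand with SHA-256, per RFC 5869."""
    if length > 255 * HASH_LEN:
        raise ValueError("length too large for HKDF-SHA256")
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_purpose_keys(secret_key: str) -> dict[str, bytes]:
    """One independent 32-byte subkey per purpose (hypothetical labels)."""
    raw = secret_key.encode()
    if len(raw) < MIN_SECRET_BYTES:
        raise ValueError(f"secret key must be at least {MIN_SECRET_BYTES} bytes")
    purposes = ("session-signing", "signing-keys", "notification-configs", "ai-keys")
    return {p: hkdf_sha256(raw, info=p.encode()) for p in purposes}
```

Because each subkey is bound to its `info` label, a leak or misuse of the notification key does not expose the session-signing key, and the length check fails fast at startup.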
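
Attack surface 6 asks whether a `base_url` such as `http://169.254.169.254` can reach cloud metadata. The check below resolves the host and rejects any address that `ipaddress` does not classify as global. The function name is hypothetical, and this does not describe how the project's `SafeTransport` works:

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_public_addresses(url: str) -> list[str]:
    """Resolve the URL's host; raise ValueError if any address is not global."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only absolute http(s) URLs are allowed")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    for address in addresses:
        ip = ipaddress.ip_address(address.split("%")[0])  # strip IPv6 zone id
        # is_global is False for loopback, RFC 1918, link-local (169.254/16), etc.
        if not ip.is_global:
            raise ValueError(f"{parts.hostname} resolves to non-public address {ip}")
    # Connect to one of these addresses; resolving again reopens the rebinding gap.
    return addresses
```

Validation alone leaves the DNS rebinding gap the prompt mentions. If the HTTP client resolves the name a second time, a short-TTL record can switch to an internal address between check and connect. Closing the gap means connecting to one of the returned addresses instead of re-resolving. Unresolvable hosts raise `socket.gaierror`, which is an `OSError` rather than a `ValueError`.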
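
Attack surface 7 asks whether there is a per-tenant SSE connection limit. Here is a sketch of an in-process limiter for an asyncio server. The class and exception names are hypothetical. Because the counter lives in one process, several uvicorn workers would each enforce their own cap unless the count moved to shared storage such as Redis:

```python
import asyncio
from collections import defaultdict
from contextlib import asynccontextmanager
from typing import AsyncIterator


class TooManyStreams(Exception):
    """Raised when a tenant is at its cap; a route would map this to HTTP 429."""


class TenantStreamLimiter:
    """Caps concurrent SSE streams per tenant within one process (hypothetical)."""

    def __init__(self, max_per_tenant: int) -> None:
        self.max_per_tenant = max_per_tenant
        self._active: dict[str, int] = defaultdict(int)

    @asynccontextmanager
    async def slot(self, tenant_id: str) -> AsyncIterator[None]:
        # No await between check and increment, so this is atomic on one event loop.
        if self._active[tenant_id] >= self.max_per_tenant:
            raise TooManyStreams(tenant_id)
        self._active[tenant_id] += 1
        try:
            yield
        finally:
            self._active[tenant_id] -= 1  # released when the stream ends or is cancelled


async def demo() -> list[str]:
    limiter = TenantStreamLimiter(max_per_tenant=2)
    events = []
    async with limiter.slot("t1"), limiter.slot("t1"):
        try:
            async with limiter.slot("t1"):
                events.append("third t1 stream accepted")
        except TooManyStreams:
            events.append("third t1 stream refused")
        async with limiter.slot("t2"):
            events.append("t2 stream accepted")
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```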
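
Attack surface 8 asks whether HSTS, CSP, and X-Frame-Options are set. Below is a small checker that can be pointed at a running instance. The accepted `X-Frame-Options` values (`DENY`, `SAMEORIGIN`) follow that header's definition. The function names are hypothetical:

```python
from urllib.request import urlopen

# Header -> allowed values (lowercased); None means any non-empty value is accepted.
REQUIRED = {
    "strict-transport-security": None,
    "content-security-policy": None,
    "x-frame-options": {"deny", "sameorigin"},
}


def header_problems(headers: dict[str, str]) -> list[str]:
    """Compare response headers (any case) against REQUIRED."""
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    problems = []
    for name, allowed in REQUIRED.items():
        value = normalized.get(name)
        if not value:
            problems.append(f"missing {name}")
        elif allowed is not None and value not in allowed:
            problems.append(f"unexpected {name}: {value}")
    return problems


def check_url(url: str) -> list[str]:
    """Fetch `url` and report missing or weak security headers."""
    with urlopen(url, timeout=10) as response:
        return header_problems(dict(response.headers.items()))
```

Browsers honor HSTS only when it is received over HTTPS, so `check_url` is most meaningful against the TLS endpoint. Note that `urlopen` raises `HTTPError` for 4xx and 5xx responses.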
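
The Rules section and Step 3 reduce to classifying each observed finding against `security/baseline.json`. The sketch below assumes a hypothetical baseline layout, `{"findings": [{"id": ..., "status": ...}]}`, because neither the real schema nor `manage-baseline.py` is shown here. The finding IDs in any usage are illustrative only:

```python
import json


def classify(observed: set[str], baseline_json: str) -> dict[str, list[str]]:
    """Apply the sweep's rules to finding IDs observed in this run.

    Assumed (hypothetical) baseline layout:
    {"findings": [{"id": "...", "status": "fix-planned" | "fixed" | ...}]}
    """
    status = {f["id"]: f["status"] for f in json.loads(baseline_json)["findings"]}
    result: dict[str, list[str]] = {"new": [], "regressions": [], "still_open": []}
    for fid in sorted(observed):
        if fid not in status:
            result["new"].append(fid)  # report it and add it to the baseline
        elif status[fid] == "fixed":
            result["regressions"].append(fid)  # flag loudly
        else:
            result["still_open"].append(fid)  # e.g. fix-planned: not new, skip
    # fix-planned entries no longer observed: verify in code before marking fixed.
    result["fix_candidates"] = sorted(
        fid for fid, s in status.items() if s == "fix-planned" and fid not in observed
    )
    return result
```

Under these rules, an issue is warranted only when `new` or `regressions` is non-empty, which matches Step 2. Entries in `fix_candidates` are only candidates, because the prompt requires confirming the fix in the current code before recording it.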
