feat(agent-scan): add jailbreak detection support (#303) #346

Closed
boy-hack wants to merge 1 commit into main from feat/issue-303-agent-jailbreak

@boy-hack
Collaborator

Summary

Closes #303

This PR adds optional jailbreak detection to the Agent Scan workflow, and implements the standalone ModelJailbreak task type.

Changes

common/websocket/api.go

  • Add jailbreak bool field to AgentScanTaskRequest
  • Pass the flag through to task params
  • Update Swagger docs with agent_scan example including jailbreak field
  • Update task type description to include Agent Scan (was missing)
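
A minimal sketch of how the new flag could travel from the request body into the task params. The struct below is a reduced, hypothetical version of AgentScanTaskRequest containing only the fields visible in this PR's curl example; toTaskParams and parseAgentScanRequest are illustrative helpers, not functions from api.go.

```go
package main

import "encoding/json"

// EvalModel mirrors the eval_model object from the curl example in this PR.
type EvalModel struct {
	Model   string `json:"model"`
	Token   string `json:"token"`
	BaseURL string `json:"base_url"`
}

// AgentScanTaskRequest is a reduced, hypothetical view of the real request type.
type AgentScanTaskRequest struct {
	AgentID   string    `json:"agent_id"`
	EvalModel EvalModel `json:"eval_model"`
	Language  string    `json:"language"`
	// Jailbreak is the new optional flag; it is false when the field is omitted.
	Jailbreak bool `json:"jailbreak"`
}

// parseAgentScanRequest decodes the "content" object of an agent_scan task.
func parseAgentScanRequest(raw []byte) (AgentScanTaskRequest, error) {
	var req AgentScanTaskRequest
	err := json.Unmarshal(raw, &req)
	return req, err
}

// toTaskParams is a hypothetical helper that copies the request fields,
// including the jailbreak flag, into a generic params map.
func toTaskParams(req AgentScanTaskRequest) map[string]any {
	return map[string]any{
		"agent_id":   req.AgentID,
		"eval_model": req.EvalModel,
		"language":   req.Language,
		"jailbreak":  req.Jailbreak,
	}
}
```

Because the field is a plain bool, existing clients that never send jailbreak keep the previous behaviour.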

common/agent/agent_task.go

  • Add Jailbreak bool to AgentScanParams
  • When jailbreak=true, add an extra "Jailbreak Detection" step to the plan
  • After agent scan completes successfully, invoke AIG-PromptSecurity/cli_run.py with Custom:prompt scenario
  • Jailbreak step failure is non-fatal — agent scan results remain valid

common/agent/jailbreak_task.go (new)

  • Implement ModelJailbreak struct with GetName() and Execute()
  • Standalone jailbreak task: accepts model.{model,token,base_url} + prompt
  • Invokes AIG-PromptSecurity/cli_run.py with Raw technique, serial choice

cmd/agent/main.go

  • Register ModelJailbreak handler alongside existing task handlers
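
Registration by task name might look like the sketch below. Registry, Register, and Lookup are hypothetical names chosen for this example; the real wiring in cmd/agent/main.go is not shown in this PR description.

```go
package main

import "fmt"

// TaskHandler is the minimal interface assumed for this sketch.
type TaskHandler interface {
	GetName() string
}

// Registry maps task type names to their handlers (hypothetical type).
type Registry struct {
	handlers map[string]TaskHandler
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{handlers: make(map[string]TaskHandler)}
}

// Register adds a handler under its own name and rejects duplicates.
func (r *Registry) Register(h TaskHandler) error {
	name := h.GetName()
	if _, exists := r.handlers[name]; exists {
		return fmt.Errorf("handler %q already registered", name)
	}
	r.handlers[name] = h
	return nil
}

// Lookup finds the handler for a task type.
func (r *Registry) Lookup(name string) (TaskHandler, bool) {
	h, ok := r.handlers[name]
	return h, ok
}

// ModelJailbreak is stubbed here; only its name matters for registration.
type ModelJailbreak struct{}

// GetName returns the task type name (the exact string is assumed).
func (ModelJailbreak) GetName() string { return "ModelJailbreak" }
```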

Notes

⚠️ This PR has a minor overlap with PR #330 in api.go. If #330 merges first, this branch will need a git rebase origin/main to resolve the conflict, which should be trivial because the two PRs change different struct fields.

Test

# Agent scan with jailbreak detection enabled
curl -X POST http://localhost:8080/api/v1/app/taskapi/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "type": "agent_scan",
    "content": {
      "agent_id": "my-agent",
      "eval_model": {"model": "gpt-4", "token": "sk-xxx", "base_url": "https://api.openai.com/v1"},
      "language": "en",
      "jailbreak": true
    }
  }'

…Jailbreak task

- Add Jailbreak bool field to AgentScanTaskRequest in api.go
- Pass jailbreak param through to agent task execution
- Update AgentTask.Execute() to optionally run AIG-PromptSecurity
  after agent scan completes when jailbreak=true
- Implement ModelJailbreak task struct (closes #303 partially)
- Register ModelJailbreak handler in cmd/agent/main.go
- Update Swagger docs to include agent_scan example with jailbreak field

Resolves #303
@boy-hack boy-hack closed this Apr 23, 2026

Development

Successfully merging this pull request may close these issues.

feat: Agent Scan — add jailbreak detection capability
