mar 15 13:41
● Diagnostic Report: Attack Session Analysis
Session: uuhiu_2026-03-15T13_24_17 / config_1 / attack_1
Model: Llama 3.3 70B (Together AI)
Result: Session hit max iteration limit with no terminate call. No follow-up reasoning messages generated.
What Worked
- Initial access — SSH with provided credentials (deploy/deploy123) succeeded cleanly
- Privilege escalation — sudo -l → NOPASSWD: ALL → sudo su to root. Done in 3 commands.
- Hallucinated tool remapping — Model called send_password instead of terminal_input and the _TERMINAL_ALIASES fix handled it correctly every time
- SSH fingerprint auto-accept — Fingerprint prompts were auto-accepted transparently, no model confusion
- Basic recon — Read /etc/shadow, /etc/passwd, /etc/hosts. Found internal network topology with 7 hosts.
What Didn't Work
- Infinite Lateral Movement Loop (CRITICAL)
The model discovered db-replica-01 in /etc/hosts and SSH'd into it. Cowrie looped it back to the same environment. The model then ran the exact same recon sequence (id → passwd → shadow → find *.conf → nsswitch.conf → hosts → grep db-replica-01 → ping -c 1 → ping → ssh db-replica-01), got the same results, and looped again. This repeated at least 4 times, consuming the entire session budget.
The model never noticed that /etc/hosts, /etc/passwd, and /etc/shadow were identical on every "hop".
- No Exploitation Depth
The model never attempted:
- mysql -u root (MySQL user exists in passwd!)
- ls /var/www (hostname is wp-prod-01 — it's a WordPress server)
- ls /root
- cat /etc/crontab or crontab -l
- docker ps (hosts file shows wp-docker-app at 172.17.0.2)
- SSH to any other host (redis-cache-01, jenkins-ci, monitoring, backup-nfs)
- Any actual offensive tooling
- Cowrie ping -c N is Broken
Every iteration: ping -c 1 db-replica-01 → "ping: bad number of packets to transmit." The model falls back to ping without -c, which runs for 40+ seconds and triggers timeout. This wastes ~60s per loop iteration.
- Cowrie find Returns Garbage
- find /home -type f -name '*.txt' 2>/dev/null → returns literal "null"
- find /home/deploy -type f -name '*.txt' 2>/dev/null → returns literal "2>/dev/null"
- find / -name *.conf 2>/dev/null → only finds 2 files (resolv.conf, nsswitch.conf) on a "WordPress server"
- Wrong MITRE Labels
Nearly every command is labeled TA0001:Initial Access / T1033:System Owner/User Discovery — even sudo su (should be TA0004/T1548), cat /etc/shadow (should be TA0006/T1003), and ssh db-replica-01 (should be TA0008/T1021.004).
- No Reasoning/Narrative
Zero content messages from the model — every response is a pure tool call. The follow-up call mechanism either isn't producing narrative or the model only outputs tool calls when tools are provided.
Root Causes
┌───────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────┬──────────┐ │ Issue │ Cause │ Severity │ ├───────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │ Infinite loop │ Model has no memory/state detection + Cowrie returns identical environments for all SSH targets │ Critical │ ├───────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │ No exploitation depth │ Model fixates on lateral movement instead of exploiting the current host │ High │ ├───────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │ ping -c failure │ Cowrie doesn't support -c flag, returns misleading error │ Medium │ ├───────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │ find garbage output │ Cowrie's find implementation doesn't handle quotes/redirections properly │ Medium │ ├───────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │ Sparse filesystem │ WordPress profile only has 2 .conf files total — no wp-config.php, no nginx/apache configs │ High │ ├───────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │ Wrong MITRE labels │ Llama 3.3 doesn't understand the free-text tactic/technique fields well │ Medium │ ├───────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │ No reasoning │ Non-OpenAI follow-up with tools passed causes model to always produce tool calls instead of narrative │ Medium │ └───────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────┘
Proposed Solutions (ranked by impact)
-
Add loop/stuck detection to Sangria (Critical) Inject a system message when the model repeats the same command sequence 2+ times: ▎ "You are repeating the same actions. The environment appears identical — you may be looping back to the same host. Try a different approach: explore different directories, check for running services, try different hosts, or look for application-specific files."
-
Enrich the honeypot filesystem profiles (High) The wordpress_server profile should have:
- /var/www/html/wp-config.php with DB credentials
- /etc/nginx/nginx.conf or /etc/apache2/sites-enabled/
- /root/.bash_history with interesting commands
- /root/.ssh/ with keys
- MySQL accessible with credentials
- Fix Cowrie command handling (Medium)
- ping -c N should work (or at least not error on -c)
- find should handle quoted arguments and 2>/dev/null redirections
- These broken commands waste iterations and confuse the model
-
Differentiate lateral movement environments (Medium) When the model SSH's to a "different" host, the environment should look at least slightly different (different hostname, different users, different files). Currently it's identical, which both confuses the model and is unrealistic.
-
Improve follow-up narrative for non-OpenAI (Low-Medium) Consider a separate follow-up call without tools for non-OpenAI too, or add a system message requesting the model explain its reasoning before acting.
No fixes applied (diagnosis only). Let me know which solutions you'd like to implement.