Release v0.0.2 — Massive Template Expansion · wuyoscar/ISC-Bench

ISC-Bench v0.0.2

Highlights

77 templates across 9 domains (was 57)
309 prompt variants — English, Chinese, extreme, zero-shot
28 confirmed ISC models (was 26) — added GLM-4.7, GLM-4.6
100% trigger rate on Qwen3 Coder (309/309)

New Templates (+20)

AI/ML (+17): sentiment, toxigen, phishing, spambot, malware, openai_detector, fraud, darkweb, pii, clickbait, medical_ner, wildguard, emotion, fake_news, sarcasm, propaganda, code_vuln
Cyber (+1): nids (IDS evasion)
Epi (+1): biosurveillance (CBRN threat assessment)
Media (+1): disinfo (information operations)

Prompt Variants

prompt.txt — standard (one-shot)
prompt_ex.txt — extreme anchor
prompt_zh.txt / prompt_zh_ex.txt — Chinese localized (33 templates)
prompt_v2.txt — strengthened variants
prompt_v3_zeroshot.txt — untargeted (all ???, zero-shot)

Infrastructure

Folder rename: domain_tool_descriptor → domain_tool
MIN_WORDS raised to 50 across all templates
PROBE_REQUEST_PATTERNS removed from LlamaGuard variants
scripts/batch_test_templates.py — parallel batch testing (10 workers)
Leaderboard chart auto-generation
Demo video embedded in README
README: ISC definition, evaluation settings (TVD-Single / TVD-ICL / TVD-Agent)
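
The parallel batch testing entry can be pictured with a minimal sketch of a 10-worker thread pool. The contents of scripts/batch_test_templates.py are not shown in these notes, so everything below is an assumption: `run_template` and `run_batch` are hypothetical names, and the per-template work is a stub that only returns a placeholder status.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 10  # worker count stated in the release notes


def run_template(name: str) -> tuple[str, bool]:
    """Hypothetical stub: stands in for whatever one batch job does per template."""
    return name, True  # placeholder status, not a real result


def run_batch(names: list[str], max_workers: int = MAX_WORKERS) -> dict[str, bool]:
    """Run the stub over all template names with a bounded thread pool."""
    results: dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one job per template and remember which future maps to which name.
        futures = {pool.submit(run_template, n): n for n in names}
        # Collect results in completion order; the dict makes ordering irrelevant.
        for future in as_completed(futures):
            name, ok = future.result()
            results[name] = ok
    return results
```

A thread pool is a plausible fit when each job mostly waits on I/O. Whether the actual script uses threads or processes is not stated here.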

New ISC Cases

#64 GLM-4.7 — toxin biosynthesis, nerve agent docking, radiological dispersal
#65 GLM-4.6 — 6/7 templates triggered, poisoning data, VX, dirty bomb
Both verified by GPT-5.2 biosecurity/toxicology expert judge
