feat: improve llm-security skill score from 79% to 97% by yogesh-tessl · Pull Request #26 · semgrep/skills

yogesh-tessl · 2026-05-13T08:34:17Z

this is really impressive work. 204 stars on a skills repo says a lot. Clearly people are finding value in it. I especially like the split between code-security and llm-security; it keeps the focus sharp instead of mixing very different concerns together. Having a dedicated skill-build package is another strong signal that you’re treating skill quality seriously, not as an afterthought.

ran your skills through tessl skill review at work and found some targeted improvements for llm-security. Here's the full before/after:

Skill	Before	After	Change
llm-security	79%	97%	+18%

What changed in llm-security

Description: Added concrete action verbs (identifies, audits, recommends, flags) so agents know what the skill does, not just when to use it — specificity went from 2/3 → 3/3
Removed redundant sections: Dropped the "Key Principles" section (generic security principles Claude already knows) and "Quick Reference" table (duplicated the Categories list) to cut token waste
Streamlined intro: Replaced the verbose "How to Use This Skill" proactive/reactive explanation with a one-liner overview - the workflow section already covers this
Added inline code examples: Two vulnerable → secure Python pairs for the two most critical categories (Prompt Injection LLM01, Excessive Agency LLM06) so the skill is immediately actionable without needing to read sub-files
Explicit verification step: Workflow step 5 now specifies exactly what to grep for (string-concatenated prompts, unguarded tool calls, unsanitized output, secrets in system prompts)

also stress-tested your llm-security skill against a few real-world task evals and it held up really well on detecting unguarded tool-call patterns in agentic LangChain pipelines. Kudos for that.

quick honest disclosure. I work at https://github.com/tesslio where we build tooling around skills like these. Not a pitch, just saw room for improvement and wanted to contribute.

if you want to self-improve your skills, or define your own scenarios to pressure test, just ask your agent (Claude Code, Codex, etc.) to evaluate and optimize your skill with Tessl. Ping me @yogesh-tessl, if you hit any snags.

@DrewDennison

Hey @DrewDennison 👋 I ran your skills through `tessl skill review` at work and found some targeted improvements for `llm-security`. Here's the full before/after: | Skill | Before | After | Change | |-------|--------|-------|--------| | llm-security | 79% | 97% | +18% | | code-security | 80% | 80% | — | | semgrep | 90% | 90% | — | <details> <summary>What changed in <code>llm-security</code></summary> - **Description**: Added concrete action verbs (identifies, audits, recommends, flags) so agents know *what* the skill does, not just *when* to use it — specificity went from 2/3 → 3/3 - **Removed redundant sections**: Dropped the "Key Principles" section (generic security principles Claude already knows) and "Quick Reference" table (duplicated the Categories list) to cut token waste - **Streamlined intro**: Replaced the verbose "How to Use This Skill" proactive/reactive explanation with a one-liner overview — the workflow section already covers this - **Added inline code examples**: Two vulnerable → secure Python pairs for the two most critical categories (Prompt Injection LLM01, Excessive Agency LLM06) so the skill is immediately actionable without needing to read sub-files - **Explicit verification step**: Workflow step 5 now specifies exactly what to grep for (string-concatenated prompts, unguarded tool calls, unsanitized output, secrets in system prompts) </details> I also stress-tested your `llm-security` skill against a few real-world task evals and it held up really well on detecting unguarded tool-call patterns in agentic LangChain pipelines. Kudos for that. Honest disclosure — I work at @tesslio where we build tooling around skills like these. Not a pitch — just saw room for improvement and wanted to contribute. Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill. Ping me — [@yogesh-tessl](https://github.com/yogesh-tessl) — if you hit any snags. Thanks in advance 🙏

CLAassistant · 2026-05-13T08:34:28Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

semgrep-zcs-prod-semgrep · 2026-05-13T08:38:09Z

 ---
 name: llm-security
-description: "Security guidelines for LLM applications based on OWASP Top 10 for LLM 2025. Use when building LLM apps, reviewing AI security, implementing RAG systems, or asking about LLM vulnerabilities like 'prompt injection' or 'check LLM security'. IMPORTANT: Always consult this skill when building chatbots, AI agents, RAG pipelines, tool-using LLMs, agentic systems, or any application that calls an LLM API (OpenAI, Anthropic, Gemini, etc.) — even if the user doesn't explicitly mention security. Also use when users import 'openai', 'anthropic', 'langchain', 'llamaindex', or similar LLM libraries."
+description: "Identifies and mitigates LLM vulnerabilities — prompt injection, insecure output handling, data poisoning, excessive agency — based on OWASP Top 10 for LLM 2025. Audits LLM code for security risks, recommends secure patterns, and flags vulnerable ones with fix examples. Use when building LLM apps, reviewing AI security, implementing RAG systems, or asking about LLM vulnerabilities like 'prompt injection' or 'check LLM security'. IMPORTANT: Always consult this skill when building chatbots, AI agents, RAG pipelines, tool-using LLMs, agentic systems, or any application that calls an LLM API (OpenAI, Anthropic, Gemini, etc.) — even if the user doesn't explicitly mention security. Also use when users import 'openai', 'anthropic', 'langchain', 'llamaindex', or similar LLM libraries."


Semgrep identified an issue in your code:
Possibly found usage of AI: OpenAI

To resolve this comment:

🔧 No guidance has been designated for this issue. Fix according to your organization's approved methods.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

/fp <comment> for false positive

/ar <comment> for acceptable risk

/other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by detect-generic-ai-oai.

_{You can view more details about this finding in the Semgrep AppSec Platform.}

semgrep-zcs-prod-semgrep · 2026-05-13T08:38:10Z

 ---
 name: llm-security
-description: "Security guidelines for LLM applications based on OWASP Top 10 for LLM 2025. Use when building LLM apps, reviewing AI security, implementing RAG systems, or asking about LLM vulnerabilities like 'prompt injection' or 'check LLM security'. IMPORTANT: Always consult this skill when building chatbots, AI agents, RAG pipelines, tool-using LLMs, agentic systems, or any application that calls an LLM API (OpenAI, Anthropic, Gemini, etc.) — even if the user doesn't explicitly mention security. Also use when users import 'openai', 'anthropic', 'langchain', 'llamaindex', or similar LLM libraries."
+description: "Identifies and mitigates LLM vulnerabilities — prompt injection, insecure output handling, data poisoning, excessive agency — based on OWASP Top 10 for LLM 2025. Audits LLM code for security risks, recommends secure patterns, and flags vulnerable ones with fix examples. Use when building LLM apps, reviewing AI security, implementing RAG systems, or asking about LLM vulnerabilities like 'prompt injection' or 'check LLM security'. IMPORTANT: Always consult this skill when building chatbots, AI agents, RAG pipelines, tool-using LLMs, agentic systems, or any application that calls an LLM API (OpenAI, Anthropic, Gemini, etc.) — even if the user doesn't explicitly mention security. Also use when users import 'openai', 'anthropic', 'langchain', 'llamaindex', or similar LLM libraries."


Semgrep identified an issue in your code:
Possibly found usage of AI: Anthropic

To resolve this comment:

🔧 No guidance has been designated for this issue. Fix according to your organization's approved methods.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

/fp <comment> for false positive

/ar <comment> for acceptable risk

/other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by detect-generic-ai-anthprop.

_{You can view more details about this finding in the Semgrep AppSec Platform.}

semgrep-zcs-prod-semgrep · 2026-05-13T08:38:10Z

Semgrep found 1 detect-generic-ai-oai finding:

skills/llm-security/SKILL.md
- L3 - Triage

Possibly found usage of AI: OpenAI

Semgrep found 1 detect-generic-ai-anthprop finding:

skills/llm-security/SKILL.md
- L3 - Triage

Possibly found usage of AI: Anthropic

semgrep-zcs-prod-semgrep Bot reviewed May 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improve llm-security skill score from 79% to 97%#26

feat: improve llm-security skill score from 79% to 97%#26
yogesh-tessl wants to merge 1 commit into
semgrep:mainfrom
yogesh-tessl:improve/skill-review-optimization

yogesh-tessl commented May 13, 2026

Uh oh!

CLAassistant commented May 13, 2026

Uh oh!

semgrep-zcs-prod-semgrep Bot May 13, 2026

Uh oh!

semgrep-zcs-prod-semgrep Bot May 13, 2026

Uh oh!

semgrep-zcs-prod-semgrep Bot commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yogesh-tessl commented May 13, 2026

Uh oh!

CLAassistant commented May 13, 2026

Uh oh!

semgrep-zcs-prod-semgrep Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

semgrep-zcs-prod-semgrep Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

semgrep-zcs-prod-semgrep Bot commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants