Skip to content

feat: improve llm-security skill score from 79% to 97%#26

Open
yogesh-tessl wants to merge 1 commit into
semgrep:mainfrom
yogesh-tessl:improve/skill-review-optimization
Open

feat: improve llm-security skill score from 79% to 97%#26
yogesh-tessl wants to merge 1 commit into
semgrep:mainfrom
yogesh-tessl:improve/skill-review-optimization

Conversation

@yogesh-tessl
Copy link
Copy Markdown

Hey @DrewDennison 👋

this is really impressive work. 204 stars on a skills repo says a lot. Clearly people are finding value in it. I especially like the split between code-security and llm-security; it keeps the focus sharp instead of mixing very different concerns together. Having a dedicated skill-build package is another strong signal that you’re treating skill quality seriously, not as an afterthought.

ran your skills through tessl skill review at work and found some targeted improvements for llm-security. Here's the full before/after:

Skill Before After Change
llm-security 79% 97% +18%
What changed in llm-security
  • Description: Added concrete action verbs (identifies, audits, recommends, flags) so agents know what the skill does, not just when to use it — specificity went from 2/3 → 3/3
  • Removed redundant sections: Dropped the "Key Principles" section (generic security principles Claude already knows) and "Quick Reference" table (duplicated the Categories list) to cut token waste
  • Streamlined intro: Replaced the verbose "How to Use This Skill" proactive/reactive explanation with a one-liner overview - the workflow section already covers this
  • Added inline code examples: Two vulnerable → secure Python pairs for the two most critical categories (Prompt Injection LLM01, Excessive Agency LLM06) so the skill is immediately actionable without needing to read sub-files
  • Explicit verification step: Workflow step 5 now specifies exactly what to grep for (string-concatenated prompts, unguarded tool calls, unsanitized output, secrets in system prompts)

also stress-tested your llm-security skill against a few real-world task evals and it held up really well on detecting unguarded tool-call patterns in agentic LangChain pipelines. Kudos for that.

quick honest disclosure. I work at https://github.com/tesslio where we build tooling around skills like these. Not a pitch, just saw room for improvement and wanted to contribute.

if you want to self-improve your skills, or define your own scenarios to pressure test, just ask your agent (Claude Code, Codex, etc.) to evaluate and optimize your skill with Tessl. Ping me @yogesh-tessl, if you hit any snags.

Hey @DrewDennison 👋

I ran your skills through `tessl skill review` at work and found some targeted improvements for `llm-security`. Here's the full before/after:

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| llm-security | 79% | 97% | +18% |
| code-security | 80% | 80% | — |
| semgrep | 90% | 90% | — |

<details>
<summary>What changed in <code>llm-security</code></summary>

- **Description**: Added concrete action verbs (identifies, audits, recommends, flags) so agents know *what* the skill does, not just *when* to use it — specificity went from 2/3 → 3/3
- **Removed redundant sections**: Dropped the "Key Principles" section (generic security principles Claude already knows) and "Quick Reference" table (duplicated the Categories list) to cut token waste
- **Streamlined intro**: Replaced the verbose "How to Use This Skill" proactive/reactive explanation with a one-liner overview — the workflow section already covers this
- **Added inline code examples**: Two vulnerable → secure Python pairs for the two most critical categories (Prompt Injection LLM01, Excessive Agency LLM06) so the skill is immediately actionable without needing to read sub-files
- **Explicit verification step**: Workflow step 5 now specifies exactly what to grep for (string-concatenated prompts, unguarded tool calls, unsanitized output, secrets in system prompts)

</details>

I also stress-tested your `llm-security` skill against a few real-world task evals and it held up really well on detecting unguarded tool-call patterns in agentic LangChain pipelines. Kudos for that.

Honest disclosure — I work at @tesslio where we build tooling around skills like these. Not a pitch — just saw room for improvement and wanted to contribute.

Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill. Ping me — [@yogesh-tessl](https://github.com/yogesh-tessl) — if you hit any snags.

Thanks in advance 🙏
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

---
name: llm-security
description: "Security guidelines for LLM applications based on OWASP Top 10 for LLM 2025. Use when building LLM apps, reviewing AI security, implementing RAG systems, or asking about LLM vulnerabilities like 'prompt injection' or 'check LLM security'. IMPORTANT: Always consult this skill when building chatbots, AI agents, RAG pipelines, tool-using LLMs, agentic systems, or any application that calls an LLM API (OpenAI, Anthropic, Gemini, etc.) — even if the user doesn't explicitly mention security. Also use when users import 'openai', 'anthropic', 'langchain', 'llamaindex', or similar LLM libraries."
description: "Identifies and mitigates LLM vulnerabilities — prompt injection, insecure output handling, data poisoning, excessive agency — based on OWASP Top 10 for LLM 2025. Audits LLM code for security risks, recommends secure patterns, and flags vulnerable ones with fix examples. Use when building LLM apps, reviewing AI security, implementing RAG systems, or asking about LLM vulnerabilities like 'prompt injection' or 'check LLM security'. IMPORTANT: Always consult this skill when building chatbots, AI agents, RAG pipelines, tool-using LLMs, agentic systems, or any application that calls an LLM API (OpenAI, Anthropic, Gemini, etc.) — even if the user doesn't explicitly mention security. Also use when users import 'openai', 'anthropic', 'langchain', 'llamaindex', or similar LLM libraries."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semgrep identified an issue in your code:
Possibly found usage of AI: OpenAI

To resolve this comment:

🔧 No guidance has been designated for this issue. Fix according to your organization's approved methods.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

  • /fp <comment> for false positive
  • /ar <comment> for acceptable risk
  • /other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by detect-generic-ai-oai.

You can view more details about this finding in the Semgrep AppSec Platform.

---
name: llm-security
description: "Security guidelines for LLM applications based on OWASP Top 10 for LLM 2025. Use when building LLM apps, reviewing AI security, implementing RAG systems, or asking about LLM vulnerabilities like 'prompt injection' or 'check LLM security'. IMPORTANT: Always consult this skill when building chatbots, AI agents, RAG pipelines, tool-using LLMs, agentic systems, or any application that calls an LLM API (OpenAI, Anthropic, Gemini, etc.) — even if the user doesn't explicitly mention security. Also use when users import 'openai', 'anthropic', 'langchain', 'llamaindex', or similar LLM libraries."
description: "Identifies and mitigates LLM vulnerabilities — prompt injection, insecure output handling, data poisoning, excessive agency — based on OWASP Top 10 for LLM 2025. Audits LLM code for security risks, recommends secure patterns, and flags vulnerable ones with fix examples. Use when building LLM apps, reviewing AI security, implementing RAG systems, or asking about LLM vulnerabilities like 'prompt injection' or 'check LLM security'. IMPORTANT: Always consult this skill when building chatbots, AI agents, RAG pipelines, tool-using LLMs, agentic systems, or any application that calls an LLM API (OpenAI, Anthropic, Gemini, etc.) — even if the user doesn't explicitly mention security. Also use when users import 'openai', 'anthropic', 'langchain', 'llamaindex', or similar LLM libraries."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semgrep identified an issue in your code:
Possibly found usage of AI: Anthropic

To resolve this comment:

🔧 No guidance has been designated for this issue. Fix according to your organization's approved methods.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

  • /fp <comment> for false positive
  • /ar <comment> for acceptable risk
  • /other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by detect-generic-ai-anthprop.

You can view more details about this finding in the Semgrep AppSec Platform.

@semgrep-zcs-prod-semgrep
Copy link
Copy Markdown

Semgrep found 1 detect-generic-ai-oai finding:

  • skills/llm-security/SKILL.md

Possibly found usage of AI: OpenAI

Semgrep found 1 detect-generic-ai-anthprop finding:

  • skills/llm-security/SKILL.md

Possibly found usage of AI: Anthropic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants