From d401292316327580e6c1b40dcabf2f4fc3c1421c Mon Sep 17 00:00:00 2001
From: Daniel Krupp
Date: Tue, 28 Apr 2026 19:32:23 +0200
Subject: [PATCH 1/3] LLM instructions to help labeling

Instructions to an LLM agent to help finding the SEI Cert rule associations and checker severities.

Inputs to the classification
-checker source code
-checker documentation
-sei cert test results
---
 .../checker_labeling_instructions.md | 51 +++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 scripts/llm-scripts/checker_labeling_instructions.md

diff --git a/scripts/llm-scripts/checker_labeling_instructions.md b/scripts/llm-scripts/checker_labeling_instructions.md
new file mode 100644
index 0000000000..0328b37c57
--- /dev/null
+++ b/scripts/llm-scripts/checker_labeling_instructions.md
@@ -0,0 +1,51 @@
+# CodeChecker SEI Cert mapping
+
+## 1. TASK
+
+We would like you to identify which SEI Cert rules correspond to static analysis checkers.
+Checkers are implemented by static analyzer tools such as gcc or clang-tidy.
+There are multiple analyzer tools supported by CodeChecker.
+There is a `config` directory in CodeChecker which contains metadata about the supported analyzers and checkers.
+This metadata describes among other things the severity of the checkers, the profile of the checker, and any corresponding SEI-Cert-rule or other guideline rule. There might be checkers which do not have any corresponding sei-cert rule.
+
+This metadata is not complete and the corresponding sei-cert rule might be missing.
+
+Add the missing SEI Cert rule mapping to the CodeChecker label files based on the checker documentation, source code of the checker, SEI Cert test results and the SEI Cert rule description.
+
+Check the existing label mappings and remove any that are not relevant.
+
+Make sure that the correct severity labels are added for each checker according to the CodeChecker severity definitions.
+
+## 2. INPUT
+| Path | Contents |
+|------|----------|
+| `scripts/llm-scripts/cppcheck` | The source code and documentation of cppcheck analyzer and checkers |
+| `scripts/llm-scripts/gcc` | The source code and the documentation of the gcc static analyzer and checkers |
+| `scripts/llm-scripts/sei-cert-rules` | Sei Cert rule description |
+| `scripts/llm-scripts/sei-cert-tests` | Sei-Cert test results. There are multiple test cases per each SEI Cert rule demonstrating violations of the rule. Each test case file is named after the corresponding rule. This folder contains a json file per analyzer with the output of the tested analyzer on the test files. If a checker of an analyzer has a relevant finding on a test case, there is likely a correspondence of the given checker and the SEI Cert rule. |
+| `config/labels/descriptions.json` | CodeChecker label definitions including severity and guideline definitions.|
+
+## 3. OUTPUT
+
+| Path | Contents |
+|------|----------|
+| `config/labels/analyzers` | CodeChecker label files in json format |
+
+## 4. CRITICAL RULES
+
+1. Do not add any sei-cert rule labels where you are not sure about the correspondence.
+2. If you are unsure about the correspondence, always ask the user. Do not guess.
+3. Never make up any sei-cert rule id. Only use existing rule ids.
+4. Always follow this naming convention of the labels sei-cert-c: for example: sei-cert-c:mem34-c
+
+## 5. ROLES
+
+- **You (AI Agent):** enterprise-level security expert assisting with the audit.
+- **User:** application-level Product Owner / Architect for CodeChecker.
+
+## 6. Evaluation order of INPUT SOURCES
+When finding a corresponding SEI-cert rule for a checker take into consideration the input sources in the following order of decreasing priority.
+
+1. The source code of the checker
+2. The documentation of the checker
+3. The test result of the checker for the given SEI Cert test case

From b6d6a57166a9e5f2dd27c2bcb475dd1d88bcf5f1 Mon Sep 17 00:00:00 2001
From: Daniel Krupp
Date: Thu, 30 Apr 2026 14:36:46 +0200
Subject: [PATCH 2/3] Add profile categorization definitions

---
 .../checker_labeling_instructions.md | 44 +++++++++++++++----
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/scripts/llm-scripts/checker_labeling_instructions.md b/scripts/llm-scripts/checker_labeling_instructions.md
index 0328b37c57..65cd1e537e 100644
--- a/scripts/llm-scripts/checker_labeling_instructions.md
+++ b/scripts/llm-scripts/checker_labeling_instructions.md
@@ -1,7 +1,11 @@
-# CodeChecker SEI Cert mapping
+# CodeChecker Labeling

-## 1. TASK
+## TASKS
+In this section different tasks are defined to update checker labels in CodeChecker.
+To complete a task use only those input sources which are listed as relevant in the "Used in Task" column of the table in section INPUT.
+
+### sei-cert-mapping

 We would like you to identify which SEI Cert rules correspond to static analysis checkers.
 Checkers are implemented by static analyzer tools such as gcc or clang-tidy.
 There are multiple analyzer tools supported by CodeChecker.
 There is a `config` directory in CodeChecker which contains metadata about the supported analyzers and checkers.
 This metadata describes among other things the severity of the checkers, the profile of the checker, and any corresponding SEI-Cert-rule or other guideline rule. There might be checkers which do not have any corresponding sei-cert rule.
@@ -14,16 +18,38 @@ Add the missing SEI Cert rule mapping to the CodeChecker label files based on th

 Check the existing label mappings and remove any that are not relevant.

+### severity-mapping
 Make sure that the correct severity labels are added for each checker according to the CodeChecker severity definitions.

+### profile-mapping
+Classify the checkers into CodeChecker profiles based on evaluation results.
+
+Noisiness is measured relative to the **median report count** across all checkers
+in the evaluation data.
+
+* default-profile: Checkers with LOW, MEDIUM or HIGH severity whose report count
+  is ≤ 30× the median. Alpha, experimental, style, and debug checkers are excluded.
+
+* sensitive-profile: Checkers with LOW, MEDIUM or HIGH severity whose report count
+  is > 30× and ≤ 100× the median. Alpha, experimental, style, and debug checkers are excluded.
+
+* Checkers with report count > 100× the median are considered "very noisy" and are
+  excluded from both profiles unless explicitly overridden.
+
+
 ## 2. INPUT
-| Path | Contents |
-|------|----------|
-| `scripts/llm-scripts/cppcheck` | The source code and documentation of cppcheck analyzer and checkers |
-| `scripts/llm-scripts/gcc` | The source code and the documentation of the gcc static analyzer and checkers |
-| `scripts/llm-scripts/sei-cert-rules` | Sei Cert rule description |
-| `scripts/llm-scripts/sei-cert-tests` | Sei-Cert test results. There are multiple test cases per each SEI Cert rule demonstrating violations of the rule. Each test case file is named after the corresponding rule. This folder contains a json file per analyzer with the output of the tested analyzer on the test files. If a checker of an analyzer has a relevant finding on a test case, there is likely a correspondence of the given checker and the SEI Cert rule. |
-| `config/labels/descriptions.json` | CodeChecker label definitions including severity and guideline definitions.|
+| Path | Contents |Used in Task|
+|------|----------|------------|
+| `scripts/llm-scripts/cppcheck` | The source code and documentation of cppcheck analyzer and checkers |all tasks|
+| `scripts/llm-scripts/gcc` | The source code and the documentation of the gcc static analyzer and checkers |all tasks|
+| `scripts/llm-scripts/clang-tidy` | The source code and the documentation of the clang-tidy static analyzer and checkers |all tasks|
+| `scripts/llm-scripts/clangsa` | The source code and the documentation of the clang static analyzer and checkers |all tasks|
+| `scripts/llm-scripts/clang-warnings` | The source code and the documentation of the clang warnings |all tasks|
+| `scripts/llm-scripts/sei-cert-rules` | Sei Cert rule description |sei-cert-mapping|
+| `scripts/llm-scripts/sei-cert-tests` | Sei-Cert test results. There are multiple test cases per each SEI Cert rule demonstrating violations of the rule. Each test case file is named after the corresponding rule. This folder contains a json file per analyzer with the output of the tested analyzer on the test files. If a checker of an analyzer has a relevant finding on a test case, there is likely a correspondence of the given checker and the SEI Cert rule. | sei-cert-mapping|
+| `config/labels/descriptions.json` | CodeChecker label definitions including severity and guideline definitions.|all tasks|
+| `config/llm-scripts/evaluation-results`| Analysis results on open source projects| profile-mapping|
+

 ## 3. OUTPUT

From 7ed93728a767e801b7ff625025f2ef75f661b630 Mon Sep 17 00:00:00 2001
From: Daniel Krupp
Date: Tue, 5 May 2026 11:20:16 +0200
Subject: [PATCH 3/3] Better definition for default and sensitive profiles

---
 .../llm-scripts/checker_labeling_instructions.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/scripts/llm-scripts/checker_labeling_instructions.md b/scripts/llm-scripts/checker_labeling_instructions.md
index 65cd1e537e..748470476f 100644
--- a/scripts/llm-scripts/checker_labeling_instructions.md
+++ b/scripts/llm-scripts/checker_labeling_instructions.md
@@ -28,10 +28,10 @@ Noisiness is measured relative to the **median report count** across all checker
 in the evaluation data.

 * default-profile: Checkers with LOW, MEDIUM or HIGH severity whose report count
-  is ≤ 30× the median. Alpha, experimental, style, and debug checkers are excluded.
+  is ≤ 30× the median. Alpha, experimental, style, and debug checkers are excluded. Also exclude rules which are targeting a specific non-linux platform or programming library such as: abseil, boost, fuchsia, google, llvm, zircon.

 * sensitive-profile: Checkers with LOW, MEDIUM or HIGH severity whose report count
-  is > 30× and ≤ 100× the median. Alpha, experimental, style, and debug checkers are excluded.
+  is > 30× and ≤ 100× the median. Alpha, experimental, style, and debug checkers are excluded. Also exclude rules which are targeting a specific non-linux platform or programming library such as: abseil, boost, fuchsia, google, llvm, zircon.

 * Checkers with report count > 100× the median are considered "very noisy" and are
   excluded from both profiles unless explicitly overridden.
@@ -42,13 +42,15 @@ in the evaluation data.
 |------|----------|------------|
 | `scripts/llm-scripts/cppcheck` | The source code and documentation of cppcheck analyzer and checkers |all tasks|
 | `scripts/llm-scripts/gcc` | The source code and the documentation of the gcc static analyzer and checkers |all tasks|
-| `scripts/llm-scripts/clang-tidy` | The source code and the documentation of the clang-tidy static analyzer and checkers |all tasks|
-| `scripts/llm-scripts/clangsa` | The source code and the documentation of the clang static analyzer and checkers |all tasks|
-| `scripts/llm-scripts/clang-warnings` | The source code and the documentation of the clang warnings |all tasks|
+| `scripts/llm-scripts/clang-tidy-checks` | The source code of the clang-tidy static analyzer and checkers |all tasks|
+| `scripts/llm-scripts/clang-tidy-docs` | The documentation of the clang-tidy static analyzer and checkers |all tasks|
+| `scripts/llm-scripts/clangsa-checks` | The source code of the clang static analyzer and checkers |all tasks|
+| `scripts/llm-scripts/clangsa-docs` | The documentation of the clang static analyzer and checkers |all tasks|
+| `scripts/llm-scripts/clang-warning-docs` | Clang warnings documentation |all tasks|
 | `scripts/llm-scripts/sei-cert-rules` | Sei Cert rule description |sei-cert-mapping|
 | `scripts/llm-scripts/sei-cert-tests` | Sei-Cert test results. There are multiple test cases per each SEI Cert rule demonstrating violations of the rule. Each test case file is named after the corresponding rule. This folder contains a json file per analyzer with the output of the tested analyzer on the test files. If a checker of an analyzer has a relevant finding on a test case, there is likely a correspondence of the given checker and the SEI Cert rule. | sei-cert-mapping|
 | `config/labels/descriptions.json` | CodeChecker label definitions including severity and guideline definitions.|all tasks|
-| `config/llm-scripts/evaluation-results`| Analysis results on open source projects| profile-mapping|
+| `scripts/llm-scripts/evaluation-results`| Analysis results on open source projects| profile-mapping|


 ## 3. OUTPUT
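
Note on the profile-mapping thresholds (not part of the patch): the following is a minimal Python sketch of the classification rule as stated in the instructions, under several assumptions. The function and constant names are hypothetical. The exclusion test is a guess that looks only at the first component of the checker name, since the instructions do not say how alpha, experimental, debug, or library-specific checkers are to be recognized. The inputs are assumed to be plain dictionaries, not the actual format of the evaluation results.

```python
from statistics import median

# Severities eligible for a profile; STYLE and others fall outside this set.
PROFILE_SEVERITIES = {"LOW", "MEDIUM", "HIGH"}

# Assumption: excluded checkers can be recognized by their first name component.
EXCLUDED_PREFIXES = {
    "alpha", "experimental", "debug",                          # checker maturity
    "abseil", "boost", "fuchsia", "google", "llvm", "zircon",  # platform/library
}


def is_excluded(checker: str) -> bool:
    """Return True if the first name component marks an excluded checker."""
    prefix = checker.lower().replace("-", ".").split(".", 1)[0]
    return prefix in EXCLUDED_PREFIXES


def classify_profiles(report_counts: dict[str, int],
                      severities: dict[str, str]) -> dict[str, str]:
    """Map each checker to 'default', 'sensitive', 'very-noisy' or 'excluded'."""
    # The median is taken over all checkers in the evaluation data.
    med = median(report_counts.values())
    result = {}
    for checker, count in report_counts.items():
        if severities.get(checker) not in PROFILE_SEVERITIES or is_excluded(checker):
            result[checker] = "excluded"
        elif count <= 30 * med:
            result[checker] = "default"
        elif count <= 100 * med:
            result[checker] = "sensitive"
        else:
            result[checker] = "very-noisy"  # in neither profile unless overridden
    return result


if __name__ == "__main__":
    counts = {"core.A": 1, "core.B": 2, "core.C": 3, "core.D": 200,
              "core.E": 400, "alpha.F": 2, "abseil-G": 2}
    sev = {name: "MEDIUM" for name in counts}
    for name, profile in classify_profiles(counts, sev).items():
        print(name, profile)
```

In the demo data the median is 2, so the cut-offs are 60 and 200 reports, and a checker with exactly 200 reports still lands in the sensitive profile. A median of 0 would push every eligible checker with at least one report into the very noisy bucket, so the instructions may need to say how that case is handled.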