diff --git a/docs/sbt-parser-implementation-plan.md b/docs/sbt-parser-implementation-plan.md new file mode 100644 index 0000000..c46beed --- /dev/null +++ b/docs/sbt-parser-implementation-plan.md @@ -0,0 +1,189 @@ +# SBT Parser Implementation Plan + +## Context + +The manifest-parser repository supports Maven, npm, PyPI, Go modules, and .NET. The user needs to extend it with SBT (Scala Build Tool) support to parse SBT manifest files and extract dependencies. The implementation must follow existing patterns exactly, add duplicate detection, include comprehensive tests with vulnerable packages, and integrate cleanly without modifying existing parsers. + +### Supported SBT File Types + +SBT uses multiple file types that can declare dependencies. The parser supports **all `.sbt` files** via extension-based matching (like `.csproj` for dotnet): + +| File | Purpose | Syntax | +|------|---------|--------| +| `build.sbt` | Primary build definition | `libraryDependencies += "g" % "a" % "v"` | +| `plugins.sbt` | SBT plugin dependencies (in `project/`) | `addSbtPlugin("g" % "a" % "v")` | +| `dependencies.sbt` | Separated dependency definitions | Same as `build.sbt` | +| Any other `*.sbt` | SBT auto-loads all `.sbt` files in project root | Same as `build.sbt` | + +The core dependency regex `"g" % "a" % "v"` matches inside any wrapper (`addSbtPlugin(...)`, `libraryDependencies +=`, bare declarations), so all these file types are handled by the same parser with no special-casing needed. + +--- + +## Files to Create (3) + +### 1. 
`internal/parsers/sbt/sbt-parser.go` — Core Parser + +**Package:** `sbt` | **Struct:** `SbtParser{}` | **PackageManager string:** `"sbt"` + +**Parsing Strategy — Two-pass, regex-based (like PyPI parser but with Seq-block state tracking):** + +- **Pass 1:** Extract variable definitions into `map[string]string` +- **Pass 2:** Line-by-line dependency extraction with state machine for `Seq(...)` blocks + +#### Variable Extraction (Pass 1) + +Supports all Scala variable declaration forms used in SBT files: + +| Pattern | Example | Regex | +|---------|---------|-------| +| `val` | `val v = "1.0"` | `^\s*val\s+(\w+)\s*=\s*"([^"]+)"` | +| `lazy val` | `lazy val v = "1.0"` | `^\s*lazy\s+val\s+(\w+)\s*=\s*"([^"]+)"` | +| `def` | `def v = "1.0"` | `^\s*def\s+(\w+)\s*=\s*"([^"]+)"` | + +All three patterns are combined into a single regex: +``` +^\s*(?:lazy\s+)?(?:val|def)\s+(\w+)\s*=\s*"([^"]+)" +``` + +#### Dependency Extraction (Pass 2) + +**Core dependency regex:** +``` +"([^"]+)"\s+(%{1,3})\s+"([^"]+)"\s+%\s+(?:"([^"]+)"|(\w+))(?:\s+%\s+(?:"[^"]*"|\w+))? +``` +Captures: groupId, operator (`%`/`%%`/`%%%`), artifactId, version (quoted or variable name), optional scope (ignored). + +#### Helper functions: +- `extractVariables(lines []string) map[string]string` — supports `val`, `lazy val`, and `def` +- `resolveVersion(version string, vars map[string]string) string` — exact version as-is, variable lookup, unresolvable → `"latest"` +- `stripComments(line string, inBlockComment *bool) string` — handles `//` and `/* */` +- `computeLocationIndices(rawLine, groupId) (int, int)` — calculates start/end with modifier-aware trimming + +#### Duplicate detection: +`map[string]bool` keyed by `"groupId:artifactId"`. Skip duplicates silently (no `log.Printf` — this is a library, not a CLI; callers control their own logging). + +#### Comment handling: +Strip `//` inline comments; track `/* */` block comment state across lines. 
The `//` stripping is applied **before** the dependency regex runs and is not quote-aware, so a `//` inside a quoted string on a dependency line truncates that line at the `//`. + +#### Location tracking: +Single `Location` per package (like PyPI), `Line` is 0-indexed: +- `StartIndex` = position of first `"` of groupId in the raw line +- `EndIndex` = end of the dependency declaration, **excluding** trailing modifiers + +**Modifier-aware EndIndex calculation:** The `computeLocationIndices` function trims the following patterns from the end of the line when computing `EndIndex`: +- Trailing commas and whitespace +- Dependency modifiers: `exclude(...)`, `excludeAll(...)`, `classifier(...)`, `intransitive()`, `withSources()`, `withJavadoc()`, `cross(...)` +- Closing parentheses from `addSbtPlugin(...)` or `Seq(...)` wrappers +- Inline comments (`// ...`) + +This ensures the location span covers only the `"g" % "a" % "v"` core declaration. + +#### Imports: +Only stdlib — `os`, `regexp`, `strings`, `fmt` + `models` package. No `log` import (library code should not write to stderr). + +### 2. `internal/testdata/build.sbt` and `internal/testdata/plugins.sbt` — Test Fixtures + +**`build.sbt`** — Contains known-vulnerable dependencies: +- **log4j-core 2.14.0** (CVE-2021-44228 — Log4Shell) +- **jackson-databind 2.13.0** (multiple CVEs) +- **struts2-core 2.5.20** (CVE-2020-17530) +- **commons-collections 3.2.1** (deserialization vulnerability) +- **snakeyaml 1.26** (CVE-2022-1471) + +Exercises all parsing scenarios: `%`, `%%`, `%%%`, `Seq(...)`, variable-based versions, inline comments, block comments, scope annotations. + +**`plugins.sbt`** — Contains SBT plugin dependencies using `addSbtPlugin(...)` syntax to validate that the parser handles `plugins.sbt` files correctly. + +### 3. 
`internal/parsers/sbt/sbt-parser_test.go` — Comprehensive Tests + +**Table-driven + individual tests following Maven/PyPI patterns:** + +| # | Test | What it validates | +|---|------|-------------------| +| 1 | TestParseSingleDependency | Basic `libraryDependencies += "g" % "a" % "v"` | +| 2 | TestParseSingleDependencyDoublePercent | `%%` operator → PackageName is `g:a` (no Scala suffix) | +| 3 | TestParseSingleDependencyTriplePercent | `%%%` operator (Scala.js) → same as `%%` | +| 4 | TestParseSeqBlock | `libraryDependencies ++= Seq(...)` with multiple deps | +| 5 | TestParseWithScope | Trailing `% "test"` or `% Test` → parsed correctly, scope ignored | +| 6 | TestParseWithVariableVersion | `val v = "1.0"` then `% v` → resolves to `"1.0"` | +| 7 | TestParseWithUnresolvableVariable | Missing variable → version is `"latest"` | +| 8 | TestParseSingleLineComment | `//` comments are skipped | +| 9 | TestParseBlockComment | `/* ... */` spanning lines → deps inside skipped | +| 10 | TestParseEmptyFile | Returns empty slice, no error | +| 11 | TestParseDuplicateDependencies | Same `g:a` twice → first wins, second skipped | +| 12 | TestParseLocationAccuracy | Verify exact Line, StartIndex, EndIndex values | +| 13 | TestParseNonExistentFile | Returns error | +| 14 | TestParseMixedOperators | Mix of `%` and `%%` in same Seq | +| 15 | TestResolveVersion | Table-driven: exact, variable, missing, empty | +| 16 | TestParseAddSbtPlugin | `addSbtPlugin("g" % "a" % "v")` syntax from `plugins.sbt` | +| 17 | TestParseLazyVal | `lazy val v = "1.0"` → variable extracted and resolved | +| 18 | TestParseDef | `def v = "1.0"` → variable extracted and resolved | +| 19 | TestParseWithExclude | `"g" % "a" % "v" exclude("x", "y")` → parsed, EndIndex excludes modifier | +| 20 | TestParseWithIntransitive | `"g" % "a" % "v" intransitive()` → parsed, EndIndex excludes modifier | +| 21 | TestParseWithCross | `"g" % "a" % "v" cross CrossVersion.full` → parsed, EndIndex excludes modifier | +| 22 | 
TestParseWithExcludeAll | `"g" % "a" % "v" excludeAll(...)` → parsed, EndIndex excludes modifier | +| 23 | TestParseDependencyOverrides | `dependencyOverrides += "g" % "a" % "v"` → parsed correctly | +| 24 | TestParseWithClassifier | `"g" % "a" % "v" % "test" classifier "tests"` → parsed, classifier ignored | +| 25 | TestExtractVariables | Table-driven: val, lazy val, def, commented out, indented | +| 26 | TestSbtParser_Parse_RealFile | Parse `../../testdata/build.sbt` and validate against expected packages | +| 27 | TestSbtParser_Parse_PluginsFile | Parse `../../testdata/plugins.sbt` and validate plugin dependencies | + +--- + +## Files to Modify (3) + +### 4. `pkg/parser/manifest-file-selector.go` + +- Add `SbtBuild` to the `Manifest` iota enum (after `GoMod`) +- Add extension-based detection: `if manifestFileExtension == ".sbt" { return SbtBuild }` + - This matches **all** `.sbt` files (`build.sbt`, `plugins.sbt`, `dependencies.sbt`, etc.) + - Follows the same pattern used for `.csproj` detection + +### 5. `pkg/parser/parser_factory.go` + +- Add import: `"github.com/Checkmarx/manifest-parser/internal/parsers/sbt"` +- Add case: `case SbtBuild: return &sbt.SbtParser{}` + +### 6. `pkg/parser/manifest-file-selector_test.go` + +- Add `TestManifestFileSelector_ExpectSbtBuild` test for `build.sbt` +- Add `TestManifestFileSelector_ExpectSbtPlugins` test for `plugins.sbt` +- Add `TestManifestFileSelector_ExpectSbtCustom` test for `dependencies.sbt` + +--- + +## Implementation Order + +1. Create `internal/parsers/sbt/sbt-parser.go` (core parser) +2. Create `internal/testdata/build.sbt` (test fixture) +3. Create `internal/parsers/sbt/sbt-parser_test.go` (tests) +4. Modify `pkg/parser/manifest-file-selector.go` (enum + detection) +5. Modify `pkg/parser/manifest-file-selector_test.go` (selector test) +6. Modify `pkg/parser/parser_factory.go` (factory registration) +7. Run `go test ./...` to verify all tests pass with no regressions + +## Verification + +1. 
`go build ./...` — compiles cleanly +2. `go test ./internal/parsers/sbt/ -v` — all SBT parser tests pass +3. `go test ./pkg/parser/ -v` — selector + factory tests pass (including new SBT tests) +4. `go test ./... -v` — full suite, no regressions +5. `go test ./... -cover` — check coverage +6. `go run cmd/main.go internal/testdata/build.sbt` — produces correct JSON output +7. `go run cmd/main.go internal/testdata/plugins.sbt` — produces correct JSON output for plugin dependencies + +--- + +## Production-Readiness Hardening (v2) + +The following gaps were identified after initial implementation and are addressed in the updated parser: + +| # | Gap | Fix | Impact | +|---|-----|-----|--------| +| 1 | `lazy val` not matched | Extend varRegex to `(?:lazy\s+)?(?:val\|def)` | **High** — many real projects use `lazy val` | +| 2 | `def` declarations not matched | Same regex extension | **Medium** — some projects use `def` for versions | +| 3 | Modifiers corrupt EndIndex | `computeLocationIndices` trims `exclude(...)`, `intransitive()`, `withSources()`, `withJavadoc()`, `cross(...)`, `classifier(...)` | **Medium** — common in complex builds | +| 4 | Closing `)` from wrappers in EndIndex | Trim trailing `)` after modifiers | **Medium** — affects `addSbtPlugin(...)` | +| 5 | `log.Printf` in library code | Remove all `log.Printf` calls — library consumers control their own logging | **Medium** — breaks clean library usage | +| 6 | `dependencyOverrides` not tested | Already works (regex is context-free), add explicit test | **Low** — verification only | +| 7 | `classifier` keyword | Already works (the regex is unanchored, so a trailing `classifier` is left unmatched and ignored), add explicit test | **Low** — verification only | \ No newline at end of file diff --git a/docs/sbt-parser-prompt.md b/docs/sbt-parser-prompt.md new file mode 100644 index 0000000..3d85d72 --- /dev/null +++ b/docs/sbt-parser-prompt.md @@ -0,0 +1,214 @@ +# Reproducible Prompt: Add SBT Parser to manifest-parser + +Use the following prompt with an AI 
coding assistant to reproduce the exact same SBT parser implementation. Copy everything below the line. + +--- + +## Prompt + +I have a Go repository (`github.com/Checkmarx/manifest-parser`) that parses package manifest files and extracts dependency information. It already supports Maven, npm, PyPI, Go modules, and .NET. It is used as an internal Go module consumed by a parent package. + +### Existing Architecture (do NOT modify these files — only extend) + +**Common types** (`pkg/parser/models/package_model.go`): +```go +type Location struct { + Line int + StartIndex int + EndIndex int +} +type Package struct { + PackageManager string + PackageName string + Version string + FilePath string + Locations []Location +} +``` + +**Parser interface** (`pkg/parser/parser.go`): +```go +type Parser interface { + Parse(manifestFile string) ([]models.Package, error) +} +``` + +**File detection** (`pkg/parser/manifest-file-selector.go`): Uses a `Manifest` iota enum and `selectManifestFile()` function that matches filenames/extensions. Example: `.csproj` extension match for dotnet, `"pom.xml"` exact match for Maven. + +**Factory** (`pkg/parser/parser_factory.go`): `ParsersFactory(manifest string) Parser` with a switch on the Manifest type. Each parser lives in its own package under `internal/parsers//`. + +**Test patterns**: Table-driven tests with `t.TempDir()`, inline content strings, `testdata.ValidatePackages()` helper from `internal/testdata/helper.go`. Real file tests against fixtures in `internal/testdata/`. + +**Existing PackageManager identifiers**: `"mvn"`, `"npm"`, `"pypi"`, `"go"`, `"nuget"`. + +**Version resolution convention**: Exact version returned as-is. Ranges/specifiers/empty return `"latest"`. + +### Task: Add SBT (Scala Build Tool) Parser Support + +Implement a production-grade SBT parser following these exact specifications: + +#### 1. 
Create `internal/parsers/sbt/sbt-parser.go` + +- **Package**: `sbt` | **Struct**: `SbtParser{}` | **PackageManager string**: `"sbt"` +- **Two-pass, regex-based parsing** (similar to PyPI parser style): + - **Pass 1**: Extract variable definitions (`val`, `lazy val`, `def`) into a `map[string]string` + - **Pass 2**: Line-by-line dependency extraction using regex +- **Variable regex** (combined `val`, `lazy val`, `def`): + ``` + ^\s*(?:lazy\s+)?(?:val|def)\s+(\w+)\s*=\s*"([^"]+)" + ``` +- **Dependency regex**: + ``` + "([^"]+)"\s+(%{1,3})\s+"([^"]+)"\s+%\s+(?:"([^"]+)"|(\w+))(?:\s+%\s+(?:"[^"]*"|\w+))? + ``` + Captures: groupId, operator (`%`/`%%`/`%%%`), artifactId, version (quoted string or bare variable name), optional scope (ignored). The `%%`/`%%%` operators are captured but NOT used — PackageName is always `"groupId:artifactId"` without Scala version suffix. +- **Helper functions**: + - `extractVariables(lines []string) map[string]string` — scans for `val`/`lazy val`/`def` declarations, respects block comment state + - `resolveVersion(version string, vars map[string]string) string` — returns exact version if starts with digit, looks up variable map otherwise, returns `"latest"` if unresolvable or empty + - `stripComments(line string, inBlockComment *bool) string` — handles `//` single-line and `/* */` multi-line block comments (including inline `/* ... 
*/` on same line) + - `computeLocationIndices(rawLine string, groupId string) (int, int)` — calculates StartIndex (position of first `"` of groupId) and EndIndex (end of core dependency, EXCLUDING modifiers) +- **Modifier-aware EndIndex**: The `computeLocationIndices` function must trim these modifier keywords from the end BEFORE trimming trailing punctuation (order matters — `intransitive()` has parens that get stripped by punctuation trimmer): + ```go + var modifierKeywords = []string{ + "exclude(", "excludeAll(", "intransitive()", "withSources()", + "withJavadoc()", "classifier ", "classifier(", "cross ", "cross(", + } + ``` + Also trim trailing `)`, `,`, whitespace via `trimTrailingPunctuation`. +- **Duplicate detection**: `map[string]bool` keyed by `"groupId:artifactId"`. Skip duplicates silently (NO `log.Printf` — this is a library, not a CLI). +- **Line ending normalization**: `strings.ReplaceAll(string(content), "\r\n", "\n")` before splitting. +- **Imports**: Only `fmt`, `os`, `regexp`, `strings` + `models` package. No `log` import. +- **Location tracking**: Single `Location` per package, `Line` is 0-indexed. + +#### 2. 
Create `internal/testdata/build.sbt` + +Test fixture exercising all parsing scenarios with known-vulnerable packages: +```scala +// Project settings +name := "vulnerable-test-project" +version := "1.0.0" +scalaVersion := "2.13.12" + +val jacksonVersion = "2.13.0" +lazy val log4jVersion = "2.14.0" +def strutsVersion = "2.5.20" + +// Single dependency with % — CVE-2021-44228 (Log4Shell) +libraryDependencies += "org.apache.logging.log4j" % "log4j-core" % log4jVersion + +// Single dependency with %% — safe dependency +libraryDependencies += "org.typelevel" %% "cats-core" % "2.9.0" + +// Seq block with mixed operators and vulnerable packages +libraryDependencies ++= Seq( + "com.fasterxml.jackson.core" % "jackson-databind" % jacksonVersion, + "org.apache.struts" % "struts2-core" % strutsVersion, + "commons-collections" % "commons-collections" % "3.2.1", + "org.yaml" % "snakeyaml" % "1.26", + "io.netty" %% "netty-codec-http" % "4.1.68.Final" % "test" +) + +/* + This is a block comment — dependencies here should NOT be parsed + "org.example" % "should-not-parse" % "1.0.0" +*/ + +// Scala.js dependency with %%% +libraryDependencies += "org.scala-js" %%% "scalajs-dom" % "2.4.0" + +// Dependency with exclude modifier +libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.4" exclude("org.slf4j", "slf4j-log4j12") + +// Dependency override +dependencyOverrides += "com.google.guava" % "guava" % "32.1.2-jre" +``` + +#### 3. Create `internal/testdata/plugins.sbt` + +```scala +// SBT plugins +addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.0") +addSbtPlugin("org.scalameta" % "sbt-scalafmt" % "2.5.2") +addSbtPlugin("com.github.sbt" % "sbt-native-packager" % "1.9.16") +``` + +#### 4. Create `internal/parsers/sbt/sbt-parser_test.go` + +Write 29 comprehensive tests following existing Maven/PyPI test patterns. Include: + +**Core parsing tests** (use `t.TempDir()` with inline content): +1. 
`TestParseSingleDependency` — basic `"g" % "a" % "v"` with exact location validation via `testdata.ValidatePackages` +2. `TestParseSingleDependencyDoublePercent` — `%%` operator, verify PackageName is `g:a` (no Scala suffix) +3. `TestParseSingleDependencyTriplePercent` — `%%%` operator (Scala.js), same as `%%` +4. `TestParseSeqBlock` — `libraryDependencies ++= Seq(...)` with multiple deps +5. `TestParseWithScope` — trailing `% "test"`, verify scope is ignored +6. `TestParseWithVariableVersion` — `val v = "1.0"` then `% v`, verify resolution +7. `TestParseWithUnresolvableVariable` — missing variable, version should be `"latest"` +8. `TestParseSingleLineComment` — `//` comments skipped, inline comment after dep works +9. `TestParseBlockComment` — `/* ... */` spanning lines, deps inside skipped +10. `TestParseEmptyFile` — returns empty slice, no error +11. `TestParseDuplicateDependencies` — same `g:a` twice, first wins, second skipped +12. `TestParseLocationAccuracy` — bare `"g" % "a" % "v"` line, verify exact indices +13. `TestParseNonExistentFile` — returns error +14. `TestParseMixedOperators` — mix of `%`, `%%`, `%%%` in same Seq +15. `TestParseMalformedLine` — missing version part, gracefully skipped + +**Production hardening tests**: +16. `TestParseAddSbtPlugin` — `addSbtPlugin(...)` syntax, 2 plugins parsed correctly +17. `TestParseLazyVal` — `lazy val v = "3.1.0"`, variable resolved +18. `TestParseDef` — `def v = "4.2.0"`, variable resolved +19. `TestParseWithExclude` — `exclude(...)` modifier, EndIndex must NOT include it +20. `TestParseWithIntransitive` — `intransitive()` modifier, EndIndex must NOT include it +21. `TestParseWithCross` — `cross CrossVersion.full` modifier, EndIndex must NOT include it +22. `TestParseWithExcludeAll` — `excludeAll(...)` modifier, EndIndex must NOT include it +23. `TestParseDependencyOverrides` — `dependencyOverrides +=` parsed correctly +24. 
`TestParseWithClassifier` — `classifier "tests"` modifier, EndIndex must NOT include it + +**Helper function tests**: +25. `TestResolveVersion` — table-driven: exact, variable lookup, missing, empty, semver with pre-release +26. `TestStripComments` — table-driven: no comments, single-line, full-line, block start/end, inline block +27. `TestExtractVariables` — `val`, `lazy val`, `def`, commented-out (skipped), indented variants + +**Real file tests**: +28. `TestSbtParser_Parse_RealFile` — parse `../../testdata/build.sbt`, validate all 10 packages (log4j, cats, jackson, struts, commons-collections, snakeyaml, netty, scalajs, hadoop, guava) with correct names, versions, and line numbers +29. `TestSbtParser_Parse_PluginsFile` — parse `../../testdata/plugins.sbt`, validate 3 plugin packages + +#### 5. Modify `pkg/parser/manifest-file-selector.go` + +- Add `SbtBuild` to the `Manifest` iota enum after `GoMod` +- Add **extension-based** detection (NOT exact filename): `if manifestFileExtension == ".sbt" { return SbtBuild }` — this matches all `.sbt` files (`build.sbt`, `plugins.sbt`, `dependencies.sbt`, etc.), following the same pattern used for `.csproj` + +#### 6. Modify `pkg/parser/parser_factory.go` + +- Add import: `"github.com/Checkmarx/manifest-parser/internal/parsers/sbt"` +- Add case: `case SbtBuild: return &sbt.SbtParser{}` + +#### 7. Modify `pkg/parser/manifest-file-selector_test.go` + +Add 3 tests: +- `TestManifestFileSelector_ExpectSbtBuild` — `"build.sbt"` → `SbtBuild` +- `TestManifestFileSelector_ExpectSbtPlugins` — `"plugins.sbt"` → `SbtBuild` +- `TestManifestFileSelector_ExpectSbtCustom` — `"dependencies.sbt"` → `SbtBuild` + +#### 8. Update `README.md` + +Add SBT to the supported package managers table, usage examples, and project structure. + +### Critical Implementation Details + +1. **Order in `computeLocationIndices`**: Run `trimModifiers` BEFORE `trimTrailingPunctuation`. 
If you reverse the order, `intransitive()` will have its `)` stripped first and the modifier keyword won't match. +2. **No `log` package**: This is a library module. Do NOT use `log.Printf` for duplicate warnings or any other logging. Skip duplicates silently. +3. **`\r\n` normalization**: Add `strings.ReplaceAll(string(content), "\r\n", "\n")` before `strings.Split` — required for Windows compatibility. +4. **Regex is context-free**: The dependency regex matches `"g" % "a" % "v"` anywhere on a line, regardless of prefix (`libraryDependencies +=`, `addSbtPlugin(`, `dependencyOverrides +=`, bare declaration). This is intentional — no need to check the prefix. +5. **`%%`/`%%%` operators**: Captured by regex but ignored in output. PackageName is always `groupId:artifactId`. + +### Verification + +After implementation, run: +```bash +go build ./... +go test ./internal/parsers/sbt/ -v -cover # expect 29 tests, ~97.8% coverage +go test ./pkg/parser/ -v # expect 10 selector tests pass +go run cmd/main.go internal/testdata/build.sbt +go run cmd/main.go internal/testdata/plugins.sbt +``` diff --git a/internal/parsers/sbt/sbt-parser.go b/internal/parsers/sbt/sbt-parser.go new file mode 100644 index 0000000..0946913 --- /dev/null +++ b/internal/parsers/sbt/sbt-parser.go @@ -0,0 +1,248 @@ +package sbt + +import ( + "fmt" + "os" + "regexp" + "strings" + + "github.com/Checkmarx/manifest-parser/pkg/parser/models" +) + +// SbtParser implements parsing of SBT .sbt files (build.sbt, plugins.sbt, etc.) 
+type SbtParser struct{} + +var ( + // varRegex matches Scala variable declarations: + // val name = "value" + // lazy val name = "value" + // def name = "value" + varRegex = regexp.MustCompile(`^\s*(?:lazy\s+)?(?:val|def)\s+(\w+)\s*=\s*"([^"]+)"`) + + // depRegex matches SBT dependency declarations: + // "groupId" % "artifactId" % "version" + // "groupId" %% "artifactId" % "version" + // "groupId" %%% "artifactId" % "version" + // "groupId" % "artifactId" % variableName + // With optional trailing scope: % "test" or % Test + depRegex = regexp.MustCompile(`"([^"]+)"\s+(%{1,3})\s+"([^"]+)"\s+%\s+(?:"([^"]+)"|(\w+))(?:\s+%\s+(?:"[^"]*"|\w+))?`) +) + +// extractVariables scans lines for val, lazy val, and def declarations and returns a variable map +func extractVariables(lines []string) map[string]string { + vars := make(map[string]string) + inBlockComment := false + + for _, rawLine := range lines { + line := stripComments(rawLine, &inBlockComment) + if inBlockComment { + continue + } + if match := varRegex.FindStringSubmatch(line); match != nil { + vars[match[1]] = match[2] + } + } + + return vars +} + +// resolveVersion resolves a version string using the variable map +func resolveVersion(version string, vars map[string]string) string { + if version == "" { + return "latest" + } + // A leading digit marks a literal version; return it as-is + if version[0] >= '0' && version[0] <= '9' { + return version + } + // Try to resolve as a variable + if resolved, exists := vars[version]; exists { + return resolved + } + return "latest" +} + +// stripComments removes comments from a line and tracks block comment state +func stripComments(line string, inBlockComment *bool) string { + if *inBlockComment { + if idx := strings.Index(line, "*/"); idx >= 0 { + *inBlockComment = false + line = line[idx+2:] + } else { + return "" + } + } + + // Handle inline block comments: /* ... 
*/ on the same line + for { + startIdx := strings.Index(line, "/*") + if startIdx < 0 { + break + } + endIdx := strings.Index(line[startIdx+2:], "*/") + if endIdx >= 0 { + // Block comment opens and closes on same line + line = line[:startIdx] + line[startIdx+2+endIdx+2:] + } else { + // Block comment opens but doesn't close — entering block comment + *inBlockComment = true + line = line[:startIdx] + break + } + } + + // Handle single-line comments + if idx := strings.Index(line, "//"); idx >= 0 { + line = line[:idx] + } + + return line +} + +// modifierKeywords are SBT dependency modifiers that should be excluded from the location span. +// The EndIndex should cover only the core "g" % "a" % "v" declaration. +var modifierKeywords = []string{ + "exclude(", + "excludeAll(", + "intransitive()", + "withSources()", + "withJavadoc()", + "classifier ", + "classifier(", + "cross ", + "cross(", +} + +// computeLocationIndices calculates start and end indices for a dependency in a raw line. +// StartIndex = position of the first quote of the groupId. +// EndIndex = end of the core dependency declaration, excluding modifiers, comments, and trailing punctuation. 
+func computeLocationIndices(rawLine string, groupId string) (int, int) { + // StartIndex: position of the first quote of the groupId + searchStr := `"` + groupId + `"` + startIdx := strings.Index(rawLine, searchStr) + if startIdx < 0 { + startIdx = 0 + } + + // Start with the full line + endIdx := len(rawLine) + + // If a comment follows the dependency, stop before it (a "//" before startIdx is ignored so endIdx never precedes startIdx) + if commentIdx := strings.Index(rawLine[startIdx:], "//"); commentIdx >= 0 { + endIdx = startIdx + commentIdx + } + + // Trim known dependency modifiers first (before punctuation removal, + // so keywords like "intransitive()" are still intact when searched) + endIdx = trimModifiers(rawLine, startIdx, endIdx) + + // Trim trailing whitespace, commas, and closing parentheses + endIdx = trimTrailingPunctuation(rawLine, endIdx) + + return startIdx, endIdx +} + +// trimTrailingPunctuation removes trailing whitespace, commas, and closing parens from the end boundary +func trimTrailingPunctuation(line string, endIdx int) int { + for endIdx > 0 { + ch := line[endIdx-1] + if ch == ' ' || ch == '\t' || ch == ',' || ch == ')' { + endIdx-- + } else { + break + } + } + return endIdx +} + +// trimModifiers scans the region [startIdx, endIdx) for modifier keywords and truncates endIdx +// to exclude them. Each keyword's first occurrence is checked, and endIdx shrinks to the earliest one found. 
+func trimModifiers(line string, startIdx int, endIdx int) int { + region := line[startIdx:endIdx] + for _, kw := range modifierKeywords { + if idx := strings.Index(region, kw); idx >= 0 { + // Truncate at the modifier keyword + candidate := startIdx + idx + // Only trim if the modifier comes after the core dependency (at least "g" % "a" % "v") + if candidate > startIdx && candidate < endIdx { + endIdx = candidate + } + } + } + return endIdx +} + +// Parse implements the Parser interface for SBT build.sbt files +func (p *SbtParser) Parse(manifestFile string) ([]models.Package, error) { + content, err := os.ReadFile(manifestFile) + if err != nil { + return nil, fmt.Errorf("failed to read manifest file: %w", err) + } + + lines := strings.Split(strings.ReplaceAll(string(content), "\r\n", "\n"), "\n") + + // Pass 1: Extract variable definitions + vars := extractVariables(lines) + + // Pass 2: Extract dependencies + var packages []models.Package + seen := make(map[string]bool) + inBlockComment := false + + for lineNum, rawLine := range lines { + line := stripComments(rawLine, &inBlockComment) + if inBlockComment { + continue + } + + line = strings.TrimSpace(line) + if line == "" { + continue + } + + // Try to extract dependency from this line + match := depRegex.FindStringSubmatch(line) + if match == nil { + continue + } + + groupId := match[1] + // match[2] is the operator (%, %%, %%%) — captured but not used + artifactId := match[3] + quotedVersion := match[4] // version from quoted string + bareVersion := match[5] // version from variable name + + var version string + if quotedVersion != "" { + version = quotedVersion + } else if bareVersion != "" { + version = resolveVersion(bareVersion, vars) + } else { + version = "latest" + } + + // Build package key for duplicate detection + pkgKey := groupId + ":" + artifactId + if seen[pkgKey] { + continue + } + seen[pkgKey] = true + + // Calculate location + startIdx, endIdx := computeLocationIndices(rawLine, groupId) + + 
packages = append(packages, models.Package{ + PackageManager: "sbt", + PackageName: pkgKey, + Version: version, + FilePath: manifestFile, + Locations: []models.Location{{ + Line: lineNum, + StartIndex: startIdx, + EndIndex: endIdx, + }}, + }) + } + + return packages, nil +} diff --git a/internal/parsers/sbt/sbt-parser_test.go b/internal/parsers/sbt/sbt-parser_test.go new file mode 100644 index 0000000..063a9bf --- /dev/null +++ b/internal/parsers/sbt/sbt-parser_test.go @@ -0,0 +1,820 @@ +package sbt + +import ( + "os" + "path/filepath" + "strings" + "testing" + + "github.com/Checkmarx/manifest-parser/internal/testdata" + "github.com/Checkmarx/manifest-parser/pkg/parser/models" +) + +func TestParseSingleDependency(t *testing.T) { + content := `libraryDependencies += "org.example" % "test-lib" % "1.0.0" +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + + expected := []models.Package{ + { + PackageManager: "sbt", + PackageName: "org.example:test-lib", + Version: "1.0.0", + FilePath: filePath, + Locations: []models.Location{{ + Line: 0, + StartIndex: 23, + EndIndex: 59, + }}, + }, + } + testdata.ValidatePackages(t, pkgs, expected) +} + +func TestParseSingleDependencyDoublePercent(t *testing.T) { + content := `libraryDependencies += "org.typelevel" %% "cats-core" % "2.9.0" +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + + expected := []models.Package{ + { + PackageManager: "sbt", + PackageName: 
"org.typelevel:cats-core", + Version: "2.9.0", + FilePath: filePath, + Locations: []models.Location{{ + Line: 0, + StartIndex: 23, + EndIndex: 63, + }}, + }, + } + testdata.ValidatePackages(t, pkgs, expected) +} + +func TestParseSingleDependencyTriplePercent(t *testing.T) { + content := `libraryDependencies += "org.scala-js" %%% "scalajs-dom" % "2.4.0" +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + + expected := []models.Package{ + { + PackageManager: "sbt", + PackageName: "org.scala-js:scalajs-dom", + Version: "2.4.0", + FilePath: filePath, + Locations: []models.Location{{ + Line: 0, + StartIndex: 23, + EndIndex: 65, + }}, + }, + } + testdata.ValidatePackages(t, pkgs, expected) +} + +func TestParseSeqBlock(t *testing.T) { + content := `libraryDependencies ++= Seq( + "org.example" % "lib-a" % "1.0.0", + "org.example" % "lib-b" % "2.0.0" +) +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 2 { + t.Fatalf("expected 2 packages, got %d", len(pkgs)) + } + + if pkgs[0].PackageName != "org.example:lib-a" { + t.Errorf("expected pkg[0].PackageName = org.example:lib-a, got %s", pkgs[0].PackageName) + } + if pkgs[0].Version != "1.0.0" { + t.Errorf("expected pkg[0].Version = 1.0.0, got %s", pkgs[0].Version) + } + if pkgs[1].PackageName != "org.example:lib-b" { + t.Errorf("expected pkg[1].PackageName = org.example:lib-b, got %s", pkgs[1].PackageName) + } + if pkgs[1].Version != "2.0.0" { + t.Errorf("expected pkg[1].Version = 2.0.0, got %s", pkgs[1].Version) + } +} + +func 
TestParseWithScope(t *testing.T) { + content := `libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.15" % "test" +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + + if pkgs[0].PackageName != "org.scalatest:scalatest" { + t.Errorf("expected PackageName = org.scalatest:scalatest, got %s", pkgs[0].PackageName) + } + if pkgs[0].Version != "3.2.15" { + t.Errorf("expected Version = 3.2.15, got %s", pkgs[0].Version) + } +} + +func TestParseWithVariableVersion(t *testing.T) { + content := `val jacksonVersion = "2.13.0" +libraryDependencies += "com.fasterxml.jackson.core" % "jackson-databind" % jacksonVersion +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + + if pkgs[0].Version != "2.13.0" { + t.Errorf("expected Version = 2.13.0, got %s", pkgs[0].Version) + } +} + +func TestParseWithUnresolvableVariable(t *testing.T) { + content := `libraryDependencies += "org.example" % "test-lib" % unknownVar +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + + if pkgs[0].Version != "latest" { + t.Errorf("expected Version = latest, got %s", pkgs[0].Version) + } +} + +func TestParseSingleLineComment(t *testing.T) { + content := `// "org.example" % 
"should-not-parse" % "1.0.0" +libraryDependencies += "org.example" % "real-lib" % "1.0.0" // inline comment +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + + if pkgs[0].PackageName != "org.example:real-lib" { + t.Errorf("expected PackageName = org.example:real-lib, got %s", pkgs[0].PackageName) + } +} + +func TestParseBlockComment(t *testing.T) { + content := `/* + "org.example" % "should-not-parse" % "1.0.0" +*/ +libraryDependencies += "org.example" % "real-lib" % "2.0.0" +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + + if pkgs[0].PackageName != "org.example:real-lib" { + t.Errorf("expected PackageName = org.example:real-lib, got %s", pkgs[0].PackageName) + } + if pkgs[0].Version != "2.0.0" { + t.Errorf("expected Version = 2.0.0, got %s", pkgs[0].Version) + } +} + +func TestParseEmptyFile(t *testing.T) { + content := "" + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 0 { + t.Fatalf("expected 0 packages, got %d", len(pkgs)) + } +} + +func TestParseDuplicateDependencies(t *testing.T) { + content := `libraryDependencies += "org.example" % "test-lib" % "1.0.0" +libraryDependencies += "org.example" % "test-lib" % "2.0.0" +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + 
os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package (duplicate skipped), got %d", len(pkgs)) + } + + if pkgs[0].Version != "1.0.0" { + t.Errorf("expected first occurrence version 1.0.0, got %s", pkgs[0].Version) + } +} + +func TestParseLocationAccuracy(t *testing.T) { + // Line: "org.example" % "test-lib" % "1.0.0" + // Positions: 0123456789... + content := `"org.example" % "test-lib" % "1.0.0" +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + + expected := []models.Package{ + { + PackageManager: "sbt", + PackageName: "org.example:test-lib", + Version: "1.0.0", + FilePath: filePath, + Locations: []models.Location{{ + Line: 0, + StartIndex: 0, + EndIndex: 36, + }}, + }, + } + testdata.ValidatePackages(t, pkgs, expected) +} + +func TestParseNonExistentFile(t *testing.T) { + parser := &SbtParser{} + _, err := parser.Parse("/nonexistent/build.sbt") + if err == nil { + t.Error("expected error for non-existent file, got none") + } +} + +func TestParseMixedOperators(t *testing.T) { + content := `libraryDependencies ++= Seq( + "org.example" % "lib-a" % "1.0.0", + "org.typelevel" %% "cats-core" % "2.9.0", + "org.scala-js" %%% "scalajs-dom" % "2.4.0" +) +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 3 { + t.Fatalf("expected 3 packages, got %d", len(pkgs)) + } + + if pkgs[0].PackageName != "org.example:lib-a" { 
+ t.Errorf("expected pkg[0] = org.example:lib-a, got %s", pkgs[0].PackageName) + } + if pkgs[1].PackageName != "org.typelevel:cats-core" { + t.Errorf("expected pkg[1] = org.typelevel:cats-core, got %s", pkgs[1].PackageName) + } + if pkgs[2].PackageName != "org.scala-js:scalajs-dom" { + t.Errorf("expected pkg[2] = org.scala-js:scalajs-dom, got %s", pkgs[2].PackageName) + } +} + +func TestParseMalformedLine(t *testing.T) { + content := `libraryDependencies += "org.example" % "test-lib" +libraryDependencies += "org.example" % "real-lib" % "1.0.0" +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package (malformed skipped), got %d", len(pkgs)) + } + + if pkgs[0].PackageName != "org.example:real-lib" { + t.Errorf("expected PackageName = org.example:real-lib, got %s", pkgs[0].PackageName) + } +} + +func TestResolveVersion(t *testing.T) { + vars := map[string]string{ + "jacksonVersion": "2.13.0", + "log4jVersion": "2.14.0", + } + + tests := []struct { + name string + version string + expected string + }{ + {"exact version", "1.2.3", "1.2.3"}, + {"variable lookup", "jacksonVersion", "2.13.0"}, + {"another variable", "log4jVersion", "2.14.0"}, + {"missing variable", "unknownVar", "latest"}, + {"empty version", "", "latest"}, + {"semver with pre-release", "2.0.0-RC1", "2.0.0-RC1"}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + result := resolveVersion(tt.version, vars) + if result != tt.expected { + t.Errorf("resolveVersion(%q) = %q, want %q", tt.version, result, tt.expected) + } + }) + } +} + +func TestStripComments(t *testing.T) { + tests := []struct { + name string + line string + inBlockComment bool + expected string + expectedBlock bool + }{ + {"no comments", `"org.example" % "lib" % "1.0"`, false, 
`"org.example" % "lib" % "1.0"`, false}, + {"single line comment", `"org.example" % "lib" % "1.0" // comment`, false, `"org.example" % "lib" % "1.0" `, false}, + {"full line comment", `// this is a comment`, false, ``, false}, + {"block comment start", `/* start of block`, false, ``, true}, + {"inside block comment", ` some content inside block`, true, ``, true}, + {"block comment end", `end of block */`, true, ``, false}, + {"inline block comment", `before /* inside */ after`, false, `before after`, false}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + inBlock := tt.inBlockComment + result := stripComments(tt.line, &inBlock) + if result != tt.expected { + t.Errorf("stripComments(%q) = %q, want %q", tt.line, result, tt.expected) + } + if inBlock != tt.expectedBlock { + t.Errorf("inBlockComment = %v, want %v", inBlock, tt.expectedBlock) + } + }) + } +} + +func TestExtractVariables(t *testing.T) { + lines := []string{ + `val jacksonVersion = "2.13.0"`, + `lazy val log4jVersion = "2.14.0"`, + `def strutsVersion = "2.5.20"`, + `// val commentedOut = "1.0.0"`, + `name := "my-project"`, + `val emptyLine`, + ` val indentedVar = "3.0.0"`, + ` lazy val indentedLazy = "4.0.0"`, + ` def indentedDef = "5.0.0"`, + } + + vars := extractVariables(lines) + + expected := map[string]string{ + "jacksonVersion": "2.13.0", + "log4jVersion": "2.14.0", + "strutsVersion": "2.5.20", + "indentedVar": "3.0.0", + "indentedLazy": "4.0.0", + "indentedDef": "5.0.0", + } + + if len(vars) != len(expected) { + t.Fatalf("expected %d variables, got %d: %v", len(expected), len(vars), vars) + } + + for key, want := range expected { + got, exists := vars[key] + if !exists { + t.Errorf("expected variable %q not found", key) + continue + } + if got != want { + t.Errorf("variable %q = %q, want %q", key, got, want) + } + } +} + +func TestParseAddSbtPlugin(t *testing.T) { + content := `addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.0") +addSbtPlugin("org.scalameta" % 
"sbt-scalafmt" % "2.5.2") +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "plugins.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 2 { + t.Fatalf("expected 2 packages, got %d", len(pkgs)) + } + + if pkgs[0].PackageName != "com.eed3si9n:sbt-assembly" { + t.Errorf("expected pkg[0].PackageName = com.eed3si9n:sbt-assembly, got %s", pkgs[0].PackageName) + } + if pkgs[0].Version != "2.1.0" { + t.Errorf("expected pkg[0].Version = 2.1.0, got %s", pkgs[0].Version) + } + if pkgs[1].PackageName != "org.scalameta:sbt-scalafmt" { + t.Errorf("expected pkg[1].PackageName = org.scalameta:sbt-scalafmt, got %s", pkgs[1].PackageName) + } + if pkgs[1].Version != "2.5.2" { + t.Errorf("expected pkg[1].Version = 2.5.2, got %s", pkgs[1].Version) + } +} + +func TestParseLazyVal(t *testing.T) { + content := `lazy val myVersion = "3.1.0" +libraryDependencies += "org.example" % "test-lib" % myVersion +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + if pkgs[0].Version != "3.1.0" { + t.Errorf("expected Version = 3.1.0, got %s", pkgs[0].Version) + } +} + +func TestParseDef(t *testing.T) { + content := `def myVersion = "4.2.0" +libraryDependencies += "org.example" % "test-lib" % myVersion +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + if pkgs[0].Version != "4.2.0" { + 
t.Errorf("expected Version = 4.2.0, got %s", pkgs[0].Version) + } +} + +func TestParseWithExclude(t *testing.T) { + content := `libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.4" exclude("org.slf4j", "slf4j-log4j12") +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + if pkgs[0].PackageName != "org.apache.hadoop:hadoop-common" { + t.Errorf("expected PackageName = org.apache.hadoop:hadoop-common, got %s", pkgs[0].PackageName) + } + if pkgs[0].Version != "3.3.4" { + t.Errorf("expected Version = 3.3.4, got %s", pkgs[0].Version) + } + // EndIndex should NOT include the exclude(...) modifier + loc := pkgs[0].Locations[0] + rawLine := `libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.4" exclude("org.slf4j", "slf4j-log4j12")` + excludeStart := strings.Index(rawLine, " exclude(") + if loc.EndIndex > excludeStart { + t.Errorf("EndIndex %d extends into exclude(...) 
modifier (starts at %d)", loc.EndIndex, excludeStart) + } +} + +func TestParseWithIntransitive(t *testing.T) { + content := `libraryDependencies += "org.example" % "test-lib" % "1.0.0" intransitive() +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + // EndIndex should NOT include the intransitive() modifier + loc := pkgs[0].Locations[0] + rawLine := `libraryDependencies += "org.example" % "test-lib" % "1.0.0" intransitive()` + modifierStart := strings.Index(rawLine, " intransitive()") + if loc.EndIndex > modifierStart { + t.Errorf("EndIndex %d extends into intransitive() modifier (starts at %d)", loc.EndIndex, modifierStart) + } +} + +func TestParseWithCross(t *testing.T) { + content := `libraryDependencies += "org.example" % "test-lib" % "1.0.0" cross CrossVersion.full +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + loc := pkgs[0].Locations[0] + rawLine := `libraryDependencies += "org.example" % "test-lib" % "1.0.0" cross CrossVersion.full` + modifierStart := strings.Index(rawLine, " cross ") + if loc.EndIndex > modifierStart { + t.Errorf("EndIndex %d extends into cross modifier (starts at %d)", loc.EndIndex, modifierStart) + } +} + +func TestParseWithExcludeAll(t *testing.T) { + content := `libraryDependencies += "org.example" % "test-lib" % "1.0.0" excludeAll(ExclusionRule("org.slf4j")) +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) 
+ + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + loc := pkgs[0].Locations[0] + rawLine := `libraryDependencies += "org.example" % "test-lib" % "1.0.0" excludeAll(ExclusionRule("org.slf4j"))` + modifierStart := strings.Index(rawLine, " excludeAll(") + if loc.EndIndex > modifierStart { + t.Errorf("EndIndex %d extends into excludeAll(...) modifier (starts at %d)", loc.EndIndex, modifierStart) + } +} + +func TestParseDependencyOverrides(t *testing.T) { + content := `dependencyOverrides += "com.google.guava" % "guava" % "32.1.2-jre" +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + if pkgs[0].PackageName != "com.google.guava:guava" { + t.Errorf("expected PackageName = com.google.guava:guava, got %s", pkgs[0].PackageName) + } + if pkgs[0].Version != "32.1.2-jre" { + t.Errorf("expected Version = 32.1.2-jre, got %s", pkgs[0].Version) + } +} + +func TestParseWithClassifier(t *testing.T) { + content := `libraryDependencies += "org.example" % "test-lib" % "1.0.0" % "test" classifier "tests" +` + tmpDir := t.TempDir() + filePath := filepath.Join(tmpDir, "build.sbt") + os.WriteFile(filePath, []byte(content), 0644) + + parser := &SbtParser{} + pkgs, err := parser.Parse(filePath) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(pkgs) != 1 { + t.Fatalf("expected 1 package, got %d", len(pkgs)) + } + if pkgs[0].PackageName != "org.example:test-lib" { + t.Errorf("expected PackageName = org.example:test-lib, got %s", pkgs[0].PackageName) + } + if pkgs[0].Version != "1.0.0" { + t.Errorf("expected Version = 1.0.0, got %s", 
pkgs[0].Version) + } + loc := pkgs[0].Locations[0] + rawLine := `libraryDependencies += "org.example" % "test-lib" % "1.0.0" % "test" classifier "tests"` + modifierStart := strings.Index(rawLine, " classifier ") + if loc.EndIndex > modifierStart { + t.Errorf("EndIndex %d extends into classifier modifier (starts at %d)", loc.EndIndex, modifierStart) + } +} + +func TestSbtParser_Parse_RealFile(t *testing.T) { + parser := &SbtParser{} + manifestFile := "../../testdata/build.sbt" + packages, err := parser.Parse(manifestFile) + if err != nil { + t.Fatalf("Parse() error = %v", err) + } + + // Verify package count: 10 deps (log4j, cats, jackson, struts, commons, snakeyaml, netty, scalajs, hadoop, guava) + if len(packages) != 10 { + t.Fatalf("expected 10 packages, got %d", len(packages)) + } + + // Validate key fields for each package + expected := []struct { + name string + version string + line int + }{ + {"org.apache.logging.log4j:log4j-core", "2.14.0", 10}, + {"org.typelevel:cats-core", "2.9.0", 13}, + {"com.fasterxml.jackson.core:jackson-databind", "2.13.0", 17}, + {"org.apache.struts:struts2-core", "2.5.20", 18}, + {"commons-collections:commons-collections", "3.2.1", 19}, + {"org.yaml:snakeyaml", "1.26", 20}, + {"io.netty:netty-codec-http", "4.1.68.Final", 21}, + {"org.scala-js:scalajs-dom", "2.4.0", 30}, + {"org.apache.hadoop:hadoop-common", "3.3.4", 33}, + {"com.google.guava:guava", "32.1.2-jre", 36}, + } + + for i, exp := range expected { + if packages[i].PackageManager != "sbt" { + t.Errorf("pkg[%d].PackageManager = %q, want %q", i, packages[i].PackageManager, "sbt") + } + if packages[i].PackageName != exp.name { + t.Errorf("pkg[%d].PackageName = %q, want %q", i, packages[i].PackageName, exp.name) + } + if packages[i].Version != exp.version { + t.Errorf("pkg[%d].Version = %q, want %q", i, packages[i].Version, exp.version) + } + if packages[i].FilePath != manifestFile { + t.Errorf("pkg[%d].FilePath = %q, want %q", i, packages[i].FilePath, manifestFile) + } + if 
len(packages[i].Locations) != 1 { + t.Errorf("pkg[%d] has %d locations, want 1", i, len(packages[i].Locations)) + continue + } + if packages[i].Locations[0].Line != exp.line { + t.Errorf("pkg[%d].Location.Line = %d, want %d", i, packages[i].Locations[0].Line, exp.line) + } + } + + // Verify hadoop exclude modifier is NOT included in EndIndex + hadoopPkg := packages[8] + if hadoopPkg.Locations[0].EndIndex > 71 { + t.Errorf("hadoop EndIndex %d should not extend into exclude(...) modifier", hadoopPkg.Locations[0].EndIndex) + } +} + +func TestSbtParser_Parse_PluginsFile(t *testing.T) { + parser := &SbtParser{} + manifestFile := "../../testdata/plugins.sbt" + packages, err := parser.Parse(manifestFile) + if err != nil { + t.Fatalf("Parse() error = %v", err) + } + + expectedPackages := []models.Package{ + { + PackageManager: "sbt", + PackageName: "com.eed3si9n:sbt-assembly", + Version: "2.1.0", + FilePath: manifestFile, + }, + { + PackageManager: "sbt", + PackageName: "org.scalameta:sbt-scalafmt", + Version: "2.5.2", + FilePath: manifestFile, + }, + { + PackageManager: "sbt", + PackageName: "com.github.sbt:sbt-native-packager", + Version: "1.9.16", + FilePath: manifestFile, + }, + } + + if len(packages) != len(expectedPackages) { + t.Fatalf("expected %d packages, got %d", len(expectedPackages), len(packages)) + } + + for i, pkg := range packages { + if pkg.PackageManager != expectedPackages[i].PackageManager { + t.Errorf("pkg[%d].PackageManager = %q, want %q", i, pkg.PackageManager, expectedPackages[i].PackageManager) + } + if pkg.PackageName != expectedPackages[i].PackageName { + t.Errorf("pkg[%d].PackageName = %q, want %q", i, pkg.PackageName, expectedPackages[i].PackageName) + } + if pkg.Version != expectedPackages[i].Version { + t.Errorf("pkg[%d].Version = %q, want %q", i, pkg.Version, expectedPackages[i].Version) + } + } +} diff --git a/internal/testdata/build.sbt b/internal/testdata/build.sbt new file mode 100644 index 0000000..d8b0a94 --- /dev/null +++ 
b/internal/testdata/build.sbt
@@ -0,0 +1,37 @@
+// Project settings
+name := "vulnerable-test-project"
+version := "1.0.0"
+scalaVersion := "2.13.12"
+
+val jacksonVersion = "2.13.0"
+lazy val log4jVersion = "2.14.0"
+def strutsVersion = "2.5.20"
+
+// Single dependency with % — CVE-2021-44228 (Log4Shell)
+libraryDependencies += "org.apache.logging.log4j" % "log4j-core" % log4jVersion
+
+// Single dependency with %% — safe dependency
+libraryDependencies += "org.typelevel" %% "cats-core" % "2.9.0"
+
+// Seq block with mixed operators and vulnerable packages
+libraryDependencies ++= Seq(
+  "com.fasterxml.jackson.core" % "jackson-databind" % jacksonVersion,
+  "org.apache.struts" % "struts2-core" % strutsVersion,
+  "commons-collections" % "commons-collections" % "3.2.1",
+  "org.yaml" % "snakeyaml" % "1.26",
+  "io.netty" %% "netty-codec-http" % "4.1.68.Final" % "test"
+)
+
+/*
+  This is a block comment — dependencies here should NOT be parsed
+  "org.example" % "should-not-parse" % "1.0.0"
+*/
+
+// Scala.js dependency with %%%
+libraryDependencies += "org.scala-js" %%% "scalajs-dom" % "2.4.0"
+
+// Dependency with exclude modifier
+libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.4" exclude("org.slf4j", "slf4j-log4j12")
+
+// Dependency override
+dependencyOverrides += "com.google.guava" % "guava" % "32.1.2-jre"
diff --git a/internal/testdata/plugins.sbt b/internal/testdata/plugins.sbt
new file mode 100644
index 0000000..47674cb
--- /dev/null
+++ b/internal/testdata/plugins.sbt
@@ -0,0 +1,4 @@
+// SBT plugins
+addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.0")
+addSbtPlugin("org.scalameta" % "sbt-scalafmt" % "2.5.2")
+addSbtPlugin("com.github.sbt" % "sbt-native-packager" % "1.9.16")
diff --git a/pkg/parser/manifest-file-selector.go b/pkg/parser/manifest-file-selector.go
index 2710f99..e67643c 100644
--- a/pkg/parser/manifest-file-selector.go
+++ b/pkg/parser/manifest-file-selector.go
@@ -15,6 +15,7 @@ const (
 	DotnetPackagesConfig
 	MavenPom
 	GoMod
+	SbtBuild
 )
 
 // selectManifestFile a method to select a manifest file type by its name
@@ -55,5 +56,9 @@ func selectManifestFile(manifest string) Manifest {
 		return GoMod
 	}
 
+	if manifestFileExtension == ".sbt" {
+		return SbtBuild
+	}
+
 	return -1
 }
diff --git a/pkg/parser/manifest-file-selector_test.go b/pkg/parser/manifest-file-selector_test.go
index 8d4d91c..d5e7188 100644
--- a/pkg/parser/manifest-file-selector_test.go
+++ b/pkg/parser/manifest-file-selector_test.go
@@ -66,3 +66,30 @@ func TestManifestFileSelector_ExpectGoMod(t *testing.T) {
 		t.Errorf("selectManifestFile(%q) = %v; want %v", manifest, got, want)
 	}
 }
+
+func TestManifestFileSelector_ExpectSbtBuild(t *testing.T) {
+	manifest := "build.sbt"
+	got := selectManifestFile(manifest)
+	want := SbtBuild
+	if got != want {
+		t.Errorf("selectManifestFile(%q) = %v; want %v", manifest, got, want)
+	}
+}
+
+func TestManifestFileSelector_ExpectSbtPlugins(t *testing.T) {
+	manifest := "plugins.sbt"
+	got := selectManifestFile(manifest)
+	want := SbtBuild
+	if got != want {
+		t.Errorf("selectManifestFile(%q) = %v; want %v", manifest, got, want)
+	}
+}
+
+func TestManifestFileSelector_ExpectSbtCustom(t *testing.T) {
+	manifest := "dependencies.sbt"
+	got := selectManifestFile(manifest)
+	want := SbtBuild
+	if got != want {
+		t.Errorf("selectManifestFile(%q) = %v; want %v", manifest, got, want)
+	}
+}
diff --git a/pkg/parser/parser_factory.go b/pkg/parser/parser_factory.go
index 0f81e86..4163616 100644
--- a/pkg/parser/parser_factory.go
+++ b/pkg/parser/parser_factory.go
@@ -6,6 +6,7 @@ import (
 	"github.com/Checkmarx/manifest-parser/internal/parsers/maven"
 	"github.com/Checkmarx/manifest-parser/internal/parsers/npm"
 	"github.com/Checkmarx/manifest-parser/internal/parsers/pypi"
+	"github.com/Checkmarx/manifest-parser/internal/parsers/sbt"
 )
 
 func ParsersFactory(manifest string) Parser {
@@ -26,6 +27,8 @@ func ParsersFactory(manifest string) Parser {
 		return &dotnet.DotnetPackagesConfigParser{}
 	case GoMod:
 		return &golang.GoModParser{}
+	case SbtBuild:
+		return &sbt.SbtParser{}
 	default:
 		return nil
 	}
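
Review note: the `StartIndex`/`EndIndex` values hard-coded in the tests (23/59, 23/63, 23/65, and the hadoop bound near 70) can be re-derived independently. The sketch below is a standalone check, not the parser's `computeLocationIndices`; it assumes the location span is simply the match span of the core dependency regex quoted in the plan, with a half-open `[start, end)` byte range. `depRe` and `locate` are hypothetical names introduced only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// depRe is the core dependency regex as quoted in the implementation plan.
// Using it here to derive locations is an assumption of this sketch.
var depRe = regexp.MustCompile(
	`"([^"]+)"\s+(%{1,3})\s+"([^"]+)"\s+%\s+(?:"([^"]+)"|(\w+))(?:\s+%\s+(?:"[^"]*"|\w+))?`)

// locate is a hypothetical helper: it returns the half-open byte span
// [start, end) of the first dependency expression on the line, or
// ok=false when the line contains no complete "g" % "a" % "v" triple.
func locate(line string) (start, end int, ok bool) {
	m := depRe.FindStringIndex(line)
	if m == nil {
		return 0, 0, false
	}
	return m[0], m[1], true
}

func main() {
	lines := []string{
		`libraryDependencies += "org.example" % "test-lib" % "1.0.0"`,
		`libraryDependencies += "org.typelevel" %% "cats-core" % "2.9.0"`,
		`libraryDependencies += "org.scala-js" %%% "scalajs-dom" % "2.4.0"`,
		`libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.4" exclude("org.slf4j", "slf4j-log4j12")`,
		`libraryDependencies += "org.example" % "test-lib"`, // malformed: no version
	}
	for _, line := range lines {
		start, end, ok := locate(line)
		fmt.Println(start, end, ok)
	}
}
```

Under these assumptions the first three lines give 23/59, 23/63 and 23/65, matching the test expectations, and the hadoop line ends at 70, which is exactly where ` exclude(` begins. The malformed line produces no match, consistent with `TestParseMalformedLine`.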