49 changes: 49 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,49 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Collaborator

Rename to ## Project Overview to match the JIRA epic template. The epic also asks for "Purpose and status" — please add a one-liner on status (active / maintained / experimental / deprecated).


Go module that parses package manifests from multiple ecosystems (Maven, npm, Python, Go, .NET) and returns each declared dependency along with the **exact line/character range** of its declaration. Consumed by [AST-CLI](https://github.com/Checkmarx/ast-cli) to correlate manifest entries with Checkmarx runtime scans — so the `Locations` field is part of the public contract, not a debugging convenience.

## Commands

Collaborator

The epic asks for a ## Development Setup section. Consider renaming this and adding: (a) prerequisites (Go ≥ 1.23, git), (b) a note that dependencies are vendored so no go mod download is required, (c) an example invocation against a fixture, e.g. go run ./cmd test/resources/pom.xml. Right now a new contributor has to guess what a valid input path looks like.


```bash
go test ./... # run all tests
go test ./internal/parsers/maven/... # run tests for a single parser
go test -run TestName ./path/... # run a single test by name
go test ./... -coverprofile cover.out # CI gate: total coverage must be >= 60%
go build -o manifest-parser ./cmd # build CLI
go run ./cmd <manifest-file> # run CLI against a manifest
```

Collaborator

Consider showing a truncated sample output (a Package JSON snippet) so readers know what success looks like without opening cmd/main.go.

Dependencies are vendored (`vendor/`). Go version is pinned via `go.mod` (1.23 / toolchain 1.24.2).

Collaborator

Please promote this into a dedicated ## Technology Stack section (one of the epic's essential sections) listing: Go 1.23 / toolchain 1.24.2, github.com/stretchr/testify v1.8.4, golang.org/x/mod v0.24.0, stdlib encoding/xml + encoding/json. Explicitly state "no database" and "no web framework" so the N/A sections are unambiguous.


## Architecture

Collaborator

Add a ## Repository Structure section (or subsection) with the top-level folder tree: cmd/, pkg/parser/, internal/parsers/{maven,npm,pypi,golang,dotnet}/, test/resources/, vendor/. The epic lists "Repository Structure — Folder organization" as its own essential section.


The module is organized around one interface and a dispatching factory:

- [pkg/parser/parser.go](pkg/parser/parser.go) — `Parser` interface (`Parse(manifestFile string) ([]models.Package, error)`).

Collaborator

The Parser interface, ParsersFactory, and the Package/Location structs together form the public API of this module. The epic calls for a dedicated ## API / Endpoints / Interfaces section — consider splitting these out of Architecture so callers (AST-CLI) can find the contract quickly.

- [pkg/parser/parser_factory.go](pkg/parser/parser_factory.go) — `ParsersFactory(manifest string)` is the **only** public entry point. It calls `selectManifestFile` and returns the right concrete parser, or `nil` for unsupported files.
- [pkg/parser/manifest-file-selector.go](pkg/parser/manifest-file-selector.go) — maps filename/extension to a `Manifest` enum. Adding a new ecosystem means editing this file, the factory, and adding a package under `internal/parsers/`.
- [pkg/parser/models/package_model.go](pkg/parser/models/package_model.go) — the `Package` / `Location` structs returned to callers. `Locations` is a slice: Maven returns one entry per line of a multi-line `<dependency>` block; most others return a single entry.
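
The filename-to-enum mapping described in the list above might look roughly like the sketch below. It is not the code in `manifest-file-selector.go`: the `Manifest` constant names, the `selectManifest` function, and the exact matching rules are all assumptions inferred from the file names mentioned in this document.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Manifest is a hypothetical stand-in for the enum in manifest-file-selector.go.
type Manifest int

const (
	Unsupported Manifest = iota
	MavenPom
	NpmPackageJSON
	PypiRequirements
	GoMod
	DotnetCsproj
	DotnetDirectoryPackagesProps
	DotnetPackagesConfig
)

// selectManifest maps a path to a Manifest value by base name or extension.
// The matching rules here are assumptions, not the repository's actual rules.
func selectManifest(path string) Manifest {
	base := filepath.Base(path)
	switch {
	case base == "pom.xml":
		return MavenPom
	case base == "package.json":
		return NpmPackageJSON
	case base == "go.mod":
		return GoMod
	case base == "Directory.Packages.props":
		return DotnetDirectoryPackagesProps
	case base == "packages.config":
		return DotnetPackagesConfig
	case filepath.Ext(base) == ".csproj":
		return DotnetCsproj
	case filepath.Ext(base) == ".txt" &&
		(strings.HasPrefix(base, "requirements") || strings.HasPrefix(base, "packages")):
		return PypiRequirements
	default:
		return Unsupported
	}
}

func main() {
	for _, p := range []string{"test/resources/pom.xml", "app/Gateway.csproj", "README.md"} {
		fmt.Printf("%s -> %d\n", p, selectManifest(p))
	}
}
```

Under this shape, adding an ecosystem means one new constant, one new `case`, and a matching branch in the factory, which lines up with the three touch points listed above.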

Per-ecosystem parsers live under [internal/parsers/](internal/parsers/):
- `maven/` — parses `pom.xml` with `encoding/xml`, then re-scans the raw text to locate each `<dependency>` block line by line. Resolves `${property}` vars from `<properties>` and falls back to `<dependencyManagement>` for empty/ranged versions. Only **direct** `<dependencies>` are emitted (managed-only deps are intentionally skipped to avoid duplicates — see commit `9e490aa`).

Collaborator

Referencing commit 9e490aa is brittle — commit hashes don't survive history rewrites, and readers have to git show to learn anything. Prefer linking the PR that introduced the change (#15), or inlining the reason: "managed-only deps are skipped to avoid duplicating entries already emitted from <dependencies>." Verified the commit exists today ("Fix Maven dependency location duplication for shared groupId") — but future-you will thank you for the inline explanation.

- `npm/` — parses `package.json` plus, if present as a sibling file, `package-lock.json` (v1 and v2/v3 formats). Ranged specifiers (`^`, `~`, `*`, `>`, `<`) trigger a lookup in the lockfile; `isLockVersionGreater` compares part-by-part numerically to decide whether the lockfile version satisfies the spec. Without a lock match, ranged versions resolve to `"latest"`.
- `pypi/` — line-oriented scan of `requirements*.txt` / `packages*.txt`. **Only `package==version` is supported** — `pip freeze`, Poetry, and pip-tools output are explicitly out of scope (see README "Known Limitations"). Comments (`#`) and environment markers (`;`) are stripped.

Collaborator

The pypi ==-only limitation belongs in a dedicated ## Known Issues / Limitations section per the epic. Please consolidate there: pypi == only (no pip freeze / Poetry / pip-tools), npm ranged versions without a lockfile resolve to "latest", Maven managed-only deps not emitted, etc. Scatter-and-gather hurts discoverability.

- `golang/` — uses `golang.org/x/mod/modfile` to parse `go.mod`, then uses the parser's line metadata to compute character offsets.
- `dotnet/` — three separate parsers sharing patterns: `csproj_parser.go` (`.csproj`), `directory_packages_props_parser.go` (central package management), `packages_config_parser.go` (legacy). Versions are read from either a `Version` attribute or a nested `<Version>` element; bracketed ranges become `"latest"`.
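
To illustrate why the npm parser compares versions part by part numerically rather than as strings, here is a minimal sketch. `lockVersionAtLeast` and `part` are hypothetical helpers written for this example; they are not the repository's `isLockVersionGreater`, and the "at least" rule is a simplification that does not implement full semver range matching.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lockVersionAtLeast reports whether lock is numerically >= the base version in spec.
// Hypothetical helper: range operators are simply stripped, not interpreted.
func lockVersionAtLeast(lock, spec string) bool {
	spec = strings.TrimLeft(spec, "^~><= ")
	lp := strings.Split(lock, ".")
	sp := strings.Split(spec, ".")
	for i := 0; i < len(lp) || i < len(sp); i++ {
		l, s := part(lp, i), part(sp, i)
		if l != s {
			return l > s
		}
	}
	return true // all parts equal
}

// part returns the i-th component as an int; missing or non-numeric parts count as 0.
func part(parts []string, i int) int {
	if i >= len(parts) {
		return 0
	}
	n, err := strconv.Atoi(parts[i])
	if err != nil {
		return 0
	}
	return n
}

func main() {
	fmt.Println(lockVersionAtLeast("4.17.21", "^4.17.0")) // true
	fmt.Println(lockVersionAtLeast("1.9.0", "~1.10.0"))   // false: 9 < 10 numerically
}
```

A plain string comparison would rank `"9"` above `"10"`, which is the failure mode that numeric, per-part comparison avoids.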

### Invariants worth preserving

Collaborator

The epic names this section "Project Rules — Don'ts and constraints". Consider renaming (or using both: ### Project Rules (Invariants)) for cross-repo consistency — the standardization goal is about predictable headings across repos, not just content.


- **`Location` uses 0-based line numbers** in most parsers (Maven, Go, npm, pypi use `lineNum - 1` or a 0-based counter). Downstream AST-CLI depends on this; don't "fix" it to 1-based without coordinating.

Collaborator

Two ambiguities in Location worth clarifying: are StartIndex / EndIndex 0-based or 1-based, and are they byte offsets or rune/character offsets? AST-CLI callers need to know to render the annotation correctly for non-ASCII manifests.

- **Unresolvable or ranged versions resolve to the literal string `"latest"`**, never an empty string. Callers branch on this value.
- **`PackageManager` strings are part of the contract**: `"mvn"`, `"npm"`, `"pypi"`, `"go"`, `"nuget"` (used by all three dotnet parsers). Don't rename them.
- Maven emits one `Location` per **non-comment line** of the `<dependency>` block (open tag, each child, close tag) so AST-CLI can annotate the whole block. Single-line `Locations` for Maven would be a regression.

## Tests & fixtures

Each parser has a `*_test.go` next to it using `testify`. Shared fixtures live in [test/resources/](test/resources/) (e.g. `pom.xml`, `package.json`, `requirements.txt`, `test_go.mod`, `Bootstrap.csproj`, `Gateway.csproj`, `packages.config`, `Directory.Packages.props`). When adding behaviors, add a fixture here rather than embedding large manifests in test source.

Collaborator

Please add: (a) how to view coverage locally (go tool cover -html cover.out), (b) the expected pattern for a new parser (fixture under test/resources/ + *_test.go co-located with parser using testify), (c) any naming convention for fixtures. These answer "how do I add a test?" which is the epic's intent for this section.


CI ([.github/workflows/ci.yml](.github/workflows/ci.yml)) enforces a **60% total coverage floor** — adding an untested branch to an already-thin package can push the whole repo below the gate.

Collaborator

Missing sections from the epic template — please add, even as one-liners for the N/A ones, so every repo's CLAUDE.md has a predictable shape:

  • External Integrations — consumed by AST-CLI; Locations field + PackageManager strings (mvn/npm/pypi/go/nuget) are load-bearing downstream.
  • Deployment — N/A (library; consumed via go get github.com/Checkmarx/manifest-parser).
  • Database Schema — N/A.
  • Performance Considerations — Maven re-scans raw XML after encoding/xml parse (two passes); no streaming; large pom.xml files load fully into memory.
  • Security & Access — parsers consume untrusted manifest files. Note XXE posture (encoding/xml doesn't resolve external entities by default — worth stating explicitly) and whether there's a file-size bound.
  • Logging — CLI uses log.Fatalf on error; library returns error and does not log. Callers should not expect library log output.
  • Debugging Steps — how to run one parser against one fixture, how to set -v for verbose tests, common failure mode (location off-by-one → check 0-based invariant).
  • Coding Standards: gofmt / go vet clean; exported identifiers in pkg/, internal logic in internal/; parser packages follow <ecosystem>/<ecosystem>_parser.go + <ecosystem>_parser_test.go layout.

1 change: 1 addition & 0 deletions README.md
@@ -73,6 +73,7 @@ type Location struct {
}
```


`Locations` points to the exact position of the dependency declaration in the source manifest, which downstream tools use for inline annotations and remediation.

## CLI