diff --git a/changelog/727.feature.md b/changelog/727.feature.md new file mode 100644 index 000000000..749c94f24 --- /dev/null +++ b/changelog/727.feature.md @@ -0,0 +1,5 @@ +Added the `ref test-cases ci-gate` command, which decides how CI should verify each +test case's regression baseline against the base branch: skip it, replay the cached +native baseline, re-run the diagnostic in full, or fail an unauthorised change. +Baselines are now coupled to their input catalogs, so changing a test case's inputs +without regenerating its baseline is reported as a failure rather than passing silently. diff --git a/docs/background/regression-baselines.md b/docs/background/regression-baselines.md new file mode 100644 index 000000000..c4c9b2c36 --- /dev/null +++ b/docs/background/regression-baselines.md @@ -0,0 +1,156 @@ +# Regression baselines and the CI coupling gate + +Climate REF pins each test case to a **regression baseline**: +a recorded, known-good output that a pull request is checked against +so that an unintended change in a diagnostic's results cannot land unnoticed. + +This page explains the two-layer baseline model, +the lifecycle commands that produce and verify it, +and the CI coupling gate that decides *how* each test case is verified in a pull request. + +## The two-layer baseline model + +A baseline is split into two layers with very different trust and portability properties. + +- The **committed bundle** is the small, text-only CMEC output + (`series.json`, `diagnostic.json`, `output.json`) + written into the test case's `regression/` directory and tracked in git. + Absolute paths are rewritten to portable placeholders + (``, ``) so the bytes are machine independent. + This bundle is **the gate signal**: it is what review actually sees in the diff. + +- The **native bundle** is the heavy binary output + (`.nc`, `.png`, ...) that the committed bundle references. + Native files are content-addressed by their sha256 digest in an object store, + fetched anonymously, and are **never required to be present**. + They are written only by the credentialed `mint` step. + +The two layers are bound by a **`manifest.json`** alongside the bundle, which records: + +- `test_case_version` — a monotonic, author-bumped integer that *authorises* a new baseline. +- `committed` — sha256 digests of the committed JSON artefacts, over the exact placeholder-substituted bytes on disk. +- `native` — sha256 + size of each curated native file (empty `{}` until minted). +- `catalog_hash` — the hash of the test case's input `catalog.yaml`, coupling the baseline to the inputs that produced it. + +!!! note "An empty native set is a permanent, valid state" + Fork contributors cannot mint + (minting needs object-store write credentials that never run on untrusted pull-request code). + A baseline with `native: {}` is fully gated by its committed JSON bundle; + the native layer is **opt-in extra verification** that only runs when the blobs exist. + +## Lifecycle commands + +The `ref test-cases` verbs produce and verify baselines. +Only `mint` needs write credentials; everything else is anonymous and safe on untrusted code. + +```mermaid +flowchart LR + run["run
(no creds)"] -->|"seed committed bundle
+ manifest, native = {}"| repo[("regression/
+ manifest.json")] + repo --> mint["mint
(write creds, CI)"] + mint -->|"upload blobs to store
populate manifest.native"| store[("native object
store (sha256)")] + store --> sync["sync
(public read)"] + store --> replay["replay
(public read)"] + sync -->|"fetch blobs locally"| local[(local native)] + replay -->|"materialise native +
re-run build_execution_result +
tolerant compare"| verdict{matches
committed?} + verdict -->|yes| ok([pass]) + verdict -->|no| bad([fail]) +``` + +| Verb | Credentials | What it does | +| --- | --- | --- | +| `run` | none | Execute the diagnostic, curate outputs, write the committed bundle and seed `manifest.json` (`native = {}`). | +| `mint` | write | Upload the curated native files to the object store and populate `manifest.native`. Generally run by CI. | +| `sync` | public read | Fetch the native blobs referenced by the manifest into the local tree. | +| `replay` | public read | Materialise the native baseline, re-run only `build_execution_result`, and tolerantly compare to the committed bundle. | + +## Tolerant comparison + +`replay` and the execute path do not require byte-equality of the regenerated JSON. +A byte-equal fast path short-circuits the common case; +otherwise values are compared with a small relative/absolute tolerance +(`math.isclose`), and volatile keys/values are sanitised to placeholders before comparison. +This absorbs platform-level floating-point noise without masking a real change in results. + +## The CI coupling gate + +For every test case, CI must decide *how* to verify the baseline in a pull request, +purely from what changed relative to the base branch. +`decide_coupling` is a pure function over the head and base manifests +plus three on-disk facts the caller supplies — +whether the diagnostic's extraction code changed, +whether the committed bundle still matches its manifest digests, +and whether the input catalog still matches its manifest hash. + +The gate **fails closed**: +any state it cannot positively verify is a failure, not a silent skip. +The single exception is the native layer — +because an empty native set is legitimate, a missing native baseline downgrades to `skip` (with a warning), never `fail`. + +### Actions + +- **`skip`** — nothing relevant to this case changed, or the case is not under regression management. +- **`replay`** — cheap, anonymous re-check against the cached native baseline (only when native blobs exist). +- **`execute`** — full end-to-end re-run with tolerant compare; required when `test_case_version` was bumped to authorise a new baseline. +- **`fail`** — an unauthorised or unverifiable change (committed bundle edited without a version bump, version moved backwards, bundle drifted from its manifest, or catalog changed without regenerating the baseline). + +### Decision flow + +```mermaid +flowchart TD + start([test case]) --> hasManifest{manifest.json
present?} + + hasManifest -->|no| baseHad{present on
base branch?} + baseHad -->|yes| failDeleted[["FAIL
managed baseline removed"]] + baseHad -->|no| skipUnmanaged[["SKIP
not under management"]] + + hasManifest -->|yes| committedOk{committed bundle
matches manifest
digests?} + committedOk -->|no| failDrift[["FAIL
bundle drifted from manifest"]] + + committedOk -->|yes| catalogOk{catalog.yaml
matches manifest
catalog_hash?} + catalogOk -->|no| failCatalog[["FAIL
inputs changed without
regenerating baseline"]] + + catalogOk -->|yes| seeding{newly added
this branch?} + seeding -->|yes| seedNative{native
non-empty?} + seedNative -->|yes| replaySeed[["REPLAY
seed against committed"]] + seedNative -->|no| skipSeed[["SKIP
committed-only baseline"]] + + seeding -->|no| versionCmp{version vs base} + versionCmp -->|decreased| failVersion[["FAIL
version not monotonic"]] + versionCmp -->|bumped| execute[["EXECUTE
authorised new baseline"]] + versionCmp -->|unchanged| committedChanged{committed
changed?} + + committedChanged -->|yes| failUnauthorised[["FAIL
baseline changed without
version bump"]] + committedChanged -->|no| nativeChanged{native
changed?} + + nativeChanged -->|yes| nativeEmpty{"head native
empty?
(de-mint)"} + nativeEmpty -->|no| replayNative[["REPLAY
confirm new native
reproduces committed"]] + nativeEmpty -->|yes| skipDemint[["SKIP + WARN
native removed"]] + + nativeChanged -->|no| extractionChanged{extraction
code changed?} + extractionChanged -->|no| skipNothing[["SKIP
nothing to verify"]] + extractionChanged -->|yes| extractNative{native
non-empty?} + extractNative -->|yes| replayExtract[["REPLAY
verify against cached native"]] + extractNative -->|no| skipExtract[["SKIP
no native to replay"]] +``` + +### How changed files map to the signals + +The gate diffs `base...HEAD` (against the merge base, so unrelated base-branch churn is excluded) +and maps the changed-file list onto each case: + +- **`extraction_changed`** is coarse and errs toward `replay` (cheap, credential-free): + any change under the diagnostic's provider package, + or under the core regression package, counts for every case in that provider. +- **`committed_integrity_ok`** recomputes the committed digests from the working tree + and compares them to `manifest.committed`. +- **`catalog_integrity_ok`** recomputes the input catalog hash + and compares it to `manifest.catalog_hash`, + catching an input change that was not accompanied by a regenerated baseline. + +The gate exits non-zero if any case is gated `fail`; +the `--json` output drives CI's dispatch of the `replay` and `execute` jobs. + +!!! warning "Credentials never cross the trust boundary" + `replay`, `sync`, `run`, and the gate itself are anonymous and safe to run on untrusted pull-request code. + Only `mint` holds object-store write credentials, + and it runs exclusively on the trusted-tier runner — never on fork pull-request code. diff --git a/mkdocs.yml b/mkdocs.yml index 4049eaa1e..093fadfa3 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -41,6 +41,7 @@ nav: - Architecture: background/architecture.md - Explanation: background/explanation.md - Datasets: background/datasets.md + - Regression Baselines: background/regression-baselines.md - API Surface: api-surface.md - Versioning and Compatibility: versioning-and-compatibility.md - Quality Assurance: background/quality-assurance.md diff --git a/packages/climate-ref-core/src/climate_ref_core/paths.py b/packages/climate-ref-core/src/climate_ref_core/paths.py index d1cc3e699..2adb0c898 100644 --- a/packages/climate-ref-core/src/climate_ref_core/paths.py +++ b/packages/climate-ref-core/src/climate_ref_core/paths.py @@ -58,10 +58,10 @@ def safe_path( """ text = str(relpath) pure = PurePosixPath(text) - if not text or "\x00" in text or pure.is_absolute() or ".." in pure.parts: + if not text or "\x00" in text or pure.is_absolute() or ".." in pure.parts or not pure.parts: raise ValueError( f"Unsafe {label} {text!r}: must be a contained relative path " - "(no absolute paths, no '..', no NUL bytes)." + "(no absolute paths, no '.' or '..', no NUL bytes)." ) if single_segment and (len(pure.parts) != 1 or "\\" in text): raise ValueError( diff --git a/packages/climate-ref-core/src/climate_ref_core/regression/__init__.py b/packages/climate-ref-core/src/climate_ref_core/regression/__init__.py index e2519cb20..d92acc1ca 100644 --- a/packages/climate-ref-core/src/climate_ref_core/regression/__init__.py +++ b/packages/climate-ref-core/src/climate_ref_core/regression/__init__.py @@ -22,6 +22,12 @@ assert_bundle_regression, compare_json_content, ) +from climate_ref_core.regression.gate import ( + Action, + GateDecision, + decide_coupling, + paths_under, +) from climate_ref_core.regression.manifest import ( COMMITTED_BUNDLE_FILES, SCHEMA_VERSION, @@ -43,6 +49,8 @@ __all__ = [ "COMMITTED_BUNDLE_FILES", "SCHEMA_VERSION", + "Action", + "GateDecision", "LocalFilesystemStore", "Manifest", "NativeEntry", @@ -56,7 +64,9 @@ "capture_execution", "compare_json_content", "compute_committed_digests", + "decide_coupling", "materialise_native", + "paths_under", "sha256_bytes", "sha256_file", "verify_committed_integrity", diff --git a/packages/climate-ref-core/src/climate_ref_core/regression/compare.py b/packages/climate-ref-core/src/climate_ref_core/regression/compare.py index 5bc01c89c..7a7d610f0 100644 --- a/packages/climate-ref-core/src/climate_ref_core/regression/compare.py +++ b/packages/climate-ref-core/src/climate_ref_core/regression/compare.py @@ -23,12 +23,12 @@ class Tolerance: Relative tolerance — the allowed proportional difference between expected and actual float values (default ``1e-6``). atol - Absolute tolerance — the minimum absolute difference that can ever be - flagged, regardless of magnitude (default ``0.0``). + Absolute tolerance — the floor difference always permitted, + regardless of magnitude (default ``1e-8``). """ rtol: float = 1e-6 - atol: float = 0.0 + atol: float = 1e-8 def _rewrite_keys_and_values(obj: Any, replacements: list[tuple[str, str]]) -> Any: @@ -137,9 +137,10 @@ def _compare_recursive( is still identifiable. """ label = path or "" - # Type mismatch (treat int/float as numeric together) + # Type mismatch (treat int/float as numeric together). if type(expected) is not type(actual): - if isinstance(expected, (int, float)) and isinstance(actual, (int, float)): + either_is_bool = isinstance(expected, bool) or isinstance(actual, bool) + if not either_is_bool and isinstance(expected, (int, float)) and isinstance(actual, (int, float)): _compare_numeric(float(expected), float(actual), tol=tol, path=label, out=out) return out.append( diff --git a/packages/climate-ref-core/src/climate_ref_core/regression/gate.py b/packages/climate-ref-core/src/climate_ref_core/regression/gate.py new file mode 100644 index 000000000..17b75ac27 --- /dev/null +++ b/packages/climate-ref-core/src/climate_ref_core/regression/gate.py @@ -0,0 +1,259 @@ +""" +CI coupling gate for test case regression bundles. + +For each test case, CI must decide *how* to verify the regression baseline in a pull request, +based on what changed relative to the base branch: + +- **EXECUTE** — re-run the diagnostic end-to-end and compare the regenerated committed + bundle to the in-repo copy (tolerant compare). + Required when the author has bumped ``test_case_version`` to authorise a new baseline. +- **REPLAY** — materialise the cached native baseline and re-run only + ``build_execution_result`` against it, comparing to the committed bundle. + Cheap and anonymous; used when extraction code changed but the committed bundle did + not, and to seed a newly added manifest — but only when native blobs exist to replay. +- **FAIL** — the committed bundle (or its input catalog) changed without a + ``test_case_version`` bump (an unauthorised baseline change), or the version moved backwards. +- **SKIP** — nothing relevant to this test case changed, or the case is not yet under + regression-baseline management. + +See [Regression Baselines](https://climate-ref.readthedocs.io/en/latest/background/regression-baselines/) +for more information about the motivation and workflow for regression baselines. +""" + +from __future__ import annotations + +import enum +from collections.abc import Iterable + +from attrs import frozen + +from climate_ref_core.regression.manifest import Manifest + + +class Action(enum.Enum): + """The verification action the CI gate selects for a single test case.""" + + SKIP = "skip" + """Nothing relevant changed, or the case is not under regression management.""" + + REPLAY = "replay" + """Replay the cached native baseline and compare to the committed bundle.""" + + EXECUTE = "execute" + """Re-run the diagnostic end-to-end and compare to the committed bundle.""" + + FAIL = "fail" + """The change is not permissible (unauthorised baseline change or bad version).""" + + +@frozen +class GateDecision: + """The gate's decision for one test case: an :class:`Action` and why.""" + + action: Action + """The selected verification action.""" + + reason: str + """A human-readable explanation, surfaced in CI logs.""" + + +def decide_coupling( # noqa: PLR0911, PLR0912 + manifest: Manifest | None, + base_manifest: Manifest | None, + *, + extraction_changed: bool = False, + committed_integrity_ok: bool = True, + catalog_integrity_ok: bool = True, +) -> GateDecision: + """ + Decide how CI should verify a single test case's regression baseline. + + The decision is pure: it performs no I/O. + All on-disk reality is summarised by its arguments + — the two manifests, whether the diagnostic's extraction code + changed, and whether the committed bundle on disk still matches the current + manifest's digests. + + The gate fails if any state cannot be positively verified. + A deleted managed manifest, a committed bundle that drifted from its manifest, + an input catalog that drifted from its manifest + is a failure rather than a silent skip. + + Changes to the native baseline are not failures. + This is due to the workflow for minting requires credentials. + This means fork contributors cannot author or edit native blobs. + ``replay`` is therefore only selected when native blobs actually exist to replay; + an absent or removed native baseline downgrades to ``skip`` (with a warning in the reason), + never ``fail``. + + See the module docstring for the meaning of each :class:`Action`. + + Parameters + ---------- + manifest + The current ``manifest.json`` for the test case, + or ``None`` if the case has no manifest on this branch (never managed, or deleted in this change). + base_manifest + The ``manifest.json`` as it exists on the base branch, + or ``None`` if the manifest is newly added in this change (seeding). + extraction_changed + Whether code that influences ``build_execution_result`` for this test case + changed in this pull request. + Only consulted when the committed bundle is unchanged and the version was not bumped. + committed_integrity_ok + Whether the committed bundle on disk matches the current manifest's ``committed`` digests exactly + (no edited, added, or removed committed file). + The caller computes this against the working tree. + ``False`` means the manifest no longer describes the bundle it is supposed to gate, + which is a hard failure. + + Ignored when ``manifest`` is ``None`` (nothing to verify). + catalog_integrity_ok + Whether the test case's input ``catalog.yaml`` still matches the current + manifest's ``catalog_hash``. + The caller computes this against the working tree. + ``False`` means the inputs changed without the baseline being regenerated, + so the committed bundle no longer reflects its inputs — a hard failure. + + Always ``True`` when the manifest carries no ``catalog_hash`` + + Returns + ------- + : + The gate's decision, pairing an :class:`Action` with a reason. + """ + if manifest is None: + # Distinguish a never-managed case from the deletion of a managed baseline. + # Deleting manifest.json must not be a silent way to disable the gate. + if base_manifest is not None: + return GateDecision( + Action.FAIL, + "manifest.json is absent but exists on the base branch; a managed " + "regression baseline cannot be removed without review", + ) + return GateDecision( + Action.SKIP, + "no manifest.json; test case not under regression-baseline management", + ) + + # The manifest must faithfully describe the committed bundle it gates. + # If the bundle on disk drifted from the manifest digests + # (an edit/add/remove without regenerating the manifest), the metadata comparisons below are meaningless. + if not committed_integrity_ok: + return GateDecision( + Action.FAIL, + "committed bundle on disk does not match manifest.json digests; " + "regenerate the manifest with `ref test-cases run` after changing the bundle", + ) + + # The input catalog must still describe the baseline it produced. + if not catalog_integrity_ok: + return GateDecision( + Action.FAIL, + "input catalog.yaml does not match manifest.json catalog_hash; " + "regenerate the baseline with `ref test-cases run` after changing the inputs", + ) + + if base_manifest is None: + # Seeding a newly added manifest. + # Replay only verifies something when native blobs exist + if manifest.native: + return GateDecision( + Action.REPLAY, + "manifest newly added (seeding); replaying native baseline against committed bundle", + ) + return GateDecision( + Action.SKIP, + "manifest newly added (seeding) with no native baseline; " + "the committed bundle is the only signal and is reviewed in the diff", + ) + + if manifest.test_case_version < base_manifest.test_case_version: + return GateDecision( + Action.FAIL, + f"test_case_version decreased ({base_manifest.test_case_version} -> " + f"{manifest.test_case_version}); version must be monotonic", + ) + + version_bumped = manifest.test_case_version > base_manifest.test_case_version + committed_changed = manifest.committed != base_manifest.committed + native_changed = manifest.native != base_manifest.native + + if version_bumped: + return GateDecision( + Action.EXECUTE, + f"test_case_version bumped ({base_manifest.test_case_version} -> " + f"{manifest.test_case_version}); executing full run with tolerant compare", + ) + + if committed_changed: + return GateDecision( + Action.FAIL, + "committed bundle changed without a test_case_version bump; " + "bump test_case_version to authorise the new baseline", + ) + + if native_changed: + if manifest.native: + # Native blobs were re-authored (re-minted) without a version bump. + # The committed bundle is unchanged, + # so verify the new native snapshot still reproduces it rather than skipping unverified. + return GateDecision( + Action.REPLAY, + "native baseline changed with committed bundle unchanged; " + "replaying to confirm the new native snapshot reproduces the committed bundle", + ) + # De-mint: the native baseline was removed while the committed bundle stayed. + return GateDecision( + Action.SKIP, + "WARNING: native baseline removed (de-mint) with committed bundle unchanged; " + "the committed bundle still gates this case but native replay is no longer possible", + ) + + if extraction_changed: + if manifest.native: + return GateDecision( + Action.REPLAY, + "extraction code changed with committed bundle unchanged; " + "replaying cached native baseline to verify", + ) + return GateDecision( + Action.SKIP, + "extraction code changed but no native baseline exists to replay; " + "the committed bundle is unchanged and remains the only signal", + ) + + return GateDecision( + Action.SKIP, + "no committed-bundle, native, version, or extraction-code change; nothing to verify", + ) + + +def paths_under(changed_files: Iterable[str], roots: Iterable[str]) -> bool: + """ + Return whether any changed file lies within one of the given directory roots. + + A small helper for deriving the ``extraction_changed`` signal from a pull request's changed-file list. + Paths are compared textually as POSIX-style, repo-relative strings, + so callers must normalise both ``changed_files`` and ``roots`` to the same convention + (e.g. ``git diff --name-only`` output and a package source directory relative to the repo root). + + Parameters + ---------- + changed_files + Repo-relative paths changed in the pull request. + roots + Repo-relative directory prefixes to test against. A trailing slash is + optional; an empty root never matches. + + Returns + ------- + : + ``True`` if any changed file equals or sits beneath any root. + """ + normalised_roots = [root.rstrip("/") for root in roots if root.rstrip("/")] + for changed in changed_files: + for root in normalised_roots: + if changed == root or changed.startswith(root + "/"): + return True + return False diff --git a/packages/climate-ref-core/src/climate_ref_core/regression/manifest.py b/packages/climate-ref-core/src/climate_ref_core/regression/manifest.py index 743ea1f95..47dba4af1 100644 --- a/packages/climate-ref-core/src/climate_ref_core/regression/manifest.py +++ b/packages/climate-ref-core/src/climate_ref_core/regression/manifest.py @@ -60,7 +60,11 @@ def _validate_digest(digest: str) -> str: return digest -COMMITTED_BUNDLE_FILES: tuple[str, ...] = ("series.json", "diagnostic.json", "output.json") +COMMITTED_BUNDLE_FILES: tuple[str, ...] = ( + "series.json", + "diagnostic.json", + "output.json", +) """The committed CMEC artefacts tracked in git. Their digests are tracked in :attr:`Manifest.committed`. @@ -135,6 +139,12 @@ class Manifest: native: dict[str, NativeEntry] """Digests of curated native output files: ``{relpath: NativeEntry}``.""" + catalog_hash: str | None = None + """Hash of the test case input ``catalog.yaml`` (its ``_metadata.hash``) that produced this baseline. + This couples the baseline to its inputs. + The CI gate fails a case whose live catalog hash no longer matches this value. + """ + @classmethod def load(cls, path: Path) -> Manifest: """ @@ -156,11 +166,39 @@ def load(cls, path: Path) -> Manifest: If the manifest is missing required keys or has malformed native entries (e.g. hand-edited or written by an incompatible version). """ - data = json.loads(path.read_text(encoding="utf-8")) + return cls.loads(path.read_text(encoding="utf-8"), source=str(path)) + + @classmethod + def loads(cls, text: str, *, source: str = "") -> Manifest: + """ + Parse a manifest from its JSON text. + + Used when the manifest does not live on disk at parse time, + e.g. when reading the base-branch copy via ``git show`` for the CI coupling gate. + + Parameters + ---------- + text + The manifest JSON. + source + A label for the text's origin, used in error messages. + + Returns + ------- + : + The parsed manifest. + + Raises + ------ + ValueError + If the manifest is missing required keys or has malformed native entries + (e.g. hand-edited or written by an incompatible version). + """ + data = json.loads(text) missing = [key for key in ("schema", "test_case_version", "committed", "native") if key not in data] if missing: raise ValueError( - f"Invalid manifest {path}: missing required keys {missing}. " + f"Invalid manifest {source}: missing required keys {missing}. " "The manifest may be corrupted or written by an incompatible version; " "regenerate it with `ref test-cases run --force-regen`." ) @@ -171,7 +209,7 @@ def load(cls, path: Path) -> Manifest: } except (KeyError, TypeError, AttributeError) as exc: raise ValueError( - f"Invalid manifest {path}: malformed 'native' entry ({exc!r}). " + f"Invalid manifest {source}: malformed 'native' entry ({exc!r}). " "Each entry must be a mapping with 'sha256' and 'size' keys." ) from exc # Reject hand-edited or hostile manifests that could escape the @@ -180,13 +218,15 @@ def load(cls, path: Path) -> Manifest: try: safe_path(relpath, label="native path") except ValueError as exc: - raise ValueError(f"Invalid manifest {path}: {exc}") from exc + raise ValueError(f"Invalid manifest {source}: {exc}") from exc _validate_digest(entry.sha256) return cls( schema=data["schema"], test_case_version=data["test_case_version"], committed=dict(data["committed"]), native=native, + # TODO: remove optonality when all manifests have this field. + catalog_hash=data.get("catalog_hash"), ) def dump(self, path: Path) -> None: @@ -204,6 +244,7 @@ def dump(self, path: Path) -> None: payload = { "schema": self.schema, "test_case_version": self.test_case_version, + "catalog_hash": self.catalog_hash, "committed": self.committed, "native": {relpath: asdict(entry) for relpath, entry in self.native.items()}, } @@ -211,7 +252,7 @@ def dump(self, path: Path) -> None: path.write_text(text, encoding="utf-8") @classmethod - def seed_v1(cls, committed_digests: dict[str, str]) -> Manifest: + def seed_v1(cls, committed_digests: dict[str, str], catalog_hash: str | None = None) -> Manifest: """ Create an initial manifest at ``test_case_version == 1`` with no native outputs. @@ -219,6 +260,8 @@ def seed_v1(cls, committed_digests: dict[str, str]) -> Manifest: ---------- committed_digests Digests of the committed regression JSON artefacts. + catalog_hash + Hash of the input ``catalog.yaml`` that produced the baseline, if known. Returns ------- @@ -229,6 +272,7 @@ def seed_v1(cls, committed_digests: dict[str, str]) -> Manifest: schema=SCHEMA_VERSION, test_case_version=1, committed=dict(committed_digests), + catalog_hash=catalog_hash, native={}, ) diff --git a/packages/climate-ref-core/src/climate_ref_core/regression/store.py b/packages/climate-ref-core/src/climate_ref_core/regression/store.py index 8505a6e97..355416f96 100644 --- a/packages/climate-ref-core/src/climate_ref_core/regression/store.py +++ b/packages/climate-ref-core/src/climate_ref_core/regression/store.py @@ -21,6 +21,7 @@ """ import shutil +from functools import cache from pathlib import Path from typing import Protocol, runtime_checkable from urllib.parse import unquote, urlsplit @@ -34,6 +35,23 @@ from .manifest import _validate_digest, sha256_file +@cache +def _pooch_manager(base_url: str, cache_dir: str) -> pooch.Pooch: + """ + Build (and cache) a pooch manager for a ``(base_url, cache_dir)`` pair. + + ``pooch.create`` rebuilds the whole manager and registry, so doing it per + ``fetch`` is wasteful when many blobs are pulled from the same store. + The manager is keyed by its immutable inputs and reused across fetches; + per-blob registry entries are added on the shared instance at fetch time. + """ + return pooch.create( + path=Path(cache_dir), + base_url=base_url + "/", + retry_if_failed=10, + ) + + @runtime_checkable class NativeStore(Protocol): """ @@ -265,17 +283,11 @@ def fetch(self, digest: str, dest: Path) -> None: ValueError If the downloaded blob's sha256 does not match ``digest``. """ - # Build a single-file pooch registry: the URL path component is the digest - # and the expected hash IS the digest (content-addressed). - registry = pooch.create( - path=self.cache_dir, - base_url=self.base_url + "/", - retry_if_failed=10, - ) + registry = _pooch_manager(self.base_url, str(self.cache_dir)) registry.registry[digest] = digest # content-addressed: hash == name cached = registry.fetch(digest) - # Verify the cached copy against the expected digest. + # pooch should already verify the hash against the registry entry _verify_hash_matches(cached, digest) dest.parent.mkdir(parents=True, exist_ok=True) @@ -380,26 +392,29 @@ def build_native_store(config: _NativeStoreConfigProtocol, *, writable: bool) -> cache_dir: Path = config.cache_dir parts = urlsplit(url) - is_local = parts.scheme not in ("http", "https") - - if is_local: - if parts.scheme == "file": - # Parse properly so malformed variants fail loudly instead of - # silently producing a wrong (e.g. relative) path. - if parts.netloc not in ("", "localhost"): - raise ValueError( - f"Unsupported file URL {url!r}: a host component ({parts.netloc!r}) is not " - "supported. Use the file:///absolute/path form (three slashes)." - ) - # Percent-decode the path so escaped characters (e.g. %20) resolve to the real on-disk directory. - local_root = Path(unquote(parts.path)) - else: - # A bare filesystem path. - local_root = Path(url) - return LocalFilesystemStore(root=local_root) - elif writable: - # Remote writable store — R2 backend is deferred. - # Construction raises NotImplementedError until the follow-up lands. - return R2WriteStore() - else: + scheme = parts.scheme + + if scheme in ("http", "https"): + if writable: + # TODO: Construction raises NotImplementedError until the follow-up lands. + return R2WriteStore() return PoochReadStore(base_url=url.rstrip("/"), cache_dir=cache_dir) + + if scheme == "file": + # Parse properly so malformed variants fail loudly instead of + # silently producing a wrong (e.g. relative) path. + if parts.netloc not in ("", "localhost"): + raise ValueError( + f"Unsupported file URL {url!r}: a host component ({parts.netloc!r}) is not " + "supported. Use the file:///absolute/path form (three slashes)." + ) + return LocalFilesystemStore(root=Path(unquote(parts.path))) + + if scheme == "": + return LocalFilesystemStore(root=Path(url)) + + raise ValueError( + f"Unsupported native store URL {url!r}: scheme {scheme!r} is not recognised. " + "Use http(s):// for a remote store, or file:///absolute/path or a bare filesystem " + "path for a local store." + ) diff --git a/packages/climate-ref-core/src/climate_ref_core/testing.py b/packages/climate-ref-core/src/climate_ref_core/testing.py index 771057118..a2f60c9bd 100644 --- a/packages/climate-ref-core/src/climate_ref_core/testing.py +++ b/packages/climate-ref-core/src/climate_ref_core/testing.py @@ -30,9 +30,10 @@ from climate_ref_core.diagnostics import ExecutionDefinition, ExecutionResult from climate_ref_core.esgf.base import ESGFRequest from climate_ref_core.metric_values.typing import SeriesMetricValue -from climate_ref_core.output_files import from_placeholders +from climate_ref_core.output_files import from_placeholders, ordered_replacements from climate_ref_core.pycmec.metric import CMECMetric from climate_ref_core.pycmec.output import CMECOutput +from climate_ref_core.regression.manifest import Manifest if TYPE_CHECKING: from _pytest.mark.structures import ParameterSet @@ -200,11 +201,6 @@ def manifest(self) -> Path: """Path to manifest.json (the regression bundle manifest, tracked in git).""" return self.root / "manifest.json" - @property - def regression_catalog_hash(self) -> Path: - """Path to catalog hash file in regression directory.""" - return self.regression / ".catalog_hash" - @property def test_data_dir(self) -> Path: """Path to the test-data directory (parent of diagnostic slug dir).""" @@ -366,10 +362,14 @@ def catalog_changed_since_regression(paths: TestCasePaths) -> bool: """ Check if the catalog has changed since regression data was generated. + The baseline's input hash is read from ``manifest.json`` (``catalog_hash``), + the single coupling record; there is no separate sidecar. + Returns True if: - No regression data exists (new test case) - - No stored catalog hash exists (legacy regression data) - - The catalog hash differs from the stored one + - No manifest, or the manifest records no ``catalog_hash`` (legacy regression data) + - No catalog file exists + - The current catalog hash differs from the one recorded in the manifest Parameters ---------- @@ -383,15 +383,16 @@ def catalog_changed_since_regression(paths: TestCasePaths) -> bool: """ if not paths.regression.exists(): return True # No regression data, needs to run - if not paths.regression_catalog_hash.exists(): - return True # No stored hash, needs to run if not paths.catalog.exists(): return True # No catalog file, needs to run + if not paths.manifest.exists(): + return True # No manifest, needs to run - stored_hash = paths.regression_catalog_hash.read_text().strip() - current_hash = get_catalog_hash(paths.catalog) + stored_hash = Manifest.load(paths.manifest).catalog_hash + if stored_hash is None: + return True # Legacy manifest without a recorded catalog hash, needs to run - return stored_hash != current_hash + return stored_hash != get_catalog_hash(paths.catalog) def _sanitize_for_yaml(value: Any) -> Any: @@ -537,8 +538,9 @@ def _load_series_sanitised(path: Path, replacements: dict[str, str]) -> list[Ser Load series from ``path``, replacing real paths with regression placeholders. """ text = path.read_text(encoding="utf-8") - # Apply replacements longest-key-first in case of nested repleacements - for real, placeholder in sorted(replacements.items(), key=lambda kv: len(kv[0]), reverse=True): + # Apply replacements longest-key-first so a shorter key cannot shadow a longer one + # (the canonical ordering shared with all other sanitisation). + for real, placeholder in ordered_replacements(replacements): text = text.replace(real, placeholder) data = json.loads(text) if not isinstance(data, list): diff --git a/packages/climate-ref-core/tests/unit/regression/test_compare.py b/packages/climate-ref-core/tests/unit/regression/test_compare.py index 55ca671c5..e0693b4f3 100644 --- a/packages/climate-ref-core/tests/unit/regression/test_compare.py +++ b/packages/climate-ref-core/tests/unit/regression/test_compare.py @@ -26,7 +26,7 @@ class TestTolerance: def test_defaults(self) -> None: t = Tolerance() assert t.rtol == 1e-6 - assert t.atol == 0.0 + assert t.atol == 1e-8 def test_custom_values(self) -> None: t = Tolerance(rtol=0.01, atol=1e-9) @@ -97,6 +97,21 @@ def test_type_mismatch_fail(self) -> None: def test_int_float_numeric_comparison(self) -> None: """int vs float should be treated as numeric, not type-mismatch.""" assert compare_json_content(1, 1.0, tol=TOL) == [] + + def test_bool_vs_int_is_type_mismatch(self) -> None: + result = compare_json_content(True, 1, tol=TOL) + assert len(result) == 1 + assert "type mismatch" in result[0] + + def test_int_vs_bool_is_type_mismatch(self) -> None: + result = compare_json_content(0, False, tol=TOL) + assert len(result) == 1 + assert "type mismatch" in result[0] + + def test_bool_vs_float_is_type_mismatch(self) -> None: + result = compare_json_content(True, 1.0, tol=TOL) + assert len(result) == 1 + assert "type mismatch" in result[0] result = compare_json_content(1, 2.0, tol=TOL) assert len(result) == 1 assert "float mismatch" in result[0] diff --git a/packages/climate-ref-core/tests/unit/regression/test_gate.py b/packages/climate-ref-core/tests/unit/regression/test_gate.py new file mode 100644 index 000000000..b47f825a3 --- /dev/null +++ b/packages/climate-ref-core/tests/unit/regression/test_gate.py @@ -0,0 +1,175 @@ +""" +Tests for :mod:`climate_ref_core.regression.gate`. + +The full decision matrix of :func:`decide_coupling` is exercised offline; no I/O +or git interaction is involved. +""" + +from __future__ import annotations + +import pytest + +from climate_ref_core.regression.gate import ( + Action, + decide_coupling, + paths_under, +) +from climate_ref_core.regression.manifest import SCHEMA_VERSION, Manifest, NativeEntry + + +def make_manifest( + version: int, + committed: dict[str, str] | None = None, + native: dict[str, NativeEntry] | None = None, +) -> Manifest: + """Build a manifest with a given version and committed/native digests.""" + return Manifest( + schema=SCHEMA_VERSION, + test_case_version=version, + committed=committed if committed is not None else {"output.json": "a" * 64}, + native=native if native is not None else {}, + ) + + +class TestDecideCoupling: + def test_no_manifest_no_base_skips(self) -> None: + # Never managed: no manifest on this branch and none on the base. + decision = decide_coupling(None, None) + assert decision.action is Action.SKIP + assert "not under regression-baseline management" in decision.reason + + def test_manifest_deletion_fails(self) -> None: + # The manifest existed on the base branch and was removed: fail closed. + decision = decide_coupling(None, make_manifest(1)) + assert decision.action is Action.FAIL + assert "removed without review" in decision.reason + + def test_committed_integrity_failure_fails(self) -> None: + # The committed bundle on disk drifted from the manifest digests. + base = make_manifest(1, {"output.json": "a" * 64}) + current = make_manifest(1, {"output.json": "a" * 64}) + decision = decide_coupling(current, base, committed_integrity_ok=False) + assert decision.action is Action.FAIL + assert "does not match manifest.json digests" in decision.reason + + def test_integrity_failure_outranks_seeding(self) -> None: + # Even a newly added manifest must describe its on-disk bundle. + decision = decide_coupling(make_manifest(1), None, committed_integrity_ok=False) + assert decision.action is Action.FAIL + + def test_native_change_replays(self) -> None: + # Native re-minted (different blob) with committed + version unchanged. + base = make_manifest(1, {"output.json": "a" * 64}, {"data.nc": NativeEntry("a" * 64, 10)}) + current = make_manifest(1, {"output.json": "a" * 64}, {"data.nc": NativeEntry("b" * 64, 12)}) + decision = decide_coupling(current, base) + assert decision.action is Action.REPLAY + assert "native baseline changed" in decision.reason + + def test_newly_added_manifest_seeds_with_replay(self) -> None: + # Seeding with a native baseline present: replay verifies it reproduces the bundle. + manifest = make_manifest(1, native={"data.nc": NativeEntry("a" * 64, 10)}) + decision = decide_coupling(manifest, None) + assert decision.action is Action.REPLAY + assert "seeding" in decision.reason + + def test_newly_added_manifest_without_native_skips(self) -> None: + # Seeding with no native baseline: nothing to replay; the committed bundle is the signal. + decision = decide_coupling(make_manifest(1), None) + assert decision.action is Action.SKIP + assert "seeding" in decision.reason + + def test_catalog_integrity_failure_fails(self) -> None: + # The input catalog drifted from the manifest's recorded hash. + base = make_manifest(1, {"output.json": "a" * 64}) + current = make_manifest(1, {"output.json": "a" * 64}) + decision = decide_coupling(current, base, catalog_integrity_ok=False) + assert decision.action is Action.FAIL + assert "catalog.yaml does not match" in decision.reason + + def test_native_demint_skips_with_warning(self) -> None: + # Native baseline removed (de-mint) with committed bundle unchanged: warn, do not fail. + base = make_manifest(1, {"output.json": "a" * 64}, {"data.nc": NativeEntry("a" * 64, 10)}) + current = make_manifest(1, {"output.json": "a" * 64}, {}) + decision = decide_coupling(current, base) + assert decision.action is Action.SKIP + assert "WARNING" in decision.reason + assert "de-mint" in decision.reason + + def test_version_bump_executes(self) -> None: + base = make_manifest(1) + current = make_manifest(2) + decision = decide_coupling(current, base) + assert decision.action is Action.EXECUTE + assert "1 -> 2" in decision.reason + + def test_version_bump_executes_even_with_committed_change(self) -> None: + # A bump authorises a new baseline, so a committed change is expected. + base = make_manifest(1, {"output.json": "a" * 64}) + current = make_manifest(2, {"output.json": "b" * 64}) + decision = decide_coupling(current, base) + assert decision.action is Action.EXECUTE + + def test_committed_change_without_bump_fails(self) -> None: + base = make_manifest(1, {"output.json": "a" * 64}) + current = make_manifest(1, {"output.json": "b" * 64}) + decision = decide_coupling(current, base) + assert decision.action is Action.FAIL + assert "without a test_case_version bump" in decision.reason + + def test_version_decrease_fails(self) -> None: + base = make_manifest(3) + current = make_manifest(2) + decision = decide_coupling(current, base) + assert decision.action is Action.FAIL + assert "monotonic" in decision.reason + + def test_extraction_change_replays_when_bundle_unchanged(self) -> None: + native = {"data.nc": NativeEntry("a" * 64, 10)} + base = make_manifest(1, {"output.json": "a" * 64}, native) + current = make_manifest(1, {"output.json": "a" * 64}, native) + decision = decide_coupling(current, base, extraction_changed=True) + assert decision.action is Action.REPLAY + assert "extraction code changed" in decision.reason + + def test_extraction_change_without_native_skips(self) -> None: + # Extraction code changed but no native baseline exists to replay. + base = make_manifest(1, {"output.json": "a" * 64}) + current = make_manifest(1, {"output.json": "a" * 64}) + decision = decide_coupling(current, base, extraction_changed=True) + assert decision.action is Action.SKIP + assert "no native baseline exists to replay" in decision.reason + + def test_no_change_skips(self) -> None: + base = make_manifest(1, {"output.json": "a" * 64}) + current = make_manifest(1, {"output.json": "a" * 64}) + decision = decide_coupling(current, base, extraction_changed=False) + assert decision.action is Action.SKIP + assert "nothing to verify" in decision.reason + + +class TestPathsUnder: + def test_file_directly_under_root_matches(self) -> None: + assert paths_under(["pkg/src/mod.py"], ["pkg/src"]) + + def test_file_equal_to_root_matches(self) -> None: + assert paths_under(["pkg/src"], ["pkg/src"]) + + def test_root_trailing_slash_normalised(self) -> None: + assert paths_under(["pkg/src/mod.py"], ["pkg/src/"]) + + def test_no_match_returns_false(self) -> None: + assert not paths_under(["other/mod.py"], ["pkg/src"]) + + def test_prefix_collision_does_not_match(self) -> None: + # "pkg/src2" must not match root "pkg/src". + assert not paths_under(["pkg/src2/mod.py"], ["pkg/src"]) + + def test_empty_root_never_matches(self) -> None: + assert not paths_under(["anything.py"], [""]) + + def test_multiple_roots(self) -> None: + assert paths_under(["b/x.py"], ["a", "b"]) + + @pytest.mark.parametrize("changed", [[], ["unrelated.txt"]]) + def test_empty_or_unrelated(self, changed: list[str]) -> None: + assert not paths_under(changed, ["pkg/src"]) diff --git a/packages/climate-ref-core/tests/unit/regression/test_store.py b/packages/climate-ref-core/tests/unit/regression/test_store.py index 5cc232a0b..fa4625fba 100644 --- a/packages/climate-ref-core/tests/unit/regression/test_store.py +++ b/packages/climate-ref-core/tests/unit/regression/test_store.py @@ -302,3 +302,10 @@ def test_writable_true_remote_url_raises_not_implemented(self, tmp_path: Path) - cfg = _StubConfig(url="https://baselines.example.com", cache_dir=tmp_path / "cache") with pytest.raises(NotImplementedError, match="deferred to a follow-up PR"): build_native_store(cfg, writable=True) + + def test_unsupported_scheme_raises_value_error(self, tmp_path: Path) -> None: + # An unrecognised remote scheme must fail loudly, not be coerced to a local path. + for url in ["s3://bucket/blobs", "gs://bucket/blobs", "ftp://host/blobs"]: + cfg = _StubConfig(url=url, cache_dir=tmp_path / "cache") + with pytest.raises(ValueError, match="not recognised"): + build_native_store(cfg, writable=False) diff --git a/packages/climate-ref-core/tests/unit/test_paths.py b/packages/climate-ref-core/tests/unit/test_paths.py index 38904e7a2..bac67f9be 100644 --- a/packages/climate-ref-core/tests/unit/test_paths.py +++ b/packages/climate-ref-core/tests/unit/test_paths.py @@ -33,12 +33,12 @@ class TestSingleSegment: def test_accepts_single_segments(self, relpath): assert safe_path(relpath, single_segment=True) == Path(relpath) - @pytest.mark.parametrize("relpath", ["sub/dir", "a/b/c.json", "foo\\bar", "."]) - def test_rejects_separators_and_dot(self, relpath): + @pytest.mark.parametrize("relpath", ["sub/dir", "a/b/c.json", "foo\\bar"]) + def test_rejects_separators(self, relpath): with pytest.raises(ValueError, match="single path segment"): safe_path(relpath, single_segment=True) - @pytest.mark.parametrize("relpath", ["", "/abs", "../escape"]) + @pytest.mark.parametrize("relpath", ["", "/abs", "../escape", "."]) def test_lexical_layer_still_applies(self, relpath): with pytest.raises(ValueError, match="contained relative path"): safe_path(relpath, single_segment=True) diff --git a/packages/climate-ref-core/tests/unit/test_testing.py b/packages/climate-ref-core/tests/unit/test_testing.py index 1d04873fd..63c60d9e8 100644 --- a/packages/climate-ref-core/tests/unit/test_testing.py +++ b/packages/climate-ref-core/tests/unit/test_testing.py @@ -9,6 +9,7 @@ from climate_ref_core.datasets import DatasetCollection, ExecutionDatasetCollection, SourceDatasetType from climate_ref_core.diagnostics import ExecutionResult from climate_ref_core.metric_values.typing import SeriesMetricValue +from climate_ref_core.regression.manifest import Manifest from climate_ref_core.testing import ( RegressionValidator, TestCase, @@ -156,13 +157,6 @@ def test_different_test_cases(self, tmp_path): assert default.root != custom.root assert default.regression != custom.regression - def test_regression_catalog_hash_property(self, tmp_path): - """Test regression_catalog_hash returns correct path.""" - paths = TestCasePaths.from_test_data_dir(tmp_path, "my-diag", "default") - - expected = tmp_path / "my-diag" / "default" / "regression" / ".catalog_hash" - assert paths.regression_catalog_hash == expected - class TestYamlSerialization: """Tests for YAML serialization functions.""" @@ -544,6 +538,12 @@ def test_returns_none_for_empty_file(self, tmp_path): class TestCatalogChangedSinceRegression: """Tests for catalog_changed_since_regression function.""" + @staticmethod + def _seed_manifest(paths, catalog_hash): + """Write a minimal manifest recording ``catalog_hash`` for the test case.""" + paths.regression.mkdir(parents=True, exist_ok=True) + Manifest.seed_v1({}, catalog_hash=catalog_hash).dump(paths.manifest) + def test_returns_true_when_no_regression_exists(self, tmp_path): """Test returns True when regression directory doesn't exist.""" paths = TestCasePaths.from_test_data_dir(tmp_path, "my-diag", "default") @@ -553,13 +553,23 @@ def test_returns_true_when_no_regression_exists(self, tmp_path): result = catalog_changed_since_regression(paths) assert result is True - def test_returns_true_when_no_catalog_hash_file(self, tmp_path): - """Test returns True when catalog hash file doesn't exist.""" + def test_returns_true_when_no_manifest(self, tmp_path): + """Test returns True when manifest.json doesn't exist.""" paths = TestCasePaths.from_test_data_dir(tmp_path, "my-diag", "default") paths.create() paths.catalog.write_text("_metadata:\n hash: abc123\ncmip6:\n datasets: []\n") paths.regression.mkdir(parents=True) - # No .catalog_hash file + # No manifest.json + + result = catalog_changed_since_regression(paths) + assert result is True + + def test_returns_true_when_manifest_has_no_catalog_hash(self, tmp_path): + """Test returns True for a legacy manifest with no recorded catalog hash.""" + paths = TestCasePaths.from_test_data_dir(tmp_path, "my-diag", "default") + paths.create() + paths.catalog.write_text("_metadata:\n hash: abc123\ncmip6:\n datasets: []\n") + self._seed_manifest(paths, catalog_hash=None) result = catalog_changed_since_regression(paths) assert result is True @@ -568,42 +578,28 @@ def test_returns_true_when_no_catalog_file(self, tmp_path): """Test returns True when catalog file doesn't exist.""" paths = TestCasePaths.from_test_data_dir(tmp_path, "my-diag", "default") paths.create() - paths.regression.mkdir(parents=True) - paths.regression_catalog_hash.write_text("abc123") + self._seed_manifest(paths, catalog_hash="abc123") # No catalog file result = catalog_changed_since_regression(paths) assert result is True def test_returns_true_when_hash_differs(self, tmp_path): - """Test returns True when catalog hash differs from stored hash.""" + """Test returns True when catalog hash differs from the manifest hash.""" paths = TestCasePaths.from_test_data_dir(tmp_path, "my-diag", "default") paths.create() paths.catalog.write_text("_metadata:\n hash: new_hash_456\ncmip6:\n datasets: []\n") - paths.regression.mkdir(parents=True) - paths.regression_catalog_hash.write_text("old_hash_123") + self._seed_manifest(paths, catalog_hash="old_hash_123") result = catalog_changed_since_regression(paths) assert result is True def test_returns_false_when_hash_matches(self, tmp_path): - """Test returns False when catalog hash matches stored hash.""" + """Test returns False when catalog hash matches the manifest hash.""" paths = TestCasePaths.from_test_data_dir(tmp_path, "my-diag", "default") paths.create() paths.catalog.write_text("_metadata:\n hash: same_hash_789\ncmip6:\n datasets: []\n") - paths.regression.mkdir(parents=True) - paths.regression_catalog_hash.write_text("same_hash_789") - - result = catalog_changed_since_regression(paths) - assert result is False - - def test_handles_whitespace_in_stored_hash(self, tmp_path): - """Test handles whitespace in stored hash file.""" - paths = TestCasePaths.from_test_data_dir(tmp_path, "my-diag", "default") - paths.create() - paths.catalog.write_text("_metadata:\n hash: hash_value\ncmip6:\n datasets: []\n") - paths.regression.mkdir(parents=True) - paths.regression_catalog_hash.write_text(" hash_value \n") + self._seed_manifest(paths, catalog_hash="same_hash_789") result = catalog_changed_since_regression(paths) assert result is False diff --git a/packages/climate-ref/src/climate_ref/cli/__init__.py b/packages/climate-ref/src/climate_ref/cli/__init__.py index abbc34b55..e64fe2f37 100644 --- a/packages/climate-ref/src/climate_ref/cli/__init__.py +++ b/packages/climate-ref/src/climate_ref/cli/__init__.py @@ -42,6 +42,7 @@ ("test-cases", "sync"), ("test-cases", "replay"), ("test-cases", "mint"), + ("test-cases", "ci-gate"), } diff --git a/packages/climate-ref/src/climate_ref/cli/test_cases.py b/packages/climate-ref/src/climate_ref/cli/test_cases.py index f7c91e2a5..61a43680f 100644 --- a/packages/climate-ref/src/climate_ref/cli/test_cases.py +++ b/packages/climate-ref/src/climate_ref/cli/test_cases.py @@ -35,7 +35,8 @@ from climate_ref.provider_registry import ProviderRegistry from climate_ref_core.datasets import ExecutionDatasetCollection, SourceDatasetType from climate_ref_core.diagnostics import Diagnostic - from climate_ref_core.testing import TestCase + from climate_ref_core.regression.manifest import Manifest, NativeEntry + from climate_ref_core.testing import TestCase, TestCasePaths app = typer.Typer(help=__doc__) @@ -466,6 +467,37 @@ def _print_regression_summary( # pragma: no cover console.print(summary) +def _write_test_case_manifest( + paths: TestCasePaths, + *, + test_case_version: int, + committed: dict[str, str], + native: dict[str, NativeEntry], + schema: int | None = None, +) -> Manifest: + """ + Construct and write a test case ``manifest.json``, recording the input catalog hash. + + Shared by ``run`` (which preserves the existing version and native block) and + ``mint`` (which authors the native block and may bump the version); the two + callers differ only in the ``test_case_version`` and ``native`` they supply. + The ``catalog_hash`` is always (re)derived from the current ``catalog.yaml`` so + the manifest stays coupled to the inputs that produced the committed bundle. + """ + from climate_ref_core.regression.manifest import SCHEMA_VERSION, Manifest + from climate_ref_core.testing import get_catalog_hash + + manifest = Manifest( + schema=SCHEMA_VERSION if schema is None else schema, + test_case_version=test_case_version, + committed=dict(committed), + native=native, + catalog_hash=get_catalog_hash(paths.catalog), + ) + manifest.dump(paths.manifest) + return manifest + + def _run_single_test_case( # noqa: PLR0911, PLR0912, PLR0915 config: Config, console: Console, @@ -489,7 +521,6 @@ def _run_single_test_case( # noqa: PLR0911, PLR0912, PLR0915 from climate_ref_core.regression.manifest import Manifest from climate_ref_core.testing import ( TestCasePaths, - get_catalog_hash, load_datasets_from_yaml, ) @@ -588,26 +619,26 @@ def _run_single_test_case( # noqa: PLR0911, PLR0912, PLR0915 test_data_dir=paths.test_data_dir, ) - # Refresh the manifest's committed block only; the native block is mint-owned. + # Refresh the manifest's committed block (and catalog hash) only; the native + # block is mint-owned, so preserve the previous one when the manifest exists. if paths.manifest.exists(): previous = Manifest.load(paths.manifest) - manifest = Manifest( - schema=previous.schema, + _write_test_case_manifest( + paths, test_case_version=previous.test_case_version, committed=committed_digests, native=previous.native, + schema=previous.schema, ) else: - manifest = Manifest.seed_v1(committed_digests) - manifest.dump(paths.manifest) + _write_test_case_manifest( + paths, + test_case_version=1, + committed=committed_digests, + native={}, + ) logger.info(f"Captured {len(native)} native output file(s) (not minted)") - - # Store catalog hash for change detection - catalog_hash = get_catalog_hash(paths.catalog) - if catalog_hash: - paths.regression_catalog_hash.write_text(catalog_hash) - logger.info(f"Updated regression data: {paths.regression}") _print_regression_summary(console, paths.regression, size_threshold) elif paths.regression.exists(): @@ -953,7 +984,11 @@ def replay_test_case( # noqa: PLR0912, PLR0915 from climate_ref.provider_registry import ProviderRegistry from climate_ref_core.diagnostics import ExecutionDefinition - from climate_ref_core.output_files import from_placeholders + from climate_ref_core.output_files import ( + PLACEHOLDER_OUTPUT_DIR, + PLACEHOLDER_TEST_DATA_DIR, + from_placeholders, + ) from climate_ref_core.regression import ( COMMITTED_BUNDLE_FILES, Manifest, @@ -1054,8 +1089,8 @@ def replay_test_case( # noqa: PLR0912, PLR0915 continue replacements = { - str(output_dir): "", - str(paths.test_data_dir): "", + str(output_dir): PLACEHOLDER_OUTPUT_DIR, + str(paths.test_data_dir): PLACEHOLDER_TEST_DATA_DIR, } # The comparator silently skips a bundle file with no committed copy, compared = [f for f in COMMITTED_BUNDLE_FILES if (paths.regression / f).exists()] @@ -1138,7 +1173,7 @@ def mint_native( # noqa: PLR0912, PLR0915 from climate_ref_core.regression.capture import capture_execution from climate_ref_core.regression.manifest import Manifest from climate_ref_core.regression.store import build_native_store - from climate_ref_core.testing import TestCasePaths, get_catalog_hash, load_datasets_from_yaml + from climate_ref_core.testing import TestCasePaths, load_datasets_from_yaml config: Config = ctx.obj.config db = ctx.obj.database @@ -1229,17 +1264,12 @@ def mint_native( # noqa: PLR0912, PLR0915 version = previous.test_case_version + 1 if bump_version else previous.test_case_version else: version = 1 - manifest = Manifest( - schema=1, + _write_test_case_manifest( + paths, test_case_version=version, committed=committed_digests, native=native, ) - manifest.dump(paths.manifest) - - catalog_hash = get_catalog_hash(paths.catalog) - if catalog_hash: - paths.regression_catalog_hash.write_text(catalog_hash) minted += 1 logger.info( @@ -1256,3 +1286,270 @@ def mint_native( # noqa: PLR0912, PLR0915 console.print(f" - {case}") raise typer.Exit(code=1) console.print(f"[green]Minted {minted} native baseline(s)[/green]") + + +def _provider_source_root(diag: Diagnostic, repo_root: Path) -> str | None: + """ + Return the diagnostic's provider package source directory, relative to the repo root. + + Used to decide whether a changed file touches the diagnostic's extraction code. + The returned path is POSIX-style so it can be compared against + ``git diff --name-only`` output. + + Parameters + ---------- + diag + The diagnostic whose provider source directory is wanted. + repo_root + The repository working-tree root. + + Returns + ------- + : + The provider package source directory relative to ``repo_root``, or ``None`` + if it cannot be located or lies outside the repository. + """ + import importlib.util + + top_package = type(diag).__module__.split(".")[0] + try: + spec = importlib.util.find_spec(top_package) + except (ImportError, ValueError): + return None + if spec is None or not spec.submodule_search_locations: + return None + package_dir = Path(next(iter(spec.submodule_search_locations))).resolve() + try: + return package_dir.relative_to(repo_root.resolve()).as_posix() + except ValueError: + return None + + +def _core_extraction_roots(repo_root: Path) -> list[str]: + """ + Return the core paths whose change affects replay/execute for every test case. + + ``build_execution_result`` (the function replay/execute re-run) depends on more + than the regression package: it builds and reads CMEC bundles via + :mod:`climate_ref_core.pycmec`, persists curated outputs via + :mod:`climate_ref_core.output_files`, and is dispatched through + :mod:`climate_ref_core.diagnostics`. A change under any of these can alter the + regenerated bundle, so all of them count as an extraction change. + + Detection is deliberately coarse and errs toward REPLAY (cheap, credential-free): + a false positive only triggers an unnecessary replay, never a missed one. + Roots are derived from the installed package location (rather than hardcoded + paths) so they survive a package move; any root outside the repository + (e.g. an installed wheel) is dropped. + + Parameters + ---------- + repo_root + The repository working-tree root. + + Returns + ------- + : + Repo-relative POSIX paths for the core extraction surfaces that lie inside the repo. + """ + import climate_ref_core + + core_dir = Path(climate_ref_core.__file__).resolve().parent + repo_root_resolved = repo_root.resolve() + candidates = [ + core_dir / "regression", + core_dir / "pycmec", + core_dir / "output_files.py", + core_dir / "diagnostics.py", + ] + roots: list[str] = [] + for candidate in candidates: + try: + roots.append(candidate.relative_to(repo_root_resolved).as_posix()) + except ValueError: + continue + return roots + + +@app.command(name="ci-gate") +def ci_gate( # noqa: PLR0912, PLR0915 + ctx: typer.Context, + base: Annotated[ + str, + typer.Option(help="Git ref to compare against (the PR base branch)"), + ] = "origin/main", + provider: Annotated[ + str | None, + typer.Option(help="Limit the gate to a single provider slug"), + ] = None, + diagnostic: Annotated[ + str | None, + typer.Option(help="Limit the gate to a single diagnostic slug"), + ] = None, + test_case: Annotated[ + str | None, + typer.Option(help="Limit the gate to a single test case name"), + ] = None, + output_json: Annotated[ + bool, + typer.Option("--json", help="Emit the per-case decisions as JSON on stdout"), + ] = False, +) -> None: + """ + Decide how CI should verify each test case's regression baseline. + + Compares each committed ``manifest.json`` to its counterpart on the base branch + and reports the action CI should take per case: ``replay`` (cheap, against the + cached native baseline), ``execute`` (full re-run, when ``test_case_version`` was + bumped), ``skip`` (nothing relevant changed), or ``fail`` (an unauthorised + baseline change). Exits non-zero if any case is gated ``fail``. + + The ``--json`` output is intended for CI to dispatch ``replay``/``run`` jobs. + + Examples + -------- + ref test-cases ci-gate # Gate all cases against origin/main + ref test-cases ci-gate --base origin/develop + ref test-cases ci-gate --provider example --json + """ + import json as _json + + from git import GitCommandError + + from climate_ref.provider_registry import ProviderRegistry + from climate_ref_core.regression.gate import Action, decide_coupling, paths_under + from climate_ref_core.regression.manifest import Manifest, compute_committed_digests + from climate_ref_core.testing import TestCasePaths, get_catalog_hash + + config: Config = ctx.obj.config + db = ctx.obj.database + console: Console = ctx.obj.console + + repo = get_repo_for_path(Path.cwd()) + if repo is None or repo.working_dir is None: + logger.error("ci-gate must be run inside a git repository") + raise typer.Exit(code=1) + repo_root = Path(repo.working_dir) + + # Resolve the set of files changed on this branch relative to the base ref. + # `base...HEAD` diffs against the merge-base, so unrelated base-branch churn + # is excluded. + try: + diff_output = repo.git.diff("--name-only", f"{base}...HEAD") + except GitCommandError as exc: + logger.error(f"Could not diff against base ref {base!r}: {exc}") + raise typer.Exit(code=1) from exc + changed_files = [line.strip() for line in diff_output.splitlines() if line.strip()] + + # The core machinery behind build_execution_result affects every replay/execute, + # so a change there counts as an extraction change for all cases. Extraction-change + # detection is deliberately coarse: any change under a diagnostic's provider package + # (see `_provider_source_root`) or under the core extraction surfaces counts for + # every case in that provider. This errs toward REPLAY (cheap, credential-free), + # never away from it. + core_changed = paths_under(changed_files, _core_extraction_roots(repo_root)) + + # Hoisted once: repo_root.resolve() is filesystem-touching, and a provider's source + # root is identical for every case in that provider, so memoise it per provider slug + # rather than recomputing find_spec on each case. + repo_root_resolved = repo_root.resolve() + source_root_cache: dict[str, str | None] = {} + + registry = ProviderRegistry.build_from_config(config, db) + + decisions: list[dict[str, str]] = [] + has_failure = False + + def record(case: str, action: Action, reason: str) -> None: + nonlocal has_failure + if action is Action.FAIL: + has_failure = True + decisions.append({"case": case, "action": action.value, "reason": reason}) + + for diag, tc in _iter_test_cases(registry, provider=provider, diagnostic=diagnostic, test_case=test_case): + case_id = f"{diag.provider.slug}/{diag.slug}/{tc.name}" + paths = TestCasePaths.from_diagnostic(diag, tc.name) + + # A corrupt manifest authored in this change is a hard failure for that case, + # not a crash for the whole gate. + manifest: Manifest | None = None + if paths is not None and paths.manifest.exists(): + try: + manifest = Manifest.load(paths.manifest) + except ValueError as exc: + logger.error(f"{case_id}: invalid manifest.json: {exc}") + record(case_id, Action.FAIL, f"invalid manifest.json: {exc}") + continue + + base_manifest: Manifest | None = None + if paths is not None: + try: + rel_manifest = paths.manifest.resolve().relative_to(repo_root_resolved).as_posix() + except ValueError: + rel_manifest = None + if rel_manifest is not None: + try: + base_text = repo.git.show(f"{base}:{rel_manifest}") + except GitCommandError: + base_manifest = None + else: + # A corrupt manifest on the base branch can't be compared against; + # fall back to seeding (REPLAY) rather than aborting the gate. + try: + base_manifest = Manifest.loads(base_text, source=f"{base}:{rel_manifest}") + except ValueError as exc: + logger.warning( + f"{case_id}: base manifest at {base}:{rel_manifest} is invalid " + f"({exc}); treating as newly added" + ) + base_manifest = None + + provider_slug = diag.provider.slug + if provider_slug not in source_root_cache: + source_root_cache[provider_slug] = _provider_source_root(diag, repo_root) + source_root = source_root_cache[provider_slug] + extraction_roots = [r for r in (source_root,) if r] + extraction_changed = core_changed or paths_under(changed_files, extraction_roots) + + # Verify the committed bundle on disk still matches the manifest digests. + # A drift (edited/added/removed committed file without regenerating the + # manifest) must fail closed rather than slip through as SKIP. + committed_integrity_ok = True + # Verify the input catalog still matches the manifest's recorded hash. A catalog + # edit without regenerating the baseline leaves it silently stale; fail closed. + # Legacy manifests without a catalog_hash have nothing to compare, so stay OK. + catalog_integrity_ok = True + if manifest is not None and paths is not None: + committed_integrity_ok = compute_committed_digests(paths.regression) == manifest.committed + if manifest.catalog_hash is not None: + catalog_integrity_ok = get_catalog_hash(paths.catalog) == manifest.catalog_hash + + decision = decide_coupling( + manifest, + base_manifest, + extraction_changed=extraction_changed, + committed_integrity_ok=committed_integrity_ok, + catalog_integrity_ok=catalog_integrity_ok, + ) + record(case_id, decision.action, decision.reason) + + if output_json: + console.print_json(_json.dumps(decisions)) + else: + table = Table(title=f"CI coupling gate (base: {base})") + table.add_column("Test case", style="cyan", no_wrap=True) + table.add_column("Action") + table.add_column("Reason") + style_for = { + Action.FAIL.value: "red", + Action.EXECUTE.value: "yellow", + Action.REPLAY.value: "green", + Action.SKIP.value: "dim", + } + for entry in decisions: + style = style_for[entry["action"]] + table.add_row(entry["case"], f"[{style}]{entry['action']}[/{style}]", entry["reason"]) + console.print(table) + + if has_failure: + raise typer.Exit(code=1) diff --git a/packages/climate-ref/tests/unit/cli/test_test_cases.py b/packages/climate-ref/tests/unit/cli/test_test_cases.py index f9434ad85..6055f5e15 100644 --- a/packages/climate-ref/tests/unit/cli/test_test_cases.py +++ b/packages/climate-ref/tests/unit/cli/test_test_cases.py @@ -173,8 +173,14 @@ def test_cmip6_data_returned(self, mock_diagnostic, mock_test_case, mock_fetcher with ( patch("climate_ref_core.esgf.ESGFFetcher", return_value=mock_fetcher), patch("climate_ref.datasets.CMIP6DatasetAdapter", return_value=mock_adapter), - patch("climate_ref.cli.test_cases._solve_test_case", return_value=mock_datasets), - patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=None), + patch( + "climate_ref.cli.test_cases._solve_test_case", + return_value=mock_datasets, + ), + patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=None, + ), ): result, written = _fetch_and_build_catalog(mock_diagnostic, mock_test_case) @@ -192,8 +198,14 @@ def test_saves_catalog_yaml(self, tmp_path, mock_diagnostic, mock_test_case, moc with ( patch("climate_ref_core.esgf.ESGFFetcher", return_value=mock_fetcher), patch("climate_ref.datasets.CMIP6DatasetAdapter", return_value=mock_adapter), - patch("climate_ref.cli.test_cases._solve_test_case", return_value=mock_datasets), - patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=mock_paths), + patch( + "climate_ref.cli.test_cases._solve_test_case", + return_value=mock_datasets, + ), + patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=mock_paths, + ), patch("climate_ref_core.testing.save_datasets_to_yaml", return_value=True) as mock_save, ): _, written = _fetch_and_build_catalog(mock_diagnostic, mock_test_case) @@ -227,8 +239,14 @@ def test_obs4mips_data_returned(self, mock_diagnostic, mock_test_case): with ( patch("climate_ref_core.esgf.ESGFFetcher", return_value=mock_fetcher), patch("climate_ref.datasets.Obs4MIPsDatasetAdapter", return_value=mock_adapter), - patch("climate_ref.cli.test_cases._solve_test_case", return_value=mock_datasets), - patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=None), + patch( + "climate_ref.cli.test_cases._solve_test_case", + return_value=mock_datasets, + ), + patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=None, + ), ): result, written = _fetch_and_build_catalog(mock_diagnostic, mock_test_case) @@ -539,12 +557,23 @@ def test_run_with_fetch_flag(self, invoke_cli, mocker): "climate_ref.provider_registry.ProviderRegistry.build_from_config", return_value=mock_registry, ) - mocker.patch("climate_ref.cli.test_cases._fetch_and_build_catalog", return_value=(MagicMock(), True)) + mocker.patch( + "climate_ref.cli.test_cases._fetch_and_build_catalog", + return_value=(MagicMock(), True), + ) mocker.patch("climate_ref.testing.TestCaseRunner", return_value=mock_runner) mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=None) result = invoke_cli( - ["test-cases", "run", "--provider", "example", "--diagnostic", "test-diag", "--fetch"], + [ + "test-cases", + "run", + "--provider", + "example", + "--diagnostic", + "test-diag", + "--fetch", + ], ) assert result.exit_code == 0 @@ -571,7 +600,15 @@ def test_run_with_fetch_flag_raises_error(self, invoke_cli, mocker): ) invoke_cli( - ["test-cases", "run", "--provider", "example", "--diagnostic", "test-diag", "--fetch"], + [ + "test-cases", + "run", + "--provider", + "example", + "--diagnostic", + "test-diag", + "--fetch", + ], expected_exit_code=1, ) @@ -612,7 +649,10 @@ def test_run_handles_exceptions(self, invoke_cli, mocker, tmp_path, exception_cl "climate_ref.provider_registry.ProviderRegistry.build_from_config", return_value=mock_registry, ) - mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=mock_paths) + mocker.patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=mock_paths, + ) mocker.patch("climate_ref_core.testing.load_datasets_from_yaml", return_value=MagicMock()) mocker.patch("climate_ref.testing.TestCaseRunner", return_value=mock_runner) @@ -648,7 +688,9 @@ def test_run_successful_execution(self, invoke_cli, mocker, tmp_path): mock_paths.regression = regression_dir # Use real dir that exists mock_result = MagicMock( - successful=True, metric_bundle_filename="metrics.json", output_bundle_filename="output.json" + successful=True, + metric_bundle_filename="metrics.json", + output_bundle_filename="output.json", ) mock_result.to_output_path.side_effect = lambda x: tmp_path / x mock_runner = MagicMock() @@ -658,7 +700,10 @@ def test_run_successful_execution(self, invoke_cli, mocker, tmp_path): "climate_ref.provider_registry.ProviderRegistry.build_from_config", return_value=mock_registry, ) - mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=mock_paths) + mocker.patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=mock_paths, + ) mocker.patch("climate_ref_core.testing.load_datasets_from_yaml", return_value=MagicMock()) mocker.patch("climate_ref.testing.TestCaseRunner", return_value=mock_runner) @@ -695,7 +740,10 @@ def test_run_unsuccessful_execution(self, invoke_cli, mocker, tmp_path): "climate_ref.provider_registry.ProviderRegistry.build_from_config", return_value=mock_registry, ) - mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=mock_paths) + mocker.patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=mock_paths, + ) mocker.patch("climate_ref_core.testing.load_datasets_from_yaml", return_value=MagicMock()) mocker.patch("climate_ref.testing.TestCaseRunner", return_value=mock_runner) @@ -760,13 +808,24 @@ def test_run_with_force_regen(self, invoke_cli, mocker, tmp_path): "climate_ref.provider_registry.ProviderRegistry.build_from_config", return_value=mock_registry, ) - mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=real_paths) + mocker.patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=real_paths, + ) mocker.patch("climate_ref_core.testing.load_datasets_from_yaml", return_value=MagicMock()) mocker.patch("climate_ref.testing.TestCaseRunner", return_value=mock_runner) mocker.patch("climate_ref_core.testing.get_catalog_hash", return_value="abc123") result = invoke_cli( - ["test-cases", "run", "--provider", "example", "--diagnostic", "test-diag", "--force-regen"], + [ + "test-cases", + "run", + "--provider", + "example", + "--diagnostic", + "test-diag", + "--force-regen", + ], ) assert result.exit_code == 0 @@ -782,7 +841,11 @@ def test_run_with_force_regen(self, invoke_cli, mocker, tmp_path): manifest = Manifest.load(real_paths.manifest) assert manifest.test_case_version == 1 assert manifest.native == {} - assert set(manifest.committed) == {"diagnostic.json", "output.json", "series.json"} + assert set(manifest.committed) == { + "diagnostic.json", + "output.json", + "series.json", + } def test_run_with_existing_baseline(self, invoke_cli, mocker, tmp_path): """Test run command logs when baseline exists.""" @@ -814,7 +877,9 @@ def test_run_with_existing_baseline(self, invoke_cli, mocker, tmp_path): mock_paths.regression = regression_dir mock_result = MagicMock( - successful=True, metric_bundle_filename="metrics.json", output_bundle_filename=None + successful=True, + metric_bundle_filename="metrics.json", + output_bundle_filename=None, ) mock_result.to_output_path.return_value = tmp_path / "metrics.json" mock_runner = MagicMock() @@ -824,7 +889,10 @@ def test_run_with_existing_baseline(self, invoke_cli, mocker, tmp_path): "climate_ref.provider_registry.ProviderRegistry.build_from_config", return_value=mock_registry, ) - mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=mock_paths) + mocker.patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=mock_paths, + ) mocker.patch("climate_ref_core.testing.load_datasets_from_yaml", return_value=MagicMock()) mocker.patch("climate_ref.testing.TestCaseRunner", return_value=mock_runner) @@ -859,7 +927,10 @@ def test_run_with_only_missing_skips_existing(self, invoke_cli, mocker, tmp_path "climate_ref.provider_registry.ProviderRegistry.build_from_config", return_value=mock_registry, ) - mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=mock_paths) + mocker.patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=mock_paths, + ) # With --only-missing, test case should be skipped and exit 0 result = invoke_cli( @@ -887,9 +958,15 @@ def test_run_with_if_changed_skips_unchanged(self, invoke_cli, mocker, tmp_path) "climate_ref.provider_registry.ProviderRegistry.build_from_config", return_value=mock_registry, ) - mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=mock_paths) + mocker.patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=mock_paths, + ) # Catalog not changed since regression - mocker.patch("climate_ref_core.testing.catalog_changed_since_regression", return_value=False) + mocker.patch( + "climate_ref_core.testing.catalog_changed_since_regression", + return_value=False, + ) result = invoke_cli( ["test-cases", "run", "--provider", "example", "--if-changed"], @@ -926,8 +1003,14 @@ def test_cmip7_data_returned(self, mock_diagnostic, mock_test_case): with ( patch("climate_ref_core.esgf.ESGFFetcher", return_value=mock_fetcher), patch("climate_ref.datasets.CMIP7DatasetAdapter", return_value=mock_adapter), - patch("climate_ref.cli.test_cases._solve_test_case", return_value=mock_datasets), - patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=None), + patch( + "climate_ref.cli.test_cases._solve_test_case", + return_value=mock_datasets, + ), + patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=None, + ), ): result, written = _fetch_and_build_catalog(mock_diagnostic, mock_test_case) @@ -958,9 +1041,18 @@ def test_pmp_climatology_data_returned(self, mock_diagnostic, mock_test_case): with ( patch("climate_ref_core.esgf.ESGFFetcher", return_value=mock_fetcher), - patch("climate_ref.datasets.PMPClimatologyDatasetAdapter", return_value=mock_adapter), - patch("climate_ref.cli.test_cases._solve_test_case", return_value=mock_datasets), - patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=None), + patch( + "climate_ref.datasets.PMPClimatologyDatasetAdapter", + return_value=mock_adapter, + ), + patch( + "climate_ref.cli.test_cases._solve_test_case", + return_value=mock_datasets, + ), + patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=None, + ), ): result, written = _fetch_and_build_catalog(mock_diagnostic, mock_test_case) @@ -994,8 +1086,14 @@ def test_force_flag_passed_to_save(self, tmp_path, mock_diagnostic, mock_test_ca with ( patch("climate_ref_core.esgf.ESGFFetcher", return_value=mock_fetcher), patch("climate_ref.datasets.CMIP6DatasetAdapter", return_value=mock_adapter), - patch("climate_ref.cli.test_cases._solve_test_case", return_value=mock_datasets), - patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=mock_paths), + patch( + "climate_ref.cli.test_cases._solve_test_case", + return_value=mock_datasets, + ), + patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=mock_paths, + ), patch("climate_ref_core.testing.save_datasets_to_yaml", return_value=True) as mock_save, ): _fetch_and_build_catalog(mock_diagnostic, mock_test_case, force=True) @@ -1035,10 +1133,22 @@ def test_mixed_source_types(self, mock_diagnostic, mock_test_case): with ( patch("climate_ref_core.esgf.ESGFFetcher", return_value=mock_fetcher), - patch("climate_ref.datasets.CMIP6DatasetAdapter", return_value=mock_cmip6_adapter), - patch("climate_ref.datasets.Obs4MIPsDatasetAdapter", return_value=mock_obs_adapter), - patch("climate_ref.cli.test_cases._solve_test_case", return_value=mock_datasets), - patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=None), + patch( + "climate_ref.datasets.CMIP6DatasetAdapter", + return_value=mock_cmip6_adapter, + ), + patch( + "climate_ref.datasets.Obs4MIPsDatasetAdapter", + return_value=mock_obs_adapter, + ), + patch( + "climate_ref.cli.test_cases._solve_test_case", + return_value=mock_datasets, + ), + patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=None, + ), ): result, _written = _fetch_and_build_catalog(mock_diagnostic, mock_test_case) @@ -1156,7 +1266,10 @@ def test_list_with_test_cases_and_paths(self, invoke_cli, mocker, tmp_path): "climate_ref.provider_registry.ProviderRegistry.build_from_config", return_value=mock_registry, ) - mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=mock_paths) + mocker.patch( + "climate_ref_core.testing.TestCasePaths.from_diagnostic", + return_value=mock_paths, + ) result = invoke_cli(["test-cases", "list"]) assert result.exit_code == 0 @@ -1315,7 +1428,14 @@ def test_replay_empty_native_is_hard_failure(self, invoke_cli, mocker, tmp_path) mocker.patch("climate_ref_core.regression.store.build_native_store", return_value=store) result = invoke_cli( - ["test-cases", "replay", "--provider", "example", "--diagnostic", "test-diag"], + [ + "test-cases", + "replay", + "--provider", + "example", + "--diagnostic", + "test-diag", + ], expected_exit_code=1, ) assert "not yet minted" in result.stderr.lower() or "mint" in result.stderr.lower() @@ -1351,7 +1471,14 @@ def test_replay_integrity_mismatch_warns_and_continues(self, invoke_cli, mocker, mocker.patch("climate_ref_core.regression.store.build_native_store", return_value=store) result = invoke_cli( - ["test-cases", "replay", "--provider", "example", "--diagnostic", "test-diag"], + [ + "test-cases", + "replay", + "--provider", + "example", + "--diagnostic", + "test-diag", + ], expected_exit_code=1, ) # The mismatch warned rather than gated, and execution fell through to the native guard. @@ -1402,7 +1529,16 @@ def test_replay_reconciles_integrity_mismatch_within_tolerance(self, invoke_cli, mocker.patch("climate_ref_core.diagnostics.ExecutionDefinition") mocker.patch("climate_ref_core.regression.assert_bundle_regression") - result = invoke_cli(["test-cases", "replay", "--provider", "example", "--diagnostic", "test-diag"]) + result = invoke_cli( + [ + "test-cases", + "replay", + "--provider", + "example", + "--diagnostic", + "test-diag", + ] + ) assert "Replay reconciled committed bundle" in result.stderr assert "equivalent within tolerance" in result.stderr assert "3 bundle file(s)" in result.stderr @@ -1490,3 +1626,276 @@ def test_mint_writes_blobs_and_manifest(self, invoke_cli, mocker, tmp_path): # Each native blob was PUT into the store. for entry in manifest.native.values(): assert store.has(entry.sha256) + + +class TestCIGateCommand: + """Tests for the ``ref test-cases ci-gate`` command.""" + + def _setup( + self, + mocker, + tmp_path, + *, + current_version=None, + committed_content=None, + native=None, + ): + """ + Wire a single example test case for the gate. + + Writes the *current* manifest (when ``current_version`` is set) alongside real + committed files in ``regression/``, so the gate's on-disk integrity check sees a + manifest that faithfully describes the bundle. + The base-branch manifest is left absent by default (``git show`` raises); + a test sets ``repo.git.show`` to supply one, + using the returned committed digests so they line up with the current bundle. + + Parameters + ---------- + current_version + ``test_case_version`` of the current manifest, + or ``None`` to leave the manifest absent (never managed / deleted on this branch). + committed_content + ``{filename: text}`` written to ``regression/`` for the current manifest. + Defaults to a single ``output.json``. + native + Native entries for the current manifest. + + Returns + ------- + : + ``(repo, paths, committed_digests)`` — the mock repo, the test case paths, + and the digests computed from the on-disk committed bundle. + """ + from git import GitCommandError + + from climate_ref_core.regression.manifest import ( + SCHEMA_VERSION, + Manifest, + compute_committed_digests, + ) + from climate_ref_core.testing import TestCasePaths + + mock_tc = MagicMock() + mock_tc.name = "default" + mock_spec = MagicMock(test_cases=[mock_tc]) + mock_diag = MagicMock(slug="test-diag", test_data_spec=mock_spec) + mock_diag.provider = MagicMock(slug="example") + mock_provider = MagicMock(slug="example") + mock_provider.diagnostics.return_value = [mock_diag] + mock_registry = MagicMock(providers=[mock_provider]) + + case_dir = tmp_path / "test-diag" / "default" + case_dir.mkdir(parents=True) + paths = TestCasePaths(root=case_dir) + + committed_digests: dict[str, str] = {} + if current_version is not None: + content = committed_content if committed_content is not None else {"output.json": '{"x": 1}\n'} + paths.regression.mkdir(parents=True, exist_ok=True) + for name, text in content.items(): + (paths.regression / name).write_text(text, encoding="utf-8") + committed_digests = compute_committed_digests(paths.regression) + Manifest( + schema=SCHEMA_VERSION, + test_case_version=current_version, + committed=committed_digests, + native=native or {}, + ).dump(paths.manifest) + + repo = MagicMock() + repo.working_dir = str(tmp_path) + repo.git.diff.return_value = "" + # Base manifest absent by default; tests that need one override repo.git.show. + repo.git.show.side_effect = GitCommandError("show", 128) + + mocker.patch( + "climate_ref.provider_registry.ProviderRegistry.build_from_config", + return_value=mock_registry, + ) + mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=paths) + mocker.patch("climate_ref.cli.test_cases.get_repo_for_path", return_value=repo) + return repo, paths, committed_digests + + @staticmethod + def _set_base(repo, version, committed, native=None): + """Configure ``repo.git.show`` to return a base manifest with these fields.""" + import json as _json + + from attrs import asdict + + from climate_ref_core.regression.manifest import SCHEMA_VERSION + + payload = { + "schema": SCHEMA_VERSION, + "test_case_version": version, + "committed": committed, + "native": {relpath: asdict(entry) for relpath, entry in (native or {}).items()}, + } + repo.git.show.side_effect = None + repo.git.show.return_value = _json.dumps(payload, indent=2, sort_keys=True) + + def test_seeding_replays(self, invoke_cli, mocker, tmp_path): + # Current manifest present with a native baseline, no base manifest -> seeding -> REPLAY. + from climate_ref_core.regression.manifest import NativeEntry + + self._setup(mocker, tmp_path, current_version=1, native={"data.nc": NativeEntry("a" * 64, 10)}) + + result = invoke_cli(["test-cases", "ci-gate"]) + assert result.exit_code == 0 + assert "replay" in result.output + + def test_seeding_without_native_skips(self, invoke_cli, mocker, tmp_path): + # Current manifest present with an empty native set, no base manifest -> seeding -> SKIP. + self._setup(mocker, tmp_path, current_version=1) + + result = invoke_cli(["test-cases", "ci-gate"]) + assert result.exit_code == 0 + assert "skip" in result.output + + def test_version_bump_executes(self, invoke_cli, mocker, tmp_path): + repo, _, digests = self._setup(mocker, tmp_path, current_version=2) + self._set_base(repo, 1, digests) + + result = invoke_cli(["test-cases", "ci-gate"]) + assert result.exit_code == 0 + assert "execute" in result.output + + def test_committed_change_without_bump_fails(self, invoke_cli, mocker, tmp_path): + repo, _, _ = self._setup(mocker, tmp_path, current_version=1) + # Base committed differs from the on-disk bundle, with no version bump. + self._set_base(repo, 1, {"output.json": "f" * 64}) + + result = invoke_cli(["test-cases", "ci-gate"], expected_exit_code=1) + assert result.exit_code == 1 + assert "fail" in result.output + + def test_no_change_skips(self, invoke_cli, mocker, tmp_path): + repo, _, digests = self._setup(mocker, tmp_path, current_version=1) + self._set_base(repo, 1, digests) + + result = invoke_cli(["test-cases", "ci-gate"]) + assert result.exit_code == 0 + assert "skip" in result.output + + def test_core_change_replays(self, invoke_cli, mocker, tmp_path): + repo, _, digests = self._setup(mocker, tmp_path, current_version=1) + self._set_base(repo, 1, digests) + repo.git.diff.return_value = "packages/climate-ref-core/src/climate_ref_core/regression/compare.py" + + result = invoke_cli(["test-cases", "ci-gate"]) + assert result.exit_code == 0 + assert "replay" in result.output + + def test_native_change_replays(self, invoke_cli, mocker, tmp_path): + from climate_ref_core.regression.manifest import NativeEntry + + repo, _, digests = self._setup( + mocker, + tmp_path, + current_version=1, + native={"data.nc": NativeEntry("a" * 64, 10)}, + ) + # Committed + version unchanged, but native blob re-minted -> REPLAY. + self._set_base(repo, 1, digests, native={"data.nc": NativeEntry("b" * 64, 12)}) + + result = invoke_cli(["test-cases", "ci-gate"]) + assert result.exit_code == 0 + assert "replay" in result.output + + def test_committed_integrity_drift_fails(self, invoke_cli, mocker, tmp_path): + repo, paths, digests = self._setup(mocker, tmp_path, current_version=1) + self._set_base(repo, 1, digests) + # Edit the on-disk committed bundle without regenerating the manifest. + (paths.regression / "output.json").write_text('{"x": 999}\n', encoding="utf-8") + + result = invoke_cli(["test-cases", "ci-gate"], expected_exit_code=1) + assert result.exit_code == 1 + assert "fail" in result.output + + def test_json_output(self, invoke_cli, mocker, tmp_path): + repo, _, digests = self._setup(mocker, tmp_path, current_version=2) + self._set_base(repo, 1, digests) + + result = invoke_cli(["test-cases", "ci-gate", "--json"]) + assert result.exit_code == 0 + # The test runner mixes loguru's stderr lines into the captured output; in real + # use stdout (the JSON) is captured separately. Slice from the JSON array start. + payload = json.loads(result.output[result.output.index("[") :]) + assert payload[0]["action"] == "execute" + assert payload[0]["case"] == "example/test-diag/default" + + def test_manifest_deletion_fails(self, invoke_cli, mocker, tmp_path): + # Manifest absent on this branch but present on the base -> deletion -> FAIL. + repo, _, _ = self._setup(mocker, tmp_path, current_version=None) + self._set_base(repo, 1, {"output.json": "a" * 64}) + + result = invoke_cli(["test-cases", "ci-gate"], expected_exit_code=1) + assert result.exit_code == 1 + assert "fail" in result.output + + def test_never_managed_skips(self, invoke_cli, mocker, tmp_path): + # No current manifest and none on the base -> never managed -> SKIP. + self._setup(mocker, tmp_path, current_version=None) + + result = invoke_cli(["test-cases", "ci-gate"]) + assert result.exit_code == 0 + assert "skip" in result.output + + def test_not_in_repo_fails(self, invoke_cli, mocker, tmp_path): + self._setup(mocker, tmp_path, current_version=1) + mocker.patch("climate_ref.cli.test_cases.get_repo_for_path", return_value=None) + + result = invoke_cli(["test-cases", "ci-gate"], expected_exit_code=1) + assert result.exit_code == 1 + + def test_paths_none_skips(self, invoke_cli, mocker, tmp_path): + # A non-dev checkout: from_diagnostic cannot locate the test case directory. + self._setup(mocker, tmp_path, current_version=1) + mocker.patch("climate_ref_core.testing.TestCasePaths.from_diagnostic", return_value=None) + + result = invoke_cli(["test-cases", "ci-gate"]) + assert result.exit_code == 0 + assert "skip" in result.output + + def test_corrupt_current_manifest_fails(self, invoke_cli, mocker, tmp_path): + # A valid manifest is written, then overwritten with garbage. + _, paths, _ = self._setup(mocker, tmp_path, current_version=1) + paths.manifest.write_text("not json{", encoding="utf-8") + + result = invoke_cli(["test-cases", "ci-gate"], expected_exit_code=1) + assert result.exit_code == 1 + assert "fail" in result.output + + def test_corrupt_base_manifest_treated_as_seeding(self, invoke_cli, mocker, tmp_path): + # git show returns non-JSON for the base manifest: must not crash; seed (replay). + from climate_ref_core.regression.manifest import NativeEntry + + repo, _, _ = self._setup( + mocker, tmp_path, current_version=1, native={"data.nc": NativeEntry("a" * 64, 10)} + ) + repo.git.show.side_effect = None + repo.git.show.return_value = "}}not json" + + result = invoke_cli(["test-cases", "ci-gate"]) + assert result.exit_code == 0 + assert "replay" in result.output + + def test_catalog_drift_fails(self, invoke_cli, mocker, tmp_path): + # The manifest records a catalog_hash that the on-disk catalog no longer matches. + from climate_ref_core.regression.manifest import Manifest + + _, paths, _ = self._setup(mocker, tmp_path, current_version=1) + manifest = Manifest.load(paths.manifest) + Manifest( + schema=manifest.schema, + test_case_version=manifest.test_case_version, + committed=manifest.committed, + native=manifest.native, + catalog_hash="expected_hash", + ).dump(paths.manifest) + paths.catalog.write_text("_metadata:\n hash: different_hash\ncmip6:\n datasets: []\n") + + result = invoke_cli(["test-cases", "ci-gate"], expected_exit_code=1) + assert result.exit_code == 1 + assert "fail" in result.output