Commit d14bfc5

Enforce single top-level heading in docs notebooks and add pr-review skill
Closes #863 by extending scripts/validate_notebooks.py with a single-H1 check scoped to docs/source/notebooks/. The check ignores code cells and fenced code blocks inside markdown cells, so Python comments cannot trigger false positives. Tests cover valid notebooks, multi-H1 / no-H1 violations, code-cell comments, fenced-code-block comments, the out-of-scope exemption, and a regression test that walks every shipped docs notebook.

Also adds a new .github/skills/pr-review/ skill — orchestrator SKILL.md plus six focused reference files — capturing the review patterns distilled from the PR #826 review iteration: workflow, severity-sorted look-fors, CausalPy code conventions, docs/notebook conventions, comment-drafting templates, and an extension/pruning protocol.

Made-with: Cursor
1 parent deb8774 commit d14bfc5

9 files changed

Lines changed: 1079 additions & 3 deletions

File tree

.github/skills/pr-review/SKILL.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
---
name: pr-review
description: Review CausalPy pull requests — fetch context, run a structured pass over code, tests, docs, and process, then produce maintainer-quality comments for human review before posting. Use when the user asks to review a PR, check a PR, evaluate review feedback, or audit PR comments. Complements pr-to-green (which fixes PRs) and pr-workflows (which creates them).
---

# PR Review

This skill is for **reviewing** a CausalPy pull request — diagnosing what's there, judging it against project conventions, and producing actionable feedback. It is the read/diagnose/write-comment counterpart to `pr-to-green` (fix in-flight PR) and `pr-workflows` (create PR from issue).

## When to use

- The user asks to review, check, or evaluate a PR (e.g. "review #826", "look at PR 826", "what's outstanding on PR 826").
- The user asks whether a PR is ready to approve, merge, or land.
- The user asks to evaluate a contributor's response to prior review feedback ("did they address my comments?").
- The user asks to audit existing comments on a PR for relevance, accuracy, or remaining gaps.

Do **not** use this skill for:

- Fixing a failing PR — use [`pr-to-green`](../pr-to-green/SKILL.md).
- Turning an issue into a PR — use [`pr-workflows`](../pr-workflows/SKILL.md).
- Creating new issues that come out of a review — delegate to [`github-issues`](../github-issues/SKILL.md).

## Division of labour with automation

A pre-commit hook never misses; an agent review is judgement-heavy. Don't duplicate the hook's job. Mechanical checks already enforced by `prek` / CI (lint, format, mypy, schema validation, single-H1-per-notebook, etc.) are not part of this skill — assume they pass and focus the human-equivalent attention on what they can't catch.

If during review you notice a *class* of issue that automation could catch but doesn't, file it as a follow-up issue via [`github-issues`](../github-issues/SKILL.md) rather than only flagging the one instance.

## Workflow

The full workflow is in [reference/workflow.md](reference/workflow.md). At a glance:

1. **Establish context** — fetch PR metadata, all commits, all reviews, all comments (review + issue threads), file changes.
2. **Read the relevant subset** — files changed, plus surrounding context for each change (base class for new subclass, sibling tests for new tests, etc.).
3. **Pass over the diff with the patterns checklist** — see [reference/what-to-look-for.md](reference/what-to-look-for.md). Group findings by severity.
4. **Verify claims** — if the contributor says "addressed all feedback", confirm against the actual code; if they say a rebase is clean, confirm against `git log`.
5. **Draft comments for human review** — never post directly. See [reference/posting-comments.md](reference/posting-comments.md).
6. **Post on approval** and report URLs back.
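The context-gathering in step 1 maps onto a handful of `gh` invocations. A minimal sketch of assembling them — the PR number and JSON field list here are illustrative, not prescriptive:

```python
# Sketch: build the `gh` commands that gather review context for a PR.
# `gh pr view --json` accepts many more fields than shown here; the
# `{owner}/{repo}` placeholders are expanded by `gh api` itself.
def context_commands(pr: int) -> list[list[str]]:
    return [
        # metadata, commits, reviews, changed files
        ["gh", "pr", "view", str(pr), "--json",
         "title,body,commits,reviews,files"],
        # inline review-thread comments
        ["gh", "api", f"repos/{{owner}}/{{repo}}/pulls/{pr}/comments"],
        # issue-thread (conversation) comments
        ["gh", "pr", "view", str(pr), "--comments"],
    ]
```

Each command list can be handed to `subprocess.run(...)` as-is, keeping the fetch step scriptable and auditable.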

For batched PR work (multi-file diffs, parallel exploration), follow `pr-to-green`'s subagent delegation patterns — same `Task` triggers (`generalPurpose` for exploration, `ci-failure-investigator` for noisy CI logs) apply.

## What to look for

Severity-sorted patterns (must-fix → should-fix → nits) live in [reference/what-to-look-for.md](reference/what-to-look-for.md). The two domain-specific deep-dives are:

- [reference/code-patterns.md](reference/code-patterns.md) — `BaseExperiment` contract, `PyMCModel` vs. `RegressorMixin` dispatch, `_clone()` patterns, `Literal` for constrained strings, custom exceptions, `__repr__` style, memory-heavy retainers.
- [reference/docs-patterns.md](reference/docs-patterns.md) — notebook narrative quality, glossary linking, `:::{note}` admonitions, hide-input/hide-output cell tags, citations, sampler-output noise, helper-promotion judgement.

## Drafting comments

The full template is in [reference/posting-comments.md](reference/posting-comments.md). Two non-negotiable rules:

1. **Always draft for human review before posting.** Read the draft back and offer 2–3 framing variants (stricter / softer / approval-pending).
2. **Cite specific code locations.** Use `line:line:filepath` references when calling out code, not vague paraphrases.

## Mandatory pre-flight reads

Before reviewing any PR, the reviewing agent should have read or be able to load:

- `AGENTS.md` (root) — code structure, style, type hints, testing preferences.
- `CONTRIBUTING.md` — process expectations.
- `docs/source/notebooks/index.md` — toctree layout, where new notebooks should go.
- The PR's own description and any prior review iterations.

## How to extend this skill

When a review surfaces a new recurring pattern, add it. The process is in [reference/how-to-extend.md](reference/how-to-extend.md). Skill drift is real — every six months or so, prune patterns that have been formalised into hooks.

## Important behaviour

- Never post review comments without explicit human approval.
- Never approve, request changes, or merge a PR through `gh pr review --approve` unless explicitly instructed.
- When evaluating a contributor's response to prior feedback, verify against the code, not their summary. A "fixed all feedback" claim is a hypothesis to test, not a fact to accept.
- If a reviewer-bot identity has been used in the past on the PR (e.g. `claude-opus-4-7-xhigh`), preserve that attribution in posted comments so the human reviewer's voice and the agent's voice stay distinguishable.
- Do not duplicate work already covered by hooks/CI. If something is mechanically enforceable and not enforced, file an issue rather than re-flagging the instance.
Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# CausalPy Code Patterns

Repo-specific conventions and contracts to check against when reviewing source-code diffs. These are the patterns most likely to be inadvertently broken by a contributor (especially a new one) because they're enforced by convention rather than by the type system.

## `BaseExperiment` contract

All experiment classes in `causalpy/experiments/` inherit from `BaseExperiment`. Required:

- Class attributes `supports_ols: bool` and `supports_bayes: bool` declared.
- If `supports_bayes`: implement `_bayesian_plot()` and `get_plot_data_bayesian()`.
- If `supports_ols`: implement `_ols_plot()` and `get_plot_data_ols()`.
- Model-agnostic dispatch via `isinstance(self.model, PyMCModel)` vs. `isinstance(self.model, RegressorMixin)`.
- Data index named `"obs_ind"`.
- Formula parsing via patsy `dmatrices()`.
- Custom exceptions from `causalpy.custom_exceptions`: `FormulaException`, `DataException`, `BadIndexException`.

Review prompts:

- New `BaseExperiment` subclass? Check both class attributes are declared.
- Implements only `_bayesian_plot` but `supports_ols=True`? That's a bug.
- Uses `assert` for input validation? Should be a custom exception.
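The contract above can be sketched in miniature. This is illustrative shape only — in CausalPy the real classes come from `causalpy.experiments` and `sklearn.base`, and the base class has a richer interface:

```python
class PyMCModel: ...        # stand-in for causalpy's Bayesian model base
class RegressorMixin: ...   # stand-in for sklearn's mixin


class BaseExperiment:
    supports_ols: bool
    supports_bayes: bool

    def __init__(self, model):
        self.model = model

    def plot(self):
        # Model-agnostic dispatch: branch on model type, not feature-detection.
        if isinstance(self.model, PyMCModel):
            return self._bayesian_plot()
        if isinstance(self.model, RegressorMixin):
            return self._ols_plot()
        raise TypeError(f"Unsupported model type: {type(self.model)!r}")


class MyExperiment(BaseExperiment):
    supports_ols = True      # both flags declared explicitly
    supports_bayes = True

    def _bayesian_plot(self):
        return "bayesian"

    def _ols_plot(self):     # present because supports_ols=True
        return "ols"
```

A subclass that sets `supports_ols = True` but omits `_ols_plot` would pass silently until a sklearn model is passed in — exactly the class of bug the review prompts target.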

## `PyMCModel` contract

PyMC models inherit from `PyMCModel` (extends `pm.Model`). Common interface: `fit()`, `predict()`, `score()`, `calculate_impact()`, `print_coefficients()`.

- Data: `xarray.DataArray` with coords (`coeffs`, `obs_ind`, `treated_units`).
- Stores user-supplied priors on `self._user_priors` for round-trip via `_clone()`.

## `_clone()` pattern

The base `PyMCModel._clone()` forwards `priors=self._user_priors`. Every override **must** do the same, or user customisations are silently dropped on clone.

Spot-check template:

```python
# In the new subclass:
def _clone(self):
    return type(self)(
        sample_kwargs=self.sample_kwargs,
        priors=self._user_priors,  # <-- must be present
        # ... any other config kwargs the subclass adds ...
    )
```

If `priors=` is absent, that's a must-fix (MF-1 in [what-to-look-for.md](what-to-look-for.md)).

## `RegressorMixin` (sklearn) compatibility

Scikit-learn models use `RegressorMixin` and are made causalpy-compatible via `create_causalpy_compatible_class()`. Same external interface as `PyMCModel`. Data is numpy arrays, not xarray.

Review prompt: experiment classes should branch with `isinstance(self.model, PyMCModel)` vs. `isinstance(self.model, RegressorMixin)` rather than feature-detect on the model.

## `CheckResult` contract

Lives in `causalpy/checks/base.py`. Has a documented `figures: list[Any]` field — but as of writing, no check populates it. If a new check defines a non-trivial plotting helper in a notebook (see [what-to-look-for.md § N-7](what-to-look-for.md#n-7-helper-used-n-times-in-a-notebook-is-library-material)), promoting it to populate `CheckResult.figures` is the natural integration point.

`passed=None` is the right design for informational-only checks (e.g. `OutcomeFalsification`) — they should inform, not gate.
59+
60+
## Type hints
61+
62+
`AGENTS.md` is explicit. Required style (Python 3.10+):
63+
64+
- `X | None`, not `Optional[X]`.
65+
- Lowercase `dict`, `list`, `tuple`, not `Dict`, `List`, `Tuple` from `typing`.
66+
- `Literal` for constrained string parameters (see [what-to-look-for.md § MF-2](what-to-look-for.md#mf-2-constrained-string-parameter-not-typed-as-literal)).
67+
68+
## Custom exceptions
69+
70+
Use, in order of preference:
71+
72+
- `FormulaException` — patsy/formula problems.
73+
- `DataException` — data-shape, missing-column, dtype problems.
74+
- `BadIndexException` — index issues.
75+
76+
Plain `ValueError` is acceptable when nothing more specific fits; `assert` is not.
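The assert-to-exception swap looks like this sketch. The local `DataException` is a stand-in so the example is self-contained; library code imports the real one from `causalpy.custom_exceptions`:

```python
class DataException(Exception):
    """Stand-in for causalpy.custom_exceptions.DataException."""


def validate_columns(df_columns: list[str], required: list[str]) -> None:
    # Bad: assert set(required) <= set(df_columns)
    # -- asserts are stripped under `python -O` and give opaque errors.
    missing = [c for c in required if c not in df_columns]
    if missing:
        raise DataException(f"Missing required columns: {missing}")
```

The exception carries an actionable message and survives optimised runs, which `assert` does not.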

## `__repr__` style

Convention: show non-default values only.

```python
def __repr__(self) -> str:
    parts = [f"alpha={self.alpha}"] if self.alpha != 0.05 else []
    if not self.store_experiments:
        parts.append("store_experiments=False")
    return f"OutcomeFalsification({', '.join(parts)})"
```

A new class with a `__repr__` that shows defaults is inconsistent with sibling classes — flag as N-1.

## Helper promotion

When a docs notebook defines a non-trivial helper (≥50 lines) called ≥3 times, suggest promoting to the library. Standard cleanup checklist before promotion:

1. Drop UI side-effects: no `print()` in library code (use `warnings.warn`); no `plt.show()` (return the `Figure`).
2. Drop hard-coded constants that limit reuse: e.g. `FOLD_COLORS = [...5 colors...]` should become a module-level constant or use matplotlib's color cycle.
3. Add `figsize` and `axes` kwargs so the function is composable in user code and unit-testable.
4. Reduce arg surface: if two args are always paired, fold one into the other (or store the dependency on the producing class so the consumer takes one arg, not two).
5. Add a tiny test that asserts shape: `assert isinstance(out, Figure)`, `assert len(out.axes) == expected`.

## Memory-heavy retainers

`InferenceData` is large. Any class that retains it by default imposes a hidden cost. Default to summary-only with an opt-in (`store_X=True`) for users who want to inspect.
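The opt-in pattern might be sketched as follows — class and attribute names here are hypothetical, and `idata` stands in for a large `arviz.InferenceData`:

```python
class SensitivityCheck:
    """Summary-only by default; retains the heavy object only on request."""

    def __init__(self, store_idata: bool = False):
        self.store_idata = store_idata
        self.summary: dict[str, float] | None = None
        self.idata = None  # populated only when store_idata=True

    def run(self, idata) -> None:
        # A cheap scalar summary is always kept...
        self.summary = {"effect_mean": 1.23}
        # ...but the memory-heavy object is retained only on opt-in.
        if self.store_idata:
            self.idata = idata
```

A review flag fires when a new class does the opposite: stores `idata` unconditionally with no flag to turn it off.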
105+
106+
## Backwards compatibility
107+
108+
`AGENTS.md`: maintain compatibility for previously-released APIs only; APIs introduced in the same PR can be reshaped freely. So new public APIs (added in the PR being reviewed) are legitimate review targets for "I'd rather it look like X" feedback; APIs that already shipped are not.
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
# CausalPy Docs Patterns

Repo-specific conventions for notebooks (`docs/source/notebooks/`) and knowledgebase pages (`docs/source/knowledgebase/`). The mechanical rules (single H1, schema validation) are enforced by hooks; this file is for the judgement-heavy patterns a hook can't catch.

## Notebook structure

### Single H1 — title only

Every notebook must have exactly one `# ` markdown heading (the page title). All other section headings are `##` or below. (Mechanical check planned, see issue #863.) During review:

- If you see multiple `#` headings, that's a hook job — but flag it on the PR if the hook hasn't landed yet.
- If you see no `#` heading at all, the notebook will render with the filename as title, which is usually wrong.
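The core of such a check amounts to counting H1 lines while skipping fenced code blocks, so a `# comment` inside a fence can't trigger a false positive. A minimal sketch (the shipped `scripts/validate_notebooks.py` may differ in detail):

```python
def count_h1(markdown: str) -> int:
    """Count H1 headings in a markdown cell, ignoring fenced code blocks."""
    in_fence = False
    count = 0
    for line in markdown.splitlines():
        stripped = line.strip()
        # Toggle fence state on ``` or ~~~ delimiters.
        if stripped.startswith("```") or stripped.startswith("~~~"):
            in_fence = not in_fence
            continue
        if not in_fence and line.startswith("# "):
            count += 1
    return count
```

A notebook-level validator would sum this over all markdown cells and flag anything other than exactly 1.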
13+
14+
### `toctree` placement
15+
16+
A new notebook in `docs/source/notebooks/` must be referenced from somewhere — typically `docs/source/notebooks/index.md`. Cross-link from related prose pages (e.g. `sensitivity_checks.md`) where applicable.
17+
18+
Section assignment is a curation question:
19+
20+
- ITS examples → `Interrupted Time Series` toctree.
21+
- DiD examples → `Difference in Differences`.
22+
- New method introduction → its own toctree section.
23+
- Workflow / cross-cutting → `Workflow` section.
24+
25+
If the placement isn't obvious, ask the contributor to confirm rather than guessing.
26+
27+
## Notebook content
28+
29+
### Pedagogical structure
30+
31+
Good notebooks have a recognisable arc: problem → method → data → analysis → interpretation → caveats. Reviews should call out when the arc is missing or scrambled.
32+
33+
When evaluating pedagogy, useful prompts:
34+
35+
- Can a reader who knows causal inference but not CausalPy follow it?
36+
- Is the *causal question* stated before the method is introduced?
37+
- Are caveats and assumptions called out, not buried?
38+
- Does the conclusion answer the question posed at the top?
39+
40+
### Cell tags
41+
42+
Two tags matter for readability:
43+
44+
- **`hide-input`** on plotting cells — collapses the matplotlib boilerplate so the rendered docs show the plot output prominently.
45+
- **`hide-output`** on cells with verbose output (sampler progress, large tables, debug prints) — collapses the noise while keeping the cell runnable.
46+
47+
Review prompt: any cell that produces ≥10 lines of sampler/progress output without `hide-output` is a candidate for the tag.
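That review prompt can be applied mechanically to a notebook's JSON. A sketch, with cell and output shapes following the nbformat schema (the ≥10-line threshold is the heuristic from above, not a hard rule):

```python
def hide_output_candidates(nb: dict, threshold: int = 10) -> list[int]:
    """Indices of code cells whose stream output spans >= threshold lines
    and that lack the hide-output tag."""
    candidates = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        # Count lines across all stream outputs (stdout/stderr).
        lines = sum(
            len("".join(out.get("text", [])).splitlines())
            for out in cell.get("outputs", [])
            if out.get("output_type") == "stream"
        )
        if lines >= threshold and "hide-output" not in tags:
            candidates.append(i)
    return candidates
```

Run over `json.load(open("notebook.ipynb"))`, this turns a subjective readability pass into a short list of cells to inspect.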
48+
49+
### Sampler warnings
50+
51+
Visible sampler warnings (divergences, low ESS, R̂ > 1.01) in a docs notebook are confusing for readers. Two fixes:
52+
53+
1. Tune the sampler config (more chains, more tuning steps) to make the warnings go away genuinely.
54+
2. Hide them under `hide-output` if the warnings aren't a core part of the narrative.
55+
56+
(1) is preferred when feasible. (2) is acceptable when the analysis is fundamentally about a borderline case.
57+
58+
### External data
59+
60+
Notebooks should prefer bundled CausalPy example datasets (loaded via `cp.load_data(...)`) over runtime fetches from external URLs. If an external fetch is unavoidable, document it with a comment so readers know what they're depending on.
61+
62+
## MyST and cross-references
63+
64+
### Glossary linking
65+
66+
`AGENTS.md`: link to glossary terms (defined in `glossary.rst`) on first mention in a file.
67+
68+
- Markdown / `.ipynb`: `` {term}`glossary term` ``
69+
- RST: `` :term:`glossary term` ``
70+
71+
If a notebook introduces concepts like "treatment effect", "potential outcomes", "propensity score", etc., the first mention should be a glossary link. Subsequent mentions don't need to link.
72+
73+
### Cross-references
74+
75+
In Markdown, use MyST role syntax:
76+
77+
- `` {doc}`path/to/doc` `` — link to another docs page.
78+
- `` {ref}`label-name` `` — link to a labelled section.
79+
80+
Don't use raw markdown links (`[text](path/to/doc.md)`) for cross-doc references — MyST roles get full Sphinx resolution and break-link detection.
81+
82+
### Admonitions
83+
84+
Use MyST `:::{note}`, `:::{warning}`, `:::{important}` for callouts. Plain bold/italic emphasis is not the same — admonitions render as styled boxes.
85+
86+
### Citations
87+
88+
Citations live in `docs/source/references.bib`. Cite sources in example notebooks where possible. Include a reference section at the bottom of notebooks using:
89+
90+
```markdown
91+
:::{bibliography}
92+
:filter: docname in docnames
93+
:::
94+
```
95+
96+
If a new notebook has prose claims like "as shown in Smith et al. (2020)" without a corresponding `references.bib` entry and `:::{bibliography}` block, flag it.
97+
98+
## Naming and organisation
99+
100+
### Notebook filenames
101+
102+
Pattern: `{method}_{model}.ipynb`.
103+
104+
- `did_pymc.ipynb`, `rd_skl.ipynb`, `iv_pymc.ipynb`, `sc_pymc_brexit.ipynb`.
105+
- Composite topics: keep the prefix scheme so `:glob:` patterns would still work even though no section uses one today.
106+
107+
### File placement
108+
109+
- How-to examples → `docs/source/notebooks/`.
110+
- Educational / conceptual content → `docs/source/knowledgebase/`.
111+
- Scratch / temporary → `.scratch/` (untracked).
112+
113+
A notebook that's mostly explaining a concept (rather than demonstrating CausalPy usage) probably belongs in `knowledgebase/`, not `notebooks/`.
114+
115+
## API docs
116+
117+
Auto-generated from docstrings via Sphinx autodoc. No manual API docs to write — but the docstrings themselves matter:
118+
119+
- NumPy-style sections (`Parameters`, `Returns`, `Examples`).
120+
- `Examples` blocks should be runnable doctests where feasible (`make doctest` in CI).
121+
122+
If a new public function has no `Examples` block, that's a should-fix, not a nit, because the API docs page will be terse.
