---
name: causal-detective
description: Structured falsification process for challenging causal claims. Guides users through questioning counterfactuals, hunting confounders, checking bias direction, and running computational falsification tests with CausalPy. Use when validating whether a causal effect is real or when a user says "is this effect real?" or "can I trust this result?"
---

# Causal Detective

A structured investigation process for challenging causal claims — inspired by the detective metaphor from *The Causal Mindset Handbook* (Gallea, 2026). Like a police detective eliminating alibis, we systematically rule out alternative explanations until only the causal claim remains (or doesn't).

This skill combines **critical thinking** (the 5-step Causal Mindset Framework) with **computational falsification** (CausalPy's 11 sensitivity and diagnostic checks).

## The Investigation Process

### Phase 1: Frame the Claim

Before touching any code, establish what's being claimed.

**Questions to answer:**
- What is the causal claim? (X causes Y)
- What is the treatment/intervention?
- What is the outcome being measured?
- What is the proxy counterfactual being used? (What's being compared?)
- How far is this proxy from the ideal "parallel world"?

**Output:** A clear statement: *"We claim that [treatment] caused [outcome], using [counterfactual] as our comparison."*

See [Counterfactual Analysis](reference/counterfactual_analysis.md) for guidance on evaluating proxy counterfactuals.

### Phase 2: Hunt for Alternative Explanations

This is the core detective work. For each question, generate concrete hypotheses.

**Question 1: "Is there something else?"** (Confounders)
- What third variables affect BOTH the treatment and outcome?
- Draw the causal graph — does any variable have arrows pointing to both X and Y?
- Is there selection bias? Are treated and untreated groups systematically different?
- Could there be measurement error that correlates with treatment?
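The graph question above can be checked mechanically once the causal graph is written down. A minimal sketch with `networkx` (the graph and variable names are invented for illustration):

```python
import networkx as nx

# Hypothetical graph for the claim "ad_spend causes sales".
# Edges point from cause to effect; names are illustrative.
g = nx.DiGraph()
g.add_edges_from([
    ("seasonality", "ad_spend"),  # budgets rise before the holidays...
    ("seasonality", "sales"),     # ...and so do sales
    ("brand_size", "ad_spend"),   # affects the treatment only
    ("ad_spend", "sales"),        # the claimed causal link
])

treatment, outcome = "ad_spend", "sales"

# A confounder is a common cause: an ancestor of the treatment that still
# reaches the outcome when the treatment node is removed from the graph.
g_minus_t = g.copy()
g_minus_t.remove_node(treatment)
confounders = nx.ancestors(g, treatment) & nx.ancestors(g_minus_t, outcome)

print(confounders)  # {'seasonality'}
```

Removing the treatment node before tracing ancestors of the outcome keeps variables that act *only through* the treatment (like `brand_size` here) off the suspect list.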

**Question 2: "Could it be the reverse?"** (Reverse causation)
- Could Y be causing X instead of X causing Y?
- Could there be simultaneity — X and Y reinforcing each other?
- Does the timeline make sense? (cause must precede effect)

**Question 3: "What is the direction of bias?"**
- If confounders exist, do they inflate or shrink the estimated effect?
- Is the bias likely to make the effect look bigger (upward bias) or smaller (attenuation bias)?
- Could measurement error be systematically linked to the treatment?
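When the sign of the bias is unclear, a short simulation often settles it. A toy NumPy sketch, assuming a single confounder that raises both treatment and outcome (all coefficients invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.normal(size=n)                       # confounder (e.g. seasonality)
x = z + rng.normal(size=n)                   # treatment: pushed up by z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # true effect of x is 1.0

# Naive slope of y on x, ignoring z: cov(x, y) / var(x)
naive = np.cov(x, y)[0, 1] / np.var(x)

# Omitted-variable bias: true effect + beta_z * cov(x, z) / var(x)
# Here: 1.0 + 2.0 * 1 / 2 = 2.0, so the naive estimate is biased UPWARD.
print(round(naive, 2))
```

The naive slope lands near 2.0 rather than the true 1.0: the omitted-variable bias term `beta_z * cov(x, z) / var(x)` shows up as upward bias, exactly the direction the graph predicts.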

See [Threat Catalog](reference/threat_catalog.md) for the full list of threats with examples.

### Phase 3: Design Falsification Tests

Translate each alternative explanation into a testable prediction, then test it with CausalPy. The logic: if the alternative explanation is true, we should observe a specific pattern in the data. If we don't observe it, we can rule it out.

| Alternative Explanation | Falsification Test | CausalPy Check |
|---|---|---|
| Effect existed before treatment | Test for pre-treatment effects | `PreTreatmentPlaceboCheck` |
| Model picks up noise, not signal | Shift treatment time to placebo periods | `PlaceboInTime` |
| Effect is an artifact of one donor/unit | Remove units one at a time | `LeaveOneOut`, `PlaceboInSpace` |
| Wrong outcome is being affected | Test on an outcome that should NOT change | `OutcomeFalsification` |
| Result sensitive to modeling choices | Vary bandwidth, priors, specifications | `BandwidthSensitivity`, `PriorSensitivity` |
| Effect doesn't persist | Check if effect fades or reverses | `PersistenceCheck` |
| Manipulation at RD threshold | Test for density discontinuity | `McCraryDensityTest` |
| Treated unit outside donor range | Check convex hull | `ConvexHullCheck` |

See [Falsification Tests](reference/falsification_tests.md) for implementation patterns.
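The check names in the table are taken as given (consult the CausalPy documentation for their exact API). The underlying logic of a placebo-in-time test, though, can be sketched in plain NumPy with invented data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented series: flat baseline with a real step of +5 at t = 70
y = 10 + rng.normal(scale=1.0, size=100)
y[70:] += 5

def step_effect(series, cutoff):
    """Difference in means after vs. before the cutoff."""
    return series[cutoff:].mean() - series[:cutoff].mean()

real = step_effect(y, 70)          # estimate at the true treatment time
placebo = step_effect(y[:70], 35)  # same estimator at a fake, pre-period cutoff

# If the placebo "effect" is comparable to the real one, the estimator is
# reacting to noise or trend rather than to the intervention.
print(round(real, 2), round(placebo, 2))
```

A placebo estimate near zero alongside a large real estimate is what "passing" this test looks like; a placebo estimate of similar magnitude means the alternative explanation survives.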

### Phase 4: Evaluate the Evidence

After running tests, assess the overall strength of the causal claim.

**For each alternative explanation:**
- Was it testable? If not, discuss it qualitatively.
- Did the falsification test pass? (The alternative is ruled out.)
- Did the falsification test fail? (The alternative is NOT ruled out — the threat remains.)
- How confident are we? (Check `p_effect_outside_null`, effect sizes, and credible intervals.)

**Overall verdict categories:**
- **Strong evidence**: All major alternatives ruled out, effect robust to specification changes
- **Moderate evidence**: Some alternatives ruled out, effect stable but some threats remain
- **Weak evidence**: Key alternatives not tested or not ruled out
- **Falsified**: One or more tests indicate the effect is likely an artifact
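One hypothetical way to mechanize these categories (the mapping rule is an illustrative choice, not part of the skill):

```python
def verdict(results):
    """Map per-threat outcomes to an overall verdict.

    results: {threat_name: outcome}, where each outcome is one of
    'ruled_out', 'threat_remains', 'artifact', or 'untestable'.
    """
    outcomes = set(results.values())
    if "artifact" in outcomes:
        return "Falsified"           # a test indicates the effect is spurious
    if "ruled_out" not in outcomes:
        return "Weak evidence"       # nothing was actually ruled out
    if outcomes & {"threat_remains", "untestable"}:
        return "Moderate evidence"   # some threats are still standing
    return "Strong evidence"         # every alternative eliminated

print(verdict({"confounding": "ruled_out", "reverse_causation": "ruled_out"}))
# Strong evidence
```

Keeping the untestable threats visible in the input, rather than dropping them, prevents a claim from looking stronger than the evidence actually is.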

### Phase 5: Assess Generalizability

Even if the effect is real in this context, can we extrapolate?

- **Across populations**: Would the effect hold for different groups? (age, geography, demographics)
- **Across time**: Was this a one-time context or a stable relationship?
- **Across scale**: If we increase/decrease the treatment, is the effect linear?
- **Across contexts**: Different market, different policy environment, different culture?

## Quick Reference: The 5 Questions

These questions can be asked of ANY causal claim, anywhere:

1. **What is the counterfactual?** — What is being compared, and how far is it from the ideal?
2. **Is there something else?** — What confounders, mediators, or colliders exist?
3. **Could it be the reverse?** — Is the direction of causation correct?
4. **What is the direction of bias?** — Is the effect over- or under-estimated?
5. **Can we extrapolate?** — Does this result generalize beyond this specific context?

## Agents

Three specialized agents support this workflow:

- **threat-assessor** — Identifies confounders, reverse causation, and other threats to a causal claim through structured questioning
- **falsification-runner** — Designs and executes computational falsification tests using CausalPy checks
- **evidence-synthesizer** — Weighs all evidence and produces a final verdict on the causal claim

## References

| Reference | Contents |
|---|---|
| [Counterfactual Analysis](reference/counterfactual_analysis.md) | Evaluating proxy counterfactuals against the ideal "parallel world" |
| [Threat Catalog](reference/threat_catalog.md) | All threats to causal claims with detection strategies |
| [Falsification Tests](reference/falsification_tests.md) | CausalPy checks mapped to alternative explanations |