Feature Request: Hierarchical Interrupted Time Series with Staggered Treatment
Scenario
A common setting in product analytics and program evaluation is one where every unit is eventually treated, at different times, so there is no untreated control group. For example:
A company launches 15+ product SKUs over a multi-year window. Every SKU is eventually launched — there is no "never-treated" control group.
The goal is to estimate (a) the population-average launch effect, (b) per-unit launch effects with uncertainty, and (c) the dynamic shape of the effect over time (does impact materialise instantly or ramp up over weeks/months?).
Key features that make this distinct
All units treated, no controls → ITS, not DiD. Standard DiD and staggered DiD (including the Borusyak et al. imputation estimator already in CausalPy) require untreated or not-yet-treated units to construct counterfactuals. When every unit is eventually treated and treatment timing is spread across years, identification relies on the ITS framework: each unit's own pre-treatment trajectory, extrapolated past its treatment date, serves as its counterfactual, with staggered timing providing cross-unit variation.
Staggered treatment times across units. Treatment onset varies by unit (e.g., product launches spread between 2020–2025). The model re-indexes to event time (τ = t − T*_i) so that all units are aligned relative to their treatment date, enabling pooling.
Hierarchical partial pooling across units. Rather than fitting N independent ITS models, a single hierarchical model pools information: units with short post-treatment windows borrow strength from units with longer histories. This is critical when some products launched recently and have limited post-launch data.
Dynamic (non-step-function) treatment effects. Real interventions rarely produce instantaneous level shifts. The effect may ramp up over weeks or months (e.g., logistic uptake as distribution and awareness build). The model should support both:
Nonparametric event-study bins: estimate a separate coefficient δ_k for each post-launch time bin, pooled hierarchically across units — letting the data reveal the shape.
Parametric curves (optional extension): e.g., logistic/sigmoidal ramp with hierarchical parameters (asymptote, midpoint, steepness) per unit.
Pre-trend / placebo testing. Adding pre-launch event-time bins (with the most-negative bin as the omitted reference) enables a direct test of the no-anticipation assumption. Population-level pre-launch coefficients should be indistinguishable from zero.
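The event-time re-indexing and binning described in the list above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the prototype's code: the weekly integer time index, the 4-week bin width, the window limits, and the names `launch_week`, `event_time`, `bin_index`, and `is_reference` are all assumptions made for the example.

```python
# Hypothetical launch weeks T*_i for three units (weekly integer time index).
launch_week = {"sku_a": 10, "sku_b": 40, "sku_c": 95}

BIN_WIDTH = 4   # assumed: 4-week event-time bins
MIN_BIN = -3    # assumed: most-negative pre-launch bin, used as the omitted reference
MAX_BIN = 5     # assumed: open-ended last post-launch bin


def event_time(unit: str, t: int) -> int:
    """Event time tau = t - T*_i: weeks since (or until) the unit's launch."""
    return t - launch_week[unit]


def bin_index(tau: int) -> int:
    """Map event time to a bin; floor division keeps pre-launch weeks in negative bins."""
    k = tau // BIN_WIDTH
    return max(MIN_BIN, min(MAX_BIN, k))


def is_reference(k: int) -> bool:
    """The reference bin gets no coefficient, so every other bin is measured relative to it."""
    return k == MIN_BIN


# The same calendar week (44) sits at very different event times for each unit.
for unit in launch_week:
    tau = event_time(unit, 44)
    print(unit, tau, bin_index(tau), is_reference(bin_index(tau)))
```

Because bins are defined on event time, a post-launch coefficient for bin k pools observations that may be years apart in calendar time, which is what allows units launched in different years to inform the same δ_k.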
How this differs from existing CausalPy functionality
| CausalPy feature | What it does | Gap |
| --- | --- | --- |
| InterruptedTimeSeries | Single unit, step-function treatment onset | No hierarchy, no staggered timing, no dynamic effects |
| PiecewiseITS | Single unit, step + linear ramp terms | No hierarchy, no staggered timing, only piecewise-linear shapes |
| StaggeredDifferenceInDifferences | Staggered treatment timing, panel DiD (Borusyak et al.) | Requires untreated/not-yet-treated controls; not applicable when all units are eventually treated with wide timing spread |
Relationship to open issues and PRs
No single existing issue or PR covers the combination of ITS identification (no controls needed), staggered treatment, hierarchical pooling, and dynamic effect estimation.
Prior art and published methods
Directly relevant
Bayesian hierarchical ITS / event studies: There is no single canonical paper; the approach combines well-established components. The hierarchical event-study structure follows the logic of Sun & Abraham (2021) ("Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects", Journal of Econometrics), adapted from DiD to an ITS identification strategy. Sun & Abraham highlight the contamination bias in pooled event-study coefficients under heterogeneous effects. The hierarchical structure is meant to mitigate this by allowing unit-specific dynamic trajectories, although it does not by itself rule out contamination across launch cohorts.
Goodman-Bacon (2021) ("Difference-in-Differences with Variation in Treatment Timing", Journal of Econometrics) documents how staggered adoption contaminates two-way fixed effects estimates. Although the paper focuses on DiD, the decomposition insight carries over: in a staggered ITS, aligning units on event time rather than calendar time is intended to avoid the calendar-time confound that affects pooled regressions.
Callaway & Sant'Anna (2021) ("Difference-in-Differences with Multiple Time Periods", Journal of Econometrics) propose group-time ATT estimation that avoids contamination. Their framework could inspire a cohort-aware extension where launch-year cohorts get separate dynamic trajectories.
Hierarchical time series / partial pooling
Gelman & Hill (2006), Data Analysis Using Regression and Multilevel/Hierarchical Models — foundational reference for partial pooling in panel settings. The non-centred parameterisation used in the prototype (z-trick for lift parameters) is standard practice.
Hierarchical Bayesian structural time series: Brodersen et al. (2015) ("Inferring Causal Impact Using Bayesian Structural Time Series Models", Annals of Applied Statistics) — the CausalImpact approach. Single-unit, but the Bayesian structural time series foundation is relevant. Extending to hierarchical multi-unit is a natural step.
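To make "borrowing strength" concrete, the sketch below applies the standard normal-normal partial-pooling formula, in which a unit's estimate is a precision-weighted average of its own mean and the population mean. It assumes the variances are known, which a full Bayesian model would instead estimate, and the function name `partial_pool` and all numbers are illustrative assumptions rather than values from the prototype.

```python
def partial_pool(unit_mean: float, n_obs: int, sigma_y: float,
                 pop_mean: float, sigma_pop: float) -> float:
    """Precision-weighted compromise between a unit's own mean and the population mean."""
    w_unit = n_obs / sigma_y**2   # precision of the unit's own sample mean
    w_pop = 1.0 / sigma_pop**2    # precision of the population distribution
    return (w_unit * unit_mean + w_pop * pop_mean) / (w_unit + w_pop)


# Two products with the same raw post-launch lift (20) but different amounts of data.
recent = partial_pool(unit_mean=20.0, n_obs=4, sigma_y=10.0, pop_mean=12.0, sigma_pop=4.0)
mature = partial_pool(unit_mean=20.0, n_obs=100, sigma_y=10.0, pop_mean=12.0, sigma_pop=4.0)
print(round(recent, 2), round(mature, 2))
```

The recently launched product, with only four post-launch observations, is pulled much further towards the population mean than the product with a long history, which is the behaviour the hierarchical model relies on.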
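Dynamic treatment effects / ramp-up modeling
Transfer function models: Box-Tiao intervention analysis with gradual onset functions (step, ramp, exponential decay), from classical time series econometrics. PR #548 (Transfer-Function ITS: Graded Interventions with Saturation & Adstock Transforms) draws on this tradition.
Adstock / saturation functions in MMM literature: Hill saturation curves model diminishing returns, and geometric adstock functions model delayed, decaying effects. The logistic ramp f(τ) = 1/(1 + exp(-(τ - m)/s)) used in the prototype is closely analogous to a Hill saturation curve (which is itself a logistic function of the log of its input), parameterised over event time rather than ad spend.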
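A short sketch of the logistic ramp, and of how it relates to a Hill curve, may help here. The midpoint and scale values are assumptions chosen for illustration, and `logistic_ramp` and `hill` are hypothetical helper names, not functions from the prototype or from CausalPy.

```python
import math


def logistic_ramp(tau: float, m: float, s: float) -> float:
    """f(tau) = 1 / (1 + exp(-(tau - m) / s)): share of the full effect realised at event time tau."""
    return 1.0 / (1.0 + math.exp(-(tau - m) / s))


def hill(x: float, k: float, n: float) -> float:
    """Hill saturation x^n / (k^n + x^n), defined for x > 0."""
    return x**n / (k**n + x**n)


m, s = 12.0, 3.0  # assumed midpoint (weeks after launch) and scale
for tau in (0, 6, 12, 18, 24):
    print(tau, round(logistic_ramp(tau, m, s), 3))

# A Hill curve is a logistic in log(x), with midpoint log(k) and scale 1/n.
x, k, n = 5.0, 8.0, 2.0
print(math.isclose(hill(x, k, n), logistic_ramp(math.log(x), math.log(k), 1.0 / n)))
```

In the parametric extension, a per-unit asymptote would multiply this ramp, so the midpoint m is the event time at which half of the eventual effect has materialised.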
Closest applied examples
Pang et al. (2022) ("A Bayesian Alternative to Synthetic Control for Comparative Case Studies", Political Analysis) — Bayesian approach to comparative case studies without requiring a formal control group, relying on each unit's own pre-treatment fit.
Rambachan & Roth (2023) ("A More Credible Approach to Parallel Trends", Review of Economic Studies) — sensitivity analysis for pre-trend violations in event studies. Relevant for robustness checks on the placebo/pre-trend test.
Prototype
Nathaniel Forde's notebook (shared 2026-04-07) demonstrates a working prototype in raw PyMC with three models of increasing complexity:
Hierarchical ITS with step-function lift — partial pooling of launch effects λ_i ~ N(μ_λ, σ_λ) across 15 products. Recovers population-level mean lift (μ_λ = 12.0) and cross-product variation (σ_λ = 4.0) accurately.
Hierarchical event-study with binned dynamic effects — replaces the step function with K event-time bins (4-week bins early, wider bins later). Each bin gets a hierarchical coefficient δ_ik = μ_δk + σ_δk · z_ik. Recovers a logistic ramp-up curve from data generated with gradual onset.
Placebo / pre-trend extension — adds pre-launch event-time bins to test the no-anticipation assumption. Pre-launch coefficients cluster around zero as expected under the simulated DGP.
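The non-centred form δ_ik = μ_δk + σ_δk · z_ik can be illustrated with a forward simulation in plain Python. This is only a sketch of the parameterisation, not the notebook's PyMC model: the bin means, the spreads, and the number of bins are assumed values, and no inference is performed.

```python
import random
import statistics

random.seed(0)

N_UNITS, N_BINS = 15, 6
# Assumed population-level bin means (a ramp towards 12) and between-unit spread per bin.
mu_delta = [1.0, 3.0, 6.0, 9.0, 11.0, 12.0]
sigma_delta = [0.5, 1.0, 2.0, 3.0, 4.0, 4.0]

# Non-centred form: draw standardised offsets z_ik ~ N(0, 1), then shift and scale.
z = [[random.gauss(0.0, 1.0) for _ in range(N_BINS)] for _ in range(N_UNITS)]
delta = [[mu_delta[k] + sigma_delta[k] * z[i][k] for k in range(N_BINS)]
         for i in range(N_UNITS)]

# Averaging over units gives a noisy view of the population curve mu_delta.
avg_curve = [statistics.fmean(delta[i][k] for i in range(N_UNITS)) for k in range(N_BINS)]
print([round(v, 2) for v in avg_curve])
```

In a sampler, the benefit of this form is that the offsets z_ik live on a unit scale regardless of σ_δk, which usually makes the posterior geometry easier to explore than sampling δ_ik directly.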
Known limitations noted in the prototype
Sampler divergences (13–47 depending on the model); the intercepts need non-centring and the priors need tightening
Independent residuals (no AR(1) or GP autocorrelation structure)
No smoothness prior across event-time bins (random walk or GP over event time would help)
Composition bias in the open-ended last bin (only long-history products contribute)
Goodman-Bacon / Sun-Abraham contamination concern under heterogeneous dynamics across launch cohorts
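The composition-bias limitation can be seen by counting which units contribute observations to each event-time bin. The sketch below uses hypothetical launch weeks, a 4-week bin width, and an assumed open-ended final bin; none of these values come from the prototype.

```python
# Hypothetical launch weeks for five products and a common end of the observation window.
launch_week = {"p1": 0, "p2": 30, "p3": 60, "p4": 90, "p5": 110}
LAST_WEEK = 119
BIN_WIDTH = 4
LAST_BIN = 12  # assumed: bin 12 is open-ended and absorbs all later weeks

contributors = {}
for unit, launch in launch_week.items():
    for t in range(launch, LAST_WEEK + 1):  # post-launch weeks only
        k = min((t - launch) // BIN_WIDTH, LAST_BIN)
        contributors.setdefault(k, set()).add(unit)

for k in sorted(contributors):
    print(k, sorted(contributors[k]))
```

Early bins are informed by every product, while the last bin is informed only by the earliest launches, so its population-level coefficient reflects a different mix of products than the bins before it.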
Suggested scope for CausalPy
A minimal viable feature could include:
A HierarchicalITS (or StaggeredITS) experiment class that accepts panel data with unit identifiers, treatment dates, and covariates
Event-time alignment (τ = t − T*_i) handled internally
Hierarchical partial pooling on the treatment effect (at minimum a step-function lift, with event-study bins as an option)
Pre-trend / placebo diagnostics via pre-launch event-time bins
Forest plot of per-unit effects and population-level dynamic effect curve
Integration with the existing sensitivity check / pipeline framework