Commit 955a93d

Add PiecewiseITS experiment for known interruption dates (#614)
* initial commit - MVP

* Expand Piecewise ITS notebook with new examples and explanations
  Added detailed explanations comparing Piecewise ITS to Regression Discontinuity and Regression Kink designs. Introduced new real-world scenarios for level and slope changes, multiple interventions, and level-only models. Enhanced example code and output to illustrate these cases, improving clarity and practical guidance for users.

* Revise and clarify Piecewise ITS notebook explanations
  Improved clarity and conciseness throughout the Piecewise Interrupted Time Series (ITS) notebook. Rewrote several sections for better readability, combined and streamlined example scenarios, and clarified distinctions between level and slope changes, as well as the relationship to regression discontinuity and regression kink designs.

* Refactor PiecewiseITS to use patsy step/ramp stateful transforms
  Refactors the PiecewiseITS experiment to use flexible patsy formulas with new stateful step() and ramp() transforms for specifying level and slope changes at interventions. Adds the causalpy.transforms module with robust, datetime-aware step/ramp transforms, updates tests to cover new formula interface and transform behavior, and improves documentation and error handling. This enables more flexible modeling of multiple interventions and supports both numeric and datetime time columns.

* Expand PiecewiseITS notebook with formula API details
  Added a new section describing the formula-based API for PiecewiseITS, including explanations of the custom step() and ramp() transforms, usage examples, and clarification on how the counterfactual is computed. This improves documentation clarity and helps users understand flexible model specification.

* Add effect_summary compatibility to PiecewiseITS
  Implemented creation of post_impact, datapost, and post_pred attributes in PiecewiseITS for compatibility with effect_summary() from BaseExperiment. Added tests to verify effect_summary works for both OLS and PyMC models and that the new attributes are correctly created.

* Clarify usage of step and ramp transforms in docs

* Enhance piecewise ITS notebook with math and intro code
  Added mathematical definitions for step and ramp functions using LaTeX for clarity, and moved import/setup code to the top of the notebook for better organization. Improved explanations of function arguments and removed duplicate import cell.

* Expand and clarify Piecewise ITS notebook introduction
  The introductory markdown in the piecewise_its_pymc.ipynb notebook has been significantly expanded and reorganized. The new content provides clearer explanations of when to use Piecewise ITS, the distinction between level and slope changes, the mathematical model, and its relationship to regression discontinuity and regression kink designs. Redundant sections were removed and a more structured, didactic flow was introduced.

* Clarify level and slope change concepts in notebook
  Expanded explanations of level and slope changes in piecewise ITS, referencing a new illustrative figure. Added a code cell to display the figure, and clarified the description of multiple interventions for improved instructional clarity.

* Add model formula table to piecewise ITS notebook
  Inserted a markdown cell with a table summarizing model formulas for single and two intervention cases, covering level, slope, and combined effects. This provides clearer guidance on specifying models for each panel in the notebook.

* Revise and restructure piecewise ITS notebook intro
  Condenses and reorganizes introductory explanations for piecewise interrupted time series (ITS), splitting out key concepts, model details, and comparisons to related methods into clearer, more focused sections. Adds collapsible dropdowns and card formatting for scenario examples, and improves clarity and flow for users learning the model and its API.

* Add extensive coverage tests for PiecewiseITS
  Adds a comprehensive suite of tests for the PiecewiseITS class, including class and instance attribute checks, formula parsing, plotting, PyMC integration, counterfactuals, data generation, and error handling. Also updates the interrogate badge to reflect increased coverage.

* Add references and citations for piecewise ITS notebook
  Added detailed references and in-text citations to the piecewise_its_pymc.ipynb notebook to support methodological explanations. Updated the references.bib file with key literature on segmented regression and interrupted time series analysis. Improved clarity on model parameterization and corrected the references section to use the Sphinx bibliography directive.

* run pre-commit checks

* Fail fast on PiecewiseITS threshold parsing.
  Enforce a single step/ramp time variable and canonicalize interruption thresholds at initialization so invalid mixed-variable or malformed thresholds fail early with clear errors.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Implement effect_summary for PiecewiseITS.
  Add PiecewiseITS effect_summary support for both OLS and PyMC paths using shared reporting helpers so PiecewiseITS no longer falls back to BaseExperiment's NotImplementedError.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Increase PiecewiseITS patch coverage for effect summary.
  Add a focused test for unsupported period handling in PiecewiseITS effect_summary and simplify OLS array conversion logic to keep the new effect summary path fully exercised.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Increase patch coverage for transforms and PiecewiseITS validation
  Add tests for: datetime Series (non-Index) paths, chunked memorize_chunk origin tracking, pd.Timestamp threshold parsing, step/ramp variable not in data, and non-numeric/non-datetime time column validation.
  Made-with: Cursor

* Add _default_model_class and minor plot fix for BaseExperiment compatibility
  After rebase onto main, PiecewiseITS needs _default_model_class = LinearRegression to comply with the enforced BaseExperiment abstract methods (#694). Also simplify ax[1 + 1] to ax[2].
  Made-with: Cursor

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
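
The commit message describes step() and ramp() transforms that accept both numeric and datetime time columns. The following is a minimal pure-Python sketch of that idea only. The helper names `step_sketch` and `ramp_sketch`, and the choice to measure datetime ramps in days, are assumptions made for illustration; they are not taken from `causalpy.transforms`, whose actual signatures and units may differ.

```python
from datetime import datetime, timedelta


def step_sketch(times, threshold):
    """Return 1.0 where t >= threshold, else 0.0 (level-change indicator)."""
    return [1.0 if t >= threshold else 0.0 for t in times]


def ramp_sketch(times, threshold):
    """Return max(0, t - threshold); datetime gaps become days (assumed unit)."""
    out = []
    for t in times:
        delta = t - threshold
        if isinstance(delta, timedelta):
            # Dividing two timedeltas yields a plain float.
            delta = delta / timedelta(days=1)
        out.append(max(0.0, float(delta)))
    return out


# Numeric time column with an interruption at t = 50.
numeric_t = [48, 49, 50, 51, 52]
numeric_step = step_sketch(numeric_t, 50)
numeric_ramp = ramp_sketch(numeric_t, 50)

# Datetime time column with an interruption on the third day.
start = datetime(2024, 1, 1)
dates = [start + timedelta(days=d) for d in range(5)]
date_step = step_sketch(dates, datetime(2024, 1, 3))
date_ramp = ramp_sketch(dates, datetime(2024, 1, 3))
```

Under these assumptions both time columns produce the same step and ramp values, which is the property that lets one formula serve numeric and datetime data.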
1 parent 10a3b0b commit 955a93d

10 files changed

Lines changed: 4477 additions & 3 deletions

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ repos:
       - id: check-toml
       - id: check-json
       - id: check-added-large-files
-        exclude: &exclude_pattern '(iv_weak_instruments|its_lift_test)\.ipynb'
+        exclude: &exclude_pattern '(iv_weak_instruments|its_lift_test|piecewise_its_pymc)\.ipynb'
         args: ["--maxkb=1500"]
       - id: check-merge-conflict
       - id: check-case-conflict
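
The only change in this hunk is one more alternative in the `exclude` regular expression, so the new notebook is skipped by the large-file check. A quick way to sanity-check such a pattern is sketched below. It assumes the pattern is applied as a Python regex search against each file path, and the paths are hypothetical examples rather than files confirmed by this commit.

```python
import re

# Pattern copied from the updated `exclude` entry in the hunk above.
EXCLUDE_PATTERN = r"(iv_weak_instruments|its_lift_test|piecewise_its_pymc)\.ipynb"

# Hypothetical repository paths, used only to exercise the pattern.
candidate_paths = [
    "docs/source/notebooks/piecewise_its_pymc.ipynb",
    "docs/source/notebooks/its_lift_test.ipynb",
    "docs/source/notebooks/rd_pymc.ipynb",
]

# A path is excluded when the pattern matches anywhere in it.
excluded = [p for p in candidate_paths if re.search(EXCLUDE_PATTERN, p)]
kept = [p for p in candidate_paths if p not in excluded]
```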

causalpy/__init__.py

Lines changed: 6 additions & 1 deletion
@@ -16,13 +16,15 @@
 import causalpy.skl_models as skl_models
 import causalpy.variable_selection_priors as variable_selection_priors
 from causalpy.skl_models import create_causalpy_compatible_class
+from causalpy.transforms import ramp, step
 from causalpy.version import __version__
 
 from .data import load_data
 from .experiments.diff_in_diff import DifferenceInDifferences
 from .experiments.instrumental_variable import InstrumentalVariable
 from .experiments.interrupted_time_series import InterruptedTimeSeries
 from .experiments.inverse_propensity_weighting import InversePropensityWeighting
+from .experiments.piecewise_its import PiecewiseITS
 from .experiments.prepostnegd import PrePostNEGD
 from .experiments.regression_discontinuity import RegressionDiscontinuity
 from .experiments.regression_kink import RegressionKink
@@ -32,19 +34,22 @@
 
 __all__ = [
     "__version__",
-    "DifferenceInDifferences",
     "create_causalpy_compatible_class",
+    "DifferenceInDifferences",
     "extract_lift_for_mmm",
     "InstrumentalVariable",
     "InterruptedTimeSeries",
     "InversePropensityWeighting",
     "load_data",
+    "PiecewiseITS",
     "PrePostNEGD",
     "pymc_models",
+    "ramp",
     "RegressionDiscontinuity",
     "RegressionKink",
     "skl_models",
     "StaggeredDifferenceInDifferences",
+    "step",
     "SyntheticControl",
     "variable_selection_priors",
 ]

causalpy/data/simulate_data.py

Lines changed: 177 additions & 0 deletions
@@ -645,3 +645,180 @@ def generate_staggered_did_data(
 
     df = pd.DataFrame(rows)
     return df
+
+
+def generate_piecewise_its_data(
+    N: int = 100,
+    interruption_times: list[int] | None = None,
+    baseline_intercept: float = 10.0,
+    baseline_slope: float = 0.1,
+    level_changes: list[float] | None = None,
+    slope_changes: list[float] | None = None,
+    noise_sigma: float = 1.0,
+    seed: int | None = None,
+) -> tuple[pd.DataFrame, dict]:
+    """
+    Generate piecewise Interrupted Time Series data with known ground truth parameters.
+
+    This function creates synthetic data for testing and demonstrating piecewise ITS
+    / segmented regression models. The data follows the model:
+
+    y_t = β₀ + β₁t + Σₖ(level_k · I_k(t) + slope_k · R_k(t)) + ε_t
+
+    Where:
+    - I_k(t) = 1 if t >= T_k else 0 (step function for level change)
+    - R_k(t) = max(0, t - T_k) (ramp function for slope change)
+
+    Parameters
+    ----------
+    N : int, default=100
+        Number of time points in the series.
+    interruption_times : list[int], optional
+        List of time indices where interruptions occur. Defaults to [50].
+    baseline_intercept : float, default=10.0
+        The intercept (β₀) of the baseline trend.
+    baseline_slope : float, default=0.1
+        The slope (β₁) of the baseline trend.
+    level_changes : list[float], optional
+        List of level changes at each interruption. Length must match
+        interruption_times. If None, defaults to [5.0] for single interruption.
+    slope_changes : list[float], optional
+        List of slope changes at each interruption. Length must match
+        interruption_times. If None, defaults to [0.0] (no slope change).
+    noise_sigma : float, default=1.0
+        Standard deviation of the Gaussian noise.
+    seed : int, optional
+        Random seed for reproducibility.
+
+    Returns
+    -------
+    df : pd.DataFrame
+        DataFrame with columns:
+        - 't': time index (0 to N-1)
+        - 'y': observed outcome with noise
+        - 'y_true': outcome without noise (ground truth)
+        - 'counterfactual': baseline trend without intervention effects
+        - 'effect': true causal effect at each time point
+    params : dict
+        Dictionary containing the true parameters:
+        - 'baseline_intercept': β₀
+        - 'baseline_slope': β₁
+        - 'level_changes': list of level changes
+        - 'slope_changes': list of slope changes
+        - 'interruption_times': list of interruption times
+        - 'noise_sigma': noise standard deviation
+
+    Examples
+    --------
+    >>> from causalpy.data.simulate_data import generate_piecewise_its_data
+    >>> # Single interruption with level and slope change
+    >>> df, params = generate_piecewise_its_data(
+    ...     N=100,
+    ...     interruption_times=[50],
+    ...     level_changes=[5.0],
+    ...     slope_changes=[0.2],
+    ...     seed=42,
+    ... )
+    >>> df.shape
+    (100, 5)
+
+    >>> # Multiple interruptions
+    >>> df, params = generate_piecewise_its_data(
+    ...     N=150,
+    ...     interruption_times=[50, 100],
+    ...     level_changes=[3.0, -2.0],
+    ...     slope_changes=[0.1, -0.15],
+    ...     seed=42,
+    ... )
+    >>> len(params["interruption_times"])
+    2
+
+    >>> # Level change only (no slope change)
+    >>> df, params = generate_piecewise_its_data(
+    ...     N=100,
+    ...     interruption_times=[50],
+    ...     level_changes=[5.0],
+    ...     slope_changes=[0.0],
+    ...     seed=42,
+    ... )
+    """
+    # Set defaults
+    if interruption_times is None:
+        interruption_times = [50]
+
+    n_interruptions = len(interruption_times)
+
+    if level_changes is None:
+        level_changes = [5.0] * n_interruptions
+
+    if slope_changes is None:
+        slope_changes = [0.0] * n_interruptions
+
+    # Validate inputs
+    if len(level_changes) != n_interruptions:
+        raise ValueError(
+            f"level_changes length ({len(level_changes)}) must match "
+            f"interruption_times length ({n_interruptions})"
+        )
+
+    if len(slope_changes) != n_interruptions:
+        raise ValueError(
+            f"slope_changes length ({len(slope_changes)}) must match "
+            f"interruption_times length ({n_interruptions})"
+        )
+
+    for t_k in interruption_times:
+        if t_k < 0 or t_k >= N:
+            raise ValueError(
+                f"Interruption time {t_k} is outside valid range [0, {N - 1}]"
+            )
+
+    # Set random seed
+    if seed is not None:
+        np.random.seed(seed)
+
+    # Generate time index
+    t = np.arange(N)
+
+    # Compute baseline (counterfactual)
+    counterfactual = baseline_intercept + baseline_slope * t
+
+    # Compute intervention effects
+    effect = np.zeros(N)
+    for k, t_k in enumerate(interruption_times):
+        # Step function: I_k(t) = 1 if t >= t_k
+        step = (t >= t_k).astype(float)
+        # Ramp function: R_k(t) = max(0, t - t_k)
+        ramp = np.maximum(0, t - t_k).astype(float)
+
+        effect += level_changes[k] * step + slope_changes[k] * ramp
+
+    # Compute true outcome (without noise)
+    y_true = counterfactual + effect
+
+    # Add noise
+    noise = np.random.normal(0, noise_sigma, N)
+    y = y_true + noise
+
+    # Create DataFrame
+    df = pd.DataFrame(
+        {
+            "t": t,
+            "y": y,
+            "y_true": y_true,
+            "counterfactual": counterfactual,
+            "effect": effect,
+        }
+    )
+
+    # Store parameters
+    params = {
+        "baseline_intercept": baseline_intercept,
+        "baseline_slope": baseline_slope,
+        "level_changes": level_changes,
+        "slope_changes": slope_changes,
+        "interruption_times": interruption_times,
+        "noise_sigma": noise_sigma,
+    }
+
+    return df, params
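
To see how the step and ramp terms in the docstring's model combine across several interruptions, the noise-free part can be evaluated at a single time point. The sketch below is a dependency-free restatement of that formula, not the function added in this commit; the helper name `piecewise_mean` is hypothetical, and the parameter values are the ones from the docstring's "Multiple interruptions" example with the default baseline.

```python
def piecewise_mean(t, interruption_times, level_changes, slope_changes,
                   baseline_intercept=10.0, baseline_slope=0.1):
    """Return (noise-free outcome, counterfactual, effect) at a single time t."""
    # Baseline trend that would continue without any intervention.
    counterfactual = baseline_intercept + baseline_slope * t
    effect = 0.0
    for t_k, level, slope in zip(interruption_times, level_changes, slope_changes):
        step = 1.0 if t >= t_k else 0.0  # I_k(t): level change switched on
        ramp = max(0.0, t - t_k)         # R_k(t): time elapsed since T_k
        effect += level * step + slope * ramp
    return counterfactual + effect, counterfactual, effect


# Values from the docstring's "Multiple interruptions" example.
times, levels, slopes = [50, 100], [3.0, -2.0], [0.1, -0.15]

before = piecewise_mean(40, times, levels, slopes)    # no interruption active yet
between = piecewise_mean(70, times, levels, slopes)   # only the first is active
after = piecewise_mean(120, times, levels, slopes)    # both are active
```

At t = 120 the effect works out to 3.0 + 0.1·70 − 2.0 − 0.15·20 = 5.0 (up to floating-point rounding), which shows that effects from successive interruptions add up rather than replace each other.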

causalpy/experiments/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -17,6 +17,7 @@
 from .instrumental_variable import InstrumentalVariable
 from .interrupted_time_series import InterruptedTimeSeries
 from .inverse_propensity_weighting import InversePropensityWeighting
+from .piecewise_its import PiecewiseITS
 from .prepostnegd import PrePostNEGD
 from .regression_discontinuity import RegressionDiscontinuity
 from .regression_kink import RegressionKink
@@ -26,11 +27,12 @@
 __all__ = [
     "DifferenceInDifferences",
     "InstrumentalVariable",
+    "InterruptedTimeSeries",
     "InversePropensityWeighting",
+    "PiecewiseITS",
     "PrePostNEGD",
     "RegressionDiscontinuity",
     "RegressionKink",
     "StaggeredDifferenceInDifferences",
     "SyntheticControl",
-    "InterruptedTimeSeries",
 ]
