Commit 57efa72
Fix simulate_data.py reproducibility + remove synthetic CSVs (#794)
* fix: make all data generators reproducible via seed parameter
- Add `seed` parameter to all generator functions that lacked it
- Replace scipy.stats RNG calls (norm().rvs, uniform.rvs, dirichlet().rvs)
with numpy Generator methods (rng.normal, rng.uniform, rng.dirichlet)
- Replace np.random.* global state calls with local rng instances
- Fix create_series() bug: length_scale was hardcoded to 2, ignoring the length_scale parameter; the argument is now passed through
- Rename create_series, generate_seasonality, periodic_kernel to private
(_create_series, _generate_seasonality, _periodic_kernel) since they
are internal helpers that require an rng instance from their caller
- Fix deprecated pandas freq="M" → freq="ME"
- Remove unused scipy.stats imports (norm, uniform, dirichlet)
- Remove module-level global rng; keep RANDOM_SEED constant for use
by load_data() and test fixtures
- Standardize generate_staggered_did_data to use same rng pattern
Closes #545
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: generate synthetic datasets on-the-fly in load_data()
Split DATASETS dict into SYNTHETIC_DATASETS (generated via seeded
functions) and REAL_WORLD_DATASETS (loaded from CSV). This removes
the dependency on synthetic CSV files while keeping real-world data
as shipped CSVs.
- Replace _get_data_home() with simple _DATA_DIR = Path(__file__).parent
- Remove circular `import causalpy as cp` dependency
- All 8 synthetic datasets now call their generator with RANDOM_SEED
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
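A sketch of the split registry, assuming each synthetic generator accepts a `seed` argument and returns a pandas DataFrame. `_toy_did_data`, the `"banks.csv"` filename, and raising `ValueError` for unknown keys are hypothetical; the real `load_data()` and its dataset keys may differ.

```python
from __future__ import annotations

from collections.abc import Callable
from pathlib import Path

import numpy as np
import pandas as pd

RANDOM_SEED = 42  # illustrative value only

# In the package this would be Path(__file__).parent; a fixed relative
# path keeps this standalone sketch runnable anywhere.
_DATA_DIR = Path("data")


def _toy_did_data(seed: int | None = None) -> pd.DataFrame:
    """Hypothetical stand-in for a seeded generator in simulate_data.py."""
    rng = np.random.default_rng(seed)
    n = 20
    group = np.repeat([0, 1], n // 2)  # control vs treated
    t = np.tile([0, 1], n // 2)  # pre vs post period
    y = 1.0 + 0.5 * group + 0.3 * t + 0.8 * group * t + rng.normal(0, 0.1, n)
    return pd.DataFrame({"group": group, "t": t, "y": y})


# Synthetic data is generated on demand; real-world data is read from CSV.
SYNTHETIC_DATASETS: dict[str, Callable[[int], pd.DataFrame]] = {
    "did": _toy_did_data,
}
REAL_WORLD_DATASETS: dict[str, str] = {
    "banks": "banks.csv",  # hypothetical filename
}


def load_data(dataset: str) -> pd.DataFrame:
    if dataset in SYNTHETIC_DATASETS:
        return SYNTHETIC_DATASETS[dataset](RANDOM_SEED)
    if dataset in REAL_WORLD_DATASETS:
        return pd.read_csv(_DATA_DIR / REAL_WORLD_DATASETS[dataset])
    raise ValueError(f"Dataset {dataset!r} not found")
```

Because the synthetic entries always receive `RANDOM_SEED`, repeated calls return the same frame without any CSV on disk.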
* test: add session-scoped fixtures for all synthetic datasets
Eight fixtures (did_data, its_data, its_simple_data, rd_data, sc_data,
anova1_data, geolift1_data, geolift_multi_cell_data) generate data once
per test session using RANDOM_SEED, avoiding redundant calls to
load_data() or generators in individual tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: replace cp.load_data() with fixtures for synthetic datasets
Update 8 test files to use session-scoped fixtures (did_data, its_data,
rd_data, sc_data, anova1_data, geolift1_data) instead of calling
cp.load_data() for synthetic datasets. Real-world dataset loading
(banks, brexit, drinking, risk, nhefs) remains unchanged.
Also rewrite test_data_loading.py to parametrize over all datasets
(synthetic + real-world) and add reproducibility + unknown-key tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
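A sketch of a parametrized layout for test_data_loading.py, assuming pytest. The registry, `_toy`, and `load_data` defined here are hypothetical stand-ins so the file runs on its own; the real tests would import `causalpy.load_data` and parametrize over its actual dataset keys and error type.

```python
import numpy as np
import pandas as pd
import pytest


def _toy(seed):
    """Hypothetical seeded generator."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({"y": rng.normal(size=10)})


DATASETS = {"did": _toy, "its": _toy}  # hypothetical keys


def load_data(name):
    if name not in DATASETS:
        raise ValueError(f"Dataset {name!r} not found")
    return DATASETS[name](42)


@pytest.mark.parametrize("name", sorted(DATASETS))
def test_load_returns_nonempty_dataframe(name):
    df = load_data(name)
    assert isinstance(df, pd.DataFrame)
    assert not df.empty


@pytest.mark.parametrize("name", sorted(DATASETS))
def test_synthetic_data_is_reproducible(name):
    # Two independent calls must yield identical frames.
    pd.testing.assert_frame_equal(load_data(name), load_data(name))


def test_unknown_key_raises():
    with pytest.raises(ValueError):
        load_data("no_such_dataset")
```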
* chore: delete synthetic CSVs and unused gt_social_media_data.csv
Remove 8 synthetic CSV files (~172 KB) now generated programmatically:
did.csv, regression_discontinuity.csv, synthetic_control.csv,
its.csv, its_simple.csv, ancova_generated.csv, geolift1.csv,
geolift_multi_cell.csv
Also remove gt_social_media_data.csv, which was never referenced
in the DATASETS dict or any code.
Real-world CSVs (12 files, ~3.2MB) remain in the repo.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(datasets): use Callable type hint instead of callable for mypy
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
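For context on the mypy fix: the builtin `callable` is a function, not a type, so mypy rejects an annotation such as `dict[str, callable]`. A minimal sketch of the typed form, assuming `collections.abc.Callable` (`typing.Callable` is treated the same by mypy); `DataGenerator`, `_toy_generator`, and `registry` are hypothetical names.

```python
from collections.abc import Callable

import numpy as np

# Rejected by mypy:  registry: dict[str, callable]
# Accepted: Callable parametrized with argument and return types.
DataGenerator = Callable[[int], np.ndarray]


def _toy_generator(seed: int) -> np.ndarray:
    """Hypothetical seeded generator returning five draws."""
    return np.random.default_rng(seed).normal(size=5)


registry: dict[str, DataGenerator] = {"toy": _toy_generator}
```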
* refactor(simulate_data): remove unused generate_time_series_data function
Superseded by generate_time_series_data_seasonal and
generate_time_series_data_simple. Not imported or called anywhere.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Benjamin T. Vincent <inferencelab@gmail.com>
1 parent eaa37f8 commit 57efa72
20 files changed
Lines changed: 442 additions & 1703 deletions
File tree
- causalpy
- data
- tests