Skip to content

Flip keep_zip default to True#94

Merged
nick-gorman merged 1 commit into
masterfrom
keep-zip-default-true
May 25, 2026
Merged

Flip keep_zip default to True#94
nick-gorman merged 1 commit into
masterfrom
keep-zip-default-true

Conversation

@nick-gorman
Copy link
Copy Markdown
Member

Summary

  • Flips the default of keep_zip from False to True in dynamic_data_compiler, cache_compiler, and all internal helpers that thread the parameter through.
  • Re-asserts the Option to keep .zip not csv #56 rationale (retain the compressed archive so subsequent runs / cache rebuilds / flaky-connection re-runs don't have to re-fetch from AEMO) as the default behaviour. Lean-cache callers now opt out via explicit keep_zip=False.
  • Docstrings updated to describe the new default and explain the trade-off. download_unzip_csv's docstring swaps which branch carries the (default) annotation.

Test plan

  • test_keep_zip_default_is_false renamed to test_keep_zip_default_is_true with inverted assertion (default now retains zips).
  • test_keep_zip_true_retains_zip kept with explicit keep_zip=True so the contract is locked independent of the default.
  • New test_keep_zip_false_removes_zip covers the explicit opt-out, replacing the dangling reference from test_fformat_csv.py (which pre-dated this PR — that reference now points at the real location).
  • CI run on this branch passes the cache-compiler end-to-end suite.

Context

PR #87 had set keep_zip=False for symmetry with keep_csv=False and a leaner default cache. In practice the slow-internet / re-download use case from #56 is the more common need — Matt's original streaming work in 2a5812f left zips on disk after extraction for exactly this reason. keep_csv=False stays as-is because raw CSVs are cheap to re-extract from the retained zip, so there's no symmetric pressure to keep them.

The (path, downloaded) cleanup contract from PR #87 is preserved — when a caller passes keep_zip=False explicitly, cleanup only fires for zips this call actually wrote, so concurrent-same-cache use is still safe.

🤖 Generated with Claude Code

…che_compiler`

PR #87 had set `keep_zip=False` by default for a leaner cache, paired
with `keep_csv=False`. In practice the slow-internet / re-download use
case from #56 is the more common need — having the compressed archive
on disk means subsequent runs (cache rebuilds, format changes, re-runs
on a flaky connection) don't have to re-fetch from AEMO. Matt's
original streaming work in `2a5812f` left zips on disk after extraction
for this reason; this change brings the default back in line with that
intent. `keep_csv=False` stays as-is (raw CSVs are cheap to re-extract
from the retained zip, so there's no symmetric pressure to keep them).

Changes:

1. `dynamic_data_compiler`, `cache_compiler`, `_dynamic_data_fetch_loop`,
   `_download_data` (data_fetch_methods.py) — `keep_zip` default flips
   to `True`. Docstrings updated: "True by default — keeps the
   compressed archive so subsequent runs don't have to re-download
   from AEMO (see #56). Set False for a leaner cache when re-download
   cost is not a concern."

2. All internal helpers in `downloader.py` get the same default flip
   so any internal caller that hits the default sees the same value
   as the public API: `run`, `run_bid_tables`,
   `run_next_day_region_tables`, `run_next_dispatch_tables`,
   `run_intermittent_gen_scada`, `run_fcas4s`, the four
   `_download_and_unpack_*` helpers, and `download_unzip_csv`.
   `download_unzip_csv`'s docstring swaps which branch is annotated
   `(default)`.

3. Tests in `tests/end_to_end_table_tests/test_cache_compiler.py`:
   - `test_keep_zip_default_is_false` renamed to
     `test_keep_zip_default_is_true` with inverted assertion.
   - `test_keep_zip_true_retains_zip` keeps its explicit `keep_zip=True`
     to lock the contract independent of the default; docstring notes
     that it's now also the default.
   - New `test_keep_zip_false_removes_zip` covers the explicit opt-out
     path. The dangling reference to this test in `test_fformat_csv.py`
     pre-dated this PR; that reference is now fixed to point at the
     real location.

The `(path, downloaded)` cleanup contract from PR #87 is preserved —
when a caller does pass `keep_zip=False`, cleanup only fires for zips
this call actually wrote, so concurrent-same-cache use is still safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nick-gorman nick-gorman merged commit 263e1f7 into master May 25, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant