fformat default → parquet, NEXT_DAY_DISPATCH URL fix, test fixture refresh, README docs #95
Merged
Adds a new era key "2026-04" to tests/fixtures/spec.py and extends the 12 dynamic tables that already cover 2025-01 to also cover 2026-04 in their `eras` lists: DISPATCHPRICE, DISPATCHLOAD, DISPATCH_UNIT_SCADA, DISPATCHREGIONSUM, DISPATCHINTERCONNECTORRES, DISPATCHCONSTRAINT, TRADINGPRICE, TRADINGINTERCONNECT, BIDPEROFFER_D, BIDDAYOFFER_D, MNSP_DAYOFFER, ROOFTOP_PV_ACTUAL.

Why: general recent-month coverage in the PUBLIC_ARCHIVE# format. Not a year boundary and not tied to a known AEMO transition; it just keeps the matrix from drifting too far behind the live data we'd see on nemweb today.

The boundary-test matrix in _boundaries.py reads from spec.DYNAMIC_TABLES and auto-generates `at` and `before` cases per (table, era), so 24 new test cases (12 tables x 2 flavours) flow through without any test-file edits. Hand-written era tests (e.g. test_dispatch_price.py's `era_start` parametrize) still only reference the older eras and are unchanged.

Fixtures for 2026-04 plus the 2026-03 prev-month buffer were downloaded via `uv run python tests/fixtures/build.py` and committed alongside the spec change.

Full offline suite: 428 passed, 1 skipped (the pre-existing ROOFTOP_PV_ACTUAL disjoint-archives skip, unrelated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
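The "12 tables x 2 flavours" arithmetic can be sketched as below. This is only an illustration: the dictionary shape used for `DYNAMIC_TABLES`, the `FLAVOURS` tuple, and the `boundary_cases` helper are assumptions made for the example, not the actual contents of spec.py or _boundaries.py.

```python
from itertools import product

# The 12 tables named in the commit message.
TABLES = [
    "DISPATCHPRICE", "DISPATCHLOAD", "DISPATCH_UNIT_SCADA", "DISPATCHREGIONSUM",
    "DISPATCHINTERCONNECTORRES", "DISPATCHCONSTRAINT", "TRADINGPRICE",
    "TRADINGINTERCONNECT", "BIDPEROFFER_D", "BIDDAYOFFER_D", "MNSP_DAYOFFER",
    "ROOFTOP_PV_ACTUAL",
]

# Hypothetical stand-in for spec.DYNAMIC_TABLES: table name -> spec with an `eras` list.
DYNAMIC_TABLES = {name: {"eras": ["2025-01", "2026-04"]} for name in TABLES}

# The two case flavours generated per (table, era).
FLAVOURS = ("at", "before")


def boundary_cases(tables):
    """Yield one (table, era, flavour) tuple per table, era, and flavour."""
    for table, spec in tables.items():
        for era, flavour in product(spec["eras"], FLAVOURS):
            yield table, era, flavour


cases = list(boundary_cases(DYNAMIC_TABLES))
new_cases = [case for case in cases if case[1] == "2026-04"]
print(len(new_cases))  # 12 tables x 1 new era x 2 flavours = 24
```

Because the cases are derived from the table spec, adding an era key to each `eras` list is enough to grow the matrix; no test module needs to change.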
Brings the README into line with code already on master:

- `keep_csv` default flipped from True to False in PR #87; the "Caching options" section still claimed True. Updated.
- `keep_zip` was added in PR #87 and its default flipped to True in 45285d2; it was entirely undocumented in the README. Added a paragraph plus a 4-row keep_csv x keep_zip matrix showing what ends up on disk under each combination.
- "Using the default settings" paragraph now mentions the zip-retention behaviour and points readers at Caching options.
- Cache compiler description now mentions zip retention; "covert" typo fixed.
- "Accessing additional table columns" notes that with keep_zip=True (the default), a rebuild re-extracts from the cached zip rather than re-downloading from AEMO. Also fixed a small "If you using" -> "If you are using" grammar slip.

No behavioural change; docs only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
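A rough model of the keep_csv x keep_zip matrix, for orientation only. The `cache_contents` helper is hypothetical and not part of NEMOSIS; it assumes the compiled cache file (parquet or feather) is always kept and that the two flags independently control whether the raw CSV and the downloaded zip stay on disk. The row order printed here is not necessarily the README's.

```python
def cache_contents(keep_csv=False, keep_zip=True, fformat="parquet"):
    """Return the artefact kinds left on disk after a fetch (simplified model)."""
    kept = [fformat]  # assumption: the compiled cache file is always retained
    if keep_zip:
        kept.append("zip")  # the archive downloaded from AEMO
    if keep_csv:
        kept.append("csv")  # the raw CSV extracted from the zip
    return kept


# Enumerate all four flag combinations.
for keep_csv in (False, True):
    for keep_zip in (False, True):
        print(f"keep_csv={keep_csv!s:<5} keep_zip={keep_zip!s:<5} -> {cache_contents(keep_csv, keep_zip)}")
```

Under this model the defaults (keep_csv=False, keep_zip=True) leave the compiled file and the zip, which is what lets a rebuild re-extract locally instead of re-downloading.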
`dynamic_data_compiler` and `cache_compiler` now default to `fformat="parquet"` (previously `"feather"`).

Why: parquet has better compression characteristics and broader interoperability with downstream tooling (Dask, Arrow, BigQuery loaders, etc.) than feather, at a fairly small read/write performance cost on the workloads NEMOSIS produces. Existing feather users opt in explicitly via `fformat="feather"`.

Changes:

- src/nemosis/data_fetch_methods.py: flipped the default in `dynamic_data_compiler`, `cache_compiler`, and the private `_dynamic_data_fetch_loop` (the private one is belt-and-braces since both public callers always pass `fformat` explicitly).
- README.md: rewrote the "Using the default settings" paragraph, the "Caching options" section + matrix, and the "Cache compiler" intro to lead with parquet. Feather is now documented as the opt-in alternative.
- tests/end_to_end_table_tests/test_cache_compiler.py: five default-sensitive `*.feather` globs/pre-populated files updated to `*.parquet`. Includes a test rename (`test_existing_feather_means_no_csv_is_fetched` -> `test_existing_parquet_means_no_csv_is_fetched`) and a few docstring touch-ups.
- tests/end_to_end_table_tests/test_datetime_inputs.py: two default-sensitive `*.feather` globs updated to `*.parquet`.

`tests/test_errors.py` and `tests/end_to_end_table_tests/test_fformat_csv.py` pass `fformat` explicitly and are unaffected.

Full suite locally: 421 passed; 7 failures are pre-existing fixture gaps for the 2026-04 era (unrelated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
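For callers who want to stay on feather, the opt-in might look like the sketch below. The wrapper `fetch_dispatch_price`, the time window, and the cache directory are illustrative assumptions; only `dynamic_data_compiler` and its `fformat` argument come from the change described above. The call needs NEMOSIS installed, an existing cache directory, and network access on first use.

```python
def fetch_dispatch_price(raw_data_cache, fformat="feather"):
    """Fetch a small DISPATCHPRICE window, caching it in the requested format."""
    # Imported lazily so this module can be loaded without NEMOSIS installed.
    from nemosis import dynamic_data_compiler

    return dynamic_data_compiler(
        start_time="2025/01/01 00:00:00",   # illustrative window
        end_time="2025/01/01 00:05:00",
        table_name="DISPATCHPRICE",
        raw_data_location=raw_data_cache,   # must be an existing directory
        fformat=fformat,                    # explicit, so the new parquet default is not used
    )


# Example call (not executed here because it downloads data):
# prices = fetch_dispatch_price("/path/to/nemosis_cache")
```

Passing `fformat` explicitly also makes calling code independent of any future default change.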
…nt era to 2026-05-15

Two threads in one commit (this branch is targeting a single PR today):

1. Production URL fix
-------------------

AEMO renamed the parent directory for NEXT_DAY_DISPATCH files from `/Reports/Current/NEXT_DAY_DISPATCH/` to `/Reports/Current/Next_Day_Dispatch/` (title case, bringing it in line with the other Reports/Current/ subdirs like Daily_Reports/ and Next_Day_Intermittent_Gen_Scada/). The old URL now 404s. Anyone calling `dynamic_data_compiler(table_name="NEXT_DAY_DISPATCHLOAD", ...)` against live AEMO was getting NoDataToReturn: the scraper hit the index URL, got a 404, found no matching anchors, and the date generator ran out empty.

Fix:

- src/nemosis/defaults.py: flip the path in `current_data_page_urls`.
- tests/fixtures/build.py: same flip in `SCRAPE_FILES` + `TABLE_SCRAPE_FILE` (build.py keeps its own hardcoded scrape paths, independent of defaults).
- tests/fixtures/data/Reports/Current/NEXT_DAY_DISPATCH/: `git mv` to the new title-case name so the offline mock server serves the fixture tree at the URL NEMOSIS now requests.

Backward compat with pre-existing user caches: not affected. NEMOSIS's cache filename comes from the ZIP basename (`PUBLIC_NEXT_DAY_DISPATCH_YYYYMMDD_*.zip`), which AEMO didn't rename; only the parent directory moved. Existing cached ZIPs keep matching; new downloads land under the same names.

Verified live: GET `/Reports/Current/NEXT_DAY_DISPATCH/` returns 404; GET `/Reports/Current/Next_Day_Dispatch/` returns 200. Probed the other current-data URLs (Bidmove_Complete/, Daily_Reports/, Next_Day_Intermittent_Gen_Scada/, Causer_Pays/): all still 200, so this is the only directory AEMO has renamed.

This is the same shape of bug as #74 (`%23` URL encoding): AEMO changed something upstream, and the offline test suite couldn't detect it because the local mock server's behaviour diverged from the real nemweb. In this case Windows's case-insensitive filesystem masked the case mismatch between NEMOSIS's request and the on-disk fixture path. Case-sensitive filesystems (macOS/Linux CI) would have surfaced this.

2. Recent era shift to 2026-05-15
---------------------------------

`spec.ERAS["recent"]` was pinned to 2026-03-15, which is approaching the edge of AEMO's rolling current-data retention. Bumped to 2026-05-15 (~10 days back from today) for deeper buffer.

Side effects:

- The 6 stale March 14/15 fixtures (3 scrape tables x 2 days) are `git rm`'d in favour of fresh May 14/15 fixtures fetched via `uv run python tests/fixtures/build.py`.
- index.html files in each scrape dir are regenerated by build.py to list only the current fixtures.
- 3 test files (test_daily_region_summary, test_intermittent_gen_scada, test_next_day_dispatch_load) had hard-coded `2026/03/15` dates in their assertions, now bumped to `2026/05/15`. Docstring file-name references (`20260314`, `20260315`) bumped too.

Suite: 428 passed, 1 skipped (pre-existing ROOFTOP_PV_ACTUAL disjoint-archives skip, unrelated), 1 warning (pre-existing pandas FutureWarning).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
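One way a mock server could avoid the masking described above is to resolve request paths against the names actually stored on disk rather than trusting the filesystem lookup. The sketch below is an assumption about how such a check might be written; `resolve_case_sensitive` is a hypothetical helper, not code from the NEMOSIS test suite.

```python
import os
import tempfile


def resolve_case_sensitive(root, url_path):
    """Return the on-disk path for url_path, or None unless every component matches exactly."""
    current = root
    for part in url_path.strip("/").split("/"):
        # os.listdir reports names as stored on disk, so this membership test
        # is exact even when the filesystem itself ignores case.
        if not os.path.isdir(current) or part not in os.listdir(current):
            return None
        current = os.path.join(current, part)
    return current


with tempfile.TemporaryDirectory() as root:
    # Fixture tree laid out under the new title-case directory name.
    os.makedirs(os.path.join(root, "Reports", "Current", "Next_Day_Dispatch"))
    new = resolve_case_sensitive(root, "/Reports/Current/Next_Day_Dispatch/")
    old = resolve_case_sensitive(root, "/Reports/Current/NEXT_DAY_DISPATCH/")
    print(new is not None, old is None)
```

A server that returned 404 whenever this helper yields `None` would treat the old upper-case URL as missing on every platform, matching the behaviour of the live site.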
The Caching options matrix row and the keep_csv=True example caption both implied that the GUI reads the raw AEMO CSV from the cache and therefore needs keep_csv=True to function. That's wrong:

The GUI calls dynamic_data_compiler via _dynamic_data_wrapper_for_gui (method_map.py) with parse_data_types=False and no fformat/keep_csv arguments. It reads the parquet/feather cache, not the raw CSV. The CSV the GUI writes (gui.py:321) is an output artefact going to save_location, separate from raw_data_location.

The keep_csv flag only controls whether the *raw AEMO CSV* (extracted from the zip before parquet/feather conversion) is retained. Nothing in the NEMOSIS distribution reads that file; keep_csv=True is purely for external tools that want the original CSV on disk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "for downstream tools that consume the raw CSV" annotation attached to row 3 (keep_csv=True, keep_zip=False) implied that this specific combination is the CSV-consumer use case. That's wrong: the CSV-consumer rationale applies whenever keep_csv=True, regardless of keep_zip. The example caption below the matrix already covers the use case in prose, so the matrix just needs to describe what's on disk.

Matrix is now symmetric: rows 2 and 4 carry the "leanest cache" / "full raw retention" extreme markers; rows 1 and 3 are described purely by contents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Bulk PR consolidating six commits that have accumulated on this branch since the `keep_zip` flip (PR #94) merged. Three loosely-related threads:

Behaviour changes

- `fformat` default flipped from `feather` to `parquet` (1f034a7). `dynamic_data_compiler` and `cache_compiler` now default to parquet; existing feather users opt in explicitly with `fformat="feather"`. Parquet has better compression and broader downstream tooling support (Dask, Arrow, BigQuery loaders).

Bug fix (production)

- `NEXT_DAY_DISPATCHLOAD` URL fix (6e65ea5). AEMO renamed the directory `/Reports/Current/NEXT_DAY_DISPATCH/` → `/Reports/Current/Next_Day_Dispatch/` (title case, in line with the other `Reports/Current/` subdirs). The old URL now 404s; anyone fetching `NEXT_DAY_DISPATCHLOAD` against live AEMO was getting `NoDataToReturn`. Same shape as issue #74, "Monthly MMS dynamic table fetches fail from 2024-08 onward (PUBLIC_ARCHIVE#...) while older PUBLIC_DVD_* months still work" (`%23` encoding): the local mock server's case-insensitive Windows filesystem masked the upstream change. Cache backward-compat: not affected. Only the parent directory was renamed; the file basenames (which NEMOSIS keys its cache on) are unchanged.

Test fixture maintenance

- …`_boundaries.py`. Fixtures for 2026-04 + the 2026-03 prev-month buffer downloaded and committed.
- …`recent` era to 2026-05-15 (part of 6e65ea5). Previous `2026-03-15` was approaching the edge of AEMO's rolling current-data retention window. Old March fixtures removed, fresh May 14/15 fixtures committed.

Docs

- …`keep_csv=False` flip and the `keep_zip=True` default added in 45285d2. Removed an incorrect "GUI co-use needs `keep_csv=True`" claim, dropped misplaced rationale from the `keep_csv`/`keep_zip` matrix, and fixed a few typos/grammar slips.

Test plan

- `uv run pytest tests/` → 428 passed, 1 skipped (pre-existing ROOFTOP_PV_ACTUAL skip), 1 warning (pre-existing pandas FutureWarning).
- `uv run python tests/fixtures/build.py` → no failures, 24 new MMS fixtures + 6 new scrape fixtures land where expected.
- …`NEXT_DAY_DISPATCH/` → 404, `Next_Day_Dispatch/` → 200.
- …`current_data_page_urls` entries probed → still 200 (no other renames hiding).
- …`NEXT_DAY_DISPATCH` casing bug earlier; worth confirming the fixture rename took on those platforms too).

Notes for the reviewer

- `keep-zip-default-true` is a recycled branch name. Its original purpose (the `keep_zip` flip itself) shipped in PR #94 ("Flip `keep_zip` default to `True`"). This PR is the continuation work that landed on top.

🤖 Generated with Claude Code