Skip to content

Polish from DISPATCHPRICE user-testing: input validation, log clarity, index reset#98

Merged
nick-gorman merged 2 commits into
masterfrom
docs-start-end-time
May 27, 2026
Merged

Polish from DISPATCHPRICE user-testing: input validation, log clarity, index reset#98
nick-gorman merged 2 commits into
masterfrom
docs-start-end-time

Conversation

@nick-gorman
Copy link
Copy Markdown
Member

Summary

Two commits' worth of polish that came out of an exploratory user-testing pass on DISPATCHPRICE workflows. No API breakage; the changes either convert previously-silent footguns into clear UserInputError or quiet WARNINGs that were either redundant or misleading.

What's in here

d05a728 — Distinguish "Downloading" from "Extracting cached zip" in INFO logs

Before: every call printed "Downloading data for table X" whether or not a network fetch actually happened. Now the verb reflects what the call did (Downloading and extracting vs Extracting cached zip), based on the bool already bubbled up from the download helpers.

63edfee — Tighten input validation and quiet redundant WARNINGs

Validation (now raises UserInputError instead of going silent):

Before After
select_columns=['SETTLEMENTDATE', 'rrp'] (typo) returned a 1-column stub DataFrame with a WARNING Raises with a list of what is available
filter_cols=['A','B'] + filter_values=(['x'],) silently filtered by A only (zip-truncate) Raises naming the length mismatch
filter_cols=['A'] with no filter_values raised bare TypeError: 'NoneType' object is not iterable Raises a clear NEMOSIS message
end_time < start_time silently returned empty DataFrame Raises
start_time == end_time downloaded a whole month for (0, N) rows Raises

Validation happens up-front in dynamic_data_compiler / cache_compiler / static_table, plus a post-load _check_loaded_select_columns that catches "user asked for a column that doesn't exist anywhere in their window."

Polish:

  • reset_index(drop=True) on returned DataFrames so df.loc[0] / df.iloc[0] work. Without this, rows carry the underlying parquet indices (e.g. 20162, 20167, …).

Quieter logs:

  • Defaults-only column misses (e.g. RAISE1SECRRP/LOWER1SECRRP on pre-1-sec-FCAS vintages) now log at DEBUG. They previously fired WARNING on every cache read — noise about NEMOSIS's own internal accounting, since the user never typed those columns. User-typed missing columns still WARNING + raise via the post-load check.
  • The redundant "Loading data from .../FILE.parquet failed." WARNING that fires after a 404 download is demoted to DEBUG with rewording. The 404 itself is already announced upstream; the old message read as local cache corruption.

Test fix:

  • test_interconnector_constraint_returns_fixtured_rows was asking for IMPORTLIMIT/EXPORTLIMIT in select_columns but not asserting anything about them, and the fixture parquet doesn't contain those columns — the test was relying on the silent-stub bug now being fixed. Trimmed those names.

Test plan

  • uv run pytest tests/ — 428 passed, 1 skipped, 1 unrelated pre-existing FutureWarning
  • Smoke-tested each new UserInputError path against a live AEMO query
  • Smoke-tested RAISE1SECRRP noise reduction on pre-2024 DISPATCHPRICE and confirmed it's silent at INFO level
  • Smoke-tested future-date query: four-warnings-per-month down to two

🤖 Generated with Claude Code

nick-gorman and others added 2 commits May 26, 2026 15:25
Before this change, _download_data unconditionally logged
"Downloading data for table X, year Y, month M" whenever the cache-hit
check at the outer layer missed — even when the actual network fetch
inside download_to_path short-circuited because the zip was already on
disk (e.g. fformat='csv' with keep_csv=False, where the CSV is deleted
after every read but the zip persists). Reading the log line, users
reasonably conclude every call hammers AEMO when in fact only the local
unzip is being repeated.

Bubble the True/False "did a network fetch happen" signal that
download_to_path already returns up through download_unzip_csv, the
four _download_and_unpack_* helpers, and all six run* functions. The
run* functions now return True on a real fetch, False on a zip reuse,
None on failure (a warning has already been emitted downstream).

_download_data reads that signal and picks the verb: "Downloading and
extracting data" on a real fetch, "Extracting cached zip" on a reuse,
nothing on failure. The first-call verb names both steps explicitly so
the second message reads as a proper subset (network skipped) rather
than a different operation.

No existing test asserts on the prior implicit None return of the run*
functions or the absent return of download_unzip_csv, so the contract
change is backwards-compatible. tests/test_downloader.py still passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to exploratory user-testing of DISPATCHPRICE workflows. Five
small fixes, all in src/nemosis/data_fetch_methods.py (plus one loose
test trimmed to match the new contract).

Input validation (new UserInputErrors instead of silent bad behaviour):

- select_columns / filter_cols / filter_values: catch typos, length
  mismatches, and missing pairs up front via new helpers
  _validate_user_select_columns, _validate_filter_args, and a
  _check_loaded_select_columns post-load membership check.
  Previously select_columns=['SETTLEMENTDATE', 'rrp'] returned a stub
  DataFrame, mismatched filter_cols/filter_values lengths silently
  dropped filters, and filter_cols with no filter_values raised a bare
  TypeError from inside zip().
- start_time / end_time: reject inverted (end < start) and zero-length
  (end == start) windows. Previously both silently returned an empty
  DataFrame after downloading a whole month for nothing. New helper
  _validate_time_window wired into dynamic_data_compiler and
  cache_compiler.

Polish:

- reset_index(drop=True) on dynamic_data_compiler and static_table
  returns so callers can use .loc[0] / .iloc[0] without hitting the
  underlying parquet row indices (e.g. 20162, 20167, ...).

Quieter logs:

- Plumb a user_select_columns flag through _dynamic_data_fetch_loop /
  _perform_column_selection / _validate_select_columns. Defaults-only
  column rejections (e.g. RAISE1SECRRP / LOWER1SECRRP on pre-1-sec-FCAS
  vintages) now log at DEBUG instead of WARNING. User-typed missing
  columns still WARNING + raise via _check_loaded_select_columns.
- Demote the redundant "Loading data from .../FILE.parquet failed."
  WARNING that fires after a download 404 to DEBUG, with rewording
  ("No cached data file present at X (upstream download likely
  404'd)."). The 404 itself is already announced by the downloader;
  the old message read as local cache corruption.

Test:

- tests/.../test_interconnector_constraint.py: drop IMPORTLIMIT and
  EXPORTLIMIT from select_columns. The test asserted neither, and these
  columns aren't in the cached fixture parquet — the test was relying
  on the silent-stub behaviour now being fixed.

All 428 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nick-gorman nick-gorman merged commit 7b812ba into master May 27, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant