
Fix duplicate column names from summarise_scores() with empty metrics (#1179) #1180

Open
nikosbosse wants to merge 2 commits into main from fix/summarise-scores-empty-metrics-1179

@nikosbosse
Collaborator

@nikosbosse nikosbosse commented May 28, 2026

CLAUDE: Closes #1179.

Summary

  • summarise_scores() previously selected the columns to summarise via `colnames(scores) %like% paste(metrics, collapse = "|")`. When the `metrics` attribute is `character(0)` (i.e. every metric in score() warned and returned nothing), the pattern becomes the empty string, which `%like%` matches against every column, including the `by` column. The `by` column was then passed to the summary function, which produced the spurious "argument is not numeric or logical" warning and returned a data.table with a duplicate `by` column. The duplicate is not visible inside data.table but breaks downstream conversion to a tibble.
  • Switched to exact column-name matching via `intersect(colnames(scores), metrics)`. This also incidentally fixes a latent issue where a metric named e.g. "wis" would have matched any column whose name contained "wis" (such as "wis_relative_skill").
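A minimal base-R sketch of both failure modes described above. It assumes that data.table's `%like%` behaves like `grepl()` on a character vector (for non-factor input it is essentially a wrapper around `grepl()`); `old_select()`, `new_select()` and the column names are hypothetical, chosen only to mirror the description:

```r
# Hypothetical column names: one task-ID (`by`) column plus two score columns.
cols <- c("model", "wis", "wis_relative_skill")

# Old approach (sketch): build a regex alternation from the metric names and
# match it against every column name, as `%like%` would.
old_select <- function(cols, metrics) {
  pattern <- paste(metrics, collapse = "|")
  cols[grepl(pattern, cols)]
}

# New approach: exact column-name matching.
new_select <- function(cols, metrics) {
  intersect(cols, metrics)
}

# Empty metrics: paste(character(0), collapse = "|") is "", and the empty
# regex matches every string, so the `by` column is selected as well.
empty_old <- old_select(cols, character(0))  # all three names, incl. "model"
empty_new <- new_select(cols, character(0))  # character(0)

# Partial matching: the pattern "wis" also matches "wis_relative_skill".
wis_old <- old_select(cols, "wis")  # "wis", "wis_relative_skill"
wis_new <- new_select(cols, "wis")  # "wis"
```

Exact matching trades flexibility (no pattern-based metric selection) for predictability: a column is summarised only if its name is literally one of the metrics.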

Both tightenings from #1179 are implemented

The issue suggested two fixes (either or both); this PR does both:

  1. summarise_scores() no longer summarises its `by` columns. With exact-name matching, `.SDcols` is now `intersect(colnames(scores), metrics)`, so task-ID columns (including the `by` column) are never passed to the summary function. This removes the duplicate-column root cause.
  2. summarise_scores() errors early when there are no score columns to summarise. When `intersect(colnames(scores), metrics)` is empty there is nothing meaningful to return, so it now aborts with a clear message rather than producing a malformed object.
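A sketch of the selection-plus-early-error logic from items 1 and 2, under stated assumptions: `select_score_cols()` is a hypothetical helper, not the package's code; base `stop()` stands in for the `cli_abort()` call the PR uses; and base `aggregate()` stands in for the data.table `.SDcols`/`by` summary:

```r
# Hypothetical helper (not scoringutils' actual implementation).
select_score_cols <- function(scores, metrics) {
  # Exact name matching: task-ID columns such as `model` are never selected
  # unless a metric has exactly that name.
  score_cols <- intersect(colnames(scores), metrics)
  if (length(score_cols) == 0) {
    # Fail early instead of returning a malformed summary.
    stop("No score columns to summarise; did every metric in score() fail?",
         call. = FALSE)
  }
  score_cols
}

# Toy scores table (hypothetical data).
scores <- data.frame(model = c("a", "a", "b"), wis = c(1, 2, 3))

cols <- select_score_cols(scores, metrics = "wis")
# Summarise only the score columns, grouped by `model`.
summary_df <- aggregate(scores[cols], by = scores["model"], FUN = mean)

# With no metrics left, the helper errors rather than selecting `model`.
msg <- tryCatch(
  select_score_cols(scores, metrics = character(0)),
  error = function(e) conditionMessage(e)
)
```

Because the grouping column is never among `cols`, it appears exactly once in `summary_df`.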

Tests

  • End-to-end regression test for the issue reprex: scoring `example_quantile` with only `interval_coverage_55` warns and produces no score columns, after which summarise_scores() errors.
  • A unit-level test of the same empty-metrics case (score columns cleared manually).
  • A test guarding against partial-name matching.
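The unit-level empty-metrics test and the partial-name guard can be mirrored in plain R as a rough sketch. The real tests use testthat against summarise_scores(); here `select_score_cols()` is a hypothetical stand-in that reads the `metrics` attribute, and `expect_error_base()` is a hypothetical base-R substitute for `testthat::expect_error()`:

```r
# Hypothetical stand-in for the fixed column selection: exact matching on the
# `metrics` attribute plus an early error when nothing is left.
select_score_cols <- function(scores) {
  cols <- intersect(colnames(scores), attr(scores, "metrics"))
  if (length(cols) == 0) stop("no score columns to summarise", call. = FALSE)
  cols
}

# Hypothetical base-R substitute for testthat::expect_error().
expect_error_base <- function(expr) {
  failed <- tryCatch({ force(expr); FALSE }, error = function(e) TRUE)
  if (!failed) stop("expected an error but none was raised")
  invisible(TRUE)
}

# Empty-metrics case: only task-ID columns remain and the attribute is empty.
empty <- data.frame(model = c("a", "b"))
attr(empty, "metrics") <- character(0)
expect_error_base(select_score_cols(empty))

# Partial-name guard: "wis" must not pick up "wis_relative_skill".
scores <- data.frame(model = "a", wis = 1, wis_relative_skill = 0.9)
attr(scores, "metrics") <- "wis"
stopifnot(identical(select_score_cols(scores), "wis"))
```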

Out of scope

The issue also raises whether score() should itself fail (rather than return an empty scores object) when every metric fails. That's a real question but a bigger design call; leaving it for a separate discussion.

Test plan

  • Targeted tests pass locally (testthat::test_file("tests/testthat/test-summarise_scores.R"): 16 pass)
  • lintr::lint() clean on changed files
  • CI green (full R CMD check and covr were left to CI rather than run locally)
  • Branch rebased onto current main, which also picks up the multivariate-sample snapshot fix (#1182, "Fix macOS CI snapshot precision + lint-changed-files workflow"), so macOS CI should be green.

This was opened by a bot. Please ping @seabbs for any questions.

@codecov

codecov Bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.99%. Comparing base (abd733e) to head (d124e69).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1180   +/-   ##
=======================================
  Coverage   97.98%   97.99%           
=======================================
  Files          38       38           
  Lines        2036     2045    +9     
=======================================
+ Hits         1995     2004    +9     
  Misses         41       41           


@nikosbosse
Collaborator Author

CLAUDE: All checks pass except macOS-latest (release). That job fails on two tests in test-class-forecast-multivariate-sample.R (lines 343 and 376, in "score.forecast_multivariate_sample() creates expected output"), and the same two tests fail at the same lines on main as well (e.g. run 25738222052). This is pre-existing, macOS-specific breakage in the multivariate sample tests, unrelated to this PR.

@seabbs seabbs self-requested a review May 29, 2026 08:24
@seabbs
Copy link
Copy Markdown
Contributor

seabbs commented May 29, 2026

#1179 contains a nice reprex test case; can we test against it? It also wasn't clear to me whether this was from you, @nikosbosse, or from Claude. I guess our practice for review is to wait to be tagged, but maybe making that clearer in the PR description would be good?

@seabbs
Copy link
Copy Markdown
Contributor

seabbs commented May 29, 2026

This looks good to go otherwise, though it would be nice to state explicitly that both tightenings suggested in #1179 have been implemented (it looks to me like they have).

@seabbs-bot seabbs-bot force-pushed the fix/summarise-scores-empty-metrics-1179 branch from 466b46e to 487fcc2 May 29, 2026 09:26
@seabbs-bot
Copy link
Copy Markdown
Collaborator

Addressed the review feedback:


@seabbs-bot
Copy link
Copy Markdown
Collaborator

Automated review pass (agent quality gate)

No Critical or Important findings. The change is correct and well-scoped.

Observations:

  • The fix replaces a fragile regex partial-match (`%like% paste(metrics, collapse = "|")`) with exact-name matching (`intersect(colnames(scores), metrics)`), which also fixes the latent partial-match bug (e.g. "wis" matching "wis_relative_skill"). Good.
  • Turning the previously-silent duplicate-column case into an explicit `cli_abort` is the right call and matches the issue intent.
  • Line 82 sets `attr(scores, "metrics") <- metrics` (full vector) rather than `metric_cols`. This preserves prior behaviour and is reachable only when `metric_cols` is non-empty, so it is fine; just noting it is intentional.
  • `cli` and `cli_abort` are already package deps/imports; the multi-line message string matches the existing codebase convention (cli collapses internal whitespace on render).
  • Regression tests cover the empty-metrics error, the end-to-end reprex from #1179 ("summarize_scores() produces duplicate column names when input has no score columns"), and the partial-match guard. `scores_quantile` fixture confirmed present in setup.R.
  • `lintr::lint()` clean on both changed files locally.

CI is pending; will continue to monitor checks and mergeability.


@seabbs-bot seabbs-bot force-pushed the fix/summarise-scores-empty-metrics-1179 branch from 487fcc2 to 409145b May 29, 2026 09:32
Contributor

@seabbs seabbs left a comment


@seabbs-bot added the explicit reprex test, so this looks good to me now.

@seabbs seabbs enabled auto-merge May 29, 2026 09:32
nikosbosse and others added 2 commits May 29, 2026 10:35
`summarise_scores()` selected the columns to summarise via
`colnames(scores) %like% paste(metrics, collapse = "|")`. When the
`metrics` attribute is empty (which happens when every metric passed
to `score()` warned and returned nothing), the pattern becomes the
empty string, which `%like%` matches against every column. The `by`
column was then passed to the summary function, producing a duplicate
`by` column in the output and the spurious "argument is not numeric or
logical" warning.

Switch to exact column-name matching via `intersect()` and error early
when there is nothing to summarise. This also incidentally fixes a
latent issue where a metric named e.g. "wis" would have matched any
column whose name contained "wis" (such as "wis_relative_skill").

Closes #1179.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a regression test exercising the exact reprex from #1179: scoring
example_quantile with only `interval_coverage_55` warns and produces no
score columns, after which `summarise_scores()` must error rather than
return a data.table with a duplicate `by` column.
@seabbs-bot seabbs-bot force-pushed the fix/summarise-scores-empty-metrics-1179 branch from 409145b to d124e69 May 29, 2026 09:36


Development

Successfully merging this pull request may close these issues.

summarize_scores() produces duplicate column names when input has no score columns

3 participants