Fix duplicate column names from summarise_scores() with empty metrics (#1179)#1180
Fix duplicate column names from summarise_scores() with empty metrics (#1179)#1180nikosbosse wants to merge 2 commits into
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1180 +/- ##
=======================================
Coverage 97.98% 97.99%
=======================================
Files 38 38
Lines 2036 2045 +9
=======================================
+ Hits 1995 2004 +9
Misses 41 41 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
CLAUDE: All checks pass except |
|
#1179 contains a nice reprex test case can we test against it. I wasn't clear if this was from you @nikosbosse or claude. I guess our practice for review is to wait to be tagged but maybe making it clearer in the PR description would be good:? |
|
This looks good to go otherwise though it would be nice to explicitly state that both the tightenings suggested in #1179 have been implemented (it looks to me that they have) |
466b46e to
487fcc2
Compare
|
Addressed the review feedback:
This was opened by a bot. Please ping @seabbs for any questions. |
|
Automated review pass (agent quality gate) No Critical or Important findings. The change is correct and well-scoped. Observations:
CI is pending; will continue to monitor checks and mergeability. This was opened by a bot. Please ping @seabbs for any questions. |
487fcc2 to
409145b
Compare
seabbs
left a comment
There was a problem hiding this comment.
@seabbs-bot added the explicit reprex so this looks good to me now.
`summarise_scores()` selected the columns to summarise via `colnames(scores) %like% paste(metrics, collapse = "|")`. When the `metrics` attribute is empty (which happens when every metric passed to `score()` warned and returned nothing), the pattern becomes the empty string, which `%like%` matches against every column. The `by` column was then passed to the summary function, producing a duplicate `by` column in the output and the spurious "argument is not numeric or logical" warning. Switch to exact column-name matching via `intersect()` and error early when there is nothing to summarise. This also incidentally fixes a latent issue where a metric named e.g. "wis" would have matched any column whose name contained "wis" (such as "wis_relative_skill"). Closes #1179. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a regression test exercising the exact reprex from #1179: scoring example_quantile with only `interval_coverage_55` warns and produces no score columns, after which `summarise_scores()` must error rather than return a data.table with a duplicate `by` column.
409145b to
d124e69
Compare
CLAUDE: Closes #1179.
Summary
summarise_scores()previously selected which columns to summarise viacolnames(scores) %like% paste(metrics, collapse = "|"). When themetricsattribute ischaracter(0)(i.e. every metric inscore()warned and returned nothing), this pattern becomes the empty string, which%like%matches against every column — including thebycolumn. Thebycolumn was then passed to the summary function, producing the spurious "argument is not numeric or logical" warning and a data.table with a duplicatebycolumn. The duplicate is invisible inside data.table but breaks downstream conversion to tibble.intersect(colnames(scores), metrics). This also incidentally fixes a latent issue where a metric named e.g. "wis" would have matched any column whose name contained "wis" (such as "wis_relative_skill").Both tightenings from #1179 are implemented
The issue suggested two fixes (either or both); this PR does both:
summarise_scores()no longer summarises itsbycolumns. With exact-name matching,.SDcolsis nowintersect(colnames(scores), metrics), so task-ID columns (including thebycolumn) are never passed to the summary function. This removes the duplicate-column root cause.summarise_scores()errors early when there are no score columns to summarise. Whenintersect(colnames(scores), metrics)is empty there is nothing meaningful to return, so it now aborts with a clear message rather than producing a malformed object.Tests
example_quantilewith onlyinterval_coverage_55warns and produces no score columns, after whichsummarise_scores()errors.Out of scope
The issue also raises whether
score()should itself fail (rather than return an empty scores object) when every metric fails. That's a real question but a bigger design call; leaving it for a separate discussion.Test plan
testthat::test_file("tests/testthat/test-summarise_scores.R")— 16 pass)lintr::lint()clean on changed filesR CMD check/covrrelied on CI)main, which also picks up the multivariate-sample snapshot fix (Fix macOS CI snapshot precision + lint-changed-files workflow #1182) so macOS CI should be green.This was opened by a bot. Please ping @seabbs for any questions.