Open
Changes from 35 commits
Commits
37 commits
42bf377
test(red): add failing tests for impute_missing_scores and strategy f…
seabbs-bot Mar 30, 2026
9342685
feat: implement build_missing_grid, filter_missing_scores, and filter…
seabbs-bot Mar 30, 2026
8d81290
feat(green): implement impute_missing_scores and strategy factories
seabbs-bot Mar 30, 2026
3e1298d
style: fix lint warnings in impute tests
seabbs-bot Mar 30, 2026
b15c5cf
style: suppress lint warning for internal function reference
seabbs-bot Mar 30, 2026
6253cab
Rename filter_missing_scores() to filter_scores()
seabbs-bot Mar 30, 2026
26fd2d8
fix: pass compare param through to strategy functions
seabbs-bot Mar 30, 2026
1d50511
fix: update integration test to use renamed filter_scores
seabbs-bot Mar 30, 2026
06b65e7
style: fix indentation in impute test
seabbs-bot Mar 30, 2026
dbcddb2
refactor: rename models→include, fix review issues
seabbs-bot Mar 30, 2026
6ca383f
style: fix redundant_equals_linter in test
seabbs-bot Mar 30, 2026
797b926
docs: add vignette for handling missing forecasts
seabbs-bot Mar 30, 2026
048be7c
docs: add vignette and NEWS entry for missing scores handling
seabbs-bot Mar 30, 2026
3d266ab
docs: improve vignette clarity and fix review issues
seabbs-bot Mar 30, 2026
a84aea2
docs: rewrite vignette and add CLI messages to impute_missing_scores
seabbs-bot Mar 31, 2026
3fde364
style: put cli_inform message on single line
seabbs-bot Mar 31, 2026
3ff432a
fix: use cli::qty() for correct pluralisation in impute message
seabbs-bot Mar 31, 2026
3490100
docs: credit Kim et al (2026) as inspiration for missing scores handling
seabbs-bot Mar 31, 2026
e11bc62
docs: soften absolute claims in vignette prose
seabbs-bot Mar 31, 2026
eb1ebcd
test: cover missing metric columns and nonexistent ref model
seabbs-bot Mar 31, 2026
8856e8d
docs: add articles section to pkgdown config
seabbs-bot Mar 31, 2026
bf60b74
docs: rename articles group from Getting started to Articles
seabbs-bot Mar 31, 2026
12ef1a7
revert: remove articles section from pkgdown config
seabbs-bot Mar 31, 2026
d9aa0fd
docs: address vignette review TODOs
seabbs-bot Mar 31, 2026
3cfcfcc
test: add integration tests for filter and impute with real data
seabbs-bot Mar 31, 2026
22c790f
docs: address vignette TODOs, suppress test messages
seabbs-bot Mar 31, 2026
a7934c6
style: use bare i = in cli_inform to avoid nolint blocks
seabbs-bot Mar 31, 2026
7c88db2
style: remove unhelpful comment in build_missing_grid
seabbs-bot Mar 31, 2026
1803d9c
docs: add Post-process scores section to pkgdown reference
seabbs-bot Mar 31, 2026
cfdc3c4
test: verify impute_mean_score values, original rows unchanged, filte…
seabbs-bot Mar 31, 2026
1cf4beb
test: update vdiffr plot snapshots after merge from main
seabbs-bot Apr 1, 2026
f9f91f1
Merge branch 'main' into issue-1122-missing-scores
seabbs Apr 8, 2026
5da130a
Merge branch 'main' into issue-1122-missing-scores
seabbs Apr 20, 2026
8e380f8
Merge branch 'main' into issue-1122-missing-scores
seabbs Apr 20, 2026
1953157
Merge branch 'main' into issue-1122-missing-scores
seabbs Apr 20, 2026
0d3cf4b
refactor(filter-scores): split intersection strategy and tidy
seabbs-bot Apr 22, 2026
5a4beee
refactor(impute-scores): share summary helper and guard all-NA
seabbs-bot Apr 22, 2026
8 changes: 8 additions & 0 deletions NAMESPACE
@@ -78,6 +78,8 @@ export(dispersion_quantile)
export(dispersion_sample)
export(dss_sample)
export(energy_score_multivariate)
export(filter_scores)
export(filter_to_intersection)
export(get_correlations)
export(get_coverage)
export(get_duplicate_forecasts)
@@ -88,6 +90,11 @@ export(get_grouping)
export(get_metrics)
export(get_pairwise_comparisons)
export(get_pit_histogram)
export(impute_mean_score)
export(impute_missing_scores)
export(impute_model_score)
export(impute_na_score)
export(impute_worst_score)
export(interval_coverage)
export(is_forecast)
export(is_forecast_binary)
@@ -178,6 +185,7 @@ importFrom(data.table,key)
importFrom(data.table,melt)
importFrom(data.table,nafill)
importFrom(data.table,rbindlist)
importFrom(data.table,set)
importFrom(data.table,setDT)
importFrom(data.table,setattr)
importFrom(data.table,setcolorder)
1 change: 1 addition & 0 deletions NEWS.md
@@ -1,5 +1,6 @@
# scoringutils (development version)

- Added `filter_scores()` and `impute_missing_scores()` for handling missing forecasts before summarisation. `filter_scores()` removes target combinations with insufficient model coverage, while `impute_missing_scores()` fills in missing scores using configurable strategies (worst, mean, NA, or reference model). Both use a strategy function pattern for extensibility. See `vignette("handling-missing-forecasts")` for details (#1122).
- Added internal S3 generic `get_forecast_type_ids()` so each forecast type declares the columns (beyond the forecast unit) that identify a unique row. `get_duplicate_forecasts()` now uses this instead of hard-coded column names (#888).
- Removed the deprecated vignettes `Deprecated-functions` and `Deprecated-visualisations`. The code for removed functions (`plot_predictions()`, `make_NA()`, `plot_ranges()`, `plot_score_table()`, `merge_pred_and_obs()`) can still be found in the [git history](https://github.com/epiforecasts/scoringutils/tree/d0cd8e2/vignettes) (#1158).

166 changes: 166 additions & 0 deletions R/filter-scores.R
@@ -0,0 +1,166 @@
#' @title Filter scores
#'
#' @description
#' Filters a `scores` object according to a given strategy.
#' The filtering behaviour is controlled by the `strategy`
#' argument, which defaults to [filter_to_intersection()].
#' This is a general-purpose filtering function that delegates
#' all logic to the strategy.
#'
#' @param scores An object of class `scores` (a data.table with
#' scores and an additional attribute `metrics` as produced
#' by [score()]).
#' @param strategy A strategy function as returned by
#' [filter_to_intersection()]. Default is
#' `filter_to_intersection()`.
#' @param compare Character string (default `"model"`) naming the
#' column whose values are compared for filtering.
#'
#' @return A filtered `scores` object with the same class and
#' `metrics` attribute as the input.
#'
#' @seealso \code{vignette("handling-missing-forecasts")}
#' @importFrom cli cli_inform
#' @importFrom checkmate assert_class assert_character
#' assert_function assert_subset
#' @export
#' @keywords postprocess-scores
filter_scores <- function(
  scores,
  strategy = filter_to_intersection(),
  compare = "model"
) {
  assert_class(scores, "scores")
  assert_character(compare, len = 1)
  assert_subset(compare, names(scores))
  assert_function(strategy)

  original_class <- class(scores)
  original_metrics <- attr(scores, "metrics")

  result <- strategy(scores, compare = compare)

  n_before <- nrow(scores)
  n_after <- nrow(result)
  #nolint start: object_usage_linter
  n_dropped <- n_before - n_after
  #nolint end

  if (n_dropped == 0) {
    cli_inform(c(
      i = "No rows filtered. Returning scores unchanged."
    ))
    return(scores)
  }

  cli_inform(c(
    i = "Filtered out {n_dropped} rows.",
    i = "{n_after} of {n_before} rows remaining." # nolint: duplicate_argument_linter
  ))

  # Preserve class and metrics
  class(result) <- original_class
  data.table::setattr(result, "metrics", original_metrics)
Contributor

I don't understand why this is different to what's done in impute_missing_scores() (calling new_scores())

Collaborator Author

Switched filter_scores() to return via new_scores() so the class and metric preservation path matches impute_missing_scores(). 0d3cf4b.

  return(result)
}


#' @title Filter to intersection of model-target combinations
#'
#' @description
#' Strategy factory for [filter_scores()].
#' Returns a function that keeps only target combinations
#' covered by a minimum proportion of comparators.
#'
#' @param min_coverage Numeric between 0 and 1 (default `1`).
#' Minimum proportion of comparators that must cover a
#' target combination for it to be kept.
#' @param include Character vector or `NULL` (default). If
#' provided, the target grid is restricted to targets
#' covered by these values of the `compare` column. When
Contributor

sbfnk, Apr 22, 2026

I find this hard to parse - what's the "compare" column? This hasn't been mentioned yet at this stage of the docs.

Collaborator Author

Rewrote the include docstring to refer to values of the compare column (the compare arg of filter_scores()) rather than a floating 'compare column'. 0d3cf4b.

#' multiple values are given, only the intersection of
#' their targets is used.
#'
Contributor

Looking through the code, I think min_coverage and include are mutually exclusive - should this be two different functions?

Contributor

More generally, the documentation is quite dense (I didn't understand it before looking at the code) and really could do with some examples.

Collaborator Author

Split: filter_to_intersection(min_coverage) handles the coverage threshold, filter_to_include(include) handles the named intersection. Each now has single-purpose docs and its own examples. 0d3cf4b.

Collaborator Author

Docs rewritten and examples added on both strategy factories. 0d3cf4b.
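
For readers following the thread, the coverage-threshold branch can be sketched in base R as below. This is only an illustration of the factory pattern under discussion, not the package code: `keep_covered_targets()` is a hypothetical name, and `target_cols` is passed explicitly here, whereas the diff derives it from `get_forecast_unit()`.

```r
# Hypothetical helper, not the scoringutils implementation: a factory that
# captures `min_coverage` and returns a strategy function.
keep_covered_targets <- function(min_coverage = 1) {
  stopifnot(is.numeric(min_coverage), min_coverage >= 0, min_coverage <= 1)
  function(scores, compare = "model", target_cols = "target") {
    n_total <- length(unique(scores[[compare]]))
    # Distinct (target, comparator) pairs, then comparators per target.
    pairs <- unique(scores[, c(target_cols, compare), drop = FALSE])
    counts <- aggregate(pairs[compare], by = pairs[target_cols], FUN = length)
    # Targets whose share of comparators reaches the threshold.
    keep <- counts[[compare]] / n_total >= min_coverage
    qualifying <- counts[keep, target_cols, drop = FALSE]
    # Inner join on the target columns acts as a semi-join here,
    # because `qualifying` has one row per target.
    merge(scores, qualifying, by = target_cols)
  }
}

# Made-up scores: t1 is covered by 3 of 3 models, t2 by 2, t3 by 1.
scores <- data.frame(
  model  = c("a", "b", "c", "a", "b", "a"),
  target = c("t1", "t1", "t1", "t2", "t2", "t3"),
  score  = c(1, 2, 3, 4, 5, 6)
)

strict <- keep_covered_targets(min_coverage = 1)
lenient <- keep_covered_targets(min_coverage = 0.5)
strict_result <- strict(scores)
lenient_result <- lenient(scores)
```

With these made-up scores, `strict` keeps only the target that every model covers, while `lenient` also keeps the target covered by two of the three models.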

#' @return A function with signature `function(scores, compare)`
#' suitable for use as a strategy in
#' [filter_scores()].
#'
#' @importFrom data.table as.data.table setkeyv
#' @importFrom checkmate assert_number assert_character
#' @export
#' @keywords postprocess-scores
Contributor

examples?

Collaborator Author

Added @examples on both filter_to_intersection() and the new filter_to_include(). 0d3cf4b.

filter_to_intersection <- function(
Contributor

Do we need to make clear that users should only call this inside filter_scores()? Calling filter_to_intersection()(scores, compare = "model") won't do the class/metrics preserving. I can see the rationale for having this so as to be able to pass filter-specific arguments to filter_scores(), but as a setup I think it risks being confusing and potentially leading to opaque errors.

Contributor

Yes, or at least they should mostly do so, as that is where the guard rails are. Open to other designs, but see general comment.

Contributor

As mentioned in the main comment, I think just making this the callable function would be clearer.

Collaborator Author

Docs on both strategy factories now explicitly note they are intended to be passed to filter_scores() rather than called directly (that's where class/metrics preservation happens). 0d3cf4b.
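
The concern in this thread can be shown with a small base-R sketch. All names here (`drop_na_scores()`, `apply_strategy()`) are hypothetical stand-ins, and the strategy deliberately rebuilds a plain data.frame to mimic a strategy that does not carry the class or the `metrics` attribute through; the real functions may behave differently.

```r
# Hypothetical strategy factory: the returned function rebuilds a plain
# data.frame, so the "scores" class and "metrics" attribute are not kept.
drop_na_scores <- function() {
  function(scores, compare = "model") {
    kept <- !is.na(scores$score)
    data.frame(model = scores$model[kept], score = scores$score[kept])
  }
}

# Hypothetical wrapper playing the role of filter_scores(): it validates
# the input and restores class and metrics after the strategy has run.
apply_strategy <- function(scores, strategy, compare = "model") {
  stopifnot(inherits(scores, "scores"), is.function(strategy))
  result <- strategy(scores, compare = compare)
  class(result) <- class(scores)
  attr(result, "metrics") <- attr(scores, "metrics")
  result
}

# Made-up scores object with the class and attribute the wrapper expects.
scores <- structure(
  data.frame(model = c("a", "b", "c"), score = c(0.2, NA, 0.5)),
  class = c("scores", "data.frame"),
  metrics = "score"
)

direct <- drop_na_scores()(scores)                    # bypasses the wrapper
wrapped <- apply_strategy(scores, drop_na_scores())   # goes through it
```

Calling the strategy directly returns a plain table, whereas going through the wrapper returns an object that still carries the class and metrics, which is the reason the docs steer users towards the wrapper.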

  min_coverage = 1,
  include = NULL
) {
  assert_number(min_coverage, lower = 0, upper = 1)
  if (!is.null(include)) {
    assert_character(include, min.len = 1)
  }

  function(scores, compare = "model") {
    scores <- data.table::as.data.table(scores)
    forecast_unit <- get_forecast_unit(scores)
    target_cols <- setdiff(forecast_unit, compare)

    if (!is.null(include)) {
      unknown <- setdiff(include, unique(scores[[compare]]))
      if (length(unknown) > 0) {
        cli::cli_abort(c(
          "!" = paste0(
            "{.val {unknown}} not found in ",
            "{.arg {compare}} column."
          )
        ))
      }
      # Restrict to targets covered by specified values
      model_targets <- lapply(include, function(m) {
        unique(
          scores[
            scores[[compare]] == m,
            target_cols,
            with = FALSE
          ]
        )
      })
      # Intersection of all specified values' targets
      qualifying <- model_targets[[1]]
      if (length(model_targets) > 1) {
        for (i in seq(2, length(model_targets))) {
Contributor

could be shortened with Reduce() potentially?

Collaborator Author

Done: now Reduce(function(a, b) merge(a, b, by = target_cols), target_sets). 0d3cf4b.
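
A standalone sketch of the `Reduce()` form mentioned in the reply, using base-R data frames and made-up targets (the diff itself works on data.tables, and `target_sets` here is illustrative data rather than anything from the PR):

```r
# Illustrative data: the targets covered by each of three comparators.
target_cols <- c("location", "date")
target_sets <- list(
  data.frame(location = c("DE", "DE", "FR"), date = c("d1", "d2", "d1")),
  data.frame(location = c("DE", "FR"), date = c("d1", "d1")),
  data.frame(location = c("DE", "FR", "FR"), date = c("d1", "d1", "d2"))
)

# Folding pairwise inner joins over the list leaves only the targets
# that appear in every set.
qualifying <- Reduce(
  function(a, b) merge(a, b, by = target_cols),
  target_sets
)
```

For a list of length one, `Reduce()` returns the single element unchanged, so the explicit `length(model_targets) > 1` guard from the loop version is not needed in this form.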

          data.table::setkeyv(qualifying, target_cols)
          data.table::setkeyv(
            model_targets[[i]], target_cols
          )
          qualifying <- merge(
            qualifying, model_targets[[i]],
            by = target_cols
          )
        }
      }
    } else {
      # Count distinct comparators per target combination
      all_include <- unique(scores[[compare]])
      n_total <- length(all_include)

      target_coverage <- scores[
        , .(n_include = data.table::uniqueN(get(compare))),
        by = target_cols
      ]
      #nolint start: object_usage_linter
Contributor

Is this really needed? Maybe because of the data.table stuff.

Contributor

Yeah, that is what is driving it. Agreed it's annoying; will see if I can find a way to avoid it.

Collaborator Author

Replaced the for-loop setkeyv + merge chain with Reduce(), which only needs by = target_cols, so the intermediate setkeyv calls are gone. 0d3cf4b.

      qualifying <- target_coverage[
        n_include / n_total >= min_coverage,
        #nolint end
        target_cols,
        with = FALSE
      ]
    }

    # Semi-join: keep scores rows matching qualifying targets
    data.table::setkeyv(scores, target_cols)
    data.table::setkeyv(qualifying, target_cols)
    result <- scores[qualifying, nomatch = NULL]

    return(result)
  }
}
1 change: 1 addition & 0 deletions R/get-protected-columns.R
@@ -15,6 +15,7 @@ get_protected_columns <- function(data = NULL) {

protected_columns <- c(
".mv_group_id",
".imputed",
"predicted", "observed", "sample_id", "quantile_level", "upper", "lower",
"pit_value", "interval_range", "boundary", "predicted_label",
"interval_coverage", "interval_coverage_deviation",