-
Notifications
You must be signed in to change notification settings - Fork 25
Handle missing forecasts before summarisation #1156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 35 commits
42bf377
9342685
8d81290
3e1298d
b15c5cf
6253cab
26fd2d8
1d50511
06b65e7
dbcddb2
6ca383f
797b926
048be7c
3d266ab
a84aea2
3fde364
3ff432a
3490100
e11bc62
eb1ebcd
8856e8d
bf60b74
12ef1a7
d9aa0fd
3cfcfcc
22c790f
a7934c6
7c88db2
1803d9c
cfdc3c4
1cf4beb
f9f91f1
5da130a
8e380f8
1953157
0d3cf4b
5a4beee
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,166 @@ | ||
| #' @title Filter scores | ||
| #' | ||
| #' @description | ||
| #' Filters a `scores` object according to a given strategy. | ||
| #' The filtering behaviour is controlled by the `strategy` | ||
| #' argument, which defaults to [filter_to_intersection()]. | ||
| #' This is a general-purpose filtering function that delegates | ||
| #' all logic to the strategy. | ||
| #' | ||
| #' @param scores An object of class `scores` (a data.table with | ||
| #' scores and an additional attribute `metrics` as produced | ||
| #' by [score()]). | ||
| #' @param strategy A strategy function as returned by | ||
| #' [filter_to_intersection()]. Default is | ||
| #' `filter_to_intersection()`. | ||
| #' @param compare Character string (default `"model"`) naming the | ||
| #' column whose values are compared for filtering. | ||
| #' | ||
| #' @return A filtered `scores` object with the same class and | ||
| #' `metrics` attribute as the input. | ||
| #' | ||
| #' @seealso \code{vignette("handling-missing-forecasts")} | ||
| #' @importFrom cli cli_inform | ||
| #' @importFrom checkmate assert_class assert_character | ||
| #' assert_function assert_subset | ||
| #' @export | ||
| #' @keywords postprocess-scores | ||
|
sbfnk marked this conversation as resolved.
|
||
| filter_scores <- function( | ||
| scores, | ||
| strategy = filter_to_intersection(), | ||
|
sbfnk marked this conversation as resolved.
|
||
| compare = "model" | ||
|
sbfnk marked this conversation as resolved.
|
||
| ) { | ||
| assert_class(scores, "scores") | ||
| assert_character(compare, len = 1) | ||
| assert_subset(compare, names(scores)) | ||
| assert_function(strategy) | ||
|
|
||
| original_class <- class(scores) | ||
| original_metrics <- attr(scores, "metrics") | ||
|
|
||
| result <- strategy(scores, compare = compare) | ||
|
sbfnk marked this conversation as resolved.
|
||
|
|
||
| n_before <- nrow(scores) | ||
| n_after <- nrow(result) | ||
| #nolint start: object_usage_linter | ||
|
sbfnk marked this conversation as resolved.
Outdated
|
||
| n_dropped <- n_before - n_after | ||
| #nolint end | ||
|
|
||
| if (n_dropped == 0) { | ||
| cli_inform(c( | ||
| i = "No rows filtered. Returning scores unchanged." | ||
| )) | ||
| return(scores) | ||
| } | ||
|
|
||
| cli_inform(c( | ||
| i = "Filtered out {n_dropped} rows.", | ||
| i = "{n_after} of {n_before} rows remaining." # nolint: duplicate_argument_linter | ||
| )) | ||
|
|
||
| # Preserve class and metrics | ||
| class(result) <- original_class | ||
| data.table::setattr(result, "metrics", original_metrics) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand why this is different to what's done in
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Switched |
||
|
|
||
| return(result) | ||
| } | ||
|
|
||
|
|
||
| #' @title Filter to intersection of model-target combinations | ||
| #' | ||
| #' @description | ||
| #' Strategy factory for [filter_scores()]. | ||
| #' Returns a function that keeps only target combinations | ||
| #' covered by a minimum proportion of comparators. | ||
| #' | ||
| #' @param min_coverage Numeric between 0 and 1 (default `1`). | ||
| #' Minimum proportion of comparators that must cover a | ||
| #' target combination for it to be kept. | ||
| #' @param include Character vector or `NULL` (default). If | ||
| #' provided, the target grid is restricted to targets | ||
| #' covered by these values of the `compare` column. When | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I find this hard to parse - what's the "compare" column? This hasn't been mentioned yet at this stage of the docs.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rewrote the |
||
| #' multiple values are given, only the intersection of | ||
| #' their targets is used. | ||
| #' | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. looking through the code I think
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. More generally the documentation is quite dense (I didn't understand it before looking at the code) and really could do with some examples.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Split:
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Docs rewritten and examples added on both strategy factories. 0d3cf4b. |
||
| #' @return A function with signature `function(scores, compare)` | ||
| #' suitable for use as a strategy in | ||
| #' [filter_scores()]. | ||
| #' | ||
| #' @importFrom data.table as.data.table setkeyv | ||
| #' @importFrom checkmate assert_number assert_character | ||
| #' @export | ||
| #' @keywords postprocess-scores | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. examples?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| filter_to_intersection <- function( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to make clear that users should only call this inside
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes or least they should mostly do so as that is where the guard rails are. Open to other designs but see general comment.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As mentioned in the main comment I think just making this the callable function would be clearer.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Docs on both strategy factories now explicitly note they are intended to be passed to |
||
| min_coverage = 1, | ||
| include = NULL | ||
| ) { | ||
| assert_number(min_coverage, lower = 0, upper = 1) | ||
| if (!is.null(include)) { | ||
| assert_character(include, min.len = 1) | ||
| } | ||
|
|
||
| function(scores, compare = "model") { | ||
| scores <- data.table::as.data.table(scores) | ||
| forecast_unit <- get_forecast_unit(scores) | ||
| target_cols <- setdiff(forecast_unit, compare) | ||
|
|
||
| if (!is.null(include)) { | ||
| unknown <- setdiff(include, unique(scores[[compare]])) | ||
| if (length(unknown) > 0) { | ||
| cli::cli_abort(c( | ||
| "!" = paste0( | ||
| "{.val {unknown}} not found in ", | ||
| "{.arg {compare}} column." | ||
| ) | ||
| )) | ||
| } | ||
| # Restrict to targets covered by specified values | ||
| model_targets <- lapply(include, function(m) { | ||
| unique( | ||
| scores[ | ||
| scores[[compare]] == m, | ||
| target_cols, | ||
| with = FALSE | ||
| ] | ||
| ) | ||
| }) | ||
| # Intersection of all specified values' targets | ||
| qualifying <- model_targets[[1]] | ||
| if (length(model_targets) > 1) { | ||
| for (i in seq(2, length(model_targets))) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could be shortened with
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done — now |
||
| data.table::setkeyv(qualifying, target_cols) | ||
| data.table::setkeyv( | ||
| model_targets[[i]], target_cols | ||
| ) | ||
|
seabbs marked this conversation as resolved.
Outdated
|
||
| qualifying <- merge( | ||
| qualifying, model_targets[[i]], | ||
| by = target_cols | ||
| ) | ||
| } | ||
| } | ||
| } else { | ||
| # Count include per target combination | ||
| all_include <- unique(scores[[compare]]) | ||
| n_total <- length(all_include) | ||
|
|
||
| target_coverage <- scores[ | ||
| , .(n_include = data.table::uniqueN(get(compare))), | ||
| by = target_cols | ||
| ] | ||
| #nolint start: object_usage_linter | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. really needed? Maybe with the data.table stuff.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah that is what is driving it. Agree annoying will see if can see a way to avoid it.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Replaced the for-loop |
||
| qualifying <- target_coverage[ | ||
| n_include / n_total >= min_coverage, | ||
| #nolint end | ||
| target_cols, | ||
| with = FALSE | ||
| ] | ||
| } | ||
|
|
||
| # Semi-join: keep scores rows matching qualifying targets | ||
| data.table::setkeyv(scores, target_cols) | ||
| data.table::setkeyv(qualifying, target_cols) | ||
| result <- scores[qualifying, nomatch = NULL] | ||
|
|
||
| return(result) | ||
| } | ||
| } | ||
Uh oh!
There was an error while loading. Please reload this page.