Skip to content

modified sort_ard_hierarchical function#550

Open
rikoprogrammer wants to merge 14 commits into
insightsengineering:mainfrom
rikoprogrammer:ard_sort
Open

modified sort_ard_hierarchical function#550
rikoprogrammer wants to merge 14 commits into
insightsengineering:mainfrom
rikoprogrammer:ard_sort

Conversation

@rikoprogrammer
Copy link
Copy Markdown

Propose to add the ability to sort an ard based on a particular treatment column

There was a need to sort hierarchical tables based on one of the treatment columns eg Placebo. The current function sorts by getting the sum across all the treatment columns. So I proposed if we can get an extra argument in the function that can be used for this purpose.

Close issue #548 .


Pre-review Checklist (if item does not apply, mark is as complete)

  • All GitHub Action workflows pass with a ✅ complete
  • PR branch has pulled the most recent updates from master branch: usethis::pr_merge_main() : yes
  • If a bug was fixed, a unit test was added. complete
  • Code coverage is suitable for any new functions/features (generally, 100% coverage for new code): devtools::test_coverage() :complete
  • Request a reviewer

Reviewer Checklist (if item does not apply, mark is as complete)

  • If a bug was fixed, a unit test was added.
  • Run pkgdown::build_site(). Check the R console for errors, and review the rendered website.
  • Code coverage is suitable for any new functions/features: devtools::test_coverage()

When the branch is ready to be merged:

  • Update NEWS.md with the changes from this pull request under the heading "# cards (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end update (see NEWS.md for examples).
  • All GitHub Action workflows pass with a ✅
  • Approve Pull Request
  • Merge the PR. Please use "Squash and merge" or "Rebase and merge".

Optional Reverse Dependency Checks:

Install checked with pak::pak("Genentech/checked") or pak::pak("checked")

# Check dev versions of `cardx`, `gtsummary`, and `tfrmt` which are in the `ddsjoberg` R Universe
Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2L, repos = c('https://ddsjoberg.r-universe.dev', 'https://cloud.r-project.org'))"

# Check CRAN reverse dependencies but run tests skipped on CRAN
Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2, repos = 'https://cloud.r-project.org')"

# Check CRAN reverse dependencies in a CRAN-like environment
Rscript -e "options(checked.check_envvars = c(NOT_CRAN = FALSE), checked.check_build_args = '--as-cran'); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2, repos = 'https://cloud.r-project.org')"

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 16, 2026

✅ All contributors have signed the CLA
Posted by the CLA Assistant Lite bot.

@rikoprogrammer
Copy link
Copy Markdown
Author

I have read the CLA Document and I hereby sign the CLA

@ddsjoberg
Copy link
Copy Markdown
Collaborator

@rikoprogrammer thanks for the PR. Can you address the CI/CD failures, please? The contributing guide often guides us to the resolution. https://github.com/insightsengineering/cards/blob/main/.github/CONTRIBUTING.md

Thanks!

@rikoprogrammer
Copy link
Copy Markdown
Author

rikoprogrammer commented Apr 28, 2026 via email

Comment thread .Rhistory Outdated
Comment on lines +1 to +3
Sys.getenv()
library(cards)
library(gtsummary)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to add this to .gitignore

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the file)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh my goodness! can you add, please? I am surprised it wasn't already there! 😱

@rikoprogrammer
Copy link
Copy Markdown
Author

rikoprogrammer commented Apr 28, 2026

Hello, How do I correct these Latex related errors. They don't show up in my local R session, so its a bit difficult for me to catch them. Thanks

image

Comment thread R/sort_ard_hierarchical.R Outdated
#' sums, otherwise `p` is used. If neither `n` nor `p` are present in `x` for the variable, an error will occur.
#'
#' Defaults to `everything() ~ "descending"`.
#' @param sort_col \cr
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after the argument name, we put the expected type of input. maybe that is causing your latex issue

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay,thanks for the hint.

Comment thread .gitignore
@ddsjoberg
Copy link
Copy Markdown
Collaborator

Code Review

1. Committed .Rhistory file

The .Rhistory file (181 lines of interactive R session history) is committed to the repository. This file contains local filesystem paths (D:/ARDs/cards), Sys.getenv() calls, and development workflow noise. It must be removed from the PR.

While .Rhistory was added to .gitignore in this PR, that only prevents future tracking — the file is already staged and will be part of the repo. Run:

git rm --cached .Rhistory

2. Correctness: stat column mutation is destructive

This is the most significant issue. The core logic (lines 291–300 of the diff) mutates the stat column in-place, zeroing out stats for non-matching treatment arms:

if (!is.null(sort_col)) {
  x <- x |>
    dplyr::mutate(stat = dplyr::case_when(
      stat_name == sort_stat & variable == dplyr::last(ard_args$variables) & group1_level == sort_col ~ stat,
      stat_name == sort_stat & variable == dplyr::last(ard_args$variables) & group1_level != sort_col ~ list(0),
      TRUE ~ stat
    ))
}

Problems:

  • Destroys data: The modified x with zeroed-out stats flows into x_sums computation and then is returned via x |> dplyr::left_join(x_sums, ...) at the end of .append_hierarchy_sums(). The caller in sort_ard_hierarchical() assigns this back to x_sort, which carries the corrupted stat values. While the final output uses x[idx_sorted, ] (the original x), this only works because x_sort is used solely for index extraction. This is fragile — any future refactor that uses x_sort values directly would silently produce wrong results.

  • Only filters on dplyr::last(ard_args$variables): The zeroing only applies to the innermost variable. For outer hierarchy levels (e.g., AESOC when variables are c(AESOC, AEDECOD)), the sort_col filtering has no effect — sums are still computed across all treatment arms. This means the feature only partially works for multi-level hierarchies.

  • Hardcodes group1_level: The comparison group1_level == sort_col assumes the treatment/by variable is always in group1_level. This breaks when there are multiple by variables or when the ARD structure places the treatment column elsewhere.

Suggested approach: Instead of mutating stat, filter the rows used for sum computation. The x_sums pipeline already has a dplyr::filter(...) step — add the sort_col filter there:

x_sums <- x |>
  dplyr::filter(
    .data$stat_name == sort_stat,
    if (!is.null(sort_col)) .data$group1_level == sort_col else TRUE,
    # ... existing filters ...
  )

This avoids mutating x entirely and is both simpler and safer.

3. No input validation for sort_col

There is no validation that sort_col:

  • Is a single string (not a vector)
  • Actually exists as a level in the by variable of the ARD
  • Is only used when sort includes "descending" (it has no meaning with "alphanumeric")

Passing a non-existent treatment name would silently zero out all stats and produce an arbitrary sort order with no error or warning. Add a check_string() call and validate that the value exists in the ARD's group levels.

4. No unit tests for sort_col

The PR adds no tests for the new sort_col parameter. The existing test file (test-sort_ard_hierarchical.R) has thorough coverage for sort but nothing for sort_col. At minimum, tests should cover:

  • Basic sort_col usage with a valid treatment arm name
  • sort_col with a non-existent treatment name (should error)
  • sort_col with multi-level hierarchies (verify all levels sort correctly)
  • sort_col combined with mixed sort types (sort = list(VAR1 ~ "alphanumeric", VAR2 ~ "descending"))

5. Default argument sort_col = sort_col in internal function

.append_hierarchy_sums <- function(x, ard_args, cols, i, sort_col = sort_col) {

The default sort_col = sort_col is a self-referencing default that relies on R's lazy evaluation to resolve sort_col from the calling scope. While this works in R, it's confusing and fragile. Use sort_col = NULL instead, which matches the public API's default and is explicit.

6. else { x <- x } is a no-op

} else {
  x <- x
}

This branch does nothing. Remove it.

7. Documentation

  • The @param sort_col description has a typo: "leave it blank if you to sort" → "leave it blank if you want to sort".
  • The parameter name sort_col is misleading — it's not a column name, it's a level value within the by column. A name like sort_by_level or by_level would be clearer. The current name suggests you're specifying which column to sort by, not which treatment arm level to use for sorting.
  • The description says "treatment column" but the feature applies to any by variable, not just treatment.

8. .gitignore changes

Adding .Rhistory, .RData, .Ruserdata to .gitignore is a good improvement, but it should be a separate commit or PR since it's unrelated to the sort_col feature. The trailing newline addition is fine.

Summary

The feature addresses a real need (issue #548), but the implementation has correctness concerns around destructive mutation of the stat column and incomplete handling of multi-level hierarchies. The PR also needs the .Rhistory file removed, input validation added, and unit tests written before it's ready for merge.

Signed-off-by: Daniel Sjoberg <danield.sjoberg@gmail.com>
@rikoprogrammer
Copy link
Copy Markdown
Author

Thanks for the comments @ddsjoberg
I will check this in due course.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants