Evaluate CWL jobs that should be skipped on the leader #5507

Open
lonbar wants to merge 1 commit into DataBiosphere:master from lonbar:master

Conversation

@lonbar lonbar commented Apr 30, 2026

CWL jobs whose when condition evaluates to False should not be executed. Currently, any job that is not a Workflow or ExpressionTool always runs on a worker node, so the Conditional is only checked after a worker node has already been allocated. If the job is instead run on the leader (local), the step is skipped without a worker being allocated.

Checking the condition when the job is instantiated makes it possible to decide dynamically whether the step should run on the leader or on a worker. This avoids submitting jobs to the scheduling system only for them to be skipped.
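
As an illustration of the decision described above, here is a minimal sketch. It does not use Toil or cwltool; the Conditional class, the should_run_on_leader helper, and the tool_class strings are hypothetical stand-ins for the real objects in cwltoil.py.

```python
from typing import Any, Callable, Optional


class Conditional:
    """Hypothetical stand-in for the wrapper around a CWL `when` expression."""

    def __init__(self, predicate: Optional[Callable[[dict[str, Any]], bool]] = None) -> None:
        self.predicate = predicate

    def is_false(self, cwljob: dict[str, Any]) -> bool:
        # A step without a `when` expression always runs, so it is never "false".
        if self.predicate is None:
            return False
        return not self.predicate(cwljob)


def should_run_on_leader(tool_class: str, conditional: Conditional, cwljob: dict[str, Any]) -> bool:
    """Decide at construction time whether the job can stay on the leader.

    Expression tools already run locally; a step whose condition is false has
    nothing to execute, so it does not need a worker either.
    """
    return tool_class == "ExpressionTool" or conditional.is_false(cwljob)
```

With a condition such as `inputs.n > 1` modelled as `lambda job: job["n"] > 1`, a job with `n = 0` would stay on the leader, while a job with `n = 5` would still be sent to a worker.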

Resolves #3990.

Changelog Entry

To be copied to the draft changelog by merger:

  • Toil checks the CWL when conditional on the leader.

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passed tests, including the Gitlab tests, for the most recent commit in its branch.
  • Make sure the PR has been reviewed. If not, review it. If it has been reviewed and any requested changes seem to have been addressed, proceed.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@adamnovak adamnovak requested a review from annagiroti May 7, 2026 15:36
Comment thread src/toil/cwl/cwltoil.py
# If not using the Toil file store, output files just go directly to
# their final homes, so their space doesn't need to be accounted per-job.

options_dict: dict = {} # type: ignore
Collaborator

Is the # type: ignore here intentional? As far as I can tell dict = {} is valid Python and shouldn't produce a type error. Would dict[str, Any] be a more precise annotation, and would that remove the need for the ignore comment entirely?
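
For reference, a parameterized annotation could look like the sketch below. It assumes the ignore comment was added to silence mypy's complaint about a bare generic dict (which is reported when disallow_any_generics is enabled); whether that is the actual reason is not confirmed here. The build_options helper and its parameters are hypothetical.

```python
from typing import Any


def build_options(run_local: bool, cores: float, memory: int) -> dict[str, Any]:
    """Collect keyword arguments for a job, leaving them out for local runs."""
    # A parameterized annotation gives the type checker a concrete key and
    # value type, so no "type: ignore" comment is needed on this line.
    options: dict[str, Any] = {}
    if not run_local:
        options["cores"] = cores
        options["memory"] = memory
    return options
```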

Comment thread src/toil/cwl/cwltoil.py
# their final homes, so their space doesn't need to be accounted per-job.

options_dict: dict = {} # type: ignore
run_local: bool = self.conditional.is_false(cwljob)
Collaborator

cwljob may still contain unresolved Promise objects at init time if the when expression references an output from an upstream step. Since Conditional.is_false resolves promises without a file store, could this either crash or return the wrong result in that case? The worst case I can think of is is_false incorrectly returning True here, which sets local=True with no resources, while the fully resolved condition at run() time evaluates to true, meaning real work runs on the leader with no reserved resources. Would wrapping this in a try/except that falls back to run_local = False be a safe way to handle that?
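
A rough sketch of the fallback suggested above, using hypothetical stand-ins rather than Toil's real Promise and Conditional classes. Note that a try/except on its own only covers the crash case; the sketch also checks for unresolved placeholders up front, on the assumption that they can be detected by type, to cover the case where is_false returns a wrong answer without raising.

```python
from typing import Any, Callable


class Promise:
    """Hypothetical placeholder for an upstream output that is not yet resolved."""


def has_unresolved(value: Any) -> bool:
    """Return True if a Promise appears anywhere inside nested dicts or lists."""
    if isinstance(value, Promise):
        return True
    if isinstance(value, dict):
        return any(has_unresolved(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(has_unresolved(v) for v in value)
    return False


def safe_run_local(is_false: Callable[[dict[str, Any]], bool], cwljob: dict[str, Any]) -> bool:
    """Return True only when the condition is known to be false right now."""
    # Unresolved inputs mean the condition cannot be trusted yet, so keep the
    # normal worker scheduling and let the check happen at run time.
    if has_unresolved(cwljob):
        return False
    try:
        return bool(is_false(cwljob))
    except Exception:
        # Any evaluation failure falls back to the conservative choice.
        return False
```

The conservative default is deliberate: wrongly choosing a worker costs one scheduled job, whereas wrongly choosing the leader could run real work without reserved resources.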

Comment thread src/toil/cwl/cwltoil.py
isinstance(tool, cwltool.command_line_tool.ExpressionTool)
or run_local
),
**options_dict,
Collaborator

When run_local is True, options_dict is empty so cores, memory, disk, accelerators, and preemptible all fall back to Job defaults. CWLJobWrapper, which also runs locally, explicitly passes cores=1, memory="1GiB", disk="1MiB" for its local run. Would it be worth doing the same here for consistency, rather than relying on the defaults being equivalent?
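
One way this could look, as a sketch only: the resource values are the ones quoted above for CWLJobWrapper, while the LOCAL_RESOURCES constant and the resource_options helper are hypothetical names, not part of Toil.

```python
from typing import Any

# Small explicit requirements for a job that only evaluates a condition on the
# leader. The values mirror those quoted for CWLJobWrapper in the comment above.
LOCAL_RESOURCES: dict[str, Any] = {"cores": 1, "memory": "1GiB", "disk": "1MiB"}


def resource_options(run_local: bool, requirements: dict[str, Any]) -> dict[str, Any]:
    """Return the resource keyword arguments to pass to the job constructor."""
    if run_local:
        # Copy so callers cannot mutate the shared constant.
        return dict(LOCAL_RESOURCES)
    return dict(requirements)
```

Passing the values explicitly keeps the local path independent of whatever the Job defaults happen to be.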

Collaborator

@annagiroti annagiroti left a comment

The overall approach looks clean, and the options_dict pattern for conditionally passing resources is a nice solution. My main concern is that is_false is called at init time, before promises are fully resolved. It is worth making sure that this can't cause issues for when conditions that reference upstream step outputs. Would it also be worth adding test cases? For example, one where the when condition is false (verifying the job isn't submitted to the batch system) and one where it references an output from a previous step (to confirm it doesn't crash or mis-schedule).
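
The shape of such tests might resemble the sketch below. It is self-contained and does not exercise Toil: plan_step and UnresolvedInput are hypothetical stand-ins for the scheduling decision and for an unresolved upstream output, so real tests would need to run actual CWL workflows against a batch system instead.

```python
from typing import Any, Callable


class UnresolvedInput:
    """Hypothetical marker for an upstream output that is not available yet."""


def plan_step(when: Callable[[dict[str, Any]], bool], inputs: dict[str, Any]) -> str:
    """Return "leader" if the step is known to be skipped, else "worker"."""
    if any(isinstance(v, UnresolvedInput) for v in inputs.values()):
        return "worker"
    try:
        return "worker" if when(inputs) else "leader"
    except Exception:
        return "worker"


def test_false_condition_is_not_submitted() -> None:
    assert plan_step(lambda i: i["run"], {"run": False}) == "leader"


def test_true_condition_is_submitted() -> None:
    assert plan_step(lambda i: i["run"], {"run": True}) == "worker"


def test_upstream_reference_does_not_crash_or_misschedule() -> None:
    # The condition would fail on the placeholder; the step must go to a worker.
    assert plan_step(lambda i: i["count"] > 0, {"count": UnresolvedInput()}) == "worker"
```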


Development

Successfully merging this pull request may close these issues.

CWL Conditional steps that aren't running still get scheduled to a job scheduler.

2 participants