Skip to content

Various YAML Tool Hardening and Progress toward Tool State Goals#21828

Merged
mvdbeek merged 45 commits intogalaxyproject:devfrom
jmchilton:yaml_tool_harden_1
Feb 13, 2026
Merged

Various YAML Tool Hardening and Progress toward Tool State Goals#21828
mvdbeek merged 45 commits intogalaxyproject:devfrom
jmchilton:yaml_tool_harden_1

Conversation

@jmchilton
Copy link
Copy Markdown
Member

Extracted from #17393.

@mvdbeek this is probably too much to review in this format but it is only growing. Any ideas about stuff you'd like to see pulled out and reviewed in isolation?

I think the biggest changes to review are probably the parameter_specification.yml file and runtime.py / evaluation.py

AI Generated Summary:

Structured tool state: typed collection runtime models and YAML tool hardening

Summary

  • Typed collection runtime models: Replace untyped Dict[str, DataInternalJson] collection representations with a hierarchy of Pydantic models (DataCollectionPairedRuntime, DataCollectionListRuntime, etc.) using discriminated unions routed by collection_type. A dynamic model factory (build_collection_model_for_type) generates precise models for nested types like list:paired with exact inner type validation.
  • Persist validated tool state on Job: Add Job.tool_state column (with Alembic migration) to store Pydantic-validated internal tool state at execution time, enabling the new runtimeify path that converts internal state to CWL-style runtime representations without the legacy to_cwl function.
  • YAML tool test case validation: YAML tools now validate test inputs against Pydantic parameter models at parse time via a new test_case_json state representation. Test inputs are flattened from structured JSON to the pipe-delimited format the test framework expects.
  • Collection inputs for YAML tools: Wire collection input support (data_collection type) through the YAML tool parser, including output collections, dce source type for subcollection mapping, and comma-separated collection_type (e.g., list,paired).
  • New state representations: Add job_runtime and test_case_json to the state representation enum with full parameter specification coverage for all scalar and collection types.

Details

Typed Collection Runtime Models (lib/galaxy/tool_util_models/parameters.py)

The old representation used raw dicts (Dict[str, DataInternalJson], DataCollectionPaired). This branch introduces:

  • DataCollectionInternalJsonBase -- base model with class, name, collection_type, tags, and optional metadata fields (column_definitions, fields, has_single_item, columns)
  • Leaf models: DataCollectionListRuntime, DataCollectionPairedRuntime, DataCollectionRecordRuntime, DataCollectionPairedOrUnpairedRuntime, DataCollectionSampleSheetRuntime
  • Nested models: DataCollectionNestedListRuntime (list-like outer), DataCollectionNestedRecordRuntime (record-like outer) with recursive element unions
  • CollectionRuntimeDiscriminated -- a Discriminator-based union routing by collection_type pattern
  • build_collection_model_for_type() -- LRU-cached factory that recursively builds create_model() types with Literal[collection_type] and narrowed elements types. For list:paired, elements is List[DataCollectionPairedRuntime] (not a generic union).
  • DataCollectionElementInternalJson -- extends DataInternalJson with element_identifier and optional columns
  • DataCollectionPairedElements -- typed model requiring exactly forward and reverse fields

DataCollectionParameterModel.py_type_internal_json now routes through _runtime_model_for_collection_type(), handling single types, comma-separated types (discriminated union subsets), and unknown types (full discriminated union fallback).

Collection-to-Runtime Conversion (lib/galaxy/tools/runtime.py)

New module implementing the collection_to_runtime() pipeline:

  1. setup_for_runtimeify() builds lookup dicts for HDAs, HDCAs, and DCEs from job inputs, returns adapt_dataset and adapt_collection callbacks
  2. _build_collection_runtime_dict() recursively converts DatasetCollection -> raw dict, distinguishing list-like (array elements) from record-like (object elements) via is_list_like()
  3. _validate_collection_runtime_dict() validates raw dict through build_collection_model_for_type() with fallback to generic nested models
  4. Special metadata preserved: column_definitions (sample_sheet), fields (record), has_single_item (paired_or_unpaired), columns (sample_sheet elements)

Dynamic Model Factory for Recursive Collection Types

  • build_collection_model_for_type() recursively generates precise Pydantic models for nested collection types (e.g. list:paired, list:list:paired)
  • Dynamic models use Literal[collection_type] and narrowed element types -- inner type mismatches and depth mismatches are caught structurally by Pydantic
  • _LEAF_COLLECTION_MODELS is the single source of truth for leaf type routing
  • _collection_type_discriminator returns full collection_type strings for routing comma-separated type unions

Job Tool State Persistence

  • Job.tool_state column added (JSONType, nullable)
  • Alembic migration: b4d6191307a5
  • ExecutionSlice carries validated_param_combination
  • UserToolEvaluator.build_param_dict() uses runtimeify when job.tool_state is present, falls back to legacy to_cwl with deprecation log

YAML Tool Enhancements

  • Allow YAML tools to produce output collections
  • Allow collection inputs to YAML tools with JSON-based test syntax
  • Added value_state_representation to distinguish test_case_xml vs test_case_json test formats
  • Set minimum profile of 24.2 for YAML tools so test requests load properly
  • Improved error handling when test requests are unavailable
  • Fixed YAML tool source parsing to not mutate the underlying dict during serialization

State Flow

  1. Tool receives request state
  2. Request validated and converted to internal state
  3. Internal state persisted on Job.tool_state
  4. Internal state converted to runtime state with resolved paths/locations
  5. Runtime state serialized for tool evaluation

Test plan

Unit tests (test/unit/tool_util/test_runtime.py)

  • E2E collection-to-runtime: paired, list, record, sample_sheet, paired_or_unpaired, nested list:paired, deeply nested list:list:paired
  • Invalid structure detection: paired missing reverse
  • Dynamic model factory: leaf type routing, unknown type returns None, correct/wrong inner types, wrong collection_type literal, depth mismatch, JSON schema precision, LRU caching
  • Record-based dynamic models: record:paired accept/reject, empty dict elements
  • Unknown nested segments: list:unknown_type, record:unknown, list:list:unknown all return None

Parameter specification tests (parameter_specification.yml)

  • job_runtime_valid/job_runtime_invalid for all scalar types (int, boolean, float, text, select, genomebuild, directory_uri, hidden, drill_down)
  • job_runtime specs for structural types (conditionals, repeats, sections)
  • Cross-type discrimination: list rejects paired data, paired rejects list data
  • Comma-separated type unions: list:paired,list:list and list,list:paired with valid and invalid runtime cases
  • Collection runtime specs for all leaf types plus nested types

New functional test tools

  • 14 collection type tool definitions covering all leaf types, nested types, and comma-separated type unions
  • 4 scalar/YAML user tool definitions (boolean, data, data_multiple, select_multiple)
  • All registered in sample_tool_conf.xml

Serialization roundtrip test

  • Validates YAML tool source survives to_string() -> re-parse without alteration

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@mvdbeek
Copy link
Copy Markdown
Member

mvdbeek commented Feb 12, 2026

  • Test inputs are flattened from structured JSON to the pipe-delimited format the test framework expects.

This seems very unfortunate? Is that the internal test framework outside of tool_util ? Because I'm certain we don't need to do that in the planemo / tool_util route. The API should also accept this unmodified

Other than that the description reads great, now to actually review it :)

Comment thread lib/galaxy/tools/__init__.py Outdated
@mvdbeek
Copy link
Copy Markdown
Member

mvdbeek commented Feb 12, 2026

OK, i see it's in 4cb4723 ... we only do this for tool tests ? How much would you hate it if you had claude come up with a plan to use the nested state ? I've reviewed all commits and they look great, I wonder if I missed anything but this all makes perfect sense to me.

Comment thread lib/galaxy/model/__init__.py Outdated
Comment thread lib/galaxy/tools/execute.py Outdated
Comment thread lib/galaxy/tool_util/verify/parse.py Outdated
Comment thread test/unit/tool_util/test_parameter_test_cases.py Outdated
Copy link
Copy Markdown
Member

@mvdbeek mvdbeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I think we should merge this!

@mvdbeek
Copy link
Copy Markdown
Member

mvdbeek commented Feb 12, 2026

... but please change the JSONType column in model and migration

@jmchilton
Copy link
Copy Markdown
Member Author

OK, i see it's in 4cb4723 ... we only do this for tool tests ? How much would you hate it if you had claude come up with a plan to use the nested state ? I've reviewed all commits and they look great, I wonder if I missed anything but this all makes perfect sense to me.

I think there is a balance between keeping things in a... natural, consistent format (nested) vs. not forking code between the older paths and the newer paths. It is possible I went to far toward converging things on the existing path through the code if that makes sense? I'm pretty consistently more conservative about these things than you - but I definitely share your desire to not be collapsing this junk into the flat form - I don't think that helps anyone.

Because I'm certain we don't need to do that in the planemo / tool_util route. The API should also accept this unmodified

That is a surprising claim. Does it not all just go through this same code? Are you sure that is how this works? I mean I know planemo can load tool tests from sidecare JSON/YAML file but no one does that right and that path through the code isn't really used? If there is an established path that does this and is working consistently - then I would especially like to see a way to avoid flat paths.

At any rate - eager to see the Claude research or if you have a prompt I can use my tokens and bake it into the branch.

@jmchilton
Copy link
Copy Markdown
Member Author

Thanks for the detailed review - it is a relief that you're onboard with the dynamic collection type models, that work is cool but it is very extra at the same time. Do you think we should do the same thing for the tool tests? I will remove the stray comments and I want to cleanup some fallback code it left in I wasn't super happy about on my review.

@mvdbeek
Copy link
Copy Markdown
Member

mvdbeek commented Feb 12, 2026

I mean I know planemo can load tool tests from sidecare JSON/YAML file but no one does that right and that path through the code isn't really used?

I don't know what I was thinking, yes, we don't use this at all, i think i got that all confused with workflow tests, which are not nested at all. The tool run API does support this though (I think).

jmchilton and others added 19 commits February 12, 2026 15:58
Add job_runtime_valid/job_runtime_invalid test specs for all scalar
types: int, boolean, float, text, select, genomebuild, directory_uri,
hidden, drill_down, and structural types (conditionals, repeats,
sections) containing only scalars.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove duplicate DataCollectionRecordRuntime class definition
- Add explicit branches for record, paired_or_unpaired, sample_sheet
- Include all 5 collection runtime types in fallback union

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- List type rejects paired-shaped data (object elements)
- Paired type rejects list-shaped data (array elements)
- Paired type rejects incomplete data (missing reverse)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use Pydantic discriminated unions to enforce that collection_type
matches element structure:
- Add Literal types to simple models (list, paired, record, etc.)
- Add DataCollectionSampleSheetRuntime as separate model
- Split DataCollectionNestedRuntime into NestedList and NestedRecord
- Add collection_runtime_discriminator() for routing validation
- Create CollectionRuntimeDiscriminated tagged union
- Add structure mismatch tests (list with object elements, etc.)
- Add gx_data_collection_list_or_paired test tool

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add build_collection_model_for_type() that recursively generates precise
Pydantic models for nested collection types (e.g. list:paired, list:list:paired).
Dynamic models use Literal[collection_type] and narrowed elements types so
inner type mismatches and depth mismatches are caught structurally by Pydantic
rather than requiring runtime model_validators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add dict_type() helper to _types.py for symmetric type construction
- Replace CollectionRuntimeT union with DataCollectionInternalJsonBase
  in all runtime.py annotations (dynamic models aren't in the union)
- Consolidate parallel routing tables: both _runtime_model_for_collection_type
  and _validate_collection_runtime_dict now delegate to
  build_collection_model_for_type, single source of truth is
  _LEAF_COLLECTION_MODELS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tool YAMLs + sample_tool_conf.xml entries for record, sample_sheet,
paired_or_unpaired, list:paired, list:list, list:list:paired,
list_or_record, paired_or_pou.

Rebase note: squash into 419f415 (Add cross-type discrimination tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add gx_data_collection_list_paired_or_list_list and
  gx_data_collection_list_or_list_paired tool definitions + specs
- Add job_runtime valid/invalid specs for comma-separated type unions
- Add E2E test for list:list:paired, record:paired dynamic models
- Tighten nested collection assertions to use dynamic model types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rebase note: squash into 52035c6 (YAML tools specify a minimum
profile of 24.2 so test requests load).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- evaluation.py: annotate input_dataset_collections dict with HDCA|DCE union
- runtime.py: refine InpDataCollectionsDictT from dict[str, Any] to proper union
- test_runtime.py: add assert-not-None guards for Optional[Type] returns from
  build_collection_model_for_type, hasattr narrowing for dynamic model elements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New gx_data_collection_any_y tool accepts any collection type (no
collection_type constraint). Specs cover list, paired, list:paired,
and record in job_runtime_valid. Fix parameter_tool_source to handle
_y suffix convention matching functional_test_tool_source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
test_runtime.py imported galaxy.model and galaxy.tools.runtime which
aren't deps of the tool_util package. Split: pure tool_util_models
tests stay in test/unit/tool_util/, collection_to_runtime tests move
to test/unit/app/tools/ where galaxy.model is available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bare list[] subscript requires Python 3.9+. The tool_util package
tests run on 3.8 and fail with TypeError: 'type' object is not
subscriptable. Use typing.List[] which is already imported.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Only caller always provides it; remove Optional + dead NotImplementedError guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- collection_runtime_discriminator: raise ValueError for unknown types
- _runtime_model_for_collection_type: remove nested-type fallback to generic models
- py_type_internal_json: raise on unrecognized declared collection_type
- _validate_collection_runtime_dict: raise instead of trying list/generic models
- Add tests for discriminator routing, unknown type rejection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Binary representation that supports equality operator.
sqlite should work fine with JSON.
@mvdbeek
Copy link
Copy Markdown
Member

mvdbeek commented Feb 13, 2026

Sorry about the JSON suggestion, should be JSONB on postgres. I've pushed one commit to change that and one to fix up the linting, feel free to rebase or drop these.

@jmchilton
Copy link
Copy Markdown
Member Author

Thanks for the fix - that makes sense to me! I'm loving pre-commit since you mentioned it the other day but I am occasionally falling into traps where it is using the wrong black version also. I guess because when I install the hook I have to do it for the whole repository so it is using a fixed version of black and doesn't match the branch. Figuring out how to get that to dynamically dispatch - maybe a wrapper or something - would be great.

@jmchilton jmchilton marked this pull request as ready for review February 13, 2026 14:57
@jmchilton jmchilton changed the title [WIP] Various YAML Tool Hardening and Progress toward Tool State Goals Various YAML Tool Hardening and Progress toward Tool State Goals Feb 13, 2026
@github-actions github-actions Bot added this to the 26.1 milestone Feb 13, 2026
@mvdbeek
Copy link
Copy Markdown
Member

mvdbeek commented Feb 13, 2026

I'm loving pre-commit since you mentioned it the other day but I am occasionally falling into traps where it is using the wrong black version also.

that's interesting, i'm just symlinking the sample config, so as long as we keep the sample config updated we should get the correct version. When the sample file changes I see the new version getting installed, so it should work ...

@mvdbeek mvdbeek merged commit a7086d7 into galaxyproject:dev Feb 13, 2026
60 of 63 checks passed
@github-project-automation github-project-automation Bot moved this from Needs Review to Done in Galaxy Dev - weeklies Feb 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

2 participants