Various YAML Tool Hardening and Progress toward Tool State Goals by jmchilton · Pull Request #21828 · galaxyproject/galaxy

jmchilton · 2026-02-11T22:08:02Z

Extracted from #17393.

@mvdbeek this is probably too much to review in this format but it is only growing. Any ideas about stuff you'd like to see pulled out and reviewed in isolation?

I think the biggest changes to review are probably the parameter_specification.yml file and runtime.py / evaluation.py

AI Generated Summary:

Structured tool state: typed collection runtime models and YAML tool hardening

Summary

Typed collection runtime models: Replace untyped Dict[str, DataInternalJson] collection representations with a hierarchy of Pydantic models (DataCollectionPairedRuntime, DataCollectionListRuntime, etc.) using discriminated unions routed by collection_type. A dynamic model factory (build_collection_model_for_type) generates precise models for nested types like list:paired with exact inner type validation.
Persist validated tool state on Job: Add Job.tool_state column (with Alembic migration) to store Pydantic-validated internal tool state at execution time, enabling the new runtimeify path that converts internal state to CWL-style runtime representations without the legacy to_cwl function.
YAML tool test case validation: YAML tools now validate test inputs against Pydantic parameter models at parse time via a new test_case_json state representation. Test inputs are flattened from structured JSON to the pipe-delimited format the test framework expects.
Collection inputs for YAML tools: Wire collection input support (data_collection type) through the YAML tool parser, including output collections, dce source type for subcollection mapping, and comma-separated collection_type (e.g., list,paired).
New state representations: Add job_runtime and test_case_json to the state representation enum with full parameter specification coverage for all scalar and collection types.

Details

Typed Collection Runtime Models (`lib/galaxy/tool_util_models/parameters.py`)

The old representation used raw dicts (Dict[str, DataInternalJson], DataCollectionPaired). This branch introduces:

DataCollectionInternalJsonBase -- base model with class, name, collection_type, tags, and optional metadata fields (column_definitions, fields, has_single_item, columns)
Leaf models: DataCollectionListRuntime, DataCollectionPairedRuntime, DataCollectionRecordRuntime, DataCollectionPairedOrUnpairedRuntime, DataCollectionSampleSheetRuntime
Nested models: DataCollectionNestedListRuntime (list-like outer), DataCollectionNestedRecordRuntime (record-like outer) with recursive element unions
CollectionRuntimeDiscriminated -- a Discriminator-based union routing by collection_type pattern
build_collection_model_for_type() -- LRU-cached factory that recursively builds create_model() types with Literal[collection_type] and narrowed elements types. For list:paired, elements is List[DataCollectionPairedRuntime] (not a generic union).
DataCollectionElementInternalJson -- extends DataInternalJson with element_identifier and optional columns
DataCollectionPairedElements -- typed model requiring exactly forward and reverse fields

DataCollectionParameterModel.py_type_internal_json now routes through _runtime_model_for_collection_type(), handling single types, comma-separated types (discriminated union subsets), and unknown types (full discriminated union fallback).

Collection-to-Runtime Conversion (`lib/galaxy/tools/runtime.py`)

New module implementing the collection_to_runtime() pipeline:

setup_for_runtimeify() builds lookup dicts for HDAs, HDCAs, and DCEs from job inputs, returns adapt_dataset and adapt_collection callbacks
_build_collection_runtime_dict() recursively converts DatasetCollection -> raw dict, distinguishing list-like (array elements) from record-like (object elements) via is_list_like()
_validate_collection_runtime_dict() validates raw dict through build_collection_model_for_type() with fallback to generic nested models
Special metadata preserved: column_definitions (sample_sheet), fields (record), has_single_item (paired_or_unpaired), columns (sample_sheet elements)

Dynamic Model Factory for Recursive Collection Types

build_collection_model_for_type() recursively generates precise Pydantic models for nested collection types (e.g. list:paired, list:list:paired)
Dynamic models use Literal[collection_type] and narrowed element types -- inner type mismatches and depth mismatches are caught structurally by Pydantic
_LEAF_COLLECTION_MODELS is the single source of truth for leaf type routing
_collection_type_discriminator returns full collection_type strings for routing comma-separated type unions

Job Tool State Persistence

Job.tool_state column added (JSONType, nullable)
Alembic migration: b4d6191307a5
ExecutionSlice carries validated_param_combination
UserToolEvaluator.build_param_dict() uses runtimeify when job.tool_state is present, falls back to legacy to_cwl with deprecation log

YAML Tool Enhancements

Allow YAML tools to produce output collections
Allow collection inputs to YAML tools with JSON-based test syntax
Added value_state_representation to distinguish test_case_xml vs test_case_json test formats
Set minimum profile of 24.2 for YAML tools so test requests load properly
Improved error handling when test requests are unavailable
Fixed YAML tool source parsing to not mutate the underlying dict during serialization

State Flow

Tool receives request state
Request validated and converted to internal state
Internal state persisted on Job.tool_state
Internal state converted to runtime state with resolved paths/locations
Runtime state serialized for tool evaluation

Test plan

Unit tests (`test/unit/tool_util/test_runtime.py`)

E2E collection-to-runtime: paired, list, record, sample_sheet, paired_or_unpaired, nested list:paired, deeply nested list:list:paired
Invalid structure detection: paired missing reverse
Dynamic model factory: leaf type routing, unknown type returns None, correct/wrong inner types, wrong collection_type literal, depth mismatch, JSON schema precision, LRU caching
Record-based dynamic models: record:paired accept/reject, empty dict elements
Unknown nested segments: list:unknown_type, record:unknown, list:list:unknown all return None

Parameter specification tests (`parameter_specification.yml`)

job_runtime_valid/job_runtime_invalid for all scalar types (int, boolean, float, text, select, genomebuild, directory_uri, hidden, drill_down)
job_runtime specs for structural types (conditionals, repeats, sections)
Cross-type discrimination: list rejects paired data, paired rejects list data
Comma-separated type unions: list:paired,list:list and list,list:paired with valid and invalid runtime cases
Collection runtime specs for all leaf types plus nested types

New functional test tools

14 collection type tool definitions covering all leaf types, nested types, and comma-separated type unions
4 scalar/YAML user tool definitions (boolean, data, data_multiple, select_multiple)
All registered in sample_tool_conf.xml

Serialization roundtrip test

Validates YAML tool source survives to_string() -> re-parse without alteration

How to test the changes?

(Select all options that apply)

I've included appropriate automated tests.
This is a refactoring of components with existing test coverage.
Instructions for manual testing are as follows:
1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

mvdbeek · 2026-02-12T07:41:17Z

Test inputs are flattened from structured JSON to the pipe-delimited format the test framework expects.

This seems very unfortunate? Is that the internal test framework outside of tool_util ? Because I'm certain we don't need to do that in the planemo / tool_util route. The API should also accept this unmodified

Other than that the description reads great, now to actually review it :)

mvdbeek · 2026-02-12T07:58:16Z

OK, i see it's in 4cb4723 ... we only do this for tool tests ? How much would you hate it if you had claude come up with a plan to use the nested state ? I've reviewed all commits and they look great, I wonder if I missed anything but this all makes perfect sense to me.

mvdbeek

Awesome, I think we should merge this!

mvdbeek · 2026-02-12T08:07:00Z

... but please change the JSONType column in model and migration

jmchilton · 2026-02-12T14:53:47Z

OK, i see it's in 4cb4723 ... we only do this for tool tests ? How much would you hate it if you had claude come up with a plan to use the nested state ? I've reviewed all commits and they look great, I wonder if I missed anything but this all makes perfect sense to me.

I think there is a balance between keeping things in a... natural, consistent format (nested) vs. not forking code between the older paths and the newer paths. It is possible I went to far toward converging things on the existing path through the code if that makes sense? I'm pretty consistently more conservative about these things than you - but I definitely share your desire to not be collapsing this junk into the flat form - I don't think that helps anyone.

Because I'm certain we don't need to do that in the planemo / tool_util route. The API should also accept this unmodified

That is a surprising claim. Does it not all just go through this same code? Are you sure that is how this works? I mean I know planemo can load tool tests from sidecare JSON/YAML file but no one does that right and that path through the code isn't really used? If there is an established path that does this and is working consistently - then I would especially like to see a way to avoid flat paths.

At any rate - eager to see the Claude research or if you have a prompt I can use my tokens and bake it into the branch.

jmchilton · 2026-02-12T15:05:40Z

Thanks for the detailed review - it is a relief that you're onboard with the dynamic collection type models, that work is cool but it is very extra at the same time. Do you think we should do the same thing for the tool tests? I will remove the stray comments and I want to cleanup some fallback code it left in I wasn't super happy about on my review.

Allow passing pre-parsed parameters to parse_tool_test_descriptions() to avoid re-calling input_models_for_tool_source() when the caller already has the parameters. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…lization

mvdbeek · 2026-02-12T16:19:50Z

I mean I know planemo can load tool tests from sidecare JSON/YAML file but no one does that right and that path through the code isn't really used?

I don't know what I was thinking, yes, we don't use this at all, i think i got that all confused with workflow tests, which are not nested at all. The tool run API does support this though (I think).

…seem to be working?

Add job_runtime_valid/job_runtime_invalid test specs for all scalar types: int, boolean, float, text, select, genomebuild, directory_uri, hidden, drill_down, and structural types (conditionals, repeats, sections) containing only scalars. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove duplicate DataCollectionRecordRuntime class definition - Add explicit branches for record, paired_or_unpaired, sample_sheet - Include all 5 collection runtime types in fallback union 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- List type rejects paired-shaped data (object elements) - Paired type rejects list-shaped data (array elements) - Paired type rejects incomplete data (missing reverse) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Use Pydantic discriminated unions to enforce that collection_type matches element structure: - Add Literal types to simple models (list, paired, record, etc.) - Add DataCollectionSampleSheetRuntime as separate model - Split DataCollectionNestedRuntime into NestedList and NestedRecord - Add collection_runtime_discriminator() for routing validation - Create CollectionRuntimeDiscriminated tagged union - Add structure mismatch tests (list with object elements, etc.) - Add gx_data_collection_list_or_paired test tool 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add build_collection_model_for_type() that recursively generates precise Pydantic models for nested collection types (e.g. list:paired, list:list:paired). Dynamic models use Literal[collection_type] and narrowed elements types so inner type mismatches and depth mismatches are caught structurally by Pydantic rather than requiring runtime model_validators. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add dict_type() helper to _types.py for symmetric type construction - Replace CollectionRuntimeT union with DataCollectionInternalJsonBase in all runtime.py annotations (dynamic models aren't in the union) - Consolidate parallel routing tables: both _runtime_model_for_collection_type and _validate_collection_runtime_dict now delegate to build_collection_model_for_type, single source of truth is _LEAF_COLLECTION_MODELS Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tool YAMLs + sample_tool_conf.xml entries for record, sample_sheet, paired_or_unpaired, list:paired, list:list, list:list:paired, list_or_record, paired_or_pou. Rebase note: squash into 419f415 (Add cross-type discrimination tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add gx_data_collection_list_paired_or_list_list and gx_data_collection_list_or_list_paired tool definitions + specs - Add job_runtime valid/invalid specs for comma-separated type unions - Add E2E test for list:list:paired, record:paired dynamic models - Tighten nested collection assertions to use dynamic model types Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Rebase note: squash into 52035c6 (YAML tools specify a minimum profile of 24.2 so test requests load). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- evaluation.py: annotate input_dataset_collections dict with HDCA|DCE union - runtime.py: refine InpDataCollectionsDictT from dict[str, Any] to proper union - test_runtime.py: add assert-not-None guards for Optional[Type] returns from build_collection_model_for_type, hasattr narrowing for dynamic model elements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

New gx_data_collection_any_y tool accepts any collection type (no collection_type constraint). Specs cover list, paired, list:paired, and record in job_runtime_valid. Fix parameter_tool_source to handle _y suffix convention matching functional_test_tool_source. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test_runtime.py imported galaxy.model and galaxy.tools.runtime which aren't deps of the tool_util package. Split: pure tool_util_models tests stay in test/unit/tool_util/, collection_to_runtime tests move to test/unit/app/tools/ where galaxy.model is available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Bare list[] subscript requires Python 3.9+. The tool_util package tests run on 3.8 and fail with TypeError: 'type' object is not subscriptable. Use typing.List[] which is already imported. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Only caller always provides it; remove Optional + dead NotImplementedError guard. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- collection_runtime_discriminator: raise ValueError for unknown types - _runtime_model_for_collection_type: remove nested-type fallback to generic models - py_type_internal_json: raise on unrecognized declared collection_type - _validate_collection_runtime_dict: raise instead of trying list/generic models - Add tests for discriminator routing, unknown type rejection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Binary representation that supports equality operator. sqlite should work fine with JSON.

mvdbeek · 2026-02-13T11:01:42Z

Sorry about the JSON suggestion, should be JSONB on postgres. I've pushed one commit to change that and one to fix up the linting, feel free to rebase or drop these.

jmchilton · 2026-02-13T14:57:15Z

Thanks for the fix - that makes sense to me! I'm loving pre-commit since you mentioned it the other day but I am occasionally falling into traps where it is using the wrong black version also. I guess because when I install the hook I have to do it for the whole repository so it is using a fixed version of black and doesn't match the branch. Figuring out how to get that to dynamically dispatch - maybe a wrapper or something - would be great.

mvdbeek · 2026-02-13T15:24:14Z

I'm loving pre-commit since you mentioned it the other day but I am occasionally falling into traps where it is using the wrong black version also.

that's interesting, i'm just symlinking the sample config, so as long as we keep the sample config updated we should get the correct version. When the sample file changes I see the new version getting installed, so it should work ...

jmchilton added kind/bug kind/enhancement area/tool-framework labels Feb 11, 2026

github-project-automation Bot added this to Galaxy Dev - weeklies Feb 11, 2026

github-project-automation Bot moved this to Needs Review in Galaxy Dev - weeklies Feb 11, 2026