Various YAML Tool Hardening and Progress toward Tool State Goals#21828
Various YAML Tool Hardening and Progress toward Tool State Goals#21828mvdbeek merged 45 commits intogalaxyproject:devfrom
Conversation
This seems very unfortunate? Is that the internal test framework outside of tool_util ? Because I'm certain we don't need to do that in the planemo / tool_util route. The API should also accept this unmodified Other than that the description reads great, now to actually review it :) |
|
OK, i see it's in 4cb4723 ... we only do this for tool tests ? How much would you hate it if you had claude come up with a plan to use the nested state ? I've reviewed all commits and they look great, I wonder if I missed anything but this all makes perfect sense to me. |
mvdbeek
left a comment
There was a problem hiding this comment.
Awesome, I think we should merge this!
|
... but please change the JSONType column in model and migration |
I think there is a balance between keeping things in a... natural, consistent format (nested) vs. not forking code between the older paths and the newer paths. It is possible I went to far toward converging things on the existing path through the code if that makes sense? I'm pretty consistently more conservative about these things than you - but I definitely share your desire to not be collapsing this junk into the flat form - I don't think that helps anyone.
That is a surprising claim. Does it not all just go through this same code? Are you sure that is how this works? I mean I know planemo can load tool tests from sidecare JSON/YAML file but no one does that right and that path through the code isn't really used? If there is an established path that does this and is working consistently - then I would especially like to see a way to avoid flat paths. At any rate - eager to see the Claude research or if you have a prompt I can use my tokens and bake it into the branch. |
|
Thanks for the detailed review - it is a relief that you're onboard with the dynamic collection type models, that work is cool but it is very extra at the same time. Do you think we should do the same thing for the tool tests? I will remove the stray comments and I want to cleanup some fallback code it left in I wasn't super happy about on my review. |
Allow passing pre-parsed parameters to parse_tool_test_descriptions() to avoid re-calling input_models_for_tool_source() when the caller already has the parameters. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
I don't know what I was thinking, yes, we don't use this at all, i think i got that all confused with workflow tests, which are not nested at all. The tool run API does support this though (I think). |
…seem to be working?
Add job_runtime_valid/job_runtime_invalid test specs for all scalar types: int, boolean, float, text, select, genomebuild, directory_uri, hidden, drill_down, and structural types (conditionals, repeats, sections) containing only scalars. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove duplicate DataCollectionRecordRuntime class definition - Add explicit branches for record, paired_or_unpaired, sample_sheet - Include all 5 collection runtime types in fallback union 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- List type rejects paired-shaped data (object elements) - Paired type rejects list-shaped data (array elements) - Paired type rejects incomplete data (missing reverse) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use Pydantic discriminated unions to enforce that collection_type matches element structure: - Add Literal types to simple models (list, paired, record, etc.) - Add DataCollectionSampleSheetRuntime as separate model - Split DataCollectionNestedRuntime into NestedList and NestedRecord - Add collection_runtime_discriminator() for routing validation - Create CollectionRuntimeDiscriminated tagged union - Add structure mismatch tests (list with object elements, etc.) - Add gx_data_collection_list_or_paired test tool 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add build_collection_model_for_type() that recursively generates precise Pydantic models for nested collection types (e.g. list:paired, list:list:paired). Dynamic models use Literal[collection_type] and narrowed elements types so inner type mismatches and depth mismatches are caught structurally by Pydantic rather than requiring runtime model_validators. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add dict_type() helper to _types.py for symmetric type construction - Replace CollectionRuntimeT union with DataCollectionInternalJsonBase in all runtime.py annotations (dynamic models aren't in the union) - Consolidate parallel routing tables: both _runtime_model_for_collection_type and _validate_collection_runtime_dict now delegate to build_collection_model_for_type, single source of truth is _LEAF_COLLECTION_MODELS Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tool YAMLs + sample_tool_conf.xml entries for record, sample_sheet, paired_or_unpaired, list:paired, list:list, list:list:paired, list_or_record, paired_or_pou. Rebase note: squash into 419f415 (Add cross-type discrimination tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add gx_data_collection_list_paired_or_list_list and gx_data_collection_list_or_list_paired tool definitions + specs - Add job_runtime valid/invalid specs for comma-separated type unions - Add E2E test for list:list:paired, record:paired dynamic models - Tighten nested collection assertions to use dynamic model types Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rebase note: squash into 52035c6 (YAML tools specify a minimum profile of 24.2 so test requests load). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- evaluation.py: annotate input_dataset_collections dict with HDCA|DCE union - runtime.py: refine InpDataCollectionsDictT from dict[str, Any] to proper union - test_runtime.py: add assert-not-None guards for Optional[Type] returns from build_collection_model_for_type, hasattr narrowing for dynamic model elements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New gx_data_collection_any_y tool accepts any collection type (no collection_type constraint). Specs cover list, paired, list:paired, and record in job_runtime_valid. Fix parameter_tool_source to handle _y suffix convention matching functional_test_tool_source. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
test_runtime.py imported galaxy.model and galaxy.tools.runtime which aren't deps of the tool_util package. Split: pure tool_util_models tests stay in test/unit/tool_util/, collection_to_runtime tests move to test/unit/app/tools/ where galaxy.model is available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bare list[] subscript requires Python 3.9+. The tool_util package tests run on 3.8 and fail with TypeError: 'type' object is not subscriptable. Use typing.List[] which is already imported. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Only caller always provides it; remove Optional + dead NotImplementedError guard. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- collection_runtime_discriminator: raise ValueError for unknown types - _runtime_model_for_collection_type: remove nested-type fallback to generic models - py_type_internal_json: raise on unrecognized declared collection_type - _validate_collection_runtime_dict: raise instead of trying list/generic models - Add tests for discriminator routing, unknown type rejection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
607b44e to
aa67790
Compare
Binary representation that supports equality operator. sqlite should work fine with JSON.
|
Sorry about the JSON suggestion, should be JSONB on postgres. I've pushed one commit to change that and one to fix up the linting, feel free to rebase or drop these. |
|
Thanks for the fix - that makes sense to me! I'm loving pre-commit since you mentioned it the other day but I am occasionally falling into traps where it is using the wrong black version also. I guess because when I install the hook I have to do it for the whole repository so it is using a fixed version of black and doesn't match the branch. Figuring out how to get that to dynamically dispatch - maybe a wrapper or something - would be great. |
that's interesting, i'm just symlinking the sample config, so as long as we keep the sample config updated we should get the correct version. When the sample file changes I see the new version getting installed, so it should work ... |
Extracted from #17393.
@mvdbeek this is probably too much to review in this format but it is only growing. Any ideas about stuff you'd like to see pulled out and reviewed in isolation?
I think the biggest changes to review are probably the parameter_specification.yml file and runtime.py / evaluation.py
AI Generated Summary:
Structured tool state: typed collection runtime models and YAML tool hardening
Summary
Dict[str, DataInternalJson]collection representations with a hierarchy of Pydantic models (DataCollectionPairedRuntime,DataCollectionListRuntime, etc.) using discriminated unions routed bycollection_type. A dynamic model factory (build_collection_model_for_type) generates precise models for nested types likelist:pairedwith exact inner type validation.Job.tool_statecolumn (with Alembic migration) to store Pydantic-validated internal tool state at execution time, enabling the newruntimeifypath that converts internal state to CWL-style runtime representations without the legacyto_cwlfunction.test_case_jsonstate representation. Test inputs are flattened from structured JSON to the pipe-delimited format the test framework expects.data_collectiontype) through the YAML tool parser, including output collections,dcesource type for subcollection mapping, and comma-separatedcollection_type(e.g.,list,paired).job_runtimeandtest_case_jsonto the state representation enum with full parameter specification coverage for all scalar and collection types.Details
Typed Collection Runtime Models (
lib/galaxy/tool_util_models/parameters.py)The old representation used raw dicts (
Dict[str, DataInternalJson],DataCollectionPaired). This branch introduces:DataCollectionInternalJsonBase-- base model withclass,name,collection_type,tags, and optional metadata fields (column_definitions,fields,has_single_item,columns)DataCollectionListRuntime,DataCollectionPairedRuntime,DataCollectionRecordRuntime,DataCollectionPairedOrUnpairedRuntime,DataCollectionSampleSheetRuntimeDataCollectionNestedListRuntime(list-like outer),DataCollectionNestedRecordRuntime(record-like outer) with recursive element unionsCollectionRuntimeDiscriminated-- aDiscriminator-based union routing bycollection_typepatternbuild_collection_model_for_type()-- LRU-cached factory that recursively buildscreate_model()types withLiteral[collection_type]and narrowedelementstypes. Forlist:paired, elements isList[DataCollectionPairedRuntime](not a generic union).DataCollectionElementInternalJson-- extendsDataInternalJsonwithelement_identifierand optionalcolumnsDataCollectionPairedElements-- typed model requiring exactlyforwardandreversefieldsDataCollectionParameterModel.py_type_internal_jsonnow routes through_runtime_model_for_collection_type(), handling single types, comma-separated types (discriminated union subsets), and unknown types (full discriminated union fallback).Collection-to-Runtime Conversion (
lib/galaxy/tools/runtime.py)New module implementing the
collection_to_runtime()pipeline:setup_for_runtimeify()builds lookup dicts for HDAs, HDCAs, and DCEs from job inputs, returnsadapt_datasetandadapt_collectioncallbacks_build_collection_runtime_dict()recursively convertsDatasetCollection-> raw dict, distinguishing list-like (array elements) from record-like (object elements) viais_list_like()_validate_collection_runtime_dict()validates raw dict throughbuild_collection_model_for_type()with fallback to generic nested modelscolumn_definitions(sample_sheet),fields(record),has_single_item(paired_or_unpaired),columns(sample_sheet elements)Dynamic Model Factory for Recursive Collection Types
build_collection_model_for_type()recursively generates precise Pydantic models for nested collection types (e.g.list:paired,list:list:paired)Literal[collection_type]and narrowed element types -- inner type mismatches and depth mismatches are caught structurally by Pydantic_LEAF_COLLECTION_MODELSis the single source of truth for leaf type routing_collection_type_discriminatorreturns fullcollection_typestrings for routing comma-separated type unionsJob Tool State Persistence
Job.tool_statecolumn added (JSONType, nullable)b4d6191307a5ExecutionSlicecarriesvalidated_param_combinationUserToolEvaluator.build_param_dict()usesruntimeifywhenjob.tool_stateis present, falls back to legacyto_cwlwith deprecation logYAML Tool Enhancements
value_state_representationto distinguishtest_case_xmlvstest_case_jsontest formatsState Flow
Job.tool_stateTest plan
Unit tests (
test/unit/tool_util/test_runtime.py)list:unknown_type,record:unknown,list:list:unknownall return NoneParameter specification tests (
parameter_specification.yml)job_runtime_valid/job_runtime_invalidfor all scalar types (int, boolean, float, text, select, genomebuild, directory_uri, hidden, drill_down)job_runtimespecs for structural types (conditionals, repeats, sections)list:paired,list:listandlist,list:pairedwith valid and invalid runtime casesNew functional test tools
sample_tool_conf.xmlSerialization roundtrip test
to_string()-> re-parse without alterationHow to test the changes?
(Select all options that apply)
License