refactor(client): consolidate three identical pathspec identifier patterns into one #3263
Fixes Netflix#948

Previously, pathspec validation only checked the number of components (e.g., "FlowName/RunID" has 2 parts). This caused confusing errors when users made typos or used invalid characters, since the error would come later from the metadata provider rather than at creation time.

This change adds proper format validation:
- Flow/Step/Artifact names must be valid identifiers (start with letter/underscore)
- Run IDs and Task IDs must be numeric
- Empty components are rejected (e.g., "Flow//Step")
- Leading/trailing slashes are rejected

The validation provides clear, actionable error messages that tell users exactly what's wrong with their pathspec. For example:
- "Invalid flow name '123Flow'. Flow names must start with a letter..."
- "Invalid run ID 'abc'. Run IDs must be numeric."

Added comprehensive unit tests covering valid and invalid cases.
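The rules above can be sketched as a single positional check. This is a minimal, hypothetical sketch, not Metaflow's implementation: `validate_pathspec`, `_COMPONENT_RULES`, and the `InvalidPathspec` stand-in for `MetaflowInvalidPathspec` are illustrative names, the Flow/Run/Step/Task/Artifact component order follows the `idx` checks in the diff, and the `[0-9]+` run/task-ID expression is an assumption.

```python
import re

# Identifier expression quoted in this PR; the numeric expression is an assumption.
_IDENTIFIER_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")
_NUMERIC_ID_PATTERN = re.compile(r"^[0-9]+$")

_IDENT_RULE = "Must start with a letter or underscore and contain only letters, digits, and underscores."

# Hypothetical table: position in "Flow/Run/Step/Task/Artifact" -> (label, pattern, rule text).
_COMPONENT_RULES = [
    ("flow name", _IDENTIFIER_PATTERN, _IDENT_RULE),
    ("run ID", _NUMERIC_ID_PATTERN, "Run IDs must be numeric."),
    ("step name", _IDENTIFIER_PATTERN, _IDENT_RULE),
    ("task ID", _NUMERIC_ID_PATTERN, "Task IDs must be numeric."),
    ("artifact name", _IDENTIFIER_PATTERN, _IDENT_RULE),
]


class InvalidPathspec(ValueError):
    """Stand-in for MetaflowInvalidPathspec so this sketch runs without Metaflow."""


def validate_pathspec(pathspec: str) -> list:
    """Split a pathspec and check each component against its positional rule."""
    if pathspec.startswith("/") or pathspec.endswith("/"):
        raise InvalidPathspec(f"Pathspec '{pathspec}' must not start or end with '/'.")
    components = pathspec.split("/")
    if len(components) > len(_COMPONENT_RULES):
        raise InvalidPathspec(f"Pathspec '{pathspec}' has too many components.")
    for idx, component in enumerate(components):
        label, pattern, rule = _COMPONENT_RULES[idx]
        if not component:
            # Catches empty segments such as the middle of "Flow//Step".
            raise InvalidPathspec(f"Empty {label} in pathspec '{pathspec}'.")
        if not pattern.match(component):
            raise InvalidPathspec(f"Invalid {label} '{component}'. {rule}")
    return components
```

Driving every position from one table keeps each rule in a single place, which is the same maintenance concern this PR addresses.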
… into one

`_FLOW_NAME_PATTERN`, `_STEP_NAME_PATTERN`, and `_ARTIFACT_NAME_PATTERN` were all defined with the exact same expression (`^[a-zA-Z_][a-zA-Z0-9_]*$`). Having three separately named copies creates maintenance risk: a future change to one pattern is easy to miss in the other two.

Replace all three with a single `_IDENTIFIER_PATTERN` constant, keeping `_NUMERIC_ID_PATTERN` separate since run/task IDs have a distinct rule.

No behaviour change.
Greptile Summary
This PR refactors pathspec validation in …
Confidence Score: 2/5
Not safe to merge: the numeric-only run-ID check silently breaks Argo Workflows and Step Functions users who access runs via the client API. The new …
The run-ID and task-ID validation in …
Reviews (1): Last reviewed commit: "refactor(client): consolidate three iden..."
```python
    if not _NUMERIC_ID_PATTERN.match(component):
        raise MetaflowInvalidPathspec(
            f"Invalid run ID '{component}'. Run IDs must be numeric."
        )
elif idx == 2:  # Step name
```
`_NUMERIC_ID_PATTERN` breaks Argo Workflows and Step Functions run IDs
Metaflow has two production-grade backends that assign non-numeric run IDs: Argo Workflows uses `argo-<workflow-name>` (e.g. `"argo-myflow-1a2b3c"`) and AWS Step Functions uses `sfn-<execution-id>` (e.g. `"sfn-abc123"`). After this change, any call to `Run("MyFlow/argo-myflow-1a2b3c")` will immediately raise `MetaflowInvalidPathspec("Invalid run ID … Run IDs must be numeric")` before the metadata service is even queried, completely preventing access to runs produced by those schedulers.
`register_run_id` and `register_task_id` in `local.py` both explicitly handle non-integer IDs via a `try: int(…) except ValueError` branch, which documents that arbitrary-string IDs are a first-class, intended format. The `_NUMERIC_ID_PATTERN` validation contradicts that contract. Either the run-ID and task-ID positions should be left unvalidated beyond the length/format checks, or the pattern must be broadened to accept any non-empty, non-whitespace token.
Codecov Report
❌ Patch coverage is …
```diff
@@           Coverage Diff            @@
##             master    #3263   +/-   ##
=========================================
  Coverage          ?   29.04%
=========================================
  Files             ?      381
  Lines             ?    52537
  Branches          ?     9277
=========================================
  Hits              ?    15258
  Misses            ?    36252
  Partials          ?     1027
```
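As a consistency check on the Codecov table, the headline percentage can be recomputed from the line counts. This assumes Codecov's coverage ratio is hits divided by the sum of hits, misses, and partials (so partially covered lines count against the total), which is our reading of its documentation rather than something stated in this PR.

```python
# Line counts copied from the Codecov report for this PR.
hits, misses, partials = 15258, 36252, 1027
lines = 52537

# Every tracked line falls into exactly one bucket.
assert hits + misses + partials == lines

# Partially covered lines count toward the denominator but not the numerator.
coverage = 100 * hits / (hits + misses + partials)
print(f"{coverage:.2f}%")  # 29.04%
```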
Problem
The pathspec validation in #948 defines three separate module-level constants (`_FLOW_NAME_PATTERN`, `_STEP_NAME_PATTERN`, and `_ARTIFACT_NAME_PATTERN`) with the identical regex `^[a-zA-Z_][a-zA-Z0-9_]*$`.
Duplicating the same pattern under different names creates maintenance risk: a future change to the identifier syntax would need to be applied in three places, and it's easy to miss one.
Fix
Replace all three with a single `_IDENTIFIER_PATTERN` constant and update the three call sites. `_NUMERIC_ID_PATTERN` is kept separate since run/task IDs have a distinct rule.

No behaviour change.