fix(client): allow non-numeric run/task IDs for Argo and Step Functions #3260

Open
dheerenmohta wants to merge 1 commit into Netflix:master from dheerenmohta:fix/client-argo-sfn-run-id-validation

@dheerenmohta

Contributor

Problem

The pathspec validation introduced in #948 uses a numeric-only pattern (^[0-9]+$) for run IDs and task IDs. However, Metaflow run IDs are not always numeric:

  • Argo Workflows produces run IDs like argo-moviestatsflow-68z2h (see argo_workflows_cli.py:965)
  • AWS Step Functions produces run IDs like sfn-my-execution-name (see step_functions_cli.py:549)

As a result, any Client API call such as Run('MyFlow/argo-...') or Step('MyFlow/sfn-.../start') raises a spurious MetaflowInvalidPathspec, which breaks access to every run created through these orchestrators.
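The failure can be reproduced in isolation. The sketch below assumes the numeric-only check is the `^[0-9]+$` pattern quoted above; `is_numeric_id` is a hypothetical helper written for illustration, not a function in Metaflow.

```python
import re

# Numeric-only pattern as quoted in the problem description.
_NUMERIC_ID_PATTERN = re.compile(r"^[0-9]+$")


def is_numeric_id(run_id: str) -> bool:
    """Hypothetical helper: True if run_id passes the numeric-only check."""
    return _NUMERIC_ID_PATTERN.match(run_id) is not None


# Run IDs taken from the examples in this description.
for run_id in (
    "1748291234567890",
    "argo-moviestatsflow-68z2h",
    "sfn-my-execution-name",
):
    status = "accepted" if is_numeric_id(run_id) else "rejected"
    print(f"{run_id}: {status}")
```

Only the first ID passes; both orchestrator-style IDs are rejected before any metadata lookup could take place.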

The original pre-#948 code only checked component count, not content — this PR preserves that correctness while still adding meaningful content validation.

Fix

Replace the strict _NUMERIC_ID_PATTERN check for run/task positions with a permissive _RUN_ID_PATTERN that accepts alphanumeric strings plus hyphens and underscores. This covers all current formats:

  • Local runs: "1748291234567890"
  • Argo: "argo-myflow-abc12"
  • Step Functions: "sfn-prod-execution"

Test

Companion test coverage is added in PR #5 (test branch test/client-pathspec-orchestrator-run-ids).

Fixes #948 (partial — critical correctness bug)

Run IDs are not always numeric. Orchestrators produce prefixed string IDs:
- Argo Workflows: "argo-<workflow-name>" (e.g. "argo-moviestatsflow-68z2h")
- AWS Step Functions: "sfn-<execution-name>"

The numeric-only check would cause Run('FlowName/argo-...') and
Run('FlowName/sfn-...') to raise MetaflowInvalidPathspec, breaking all
users who access runs created via orchestrators through the Client API.

Replace the strict numeric pattern with a permissive alphanumeric pattern
that also allows hyphens and underscores, matching all current ID formats.

Fixes Netflix#948 (partial)
@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Contributor

Greptile Summary

This PR adds content-based validation for run IDs and task IDs when parsing a MetaflowObject pathspec. A new _RUN_ID_PATTERN (^[a-zA-Z0-9][a-zA-Z0-9\-_]*$) is applied to ids[1] (run ID) and ids[3] (task ID) before the metadata lookup is attempted.

  • Introduces _RUN_ID_PATTERN to allow alphanumeric-plus-hyphen/underscore IDs, covering local numeric IDs (1748291234567890), Argo (argo-*), and SFN (sfn-{uuid}) formats.
  • Raises MetaflowInvalidPathspec with a descriptive message when the run ID or task ID contains characters outside the pattern — validation that did not previously exist in the base branch.
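The behaviour summarized above can be sketched roughly as follows. This is not the code in metaflow/client/core.py: `validate_pathspec` and `InvalidPathspec` are hypothetical stand-ins (the latter for MetaflowInvalidPathspec), and the error messages are made up for illustration.

```python
import re

# Pattern as it appears in the diff for this PR.
_RUN_ID_PATTERN = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9\-_]*$')


class InvalidPathspec(Exception):
    """Hypothetical stand-in for MetaflowInvalidPathspec."""


def validate_pathspec(pathspec: str) -> list:
    """Split a pathspec and check the run ID (ids[1]) and task ID (ids[3])."""
    ids = pathspec.split("/")
    # Position 1 is the run ID, when present.
    if len(ids) > 1 and not _RUN_ID_PATTERN.match(ids[1]):
        raise InvalidPathspec(f"Invalid run ID {ids[1]!r} in {pathspec!r}")
    # Position 3 is the task ID, when present.
    if len(ids) > 3 and not _RUN_ID_PATTERN.match(ids[3]):
        raise InvalidPathspec(f"Invalid task ID {ids[3]!r} in {pathspec!r}")
    return ids


print(validate_pathspec("MyFlow/argo-moviestatsflow-68z2h/start/123"))
```

The flow name and step name (positions 0 and 2) are deliberately left unchecked in this sketch, mirroring the summary's statement that only ids[1] and ids[3] are validated.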

Confidence Score: 3/5

The change adds new client-side validation for pathspec run IDs and task IDs where none existed before; while it handles Argo and standard SFN UUIDs correctly, it can incorrectly reject valid SFN run IDs whose execution names contain characters like . (legal per the SFN API but not in the chosen pattern).

The new _RUN_ID_PATTERN covers the common orchestrator formats (numeric local IDs, argo-*, sfn-{uuid}, airflow-{md5}), but the character set is narrower than what AWS SFN execution names legally allow. An EventBridge-triggered execution named prod.2024-01-15 would produce a run ID of sfn-prod.2024-01-15 that the pattern rejects — turning a previously working Run(...) lookup into a hard MetaflowInvalidPathspec error. The fix is straightforward (add . to the pattern or widen it), but the regression is real for users with custom SFN execution names.

metaflow/client/core.py — specifically the _RUN_ID_PATTERN definition and its use in the new validation block.

Important Files Changed

| Filename | Overview |
|---|---|
| metaflow/client/core.py | Adds _RUN_ID_PATTERN and content validation for run/task IDs in pathspec parsing; pattern may be too strict for SFN execution names containing dots or other legal characters not in [a-zA-Z0-9_-] |

Reviews (1): Last reviewed commit: "fix(client): allow non-numeric run/task ..."

Comment thread metaflow/client/core.py
# "sfn-<execution-name>" for AWS Step Functions
# Task IDs follow the same pattern — numeric by default but may be prefixed by
# orchestrators that pass their own identifiers.
_RUN_ID_PATTERN = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9\-_]*$')

P1 Pattern rejects valid SFN run IDs with dots or other legal characters

AWS Step Functions execution names can legally contain . (and other characters like +, @) per the SFN StartExecution API. When a state machine is triggered externally — e.g. by an EventBridge rule with a custom execution name such as prod.2024-01-15 — the run ID stored by Metaflow is sfn-prod.2024-01-15. After this PR, any Run('MyFlow/sfn-prod.2024-01-15') call will raise MetaflowInvalidPathspec, whereas before this PR (where no content validation existed) it would look up the run normally. The original code's correctness guarantee was "only check component count" — this pattern narrows that guarantee without covering the full character set that SFN execution names allow.

Comment thread metaflow/client/core.py
# "sfn-<execution-name>" for AWS Step Functions
# Task IDs follow the same pattern — numeric by default but may be prefixed by
# orchestrators that pass their own identifiers.
_RUN_ID_PATTERN = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9\-_]*$')

P2 Using \- in the middle of a character class works (Python re treats it as a literal hyphen) but is non-idiomatic. The conventional and unambiguous style is to place the hyphen at the end of the class.

Suggested change
- _RUN_ID_PATTERN = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9\-_]*$')
+ _RUN_ID_PATTERN = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$')

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
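To confirm that the suggested spelling is purely stylistic, the sketch below compares the two patterns over every character in `string.printable`, used as the second character of a two-character ID. The variable names are chosen here for illustration.

```python
import re
import string

# Original spelling: escaped hyphen in the middle of the class.
escaped = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9\-_]*$')

# Suggested spelling: unescaped hyphen at the end of the class.
trailing = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$')

# Collect every printable character on which the two patterns disagree.
disagreements = [
    ch
    for ch in string.printable
    if bool(escaped.match("a" + ch)) != bool(trailing.match("a" + ch))
]

print(disagreements)
```

An empty list means the two character classes accept the same printable ASCII characters, so switching spellings does not change which IDs are accepted.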

@codecov

codecov Bot commented Jun 11, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (master@dddc0d5). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| metaflow/client/core.py | 66.66% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##             master    #3260   +/-   ##
=========================================
  Coverage          ?   28.65%           
=========================================
  Files             ?      381           
  Lines             ?    52519           
  Branches          ?     9268           
=========================================
  Hits              ?    15049           
  Misses            ?    36442           
  Partials          ?     1028           


Development

Successfully merging this pull request may close these issues.

Pathspec to create Flow/Run/Step/Task/DataArtifact is not validated
