Skip to content

refactor(client): consolidate three identical pathspec identifier patterns into one#3263

Open
dheerenmohta wants to merge 2 commits into
Netflix:masterfrom
dheerenmohta:refactor/client-consolidate-pathspec-regex-patterns
Open

refactor(client): consolidate three identical pathspec identifier patterns into one#3263
dheerenmohta wants to merge 2 commits into
Netflix:masterfrom
dheerenmohta:refactor/client-consolidate-pathspec-regex-patterns

Conversation

@dheerenmohta

Copy link
Copy Markdown
Contributor

Problem

The pathspec validation in #948 defines three separate module-level constants with identical regex expressions:

_FLOW_NAME_PATTERN   = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')
_STEP_NAME_PATTERN   = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')
_ARTIFACT_NAME_PATTERN = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')

Duplicating the same pattern under different names creates maintenance risk — a future change to the identifier syntax would need to be applied in three places, and it's easy to miss one.

Fix

Replace all three with a single _IDENTIFIER_PATTERN constant and update the three call-sites:

# Before
_FLOW_NAME_PATTERN = _STEP_NAME_PATTERN = _ARTIFACT_NAME_PATTERN = re.compile(...)

# After
_IDENTIFIER_PATTERN = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')

_NUMERIC_ID_PATTERN is kept separate since run/task IDs have a distinct rule.

No behaviour change.

Fixes Netflix#948

Previously, pathspec validation only checked the number of components
(e.g., "FlowName/RunID" has 2 parts). This caused confusing errors when
users made typos or used invalid characters, since the error would come
later from the metadata provider rather than at creation time.

This change adds proper format validation:
- Flow/Step/Artifact names must be valid identifiers (start with letter/underscore)
- Run IDs and Task IDs must be numeric
- Empty components are rejected (e.g., "Flow//Step")
- Leading/trailing slashes are rejected

The validation provides clear, actionable error messages that tell users
exactly what's wrong with their pathspec. For example:
  - "Invalid flow name '123Flow'. Flow names must start with a letter..."
  - "Invalid run ID 'abc'. Run IDs must be numeric."

Added comprehensive unit tests covering valid and invalid cases.
… into one

_FLOW_NAME_PATTERN, _STEP_NAME_PATTERN, and _ARTIFACT_NAME_PATTERN were all
defined with the exact same expression (^[a-zA-Z_][a-zA-Z0-9_]*$).
Having three separately-named copies creates maintenance risk: a future
change to one pattern name is unlikely to be applied to the others.

Replace all three with a single _IDENTIFIER_PATTERN constant, keeping
_NUMERIC_ID_PATTERN separate since run/task IDs have a distinct rule.
No behaviour change.
@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR refactors pathspec validation in MetaflowObject.__init__ from simple component-count checks into a consolidated _validate_pathspec_format static method, using a single _IDENTIFIER_PATTERN regex for flow/step/artifact names and a separate _NUMERIC_ID_PATTERN for run and task IDs. Despite the "No behaviour change" claim in the description, the change is strictly more restrictive than what existed on the base branch.

  • The new _NUMERIC_ID_PATTERN = re.compile(r'^[0-9]+$') rejects any non-numeric run or task ID, but both the Argo Workflows backend (argo-<name>) and the AWS Step Functions backend (sfn-<id>) store runs under non-numeric, prefixed IDs. This would cause Run(\"MyFlow/argo-…\") to raise MetaflowInvalidPathspec before the metadata service is ever contacted.
  • local.py's register_run_id and register_task_id explicitly handle non-integer IDs via try: int(…) / except ValueError, confirming that string-format IDs are an intentional, documented part of the contract that the new validation silently violates.
  • The test suite is comprehensive for purely numeric run/task-ID paths but does not exercise the argo- or sfn- formats.

Confidence Score: 2/5

Not safe to merge: the numeric-only run-ID check silently breaks Argo Workflows and Step Functions users who access runs via the client API.

The new _NUMERIC_ID_PATTERN rejects the argo-* and sfn-* run IDs that are registered and expected by two first-party backends. Any user of those schedulers who calls Run(...) or Step(...) with a non-numeric run ID would get an immediate MetaflowInvalidPathspec instead of a successful metadata lookup, a regression that did not exist on the base branch.

The run-ID and task-ID validation in metaflow/client/core.py (the _NUMERIC_ID_PATTERN block inside _validate_pathspec_format) needs revision before merging.

Important Files Changed

Filename Overview
metaflow/client/core.py Adds _IDENTIFIER_PATTERN / _NUMERIC_ID_PATTERN constants and a _validate_pathspec_format static method that refactors simple component-count checks into per-component regex validation; the numeric-only run-ID / task-ID check incorrectly rejects the argo-* and sfn-* IDs used by Argo Workflows and AWS Step Functions.
test/unit/test_pathspec_validation.py New unit test suite covering the validation helper with valid/invalid cases for all five object types; does not cover non-numeric run-ID formats produced by Argo/SFN backends.

Reviews (1): Last reviewed commit: "refactor(client): consolidate three iden..." | Re-trigger Greptile

Comment thread metaflow/client/core.py
Comment on lines +345 to +349
if not _NUMERIC_ID_PATTERN.match(component):
raise MetaflowInvalidPathspec(
f"Invalid run ID '{component}'. Run IDs must be numeric."
)
elif idx == 2: # Step name

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 _NUMERIC_ID_PATTERN breaks Argo Workflows and Step Functions run IDs

Metaflow has two production-grade backends that assign non-numeric run IDs: Argo Workflows uses argo-<workflow-name> (e.g. "argo-myflow-1a2b3c") and AWS Step Functions uses sfn-<execution-id> (e.g. "sfn-abc123"). After this change, any call to Run("MyFlow/argo-myflow-1a2b3c") will immediately raise MetaflowInvalidPathspec("Invalid run ID … Run IDs must be numeric") before even hitting the metadata service, completely preventing access to runs produced by those schedulers.

Both local.py's register_run_id and register_task_id explicitly handle non-integer IDs via a try: int(…) except ValueError branch, which documents that arbitrary-string IDs are a first-class, intended format. The _NUMERIC_ID_PATTERN validation contradicts that contract. Either the run-ID and task-ID positions should be left unvalidated (only length/format checks), or the pattern must be broadened to cover any non-empty, non-whitespace token.

@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.28571% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (master@6a634bd). Learn more about missing BASE report.

Files with missing lines Patch % Lines
metaflow/client/core.py 94.28% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master    #3263   +/-   ##
=========================================
  Coverage          ?   29.04%           
=========================================
  Files             ?      381           
  Lines             ?    52537           
  Branches          ?     9277           
=========================================
  Hits              ?    15258           
  Misses            ?    36252           
  Partials          ?     1027           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant