Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions metaflow/client/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

import json
import os
import re
import tarfile
from collections import namedtuple
from datetime import datetime
Expand Down Expand Up @@ -60,6 +61,13 @@

current_metadata = False

# Run IDs can be plain integers (local runs) or prefixed strings for orchestrators:
# "argo-<workflow-name>" for Argo Workflows
# "sfn-<execution-name>" for AWS Step Functions
# Task IDs follow the same pattern — numeric by default but may be prefixed by
# orchestrators that pass their own identifiers.
_RUN_ID_PATTERN = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9\-_]*$')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Pattern rejects valid SFN run IDs with dots or other legal characters

AWS Step Functions execution names can legally contain . (and other characters like +, @) per the SFN StartExecution API. When a state machine is triggered externally — e.g. by an EventBridge rule with a custom execution name such as prod.2024-01-15 — the run ID stored by Metaflow is sfn-prod.2024-01-15. After this PR, any Run('MyFlow/sfn-prod.2024-01-15') call will raise MetaflowInvalidPathspec, whereas before this PR (where no content validation existed) it would look up the run normally. The original code's correctness guarantee was "only check component count" — this pattern narrows that guarantee without covering the full character set that SFN execution names allow.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Using \- in the middle of a character class works (Python re treats it as a literal hyphen) but is non-idiomatic. The conventional and unambiguous style is to place the hyphen at the end of the class.

Suggested change
_RUN_ID_PATTERN = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9\-_]*$')
_RUN_ID_PATTERN = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$')

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!



def metadata(ms: str) -> str:
"""
Expand Down Expand Up @@ -336,6 +344,24 @@ def __init__(
"Expects DataArtifact('FlowName/RunID/StepName/TaskID/ArtifactName')"
)

# Validate run ID and task ID format.
# Run IDs are numeric for local runs but orchestrators (Argo Workflows,
# AWS Step Functions) produce prefixed string IDs like "argo-*" and "sfn-*".
# The same applies to task IDs.
if len(ids) >= 2 and not _RUN_ID_PATTERN.match(ids[1]):
raise MetaflowInvalidPathspec(
"Invalid run ID '%s' in pathspec '%s'. "
"Run IDs must be alphanumeric and may contain hyphens or "
"underscores (e.g. '123', 'argo-myflow-abc12', 'sfn-exec')."
% (ids[1], pathspec)
)
if len(ids) >= 4 and not _RUN_ID_PATTERN.match(ids[3]):
raise MetaflowInvalidPathspec(
"Invalid task ID '%s' in pathspec '%s'. "
"Task IDs must be alphanumeric and may contain hyphens or underscores."
% (ids[3], pathspec)
)

self.id = ids[-1]
self._pathspec = pathspec
self._object = self._get_object(*ids)
Expand Down
Loading