
fix: make IAM role optional for AWS Batch job submission #3211

Open
odncode wants to merge 3 commits into Netflix:master from odncode:fix-batch-optional-iam-role

odncode (Contributor) commented May 22, 2026

Summary

Fixes #3208: @batch steps fail with "No IAM role specified" on EC2 compute environments that use instance profiles.

Problem

BatchJob.execute() raises unconditionally when iam_role is None (L82-85 in batch_client.py). Additionally, _register_job_definition passes "jobRoleArn": None to boto3, which rejects None for string fields.
The AWS Batch ContainerProperties API documents jobRoleArn as optional. EC2 compute environments can use instance profiles for container credentials instead.

Changes Made

  • Removed the unconditional IAM role check in BatchJob.execute()
  • Made jobRoleArn conditional in _register_job_definition — only included when job_role is not None
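For illustration, the conditional payload change can be sketched as a standalone function. `build_container_properties` is a hypothetical name, not a Metaflow function, and the dict holds only the fields relevant here (a real `containerProperties` payload carries more, such as resource requirements).

```python
from typing import Any, Dict, Optional


def build_container_properties(
    image: str, job_role: Optional[str] = None
) -> Dict[str, Any]:
    """Hypothetical stand-in for the payload built in _register_job_definition.

    jobRoleArn is added only when a role is given, so None never reaches
    boto3's parameter validation.
    """
    return {
        "image": image,
        # Conditional dict unpacking: contributes the key only if job_role is truthy.
        **({"jobRoleArn": job_role} if job_role else {}),
    }


# With a role, the key is present; without one, it is omitted entirely.
with_role = build_container_properties(
    "python:3.11", "arn:aws:iam::123456789012:role/example"
)
without_role = build_container_properties("python:3.11")
print("jobRoleArn" in with_role, "jobRoleArn" in without_role)  # prints: True False
```

The `**({...} if job_role else {})` expression contributes either one key or nothing, so the caller never has to post-process the dict to strip `None` values.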

Testing

Added test/unit/test_batch_optional_iam_role.py with three tests:

  • test_execute_does_not_raise_when_iam_role_is_none — verifies the fix
  • test_execute_still_requires_image — ensures docker image check remains
  • test_job_definition_omits_job_role_arn_when_none — verifies jobRoleArn is conditional

Verified: tests PASS with fix, FAIL without it.
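The two execute-side tests can also be written without a try/except negative assertion: call the method directly so any exception fails the test, and use `assertRaises` for the image check. This is a sketch against a hypothetical `StubBatchJob` and a stand-in `BatchJobException`, not Metaflow's real `BatchJob`, whose constructor takes many more arguments.

```python
import unittest
from typing import Optional


class BatchJobException(Exception):
    """Hypothetical stand-in for Metaflow's Batch exception type."""


class StubBatchJob:
    """Hypothetical, heavily simplified stand-in for BatchJob.

    It mirrors only the validation that remains after the fix: a Docker
    image is still required, but iam_role may be None.
    """

    def __init__(self, image: Optional[str] = None, iam_role: Optional[str] = None):
        self._image = image
        self._iam_role = iam_role

    def execute(self) -> dict:
        if self._image is None:
            raise BatchJobException("image is required (hypothetical message)")
        # No iam_role check: EC2 instance profiles can supply credentials.
        return {"image": self._image, "iam_role": self._iam_role}


class TestOptionalIamRole(unittest.TestCase):
    def test_execute_does_not_raise_when_iam_role_is_none(self):
        # Called directly: any exception fails the test, no try/except needed.
        result = StubBatchJob(image="python:3.11", iam_role=None).execute()
        self.assertIsNone(result["iam_role"])

    def test_execute_still_requires_image(self):
        with self.assertRaises(BatchJobException):
            StubBatchJob(image=None).execute()
```

Saved as a module, this runs under `python -m unittest <module>`.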

Closes #3208

On EC2 compute environments that use instance profiles, jobRoleArn is
not required. The AWS Batch API documents it as optional, but Metaflow
raised unconditionally when iam_role was None.

Two changes:
- Remove the unconditional raise in BatchJob.execute() when iam_role
  is None
- Make jobRoleArn conditional in _register_job_definition so None is
  not passed to boto3 (which rejects None for string fields)

Closes Netflix#3208
greptile-apps (Bot, Contributor) commented May 22, 2026

Greptile Summary

Makes iam_role optional for AWS Batch job submission by removing the unconditional BatchJobException guard in execute() and conditionally including jobRoleArn in the job-definition payload only when a role is provided. This aligns with the AWS Batch API, which documents jobRoleArn as optional for EC2 compute environments that rely on instance profiles.

  • batch_client.py: Drops the four-line _iam_role is None guard from execute() and replaces the unconditional "jobRoleArn": job_role assignment with **({"jobRoleArn": job_role} if job_role else {}) so None is never forwarded to boto3.
  • test/unit/test_batch_optional_iam_role.py: Adds three unit tests covering the removed guard, the still-required image check, and the conditional jobRoleArn inclusion.
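One detail of the `if job_role` condition: it tests truthiness, so it drops an empty string as well as `None`, whereas an `is not None` test would still send `""`. A short comparison, with illustrative function names:

```python
from typing import Optional


def payload_truthy(job_role: Optional[str]) -> dict:
    # Pattern used in the PR: omits the key for None and for "".
    return {**({"jobRoleArn": job_role} if job_role else {})}


def payload_not_none(job_role: Optional[str]) -> dict:
    # Alternative: omits the key only for None; "" would still be sent.
    return {**({"jobRoleArn": job_role} if job_role is not None else {})}


for role in (None, "", "arn:aws:iam::123456789012:role/example"):
    print(repr(role), payload_truthy(role), payload_not_none(role))
```

This matches the review's observation that the key is omitted when `job_role` is None or empty; the `is not None` form would only be preferable if an empty string were meant to reach AWS unchanged.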

Confidence Score: 5/5

Safe to merge — the production change is a small, targeted removal of a guard that was incorrectly mandatory, with no other code paths affected.

Both changes in batch_client.py are self-contained: removing the four-line IAM role check in execute() has no downstream side effects, and the conditional dict-unpacking in _register_job_definition correctly omits jobRoleArn from the boto3 payload when it is None or empty. job_role is not used anywhere else in the function. The Fargate path retains its own mandatory execution_role check, so the EC2-specific relaxation does not weaken Fargate validation.
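The split described here (job role optional, Fargate execution role still mandatory) can be sketched as a small validator. `validate_roles` and `ValidationError` are hypothetical names; Metaflow's actual checks live in its Batch code and may differ in wording and placement. The platform strings follow AWS Batch's `EC2` / `FARGATE` platform capabilities.

```python
from typing import Optional


class ValidationError(Exception):
    """Hypothetical error type for this sketch."""


def validate_roles(
    platform: str, job_role: Optional[str], execution_role: Optional[str]
) -> None:
    """Hypothetical split of role validation by compute platform.

    - job_role (jobRoleArn) is optional on every platform after the fix.
    - Fargate keeps its own mandatory execution role check.
    """
    if platform == "FARGATE" and not execution_role:
        raise ValidationError("Fargate jobs require an execution role")
    # job_role is intentionally not checked: EC2 jobs may rely on the
    # instance profile for container credentials.


validate_roles("EC2", job_role=None, execution_role=None)  # accepted
try:
    validate_roles("FARGATE", job_role=None, execution_role=None)
except ValidationError as exc:
    print("rejected:", exc)
```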

No files require special attention. The test file quality issues were flagged in prior review threads.

Important Files Changed

| Filename | Overview |
|---|---|
| metaflow/plugins/aws/batch/batch_client.py | Removes the unconditional IAM role guard in execute() and makes jobRoleArn conditional in _register_job_definition using dict-unpacking; both changes are correct and minimal. |
| test/unit/test_batch_optional_iam_role.py | Adds three regression tests; the source-text inspection test and the try/except negative-assertion pattern are both noted in prior review threads. |

Reviews (3): Last reviewed commit: "style: fix pre-commit formatting (black,..."

Comment thread metaflow/plugins/aws/batch/batch_client.py
Comment on lines +57 to +79


```python
def test_job_definition_omits_job_role_arn_when_none():
    """
    When job_role is None, jobRoleArn should not be present in the
    job definition at all. boto3 rejects None for string fields.
    """
    from pathlib import Path

    batch_client_path = Path(__file__).resolve().parents[2] / (
        "metaflow/plugins/aws/batch/batch_client.py"
    )
    source_text = batch_client_path.read_text()

    assert "jobRoleArn" in source_text, (
        "jobRoleArn reference not found in batch_client.py"
    )
    # The fix uses conditional dict unpacking: **({"jobRoleArn": ...} if ... else {})
    # If jobRoleArn is assigned unconditionally, this pattern won't be present
    assert "if job_role" in source_text, (
        "jobRoleArn is not conditionally included — it will be passed as None "
        "to boto3, which rejects None for string fields"
    )
```


P2 test_job_definition_omits_job_role_arn_when_none validates behavior by parsing source text rather than exercising the code. The assert "if job_role" in source_text assertion is brittle: a semantically equivalent rewrite such as if job_role is not None or if bool(job_role) would make this test fail even though the behavior is correct — and a stray if job_role: elsewhere in the file would keep it green even if the fix were reverted. Consider mocking boto3 (or the Batch client) and directly calling _register_job_definition() with job_role=None, then asserting that "jobRoleArn" is absent from the payload passed to register_job_definition.
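Following that suggestion, a behavior-based version of the test can capture the payload with `unittest.mock.MagicMock` instead of reading the source file. `register_job_definition` below is a hypothetical, simplified stand-in for `_register_job_definition`; only the boto3 keyword names (`jobDefinitionName`, `type`, `containerProperties`) come from the real Batch API.

```python
from typing import Any, Optional
from unittest.mock import MagicMock


def register_job_definition(
    batch_client: Any, job_name: str, image: str, job_role: Optional[str] = None
) -> Any:
    """Hypothetical, simplified stand-in for _register_job_definition.

    batch_client is expected to expose boto3's
    register_job_definition(**kwargs) call.
    """
    return batch_client.register_job_definition(
        jobDefinitionName=job_name,
        type="container",
        containerProperties={
            "image": image,
            **({"jobRoleArn": job_role} if job_role else {}),
        },
    )


def test_job_definition_omits_job_role_arn_when_none():
    client = MagicMock()  # stands in for a boto3 Batch client
    register_job_definition(client, "example-job", "python:3.11", job_role=None)

    # Inspect the payload that would have been sent to AWS.
    kwargs = client.register_job_definition.call_args.kwargs
    assert "jobRoleArn" not in kwargs["containerProperties"]


test_job_definition_omits_job_role_arn_when_none()
```

Because the assertion looks at the arguments actually passed, it does not care how the condition is spelled and would fail if the key were sent unconditionally. Wiring the mock into the real `BatchClient` depends on how that class obtains its boto3 client, which is not shown here.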

Comment thread test/unit/test_batch_optional_iam_role.py Outdated
LuisJG8 (Contributor) commented May 22, 2026

Hi odncode, I think you are opening too many PRs. It would be better to discuss the feature/bug with the maintainers first, before opening a PR.

odncode (Contributor, Author) commented May 22, 2026

Hi @LuisJG8,

Thank you for the feedback. Completely understood. My apologies for the noise.

I'll make sure to discuss on the issue first before opening PRs going forward. Happy to close any of the open ones you'd prefer not to review right now.

codecov (Bot) commented Jun 5, 2026
Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (master@6cc3431).

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #3211   +/-   ##
=========================================
  Coverage          ?   28.83%           
=========================================
  Files             ?      381           
  Lines             ?    52465           
  Branches          ?     9259           
=========================================
  Hits              ?    15129           
  Misses            ?    36323           
  Partials          ?     1013           


Signed-off-by: odncode <nnajiodera2@gmail.com>

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

@batch step fails locally with 'No IAM role specified' when no per-job IAM role is configured

2 participants