Ac evals deploy by afarntrog · Pull Request #816 · strands-agents/docs

afarntrog · 2026-05-06T18:18:27Z

Description

Related Issues

Type of Change

New content
Content update/revision
Structure/organization improvement
Typo/formatting fix
Bug fix
Other (please describe):

Checklist

I have read the CONTRIBUTING document
My changes follow the project's documentation style
I have tested the documentation locally using npm run dev
Links in the documentation are valid and working

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Add a new how-to guide that walks Strands users through wrapping a Strands Evals SDK evaluator as an Amazon Bedrock AgentCore code-based evaluator (AWS Lambda), plus a runnable Python sample handler with snippet regions the guide pulls in via MkDocs snippet syntax. The guide covers the AgentCore request/response schema, the mapping between AgentCore evaluation levels and Strands evaluator levels, handler parsing/invocation/response-building/error-handling, round-trip translation between AgentCore payloads and evaluator inputs, Lambda packaging, IAM (execution-role and AgentCore-side), evaluator registration, and running an evaluation. Navigation entries added to both src/config/navigation.yml and mkdocs.yml under Evals SDK > How-To Guides. Cross-links added from the Evals SDK quickstart, evaluators overview, and AgentCore deploy overview. Python-only for v1; TypeScript coverage is tracked by strands-agents/evals#204.

github-actions · 2026-05-06T18:22:51Z

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-816/docs/user-guide/quickstart/overview/

Updated at: 2026-05-06T18:22:51.064Z

github-actions · 2026-05-11T15:14:33Z

+"""
+
+# --8<-- [start:imports]
+import json


Issue: Unused import json in the sample handler.

Suggestion: Remove import json since it's never used in the handler. Unused imports in a sample that readers will copy can cause confusion or linter warnings in their projects.

github-actions · 2026-05-11T15:14:34Z

Issue: The PR description is empty — it doesn't explain what's being added, which issue it addresses, or what use cases it covers.

Suggestion: Please fill in the PR description with:

A brief summary of what this guide adds (deploying Strands evaluators as AgentCore code-based evaluators)
Link to the related issue ([FEATURE] Wrap / Deploy Strands-evals evaluators to Agentcore Code-base Evaluators evals#204)
Any testing done (e.g., "verified with npm run dev and confirmed snippet rendering")

This helps reviewers and future readers understand the context.

github-actions · 2026-05-11T15:14:36Z

+    root_span = getattr(session, "root_span", None)
+    attributes = getattr(root_span, "attributes", {}) or {}
+    return str(
+        attributes.get("input") or attributes.get("prompt") or ""


Issue: The _recover_prompt helper looks for attributes.get("input") or attributes.get("prompt") but these attribute keys don't match standard OTel semantic conventions (gen_ai.prompt is used in the worked example JSON further down in the guide). This inconsistency between the helper and the example payload may confuse readers trying to adapt the sample.

Suggestion: Either align the attribute keys in _recover_prompt with those used in the worked example (e.g., gen_ai.prompt), or add an inline comment clarifying that the attribute keys here are placeholders and the worked example uses a different schema for illustration purposes.

github-actions · 2026-05-11T15:14:37Z

+    if not messages:
+        return ""
+    last = messages[-1]
+    return str(getattr(last, "content", last) or "")


Issue: The _recover_final_output helper reads session.messages[-1] and extracts .content, but the worked example in the MDX file uses "gen_ai.completion" as a span attribute. The handler's recovery logic and the documentation's example JSON show different data-extraction patterns, which could confuse a reader trying to understand how session mapping actually works.

Suggestion: Add a brief note in the MDX prose (around the "Parsing the request" section) explicitly stating that _recover_prompt and _recover_final_output are placeholder implementations that won't match the worked example verbatim, and that readers must adapt both to their agent's actual span/message schema.

github-actions · 2026-05-11T15:14:39Z

+    return evaluator.evaluate(evaluation_data)
+# --8<-- [end:invoke_evaluator]
+
+# --8<-- [start:build_response]


Issue: The status parameter is accepted but immediately discarded (_ = status). This adds surface area to the function signature that readers may wonder about. The docstring says "for parity with _error_response", but from an API design and readability standpoint in a sample handler, symmetry isn't a strong enough justification for an unused parameter.

Suggestion: Consider removing the status parameter from _build_response. If you want to document the duality with _error_response, a brief comment is sufficient without the extra parameter.
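
A rough sketch of the narrower signature, under the assumption that the success payload only needs a score and an explanation. The field names `score` and `explanation` are placeholders for this example and are not the AgentCore response schema, which should be taken from the guide.

```python
from typing import Any, Dict


def _build_response(score: float, explanation: str) -> Dict[str, Any]:
    """Build the success payload.

    Failure payloads, including any status value, are the job of
    _error_response, so no status parameter is accepted here.
    """
    # Placeholder field names for this sketch only.
    return {"score": float(score), "explanation": explanation}


demo_response = _build_response(1, "Output matched the expected answer.")
```

With the parameter gone, the split between the two helpers is carried by the docstring rather than by an argument that is never read.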

github-actions · 2026-05-11T15:14:40Z

+### AgentCore-side IAM policy
+
+AgentCore needs permission to invoke and introspect the deployed Lambda.
+The minimal policy below grants both, scoped to a single function ARN:


Issue: The worked example in the MDX shows sessionSpans with bare span dicts having "name" and "attributes" keys, but the handler passes these to StrandsInMemorySessionMapper().map(session_spans). The reader has no way to know whether StrandsInMemorySessionMapper actually accepts this simplified format or requires full OTel span JSON (with traceId, spanId, startTimeUnixNano, etc.).

Suggestion: Either:

Use a more realistic span payload in the worked example that matches what StrandsInMemorySessionMapper actually expects, or

Add a note clarifying that the example spans are simplified for readability and link to the mapper's documentation or type definition for the full schema.
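
To make the gap concrete, a small check like the following could be run against the example payload. It only reports which of the OTel fields named above are absent from a span dict; it makes no claim about what `StrandsInMemorySessionMapper` requires, and `missing_otel_fields` is a hypothetical helper, not part of any SDK.

```python
from typing import Any, Dict, List

# Fields named in this review comment; not an authoritative list of what
# the mapper needs.
_OTEL_FIELDS = ("traceId", "spanId", "startTimeUnixNano")


def missing_otel_fields(span: Dict[str, Any]) -> List[str]:
    """Return the OTel field names from _OTEL_FIELDS that the span lacks."""
    return [field for field in _OTEL_FIELDS if field not in span]


# A span shaped like the simplified worked example (name + attributes only).
simplified_span = {"name": "agent", "attributes": {"gen_ai.prompt": "hi"}}
missing = missing_otel_fields(simplified_span)
```

For a span shaped like the worked example, all three fields come back as missing, which is the ambiguity a clarifying note would resolve.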

github-actions · 2026-05-11T15:14:42Z

+(`strands-agents-evals`). TypeScript coverage is out of scope for now and
+is tracked by the upstream feature request at
+[strands-agents/evals#204](https://github.com/strands-agents/evals/issues/204).
+It will land once the


Issue: This section references a feature request at strands-agents/evals#204, which is actually for the Python wrapping feature (the one this PR documents), not a TypeScript Evals SDK feature request.

Suggestion: Verify the issue link is correct. If there's a separate issue tracking TypeScript Evals SDK support, link to that instead. If #204 is the correct tracking issue for the full feature (including eventual TS support), consider clarifying that in the note text.

github-actions · 2026-05-11T15:14:44Z

+source of truth.
+
+The evaluator's scores appear in the AgentCore Evaluations dashboard — see
+[AgentCore Evaluation Dashboard](./agentcore_evaluation_dashboard.md) for


Issue: The agentcore_evaluation_dashboard page is not listed in navigation.yml or mkdocs.yml, yet this guide links to it twice. While the page file exists on main, unlisted pages may not be discoverable to readers who land on them via this link.

Suggestion: Consider adding agentcore_evaluation_dashboard to navigation.yml under the How-To Guides section (either in this PR or confirming it will be added separately). If it's intentionally unlisted, that's fine — just wanted to flag the discoverability concern.

github-actions · 2026-05-11T15:14:45Z

Review Summary

Assessment: Comment (requesting minor changes)

This is a thorough and well-structured how-to guide for wrapping Strands evaluators as AgentCore code-based evaluators. The snippet-based approach (keeping the handler as a standalone .py file with region markers) is a great pattern that ensures docs and code stay in sync.

Review Themes

Consistency between handler code and worked examples: The _recover_prompt/_recover_final_output helpers use different attribute keys than the JSON examples shown in the guide, which could confuse readers. Either aligning them or adding explicit callouts would help.
Code hygiene in sample: Unused import json and an unused status parameter add unnecessary noise to a sample that readers will copy.
PR metadata: The description is empty — linking the related issue and summarizing the change would help reviewers and future archaeology.

Overall this is solid documentation work — the architecture diagram, the evaluation-level mapping table, and the end-to-end deployment instructions are all well done.

afarntrog added 2 commits May 6, 2026 14:14

Merge branch 'main' into ac_evals_deploy

d8f19c9

afarntrog temporarily deployed to auto-approve May 6, 2026 18:18 — with GitHub Actions Inactive

afarntrog temporarily deployed to manual-approval May 6, 2026 18:18 — with GitHub Actions Inactive

github-actions Bot added the strands-running label May 11, 2026

github-actions Bot reviewed May 11, 2026

github-actions Bot removed the strands-running label May 11, 2026

afarntrog wants to merge 2 commits into strands-agents:main from afarntrog:ac_evals_deploy
