Ac evals deploy #816
Add a new how-to guide that walks Strands users through wrapping a Strands Evals SDK evaluator as an Amazon Bedrock AgentCore code-based evaluator (AWS Lambda), plus a runnable Python sample handler with snippet regions the guide pulls in via MkDocs snippet syntax. The guide covers the AgentCore request/response schema, the mapping between AgentCore evaluation levels and Strands evaluator levels, handler parsing/invocation/response-building/error-handling, round-trip translation between AgentCore payloads and evaluator inputs, Lambda packaging, IAM (execution-role and AgentCore-side), evaluator registration, and running an evaluation. Navigation entries added to both src/config/navigation.yml and mkdocs.yml under Evals SDK > How-To Guides. Cross-links added from the Evals SDK quickstart, evaluators overview, and AgentCore deploy overview. Python-only for v1; TypeScript coverage is tracked by strands-agents/evals#204.
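For readers unfamiliar with the snippet mechanism the description mentions: the `pymdownx.snippets` extension lets a page include only the lines between `--8<-- [start:NAME]` and `--8<-- [end:NAME]` markers in a source file. The sketch below is a simplified, hypothetical re-implementation of that section extraction. `extract_region` is an illustrative name, not part of MkDocs or pymdownx, and the extension's own handling of edge cases may differ.

```python
import re

# Matches section markers such as "# --8<-- [start:imports]" / "# --8<-- [end:imports]".
_MARKER = re.compile(r"--8<--\s*\[(start|end):([A-Za-z0-9_-]+)\]")


def extract_region(source: str, name: str) -> str:
    """Return the lines between the start/end markers for `name`."""
    captured = []
    inside = False
    for line in source.splitlines():
        match = _MARKER.search(line)
        if match:
            kind, region = match.groups()
            if region == name:
                inside = kind == "start"
            # Marker lines (for any region) are never copied into the output.
            continue
        if inside:
            captured.append(line)
    return "\n".join(captured)
```

Pointed at the sample handler, `extract_region(source, "invoke_evaluator")` would return the lines between the `invoke_evaluator` markers, minus the marker lines themselves.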
Documentation Preview Ready

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-816/docs/user-guide/quickstart/overview/
Updated at: 2026-05-06T18:22:51.064Z
```
"""

# --8<-- [start:imports]
import json
```
Issue: Unused import json in the sample handler.
Suggestion: Remove import json since it's never used in the handler. Unused imports in a sample that readers will copy can cause confusion or linter warnings in their projects.
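The reviewer's point holds for a simple reason: when a Lambda function is invoked with a JSON payload, the Python runtime deserializes it into a `dict` before calling the handler, and serializes the handler's return value back to JSON. A minimal sketch, assuming the payload carries the worked example's `sessionSpans` list at the top level (`spanCount` is a made-up output field):

```python
from typing import Any


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Hypothetical handler showing that `event` needs no json.loads call."""
    # The runtime has already parsed the JSON payload into a dict.
    session_spans = event.get("sessionSpans", [])
    # Returning a plain dict is enough; the runtime serializes it to JSON.
    return {"spanCount": len(session_spans)}
```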
Issue: The PR description is empty: it doesn't explain what's being added, which issue it addresses, or what use cases it covers.
Suggestion: Please fill in the PR description with:
This helps reviewers and future readers understand the context.
```python
    root_span = getattr(session, "root_span", None)
    attributes = getattr(root_span, "attributes", {}) or {}
    return str(
        attributes.get("input") or attributes.get("prompt") or ""
    )
```
Issue: The _recover_prompt helper looks for attributes.get("input") or attributes.get("prompt") but these attribute keys don't match standard OTel semantic conventions (gen_ai.prompt is used in the worked example JSON further down in the guide). This inconsistency between the helper and the example payload may confuse readers trying to adapt the sample.
Suggestion: Either align the attribute keys in _recover_prompt with those used in the worked example (e.g., gen_ai.prompt), or add an inline comment clarifying that the attribute keys here are placeholders and the worked example uses a different schema for illustration purposes.
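One way to reconcile the helper with the worked example is to check `gen_ai.prompt` first and keep the placeholder keys as fallbacks. This is a hypothetical variant (`recover_prompt_from_attributes` and `PROMPT_KEYS` are illustrative names) that takes the root span's attributes directly; the right key order depends on the agent's actual span schema.

```python
from typing import Any, Mapping, Optional

# Assumed lookup order: the worked example's key, then the sample's placeholders.
PROMPT_KEYS = ("gen_ai.prompt", "input", "prompt")


def recover_prompt_from_attributes(attributes: Optional[Mapping[str, Any]]) -> str:
    """Return the first non-empty prompt-like attribute, or "" if none is set."""
    attrs = attributes or {}
    for key in PROMPT_KEYS:
        value = attrs.get(key)
        if value:
            return str(value)
    return ""
```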
```python
    if not messages:
        return ""
    last = messages[-1]
    return str(getattr(last, "content", last) or "")
```
Issue: The _recover_final_output helper reads session.messages[-1] and extracts .content, but the worked example in the MDX file uses "gen_ai.completion" as a span attribute. The handler's recovery logic and the documentation's example JSON show different data-extraction patterns, which could confuse a reader trying to understand how session mapping actually works.
Suggestion: Add a brief note in the MDX prose (around the "Parsing the request" section) explicitly stating that _recover_prompt and _recover_final_output are placeholder implementations that won't match the worked example verbatim, and that readers must adapt both to their agent's actual span/message schema.
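The same mismatch could be bridged with a fallback: keep the sample's message-based lookup and, when it yields nothing, read the `gen_ai.completion` attribute the worked example uses. A sketch under those assumptions; `recover_final_output` is a hypothetical name, and spans are assumed to be dicts with an `attributes` mapping, as in the worked example.

```python
from typing import Any, Sequence


def recover_final_output(messages: Sequence[Any], spans: Sequence[dict]) -> str:
    """Prefer the last message's content; fall back to a completion attribute."""
    if messages:
        last = messages[-1]
        # Same lookup as the sample: `.content` if present, else the item itself.
        content = getattr(last, "content", last)
        if content:
            return str(content)
    # Assumed fallback: scan from the last span, using the worked example's key.
    for span in reversed(spans):
        completion = (span.get("attributes") or {}).get("gen_ai.completion")
        if completion:
            return str(completion)
    return ""
```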
```python
    return evaluator.evaluate(evaluation_data)
# --8<-- [end:invoke_evaluator]


# --8<-- [start:build_response]
```
Issue: The status parameter is accepted but immediately discarded (_ = status). This adds surface area to the function signature that readers may wonder about. The docstring says "for parity with _error_response", but from an API design and readability standpoint in a sample handler, symmetry isn't a strong enough justification for an unused parameter.
Suggestion: Consider removing the status parameter from _build_response. If you want to document the duality with _error_response, a brief comment is sufficient without the extra parameter.
```md
### AgentCore-side IAM policy

AgentCore needs permission to invoke and introspect the deployed Lambda.
The minimal policy below grants both, scoped to a single function ARN:
```
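The policy itself is not visible in this excerpt. As a rough sketch of the shape the prose describes (one statement, both permissions, one function ARN), the helper below builds such a policy document. It assumes "introspect" maps to `lambda:GetFunction`; the guide may use a different read action. `agentcore_lambda_policy` is a hypothetical name, and the ARN uses the placeholder account ID `123456789012`.

```python
import json


def agentcore_lambda_policy(function_arn: str) -> dict:
    """Build a single-statement policy allowing invoke and read of one function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: "introspect" maps to lambda:GetFunction.
                "Action": ["lambda:InvokeFunction", "lambda:GetFunction"],
                # Scoped to one ARN so no other function is reachable.
                "Resource": function_arn,
            }
        ],
    }


arn = "arn:aws:lambda:us-east-1:123456789012:function:my-strands-evaluator"
print(json.dumps(agentcore_lambda_policy(arn), indent=2))
```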
Issue: The worked example in the MDX shows sessionSpans with bare span dicts having "name" and "attributes" keys, but the handler passes these to StrandsInMemorySessionMapper().map(session_spans). The reader has no way to know whether StrandsInMemorySessionMapper actually accepts this simplified format or requires full OTel span JSON (with traceId, spanId, startTimeUnixNano, etc.).
Suggestion: Either:
- Use a more realistic span payload in the worked example that matches what StrandsInMemorySessionMapper actually expects, or
- Add a note clarifying that the example spans are simplified for readability, and link to the mapper's documentation or type definition for the full schema.
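Until the mapper's expected input is documented, a cheap pre-flight check can at least show how far a payload is from full OTel span JSON. This hypothetical helper only looks for the three fields the reviewer names (`traceId`, `spanId`, `startTimeUnixNano`); it says nothing about what `StrandsInMemorySessionMapper` actually requires. The span values in the usage lines are made up.

```python
from typing import Any

# Fields the review names as part of full OTel span JSON.
FULL_SPAN_FIELDS = ("traceId", "spanId", "startTimeUnixNano")


def missing_span_fields(spans: list[dict[str, Any]]) -> dict[int, list[str]]:
    """Map each span's index to the full-schema fields it lacks."""
    report: dict[int, list[str]] = {}
    for index, span in enumerate(spans):
        missing = [field for field in FULL_SPAN_FIELDS if field not in span]
        if missing:
            report[index] = missing
    return report


# Made-up span shaped like the worked example: only "name" and "attributes".
simplified = [{"name": "invoke_agent", "attributes": {"gen_ai.prompt": "What is 2 + 2?"}}]
print(missing_span_fields(simplified))  # {0: ['traceId', 'spanId', 'startTimeUnixNano']}
```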
```md
(`strands-agents-evals`). TypeScript coverage is out of scope for now and
is tracked by the upstream feature request at
[strands-agents/evals#204](https://github.com/strands-agents/evals/issues/204).
It will land once the
```
Issue: This section references a feature request at strands-agents/evals#204, which is actually for the Python wrapping feature (the one this PR documents), not a TypeScript Evals SDK feature request.
Suggestion: Verify the issue link is correct. If there's a separate issue tracking TypeScript Evals SDK support, link to that instead. If #204 is the correct tracking issue for the full feature (including eventual TS support), consider clarifying that in the note text.
```md
source of truth.

The evaluator's scores appear in the AgentCore Evaluations dashboard — see
[AgentCore Evaluation Dashboard](./agentcore_evaluation_dashboard.md) for
```
Issue: The agentcore_evaluation_dashboard page is not listed in navigation.yml or mkdocs.yml, yet this guide links to it twice. While the page file exists on main, unlisted pages may not be discoverable to readers who land on them via this link.
Suggestion: Consider adding agentcore_evaluation_dashboard to navigation.yml under the How-To Guides section (either in this PR or confirming it will be added separately). If it's intentionally unlisted, that's fine — just wanted to flag the discoverability concern.
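To check discoverability mechanically, a small script can walk the nav tree and report whether a page appears anywhere in it. A sketch, assuming the nav has already been loaded into Python lists and dicts in the MkDocs shape (strings for pages, single-key dicts for titled entries and sections); whether src/config/navigation.yml follows the same shape is an assumption, and the page path in the usage lines is made up.

```python
from typing import Any, Iterator


def iter_nav_paths(nav: Any) -> Iterator[str]:
    """Yield every page path in an MkDocs-style nav structure."""
    if isinstance(nav, str):
        yield nav  # a bare page path (or URL)
    elif isinstance(nav, list):
        for item in nav:
            yield from iter_nav_paths(item)
    elif isinstance(nav, dict):
        # Covers both "Title: path" entries and "Section: [...]" entries.
        for value in nav.values():
            yield from iter_nav_paths(value)


# Hypothetical nav fragment; the real paths live in mkdocs.yml.
nav = [{"Evals SDK": [{"How-To Guides": [{"Deploy to AgentCore": "evals/how-to/agentcore.md"}]}]}]
listed = any(p.endswith("agentcore_evaluation_dashboard.md") for p in iter_nav_paths(nav))
print(listed)  # False for this fragment
```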
Review Summary

Assessment: Comment (requesting minor changes)

This is a thorough and well-structured how-to guide for wrapping Strands evaluators as AgentCore code-based evaluators. The snippet-based approach (keeping the handler as a standalone

Review Themes
Overall this is solid documentation work: the architecture diagram, the evaluation-level mapping table, and the end-to-end deployment instructions are all well done.
Description
Related Issues
Type of Change
Checklist
npm run dev

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.