Ac evals deploy #816

Open
afarntrog wants to merge 2 commits into strands-agents:main from afarntrog:ac_evals_deploy

@afarntrog
Contributor

Description

Related Issues

Type of Change

  • New content
  • Content update/revision
  • Structure/organization improvement
  • Typo/formatting fix
  • Bug fix
  • Other (please describe):

Checklist

  • I have read the CONTRIBUTING document
  • My changes follow the project's documentation style
  • I have tested the documentation locally using npm run dev
  • Links in the documentation are valid and working

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

afarntrog added 2 commits May 6, 2026 14:14
Add a new how-to guide that walks Strands users through wrapping a
Strands Evals SDK evaluator as an Amazon Bedrock AgentCore code-based
evaluator (AWS Lambda), plus a runnable Python sample handler with
snippet regions the guide pulls in via MkDocs snippet syntax.

The guide covers the AgentCore request/response schema, the mapping
between AgentCore evaluation levels and Strands evaluator levels,
handler parsing/invocation/response-building/error-handling, round-trip
translation between AgentCore payloads and evaluator inputs, Lambda
packaging, IAM (execution-role and AgentCore-side), evaluator
registration, and running an evaluation.

Navigation entries added to both src/config/navigation.yml and
mkdocs.yml under Evals SDK > How-To Guides. Cross-links added from the
Evals SDK quickstart, evaluators overview, and AgentCore deploy
overview.

Python-only for v1; TypeScript coverage is tracked by
strands-agents/evals#204.
@afarntrog afarntrog temporarily deployed to manual-approval May 6, 2026 18:18 — with GitHub Actions Inactive
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-816/docs/user-guide/quickstart/overview/

Updated at: 2026-05-06T18:22:51.064Z

"""

# --8<-- [start:imports]
import json

Issue: Unused import json in the sample handler.

Suggestion: Remove import json since it's never used in the handler. Unused imports in a sample that readers will copy can cause confusion or linter warnings in their projects.

@github-actions
Contributor

Issue: The PR description is empty — it doesn't explain what's being added, which issue it addresses, or what use cases it covers.

Suggestion: Please fill in the PR description with what is being added, which issue it addresses, and what use cases it covers. This helps reviewers and future readers understand the context.

    root_span = getattr(session, "root_span", None)
    attributes = getattr(root_span, "attributes", {}) or {}
    return str(
        attributes.get("input") or attributes.get("prompt") or ""

Issue: The _recover_prompt helper looks for attributes.get("input") or attributes.get("prompt") but these attribute keys don't match standard OTel semantic conventions (gen_ai.prompt is used in the worked example JSON further down in the guide). This inconsistency between the helper and the example payload may confuse readers trying to adapt the sample.

Suggestion: Either align the attribute keys in _recover_prompt with those used in the worked example (e.g., gen_ai.prompt), or add an inline comment clarifying that the attribute keys here are placeholders and the worked example uses a different schema for illustration purposes.

    if not messages:
        return ""
    last = messages[-1]
    return str(getattr(last, "content", last) or "")

Issue: The _recover_final_output helper reads session.messages[-1] and extracts .content, but the worked example in the MDX file uses "gen_ai.completion" as a span attribute. The handler's recovery logic and the documentation's example JSON show different data-extraction patterns, which could confuse a reader trying to understand how session mapping actually works.

Suggestion: Add a brief note in the MDX prose (around the "Parsing the request" section) explicitly stating that _recover_prompt and _recover_final_output are placeholder implementations that won't match the worked example verbatim, and that readers must adapt both to their agent's actual span/message schema.

return evaluator.evaluate(evaluation_data)
# --8<-- [end:invoke_evaluator]

# --8<-- [start:build_response]

Issue: The status parameter is accepted but immediately discarded (_ = status). This adds surface area to the function signature that readers may wonder about. The docstring says "for parity with _error_response", but from an API design and readability standpoint in a sample handler, symmetry isn't a strong enough justification for an unused parameter.

Suggestion: Consider removing the status parameter from _build_response. If you want to document the duality with _error_response, a brief comment is sufficient without the extra parameter.

### AgentCore-side IAM policy

AgentCore needs permission to invoke and introspect the deployed Lambda.
The minimal policy below grants both, scoped to a single function ARN:

Issue: The worked example in the MDX shows sessionSpans with bare span dicts having "name" and "attributes" keys, but the handler passes these to StrandsInMemorySessionMapper().map(session_spans). The reader has no way to know whether StrandsInMemorySessionMapper actually accepts this simplified format or requires full OTel span JSON (with traceId, spanId, startTimeUnixNano, etc.).

Suggestion: Either:

  1. Use a more realistic span payload in the worked example that matches what StrandsInMemorySessionMapper actually expects, or
  2. Add a note clarifying that the example spans are simplified for readability and link to the mapper's documentation or type definition for the full schema.

(`strands-agents-evals`). TypeScript coverage is out of scope for now and
is tracked by the upstream feature request at
[strands-agents/evals#204](https://github.com/strands-agents/evals/issues/204).
It will land once the

Issue: This section references a feature request at strands-agents/evals#204, which is actually for the Python wrapping feature (the one this PR documents), not a TypeScript Evals SDK feature request.

Suggestion: Verify the issue link is correct. If there's a separate issue tracking TypeScript Evals SDK support, link to that instead. If #204 is the correct tracking issue for the full feature (including eventual TS support), consider clarifying that in the note text.

source of truth.

The evaluator's scores appear in the AgentCore Evaluations dashboard — see
[AgentCore Evaluation Dashboard](./agentcore_evaluation_dashboard.md) for

Issue: The agentcore_evaluation_dashboard page is not listed in navigation.yml or mkdocs.yml, yet this guide links to it twice. While the page file exists on main, unlisted pages may not be discoverable to readers who land on them via this link.

Suggestion: Consider adding agentcore_evaluation_dashboard to navigation.yml under the How-To Guides section (either in this PR or confirming it will be added separately). If it's intentionally unlisted, that's fine — just wanted to flag the discoverability concern.

@github-actions
Contributor

Review Summary

Assessment: Comment (requesting minor changes)

This is a thorough and well-structured how-to guide for wrapping Strands evaluators as AgentCore code-based evaluators. The snippet-based approach (keeping the handler as a standalone .py file with region markers) is a great pattern that ensures docs and code stay in sync.

Review Themes
  • Consistency between handler code and worked examples: The _recover_prompt/_recover_final_output helpers use different attribute keys than the JSON examples shown in the guide, which could confuse readers. Either aligning them or adding explicit callouts would help.
  • Code hygiene in sample: Unused import json and an unused status parameter add unnecessary noise to a sample that readers will copy.
  • PR metadata: The description is empty — linking the related issue and summarizing the change would help reviewers and future archaeology.

Overall this is solid documentation work — the architecture diagram, the evaluation-level mapping table, and the end-to-end deployment instructions are all well done.
