Skip to content

feat: add code source references and related page tagging; emits Rela…#832

Merged
notowen333 merged 5 commits into
strands-agents:mainfrom
notowen333:headless-improvement
May 14, 2026
Merged

feat: add code source references and related page tagging; emits Rela…#832
notowen333 merged 5 commits into
strands-agents:mainfrom
notowen333:headless-improvement

Conversation

@notowen333
Copy link
Copy Markdown
Contributor

@notowen333 notowen333 commented May 11, 2026

Tag-driven Related Pages

note this change does not change any human facing pages. It just adds related pages and source reference metadata for agents and headless browsers. As a follow up we /may/ add tags at the bottom of pages that group other related pages.

sourceLinks infrastructure (schema, renderer for ## Implementation blocks) is wired but no values are set — pending the upcoming monorepo migration. Re-adoption is purely a frontmatter exercise.

Surfaces

Surface Where Cap Score floor Audience
## Related pages block appended to body in /<slug>/index.md and aggregated in /llms-full.txt 10 none LLMs / llms.txt consumers
JSON-LD relatedLink inside the page's TechArticle graph node 10 none HTML-only crawlers

Algorithm

Score is rarity-weighted Jaccard:

score =  Σ (rarity-weight of tags in BOTH pages)
        ─────────────────────────────────────────
         Σ (rarity-weight of tags in EITHER page)

Each tag's rarity-weight is 1 - (pages_with_tag / total_tagged_pages). Common tags (e.g. aws, used by 14 pages) contribute near 0; rare tags (e.g. agentcore, used by 4) contribute near 1.

A 1-tag match on a broad tag ranks below any 2-tag specific match, so coincidental connections sink naturally. A tag that bloats over time loses weight automatically — no manual audit cycle needed to react. Ties break alphabetically by title (deterministic).

Tag registry

src/config/tags.yml — 32 tags grouped by axis (topics / capabilities / lifecycle), validated by Zod at build time. The YAML header documents the rule:

A tag means "this page teaches the named thing." Not "mentions" or "has a section labeled with it."

…with a four-trap checklist drawn from concrete failure modes encountered while tagging the corpus.

Coverage

112 of 123 user-guide pages tagged. 11 are deliberately untagged (umbrella pages, policy docs, pages with no good cross-section bridge).

Concrete output for safety-security/guardrails

Source frontmatter:

title: Guardrails
tags: [safety, bedrock, aws]

/docs/user-guide/safety-security/guardrails/index.md and the same block inside /llms-full.txt

Top 10, ranked by score, no floor:

## Related pages

- [Amazon Bedrock](/docs/user-guide/concepts/model-providers/amazon-bedrock/index.md) (3 shared tags)
- [Amazon Nova](/docs/user-guide/concepts/model-providers/amazon-nova/index.md) (2 shared tags)
- [Nova Sonic](/docs/user-guide/concepts/bidirectional-streaming/models/nova_sonic/index.md) (2 shared tags)
- [Deploying Strands Agents to Amazon Bedrock AgentCore Runtime](/docs/user-guide/deploy/deploy_to_bedrock_agentcore/index.md) (2 shared tags)
- [Python Deployment to Amazon Bedrock AgentCore Runtime](/docs/user-guide/deploy/deploy_to_bedrock_agentcore/python/index.md) (2 shared tags)
- [TypeScript Deployment to Amazon Bedrock AgentCore Runtime](/docs/user-guide/deploy/deploy_to_bedrock_agentcore/typescript/index.md) (2 shared tags)
- [AgentCore Evaluation Dashboard Configuration](/docs/user-guide/evals-sdk/how-to/agentcore_evaluation_dashboard/index.md) (2 shared tags)
- [PII Redaction](/docs/user-guide/safety-security/pii-redaction/index.md) (2 shared tags)
- [Harmfulness Evaluator](/docs/user-guide/evals-sdk/evaluators/harmfulness_evaluator/index.md) (1 shared tag)
- [Refusal Evaluator](/docs/user-guide/evals-sdk/evaluators/refusal_evaluator/index.md) (1 shared tag)

Links are emitted as index.md siblings so an LLM following them stays on the markdown surface.

<head> JSON-LD TechArticle.relatedLink

Same 10 entries, same ordering, but as canonical HTML URLs:

{
  "@type": "TechArticle",
  "headline": "Guardrails",
  "keywords": "safety, bedrock, aws",
  "relatedLink": [
    "https://strandsagents.com/docs/user-guide/concepts/model-providers/amazon-bedrock/",
    "https://strandsagents.com/docs/user-guide/concepts/model-providers/amazon-nova/",
    /* ... */
  ]
}

HTML "See also" pills

Top 6 from the same ranking, filtered to score ≥ 0.4. On Guardrails this lands as 6 pills; on a thin-tag page (e.g. retry-strategies whose best score is 0.34) the strip is empty, by design.

Safety nets

  • Zod validation on tag registry — unknown tags fail the build at the page level.
  • Title-uniqueness lint in test/content-collection.test.ts — fails build on cross-section title collisions (the "two pages named Hooks" failure mode that produces ambiguous pills).
  • 14 algorithm tests in test/related-docs.test.ts covering scoring, tie-breaking, score floor, edge cases. 311 tests total in the repo.

Verification

  • npm run build — clean, no broken links across 515 pages
  • npm run typecheck — clean
  • npm test — 311 pass

Type of Change

  • New feature

Checklist

  • I have read the CONTRIBUTING document
  • My changes follow the project's documentation style
  • I have tested the documentation locally using npm run dev
  • Links in the documentation are valid and working

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…ted/Implementation blocks into /<slug>/index.md and /llms-full.txt, and JSON-LD TechArticle keywords + relatedLink into <head>; No visible change; all headless updates
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 11, 2026

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-832/docs/user-guide/quickstart/overview/

Updated at: 2026-05-14T17:32:42.175Z

Comment thread src/components/overrides/Head.astro
Comment thread src/util/related-docs.ts
Comment thread src/content/docs/user-guide/concepts/agents/retry-strategies.mdx Outdated
Comment thread src/content/docs/user-guide/concepts/agents/state.mdx Outdated
Comment thread src/content/docs/user-guide/concepts/bidirectional-streaming/events.mdx Outdated
Comment thread src/content/docs/user-guide/concepts/tools/vended-tools.mdx Outdated
Comment thread src/content/docs/user-guide/deploy/operating-agents-in-production.mdx Outdated
Comment thread src/config/tags.yml
Replaces the earlier raw-overlap algorithm with a specificity-weighted
Jaccard scorer plus a confidence floor for the human-facing surface.

Algorithm (src/util/related-docs.ts):
- Score is rarity-weighted Jaccard: each shared tag's contribution scales
  with its rarity in the corpus (1 - freq/N), so a shared `bedrock` (8/108)
  outweighs a shared `aws` (14/108). Self-correcting if a tag bloats over
  time — its weight automatically drops.
- Headless surface (/<slug>/index.md, /llms-full.txt, JSON-LD relatedLink):
  top 10, no floor. LLMs benefit from wider recall and self-filter.
- Human surface ("See also" pill row): top 6 with score >= 0.4 floor.
  Empty strip is strictly better than a misleading one; the floor catches
  the spurious 1-broad-tag matches that earlier audits kept finding by
  hand.
- Specificity table memoized on the input array (cheap WeakMap hit).

Vocabulary (src/config/tags.yml):
- 32-tag registry organized by axis (topics / capabilities / lifecycle).
- Inline rule and authoring checklist at the head of the YAML — "tag =
  teaches, not mentions" with concrete failure-pattern examples drawn
  from earlier audit rounds.
- Build-time Zod validation (src/config/tags.ts); unknown tags fail.

Content:
- 108 of 123 user-guide pages tagged; 15 deliberately untagged
  (umbrella/policy pages and known structural cases).
- Two audit passes done; broad-tag misuses (production, aws on
  session-management, tool-execution on model providers, etc.) stripped.

Surface design (src/components/RelatedPagesInline.astro):
- Outlined pills wrapping to fit content. Small "Related" caption above.
- Mounted in MarkdownContent.astro above Starlight's Prev/Next pagination.
- No surface-aware filtering — algorithm is unaware of Pagination.

Lint:
- Title-uniqueness check in test/content-collection.test.ts catches
  cross-section title collisions (the kind that produce ambiguous
  "Hooks" pills).

311 tests pass; 14 cover the algorithm directly.
…t labels

- Drop "right outcome" / "to bring those pages back" prose from
  HUMAN_SCORE_FLOOR comment; keep the calibration note and a one-line
  hint to add sharper tags.
- Update describe/it labels in the headless test suite from "Jaccard" /
  "tag-overlap size" to the current language ("specificity-weighted
  Jaccard" / "score") so test output reflects the actual algorithm.

No behavior change. 14 tests still pass.
The visible pill row at the bottom of user-guide articles is removed.
Tag-driven Related Pages remains in the headless surfaces:

- `## Related pages` block in /<slug>/index.md and /llms-full.txt
- `relatedLink` array in <head> JSON-LD `TechArticle`

What this removes:
- src/components/RelatedPagesInline.astro (deleted)
- humanRelatedUserGuideFor() and its tests
- HUMAN_MAX, HUMAN_SCORE_FLOOR constants and the floor docblock
- import + render from MarkdownContent.astro

Tag authoring rule (src/config/tags.yml header) updated to reflect that
the surface is now headless-only.

306 tests pass; build clean; HTML body verified to contain zero traces of
the removed strip.
@notowen333 notowen333 merged commit 76c487b into strands-agents:main May 14, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants