Feature/skills aggregator: add SkillEvalAggregator for batch evaluation comparisons #229

Draft
venkatkrish543re wants to merge 9 commits into strands-agents:main from venkatkrish543re:feature/skills-aggregator
venkatkrish543re commented May 14, 2026

No description provided.

ybdarrenwang and others added 9 commits May 6, 2026 19:21
Adds skills/ subpackage providing paired-comparison aggregation for evaluating agent skills against a baseline. Mirrors the chaos aggregator pattern from feature/aggregator-demo-2.

- SkillEvalAggregator with Wilcoxon, paired-t, and McNemar tests

- Bootstrap CI on the mean delta (1000 resamples)

- Corruption filtering before paired statistics

- SkillEvalExperiment composes base Experiment

- Rich-based interactive display

- 44 unit tests covering paired stats, corruption filtering, pairing, and serialization
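The paired statistics listed above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: the function names (`filter_corrupted`, `bootstrap_mean_delta_ci`, `mcnemar_exact`) are hypothetical, "corruption" is assumed here to mean a missing or non-finite score in either arm, the bootstrap is assumed to be a plain percentile interval, and McNemar is shown in its exact binomial form. The actual `SkillEvalAggregator` may differ on all of these points.

```python
import math
import random
from statistics import mean


def filter_corrupted(pairs):
    """Keep only (baseline, skill) pairs where both scores are finite numbers.

    Assumption: a pair is 'corrupted' if either score is None, NaN, or infinite.
    """
    return [
        (b, s)
        for b, s in pairs
        if b is not None and s is not None
        and math.isfinite(b) and math.isfinite(s)
    ]


def bootstrap_mean_delta_ci(pairs, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of (skill - baseline) deltas."""
    rng = random.Random(seed)
    deltas = [s - b for b, s in pairs]
    n = len(deltas)
    # Resample the deltas with replacement and record each resample's mean.
    means = sorted(mean(rng.choices(deltas, k=n)) for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return mean(deltas), lo, hi


def mcnemar_exact(pass_pairs):
    """Two-sided exact McNemar p-value on (baseline_pass, skill_pass) booleans."""
    b = sum(1 for base, skill in pass_pairs if base and not skill)  # regressions
    c = sum(1 for base, skill in pass_pairs if not base and skill)  # improvements
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    scores = [(0.5, 0.7), (0.6, 0.8), (None, 0.9), (0.4, float("nan")), (0.7, 0.8)]
    clean = filter_corrupted(scores)
    print(len(clean), bootstrap_mean_delta_ci(clean))
    print(mcnemar_exact([(False, True)] * 5 + [(True, True)] * 3))
```

Filtering before pairing matters because a single NaN delta would otherwise propagate into the mean and every resample. The fixed `seed` is only there to make the sketch reproducible.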

Closes strands-agents#228
2 participants