Add experiment-setup skill by elliotrfeinberg · Pull Request #24 · mixpanel/ai-plugins

elliotrfeinberg · 2026-06-04T23:44:06Z

Summary

Adds a new experiment-setup skill at plugins/mixpanel-mcp/skills/experiment-setup/, ported from the staging branch in mixpanel/analytics#95516.

Linear: MULTI-581 — experiment-setup skill in ai-plugins

What the skill does

A single skill covering setup-phase expertise for Mixpanel experiments — hypothesis framing, metric roles, statistical model, sizing, advanced features (CUPED / Winsorization / Bonferroni), XP-vs-FF routing, prior-experiment reuse, and pre-launch pitfalls. Replaces the per-capability tools listed in MULTI-581 (get_experiment_setup_guidance, check_statistical_config, recommend_advanced_features, parse_hypothesis, run_pre_launch_checks, explain_health_check).
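To make the multiple-testing item in that list concrete, here is a minimal Python sketch contrasting Bonferroni with Benjamini-Hochberg (the latter is named alongside it in references/advanced-features.md). It is illustrative only and not taken from the skill: the function names and the example p-values are made up for this sketch.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at or below alpha / m.

    Controls the family-wise error rate across m comparisons.
    """
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure that controls the false discovery rate."""
    m = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five metrics in one experiment.
p = [0.001, 0.012, 0.025, 0.035, 0.200]
print(bonferroni(p))          # only the first metric survives
print(benjamini_hochberg(p))  # the first four metrics survive
```

Bonferroni is the stricter of the two, which is why it tends to suit a small set of decision-driving metrics, while Benjamini-Hochberg trades some false positives for power across many secondary metrics.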

Structured for progressive disclosure:

SKILL.md — skill entry point. Routes to references on demand.
references/hypothesis-framing.md — coaches the "Changing X will increase Y because Z" structure.
references/metric-selection.md — primary / guardrail / secondary roles, lagging-indicator trap.
references/sizing.md — MDE, power, baseline rate, traffic — Kohavi's inverted formula.
references/statistical-model.md — sequential vs fixed-horizon, confidence level.
references/advanced-features.md — CUPED, Winsorization, Bonferroni / Benjamini-Hochberg.
references/routing-xp-vs-ff.md — when the user wants an experiment vs a plain feature flag.
references/prior-experiments.md — how to fold prior results into a new design.
references/pitfalls.md — the deterministic check catalogue and the >5% guardrail hard-gate rationale.
evals/ — three fixtures (Pelando, Confetti, Polarsteps) seeded from PRD scenarios.
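The sizing reference in the list above mentions an inverted formula attributed to Kohavi. The PR does not show the formula itself, so the sketch below assumes it means the familiar rule of thumb n ≈ 16σ²/δ² per variant (roughly 80% power at a two-sided 5% significance level), solved for the detectable effect δ given the traffic available. Function names and the example numbers are hypothetical.

```python
import math


def samples_per_variant(baseline_rate, relative_mde):
    """Rule of thumb: n ≈ 16 * sigma^2 / delta^2 users per variant."""
    variance = baseline_rate * (1 - baseline_rate)  # Bernoulli variance of a conversion metric
    delta = baseline_rate * relative_mde            # absolute effect to detect
    return 16 * variance / delta ** 2


def detectable_relative_mde(baseline_rate, n_per_variant):
    """Inverted form: delta = 4 * sigma / sqrt(n), expressed relative to the baseline."""
    sigma = math.sqrt(baseline_rate * (1 - baseline_rate))
    return 4 * sigma / math.sqrt(n_per_variant) / baseline_rate


# Assumed scenario: 10% baseline conversion, 5% relative lift of interest.
n = samples_per_variant(0.10, 0.05)
print(round(n))                                          # about 57,600 users per variant
print(round(detectable_relative_mde(0.10, 57_600), 4))   # about 0.05
```

The inverted direction is usually the more practical one during setup: traffic is fixed, so the question becomes which effect size the experiment can realistically detect in the planned duration.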

Content ports from ai/engine/tools/experiments/_guidance/setup.md and _shared/pitfall_prose.py in mixpanel/analytics.

Sync

mixpanel-mcp-eu and mixpanel-mcp-in are synced via make sync-skills FORCE=1. make check-skills-sync passes locally. The top-level README.md skills table includes the new row.
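The Makefile targets are not shown in this PR. As a rough illustration of what a sync check has to verify, the Python sketch below compares the skill directory in the primary plugin against the two regional copies. The regional paths (plugins/mixpanel-mcp-eu/skills/… and plugins/mixpanel-mcp-in/skills/…) are assumed by analogy with the primary path, and the function names are hypothetical; this is not the implementation behind make check-skills-sync.

```python
import filecmp
from pathlib import Path


def trees_match(a: Path, b: Path) -> bool:
    """Return True when both directories hold the same files with identical bytes."""
    if not (a.is_dir() and b.is_dir()):
        return False
    cmp = filecmp.dircmp(a, b)
    if cmp.left_only or cmp.right_only or cmp.common_funny or cmp.funny_files:
        return False
    # dircmp compares files shallowly by default, so re-check contents byte for byte.
    _, mismatch, errors = filecmp.cmpfiles(a, b, cmp.common_files, shallow=False)
    if mismatch or errors:
        return False
    return all(trees_match(a / d, b / d) for d in cmp.common_dirs)


def out_of_sync(repo_root: Path, skill: str = "experiment-setup") -> list[str]:
    """List the regional plugins whose copy of the skill differs from the primary."""
    source = repo_root / "plugins" / "mixpanel-mcp" / "skills" / skill
    regions = ["mixpanel-mcp-eu", "mixpanel-mcp-in"]  # assumed directory names
    return [
        region
        for region in regions
        if not trees_match(source, repo_root / "plugins" / region / "skills" / skill)
    ]


if __name__ == "__main__":
    print(out_of_sync(Path(".")) or "all regional copies match")
```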

Open follow-ups (from the staging README)

Eval fixtures use placeholder customer quotes. The originating bot did not have PRD access; the fixture shape and expected_behavior checklists are real, but the verbatim Pelando / Confetti / Polarsteps quotes need a human pass before merge.
Tool-name normalisation. SKILL.md and references/prior-experiments.md reference tool names from MULTI-588 / MULTI-589 (search_prior_experiments, validate_experiment). If those names change before merge, update both.

Test plan

Read SKILL.md end-to-end; confirm trigger phrasings don't over-trigger on analyze-experiment requests.
Spot-check each reference vs canonical ai/engine/tools/experiments/_guidance/setup.md.
Confirm evals/ README and three fixture files are usable as a starting scaffold.
Verify the >5% guardrail hard-gate language in references/pitfalls.md matches product's customer-facing copy.

🤖 Generated with Claude Code

Assisted by Claude

Authors a single skill covering setup-phase expertise for Mixpanel experiments: hypothesis framing, metric roles, statistical model, sizing, advanced features (CUPED / Winsorization / Bonferroni), XP-vs-FF routing, prior-experiment reuse, and pre-launch pitfalls. Replaces several per-capability tools per MULTI-581.

Structured for progressive disclosure: the spine lives in SKILL.md, with deep-dive references for each topic and eval fixtures scaffolded from PRD customer scenarios (Pelando, Confetti, Polarsteps; real quotes still need to be filled in).

Synced to mixpanel-mcp-eu and mixpanel-mcp-in via make sync-skills.

Linear: https://linear.app/mixpanel/issue/MULTI-581

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Replace explicit tool name references (search_prior_experiments, validate_experiment, run_pre_launch_checks, Run-Query) with agent-agnostic phrasing per the convention from #22. Skills describe capabilities ("query the metric", "search prior experiments", "the platform's pre-launch validation") rather than specific tool calls.
- Drop evals/ directory: this repo doesn't run evals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The spine is always loaded; references are lazy. Move spine content that duplicated reference material out:

- Drop hypothesis examples + 5-item commitment list (in hypothesis-framing.md)
- Drop sizing worked example arithmetic (in sizing.md)
- Collapse Step 4 confidence-level + multiple-testing detail to one line each + pointer to statistical-model.md
- Collapse the 3-paragraph "Advanced features" detail to two lines + pointer to advanced-features.md
- Drop the enumerated warnings list and edge-cases section (covered in pitfalls.md / sizing.md / metric-selection.md)
- Drop the redundant "References" appendix at the bottom (every reference is already linked inline from its step)

SKILL.md: 243 → 151 lines. Hypothesis template, metric roles, sizing formula, statistical-model defaults, pitfall blockers, and the output summary template all preserved. All 8 references still linked from the spine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
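The claim that every reference remains linked from the spine can be spot-checked mechanically. The Python sketch below is one hypothetical way to do it, assuming the layout described in the PR (SKILL.md next to a references/ directory) and that the spine mentions each file by a path of the form references/<name>.md. The helper name is made up and the script is not part of the repository.

```python
import re
from pathlib import Path


def unlinked_references(skill_dir: Path) -> list[str]:
    """Return reference files that exist on disk but are never mentioned in SKILL.md."""
    spine = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    # Capture the file name from any "references/<name>.md" mention in the spine.
    linked = set(re.findall(r"references/([\w.-]+\.md)", spine))
    on_disk = {path.name for path in (skill_dir / "references").glob("*.md")}
    return sorted(on_disk - linked)


if __name__ == "__main__":
    skill = Path("plugins/mixpanel-mcp/skills/experiment-setup")
    missing = unlinked_references(skill)
    print(missing or "every reference is linked from the spine")
```

An empty result means the lazy references are all reachable from the always-loaded spine, which is the property progressive disclosure depends on.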

kaan-barmore-genc-mixpanel

Fine, although I have the same concern I raised on #23 about files being duplicated across regions.

Surfaces sibling-skill boundaries (experiment-results for post-launch, feature-flags for plain rollouts) at routing time. The exclusions were buried in the body where the routing pass never reaches them.

Sync mixpanel-mcp-eu and mixpanel-mcp-in.

Assisted by Claude

elliotrfeinberg and others added 2 commits June 4, 2026 23:43

elliotrfeinberg marked this pull request as ready for review June 5, 2026 01:20

elliotrfeinberg requested review from gslopez and kaan-barmore-genc-mixpanel June 5, 2026 01:21

kaan-barmore-genc-mixpanel previously approved these changes Jun 5, 2026


elliotrfeinberg added 2 commits June 5, 2026 23:37

Merge main into experiment-setup-skill

fb1ea8b

elliotrfeinberg dismissed kaan-barmore-genc-mixpanel’s stale review via 2334791 June 5, 2026 23:38

elliotrfeinberg wants to merge 5 commits into main from experiment-setup-skill
