Add experiment-setup skill #24
Open
elliotrfeinberg wants to merge 5 commits into
Authors a single skill covering setup-phase expertise for Mixpanel experiments: hypothesis framing, metric roles, statistical model, sizing, advanced features (CUPED / Winsorization / Bonferroni), XP-vs-FF routing, prior-experiment reuse, and pre-launch pitfalls. Replaces several per-capability tools per MULTI-581.

Structured for progressive disclosure: the spine lives in SKILL.md, with deep-dive references for each topic and eval fixtures scaffolded from PRD customer scenarios (Pelando, Confetti, Polarsteps; real quotes still need to be filled in).

Synced to mixpanel-mcp-eu and mixpanel-mcp-in via make sync-skills.

Linear: https://linear.app/mixpanel/issue/MULTI-581

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Replace explicit tool name references (search_prior_experiments, validate_experiment, run_pre_launch_checks, Run-Query) with agent-agnostic phrasing per the convention from #22. Skills describe capabilities ("query the metric", "search prior experiments", "the platform's pre-launch validation") rather than specific tool calls.
- Drop the evals/ directory: this repo doesn't run evals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The spine is always loaded; references are lazy. Move spine content that duplicated reference material out:
- Drop hypothesis examples + 5-item commitment list (in hypothesis-framing.md)
- Drop sizing worked example arithmetic (in sizing.md)
- Collapse Step 4 confidence-level + multiple-testing detail to one line each + pointer to statistical-model.md
- Collapse the 3-paragraph "Advanced features" detail to two lines + pointer to advanced-features.md
- Drop the enumerated warnings list and edge-cases section (covered in pitfalls.md / sizing.md / metric-selection.md)
- Drop the redundant "References" appendix at the bottom (every reference is already linked inline from its step)

SKILL.md: 243 → 151 lines. Hypothesis template, metric roles, sizing formula, statistical-model defaults, pitfall blockers, and the output summary template all preserved. All 8 references still linked from the spine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kaan-barmore-genc-mixpanel
previously approved these changes
Jun 5, 2026
kaan-barmore-genc-mixpanel
left a comment
Fine, although same concern I had on #23 with files being duplicated across regions.
Surfaces sibling-skill boundaries (experiment-results for post-launch, feature-flags for plain rollouts) at routing time. The exclusions were buried in the body where the routing pass never reaches them. Sync mixpanel-mcp-eu and mixpanel-mcp-in. Assisted by Claude
Summary
Adds a new `experiment-setup` skill at `plugins/mixpanel-mcp/skills/experiment-setup/`, ported from the staging branch in mixpanel/analytics#95516.

What the skill does
A single skill covering setup-phase expertise for Mixpanel experiments: hypothesis framing, metric roles, statistical model, sizing, advanced features (CUPED / Winsorization / Bonferroni), XP-vs-FF routing, prior-experiment reuse, and pre-launch pitfalls. Replaces the per-capability tools listed in MULTI-581 (`get_experiment_setup_guidance`, `check_statistical_config`, `recommend_advanced_features`, `parse_hypothesis`, `run_pre_launch_checks`, `explain_health_check`).

Structured for progressive disclosure:
- `SKILL.md`: skill entry point. Routes to references on demand.
- `references/hypothesis-framing.md`: coaches the "Changing X will increase Y because Z" structure.
- `references/metric-selection.md`: primary / guardrail / secondary roles, lagging-indicator trap.
- `references/sizing.md`: MDE, power, baseline rate, traffic; Kohavi's inverted formula.
- `references/statistical-model.md`: sequential vs fixed-horizon, confidence level.
- `references/advanced-features.md`: CUPED, Winsorization, Bonferroni / Benjamini-Hochberg.
- `references/routing-xp-vs-ff.md`: when the user wants an experiment vs a plain feature flag.
- `references/prior-experiments.md`: how to fold prior results into a new design.
- `references/pitfalls.md`: the deterministic check catalogue and the >5% guardrail hard-gate rationale.
- `evals/`: three fixtures (Pelando, Confetti, Polarsteps) seeded from PRD scenarios.

Content ports from `ai/engine/tools/experiments/_guidance/setup.md` and `_shared/pitfall_prose.py` in `mixpanel/analytics`.

Sync
`mixpanel-mcp-eu` and `mixpanel-mcp-in` are synced via `make sync-skills FORCE=1`. `make check-skills-sync` passes locally. The top-level `README.md` skills table includes the new row.

Open follow-ups (from the staging README)
- `SKILL.md` and `references/prior-experiments.md` reference tool names from MULTI-588 / MULTI-589 (`search_prior_experiments`, `validate_experiment`). If those names change before merge, update both.

Test plan
- `SKILL.md` end-to-end; confirm trigger phrasings don't over-trigger on `analyze-experiment` requests.
- `ai/engine/tools/experiments/_guidance/setup.md`.
- `evals/` README and three fixture files are usable as a starting scaffold.
- `references/pitfalls.md` matches product's customer-facing copy.

🤖 Generated with Claude Code
Assisted by Claude
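For reviewers who want a concrete picture of the sizing topic, here is a minimal sketch of what an "inverted" sizing calculation might look like. It assumes the widely quoted rule of thumb n ≈ 16σ²/δ² per variant (roughly 80% power at a 5% significance level) and assumes that "inverted" means solving it for the detectable effect given available traffic. The contents of `references/sizing.md` are not shown in this PR page, so the function names and the exact formula here are illustrative assumptions, not the skill's implementation.

```python
import math


def required_users_per_variant(baseline_rate: float, relative_mde: float) -> int:
    """Rule-of-thumb sample size per variant for a conversion metric.

    Assumes n ~= 16 * sigma^2 / delta^2, with sigma^2 = p * (1 - p)
    for a Bernoulli metric and delta the absolute effect to detect.
    """
    variance = baseline_rate * (1 - baseline_rate)
    delta = baseline_rate * relative_mde  # relative lift -> absolute lift
    return math.ceil(16 * variance / delta**2)


def detectable_relative_mde(baseline_rate: float, users_per_variant: int) -> float:
    """Same rule solved for the effect size: delta = 4 * sigma / sqrt(n).

    Answers "given the traffic I have, what lift can I hope to detect?"
    (hypothetical reading of the 'inverted formula').
    """
    sigma = math.sqrt(baseline_rate * (1 - baseline_rate))
    delta = 4 * sigma / math.sqrt(users_per_variant)
    return delta / baseline_rate  # absolute lift -> relative lift


if __name__ == "__main__":
    n = required_users_per_variant(baseline_rate=0.10, relative_mde=0.05)
    print("users per variant:", n)
    print("relative MDE at that traffic:", detectable_relative_mde(0.10, n))
```

The two functions are inverses of each other up to rounding, which makes a round-trip check an easy sanity test for whatever arithmetic the reference ends up documenting.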