Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
170 changes: 10 additions & 160 deletions mixpanel-plugin/skills/mixpanelyst/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
name: mixpanelyst
description: This skill should be used when the user asks about Mixpanel product analytics, event data, funnel analysis, retention curves, cohort analysis, segmentation queries, user behavior, conversion rates, churn, DAU/MAU, ARPU, revenue metrics, feature adoption, A/B test results, user paths, flow analysis, or any request to query, explore, visualize, or analyze Mixpanel data using Python. Also use when the user asks to read, write, or manage Mixpanel "business context" β€” the markdown documentation that grounds AI assistants in an organization's structure and goals.
description: "Build segmentation queries, generate retention curves, create funnel reports, analyze user flows, explore cohort behavior, and manage entity CRUD operations via the Mixpanel API using Python. Use when the user asks about Mixpanel product analytics, event data, funnel analysis, retention curves, cohort analysis, segmentation queries, user behavior, conversion rates, churn, DAU/MAU, ARPU, revenue metrics, feature adoption, A/B test results, user paths, flow analysis, or any request involving Mixpanel data using the mixpanel_headless library. Also use when the user asks to read, write, or manage Mixpanel business context β€” the markdown documentation that grounds AI assistants in an organization's structure and goals. Do not use for general analytics questions unrelated to Mixpanel or its API."
allowed-tools: Bash Read Write WebFetch
---

Expand Down Expand Up @@ -198,59 +198,13 @@ When string properties have complex/unreadable values (e.g., campaign names from

### Custom Property Formula Reference

Formulas use a SQL-like expression language. Variables (A, B, _A, etc.) map to properties via `composedProperties`.

**Variable binding:** `LET(name, expression, body)` β€” define intermediate results:
```
LET(raw, A, REGEX_REPLACE(raw, "pattern", "replacement"))
LET(x, A * B, IFS(x < 50, "low", x < 200, "mid", TRUE, "high"))
```

**Conditionals:** `IF(cond, then, else)`, `IFS(cond1, val1, cond2, val2, ..., TRUE, default)`

**String functions:** `UPPER(s)`, `LOWER(s)`, `LEN(s)`, `LEFT(s, n)`, `RIGHT(s, n)`, `MID(s, start, count)`, `SPLIT(s, delim, n)`, `HAS_PREFIX(s, p)`, `HAS_SUFFIX(s, p)`, `PARSE_URL(s, "domain")`

**Regex functions (PCRE2 engine):**
- `REGEX_MATCH(haystack, pattern)` β€” returns true/false
- `REGEX_EXTRACT(haystack, pattern, capture_group)` β€” returns match or capture group
- `REGEX_REPLACE(haystack, pattern, replacement)` β€” replaces all matches
Formulas use a SQL-like expression language with `LET`, `IF`/`IFS`, string/regex/math/date/list functions, and `DEFINED()` for null checks. See `references/custom-property-formulas.md` for the complete function reference with examples.

**Regex engine quirks (Mixpanel-specific):**
- **Case-insensitive by default** β€” use `(?-i)` to switch to case-sensitive matching within a pattern
- **Backreferences work** β€” `$1`, `$2` capture groups and `$0` whole-match all work in `REGEX_REPLACE` replacements
- **`{n,m}` quantifiers conflict with formula syntax** β€” curly braces are parsed as formula constructs. Use repeated character classes instead (e.g., `[0-9][0-9][0-9][0-9]` instead of `[0-9]{4}`)
- **Case-insensitive by default** β€” use `(?-i)` to switch to case-sensitive
- **`{n,m}` quantifiers conflict with formula syntax** β€” use repeated character classes (e.g., `[0-9][0-9][0-9][0-9]` instead of `[0-9]{4}`)
- **`\d`, `\w` shorthand classes don't work** β€” use `[0-9]`, `[A-Za-z0-9_]` explicitly
- **Escape backslashes carefully** β€” in formula strings, `\\\\` may be needed for a literal `\` depending on how the formula is constructed (Python string β†’ JSON β†’ regex engine)

**CamelCase splitting** — insert space between lowercase→uppercase boundaries:
```
REGEX_REPLACE(text, "(?-i)([a-z])([A-Z])", "$1 $2")
// ChickenSundaysApril β†’ Chicken Sundays April
```

**Practical multi-step cleanup example** (campaign names from Braze):
```
LET(s1, REGEX_REPLACE(A, "^[0-9][0-9][0-9][0-9][0-9]*_", ""),
LET(s2, REGEX_REPLACE(s1, "^(NW|TARGETED|REGIONAL|NTL)_", ""),
LET(s3, REGEX_REPLACE(s2, "_(Push|Email|NotificationCenter|ModalInAppMessage)_.*$", ""),
LET(s4, REGEX_REPLACE(s3, "_", " "),
REGEX_REPLACE(s4, " +", " ")
))))
```

**Type functions:** `STRING(x)`, `NUMBER(x)`, `BOOLEAN(x)`, `DEFINED(x)`

**Math:** `+`, `-`, `*`, `/`, `%`, `MIN(a,b)`, `MAX(a,b)`, `FLOOR(n)`, `CEIL(n)`, `ROUND(n)`

**Date:** `DATEDIF(start, end, unit)` β€” units: D, M, Y, MD, YM, YD. `TODAY()` for current date.

**List:** `SUM(list)`, `ANY(x, list, expr)`, `ALL(x, list, expr)`, `FILTER(x, list, expr)`, `MAP(x, list, expr)`

**Comparison:** `==`, `!=`, `<`, `>`, `<=`, `>=` (case-insensitive for strings), `IN` for list membership

**Logical:** `AND`, `OR`, `NOT(x)`

**Constants:** `TRUE`, `FALSE`, `UNDEFINED`
- **Backreferences work** β€” `$1`, `$2` in `REGEX_REPLACE` replacements

**Creating a custom property via the API:**
```python
Expand Down Expand Up @@ -840,87 +794,13 @@ result = ws.query_flow("Login", forward=3,

### Legacy Queries & Counts

These use older APIs. Prefer the typed query methods above when possible.

```python
def segmentation(self, event: str, *, from_date: str, to_date: str, on: str | None = None, unit: Literal['day', 'week', 'month'] = 'day', where: str | None = None) -> SegmentationResult: ...
def funnel(self, funnel_id: int, *, from_date: str, to_date: str, unit: str | None = None, on: str | None = None) -> FunnelResult: ...
def retention(self, *, born_event: str, return_event: str, from_date: str, to_date: str, born_where: str | None = None, return_where: str | None = None, interval: int = 1, interval_count: int = 10, unit: Literal['day', 'week', 'month'] = 'day') -> RetentionResult: ...
def event_counts(self, events: list[str], *, from_date: str, to_date: str, type: Literal['general', 'unique', 'average'] = 'general', unit: Literal['day', 'week', 'month'] = 'day') -> EventCountsResult: ...
def property_counts(self, event: str, property_name: str, *, from_date: str, to_date: str, type: Literal['general', 'unique', 'average'] = 'general', unit: Literal['day', 'week', 'month'] = 'day', values: list[str] | None = None, limit: int | None = None) -> PropertyCountsResult: ...
def frequency(self, *, from_date: str, to_date: str, unit: Literal['day', 'week', 'month'] = 'day', addiction_unit: Literal['hour', 'day'] = 'hour', event: str | None = None, where: str | None = None) -> FrequencyResult: ...
def activity_feed(self, distinct_ids: list[str], *, from_date: str | None = None, to_date: str | None = None) -> ActivityFeedResult: ...
def query_saved_report(self, bookmark_id: int, *, bookmark_type: Literal['insights', 'funnels', 'retention', 'flows'] = 'insights', from_date: str | None = None, to_date: str | None = None) -> SavedReportResult: ...
def query_saved_flows(self, bookmark_id: int) -> FlowsResult: ...
def segmentation_numeric(self, event: str, *, from_date: str, to_date: str, on: str, unit: Literal['hour', 'day'] = 'day', where: str | None = None, type: Literal['general', 'unique', 'average'] = 'general') -> NumericBucketResult: ...
def segmentation_sum(self, event: str, *, from_date: str, to_date: str, on: str, unit: Literal['hour', 'day'] = 'day', where: str | None = None) -> NumericSumResult: ...
def segmentation_average(self, event: str, *, from_date: str, to_date: str, on: str, unit: Literal['hour', 'day'] = 'day', where: str | None = None) -> NumericAverageResult: ...
```
Older API wrappers (`segmentation`, `funnel`, `retention`, `event_counts`, `property_counts`, `frequency`, `activity_feed`, `query_saved_report`, `query_saved_flows`). Prefer the typed query methods above. See `references/entity-crud-reference.md` for full signatures.

### Entity CRUD (App API)

All entity methods require a workspace ID. Use `python3 ${CLAUDE_SKILL_DIR}/scripts/help.py Workspace.<method>` for full signatures and parameter types.
User Guide: `WebFetch(url="https://mixpanel.github.io/mixpanel-headless/guide/entity-management/index.md")`

#### Dashboard (β†’ `Dashboard`)

`list_dashboards`, `create_dashboard`, `get_dashboard`, `update_dashboard`, `delete_dashboard`, `bulk_delete_dashboards`, `favorite_dashboard`, `unfavorite_dashboard`, `pin_dashboard`, `unpin_dashboard`, `add_report_to_dashboard`, `remove_report_from_dashboard`, `update_text_card`, `update_report_link`

**Blueprints:** `list_blueprint_templates` β†’ `list[BlueprintTemplate]`, `create_blueprint`, `get_blueprint_config`, `update_blueprint_cohorts`, `finalize_blueprint`, `create_rca_dashboard`

**Helpers:** `get_bookmark_dashboard_ids` β†’ `list[int]`, `get_dashboard_erf` β†’ `dict`

#### Bookmark / Report (β†’ `Bookmark`)

`list_bookmarks_v2`, `create_bookmark`, `get_bookmark`, `update_bookmark`, `delete_bookmark`, `bulk_delete_bookmarks`, `bulk_update_bookmarks`, `bookmark_linked_dashboard_ids` β†’ `list[int]`, `get_bookmark_history` β†’ `BookmarkHistoryResponse`

#### Cohort (β†’ `Cohort`)

`list_cohorts_full`, `get_cohort`, `create_cohort`, `update_cohort`, `delete_cohort`, `bulk_delete_cohorts`, `bulk_update_cohorts`
All entity methods require a workspace ID. Covers dashboards, bookmarks, cohorts, feature flags, experiments, alerts, annotations, webhooks, Lexicon (event/property definitions, tags, drop filters, custom properties/events, lookup tables, schema registry/enforcement), audit, and data deletion. Use `python3 ${CLAUDE_SKILL_DIR}/scripts/help.py Workspace.<method>` for full signatures.

#### Feature Flag (β†’ `FeatureFlag`)

`list_feature_flags`, `create_feature_flag`, `get_feature_flag`, `update_feature_flag`, `delete_feature_flag`, `archive_feature_flag`, `restore_feature_flag`, `duplicate_feature_flag`, `set_flag_test_users`, `get_flag_history` β†’ `FlagHistoryResponse`, `get_flag_limits` β†’ `FlagLimitsResponse`

#### Experiment (β†’ `Experiment`)

`list_experiments`, `create_experiment`, `get_experiment`, `update_experiment`, `delete_experiment`, `launch_experiment`, `conclude_experiment`, `decide_experiment`, `archive_experiment`, `restore_experiment`, `duplicate_experiment`, `list_erf_experiments` β†’ `list[dict]`

#### Alert (β†’ `CustomAlert`)

`list_alerts`, `create_alert`, `get_alert`, `update_alert`, `delete_alert`, `bulk_delete_alerts`, `get_alert_count` β†’ `AlertCount`, `get_alert_history` β†’ `AlertHistoryResponse`, `test_alert`, `get_alert_screenshot_url`, `validate_alerts_for_bookmark`

#### Annotation (β†’ `Annotation`)

`list_annotations`, `create_annotation`, `get_annotation`, `update_annotation`, `delete_annotation`, `list_annotation_tags` β†’ `list[AnnotationTag]`, `create_annotation_tag`

#### Webhook (β†’ `ProjectWebhook`)

`list_webhooks`, `create_webhook`, `update_webhook`, `delete_webhook`, `test_webhook`

#### Lexicon & Data Governance

**Event/Property Definitions:** `get_event_definitions`, `update_event_definition`, `delete_event_definition`, `bulk_update_event_definitions`, `get_property_definitions`, `update_property_definition`, `bulk_update_property_definitions`, `export_lexicon`, `get_event_history`, `get_property_history`

**Tags:** `list_lexicon_tags`, `create_lexicon_tag`, `update_lexicon_tag`, `delete_lexicon_tag`

**Drop Filters:** `list_drop_filters`, `create_drop_filter`, `update_drop_filter`, `delete_drop_filter`, `get_drop_filter_limits`

**Custom Properties:** `list_custom_properties`, `create_custom_property`, `get_custom_property`, `update_custom_property`, `delete_custom_property`, `validate_custom_property`

**Custom Events:** `list_custom_events`, `update_custom_event`, `delete_custom_event`

**Lookup Tables:** `list_lookup_tables`, `upload_lookup_table`, `download_lookup_table`, `update_lookup_table`, `delete_lookup_tables`

**Schema Registry:** `list_schema_registry`, `create_schema`, `update_schema`, `create_schemas_bulk`, `update_schemas_bulk`, `delete_schemas`

**Schema Enforcement:** `get_schema_enforcement`, `init_schema_enforcement`, `update_schema_enforcement`, `replace_schema_enforcement`, `delete_schema_enforcement`

**Audit & Monitoring:** `run_audit`, `run_audit_events_only`, `list_data_volume_anomalies`, `update_anomaly`, `bulk_update_anomalies`

**Data Deletion:** `list_deletion_requests`, `create_deletion_request`, `cancel_deletion_request`, `preview_deletion_filters`

**Other:** `get_tracking_metadata`
See `references/entity-crud-reference.md` for the complete method listing by entity type.

### Business Context

Expand Down Expand Up @@ -1001,41 +881,11 @@ Full reference: `WebFetch(url="https://mixpanel.github.io/mixpanel-headless/api/

## Statistical Analysis β€” numpy, scipy

All query results produce pandas DataFrames, which integrate directly with numpy and scipy:

```python
import numpy as np
from scipy import stats

# Compare two segments
a = result.df[result.df["platform"] == "iOS"]["count"]
b = result.df[result.df["platform"] == "Android"]["count"]
t_stat, p_value = stats.ttest_ind(a, b)
cohens_d = (a.mean() - b.mean()) / np.sqrt((a.std()**2 + b.std()**2) / 2)

# Useful scipy.stats tests: ttest_ind, mannwhitneyu, chi2_contingency, pearsonr, spearmanr
# Useful numpy: np.percentile, np.corrcoef, np.polyfit (trend lines)
```
All query results produce pandas DataFrames via `.df`, compatible with numpy and scipy for statistical tests (`ttest_ind`, `mannwhitneyu`, `chi2_contingency`, `pearsonr`) and trend analysis (`np.polyfit`).

## Visualization β€” matplotlib, seaborn

Save charts to files for the user. Always use a non-interactive backend:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(10, 5))
result.df.plot(x="date", y="count", ax=ax)
ax.set_title("Daily Logins")
fig.savefig("chart.png", dpi=150, bbox_inches="tight")
plt.close(fig)

# seaborn: sns.lineplot, sns.barplot, sns.heatmap (for retention matrices)
# Multi-panel: fig, axes = plt.subplots(2, 2) for dashboard-style layouts
```
Save charts to files. Always use `matplotlib.use("Agg")` before importing pyplot. Use `sns.heatmap` for retention matrices, `sns.barplot` for segment comparisons, and multi-panel layouts (`plt.subplots(2, 2)`) for dashboard-style views.

## Exceptions

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# Custom Property Formula Reference

Formulas use a SQL-like expression language. Variables (A, B, _A, etc.) map to properties via `composedProperties`.

## Variable Binding

`LET(name, expression, body)` β€” define intermediate results:
```
LET(raw, A, REGEX_REPLACE(raw, "pattern", "replacement"))
LET(x, A * B, IFS(x < 50, "low", x < 200, "mid", TRUE, "high"))
```

## Conditionals

`IF(cond, then, else)`, `IFS(cond1, val1, cond2, val2, ..., TRUE, default)`

## String Functions

`UPPER(s)`, `LOWER(s)`, `LEN(s)`, `LEFT(s, n)`, `RIGHT(s, n)`, `MID(s, start, count)`, `SPLIT(s, delim, n)`, `HAS_PREFIX(s, p)`, `HAS_SUFFIX(s, p)`, `PARSE_URL(s, "domain")`

## Regex Functions (PCRE2 engine)

- `REGEX_MATCH(haystack, pattern)` β€” returns true/false
- `REGEX_EXTRACT(haystack, pattern, capture_group)` β€” returns match or capture group
- `REGEX_REPLACE(haystack, pattern, replacement)` β€” replaces all matches

## Type Functions

`STRING(x)`, `NUMBER(x)`, `BOOLEAN(x)`, `DEFINED(x)`

## Math

`+`, `-`, `*`, `/`, `%`, `MIN(a,b)`, `MAX(a,b)`, `FLOOR(n)`, `CEIL(n)`, `ROUND(n)`

## Date

`DATEDIF(start, end, unit)` β€” units: D, M, Y, MD, YM, YD. `TODAY()` for current date.

## List

`SUM(list)`, `ANY(x, list, expr)`, `ALL(x, list, expr)`, `FILTER(x, list, expr)`, `MAP(x, list, expr)`

## Comparison, Logical & Constants

`==`, `!=`, `<`, `>`, `<=`, `>=` (case-insensitive for strings), `IN` for list membership. `AND`, `OR`, `NOT(x)`. Constants: `TRUE`, `FALSE`, `UNDEFINED`.

## Examples

**CamelCase splitting:** `REGEX_REPLACE(text, "(?-i)([a-z])([A-Z])", "$1 $2")` β†’ "ChickenSundaysApril" becomes "Chicken Sundays April"

**Multi-step cleanup** (campaign names from Braze): Chain `LET` + `REGEX_REPLACE` to strip date prefixes, targeting codes, channel suffixes, and underscores in a single formula.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Escape-backslash gotcha silently dropped

The original SKILL.md had an explicit warning about needing \\\\ for a literal \ depending on how the formula is constructed (Python string β†’ JSON β†’ regex engine). This is a common stumbling block for agents writing formulas with literal backslashes, and it appears in neither the condensed SKILL.md regex-quirks block nor this reference file. Agents hitting this will get silent, hard-to-debug mismatch errors.

Comment on lines +48 to +51
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Concrete Braze cleanup example replaced with prose

The original file had the full LET + REGEX_REPLACE chain for stripping date prefixes, targeting codes, and channel suffixes from Braze campaign names. The replacement here describes what it does but omits the actual formula. Agents prompted to clean up campaign names will have no runnable pattern to work from, defeating the purpose of this reference file.

Loading