Merged
Changes from 17 commits
19 commits
5487dd9
Add session-replay foundation (044, Phase 1 + Phase 2/T011-T021)
jaredmixpanel May 28, 2026
f50844e
Complete US1: ReplaysService + Workspace replay surface (044/T015-T032)
jaredmixpanel May 28, 2026
e35cc55
Add mp replays CLI commands + PR 1 ready (044/T033-T042)
jaredmixpanel May 28, 2026
bf4a246
Ship US2 + US4 + Phase 6 polish — PRs 2 and 3 ready (044/T043-T094)
jaredmixpanel May 28, 2026
85ba556
Rework rrweb_analyzer + fix events_for query shape (044/T055 follow-up)
jaredmixpanel May 28, 2026
015e14d
Add focused rrweb analyzer coverage (043 tests, all synthetic fixtures)
jaredmixpanel May 28, 2026
0762f40
Fix session-replay discovery: parse series, not the lossy .df (044)
jaredmixpanel May 29, 2026
06fb25d
Harden replay fetch + scope events window (044 QA findings #2–#4)
jaredmixpanel May 29, 2026
e27e5e6
Batch replay-fetch Insights queries (044 QA finding #5, MCP parity)
jaredmixpanel May 29, 2026
287fa22
QA the pm4py/tslearn extras: fix event_log contract, cluster crash, m…
jaredmixpanel May 29, 2026
77a8336
Remove pm4py + tslearn process-mining / clustering from session repla…
jaredmixpanel May 29, 2026
d7f96fa
Cut dead replay projections, harden the deterministic core (044)
jaredmixpanel Jun 5, 2026
416bcd1
Document the session-replay surface across README, docs, and plugin (…
jaredmixpanel Jun 5, 2026
0109796
Resolve PR #193 review: replay discovery, security, and doc fixes (044)
jaredmixpanel Jun 5, 2026
e128091
Resolve PR #193 review: public replay labels, docstrings, typed CLI e…
jaredmixpanel Jun 5, 2026
9de9de7
Address PR #193 review round 2: fix selector_label_fn + replay cleanu…
jaredmixpanel Jun 5, 2026
7a11354
Fix DOMTracker MAX_NODES guard: cap held only on first over-limit nod…
jaredmixpanel Jun 5, 2026
2512ac2
Address PR #193 Copilot review + strip session-replay phase residue (…
jaredmixpanel Jun 5, 2026
ad193f3
Type mobile-replay rejection as UnsupportedReplayFormatError (044)
jaredmixpanel Jun 5, 2026
2 changes: 1 addition & 1 deletion .specify/feature.json
@@ -1,3 +1,3 @@
{
-"feature_directory": "specs/043-frictionless-auth"
+"feature_directory": "specs/044-session-replay"
}
87 changes: 87 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,87 @@
# Changelog

All notable changes to `mixpanel-headless` are recorded here. The format
loosely follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/);
this project follows semver but is currently pre-1.0, so minor versions
may include API changes.

## Unreleased — Session Replay (044, PRs 1–3)

### Added (PR 1 — Phase 1: discovery + signed CDN access)

- `Workspace.list_replays(distinct_id|replay_ids, from_date, to_date, limit)`
— discover replays for a user, or hydrate explicit IDs.
- `Workspace.sign_replay(id)` / `Workspace.sign_replays(ids, env)` —
sign replay IDs for CDN access via the
`/app/projects/<id>/replays/sign[/bulk]` endpoint.
- `Workspace.fetch_replay(id, env, retention_days, max_files,
include_mixpanel_events, event_properties, cdn_concurrency)` — sign +
parallel CDN walk + return a fully materialized `Replay`.
- `Workspace.stream_replay(id, …)` — sync iterator wrapping the async
CDN walker; re-signs on expiry by default.
- `Workspace.events_for_replay(id, event_properties)` and
`Workspace.events_for_replays(ids, event_properties)` — Mixpanel
events that overlap a replay's time window.
- New result types: `ReplaySummary`, `SignedReplay` (with
`query_string` masked in `__repr__`/`__str__` per FR-008/9),
`ReplayEvent`, `UserAction`, `Replay`.
- New exceptions: `SessionReplayError` (base) plus
`SessionReplayAccessError` (sensitive-data 403),
`SignedURLExpiredError`, `ReplayNotFoundError`. CLI exit-code mapping
added: sensitive-data → 2, replay-not-found → 4.
- New CLI commands: `mp replays list`, `mp replays events`,
`mp replays sign` (with `--reveal-signed-urls` opt-in that emits a
stderr warning on every invocation), `mp replays fetch [-o FILE]`.
- Depends on the undocumented `/app/projects/<id>/replays/sign[/bulk]`
endpoint — the same endpoint Mixpanel's own MCP server uses.
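
As an illustration alongside this hunk (not part of CHANGELOG.md itself), the exit-code mapping listed above could be wired roughly as follows. This is a minimal sketch: the exception classes are local stand-ins that reuse the names from the changelog, while `exit_code_for` and the fallback code `1` are hypothetical, since the actual CLI wiring is not visible in this diff.

```python
class SessionReplayError(Exception):
    """Local stand-in for the base exception named in the changelog."""


class SessionReplayAccessError(SessionReplayError):
    """Stand-in: sensitive-data 403."""


class SignedURLExpiredError(SessionReplayError):
    """Stand-in: signed URL no longer valid."""


class ReplayNotFoundError(SessionReplayError):
    """Stand-in: unknown replay ID."""


# Only these two mappings come from the changelog entry.
_EXIT_CODES = {
    SessionReplayAccessError: 2,
    ReplayNotFoundError: 4,
}


def exit_code_for(exc: BaseException, default: int = 1) -> int:
    """Return the exit code for exc, honoring subclasses via the MRO.

    The default of 1 is an assumption, not something the changelog states.
    """
    for cls in type(exc).__mro__:
        if cls in _EXIT_CODES:
            return _EXIT_CODES[cls]
    return default
```

Walking the MRO rather than using an exact type lookup means a future subclass of `ReplayNotFoundError` would still exit with 4 without another table entry.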

### Added (PR 2 — Phase 2: analyzer + ReplayBundle)

- `Workspace.fetch_replays(ids, …)` — parallel multi-replay fetch
returning a `ReplayBundle`.
- `Workspace.replays_for_user(distinct_id, from_date, to_date, …)` —
composition of `list_replays` + `fetch_replays`; defaults
`include_mixpanel_events=True`.
- `Workspace.analyze_replay(id)` — sugar for
`fetch_replay(id).summary_markdown`.
- `RrwebAnalyzer` (`_internal/replays/rrweb_analyzer.py`) — rrweb
event-stream analyzer producing normalized `UserAction` records +
markdown timeline. Handles click / input / scroll / navigate /
select / console_error event families with per-source debouncing
(scroll / input / selection at 1s each), plus a DOM tracker with
ancestor traversal and descriptive-attrs extraction for
human-readable target descriptions. Pure stdlib.
- `ReplayBundle` (`types.py`): five DataFrame projections
(`sessions_df`, `actions_df`, `events_df`, `mixpanel_df`,
`elements_df`); three aggregations (`top_clicks`, `rage_clicks`,
`long_pauses`); six chainable filters (`filter`, `where`,
`find_pattern`, `error_sessions`, `head`, `sample`);
`join_mixpanel_events`, `summary_markdown`, `compare`.
- Label functions: `default_label_fn`, `selector_label_fn`,
`url_normalizer` (public `replay_labels.py`, re-exported from the
top-level package). The URL normalizer collapses numeric / hex path
segments to `:id` so parameterized URLs aggregate cleanly across users.
- Module-level aggregators (`_internal/replays/aggregators.py`)
re-exposed via `ReplayBundle` methods.
- New CLI commands: `mp replays analyze` (markdown timeline /
`--format json` for action list) and `mp replays for-user
--include analyze --out-dir DIR` (the Mixpanel-events join is on by
default; opt out with `--no-mixpanel-events`).
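
As an illustration alongside this hunk (not part of CHANGELOG.md itself), the URL-normalizer behavior described above, collapsing numeric / hex path segments to `:id`, could be approximated as below. This is a sketch under stated assumptions: the function name `normalize_url_sketch`, the 8-character minimum for hex segments, and the UUID pattern are all hypothetical; the shipped `url_normalizer` may use different rules.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: a path segment counts as an identifier if it is all digits,
# a run of 8 or more hex characters, or a dashed UUID.
_ID_SEGMENT = re.compile(
    r"^(?:\d+"
    r"|[0-9a-fA-F]{8,}"
    r"|[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})$"
)


def normalize_url_sketch(url: str) -> str:
    """Replace identifier-like path segments with ':id'.

    Scheme, host, query, and fragment are passed through unchanged.
    """
    parts = urlsplit(url)
    segments = [
        ":id" if _ID_SEGMENT.match(segment) else segment
        for segment in parts.path.split("/")
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, "/".join(segments), parts.query, parts.fragment)
    )


print(normalize_url_sketch("https://example.com/users/12345/orders/deadbeefcafe1234"))
# prints https://example.com/users/:id/orders/:id
```

The length floor on hex segments is there so that short words made of hex letters, such as `facade`, are left alone.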

### Security

- `SignedReplay.query_string` is a 5-minute bearer credential and is
masked in `__repr__`/`__str__`. The `--reveal-signed-urls` CLI opt-in
emits a stderr warning on every invocation (FR-008/9). The pre-merge
security audit greps the source tree for `Signature=` / `URLPrefix=`
/ `Expires=`; no leaks were found.
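
As an illustration alongside this hunk (not part of CHANGELOG.md itself), masking a credential field in `__repr__`/`__str__` can be sketched with a dataclass. The class name `SignedReplaySketch`, the `replay_id` field, and the `'***'` placeholder are assumptions; only the `query_string` field name and the masking behavior come from the changelog.

```python
from dataclasses import dataclass


@dataclass(frozen=True, repr=False)
class SignedReplaySketch:
    """Hypothetical stand-in for SignedReplay with a masked credential."""

    replay_id: str      # assumed field, for illustration only
    query_string: str   # short-lived bearer credential; never printed

    def __repr__(self) -> str:
        # str() falls back to __repr__, so both renderings are masked.
        return (
            f"SignedReplaySketch(replay_id={self.replay_id!r}, "
            "query_string='***')"
        )


signed = SignedReplaySketch("abc123", "Signature=secret&Expires=1")
print(signed)
# prints SignedReplaySketch(replay_id='abc123', query_string='***')
```

The attribute itself stays readable for code that needs to build the CDN URL; only the default renderings that tend to end up in logs and tracebacks are masked.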

### Notes

- Mobile session replays are detected by the CDN walker (first event
lacks rrweb's `type`/`data`/`timestamp` keys) and surface as a
forward-compat `NotImplementedError` per error-messages.md §9.
- Live integration tests (`tests/integration/test_replays_live.py`) are
marked `@pytest.mark.live` and deselected by default; set
`MP_LIVE_TESTS=1` plus `MP_REPLAY_FIXTURE_DISTINCT_ID` to run them
against a fixture project.
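
As an illustration alongside this hunk (not part of CHANGELOG.md itself), the mobile-replay check from the first note above could look like the sketch below. The function name `looks_like_rrweb` is hypothetical, and treating an event that is missing any one of the three keys as non-rrweb is an assumption; the note does not say whether the walker requires all three to be absent.

```python
from typing import Any, Mapping

# Keys the changelog note says an rrweb event carries.
RRWEB_REQUIRED_KEYS = frozenset({"type", "data", "timestamp"})


def looks_like_rrweb(first_event: Mapping[str, Any]) -> bool:
    """Return True if the first decoded event has every rrweb key.

    Assumption: a single missing key is enough to reject the stream.
    """
    return RRWEB_REQUIRED_KEYS.issubset(first_event.keys())


print(looks_like_rrweb({"type": 4, "data": {}, "timestamp": 1717000000000}))
# prints True
print(looks_like_rrweb({"screen": "Home", "ts": 1717000000000}))
# prints False
```

A caller would raise its unsupported-format error when this returns `False`, before handing the stream to the analyzer.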
16 changes: 12 additions & 4 deletions CLAUDE.md
@@ -26,10 +26,11 @@ Services → DiscoveryService, LiveQueryService
Infrastructure → ConfigManager, MixpanelAPIClient
```

-**Three capability areas:**
+**Capability areas:**
- **Discovery**: Explore schema (events, properties, funnels, cohorts, bookmarks, schema graph)
- **Live queries & streaming**: Call Mixpanel API directly (segmentation, funnels, retention, user profiles), stream events and profiles
- **Entity CRUD & Data Governance**: Create, read, update, delete dashboards, reports (bookmarks), cohorts, feature flags, experiments, alerts, annotations, webhooks, Lexicon definitions, drop filters, custom properties, custom events, and lookup tables via App API
+- **Session replay**: Discover, sign, fetch, and analyze rrweb session recordings (`Workspace.replays_for_user` / `fetch_replay`, `Replay` / `ReplayBundle`, `mp replays`)

## Package Structure

@@ -43,6 +44,7 @@ src/mixpanel_headless/
├── targets.py # `mp.targets` — saved (account, project, workspace?) cursors
├── exceptions.py # Exception hierarchy (incl. AccountInUseError, WorkspaceScopeError)
├── types.py # Result types (SegmentationResult, AccountSummary, …)
+├── replay_labels.py # Public replay label helpers (default_label_fn, selector_label_fn, url_normalizer)
├── _internal/ # Private implementation (do not import directly)
│ ├── config.py # ConfigManager (TOML-backed)
│ ├── api_client.py # MixpanelAPIClient (Session-bound; per-request OAuth bearer)
@@ -61,14 +63,15 @@ src/mixpanel_headless/
│ │ ├── callback_server.py # Local HTTP callback server
│ │ └── client_registration.py # Dynamic Client Registration (RFC 7591)
│ ├── query/ # Query engine builders and validators
-│ └── services/ # Discovery, LiveQuery services
+│ ├── replays/ # Session-replay analyzer + aggregators (vendored rrweb); public label helpers live in replay_labels.py
+│ └── services/ # Discovery, LiveQuery, Replays services
└── cli/
├── main.py # Typer entry point + global flags (-a / -p / -w / -t)
├── commands/ # account / project / workspace / target / session
│ # + query, inspect, dashboards, reports, cohorts, flags,
│ # experiments, alerts, annotations, webhooks, lexicon,
│ # drop-filters, custom-properties, custom-events,
-│ # lookup-tables, schemas, business-context
+│ # lookup-tables, schemas, business-context, replays
├── formatters.py # JSON, JSONL, Table, CSV, Plain output
└── utils.py # Error handling, console helpers
```
@@ -332,8 +335,13 @@ python help.py Filter # type fields + construction patterns + r
- N/A — query parameter types only, no persistence (040-query-engine-completeness)
- Python 3.10+ (mypy --strict) + httpx, Pydantic v2, Typer, Rich, Hypothesis, mutmut (043-frictionless-auth)
- TOML config (`~/.mp/config.toml`) + per-account state at `~/.mp/accounts/{name}/{tokens,client,me}.json` — schema unchanged from 042 (043-frictionless-auth)
+- Python 3.10+ (mypy --strict) + httpx, Pydantic v2, pandas, Typer, Rich, Hypothesis, mutmut; vendored rrweb analyzer (pure stdlib) for session replay (044-session-replay)
+- N/A — signed URLs are time-bounded bearer credentials handled in-process; no new on-disk persistence (044-session-replay)

<!-- SPECKIT START -->
-**Active plan**: [`specs/043-frictionless-auth/plan.md`](specs/043-frictionless-auth/plan.md) — Frictionless Auth (`mp login` and `/me`-driven discovery). Single PR landing AIE-114/115/116/117 together.
+Current plan: [specs/044-session-replay/plan.md](specs/044-session-replay/plan.md)
+
+For additional context about technologies to be used, project structure,
+shell commands, and other important information, read the current plan.
<!-- SPECKIT END -->

12 changes: 11 additions & 1 deletion README.md
@@ -11,7 +11,7 @@ A complete programmable interface to Mixpanel analytics—Python library and CLI

Mixpanel's web UI is powerful for interactive exploration, but programmatic access requires navigating multiple REST endpoints with different conventions. **mixpanel_headless** provides a unified interface: discover your schema, run analytics queries, stream data, and manage entities—all through consistent Python methods or CLI commands.

-Core analytics—typed Insights engine queries (DAU/WAU/MAU, formulas, filters, breakdowns, cohort-scoped queries, period-over-period comparison, frequency analysis), typed funnel queries (ad-hoc steps, exclusions, conversion windows), typed retention queries (event pairs, custom buckets, alignment modes), typed flow queries (path analysis, direction controls, visualization modes), typed user profile queries (property filtering, sorting, parallel fetching, aggregate statistics), segmentation, saved reports—plus entity management (dashboards, reports, cohorts, feature flags, experiments), and streaming data extraction.
+Core analytics—typed Insights engine queries (DAU/WAU/MAU, formulas, filters, breakdowns, cohort-scoped queries, period-over-period comparison, frequency analysis), typed funnel queries (ad-hoc steps, exclusions, conversion windows), typed retention queries (event pairs, custom buckets, alignment modes), typed flow queries (path analysis, direction controls, visualization modes), typed user profile queries (property filtering, sorting, parallel fetching, aggregate statistics), segmentation, saved reports—plus entity management (dashboards, reports, cohorts, feature flags, experiments), streaming data extraction, and session replay (discover, fetch, and analyze rrweb session recordings).

## Installation

@@ -264,6 +264,12 @@ schemas = ws.list_schema_registry()
enforcement = ws.get_schema_enforcement()
audit = ws.run_audit()

+# Session replay — discover, fetch, and analyze rrweb recordings
+bundle = ws.replays_for_user("user-42", from_date="2025-01-01", to_date="2025-01-31")
+print(bundle.sessions_df) # one row per session: duration, clicks, errors
+print(bundle.replays[0].summary_markdown) # LLM-friendly action timeline
+print(bundle.top_clicks(10)) # most-clicked elements across the bundle

# Stream events for processing
for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
process(event)
@@ -323,6 +329,8 @@ for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):

**`mp schemas`** — Schema registry management: `list`, `create`, `create-bulk`, `update`, `update-bulk`, `delete`

+**`mp replays`** — Session replay: `list` (discover a user's replays or hydrate explicit IDs), `events` (Mixpanel events during a replay window), `sign` (sign replay IDs for CDN access — redacted by default), `fetch` (pull raw rrweb bytes), `analyze` (render the markdown action timeline), `for-user` (discover + fetch + analyze in one call)

All commands support `--format` (`json`, `jsonl`, `table`, `csv`, `plain`) and `--help`.

### Filtering with --jq
@@ -352,6 +360,7 @@ Full documentation: [mixpanel.github.io/mixpanel-headless](https://mixpanel.gith
- [Retention Queries](https://mixpanel.github.io/mixpanel-headless/guide/query-retention/) — Typed retention analysis with event pairs, custom buckets, alignment modes
- [Flow Queries](https://mixpanel.github.io/mixpanel-headless/guide/query-flows/) — Typed flow path analysis with direction controls, visualization modes
- [User Profile Queries](https://mixpanel.github.io/mixpanel-headless/guide/query-users/) — Profile filtering, sorting, parallel fetching, aggregate statistics
+- [Session Replay](https://mixpanel.github.io/mixpanel-headless/guide/session-replay/) — Discover, fetch, and analyze rrweb session recordings
- [CLI Reference](https://mixpanel.github.io/mixpanel-headless/cli/)
- [Python API](https://mixpanel.github.io/mixpanel-headless/api/)
- [Streaming Guide](https://mixpanel.github.io/mixpanel-headless/guide/streaming/)
@@ -371,6 +380,7 @@ Key design features:
- **Consistent interfaces**: Same operations available as Python methods and CLI commands
- **Structured output**: All CLI commands support `--format json` for machine-readable responses, plus `--jq` for inline filtering
- **Streaming data extraction**: Memory-efficient iterators for events and profiles
+- **Session replay**: Discover a user's rrweb recordings, fetch the raw event stream, and project them into session-level DataFrames plus an LLM-friendly action timeline (`replays_for_user`, `Replay`, `ReplayBundle`); signed CDN URLs are masked by default and never logged
- **Three first-class account types**: `service_account` (Basic Auth) for unattended automation, `oauth_browser` (PKCE flow with auto-refreshed tokens) for interactive use, `oauth_token` (static bearer) for CI / agents
- **Typed exceptions**: Error codes and context for programmatic handling
