Garmin Wearable Analytics

Garmin Wearable Analytics is a privacy-first case study built on local Garmin exports. It turns messy aggregate JSON and minute-level FIT monitoring files into curated parquet tables, applies sanitization and quality gating before analysis, and uses notebook-driven EDA plus time-aware modeling to surface interpretable behavioral and recovery patterns. The project is packaged as a data science / data analytics (DS/DA) portfolio artifact that balances analytical depth with reproducible engineering practices.

If you open only one file after this page, start with the case study.

Portfolio Highlights

677 daily Garmin rows across 2023-05-26 to 2026-05-18, with privacy and quality gates before analysis.
1.56M+ minute-level FIT monitoring observations: 675,325 heart-rate rows and 889,323 stress rows.
589-row Stage 4 monitoring quality index with separate core/full feature tables for leakage-aware modeling.
Stage 4 Huber regression improves fixed-future-holdout next-sleep stress MAE by 15.8% versus a preselected median baseline.
Reproducible Python package with CLI workflows, SQL mart outputs, notebooks, docs, CI, and 132 passing tests in the latest local run.

What This Project Demonstrates

Robust ingestion and normalization of heterogeneous wearable exports (UDS + sleep JSON) into stable day-level tables
Privacy-aware preprocessing, with sanitization treated as a hard boundary before sharing or analysis
Quality labeling and artifact review, including strict vs loose readiness logic and suspicious-day triage
SQL-first analytics layer (DuckDB primary + compact PostgreSQL showcase) with CTE/window/view patterns
Structured EDA across coverage, time series, distributions, segmentation, and directed relationship analysis
Time-aware Stage 3 extension with statistical validation plus classification/regression baselines
Stage 4 monitoring extension with minute-level HR/stress FIT decoding, sleep-aware windows, quality index, feature tables, and time-aware regression modeling
Validation-selected Stage 4 linear-family modeling with random plus expanding-temporal holdouts and a fixed future evaluation block
Reproducible Python project organization with CLI workflows, tests, and CI-backed iteration
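The expanding-temporal holdouts and fixed future evaluation block mentioned above can be sketched as follows. This is a minimal illustration, not the repository's actual splitter: the function name `expanding_temporal_holdouts` and its parameters are assumptions, and rows are assumed to be sorted by time already.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def expanding_temporal_holdouts(
    n_rows: int, n_folds: int, holdout_size: int, future_block: int
) -> Iterator[Split]:
    """Yield (train_idx, valid_idx) pairs over time-ordered row positions.

    The last `future_block` rows are reserved for the final evaluation and
    never appear in any yielded split (hypothetical layout, for illustration).
    """
    usable = n_rows - future_block  # rows available for model selection
    first_start = usable - n_folds * holdout_size
    if first_start <= 0:
        raise ValueError("not enough rows for the requested folds")
    for fold in range(n_folds):
        start = first_start + fold * holdout_size
        train_idx = list(range(start))  # everything strictly before the holdout
        valid_idx = list(range(start, start + holdout_size))
        yield train_idx, valid_idx


splits = list(
    expanding_temporal_holdouts(n_rows=100, n_folds=3, holdout_size=10, future_block=20)
)
for train_idx, valid_idx in splits:
    print(len(train_idx), valid_idx[0], valid_idx[-1])
```

Each training window grows while every validation block stays strictly later than its training rows, which is what keeps model selection from seeing the future block.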

Role Fit

Strongest fit: DS generalist, Data Analyst, Product/Analytics, and analytics-heavy data roles that value messy real-world data handling as much as final charts.
Signals: raw nested JSON ingestion, privacy-safe preprocessing, quality-aware analysis, explicit limitations, and reproducible Python packaging.
Framing: this repository emphasizes trustworthy analytics and interpretable findings over heavy production ML, which is intentional for the portfolio story.

If You Have 60 Seconds

Headline Findings

The dataset spans 677 daily rows from 2023-05-26 to 2026-05-18, with explicit quality-aware filtering before analysis.
About 91.1% of days are labeled good under the strict readiness rule, so the retained EDA slices stay analytically useful without hiding real-world coverage gaps.
Weekly segmentation reveals stable routines: Saturday is the most active day, Sunday the least active, and Tuesday shows the highest median awake stress.
Higher daytime stress is associated with worse next-night recovery, which supports a day-to-night carryover story rather than a purely same-day coincidence.
Sleep score follows an optimum-duration pattern: mid-range sleep durations score best, while both shorter and longer nights tend to underperform.
Stage 4 linear-family regression finds a modest next-sleep avgSleepStress signal: the validation-selected Huber model improves fixed-future-holdout MAE by 15.8% versus a baseline selected before future evaluation.

Featured Visuals

Coverage calendar: the project keeps visible the difference between real behavioral variation and plain no-wear / partial-coverage periods.

Sleep score behaves like an optimum-duration pattern rather than a monotonic one: mid-range nights score better than both shorter and longer ones.

The strongest directional relationship in the repo is a negative association between daytime stress and next-night recovery score.

Stage 4 linear diagnostics show a real but modest next-sleep stress signal, with residual drift and high-stress-night underprediction still visible.

Project Structure

Pipeline / ingestion: discover raw Garmin exports, flatten nested JSON, and build parquet checkpoints
Quality & privacy: sanitize sensitive fields, generate a data dictionary, label day readiness, and isolate suspicious artifacts
SQL layer (optional): build a DuckDB mart, run portfolio SQL packs, and mirror a compact schema in PostgreSQL
EDA notebooks: prepare coverage-aware slices, inspect time series, analyze distributions, and validate cross-metric relationships
Monitoring extension: decode minute-level FIT HR/stress records, build sleep-aware semantic windows, publish quality/feature tables, and evaluate a first linear-family next-sleep stress model
Case study & docs: recruiter-facing summary first, technical stage docs and notebooks second
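As a rough illustration of the "flatten nested JSON" step in the ingestion stage, the sketch below turns one nested record into a single flat row. The `flatten` helper and the sample field names are hypothetical and do not come from the Garmin export schema or from this package.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively flatten nested dicts into one level of prefixed keys.

    Lists and scalars are kept as-is; only dict nesting is expanded.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical record shape, for illustration only.
sample = {
    "calendarDate": "2024-01-01",
    "stress": {"awake": {"avg": 31}, "sleep": {"avg": 18}},
}
row = flatten(sample)
print(row)
```

A row shaped like this maps directly onto one line of a day-level table, which is the form the later quality and EDA stages work with.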

Results Snapshot

rows: 677
date range: 2023-05-26 to 2026-05-18
strict labels: good 91.14%, partial 3.69%, bad 5.17%
loose labels: good 94.09%, partial 0.74%, bad 5.17%
corrupted stress-only days: 21 (3.10%)

Stage 3 Snapshot

Primary task: predict whether next-night sleepRecoveryScore < 75 with contiguous time-ordered splits.
Best interpretable model family: sparse logistic variants using compact daytime stress and heart-rate context.
Current selected test result: balanced accuracy ~0.68, ROC-AUC ~0.71, PR-AUC ~0.60, F1 ~0.62.
Statistical validation supports key directional findings (for example, daytime awake stress -> lower next-night recovery).
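The Stage 3 setup can be sketched in a few lines: threshold the next-night score into a binary label, split rows contiguously in time, and score predictions with balanced accuracy. The helper names and toy numbers below are assumptions for illustration; only the `< 75` threshold comes from the task description above.

```python
from typing import List, Sequence, Tuple


def label_low_recovery(score: float, threshold: float = 75) -> int:
    """1 when the next-night recovery score falls below the threshold."""
    return int(score < threshold)


def contiguous_split(rows: Sequence, n_train: int, n_valid: int) -> Tuple[list, list, list]:
    """Split time-ordered rows into train / validation / test without shuffling."""
    return (
        list(rows[:n_train]),
        list(rows[n_train:n_train + n_valid]),
        list(rows[n_train + n_valid:]),
    )


def balanced_accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Mean of per-class recall for a binary problem."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            raise ValueError(f"class {cls} is absent from y_true")
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy next-night scores in time order (made up, not project data).
scores = [80, 70, 78, 60, 90, 74, 76, 72, 88, 65]
labels = [label_low_recovery(s) for s in scores]
train, valid, test = contiguous_split(labels, n_train=6, n_valid=2)

ba = balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(labels, len(train), len(valid), len(test), ba)
```

Balanced accuracy averages recall over both classes, so it stays informative when low-recovery nights are the minority.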

Stage 4 Monitoring Snapshot

Decoded 3,562 Garmin monitoring FIT files out of 10,236 FIT files seen, with 0 files skipped because of decode errors.
Built minute-level monitoring tables with 675,325 heart-rate rows and 889,323 stress rows.
Created 556 semantic sleep windows, a 589-row monitoring quality index, core/full feature tables, and a shared Stage 4 sleep-outcome modeling frame.
Used monitoring_full_wake_pre_sleep_plus_state, a 148-feature wake/pre-sleep set enriched with previous-sleep, prior-history, and current-vs-recent-baseline context.
Screened 52,812 linear-family configurations on 3 random plus 3 expanding-temporal holdouts, then reranked a representative 150-candidate shortlist on 10 random plus 8 temporal holdouts.
Validation-selected rank-1 model: Huber alpha=30 eps=1.05 | correlation_prune_0.9 | clip=z=4.
Fixed-future-holdout result: MAE 5.327, R2 0.264, versus preselected dummy_median MAE 6.326 (15.8% MAE improvement).
Recent-state deviations were useful: dev7_presleep_stress_mean and dev7_wake_stress_mean ranked among the strongest validation permutation features.
Keeps quality diagnostics separate from candidate features: monitoring_quality_index.parquet joins to feature tables on analysis_window_id.
The result is an exploratory single-subject baseline, not a production or medical predictor; residuals still show drift and high-stress-night underprediction.
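The 15.8% figure above is the relative MAE reduction against the median baseline. The sketch below reproduces that arithmetic from the two reported MAE values and shows, on made-up numbers, how a median dummy baseline is typically scored. The function names are illustrative and are not taken from the package.

```python
from statistics import median
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mae_improvement(model_mae: float, baseline_mae: float) -> float:
    """Fraction by which the model reduces MAE relative to the baseline."""
    return (baseline_mae - model_mae) / baseline_mae


# Reported fixed-future-holdout values from the snapshot above.
improvement = relative_mae_improvement(model_mae=5.327, baseline_mae=6.326)
print(f"{improvement:.1%}")  # prints 15.8%

# Toy median baseline: fit the constant on training targets only,
# then score it on later targets (numbers are made up).
train_targets = [10, 12, 30]
future_targets = [11, 15, 20]
constant = median(train_targets)
toy_baseline_mae = mae(future_targets, [constant] * len(future_targets))
print(toy_baseline_mae)  # prints 4.0
```

Fitting the constant on training targets only mirrors the point made above that the baseline was selected before the future block was evaluated.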

Technical Appendix / Deep Dive

Start here for the portfolio narrative, then use the links below for technical depth:

Case study - recruiter-friendly project narrative and key findings.
Relationships notebook - directional D -> D+1 relationships and artifact checks.
Distributions notebook - metric distributions and segmented behavior patterns.
Overview - map of stages, outputs, and how to navigate the repository.
Pipeline - end-to-end flow from raw exports to analysis artifacts.
EDA guide - notebook purpose, structure, and interpretation scope.
Stage 0 - discovery, ingestion, and parquet build details.
Stage 1 - sanitize, data dictionary, and quality labeling.
Stage 2 - EDA workflow and promoted observational findings.
Stage 3 - predictive modeling and lightweight statistical validation.
Stage 4 - minute-level FIT monitoring extension, quality index, feature table contract, and linear modeling snapshot.
Monitoring EDA notebook - public Stage 4 analytical layer for minute-level FIT data.
Sleep outcome modeling frame notebook - Stage 4 target, eligibility, split, and feature-set audit.
Sleep stress linear models notebook - two-stage mixed-holdout linear-family regression for next-sleep average stress.
Stage 4 linear-model summary - validation-selected Huber result, dummy comparison, and caveats.
SQL layer - DuckDB mart, SQL query pack, and PostgreSQL showcase.
CLI - command reference, flags, outputs, and run order.
Privacy - guardrails for local-only data and safe publishing boundaries.

Quickstart

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
python -m pip install -e .

Primary CLI mode:

garmin-analytics discover
garmin-analytics ingest-uds
garmin-analytics ingest-sleep
garmin-analytics build-daily
garmin-analytics sanitize
garmin-analytics quality

Optional monitoring extension:

garmin-analytics ingest-monitoring-fit
garmin-analytics build-semantic-windows
garmin-analytics build-monitoring-features
garmin-analytics build-monitoring-datasets
garmin-analytics build-stage4-modeling-frame

Optional SQL layer:

garmin-analytics build-sql-mart
garmin-analytics run-sql-portfolio

Open notebooks:

jupyter lab

Public Demo

If you do not have private Garmin exports, you can still exercise the public Stage 1 workflow on a tiny committed sample:

PYTHONPATH=src .venv/bin/python scripts/setup_public_demo.py
garmin-analytics data-dictionary --markdown-mode both
garmin-analytics quality
garmin-analytics build-sql-mart
garmin-analytics run-sql-portfolio

Details: Public demo

SQL Showcase

DuckDB (primary local analytics mart):

garmin-analytics build-sql-mart
garmin-analytics run-sql-portfolio

PostgreSQL (compact production-like mirror):

setup + runbook: examples/postgres_showcase/README.md
schema/views/queries: examples/postgres_showcase/
SQL skills demonstrated: CTEs, window functions, day-to-next-day (D -> D+1) alignment, and view-based analytics contracts.
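The D -> D+1 alignment pattern can be illustrated with a CTE and a `LEAD` window function. This sketch uses Python's built-in `sqlite3` as a stand-in so it runs without extra dependencies (window functions need SQLite 3.25 or newer); it is not the project's DuckDB mart, and the `daily` table and its columns are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE daily (day TEXT PRIMARY KEY, awake_stress REAL, recovery REAL)"
)
con.executemany(
    "INSERT INTO daily VALUES (?, ?, ?)",
    [
        ("2024-01-01", 30, 80),
        ("2024-01-02", 45, 70),
        ("2024-01-03", 25, 78),
        ("2024-01-05", 50, 60),  # 2024-01-04 is missing on purpose
    ],
)

query = """
WITH paired AS (
    SELECT
        day,
        awake_stress,
        LEAD(day)      OVER (ORDER BY day) AS next_day,
        LEAD(recovery) OVER (ORDER BY day) AS next_recovery
    FROM daily
)
SELECT day, awake_stress, next_recovery
FROM paired
WHERE next_day = date(day, '+1 day')  -- keep only true calendar neighbours
ORDER BY day
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

The `WHERE` clause matters: without it, `LEAD` would pair 2024-01-03 with 2024-01-05 and silently bridge the coverage gap.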

Privacy

Raw Garmin exports stay local and must never be committed. Sanitized outputs are the default analysis and sharing boundary. See docs/privacy.md.
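One way to treat sanitization as a hard boundary is to strip denylisted keys recursively and then fail loudly if any survive. The sketch below assumes a denylist approach; the key names in `SENSITIVE_KEYS` are hypothetical placeholders, and the real rules are the ones described in docs/privacy.md.

```python
from typing import Any

# Hypothetical key names, for illustration only.
SENSITIVE_KEYS = {"userProfileId", "displayName", "deviceSerial"}


def sanitize(value: Any) -> Any:
    """Return a copy of a nested structure with sensitive keys removed."""
    if isinstance(value, dict):
        return {k: sanitize(v) for k, v in value.items() if k not in SENSITIVE_KEYS}
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value


def contains_sensitive(value: Any) -> bool:
    """True if any sensitive key remains anywhere in the structure."""
    if isinstance(value, dict):
        return any(k in SENSITIVE_KEYS or contains_sensitive(v) for k, v in value.items())
    if isinstance(value, list):
        return any(contains_sensitive(v) for v in value)
    return False


raw = {
    "displayName": "someone",
    "calendarDate": "2024-01-01",
    "devices": [{"deviceSerial": "X1", "batteryLevel": 80}],
}
clean = sanitize(raw)
if contains_sensitive(clean):
    raise RuntimeError("sanitization boundary violated")
print(clean)
```

Running the check after every sanitize step, rather than trusting the denylist alone, is what makes the boundary enforceable in a pipeline.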
