A production-ready solution accelerator for processing GPS mobility data using Wherobots and Apache Sedona. Demonstrates a full medallion architecture (Bronze / Silver / Gold) with advanced 3D/4D geometry functions, trajectory construction, map matching, and multi-view analytics output.
Raw .plt files ──► [prepare_geolife.py] ──► CSV on S3
                                                │
                      ┌─────────────────────────┘
                      ▼
                ┌───────────┐
                │ 01_Bronze │  Ingest raw GPS → 2D point geometries → GeoParquet
                └─────┬─────┘
                      ▼
                ┌───────────┐
                │ 02_Silver │  Clean → 4D XYZM points → Trajectories → Map Match
                └─────┬─────┘
                      ▼
                ┌───────────┐
                │ 03_Gold   │  H3 heatmaps │ Trajectory explorer │ Clustering
                └───────────┘
GeoLife GPS Trajectories (Microsoft Research)
- 17,621 trajectories from 182 users in Beijing (2007–2012)
- Raw GPS points with latitude, longitude, altitude, and timestamps
- Ideal for 3D/4D geometry processing (XYZM)
- Download instructions
| Feature | Sedona Functions Used |
|---|---|
| 4D Point Construction | ST_MakePoint(x, y, z, m) with elevation + timestamp |
| Trajectory Building | ST_MakeLine(), ST_IsValidTrajectory() |
| 3D Analysis | ST_Z(), ST_M(), ST_ZMin/Max(), ST_3DDistance() |
| Elevation Profiles | ST_LineInterpolatePoint(), ST_Z() |
| Spatial Indexing | ST_H3CellIDs(), ST_H3ToGeom(), ST_GeoHash() |
| Map Matching | wherobots.matcher.load_osm(), matcher.match() |
| Stop Detection | dbscan() clustering on low-speed points |
| Visualization | SedonaKepler.create_map() interactive maps |
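To make the 4D point row in the table above concrete, here is a minimal plain-Python sketch of what goes into each coordinate slot. It assumes x is longitude, y is latitude, z is altitude converted from feet to meters, and m is the timestamp as epoch seconds in UTC. The helper name `to_point_zm_wkt` is hypothetical, and the notebooks may build the geometry differently (for example directly in Sedona SQL).

```python
from datetime import datetime, timezone

FEET_TO_METERS = 0.3048  # one international foot in meters


def to_point_zm_wkt(lon, lat, altitude_ft, timestamp_utc):
    """Build a 'POINT ZM (x y z m)' WKT string.

    Assumptions (not confirmed by the notebooks): x = longitude, y = latitude,
    z = altitude in meters, m = epoch seconds, and the timestamp is naive UTC.
    """
    z = altitude_ft * FEET_TO_METERS
    m = timestamp_utc.replace(tzinfo=timezone.utc).timestamp()
    return f"POINT ZM ({lon:.6f} {lat:.6f} {z:.2f} {m:.0f})"


wkt = to_point_zm_wkt(116.318417, 39.984702, 492.0, datetime(2008, 10, 23, 2, 53, 4))
print(wkt)
```

Storing time in the M coordinate keeps each vertex self-describing, so a trajectory LineString built from these points carries its own timing.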
- Loads raw GeoLife CSV from S3
- Creates 2D point geometries
- Data quality profiling (null checks, altitude distribution, spatial extent)
- Writes raw GeoParquet
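The profiling checks listed above (null counts, altitude distribution, spatial extent) can be sketched in plain Python as follows. This is an illustration only: the field names `lat`, `lon`, and `altitude_ft` are assumptions, and the Bronze notebook presumably computes the same kind of summary with Spark/Sedona rather than a Python loop.

```python
def profile_points(rows):
    """Summarize null counts, bounding box, and altitude stats for GPS rows.

    rows: iterable of dicts with hypothetical keys 'lat', 'lon', 'altitude_ft',
    where a missing value is represented as None.
    """
    fields = ("lat", "lon", "altitude_ft")
    null_counts = {name: 0 for name in fields}
    lats, lons, alts = [], [], []
    total = 0
    for row in rows:
        total += 1
        for name in fields:
            if row.get(name) is None:
                null_counts[name] += 1
        # Only rows with both coordinates contribute to the spatial extent.
        if row.get("lat") is not None and row.get("lon") is not None:
            lats.append(row["lat"])
            lons.append(row["lon"])
        if row.get("altitude_ft") is not None:
            alts.append(row["altitude_ft"])
    extent = (min(lons), min(lats), max(lons), max(lats)) if lats else None
    altitude = (
        {"min": min(alts), "max": max(alts), "mean": sum(alts) / len(alts)}
        if alts
        else None
    )
    return {
        "rows": total,
        "null_counts": null_counts,
        "extent": extent,  # (min_lon, min_lat, max_lon, max_lat)
        "altitude_ft": altitude,
    }


sample = [
    {"lat": 39.9847, "lon": 116.3184, "altitude_ft": 492.0},
    {"lat": 39.9850, "lon": 116.3190, "altitude_ft": None},
    {"lat": None, "lon": 116.3200, "altitude_ft": 500.0},
]
report = profile_points(sample)
print(report)
```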
- Cleans invalid records, converts altitude ft → meters
- Constructs 4D XYZM geometries with elevation and epoch timestamp
- Segments trips by time gaps, builds trajectory LineStrings
- Calculates speed, elevation gain/loss, bearing
- H3 and GeoHash spatial indexing
- Map matches GPS traces to OSM road network via WherobotsAI
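As a rough illustration of two of the Silver steps above, the sketch below splits a time-ordered point stream into trips wherever the gap between consecutive fixes exceeds a threshold, then derives per-segment speed and bearing. It is plain Python, not the notebook code: the 300-second gap, the haversine distance on a spherical Earth, and all function names are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def segment_trips(points, max_gap_s=300):
    """Split (epoch_s, lat, lon) tuples, sorted by time, into trips.

    A new trip starts when the time gap to the previous point exceeds
    max_gap_s (300 s is an assumed threshold, not the notebook's value).
    """
    trips, current = [], []
    for pt in points:
        if current and pt[0] - current[-1][0] > max_gap_s:
            trips.append(current)
            current = []
        current.append(pt)
    if current:
        trips.append(current)
    return trips


def segment_metrics(trip):
    """Return (speed_mps, bearing_deg) for each consecutive pair in a trip."""
    out = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(trip, trip[1:]):
        dt = t2 - t1
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        out.append((haversine_m(la1, lo1, la2, lo2) / dt, bearing_deg(la1, lo1, la2, lo2)))
    return out


points = [
    (0, 39.000, 116.000),
    (10, 39.001, 116.000),     # 10 s later, due north
    (1000, 39.500, 116.500),   # 990 s gap, so a new trip starts here
    (1010, 39.500, 116.501),   # 10 s later, roughly due east
]
trips = segment_trips(points)
metrics = [segment_metrics(trip) for trip in trips]
print(len(trips), metrics)
```

A fixed time-gap threshold is the simplest segmentation rule; it will merge trips separated by short stops and split trips that lose GPS signal in tunnels, so the threshold is worth tuning against the data.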
- Analytical View — H3 hexbin heatmaps, temporal patterns, trip statistics
- Exploratory View — Individual trajectories, elevation profiles, matched vs raw routes
- Deep Dive View — DBSCAN stop clusters, hotspots, anomaly detection, road segment analysis
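The stop clusters in the Deep Dive View come from running DBSCAN over low-speed points. The sketch below shows that idea with a small, brute-force DBSCAN written in plain Python. It is a stand-in for illustration, not the Sedona `dbscan()` call the notebooks use, and the speed cutoff, `eps_m`, and `min_pts` values are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) tuples."""
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dphi = p2 - p1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def dbscan(points, eps_m, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Brute-force O(n^2) neighbor search; fine for a demo, too slow for
    full-size data. A point's neighborhood includes the point itself.
    """
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(len(points)) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


def detect_stops(samples, max_speed_mps=0.5, eps_m=50.0, min_pts=3):
    """Keep (lat, lon) of samples at or below max_speed_mps, then cluster them."""
    slow = [(lat, lon) for lat, lon, speed in samples if speed <= max_speed_mps]
    return slow, dbscan(slow, eps_m, min_pts)


samples = [
    (39.9000, 116.4000, 0.1), (39.9001, 116.4000, 0.0), (39.9000, 116.4001, 0.2),
    (40.0000, 116.5000, 0.0), (40.0001, 116.5000, 0.3), (40.0000, 116.5001, 0.1),
    (39.5000, 116.0000, 0.0),   # isolated slow point, expected to be noise
    (39.9000, 116.4000, 12.0),  # fast point, filtered out before clustering
]
slow_points, labels = detect_stops(samples)
print(labels)
```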
- Wherobots Cloud account
- Python 3.9+ (for the data preparation script)
- Download the GeoLife dataset (see data/README.md).
- Prepare the data (run locally):

      pip install pandas
      python scripts/prepare_geolife.py --input data/Geolife_Trajectories_1.3/Data --output data/geolife_combined.csv

- Upload to S3:

      aws s3 cp data/geolife_combined.csv s3://<YOUR_BUCKET>/mobility/raw/geolife_combined.csv

- Run notebooks in Wherobots Cloud in order:
  - 01_bronze_ingestion.ipynb
  - 02_silver_transformation.ipynb
  - 03_gold_analytics.ipynb
- Configure S3 paths: update the S3_BASE variable in each notebook to point to your bucket.
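For readers curious what the preparation step above has to do, here is a minimal sketch of parsing one GeoLife .plt file into flat rows. It assumes the layout described in the GeoLife user guide: six header lines, then comma-separated records of latitude, longitude, a reserved field, altitude in feet, a fractional day count, a date, and a time. The function name and output keys are hypothetical, and scripts/prepare_geolife.py may differ in its details and output schema.

```python
import csv
from datetime import datetime

PLT_HEADER_LINES = 6  # assumed: each .plt file starts with six header lines


def parse_plt(text, user_id, trajectory_id):
    """Parse the text of one .plt file into a list of row dicts.

    The output keys are illustrative, not the real script's CSV schema.
    """
    rows = []
    data_lines = text.splitlines()[PLT_HEADER_LINES:]
    for rec in csv.reader(data_lines):
        if len(rec) < 7:
            continue  # skip blank or truncated lines
        lat, lon, _reserved, alt_ft, _days, date_s, time_s = rec[:7]
        rows.append({
            "user_id": user_id,
            "trajectory_id": trajectory_id,
            "lat": float(lat),
            "lon": float(lon),
            "altitude_ft": float(alt_ft),
            "timestamp": datetime.strptime(f"{date_s} {time_s}", "%Y-%m-%d %H:%M:%S"),
        })
    return rows


SAMPLE_PLT = "\n".join([
    "Geolife trajectory",
    "WGS 84",
    "Altitude is in Feet",
    "Reserved 3",
    "0,2,255,My Track,0,0,2,8421376",
    "0",
    "39.984702,116.318417,0,492,39744.1201851852,2008-10-23,02:53:04",
    "39.984683,116.31845,0,492,39744.1202546296,2008-10-23,02:53:10",
])
rows = parse_plt(SAMPLE_PLT, user_id="000", trajectory_id="20081023025304")
print(rows[0])
```

Carrying the user and trajectory identifiers on every row lets the combined CSV be regrouped into per-user trajectories downstream.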
Download a Beijing OSM extract and upload to S3:
# Download Beijing OSM data from Geofabrik
wget https://download.geofabrik.de/asia/china-latest.osm.pbf
# Or use a smaller Beijing-specific extract
aws s3 cp osm_beijing.xml s3://<YOUR_BUCKET>/mobility/data/osm_beijing.xml

All gold-layer outputs are written as GeoParquet files, compatible with:
- Kepler.gl — drag-and-drop GeoParquet support
- QGIS — native GeoParquet reader
- DuckDB Spatial — query GeoParquet with SQL
- Felt — upload GeoParquet directly
- Foursquare Studio — cloud visualization
This solution accelerator is provided as-is for educational and demonstration purposes. The GeoLife dataset is provided by Microsoft Research under their terms of use.