A command-line pipeline for GPS telemetry movement ecology analysis. Takes a CSV of animal fixes and a YAML config file, and produces a full suite of spatial, behavioural, and seasonal outputs — figures, tables, and GIS-ready layers — with no code editing required.
Tested on African lions, mountain caribou, and African elephants.
pip install -r requirements.txt
python run.py config.yaml

To validate your config and data without running the full analysis:

python run.py config.yaml --dry-run

Requires Python 3.10 or later.
pip install -r requirements.txt
# Optional — more robust clustering for multi-species datasets
pip install hdbscan

The input is a CSV file with at least these columns:
| Column | Description |
|---|---|
| timestamp | Fix datetime (any parseable format, UTC or naive) |
| individual ID | Animal identifier |
| longitude | Decimal degrees |
| latitude | Decimal degrees |
Column names are mapped in config.yaml; the CSV itself is never modified.
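A quick pre-flight check of the column mapping can be sketched as follows. This is an illustration, not the pipeline's own code: `missing_columns` is a hypothetical helper, and the mapping keys simply mirror the `*_col` keys used in config.yaml.

```python
import csv
import io

def missing_columns(csv_text: str, mapping: dict[str, str]) -> list[str]:
    """Return mapped column names that are absent from the CSV header.

    The comparison is case-sensitive, matching the behaviour described
    in the troubleshooting notes.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    present = set(header)
    return [col for col in mapping.values() if col not in present]

# Hypothetical two-line sample in a Movebank-style layout
sample = (
    "timestamp,individual-local-identifier,location-long,location-lat\n"
    "2021-01-01T00:00:00Z,L01,31.5,-24.0\n"
)
mapping = {
    "timestamp_col": "timestamp",
    "id_col": "individual-local-identifier",
    "lon_col": "location-long",
    "lat_col": "location-lat",
}
print(missing_columns(sample, mapping))  # []
```

An empty list means every mapped name was found in the header.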
All parameters live in config.yaml. The key sections:
data_path: data/lion.csv
timestamp_col: timestamp
id_col: individual-local-identifier
lon_col: location-long
lat_col: location-lat

input_crs: EPSG:4326 # CRS of your raw GPS data (almost always WGS84)
metric_crs: EPSG:32734 # Projected CRS for distances and areas
# Find your UTM zone at epsg.io

resample_interval_minutes: 60

Resamples each individual to one fix per interval before all analysis. This removes fix-rate bias when comparing datasets with different collar schedules. Set to 0 to disable.
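To make the resampling idea concrete, here is a minimal standard-library sketch. It assumes the rule is "keep the first fix per individual in each fixed-width time bin"; the pipeline's actual selection rule is not documented here, and `resample_fixes` is a hypothetical name.

```python
from datetime import datetime, timezone

def resample_fixes(fixes, interval_minutes):
    """Keep the first fix per individual per time bin; 0 disables resampling.

    fixes: iterable of (animal_id, timezone-aware datetime, lon, lat) tuples.
    """
    ordered = sorted(fixes, key=lambda f: (f[0], f[1]))
    if interval_minutes <= 0:
        return ordered
    bin_seconds = interval_minutes * 60
    kept = {}
    for fix in ordered:
        animal_id, ts = fix[0], fix[1]
        bin_index = int(ts.timestamp() // bin_seconds)
        # setdefault keeps only the earliest fix seen for this (animal, bin)
        kept.setdefault((animal_id, bin_index), fix)
    return list(kept.values())

utc = timezone.utc
fixes = [
    ("L01", datetime(2021, 1, 1, 0, 5, tzinfo=utc), 31.50, -24.00),
    ("L01", datetime(2021, 1, 1, 0, 35, tzinfo=utc), 31.51, -24.01),
    ("L01", datetime(2021, 1, 1, 1, 10, tzinfo=utc), 31.52, -24.02),
    ("L02", datetime(2021, 1, 1, 0, 20, tzinfo=utc), 31.60, -24.10),
]
hourly = resample_fixes(fixes, 60)
print(len(hourly))  # 3
```

The second L01 fix falls in the same hour as the first and is dropped, so a collar logging every 30 minutes contributes the same number of fixes per hour as one logging hourly.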
dbscan_eps_method: knn # data-driven eps from point-cloud density
dbscan_knn_percentile: 90.0 # raise → fewer clusters, lower → more clusters
use_hdbscan: false # set true for variable-shape clusters (requires pip install hdbscan)

The knn method derives eps from each individual's own nearest-neighbour distance distribution. It is scale-free and works across species without tuning. The alternative adaptive method (eps = median_step × k) is available for backward compatibility.
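The knn idea can be illustrated with a brute-force sketch: take each fix's distance to its k-th nearest neighbour, then read eps off a percentile of those distances. The choice of k, the interpolation rule, and the function name `knn_eps` are assumptions made for this example; the pipeline may differ in all three.

```python
import math

def knn_eps(points, k, percentile):
    """Percentile of each point's distance to its k-th nearest neighbour.

    points: list of (x, y) in a projected (metric) CRS.
    O(n^2) brute force, for illustration only.
    """
    if len(points) <= k:
        raise ValueError("need more than k points")
    kth = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth.append(dists[k - 1])
    kth.sort()
    # Linear interpolation between neighbouring order statistics
    rank = (percentile / 100.0) * (len(kth) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return kth[lo] + (kth[hi] - kth[lo]) * (rank - lo)

# Four tightly spaced fixes plus one distant outlier (metres)
points = [(0, 0), (10, 0), (0, 10), (10, 10), (500, 500)]
print(knn_eps(points, k=2, percentile=50.0))  # 10.0
```

Raising the percentile pulls eps toward the distances of sparse, outlying fixes, which is why a higher `dbscan_knn_percentile` yields fewer, larger clusters.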
dataset_name: caribou_bc
output_dir: outputs/{dataset_name} # outputs go to outputs/caribou_bc/

Use a separate config file per species, each with its own dataset_name, so outputs never overwrite each other.
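How the {dataset_name} placeholder is expanded is not spelled out here. A plausible reading, assuming it behaves like Python's `str.format`, is sketched below; `resolve_output_dir` is a hypothetical helper.

```python
from pathlib import Path

def resolve_output_dir(config: dict) -> Path:
    """Expand the {dataset_name} placeholder in output_dir (assumed semantics)."""
    return Path(config["output_dir"].format(dataset_name=config["dataset_name"]))

config = {"dataset_name": "caribou_bc", "output_dir": "outputs/{dataset_name}"}
print(resolve_output_dir(config).as_posix())  # outputs/caribou_bc
```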
Every individual that meets the min_fixes_per_individual threshold gets its own set of figures. Population-level figures cover all individuals, with scaling adapted to the number of animals.
Per-individual figures:

| Figure | Contents |
|---|---|
| fig1_space_use_<id>.png | GPS trajectory + KDE utilisation distribution + MCP 95% boundary |
| fig2_kde_heatmap_<id>.png | KDE density surface with 95% UD contour |
| fig3_dbscan_<id>.png | Spatial clusters (adaptive DBSCAN or HDBSCAN) |
| fig4_seasonal_kde_<id>.png | Wet-season vs dry-season KDE side by side |

Population-level figures:

| Figure | Contents |
|---|---|
| fig5_behaviour_space.png | GMM state space: hexbin density per behavioural state |
| fig6_behaviour_vs_space.png | Behaviour composition in spatial clusters vs noise |
| fig7_behaviour_vs_water.png | Distance to water by behavioural state (requires water_path) |
| fig8_contraction.png | Seasonal home range contraction: scatter + ranked bar chart |
| fig9_mcp_comparison.png | MCP 100% vs 95% per individual |
| fig10_water_seasonal.png | Water distance distributions, wet vs dry (requires water_path) |
| fig11_individual_heatmap.png | Per-individual metric heatmap (z-scored across population) |
All population figures scale automatically to the number of individuals — font sizes, figure height, label spacing, and bar dimensions adapt so labels never overlap whether you have 5 individuals or 260.
| File | Contents |
|---|---|
| mcp_comparison.csv | MCP 100% and 95% home range areas per individual |
| individual_summary.csv | Full metric table: KDE areas, behaviour proportions, cluster statistics |
| clusters.geojson | Convex hull polygon per cluster per individual; loads directly in QGIS or ArcGIS |

| File | Contents |
|---|---|
| run.log | Timestamped log of every stage: parameters, warnings, statistical results |
| config_snapshot.yaml | Exact copy of the config used for this run |
| run_metadata.json | Pipeline version, elapsed time, full config |
Home range — MCP: 100% minimum convex polygon and 95% trimmed MCP per individual and per season.
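For readers unfamiliar with MCPs, the sketch below computes a convex-hull area and a trimmed variant using only the standard library. The trimming rule (drop the fixes farthest from the centroid) is a common convention assumed here, not something this README confirms, and all function names are hypothetical. Coordinates are assumed to be in the metric CRS.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula; area in squared coordinate units."""
    n = len(vertices)
    if n < 3:
        return 0.0
    total = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(total) / 2.0

def mcp_area(points, percent=100.0):
    """MCP area after keeping the `percent` of fixes closest to the centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    ranked = sorted(points, key=lambda p: math.dist(p, (cx, cy)))
    keep = max(3, math.ceil(len(ranked) * percent / 100.0))
    return polygon_area(convex_hull(ranked[:keep]))

# 4 x 4 grid at 100 m spacing, three interior fixes, one distant outlier
points = [(x, y) for x in (0, 100, 200, 300) for y in (0, 100, 200, 300)]
points += [(150, 150), (120, 180), (180, 120), (2000, 2000)]
print(mcp_area(points, 100.0))  # 600000.0
print(mcp_area(points, 95.0))   # 90000.0
```

A single excursion inflates the 100% MCP more than sixfold in this toy case, which is the usual motivation for reporting the 95% figure alongside it.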
Home range — KDE: Fixed-bandwidth kernel density estimation (bw_method=0.3) on up to kde_max_pts fixes, evaluated on a kde_grid_size × kde_grid_size grid. Bandwidth is fixed rather than Scott's rule because Scott's oversmooths for large N, inflating KDE areas toward MCP size.
Spatial clustering — DBSCAN: Run per individual with a data-driven eps derived from the k-nearest-neighbour distance distribution (dbscan_eps_method: knn). This scales naturally to each individual's movement grain and spatial scale without assumptions about step length or fix rate. Optionally replaced by HDBSCAN for datasets with variable cluster shapes.
Behavioural states — GMM: Gaussian Mixture Model on log-speed and absolute turning angle. Number of components selected by BIC from gmm_k_list. State mapping is deterministic: transit = highest log-speed component; rest = lowest turning angle among the remainder; tortuous = highest turning angle. Works correctly for K=2 (transit + rest) and K=3 (adds tortuous) — all downstream figures adapt automatically.
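The mapping rule above can be written down directly. The sketch takes the fitted component means as input rather than fitting a GMM, so it needs no third-party library; `label_states` and the example means are illustrative only.

```python
def label_states(component_means):
    """Map GMM components to behavioural states.

    component_means: list of (mean_log_speed, mean_abs_turning_angle),
    one entry per component. Returns {component_index: state_name}.
    """
    k = len(component_means)
    if k not in (2, 3):
        raise ValueError("this sketch handles K=2 or K=3 only")
    indices = list(range(k))
    # transit = component with the highest mean log-speed
    transit = max(indices, key=lambda i: component_means[i][0])
    remainder = [i for i in indices if i != transit]
    # rest = lowest mean turning angle among the remaining components
    rest = min(remainder, key=lambda i: component_means[i][1])
    labels = {transit: "transit", rest: "rest"}
    # with K=3 the one component left over is the high-turning, tortuous state
    for i in remainder:
        if i != rest:
            labels[i] = "tortuous"
    return labels

means = [(-1.0, 0.4), (1.5, 0.3), (-0.2, 1.9)]  # made-up values
print(label_states(means))  # {1: 'transit', 0: 'rest', 2: 'tortuous'}
```

Because the labels depend only on the ordering of component means, refitting with a different random seed changes component indices but not which state each fix is assigned to.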
Seasonal analysis: User-defined wet months. Wilcoxon signed-rank test on paired wet/dry KDE areas. Individuals below kde_min_fixes in a season receive NaN for that season's area and are excluded from the test.
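The exclusion rule and the test statistic can be sketched without third-party libraries. This computes only the signed-rank statistic (the smaller of the positive and negative rank sums), not a p-value; a statistics library would normally supply the full test. The function names and the area values are invented for illustration.

```python
import math

def paired_complete(wet, dry):
    """Drop individuals with NaN in either season."""
    return [(w, d) for w, d in zip(wet, dry)
            if not (math.isnan(w) or math.isnan(d))]

def signed_rank_statistic(pairs):
    """min(W+, W-) with average ranks for ties; zero differences are dropped."""
    diffs = [w - d for w, d in pairs if w != d]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)

# Hypothetical KDE areas (km^2); the third animal lacks enough wet-season fixes
wet = [120.0, 95.0, float("nan"), 210.0, 150.0]
dry = [80.0, 90.0, 60.0, 220.0, 100.0]
pairs = paired_complete(wet, dry)
print(len(pairs), signed_rank_statistic(pairs))  # 4 2.0
```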
Environmental analysis: Nearest-feature distance to water bodies via sjoin_nearest. Aggregated by season and behavioural state. Fully conditional — omit water_path from config to skip.
The pipeline logs automatic warnings for degenerate clustering results:
- All noise: the individual has too few fixes relative to min_samples within the computed eps radius. This usually indicates a genuinely sparse tracking schedule. Raise min_fixes_per_individual to exclude these individuals, or lower dbscan_min_samples.
- Single cluster: the entire home range fits inside the eps radius. Lower dbscan_knn_percentile to 75–85 for small-range individuals.
- >85% noise: eps may be too small. Raise dbscan_knn_percentile or switch to use_hdbscan: true.
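The three checks can be expressed as a small classifier over cluster labels. This is a sketch under assumptions: it uses the scikit-learn convention that -1 marks noise, the order in which overlapping conditions are tested is a guess, and `clustering_warning` is a hypothetical name.

```python
def clustering_warning(labels, noise_threshold=0.85):
    """Classify a degenerate clustering result, or return None if it looks fine.

    labels: cluster label per fix, with -1 marking noise.
    """
    if not labels:
        return None
    noise_fraction = sum(1 for label in labels if label == -1) / len(labels)
    clusters = {label for label in labels if label != -1}
    if not clusters:
        return "all_noise"
    if noise_fraction > noise_threshold:
        return "high_noise"
    if len(clusters) == 1:
        return "single_cluster"
    return None

print(clustering_warning([-1] * 10))         # all_noise
print(clustering_warning([0] * 10))          # single_cluster
print(clustering_warning([0] + [-1] * 9))    # high_noise
print(clustering_warning([0, 0, 1, 1, -1]))  # None
```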
Missing required columns — column name mappings must match CSV headers exactly (case-sensitive). The log prints available columns on failure.
KDE very slow — reduce kde_max_pts (default 5000) or kde_grid_size (default 200). Set n_jobs: -1 to parallelise per-individual computation.
GMM produces K=2 unexpectedly — BIC found no evidence for three components. Check log for BIC values. If needed, force K=3 by setting gmm_k_list: [3]. Note that forcing K when data do not support it increases classification entropy.
CRS issues — set input_crs explicitly in config.yaml and run --dry-run to verify the logged CRS values before committing to a full run.
Outputs overwriting between runs — use output_dir: outputs/{dataset_name} with a unique dataset_name per config file.
.
├── run.py
├── config.yaml                     ← copy and edit for each dataset
├── requirements.txt
├── data/
│   ├── fixes.csv
│   └── water.geojson               (optional)
└── outputs/
    ├── fig1_space_use_<id>.png     (one per individual)
    ├── fig2_kde_heatmap_<id>.png
    ├── fig3_dbscan_<id>.png
    ├── fig4_seasonal_kde_<id>.png
    ├── fig5_behaviour_space.png
    ├── ...
    ├── clusters.geojson
    ├── mcp_comparison.csv
    ├── individual_summary.csv
    ├── run.log
    ├── config_snapshot.yaml
    └── run_metadata.json
MIT