This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Red Dwarf is a DIMensional REDuction library for reproducing and experimenting with Polis-like data pipelines. It implements dimensional reduction algorithms (PCA, PaCMAP, LocalMAP) and clustering (K-Means, HDBSCAN) to analyze participatory democracy conversations.
make install-dev # Install with dev dependencies (uses uv)
make test # Run unit tests (pytest)
make test-nb # Test Jupyter notebooks
make test-cov # Tests with coverage report
make test-all # Unit + notebook + docs tests
make docs-serve # Serve MkDocs locally (localhost:8000)
make clear-test-cache # Clear HTTP request cache from tests
Run a single test:
pytest tests/test_data_loader.py::test_function_name -v
The pipeline flow:
Data Loading (Loader) → Vote Matrix → Filtering → Dimensional Reduction → Clustering → Statistics
Key modules:
- `reddwarf/data_loader.py` - `Loader` class fetches from Polis APIs or local files
- `reddwarf/implementations/base.py` - `run_pipeline()` core algorithm, returns `PolisClusteringResult`
- `reddwarf/implementations/polis.py` - `run_clustering()` high-level wrapper (PCA + KMeans defaults)
- `reddwarf/utils/matrix.py` - Vote matrix generation and filtering
- `reddwarf/utils/reducer/` - Pluggable dimensional reduction (PCA, PaCMAP, LocalMAP)
- `reddwarf/utils/clusterer/` - Pluggable clustering (KMeans, HDBSCAN)
- `reddwarf/utils/stats.py` - Comment statistics and representative statements
- `reddwarf/sklearn/` - Custom scikit-learn estimators (`PolisKMeans`, `BestPolisKMeans`, `SparsityAwareScaler`)
- `reddwarf/types/polis.py` - TypedDicts for Polis data structures
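The "Vote Matrix → Filtering" stages of the flow above can be pictured with a small standalone sketch. It does not use or reproduce the code in `reddwarf/utils/matrix.py`; the record field names (`participant_id`, `statement_id`, `vote`) and the helper names `build_vote_matrix` and `filter_participants` are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical vote records: +1 agree, -1 disagree, 0 pass.
votes = [
    {"participant_id": 0, "statement_id": 0, "vote": 1},
    {"participant_id": 0, "statement_id": 1, "vote": -1},
    {"participant_id": 1, "statement_id": 0, "vote": 1},
    {"participant_id": 2, "statement_id": 0, "vote": -1},
    {"participant_id": 2, "statement_id": 1, "vote": 0},
]


def build_vote_matrix(votes):
    """Return {participant_id: {statement_id: vote}}; unseen pairs stay missing."""
    matrix = defaultdict(dict)
    for v in votes:
        # A later vote by the same participant on the same statement
        # overwrites the earlier one.
        matrix[v["participant_id"]][v["statement_id"]] = v["vote"]
    return dict(matrix)


def filter_participants(matrix, min_votes):
    """Keep only participants who voted on at least `min_votes` statements."""
    return {pid: row for pid, row in matrix.items() if len(row) >= min_votes}


matrix = build_vote_matrix(votes)
filtered = filter_participants(matrix, min_votes=2)
print(sorted(filtered))  # participants 0 and 2 each cast two votes
```

The filtered matrix is what the later stages (reduction, then clustering) would consume; a real pipeline would typically use a dense array or DataFrame rather than nested dicts.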
- Type hints: Extensive use of TypeAlias, TypedDict, TYPE_CHECKING
- Docstrings: NumPy/Google style, auto-generated by mkdocstrings
- Optional deps: Use `try_import()` from `reddwarf/exceptions.py` for optional dependencies
- Data models: Pydantic for Vote/Statement with field alias mapping for multiple data sources
- HTTP: CachedLimiterSession for rate-limited, cached API calls
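The optional-dependency convention above follows a common pattern: attempt the import lazily and fail with a message that names the extras group to install. The sketch below shows that general pattern only. `optional_import` is a hypothetical helper, and its signature and error text are assumptions; check `reddwarf/exceptions.py` for the actual `try_import()` behavior before relying on it.

```python
import importlib


def optional_import(module_name, extra):
    """Import `module_name`, or raise ImportError naming the extras group.

    Hypothetical helper for illustration; not the library's try_import().
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install the '{extra}' extras group to enable it."
        ) from exc


# A stdlib module always imports, so this returns the module object.
json_mod = optional_import("json", extra="dev")

# A missing module produces an actionable error instead of a bare traceback.
try:
    optional_import("not_a_real_module_xyz", extra="alt-algos")
except ImportError as err:
    message = str(err)
```

Deferring the import to the call site keeps the base install light: code paths that never touch the optional feature never trigger the error.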
Tests use real Polis API data fixtures in tests/fixtures/. The test suite includes:
- Unit tests in `tests/`
- Notebook execution tests via nbmake
- HTTP responses are cached to avoid hitting APIs during tests
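To illustrate why cached responses keep tests off the network, here is a minimal in-memory sketch. It is not `CachedLimiterSession` and has no rate limiting or on-disk persistence; `CachingFetcher`, `fake_fetch`, and the example URL are hypothetical names used only for this illustration.

```python
class CachingFetcher:
    """Wrap a fetch function so each URL is fetched at most once."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.network_calls = 0  # counts only cache misses

    def get(self, url):
        if url not in self._cache:
            self.network_calls += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def fake_fetch(url):
    # Stand-in for a real HTTP request; returns a canned payload.
    return {"url": url, "status": 200}


fetcher = CachingFetcher(fake_fetch)
first = fetcher.get("https://example.invalid/votes")
second = fetcher.get("https://example.invalid/votes")
print(fetcher.network_calls)  # the second call is served from the cache
```

When a test starts failing after upstream data changes, a stale cache is a likely cause, which is what `make clear-test-cache` is for.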
Optional dependency groups:
- `alt-algos`: PaCMAP, HDBSCAN (alternative algorithms)
- `plots`: matplotlib, seaborn, concave-hull (visualization)
- `dev`: pytest, mkdocs, nbmake (development)
- `all`: everything
- When working on a branch that references an issue (e.g., `116-fix-bestkmeans`), include `Closes #116` in the commit message or PR description to auto-close the issue when merged
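The branch-name convention above can be turned into the closing keyword mechanically. This is a small sketch under the assumption that issue branches always begin with the issue number followed by a hyphen; `closing_keyword` is a hypothetical helper, not something provided by the repository.

```python
import re


def closing_keyword(branch_name):
    """Return 'Closes #N' if the branch name starts with an issue number, else None."""
    # Assumes the '<issue-number>-<description>' naming pattern.
    match = re.match(r"(\d+)-", branch_name)
    return f"Closes #{match.group(1)}" if match else None


print(closing_keyword("116-fix-bestkmeans"))  # Closes #116
print(closing_keyword("main"))                # None
```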