CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Red Dwarf is a DIMensional REDuction library for reproducing and experimenting with Polis-like data pipelines. It implements dimensional reduction algorithms (PCA, PaCMAP, LocalMAP) and clustering (K-Means, HDBSCAN) to analyze participatory democracy conversations.

Commands

make install-dev     # Install with dev dependencies (uses uv)
make test            # Run unit tests (pytest)
make test-nb         # Test Jupyter notebooks
make test-cov        # Tests with coverage report
make test-all        # Unit + notebook + docs tests
make docs-serve      # Serve MkDocs locally (localhost:8000)
make clear-test-cache  # Clear HTTP request cache from tests

Run a single test:

pytest tests/test_data_loader.py::test_function_name -v

Architecture

The pipeline flow:

Data Loading (Loader) → Vote Matrix → Filtering → Dimensional Reduction → Clustering → Statistics

Key modules:

  • reddwarf/data_loader.py - Loader class fetches from Polis APIs or local files
  • reddwarf/implementations/base.py - run_pipeline() core algorithm, returns PolisClusteringResult
  • reddwarf/implementations/polis.py - run_clustering() high-level wrapper (PCA + KMeans defaults)
  • reddwarf/utils/matrix.py - Vote matrix generation and filtering
  • reddwarf/utils/reducer/ - Pluggable dimensional reduction (PCA, PaCMAP, LocalMAP)
  • reddwarf/utils/clusterer/ - Pluggable clustering (KMeans, HDBSCAN)
  • reddwarf/utils/stats.py - Comment statistics and representative statements
  • reddwarf/sklearn/ - Custom scikit-learn estimators (PolisKMeans, BestPolisKMeans, SparsityAwareScaler)
  • reddwarf/types/polis.py - TypedDicts for Polis data structures

Code Patterns

  • Type hints: Extensive use of TypeAlias, TypedDict, TYPE_CHECKING
  • Docstrings: NumPy/Google style; API documentation is generated from them by mkdocstrings
  • Optional deps: Use try_import() from reddwarf/exceptions.py for optional dependencies
  • Data models: Pydantic for Vote/Statement with field alias mapping for multiple data sources
  • HTTP: CachedLimiterSession for rate-limited, cached API calls
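The optional-dependency pattern listed above can be sketched as follows. This is not the signature of the project's `try_import()`, which is not shown here; `optional_import` and its `extra` parameter are hypothetical names that illustrate the general idea of failing with an actionable message when an optional package is missing.

```python
import importlib
from types import ModuleType


def optional_import(module_name: str, extra: str) -> ModuleType:
    """Import module_name, or raise an ImportError naming the extra that provides it.

    Hypothetical helper for illustration; not the project's try_import().
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install the '{extra}' optional dependency group to enable it."
        ) from exc


# A module that is always available imports normally.
json_module = optional_import("json", extra="dev")
print(json_module.dumps({"ok": True}))
```

Deferring the import to the call site keeps the base install lightweight while still telling users which dependency group they need.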

Testing

Tests use real Polis API data fixtures in tests/fixtures/. The test suite includes:

  • Unit tests in tests/
  • Notebook execution tests via nbmake
  • Cached HTTP responses, so tests do not hit live APIs

Optional Dependency Groups

  • alt-algos: PaCMAP, HDBSCAN (alternative algorithms)
  • plots: matplotlib, seaborn, concave-hull (visualization)
  • dev: pytest, mkdocs, nbmake (development)
  • all: everything

Git Conventions

  • When working on a branch that references an issue (e.g., 116-fix-bestkmeans), include Closes #116 in the commit message or PR description to auto-close the issue when merged
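As a small illustration of this convention, the closing keyword can be derived from the branch name. The helper `closing_keyword` is hypothetical, and it assumes branch names start with the issue number followed by a hyphen, as in the example above.

```python
import re
from typing import Optional


def closing_keyword(branch: str) -> Optional[str]:
    """Return 'Closes #N' if the branch name starts with an issue number, else None."""
    match = re.match(r"(\d+)-", branch)
    return f"Closes #{match.group(1)}" if match else None


print(closing_keyword("116-fix-bestkmeans"))  # Closes #116
print(closing_keyword("main"))                # None
```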