pleiadian53/causal-bio-lab

causal-bio-lab

Causal AI/ML for Computational Biology: Research into causal inference, causal discovery, and causal representation learning for drug discovery, target identification, and treatment effect estimation.

Overview

This project investigates causal machine learning approaches across computational biology, inspired by emerging platforms such as:

  • Causal Inference Platforms: biotx.ai (causal genome mapping), insitro (POSH platform)
  • Target Discovery: Ochre Bio (liver disease), Relation Therapeutics (Lab-in-the-Loop)
  • Perturbation Biology: GEARS, CPA (perturbation response prediction)
  • Federated Causal Inference: Owkin (FedECA for clinical trials)

Research Goals:

  1. Learn established causal discovery algorithms (PC, GES, NOTEARS) for gene regulatory network inference
  2. Implement estimators of treatment effects (ATE, ITE, CATE) for biological interventions
  3. Explore counterfactual reasoning for perturbation response prediction and drug combination effects
  4. Investigate causal representation learning and its connection to generative models

See docs/INDUSTRY_LANDSCAPE.md for a comprehensive survey of companies and technologies in this space.

Project Structure

causal-bio-lab/
├── src/causalbiolab/
│   ├── data/           # Data loading, preprocessing, path management
│   │   ├── paths.py        # Standardized data path management
│   │   ├── sc_preprocess.py    # scRNA-seq/Perturb-seq preprocessing
│   │   └── bulk_preprocess.py  # Bulk RNA-seq preprocessing
│   ├── discovery/      # Causal graph learning (PC, NOTEARS, etc.)
│   ├── estimation/     # Treatment effect estimation (ATE, ITE, CATE)
│   ├── counterfactual/ # Counterfactual prediction, perturbation response
│   ├── representation/ # Causal representation learning, identifiable VAEs
│   └── utils/          # Config, reproducibility
├── data/               # Local data storage (gitignored)
│   ├── perturbation/   # Perturb-seq, CRISPR screens
│   ├── observational/  # GTEx, TCGA, drug response
│   └── synthetic/      # SERGIO, CausalBench benchmarks
├── tests/
├── examples/
├── configs/
├── docs/
└── environment.yml     # Conda environment specification

Installation

Using mamba + poetry (recommended)

# Create conda environment
mamba create -n causalbiolab python=3.11 -y
mamba activate causalbiolab

# Install poetry if not available
pip install poetry

# Install package in editable mode
poetry install

# Optional: install causal inference dependencies
poetry install --with causal

# Optional: install dev dependencies
poetry install --with dev

Quick start

# Verify installation
python -c "import causalbiolab; print(causalbiolab.__version__)"

# Run example (once implemented)
python examples/01_causal_discovery.py

Milestones

Milestone 0: Foundational Tutorials & Documentation ✅

  • Causal Inference Tutorials
    • Treatment effects and potential outcomes framework
    • Propensity score methods and IPW (inverse probability weighting)
    • Do-calculus tutorial document (comprehensive guide with examples)
    • Do-calculus interactive notebook (hands-on exercises and applications)
    • Identifying confounders and adjustment strategies
  • Simulation Framework
    • Confounding simulation utilities
    • Treatment effect estimation examples
    • Cell cycle, batch effect, and disease severity confounders
  • Notebooks
    • A/B testing fundamentals and multi-group comparisons
    • Causal graphs and d-separation
    • Sensitivity analysis methods
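
The confounding scenarios listed above can be illustrated with a small simulation. This is a minimal sketch, not the repository's simulation utilities: the function names (`simulate_confounded`, `naive_difference`, `backdoor_adjusted`), the single binary confounder, and every coefficient are assumptions chosen for illustration. The confounder (think of a batch or disease-severity indicator) raises both the probability of treatment and the outcome, so the raw treated-vs-control difference overstates the true effect, while adjusting within confounder strata recovers it.

```python
import random
from statistics import fmean


def simulate_confounded(n, seed=0, true_effect=1.0):
    """Return rows (z, t, y) where the binary confounder z drives both t and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0           # confounder, e.g. batch
        p_treat = 0.8 if z else 0.2                  # z makes treatment more likely
        t = 1 if rng.random() < p_treat else 0
        y = true_effect * t + 2.0 * z + rng.gauss(0.0, 1.0)  # z also shifts outcome
        rows.append((z, t, y))
    return rows


def mean_outcome(rows, t, z=None):
    """Mean outcome among units with treatment t (optionally within stratum z)."""
    return fmean(y for z_i, t_i, y in rows if t_i == t and (z is None or z_i == z))


def naive_difference(rows):
    """Unadjusted treated-minus-control difference (biased under confounding)."""
    return mean_outcome(rows, 1) - mean_outcome(rows, 0)


def backdoor_adjusted(rows):
    """Stratify on z, then average stratum effects weighted by P(z)."""
    n = len(rows)
    ate = 0.0
    for z in (0, 1):
        p_z = sum(1 for z_i, _, _ in rows if z_i == z) / n
        ate += p_z * (mean_outcome(rows, 1, z) - mean_outcome(rows, 0, z))
    return ate


rows = simulate_confounded(20_000, seed=0)
naive = naive_difference(rows)       # expected value 1 + 2 * (0.8 - 0.2) = 2.2
adjusted = backdoor_adjusted(rows)   # expected value equals the true effect, 1.0
print(f"naive={naive:.2f} adjusted={adjusted:.2f}")
```

Stratification works here only because the single confounder is observed and discrete; with many or continuous confounders, propensity-score or regression-based adjustment is the usual substitute.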

Milestone 0.5: Structural Causal Models & Counterfactuals 🚧

  • SCM Framework
    • Base SCM class with structural equations
    • Intervention utilities (do-operator implementation)
    • Counterfactual computation (abduction-action-prediction)
    • Linear SCM for efficient counterfactuals
  • Documentation
    • Comprehensive SCM tutorial covering three levels of causation
    • Association vs intervention vs counterfactual reasoning
    • Connection to potential outcomes and do-calculus
  • Examples & Notebooks
    • Interactive SCM notebook with hands-on exercises
    • Biological SCM examples (gene regulation, drug response)
    • Counterfactual fairness and model explanation examples
  • Integration
    • Connect SCMs to existing do-calculus tutorial
    • Show SCM implementation of IPW and propensity scores
    • Demonstrate mediation analysis with SCMs
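
The abduction-action-prediction recipe named above can be sketched for a linear SCM. The class below is a hypothetical illustration, not the repository's SCM class: `LinearDrugSCM`, its coefficients, and the drug/target/phenotype variables are all assumptions. The structural equations are `t = a*d + u_t` and `y = b*t + c*d + u_y`, which lets the noise terms be recovered exactly from one observed unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearDrugSCM:
    """Toy linear SCM: drug dose d -> target t -> phenotype y, plus direct d -> y."""
    a: float  # effect of drug on target expression
    b: float  # effect of target on phenotype
    c: float  # direct effect of drug on phenotype

    def abduct(self, d, t, y):
        """Abduction: recover the unit's exogenous noise from its observed values."""
        u_t = t - self.a * d
        u_y = y - self.b * t - self.c * d
        return u_t, u_y

    def predict(self, d, u_t, u_y):
        """Prediction: evaluate the structural equations with d fixed by do(d)."""
        t = self.a * d + u_t
        y = self.b * t + self.c * d + u_y
        return t, y

    def counterfactual(self, observed, d_new):
        """Abduction, then action (replace d by d_new), then prediction."""
        d, t, y = observed
        u_t, u_y = self.abduct(d, t, y)
        return self.predict(d_new, u_t, u_y)


scm = LinearDrugSCM(a=-2.0, b=0.5, c=0.25)
observed = (1.0, 3.0, 2.0)  # this unit received the drug (d = 1)

# "What would t and y have been for this same unit without the drug?"
t_cf, y_cf = scm.counterfactual(observed, d_new=0.0)
print(t_cf, y_cf)  # 5.0 2.75
```

Setting `d_new` to the observed dose returns the observed values unchanged, which is a quick consistency check on any counterfactual implementation.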

Milestone A: Causal Discovery on Gene Expression

  • Implement constraint-based methods (PC algorithm)
  • Implement score-based methods (GES)
  • Implement continuous optimization (NOTEARS)
  • Evaluate on synthetic + real gene expression data
  • Benchmark against CausalBench
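
To make the "continuous optimization" item concrete: NOTEARS replaces the combinatorial acyclicity requirement with a smooth function h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W describes a DAG. The sketch below evaluates only that constraint and is not a full NOTEARS solver. The helper names are hypothetical, and a truncated power series stands in for a library matrix exponential to keep the example dependency-free.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def notears_h(w, terms=20):
    """Approximate h(W) = tr(exp(W * W)) - d, with * the elementwise product.

    Uses tr(exp(A)) - d = sum over k >= 1 of tr(A^k) / k!, truncated at `terms`.
    """
    d = len(w)
    a = [[x * x for x in row] for row in w]  # elementwise square, all entries >= 0
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # A^0
    factorial = 1.0
    total = 0.0
    for k in range(1, terms + 1):
        power = matmul(power, a)             # now holds A^k
        factorial *= k
        total += sum(power[i][i] for i in range(d)) / factorial
    return total


# Convention assumed here: w[i][j] != 0 encodes an edge i -> j.
chain = [[0.0, 0.8, 0.0],
         [0.0, 0.0, -1.5],
         [0.0, 0.0, 0.0]]          # 0 -> 1 -> 2, acyclic
cyclic = [row[:] for row in chain]
cyclic[2][0] = 0.5                 # adds 2 -> 0, closing a cycle

h_dag = notears_h(chain)
h_cyc = notears_h(cyclic)
print(h_dag, round(h_cyc, 4))      # 0.0 0.1805
```

Because tr(A^k) sums the weights of all closed walks of length k, any cycle makes h strictly positive, and a gradient-based optimizer can push h toward zero while fitting the data.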

Milestone B: Treatment Effect Estimation

  • Integrate DoWhy for causal effect estimation
  • Implement propensity score methods (IPW, stabilized weights)
  • Implement doubly robust estimators (AIPW, TMLE)
  • Apply to drug response prediction
  • Heterogeneous treatment effects (CATE)
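
The propensity-weighted and doubly robust estimators listed above can be compared on a toy cohort. This is a sketch under stated assumptions, not the planned implementation: the cohort, the function names, and the stratum-mean "models" are invented for illustration, and the IPW variant shown uses normalized (Hajek) weights rather than stabilized weights. The true effect is 1.0 by construction, and the confounder `z` makes the unadjusted difference 2.2.

```python
from statistics import fmean


def make_cohort():
    """Deterministic rows (z, t, y): y = 1.0*t + 2.0*z + zero-mean noise per cell."""
    cells = {(0, 0): 8, (0, 1): 2, (1, 0): 2, (1, 1): 8}  # (z, t) -> unit count
    rows = []
    for (z, t), count in cells.items():
        for i in range(count):
            noise = 0.5 if i % 2 == 0 else -0.5
            rows.append((z, t, 1.0 * t + 2.0 * z + noise))
    return rows


def fit_propensity(rows):
    """Estimate e(z) = P(T=1 | Z=z) as the treated fraction in each stratum."""
    return {z: fmean(t for z_i, t, _ in rows if z_i == z) for z in (0, 1)}


def fit_outcome(rows):
    """Estimate mu[(t, z)] = E[Y | T=t, Z=z] as the cell mean."""
    return {(t, z): fmean(y for z_i, t_i, y in rows if (t_i, z_i) == (t, z))
            for t in (0, 1) for z in (0, 1)}


def ipw_hajek(rows, e):
    """IPW with weights normalized to sum to one within each arm."""
    treated = [(1 / e[z], y) for z, t, y in rows if t == 1]
    control = [(1 / (1 - e[z]), y) for z, t, y in rows if t == 0]
    mean1 = sum(w * y for w, y in treated) / sum(w for w, _ in treated)
    mean0 = sum(w * y for w, y in control) / sum(w for w, _ in control)
    return mean1 - mean0


def aipw(rows, e, mu):
    """Augmented IPW: outcome-model prediction plus a weighted residual correction."""
    scores = []
    for z, t, y in rows:
        m1, m0 = mu[(1, z)], mu[(0, z)]
        scores.append(m1 - m0 + t * (y - m1) / e[z] - (1 - t) * (y - m0) / (1 - e[z]))
    return fmean(scores)


rows = make_cohort()
e_hat, mu_hat = fit_propensity(rows), fit_outcome(rows)
wrong_e = {0: 0.5, 1: 0.5}                  # deliberately misspecified propensity
wrong_mu = {key: 0.0 for key in mu_hat}     # deliberately misspecified outcome model

est_ipw = ipw_hajek(rows, e_hat)
est_dr_bad_mu = aipw(rows, e_hat, wrong_mu)
est_dr_bad_e = aipw(rows, wrong_e, mu_hat)
est_dr_both_bad = aipw(rows, wrong_e, wrong_mu)
print(round(est_ipw, 6), round(est_dr_bad_mu, 6),
      round(est_dr_bad_e, 6), round(est_dr_both_bad, 6))  # 1.0 1.0 1.0 2.2
```

The last three values show the doubly robust property in miniature: AIPW stays at the true effect when either the propensity model or the outcome model is correct, and falls back to the confounded value only when both are wrong.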

Milestone C: Counterfactual Perturbation Prediction

  • Implement CPA-style perturbation autoencoder
  • GEARS-style geometric deep learning for multigene perturbations
  • Out-of-distribution prediction for unseen combinations
  • Dose-response curve estimation

Milestone D: Causal Representation Learning

  • Identifiable VAE implementations
  • Disentangled representations for biological factors
  • Connection to generative models (link to genai-lab)
  • Causal structure in latent space

Key Concepts

Causal Discovery vs Causal Inference

  • Causal Discovery: Learning the causal graph structure from data
  • Causal Inference: Estimating causal effects given a (known or assumed) causal graph

Treatment Effects

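  • ATE (Average Treatment Effect): Average effect over the whole population
  • ITE (Individual Treatment Effect): Effect for a single unit (a patient, sample, or cell)
  • CATE (Conditional ATE): Average effect within a subgroup defined by covariates
  • ATT/ATC (Average Treatment effect on the Treated / on the Controls): Average effect among units that did / did not receive the treatment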
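
The estimands above differ only in which units the individual effects are averaged over. The sketch below makes that explicit on a made-up table in which both potential outcomes are known for every unit. That is possible only in simulation, since real data reveal just one of `y0` or `y1` per unit. The `Unit` class, the subgroup labels, and all values are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Unit:
    subgroup: str   # covariate used to define CATE, e.g. a cell type
    treated: bool   # treatment actually received
    y0: float       # potential outcome without treatment
    y1: float       # potential outcome with treatment

    @property
    def ite(self):
        """Individual treatment effect for this unit."""
        return self.y1 - self.y0


units = [
    Unit("A", True, 2.0, 5.0),
    Unit("A", True, 3.0, 5.0),
    Unit("A", True, 1.0, 3.0),
    Unit("A", False, 2.0, 3.0),
    Unit("B", True, 2.0, 1.0),
    Unit("B", False, 1.0, 2.0),
    Unit("B", False, 2.0, 2.0),
    Unit("B", False, 1.0, 1.0),
]

ate = fmean(u.ite for u in units)                              # everyone
cate = {g: fmean(u.ite for u in units if u.subgroup == g)      # per subgroup
        for g in sorted({u.subgroup for u in units})}
att = fmean(u.ite for u in units if u.treated)                 # treated only
atc = fmean(u.ite for u in units if not u.treated)             # controls only

print(ate, cate, att, atc)  # 1.0 {'A': 2.0, 'B': 0.0} 1.5 0.5
```

Here ATT exceeds ATC because treated units are concentrated in the subgroup that responds, which is why the three averages should not be used interchangeably.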

Counterfactual Reasoning

  • "What would have happened if...?"
  • Relevant to drug repurposing and combination therapy
  • Unit-level counterfactuals require a structural causal model (SCM), or equivalent assumptions about the data-generating mechanism

Tools & Libraries

Library     Purpose
DoWhy       End-to-end causal inference
EconML      Heterogeneous treatment effects
CausalML    Uplift modeling
gCastle     Causal discovery
NOTEARS     Continuous optimization for DAGs
CausalNex   Bayesian networks

Related Projects

genai-lab — Generative AI for Computational Biology

Complementary Focus: While causal-bio-lab focuses on uncovering causal structures and estimating causal effects, genai-lab focuses on modeling data-generating processes through generative models (VAE, diffusion, transformers).

Synergy:

  • Generative AI learns rich representations of biological data and can simulate realistic perturbation responses
  • Causal ML provides the framework to ensure these models capture true causal mechanisms, not just correlations (via causal graphs, structural equations, and causal discovery)
  • Together: Causal generative models enable counterfactual reasoning, treatment effect prediction, and mechanistic understanding

Key Integration Points:

  1. Causal graphs from discovery algorithms can constrain generative model architectures and latent space structure
  2. Causal inference methods (do-calculus, structural equations, propensity scores) validate counterfactual predictions from generative models
  3. Causal representation learning (Milestone D) bridges both projects—learning disentangled latent spaces that respect causal structure
  4. Perturbation prediction benefits from both: generative models for realistic simulation + causal effect estimation to reduce confounding bias in the predictions

Example Workflow:

1. Use genai-lab to train a VAE on gene expression data
2. Use causal-bio-lab to discover causal relationships between genes
3. Integrate causal structure into the VAE latent space (causal VAE)
4. Generate counterfactual perturbation responses that are consistent with the learned causal structure

See genai-lab Stage 5 (Counterfactual & Causal) for planned integration work.

License

MIT
