causal-bio-lab

Causal AI/ML for Computational Biology: Research into causal inference, causal discovery, and causal representation learning for drug discovery, target identification, and treatment effect estimation.

Overview

This project investigates causal machine learning approaches across computational biology, inspired by emerging industry platforms and academic methods such as:

Causal Inference Platforms: biotx.ai (causal genome mapping), insitro (POSH platform)
Target Discovery: Ochre Bio (liver disease), Relation Therapeutics (Lab-in-the-Loop)
Perturbation Biology: GEARS, CPA (perturbation response prediction)
Federated Causal Inference: Owkin (FedECA for clinical trials)

Research Goals:

Learn established causal discovery algorithms (PC, GES, NOTEARS) and apply them to gene regulatory network inference
Implement treatment effect estimation methods (ATE, ITE, CATE) for biological interventions
Explore counterfactual reasoning for perturbation response prediction and drug combination effects
Investigate causal representation learning and its connection to generative models

See docs/INDUSTRY_LANDSCAPE.md for a comprehensive survey of companies and technologies in this space.

Project Structure

causal-bio-lab/
├── src/causalbiolab/
│   ├── data/           # Data loading, preprocessing, path management
│   │   ├── paths.py        # Standardized data path management
│   │   ├── sc_preprocess.py    # scRNA-seq/Perturb-seq preprocessing
│   │   └── bulk_preprocess.py  # Bulk RNA-seq preprocessing
│   ├── discovery/      # Causal graph learning (PC, NOTEARS, etc.)
│   ├── estimation/     # Treatment effect estimation (ATE, ITE, CATE)
│   ├── counterfactual/ # Counterfactual prediction, perturbation response
│   ├── representation/ # Causal representation learning, identifiable VAEs
│   └── utils/          # Config, reproducibility
├── data/               # Local data storage (gitignored)
│   ├── perturbation/   # Perturb-seq, CRISPR screens
│   ├── observational/  # GTEx, TCGA, drug response
│   └── synthetic/      # SERGIO, CausalBench benchmarks
├── tests/
├── examples/
├── configs/
├── docs/
└── environment.yml     # Conda environment specification

Installation

Using mamba + poetry (recommended)

# Create conda environment
mamba create -n causalbiolab python=3.11 -y
mamba activate causalbiolab

# Install poetry if not available
pip install poetry

# Install package in editable mode
poetry install

# Optional: install causal inference dependencies
poetry install --with causal

# Optional: install dev dependencies
poetry install --with dev

Quick start

# Verify installation
python -c "import causalbiolab; print(causalbiolab.__version__)"

# Run example (once implemented)
python examples/01_causal_discovery.py

Milestones

Milestone 0: Foundational Tutorials & Documentation ✅

Milestone 0.5: Structural Causal Models & Counterfactuals 🚧

Milestone A: Causal Discovery on Gene Expression

Implement constraint-based methods (PC algorithm)
Implement score-based methods (GES)
Implement continuous optimization (NOTEARS)
Evaluate on synthetic + real gene expression data
Benchmark against CausalBench
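As a small illustration of the continuous-optimization idea behind NOTEARS, the sketch below evaluates its acyclicity function h(W) = tr(exp(W∘W)) - d, which is zero exactly when the weighted adjacency matrix W describes a DAG. This is not code from this repository: the helper names (`matmul`, `acyclicity`) and the example matrices are hypothetical, and the matrix exponential is approximated by a truncated power series to keep the example dependency-free.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w, terms=30):
    """Approximate h(W) = tr(exp(W * W)) - d, where * is the elementwise product."""
    d = len(w)
    squared = [[x * x for x in row] for row in w]  # elementwise square of W
    # term holds (W * W)^k / k!, starting from the identity matrix (k = 0)
    term = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, squared)]
        trace += sum(term[i][i] for i in range(d))
    return trace - d


# Hypothetical 3-gene graph: gene0 -> gene1 -> gene2 (acyclic)
dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -0.7],
       [0.0, 0.0, 0.0]]

# Same graph plus a feedback edge gene2 -> gene0, which closes a cycle
cyclic = [row[:] for row in dag]
cyclic[2][0] = 0.5

h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
print(h_dag, h_cyclic)
```

In NOTEARS this function serves as an equality constraint (h(W) = 0) on a least-squares fit, so a positive value such as `h_cyclic` indicates that a candidate W still contains cycles.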

Milestone B: Treatment Effect Estimation

Integrate DoWhy for causal effect estimation
Implement propensity score methods (IPW, stabilized weights)
Implement doubly robust estimators (AIPW, TMLE)
Apply to drug response prediction
Heterogeneous treatment effects (CATE)
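To make the propensity score items concrete, here is a minimal sketch of inverse probability weighting with stabilized weights on simulated data. It is an illustration under stated assumptions, not the planned implementation: the data-generating process (one binary confounder `z`, a true effect of 2.0) and all function names are made up for the example, and the propensity score is estimated by simple per-stratum frequencies rather than a fitted model.

```python
import random

random.seed(0)
n = 20000

# Simulated data: z confounds both treatment assignment and outcome.
rows = []
for _ in range(n):
    z = 1 if random.random() < 0.5 else 0
    t = 1 if random.random() < (0.8 if z else 0.2) else 0
    y = 2.0 * t + 3.0 * z + random.gauss(0.0, 1.0)  # true treatment effect is 2.0
    rows.append((z, t, y))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Naive contrast: biased because treated units have z = 1 more often.
naive = mean(y for _, t, y in rows if t == 1) - mean(y for _, t, y in rows if t == 0)

# Propensity score e(z) = P(T = 1 | Z = z), estimated within each stratum of z.
propensity = {z: mean(t for zz, t, _ in rows if zz == z) for z in (0, 1)}
p_treated = mean(t for _, t, _ in rows)  # marginal P(T = 1)


def stabilized_weight(z, t):
    """Stabilized weight P(T = t) / P(T = t | Z = z)."""
    e = propensity[z]
    return p_treated / e if t == 1 else (1.0 - p_treated) / (1.0 - e)


weights = [stabilized_weight(z, t) for z, t, _ in rows]


def weighted_mean(arm):
    """Weighted mean outcome within one treatment arm."""
    num = sum(w * y for w, (_, t, y) in zip(weights, rows) if t == arm)
    den = sum(w for w, (_, t, _) in zip(weights, rows) if t == arm)
    return num / den


ipw_ate = weighted_mean(1) - weighted_mean(0)
print(f"naive difference: {naive:.2f}, IPW estimate: {ipw_ate:.2f}")
```

The naive difference absorbs the confounder's contribution, while the weighted contrast should land close to the simulated effect of 2.0. Stabilized weights average to about 1, which tends to keep the variance lower than with unstabilized 1/e(z) weights.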

Milestone C: Counterfactual Perturbation Prediction

Implement CPA-style perturbation autoencoder
GEARS-style geometric deep learning for multigene perturbations
Out-of-distribution prediction for unseen combinations
Dose-response curve estimation
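A useful reference point when evaluating predictions for unseen combinations is a naive additive baseline: the predicted profile for a double perturbation is the control profile plus the sum of the single-perturbation shifts. The sketch below shows only that baseline, not CPA or GEARS; the perturbation names and expression values are invented for illustration.

```python
# Mean expression of three hypothetical genes in unperturbed control cells.
control = [1.0, 2.0, 3.0]

# Mean shift relative to control for each single perturbation (made-up values).
single_effects = {
    "pertA": [0.5, 0.0, -1.0],
    "pertB": [0.0, 1.0, -0.5],
}


def additive_prediction(control_profile, effects, combination):
    """Predict a combination as control plus the sum of single-perturbation shifts."""
    predicted = list(control_profile)
    for name in combination:
        predicted = [p + delta for p, delta in zip(predicted, effects[name])]
    return predicted


combo = additive_prediction(control, single_effects, ["pertA", "pertB"])
print(combo)  # [1.5, 3.0, 1.5]
```

A learned model is only informative for combinations if it beats this baseline, which by construction cannot capture genetic interactions such as synergy or buffering.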

Milestone D: Causal Representation Learning

Identifiable VAE implementations
Disentangled representations for biological factors
Connection to generative models (link to genai-lab)
Causal structure in latent space

Key Concepts

Causal Discovery vs Causal Inference

Causal Discovery: Learning the causal graph structure from data
Causal Inference: Estimating causal effects given a (known or assumed) causal graph
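Constraint-based discovery methods such as the PC algorithm decide which edges to keep by testing conditional independence. The sketch below simulates a chain X -> Y -> Z and shows that X and Z are correlated marginally but (approximately) uncorrelated once Y is conditioned on, using partial correlation with a Fisher z-test. The simulation parameters and function names are assumptions for illustration, and the test presumes roughly linear-Gaussian data.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: (partial) correlation is zero."""
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))


random.seed(1)
n = 5000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.8 * x + random.gauss(0, 1) for x in xs]  # X -> Y
zs = [0.8 * y + random.gauss(0, 1) for y in ys]  # Y -> Z

r_marginal = pearson(xs, zs)
r_partial = partial_corr(xs, zs, ys)
print(f"corr(X, Z)     = {r_marginal:.3f}, p = {fisher_z_pvalue(r_marginal, n, 0):.3g}")
print(f"corr(X, Z | Y) = {r_partial:.3f}, p = {fisher_z_pvalue(r_partial, n, 1):.3g}")
```

A discovery algorithm would use the second result to remove the direct X-Z edge. Estimating how strongly X affects Z along the remaining path is then a causal inference question on the resulting graph.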

Treatment Effects

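ATE (Average Treatment Effect): Average effect over the whole population
ITE (Individual Treatment Effect): Effect for a single unit (e.g., one patient or one cell)
CATE (Conditional ATE): Average effect within a subgroup defined by covariates
ATT/ATC: Average effect among the treated / among the controls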
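The definitions above can be checked on a toy table in which both potential outcomes are known for every unit. This is only possible in simulation: in real data each unit reveals just one of the two outcomes, which is why these quantities must be estimated. The groups and values below are invented for illustration.

```python
# Each unit: (group, treated, y0, y1), where y0 and y1 are the potential
# outcomes without and with treatment. Values are made up.
units = [
    ("A", 1, 1.0, 3.0),
    ("A", 1, 2.0, 4.0),
    ("A", 0, 1.5, 3.5),
    ("B", 1, 0.0, 0.5),
    ("B", 0, 1.0, 1.5),
    ("B", 0, 2.0, 2.5),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# ITE: unit-level difference between the two potential outcomes.
ite = [y1 - y0 for _, _, y0, y1 in units]

ate = mean(ite)                                                   # whole population
att = mean(e for e, (_, t, _, _) in zip(ite, units) if t == 1)    # treated units only
atc = mean(e for e, (_, t, _, _) in zip(ite, units) if t == 0)    # control units only
cate = {g: mean(e for e, (grp, _, _, _) in zip(ite, units) if grp == g)
        for g in ("A", "B")}                                      # per subgroup

# What an analyst would actually observe: one outcome per unit.
observed = [y1 if t == 1 else y0 for _, t, y0, y1 in units]

print(ate, att, atc, cate)  # 1.25 1.5 1.0 {'A': 2.0, 'B': 0.5}
```

Here ATT exceeds ATE because treated units come disproportionately from group A, where the effect is larger.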

Counterfactual Reasoning

"What would have happened if...?"
Essential for drug repurposing, combination therapy
Requires structural causal models (SCMs)
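With an SCM, a counterfactual is typically computed in three steps: abduction (infer the unit's noise terms from what was observed), action (replace the intervened variable's equation), and prediction (recompute downstream variables). The sketch below applies these steps to a hypothetical two-gene linear SCM; the structural equation B = 2A + U_B and the observed values are assumptions chosen for the example.

```python
def structural_b(a, u_b):
    """Hypothetical structural equation for gene B: B = 2 * A + U_B."""
    return 2.0 * a + u_b


# Observed expression in one cell.
observed_a, observed_b = 1.0, 3.0

# 1. Abduction: recover the cell-specific noise consistent with the observation.
u_b = observed_b - 2.0 * observed_a

# 2. Action: intervene with do(A = 0), e.g. a knockout of gene A.
a_counterfactual = 0.0

# 3. Prediction: recompute B with the same noise under the intervention.
b_counterfactual = structural_b(a_counterfactual, u_b)

print(u_b, b_counterfactual)  # 1.0 1.0
```

The answer is specific to this cell because the abducted noise `u_b` is carried over. A purely interventional query would instead average over the noise distribution.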

Tools & Libraries

Library	Purpose
DoWhy	End-to-end causal inference
EconML	Heterogeneous treatment effects
CausalML	Uplift modeling
gCastle	Causal discovery
NOTEARS	Continuous optimization for DAGs
CausalNex	Bayesian networks

References

Academic

Elements of Causal Inference — Peters, Janzing, Schölkopf
Causal Inference: What If — Hernán & Robins
GEARS — Multigene perturbation prediction
CPA — Compositional Perturbation Autoencoder
CausalBench — Gene network inference benchmark

Industry

biotx.ai — Causal modeling at scale
insitro — AI therapeutics on causal biology
Relation Therapeutics — Lab-in-the-Loop causal discovery

Related Projects

genai-lab — Generative AI for Computational Biology

Complementary Focus: While causal-bio-lab focuses on uncovering causal structures and estimating causal effects, genai-lab focuses on modeling data-generating processes through generative models (VAE, diffusion, transformers).

Synergy:

Generative AI learns rich representations of biological data and can simulate realistic perturbation responses
Causal ML provides the framework to ensure these models capture true causal mechanisms, not just correlations (via causal graphs, structural equations, and causal discovery)
Together: Causal generative models enable counterfactual reasoning, treatment effect prediction, and mechanistic understanding

Key Integration Points:

Causal graphs from discovery algorithms can constrain generative model architectures and latent space structure
Causal inference methods (do-calculus, structural equations, propensity scores) validate counterfactual predictions from generative models
Causal representation learning (Milestone D) bridges both projects—learning disentangled latent spaces that respect causal structure
Perturbation prediction benefits from both: generative models for realistic simulation + causal effect estimation for unbiased predictions

Example Workflow:

1. Use genai-lab to train a VAE on gene expression data
2. Use causal-bio-lab to discover causal relationships between genes
3. Integrate causal structure into the VAE latent space (causal VAE)
4. Generate counterfactual perturbation responses that respect the learned causal structure

See genai-lab Stage 5 (Counterfactual & Causal) for planned integration work.

License

MIT
