GitHub - Genentech/Orion

Website | Paper | Data | Setup

About

Biomedical imaging produces rich visual data, but analysis is bottlenecked by labor-intensive workflows that depend on expert use of multiple GUI-driven software packages and on integrating knowledge from many sources. Orion is an autonomous computer-using agent for biomedical image analysis that combines large language models with terminal execution, GUI control, and adaptive multi-step reasoning inside a shared computing environment. A single Orion agent can inspect visual data, drive standard scientific software (CellProfiler, QuPath, napari, …), mine web resources, and run end-to-end analysis and interpretation workflows without bespoke setup per tool.

Under the hood, Orion drives a real, running Ubuntu desktop (clicking through GUIs, running shell commands, browsing the web, and reading multi-channel microscopy and pathology images), so one agent can take on biology lab tasks end-to-end instead of being limited to text or single-tool calls.

Repository layout

Orion/
├── sweagent/                 # Agent runtime (built on SWE-agent, with Orion-specific extensions)
│   ├── run/                  # Entry points: run_orion.py + 5 run_orion_<task>.py
│   ├── agent/                # orionagents.py
│   ├── deployment/           # swe-rex deployment for vmware & docker ...
│   ├── environment/          # swe-rex environment for vmware & docker ...
│   └── operator/             # OS support for GUI
├── config/                   # Per-task YAML agent configs (templates, step caps, memory toggles)
├── tools/                    # Agent-callable tools (cell_image_tools, cellprofiler, jump, web_browser, napari_viewer_tools, …)
├── subagents/                # Agent-callable GUI sub-agent prompts / configs (figure interpreter, web browser, image viewer, …)
├── scripts/                  # Bash launchers — set env vars and call `python -m sweagent.run.run_orion_<task>`
├── baselines/                # Non-Orion comparison runners (microscopy_baselines, deepresearch_benchmarks, JUMP_discovery)
├── communication_test/       # `vmrun` smoke test for the host ↔ VM swerex pipeline
├── workflow_data/            # Persisted per-task workflow memory
├── setup.md                  # Full installation guide (host conda env + VMware Ubuntu guest)
├── pyproject.toml            # Project metadata + dependencies (single source of truth)
├── requirements.txt          # Flat dependency list (kept as a pinning fallback)
└── LICENSE                   # MIT

Datasets

The supported tasks are backed by five datasets. Orion-hosted datasets live under the orion-agent Hugging Face organization; external benchmarks keep their original locations.

Microscopy-Dataset — orion-agent/Microscopy-Dataset. CellProfiler-Training and -Testing problems (segmentation / counting) for the microscopy image processing task.
Pathology-Dataset — orion-agent/Pathology-Dataset. Whole-slide images + QuPath workflows for the digital pathology analysis task (Tutorial / Segmentation / tp53_index splits).
JUMPDiscovery — orion-agent/JUMPDiscovery. JUMP Cell Painting morphological profiles for the discovery task. All figures and reports generated by Orion are available at orion-agent/JUMPDiscovery_results.
LAB-Bench & MicroVQA — data is downloaded directly from Hugging Face at run time.

The guest Ubuntu image (with CellProfiler / QuPath / napari pre-installed) is set up as described in VM_SETUP.md. A pre-installed version for arm64 machines is available at orion-agent/orion-vm-arm.

Each dataset contains subfolders, each corresponding to an independent task. The ProblemStatement.md file in each task subfolder holds the instructions given to the Orion agent. Test suites and ground-truth labels are provided for each task (except discovery tasks, which require manual evaluation).
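The layout described above can be walked with a few lines of Python. This is a minimal sketch, assuming only what the paragraph states (one subfolder per task, each holding a ProblemStatement.md); the function name `find_tasks` is hypothetical and not part of the Orion codebase.

```python
from pathlib import Path


def find_tasks(dataset_root):
    """Return {task_name: path to ProblemStatement.md} for each task subfolder.

    Hypothetical helper: assumes one subfolder per task, as described above.
    Subfolders without a ProblemStatement.md are skipped.
    """
    root = Path(dataset_root)
    tasks = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        statement = sub / "ProblemStatement.md"
        if statement.is_file():
            tasks[sub.name] = statement
    return tasks
```

Calling `find_tasks("./Microscopy-Dataset")` on a downloaded dataset would then list the tasks available locally, which can be handy before pointing a launcher at a specific problem.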

JUMPDiscovery and Pathology-Dataset both contain large files that are slow to download and to move from the host to the guest machine. We therefore provide download scripts that can be handed directly to the agent, so it downloads data only as needed.
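To illustrate the idea of fetching only what a task needs, here is a rough sketch using `snapshot_download` from `huggingface_hub` with `allow_patterns`. It is a stand-in for illustration, not the download scripts shipped with the datasets; the helper names and the task subfolder names passed in are hypothetical.

```python
def task_patterns(task_names):
    """Build glob patterns that select only the given task subfolders."""
    return [f"{name.strip('/')}/*" for name in task_names]


def download_tasks(repo_id, task_names, local_dir):
    """Fetch only the selected task subfolders from a Hugging Face dataset repo.

    Hypothetical helper; the real per-dataset download scripts may work differently.
    """
    # Imported lazily so task_patterns() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        allow_patterns=task_patterns(task_names),
        local_dir=local_dir,
    )
```

For example, `download_tasks("orion-agent/JUMPDiscovery", ["<task-subfolder>"], "./JUMPDiscovery")` would pull a single task folder instead of the whole dataset, assuming the subfolder name matches the repository layout.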

Quick start

Full instructions live in setup.md, with VM-specific details in VM_SETUP.md. The short version, once you have VMware Fusion / Workstation (vmrun on PATH) and the OSWorld Ubuntu guest image:

# 1. Install Orion on the host (pick ONE of A / B)
git clone <orion-repo>
cd Orion

pip install -e .

# 2. Configure API keys + VM paths (see setup.md § "Running the agent")
cp .env.example .env
$EDITOR .env

# 3. Smoke-test the host ↔ VM channel (one-shot)
python communication_test/test_vmware_deployment.py

# 4. Download a dataset (example: JUMPDiscovery into ./JUMPDiscovery)
hf download orion-agent/JUMPDiscovery --repo-type dataset --local-dir ./JUMPDiscovery

# 5. Run a task
bash scripts/run_orion_jump_discovery.sh       # or any of the other launchers

Equivalent CLI forms after pip install -e .:

orion jump-discovery                                   # via the `orion` console script
python -m sweagent jump-discovery                      # via the package dispatcher
python -m sweagent.run.run_orion_jump_discovery        # direct module call

All three resolve to the same run_orion_jump_discovery.main() entry point. Task settings (resume mode, output directory, benchmark split, etc.) are read from environment variables exported by the launcher scripts — see Supported tasks for the full list of launchers and the configs each one uses.
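One way to picture how a task name reaches its entry point is the sketch below. It assumes that hyphenated CLI names map onto `sweagent.run.run_orion_<task>` modules by swapping hyphens for underscores, which holds for `jump-discovery` above but is only a guess for the other tasks. The function names are hypothetical, and `RESUME_OUTPUT_DIR` is the only environment variable used because it is the only one named in this README.

```python
import importlib
import os

MODULE_PREFIX = "sweagent.run.run_orion_"


def module_for_task(task):
    """Map a CLI task name such as 'jump-discovery' to a run module path."""
    return MODULE_PREFIX + task.replace("-", "_")


def read_settings(env=None):
    """Collect task settings from environment variables (illustrative subset)."""
    env = os.environ if env is None else env
    return {"output_dir": env.get("RESUME_OUTPUT_DIR")}


def dispatch(task):
    """Import the task module and call its main(); requires Orion to be installed."""
    module = importlib.import_module(module_for_task(task))
    return module.main()
```

Keeping settings in environment variables rather than CLI flags means the same `main()` behaves identically whichever of the three forms invoked it, as long as the launcher script (or your shell) exported the variables first.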

How a run looks

The launcher script sources .env, exports task-specific env vars, and invokes the matching run_orion_<task> module.
sweagent.run.run_orion.build_deployment(...) boots the VMware guest via vmrun, starts swerex-remote on port 4000, and opens a session.
reset_desktop_for_problem(...) pushes the host's tools/napari_viewer_tools/napari_server/ onto the guest at /home/user/server/napari_server, launches the napari helper service, and resets the DesktopEnv.
The agent loads its YAML config from config/, reads the problem statement from inside the VM, and proceeds turn-by-turn: think → act → observe, with screenshots, accessibility trees, and tool calls routed through swerex-remote (port 4000) and the DesktopEnv HTTP server (port 5000) running inside the guest.
Trajectories, intermediate files, and the final answer are written under RESUME_OUTPUT_DIR (host path you set in the launcher).
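The think → act → observe cycle in the steps above can be sketched as a small loop. This is a toy model under stated assumptions, not Orion's agent code: `policy`, `execute`, the `submit ` action prefix, and the `Step` / `Trajectory` classes are all hypothetical, and `max_steps` only loosely mirrors the step caps set in the YAML configs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    thought: str
    action: str
    observation: str


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)
    answer: Optional[str] = None


def run_episode(policy, execute, problem, max_steps):
    """Run a think -> act -> observe loop until an answer is submitted or the cap is hit.

    policy(problem, history) returns (thought, action);
    execute(action) returns an observation string.
    """
    traj = Trajectory()
    for _ in range(max_steps):
        thought, action = policy(problem, traj.steps)  # think
        if action.startswith("submit "):
            traj.answer = action[len("submit "):]
            break
        observation = execute(action)  # act, then observe
        traj.steps.append(Step(thought, action, observation))
    return traj
```

In the real system the observation would carry screenshots, accessibility trees, and tool output from the guest rather than a plain string, and the recorded steps correspond to the trajectory written under RESUME_OUTPUT_DIR.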

While the run is in progress you can watch it happen in real time on the Ubuntu machine — the cursor moves, applications open, and files appear / update under the working directory as the agent acts.

Configuration

Agent prompts & limits — config/ (README). One YAML per task; tune max_trajectory_length, system_template, instance_template, and the workflow-memory toggles.
Launcher env vars — scripts/run_orion_*.sh. Each script documents its inputs (mode, benchmark split, resume path, problem source, …) inline.
Host .env — collects API keys (ANTHROPIC_API_KEY, AZURE_OPENAI_API_KEY_EUS2, GEMINI_API_KEY, HF_API_TOKEN, GITHUB_API_KEY) and VM/host paths (HOST_VM_FILE, VM_USER, VM_PASSWORD, VM_WORKING_DIR, ENV_NAME, …). See setup.md § "Running the agent" for the full list.
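Before launching a run it can help to check that the host .env is filled in. The sketch below parses simple `KEY=VALUE` lines and reports empty or absent entries. The variable names come from the list above; the parser, the helper names, and the assumption that .env uses plain shell-style assignments are illustrative. Which API keys are actually required likely depends on the model provider you use.

```python
API_KEYS = [
    "ANTHROPIC_API_KEY",
    "AZURE_OPENAI_API_KEY_EUS2",
    "GEMINI_API_KEY",
    "HF_API_TOKEN",
    "GITHUB_API_KEY",
]
VM_VARS = ["HOST_VM_FILE", "VM_USER", "VM_PASSWORD", "VM_WORKING_DIR", "ENV_NAME"]


def parse_env_text(text):
    """Parse simple KEY=VALUE lines; skip blanks, '#' comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):].strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values, required):
    """Return the required names that are absent or empty, in the order given."""
    return [name for name in required if not values.get(name)]
```

A typical check would be `missing_vars(parse_env_text(open(".env").read()), VM_VARS)`, which returns an empty list when every VM setting has a value.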

Supported tasks

Four task families ship out of the box. Each pairs a bash launcher under scripts/ with one or more YAML agent configs under config/:

Microscopy image processing — scripts/run_orion_cellprofiler.sh (vmware_cell.yaml, vmware_cell_test.yaml). Build, debug, and run CellProfiler pipelines on segmentation / counting problems.
Digital pathology analysis — scripts/run_orion_qupath.sh (vmware_learn_qupath.yaml, vmware_qupath_metastasis.yaml, vmware_qupath_tma.yaml). Learn from QuPath tutorials, then run pathology segmentation on lymph-node / TMA / Ki67 slides.
Deep research — covers both literature/database QA (LabBench DbQA / LitQA2 / FigQA via scripts/run_orion_labbench.sh with vmware_labbench.yaml) and multi-question visual QA on microscopy images (MicroVQA via scripts/run_orion_microvqa.sh with vmware_microvqa.yaml). The agent searches the web, reads papers/figures, and answers multiple-choice biomedical questions.
Discovery — scripts/run_orion_jump_discovery.sh (vmware_jumpdiscovery.yaml). Mine the JUMP Cell Painting morphological-profile dataset for novel perturbation–phenotype hypotheses (up to 500 agent steps per run).
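The pairing of task families, launchers, and configs listed above can be written down as a small lookup table, which also makes explicit that the four families map to five launcher scripts (Deep research has two). The dictionary keys and helper names are hypothetical labels; the script and YAML file names are taken from the list.

```python
# family label -> list of (launcher script, YAML configs under config/)
TASK_FAMILIES = {
    "microscopy": [
        ("scripts/run_orion_cellprofiler.sh",
         ["vmware_cell.yaml", "vmware_cell_test.yaml"]),
    ],
    "pathology": [
        ("scripts/run_orion_qupath.sh",
         ["vmware_learn_qupath.yaml", "vmware_qupath_metastasis.yaml",
          "vmware_qupath_tma.yaml"]),
    ],
    "deep_research": [
        ("scripts/run_orion_labbench.sh", ["vmware_labbench.yaml"]),
        ("scripts/run_orion_microvqa.sh", ["vmware_microvqa.yaml"]),
    ],
    "discovery": [
        ("scripts/run_orion_jump_discovery.sh", ["vmware_jumpdiscovery.yaml"]),
    ],
}


def all_launchers():
    """Flatten the table into a list of launcher script paths."""
    return [launcher for pairs in TASK_FAMILIES.values() for launcher, _ in pairs]


def configs_for(launcher):
    """Return the YAML configs paired with a launcher, or an empty list if unknown."""
    for pairs in TASK_FAMILIES.values():
        for name, configs in pairs:
            if name == launcher:
                return list(configs)
    return []
```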

Baselines

baselines/ contains comparison runners (LLM-only, search-augmented, and agent-style) for each task family. They have their own .env and dependency layer:

$EDITOR baselines/.env
python baselines/deepresearch_benchmarks/run_biomni_benchmark.py --benchmarks MicroVQA

See baselines/README.md for the per-folder layout (microscopy_baselines/, deepresearch_benchmarks/, JUMP_discovery/) and the env-var reference.

Acknowledgements

Orion builds on the open-source SWE-agent agent-computer interface and SWE-rex remote runtime by Yang, Jimenez, Wettig, Lieret, Yao, Narasimhan, and Press; on the Ubuntu guest image and DesktopEnv code from the OSWorld team; and on community tools and benchmarks including CellProfiler, QuPath, napari, the JUMP Cell Painting consortium dataset, LAB-Bench, and MicroVQA.
