CUBE Standard

Note

CUBE is in active development (alpha). Interfaces may change. We welcome early adopters and contributors who want to shape the standard, not just use it. See our Roadmap and Contributing Guide.

Have a benchmark to contribute? Fill out this short form — no commitment required. Want to go deeper? Apply to join the core team.

This repo contains the code and documentation for the AI Alliance: CUBE Standard project. The project standardizes benchmark wrapping, so the community can wrap otherwise-incompatible benchmarks behind one uniform interface and reuse them across evaluation setups.

CUBE Standard defines the protocol — the Tool, Task, Benchmark, Observation, and Action interfaces that any benchmark must implement. cube-harness is the evaluation runtime that runs agents against CUBE-compatible benchmarks.

Paper: arXiv:2603.15798

Principal developer: ServiceNow AI Research.

Components

CUBE Standard is organized into three layers:

Layer	Package	Description
Core	`cube-standard` (this repo)	Protocol interfaces: `Tool`, `Task`, `Benchmark`, `Observation`, `Action`
Resources	`cube-resources/`	Optional shared infrastructure (browser sessions, VM backends)
Tools	`cube-tools/`	Optional action executors (browser tools, computer tools)

Resources are pieces of shared infrastructure — e.g. a running browser instance or a VM — that are launched once and shared across tasks. Tools execute agent actions against that infrastructure.

Benchmark ──► TaskConfig ──► Task ──► Tool ──► Resource ──► Environment
                                ▲               (cube-tools)  (cube-resources)
                         cube-standard

See cube-resources/README.md and cube-tools/README.md for available implementations and usage examples.
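
To make the layering above concrete, here is a minimal, self-contained sketch of how a benchmark, its tasks, a tool, and a shared resource could relate. It does not import `cube-standard`: every class, method, and field name below is a hypothetical stand-in chosen for illustration, and the real interfaces may look quite different.

```python
from dataclasses import dataclass, field
from typing import Protocol

# NOTE: all names here are hypothetical stand-ins, not the cube-standard API.


@dataclass
class Observation:
    text: str


@dataclass
class Action:
    name: str
    argument: int = 0


class Resource(Protocol):
    """Shared infrastructure that tools act on."""

    def apply(self, action: Action) -> Observation: ...


@dataclass
class CounterResource:
    """Toy resource: created once and reused by every task."""

    value: int = 0

    def apply(self, action: Action) -> Observation:
        if action.name == "add":
            self.value += action.argument
        return Observation(text=str(self.value))


@dataclass
class Tool:
    """Executes agent actions against a resource."""

    resource: Resource

    def execute(self, action: Action) -> Observation:
        return self.resource.apply(action)


@dataclass
class Task:
    """Scores an observation: reward 1.0 when the counter hits the target."""

    target: int
    tool: Tool

    def step(self, action: Action) -> tuple[Observation, float]:
        obs = self.tool.execute(action)
        reward = 1.0 if obs.text == str(self.target) else 0.0
        return obs, reward


@dataclass
class Benchmark:
    """Builds tasks that all share one resource through one tool."""

    targets: list[int]
    resource: CounterResource = field(default_factory=CounterResource)

    def tasks(self) -> list[Task]:
        tool = Tool(self.resource)
        return [Task(target=t, tool=tool) for t in self.targets]


if __name__ == "__main__":
    first, second = Benchmark(targets=[2, 5]).tasks()
    print(first.step(Action("add", 2)))
    print(second.step(Action("add", 3)))
```

Because both tasks share one `CounterResource`, the second task starts from the state the first one left behind. That is the "launched once and shared across tasks" behaviour described above, reduced to a toy counter.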

Installation

Requires Python 3.12+. Install with uv:

uv add cube-standard

Or with pip:

pip install cube-standard

To include optional container backends:

# Docker support
uv add "cube-standard[docker]"

# Modal support
uv add "cube-standard[modal]"

# Daytona support
uv add "cube-standard[daytona]"

For development (includes test and lint tools):

git clone https://github.com/The-AI-Alliance/cube-standard
cd cube-standard
uv sync --extra dev

CLI commands

Command	What it does
`cube init [NAME]`	Scaffolds a new benchmark package from the built-in template
`cube list`	Lists all installed benchmarks registered under `cube.benchmarks` entry points
`cube test NAME`	Runs the debug suite and asserts `reward == 1.0` on every debug task

For benchmark contributors

Three ways to start:

Guided — run /new-cube in Claude Code with this repo checked out. The skill interviews you, scaffolds the package, fills TODOs, and validates end-to-end.
Copy — cp -r examples/counter-cube my-bench && cd my-bench && uv sync, then edit the placeholders.
Scaffold — cube init my-bench && cd my-bench && uv sync, then work through the TODO markers.

Validate with cube test my-bench (every debug task must reach reward == 1.0), self-audit with /review-cube ./my-bench, and submit with cube registry add --submit.
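
The pass criterion can be pictured as a simple check over per-task rewards. The sketch below assumes rewards have already been collected into a dictionary keyed by task id. That data shape and the helper name `failed_debug_tasks` are assumptions made for illustration, not part of the `cube` CLI.

```python
def failed_debug_tasks(rewards: dict[str, float]) -> list[str]:
    """Return the ids of debug tasks whose reward is not exactly 1.0."""
    return sorted(task_id for task_id, reward in rewards.items() if reward != 1.0)


if __name__ == "__main__":
    # Hypothetical results: one task solved, one only partially solved.
    results = {"task-a": 1.0, "task-b": 0.5}
    print(failed_debug_tasks(results))
```

A benchmark passes under this criterion only when the returned list is empty, so partial credit on any debug task counts as a failure.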

See the Authoring a CUBE guide for the full walkthrough. CONTRIBUTING.md covers framework invariants and the RFC process.

Note

cube test discovers benchmarks via the cube.benchmarks entry point group. Install the package (uv sync or pip install -e .) before running.

Getting Involved

All contributions are welcome — open an issue, submit a PR, or wrap a new benchmark. See CONTRIBUTING.md for the development guide and RFC process.

Want to contribute a benchmark? Whether you're an original author or just a frequent user, fill out this short form to let us know. No commitment required — we'll follow up based on your interest and the benchmark's fit.

Want deeper involvement? Join the core team, shape the roadmap, and get credit for what you build. Apply here.

For general AI Alliance contribution guidelines, see the community repo and Code of Conduct.

All code contributions are licensed under the Apache License 2.0 (included in this repo as LICENSE.Apache-2.0).

All documentation contributions are licensed under the Creative Commons Attribution 4.0 International license (included in this repo as LICENSE.CC-BY-4.0).

All data contributions are licensed under the Community Data License Agreement - Permissive - Version 2.0 (included in this repo as LICENSE.CDLA-2.0).

We use the "Developer Certificate of Origin" (DCO).

Warning

Before you make any git commits with changes, understand what's required for DCO.

See the Alliance contributing guide section on DCO for details. In practice, this means passing the -s flag to every git commit command, which appends a Signed-off-by line to the commit message.

Pre-commit hooks (recommended)

This repo uses the pre-commit framework to run fast checks locally before you commit, including enforcing the DCO Signed-off-by line.

Install the hooks (you only need to do this once per clone):

pre-commit install --hook-type pre-commit --hook-type commit-msg

Run the checks on all files (optional, useful the first time):

pre-commit run --all-files

When committing, include your sign-off:

git commit -s -m "your message"
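
For intuition, a sign-off check boils down to looking for a `Signed-off-by:` trailer in the commit message, which is what `git commit -s` appends. The sketch below is an assumption about what such a check might look like, not the hook configured in this repo. The regular expression and the name `has_sign_off` are illustrative only.

```python
import re

# A trailer line of the form "Signed-off-by: Name <email>", as added by `git commit -s`.
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_sign_off(commit_message: str) -> bool:
    """Return True if the message contains a well-formed Signed-off-by line."""
    return SIGN_OFF.search(commit_message) is not None


if __name__ == "__main__":
    message = "Fix typo in README\n\nSigned-off-by: Ada Lovelace <ada@example.com>\n"
    print(has_sign_off(message))
    print(has_sign_off("Fix typo in README"))
```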
