fix(loader): detect LeRobot v3.0 datasets and guide users to bundled v3->v2 converter #677

Open
cagataycali wants to merge 1 commit into NVIDIA:main from cagataycali:lerobot-v3-detect-and-guide

@cagataycali

What

LeRobotEpisodeLoader._load_metadata assumes a LeRobot v2.1 layout (single meta/episodes.jsonl). When pointed at a v3.0 dataset (e.g. lerobot/svla_so101_pickplace, BobShan/double_folding_towel_v3.0) the loader currently dies with a bare FileNotFoundError on meta/episodes.jsonl, because v3 keeps per-chunk episode metadata under meta/episodes/chunk-XXX/file-YYY.parquet instead.

This is a real footgun — the LeRobot team is steadily migrating canonical datasets to v3.0, so users following any GR00T-N1.5/N1.7 finetuning recipe with the latest dataset releases hit a 30-second crash with no hint at the cause.

How

Detect the v3.0 layout in two ways:

  1. meta/episodes/ directory exists, or
  2. info.json["codebase_version"] starts with v3.

When detected, raise a targeted FileNotFoundError that points the user at the converter already shipped in this repo (scripts/lerobot_conversion/convert_v3_to_v2.py), including the exact command to run.

Behaviour for v2.1 datasets is unchanged — the new code only fires when meta/episodes.jsonl is missing.
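
The two checks above can be sketched as a standalone helper. This is a minimal illustration rather than the code in this PR: the function name `looks_like_lerobot_v3` is hypothetical, and it reads meta/info.json directly, whereas the loader uses its already-parsed `self.info_meta`.

```python
import json
from pathlib import Path


def looks_like_lerobot_v3(dataset_path) -> bool:
    """Return True if the dataset directory appears to use the LeRobot v3.0 layout.

    Hypothetical helper for illustration; it mirrors the two checks listed above.
    """
    meta_dir = Path(dataset_path) / "meta"

    # Check 1: v3.0 keeps per-chunk episode metadata under meta/episodes/.
    if (meta_dir / "episodes").is_dir():
        return True

    # Check 2: info.json declares a codebase_version that starts with "v3".
    info_path = meta_dir / "info.json"
    if info_path.is_file():
        with open(info_path, "r") as f:
            info = json.load(f)
        return str(info.get("codebase_version", "unknown")).startswith("v3")

    # Neither signal is present, so treat the dataset as not v3.0.
    return False
```

Either signal alone is enough, so a dataset with a missing or stale info.json is still recognised by its directory layout.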

Diff

# gr00t/data/dataset/lerobot_episode_loader.py
episodes_path = meta_dir / LEROBOT_EPISODES_FILENAME
+ if not episodes_path.exists():
+     v3_episodes_dir = meta_dir / "episodes"
+     codebase_version = self.info_meta.get("codebase_version", "unknown")
+     if v3_episodes_dir.exists() or str(codebase_version).startswith("v3"):
+         raise FileNotFoundError(
+             f"... v3.0 layout detected, run convert_v3_to_v2.py ..."
+         )
+     raise FileNotFoundError(
+         f"{episodes_path} not found for dataset {self.dataset_path}. "
+         f"Expected a LeRobot v2.1 layout (meta/episodes.jsonl)."
+     )
with open(episodes_path, "r") as f:
    self.episodes_metadata = [json.loads(line) for line in f]

Repro of the original failure

hf download lerobot/svla_so101_pickplace --repo-type dataset --local-dir /tmp/svla
python -c "from gr00t.data.dataset.lerobot_episode_loader import LeRobotEpisodeLoader; LeRobotEpisodeLoader('/tmp/svla', modality_configs={})"
# before: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/svla/meta/episodes.jsonl'
# after:  FileNotFoundError: ... v3.0 layout, which is not yet natively supported.
#         Convert it with the helper shipped in this repo:
#             cd scripts/lerobot_conversion
#             uv venv && source .venv/bin/activate
#             uv pip install -e .
#             python convert_v3_to_v2.py --repo-id <hf-org/dataset-name>

Notes / future work

  • This PR is intentionally minimal — it does not add native v3.0 support, just a clear pointer to the existing converter.
  • A follow-up could call the converter automatically on load, but I'd rather keep that opt-in given v3 -> v2 conversion rewrites the dataset on disk.
  • Tested locally on Linux aarch64 (Thor), Python 3.12.
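
For the opt-in follow-up mentioned above, one possible shape is sketched here. It is not part of this PR: the environment variable `GR00T_AUTO_CONVERT_V3` and both function names are hypothetical, and the sketch assumes the converter's dependencies are already installed in the current interpreter.

```python
import os
import subprocess
import sys


def build_convert_command(repo_id: str) -> list[str]:
    """Build the converter invocation quoted in the new error message."""
    return [sys.executable, "convert_v3_to_v2.py", "--repo-id", repo_id]


def maybe_convert(repo_id: str, converter_dir: str = "scripts/lerobot_conversion", env=None) -> bool:
    """Run the v3 -> v2 converter only if the user explicitly opted in.

    GR00T_AUTO_CONVERT_V3 is a hypothetical flag name used for illustration.
    Returns True if the converter was run, False if the user did not opt in.
    """
    env = os.environ if env is None else env
    if env.get("GR00T_AUTO_CONVERT_V3") != "1":
        # Default: do nothing, because conversion rewrites the dataset on disk.
        return False
    # check=True raises CalledProcessError if the converter exits non-zero.
    subprocess.run(build_convert_command(repo_id), cwd=converter_dir, check=True)
    return True
```

Defaulting to no conversion keeps the loader free of side effects unless the flag is set.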

cc @

Checklist

  • No public API change for valid v2.1 datasets
  • No new dependencies
  • Syntactically validated (python -c "import ast; ast.parse(...)")
  • Branch is rebased on main (3df8b38)

… converter

The LeRobotEpisodeLoader._load_metadata expects a v2.1 layout with a single meta/episodes.jsonl file. When users hand it a v3.0 dataset (e.g. lerobot/svla_so101_pickplace, BobShan/double_folding_towel_v3.0) the loader currently crashes with a bare FileNotFoundError on meta/episodes.jsonl, which is hard to diagnose because v3 stores per-chunk episode metadata under meta/episodes/chunk-XXX/file-YYY.parquet instead.

This patch detects the v3.0 layout (presence of meta/episodes/ directory or info.json codebase_version starting with 'v3') and raises a targeted FileNotFoundError that points the user at the converter already shipped in scripts/lerobot_conversion/convert_v3_to_v2.py. Behaviour for v2.1 datasets is unchanged.

Repro: huggingface-cli download lerobot/svla_so101_pickplace --repo-type dataset --local-dir /tmp/d, then load with LeRobotEpisodeLoader('/tmp/d', ...) -> previously bare FileNotFoundError, now a message with the conversion command.