fix(loader): detect LeRobot v3.0 datasets and guide users to bundled v3->v2 converter #677
Open
cagataycali wants to merge 1 commit into
The LeRobotEpisodeLoader._load_metadata expects a v2.1 layout with a single meta/episodes.jsonl file. When users hand it a v3.0 dataset (e.g. lerobot/svla_so101_pickplace, BobShan/double_folding_towel_v3.0) the loader currently crashes with a bare FileNotFoundError on meta/episodes.jsonl, which is hard to diagnose because v3 stores per-chunk episode metadata under meta/episodes/chunk-XXX/file-YYY.parquet instead.
This patch detects the v3.0 layout (presence of meta/episodes/ directory or info.json codebase_version starting with 'v3') and raises a targeted FileNotFoundError that points the user at the converter already shipped in scripts/lerobot_conversion/convert_v3_to_v2.py. Behaviour for v2.1 datasets is unchanged.
Repro: huggingface-cli download lerobot/svla_so101_pickplace --repo-type dataset --local-dir /tmp/d, then load with LeRobotEpisodeLoader('/tmp/d', ...) -> previously bare FileNotFoundError, now a message with the conversion command.
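The check described in this commit message can be sketched roughly as follows. This is not the patch itself: the function names `looks_like_lerobot_v3` and `require_v2_episodes_file` are hypothetical, `info.json` is assumed to live under `meta/`, and the error text is illustrative (the exact conversion command is left out rather than guessed).

```python
import json
from pathlib import Path


def looks_like_lerobot_v3(dataset_root: Path) -> bool:
    """Return True if dataset_root appears to use the LeRobot v3.0 layout."""
    meta = dataset_root / "meta"
    # Signal 1: v3 stores per-chunk episode metadata in a meta/episodes/ directory.
    if (meta / "episodes").is_dir():
        return True
    # Signal 2: info.json declares a codebase_version starting with "v3".
    # Assumption: info.json sits under meta/.
    info_path = meta / "info.json"
    if not info_path.is_file():
        return False
    try:
        info = json.loads(info_path.read_text())
    except json.JSONDecodeError:
        return False
    if not isinstance(info, dict):
        return False
    return str(info.get("codebase_version", "")).startswith("v3")


def require_v2_episodes_file(dataset_root: Path) -> Path:
    """Return the v2.1 episodes file, or raise an error that names the likely cause."""
    episodes = dataset_root / "meta" / "episodes.jsonl"
    if episodes.is_file():
        # v2.1 path: the v3 check is never reached.
        return episodes
    if looks_like_lerobot_v3(dataset_root):
        # Illustrative wording only; the real message also carries the command to run.
        raise FileNotFoundError(
            f"{episodes} not found: {dataset_root} looks like a LeRobot v3.0 dataset. "
            "Convert it first with scripts/lerobot_conversion/convert_v3_to_v2.py."
        )
    raise FileNotFoundError(f"{episodes} not found")
```

Because the v3 check runs only after `meta/episodes.jsonl` is found to be missing, a valid v2.1 dataset takes the same path as before.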
What
`LeRobotEpisodeLoader._load_metadata` assumes a LeRobot v2.1 layout (a single `meta/episodes.jsonl`). When pointed at a v3.0 dataset (e.g. `lerobot/svla_so101_pickplace`, `BobShan/double_folding_towel_v3.0`), the loader currently dies with a bare `FileNotFoundError` on `meta/episodes.jsonl`, because v3 keeps per-chunk episode metadata under `meta/episodes/chunk-XXX/file-YYY.parquet` instead.

This is a real footgun: the LeRobot team is steadily migrating canonical datasets to v3.0, so users following any GR00T-N1.5/N1.7 finetuning recipe with the latest dataset releases hit a 30-second crash with no hint at the cause.
How
Detect the v3.0 layout in two ways:
- the `meta/episodes/` directory exists, or
- `info.json["codebase_version"]` starts with `v3`.

When detected, raise a targeted `FileNotFoundError` that points the user at the converter already shipped in this repo (`scripts/lerobot_conversion/convert_v3_to_v2.py`), including the exact command to run.

Behaviour for v2.1 datasets is unchanged: the new code only fires when `meta/episodes.jsonl` is missing.

Diff
Repro of the original failure
Notes / future work
cc @

Checklist
- (`python -c "import ast; ast.parse(...)"`)
- `main` (`3df8b38`)