Apply the same augmentation chain to all camera images #970

Open
0xadvait wants to merge 1 commit into Physical-Intelligence:main from 0xadvait:fix/consistent-image-augmentation
@0xadvait

Problem

preprocess_observation passes the same rng for every camera, which suggests augmentation parameters are meant to be consistent across cameras within a frame. They are not: augmax.Chain splits its key once per sub-transform, and the base camera chain has 4 transforms while wrist chains have 1, so ColorJitter draws different subkeys for base vs wrist. The same physical frame gets visibly different hue/brightness/contrast on the base camera vs the wrist cameras (#859 has a visual repro).
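
The key-splitting effect can be reproduced with plain JAX, without augmax. This is a minimal sketch that assumes Chain hands out its subkeys in order, as `jax.random.split(rng, N)` would, and that ColorJitter is the last transform in both chains; the helper name `color_jitter_subkey` is hypothetical and not part of either library.

```python
import jax
import jax.numpy as jnp

rng = jax.random.PRNGKey(0)  # the single per-frame key shared by all cameras


def color_jitter_subkey(rng, num_transforms):
    """Hypothetical helper: the subkey the last transform in a chain would get.

    Assumption: the chain splits `rng` into `num_transforms` subkeys and
    gives subkey i to transform i.
    """
    return jax.random.split(rng, num_transforms)[-1]


base_key = color_jitter_subkey(rng, 4)   # crop, resize, rotate, jitter
wrist_key = color_jitter_subkey(rng, 1)  # jitter only

# Same parent key, different subkeys, so different random draws.
keys_match = bool(jnp.array_equal(base_key, wrist_key))
base_draw = jax.random.uniform(base_key, (), minval=-0.3, maxval=0.3)
wrist_draw = jax.random.uniform(wrist_key, (), minval=-0.3, maxval=0.3)
print(keys_match, float(base_draw), float(wrist_draw))
```

Under those assumptions the two chains feed different subkeys to the jitter step, which would account for the per-camera color mismatch even though the parent key is shared.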

Why this divergence matters

The pi0.5 paper (arXiv 2504.16054, Appendix E) describes the training recipe as applying the full chain to every input image:

"We apply image augmentation (random crop, resizing, rotation, and color jittering) to all input images using the following hyper-parameters and in this order"

followed by exactly the RandomCrop(0.95) -> Resize -> Rotate(+-5) -> ColorJitter(0.3, 0.4, 0.5) chain this file uses. The wrist special case deviates from the recipe used to train the released checkpoints, and as a side effect breaks cross-camera color consistency through the Chain key-splitting described above.

Fix

Apply the full augmentation chain to every camera image. With identical chains and the shared per-frame rng, all cameras receive identical augmentation parameters for a given frame, restoring cross-camera color consistency and matching the published recipe.

If excluding the wrist cameras from geometric augmentation was intentional, the minimal alternative is to keep the wrist chain free of geometric transforms but give ColorJitter a dedicated key that is shared across cameras. Happy to switch this PR to that variant.
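
For reference, the dedicated-key alternative could look roughly like the sketch below. It is written against plain JAX with toy stand-ins (`toy_shift`, `toy_brightness_jitter`, `augment_frame` are all hypothetical names, and the camera names are illustrative), so it shows only the key handling, not the real augmax transforms.

```python
import jax
import jax.numpy as jnp


def toy_shift(key, image, max_shift=2):
    """Stand-in for a geometric transform: a random horizontal roll."""
    shift = int(jax.random.randint(key, (), -max_shift, max_shift + 1))
    return jnp.roll(image, shift, axis=1)


def toy_brightness_jitter(key, image, max_delta=0.3):
    """Stand-in for a color transform: one random brightness offset."""
    delta = jax.random.uniform(key, (), minval=-max_delta, maxval=max_delta)
    return jnp.clip(image + delta, -1.0, 1.0)


def augment_frame(rng, images):
    # Split once per frame, outside any per-camera chain, so the color key
    # does not depend on how many transforms a given camera uses.
    geo_key, color_key = jax.random.split(rng)
    out = {}
    for name, image in images.items():
        if "wrist" not in name:
            image = toy_shift(geo_key, image)  # geometric step: base only
        out[name] = toy_brightness_jitter(color_key, image)  # shared key
    return out


# Constant images make the roll invisible, isolating the color offset.
images = {
    name: jnp.zeros((8, 8, 3))
    for name in ("base_0_rgb", "left_wrist_0_rgb", "right_wrist_0_rgb")
}
out = augment_frame(jax.random.PRNGKey(0), images)
```

Because the color key is derived before any per-camera branching, every camera draws the same jitter parameters regardless of chain length.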

Test

Adds test_preprocess_observation_train_augmentations_consistent_across_cameras. It feeds identical images to all three cameras with a fixed key and asserts that the augmented outputs are bitwise identical across cameras, differ from the input, and stay within [-1, 1], and that train=False passes images through unchanged. Runs on CPU.
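
The shape of those assertions, sketched against a toy pipeline rather than the real preprocess_observation: `toy_chain` and `toy_preprocess` are hypothetical stand-ins, and the offset range is chosen to be strictly positive so the "differs from input" check cannot pass or fail by chance.

```python
import jax
import jax.numpy as jnp


def toy_chain(rng, image):
    """Stand-in for the full chain: a random roll, then a brightness offset."""
    shift_key, color_key = jax.random.split(rng)
    shift = int(jax.random.randint(shift_key, (), 1, 4))
    delta = jax.random.uniform(color_key, (), minval=0.05, maxval=0.3)
    return jnp.clip(jnp.roll(image, shift, axis=1) + delta, -1.0, 1.0)


def toy_preprocess(rng, images, train):
    """Same chain and same key for every camera; identity when not training."""
    if not train:
        return dict(images)
    return {name: toy_chain(rng, image) for name, image in images.items()}


def check_consistent_across_cameras():
    image = jnp.linspace(-0.5, 0.5, 8 * 8 * 3).reshape(8, 8, 3)
    images = {name: image for name in ("base", "left_wrist", "right_wrist")}
    rng = jax.random.PRNGKey(0)

    out = toy_preprocess(rng, images, train=True)
    first = out["base"]
    # Identical inputs + identical chain + shared key -> identical outputs.
    assert all(bool(jnp.array_equal(first, v)) for v in out.values())
    # The augmentation actually changed the image.
    assert not bool(jnp.array_equal(first, image))
    # Outputs stay in the expected value range.
    assert float(first.min()) >= -1.0 and float(first.max()) <= 1.0

    # With train=False the images pass through untouched.
    passthrough = toy_preprocess(rng, images, train=False)
    assert all(bool(jnp.array_equal(passthrough[n], images[n])) for n in images)
    return True


check_consistent_across_cameras()
```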

Fixes #859

augmax.Chain splits its rng once per sub-transform, so with 4 transforms
on the base camera and 1 on wrist cameras, ColorJitter drew different
parameters for base vs wrist views of the same frame even though the
same rng is passed for every camera. The pi0.5 paper (Appendix E)
applies the full crop/resize/rotate/jitter chain to all input images,
so this change applies the full chain to every camera, which also makes
augmentation parameters consistent across cameras within a frame. Adds
a regression test for cross-camera consistency.

Fixes Physical-Intelligence#859
0xadvait requested a review from kvablack as a code owner on June 12, 2026, 00:18
Development

Successfully merging this pull request may close these issues.

Possible bug in data augmentation pipeline