Hi, I am currently interested in doing sim-to-real transfer for the LeapCubeReorientation task in my own project. I read the paper and also checked the sim-to-real details for the Leap Hand task in the appendix (https://arxiv.org/abs/2502.08844), but I still have a few questions.
Policy selection across seeds:
In the training plots, the final reward seems to vary quite a lot depending on the random seed. For sim-to-real transfer, how do you choose which trained policy to deploy on the real system? Do you simply select the checkpoint with the highest simulation reward, or do you use some other criterion?
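For context, my current tentative approach is to re-evaluate each seed's final checkpoint over many fresh evaluation episodes in simulation and pick the one with the highest mean return, rather than trusting the noisy training-curve reward directly. This is a minimal sketch of my own (the checkpoint format and `evaluate_checkpoint` rollout are placeholders I made up, not anything from the paper):

```python
# Sketch: choose which seed's policy to deploy by re-evaluating each
# final checkpoint in simulation and ranking by mean evaluation reward.
import statistics

def evaluate_checkpoint(checkpoint, num_episodes=100):
    # Placeholder for a real rollout loop in the simulator; here we just
    # return per-episode rewards already attached to the checkpoint dict.
    return checkpoint["eval_rewards"][:num_episodes]

def select_policy(checkpoints):
    """Return (mean_eval_reward, seed) for the best-scoring checkpoint."""
    scored = []
    for ckpt in checkpoints:
        rewards = evaluate_checkpoint(ckpt)
        scored.append((statistics.mean(rewards), ckpt["seed"]))
    scored.sort(reverse=True)  # highest mean evaluation reward first
    return scored[0]
```

Is something along these lines what you do, or do you also weigh other signals (e.g. behavior smoothness, success rate under heavier domain randomization)?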
Simulation reward vs. real-world performance:
In your experience, does a higher reward in simulation reliably correlate with better real-world performance for the Leap Hand tasks? Or have you observed cases where a policy with better sim performance transfers worse in practice?
Asynchronous perception and control:
For real-world deployment, do you use ROS to handle the asynchronous timing between the cube’s SE(3) pose estimation and the policy action updates? More generally, how do you structure the perception-control pipeline when the estimator and controller run at different rates?
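To make the question concrete: what I have in mind is a simple "latest value wins" buffer between a slow pose-estimator thread and a faster control loop, so the policy always reads the most recent pose without blocking. This is my own sketch (class and method names are my invention, not from your codebase):

```python
# Sketch: thread-safe latest-pose buffer for an asynchronous
# perception-control pipeline. The estimator thread writes at its own
# rate; the control loop reads the freshest pose plus its timestamp,
# so staleness can be checked before acting.
import threading
import time

class LatestPose:
    """The slow estimator writes; the fast control loop reads the most
    recent pose and the monotonic time it was written."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pose = None
        self._stamp = None

    def write(self, pose):
        with self._lock:
            self._pose = pose
            self._stamp = time.monotonic()

    def read(self):
        with self._lock:
            return self._pose, self._stamp
```

Do you do something similar (with or without ROS), and do you explicitly handle pose staleness, e.g. by holding the last action when the estimate is too old?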
Thanks in advance — any clarification would be very helpful for people trying to reproduce or extend the sim-to-real setup.