Thank you for your work. I have a question.
Good metrics on the validation set during training don't necessarily guarantee good metrics at test time. When I have saved many checkpoints (e.g., one every 4000 iterations), how can I quickly identify which one will perform best during testing?
Previously, my approach was to select several checkpoints that performed relatively well on the validation set during training, run the full test on each of them, compare the results, and keep the best one (roughly the loop sketched below). However, this is too tedious. Do you have a better method?
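For context, this is roughly how I automate the compare step today. It is only a minimal sketch: it assumes the checkpoints are PyTorch state dicts saved under `checkpoints/`, and `build_model`, `get_eval_loader`, and `compute_metric` are placeholders for my own project code, not functions from this repo.

```python
import glob
import torch

# Hypothetical helpers -- substitute your project's actual model class,
# evaluation data loader, and metric. These names are placeholders.
from my_project import build_model, get_eval_loader, compute_metric

def rank_checkpoints(ckpt_glob="checkpoints/iter_*.pth", device="cuda"):
    """Evaluate every saved checkpoint on a held-out set and rank them."""
    model = build_model().to(device)
    loader = get_eval_loader()
    results = []
    for path in sorted(glob.glob(ckpt_glob)):
        # Load the saved weights into the same model instance each time.
        state = torch.load(path, map_location=device)
        model.load_state_dict(state)
        model.eval()
        with torch.no_grad():
            score = compute_metric(model, loader)
        results.append((score, path))
        print(f"{path}: {score:.4f}")
    results.sort(reverse=True)  # assumes a higher metric is better
    return results

if __name__ == "__main__":
    ranked = rank_checkpoints()
    print("best checkpoint:", ranked[0][1])
```

This works, but it scales linearly with the number of saved checkpoints, which is why I am asking whether there is a faster way to narrow the candidates.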