I tried to perform training and inference on the COCO dataset.
But the trained model cannot generate correct images based on the text.
caption : A woman is standing outside with her head on a bat.


What should I do next?
I would appreciate your advice very much!
I tried to perform training and inference on the COCO dataset.
But the trained model cannot generate correct images based on the text.
caption : A woman is standing outside with her head on a bat.

What should I do next?
I would appreciate your advice very much!