Problem with training and inference on the COCO dataset #41

I trained a model on the COCO dataset and then ran inference with it.

**However, the trained model does not generate images that match the input text.**

- A sample:

Caption: A woman is standing outside with her head on a bat.
![Image](https://github.com/user-attachments/assets/6f5725c2-0350-44b8-95d9-7a598fa86741)

- Learning rate (LR) and loss curves:

![Image](https://github.com/user-attachments/assets/720699f0-a557-4e79-8950-5ba3b1adc91f)

What should I do next?

I would greatly appreciate any advice!
