
ViT pre-trained for 3D images #3947

@phcerdan


Is your feature request related to a problem? Please describe.
After reading the original paper on vision transformers (link below), it seems they excel mainly when trained on large datasets.
That makes sense, because they have to learn the structure of the image from scratch (which patches are neighbors of which, etc.).
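
To make the "neighbors" point concrete, here is a minimal NumPy sketch (not MONAI code) of how a ViT-style model turns a single-channel 3D volume into a flat sequence of non-overlapping cubic patches. The function name `patchify_3d` and the sizes (96³ volume, 16³ patches) are illustrative assumptions. After flattening, two patches that touch along the depth axis end up `gh * gw` positions apart in the sequence (36 here), so the model only learns that they are adjacent through its position embeddings.

```python
import numpy as np

def patchify_3d(volume: np.ndarray, patch: int) -> np.ndarray:
    """Split a single-channel volume (D, H, W) into non-overlapping cubic patches.

    Returns shape (num_patches, patch**3): the flat token sequence a ViT
    would linearly embed. Assumes every dimension is divisible by `patch`.
    """
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("volume dimensions must be divisible by the patch size")
    gd, gh, gw = d // patch, h // patch, w // patch
    # (gd, p, gh, p, gw, p) -> (gd, gh, gw, p, p, p): group voxels by patch
    x = volume.reshape(gd, patch, gh, patch, gw, patch)
    x = x.transpose(0, 2, 4, 1, 3, 5)
    # Token index = (iz * gh + iy) * gw + ix, i.e. raster order over the patch grid
    return x.reshape(gd * gh * gw, patch ** 3)

vol = np.random.rand(96, 96, 96).astype(np.float32)
tokens = patchify_3d(vol, 16)
print(tokens.shape)  # (216, 4096)
```

Token 0 and token 36 cover vertically stacked 16³ blocks, yet nothing in the flat sequence itself says so; that spatial prior has to come from training data or from pre-training.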

Describe the solution you'd like
I would like to find out whether any pre-trained ViT exists for 3D images and, if so, how it could be reused in MONAI.

Describe alternatives you've considered
I have searched the web for this, without much luck.
lucidrains/vit-pytorch#125 suggests that a pre-trained 2D ViT could be adapted to 3D, but I assume that implementation differs from MONAI's. Any hints on how to do this so that it can be reused in MONAI?
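
One common way to reuse 2D weights in a 3D network is kernel "inflation" (popularized by I3D for video): repeat the 2D kernel along the new depth axis and divide by the depth. Whether this is what the linked discussion proposes is an assumption. The NumPy sketch below inflates only the patch-embedding kernel; the helper names (`inflate_patch_embedding`, `embed_patch_2d`, `embed_patch_3d`) and the shapes (embedding dim 8, 3 channels, 16×16 patches) are hypothetical. It checks the property that motivates the scaling: a 3D patch that is constant along depth gets exactly the same embedding as the original 2D patch.

```python
import numpy as np

def inflate_patch_embedding(w2d: np.ndarray, depth: int) -> np.ndarray:
    """Inflate a 2D patch-embedding kernel (E, C, P, P) to 3D (E, C, depth, P, P).

    The kernel is repeated `depth` times along the new axis and divided by
    `depth`, so a volume that is constant along depth embeds like the 2D image.
    """
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], depth, axis=2)
    return w3d / depth

def embed_patch_2d(w2d: np.ndarray, patch2d: np.ndarray) -> np.ndarray:
    # One non-overlapping conv step: (C, P, P) patch -> (E,) embedding
    return np.einsum("ecij,cij->e", w2d, patch2d)

def embed_patch_3d(w3d: np.ndarray, patch3d: np.ndarray) -> np.ndarray:
    # Same, with an extra depth axis: (C, D, P, P) patch -> (E,) embedding
    return np.einsum("ecdij,cdij->e", w3d, patch3d)

rng = np.random.default_rng(0)
E, C, P = 8, 3, 16
w2d = rng.standard_normal((E, C, P, P))   # stand-in for pre-trained 2D weights
w3d = inflate_patch_embedding(w2d, depth=P)

img_patch = rng.standard_normal((C, P, P))
vol_patch = np.repeat(img_patch[:, np.newaxis], P, axis=1)  # (C, P, P, P)
print(np.allclose(embed_patch_2d(w2d, img_patch),
                  embed_patch_3d(w3d, vol_patch)))  # True
```

This covers only one piece. A real transfer would also need to handle the channel mismatch (e.g. 3-channel RGB weights vs. 1-channel CT), adapt the learned position embeddings to a 3D patch grid, and map the result onto the parameter names of the target MONAI model, none of which is shown here.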

Additional context
Original paper on ViT, for reference: https://arxiv.org/abs/2010.11929

EDIT: pinging @ahatamiz as the implementer of Swin UNETR (thanks!)

Metadata

Assignees: No one assigned
Labels: question (Further information is requested)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
