Is your feature request related to a problem? Please describe.
After reading the original paper on vision transformers (link below), they seem to excel when trained on large datasets.
That makes sense, because they have to learn the spatial structure of the image from scratch (which patches are neighbors of which other patches, etc.).
Describe the solution you'd like
I would like to find out whether there is any pre-trained ViT for 3D images, and if so, how it can be reused in MONAI.
Describe alternatives you've considered
I have searched the web with this same question, but without much luck.
The issue lucidrains/vit-pytorch#125 suggests that a pretrained 2D ViT could be adapted to 3D. But I guess that implementation would differ from MONAI's? Any hint on how to do this for reuse in MONAI?
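For what it's worth, one common way to adapt pretrained 2D weights to 3D is "I3D-style" weight inflation: replicate the 2D patch-embedding kernel along a new depth axis and rescale so activations keep roughly the same magnitude. This is only a sketch under the assumption that the patch embedding is a strided convolution whose kernel equals the patch size (as in most ViT implementations); it is not MONAI-specific, and positional embeddings would still need separate handling (e.g. interpolation to the 3D grid):

```python
import torch

def inflate_patch_embedding(weight_2d: torch.Tensor, depth: int) -> torch.Tensor:
    """Inflate a 2D ViT patch-embedding weight to 3D (I3D-style inflation).

    weight_2d: [embed_dim, in_channels, P, P] from a pretrained 2D ViT.
    Returns:   [embed_dim, in_channels, depth, P, P], replicated along the
               new depth axis and divided by `depth` so that a constant
               input produces the same activation scale as in 2D.
    """
    weight_3d = weight_2d.unsqueeze(2).repeat(1, 1, depth, 1, 1)
    return weight_3d / depth

# Hypothetical example: a ViT-B/16-sized patch embedding, 3 input channels.
# (Random weights stand in for a real pretrained checkpoint.)
w2d = torch.randn(768, 3, 16, 16)
w3d = inflate_patch_embedding(w2d, depth=16)
print(w3d.shape)  # torch.Size([768, 3, 16, 16, 16])
```

The inflated tensor could then be copied into the `Conv3d` patch embedding of a 3D ViT via `load_state_dict` (with `strict=False` for the layers that have no 2D counterpart), assuming the target model uses a convolutional patch embedding.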
Additional context
Original paper on ViT, for reference: https://arxiv.org/abs/2010.11929
EDIT: pinging @ahatamiz as the implementer of swin-unetr (thanks!)