An implementation of the Nyströmformer, which uses the Nyström method to approximate standard self-attention
Updated Aug 21, 2022 - Python
Convert pretrained RoBERTa models to various long-document transformer models
Tensor-core CUDA kernels for Nyström attention, with linear-time forward and backward passes and exact autograd gradients. Faster than flash-attention at long sequence lengths.
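
For orientation, the Nyström approximation that these repositories build on can be sketched in a few lines of NumPy. This is a minimal single-head sketch, not the code of any repository listed here. It assumes landmarks are taken as segment means of the queries and keys, and it uses an exact pseudo-inverse via `np.linalg.pinv` where the Nyströmformer paper uses an iterative approximation. The names `nystrom_attention` and `num_landmarks` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(q, k, v, num_landmarks):
    # q, k, v: arrays of shape (n, d). Hypothetical helper for illustration.
    n, d = q.shape
    assert n % num_landmarks == 0, "sketch assumes n is divisible by num_landmarks"
    scale = 1.0 / np.sqrt(d)
    # Landmarks: mean of each contiguous segment of queries and keys.
    q_l = q.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)
    k_l = k.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)
    f = softmax(q @ k_l.T * scale)    # (n, m): queries vs. landmark keys
    a = softmax(q_l @ k_l.T * scale)  # (m, m): landmark queries vs. landmark keys
    b = softmax(q_l @ k.T * scale)    # (m, n): landmark queries vs. keys
    # Multiply right to left so the (n, n) attention matrix is never formed.
    return f @ (np.linalg.pinv(a) @ (b @ v))
```

With a fixed number of landmarks `m`, every matrix product above costs O(n·m·d) or less, which is where the linear scaling in sequence length comes from.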