Replies: 2 comments
The warning comes from the recently integrated features for ensuring weight tying. For a model with tied embeddings like yours, not setting `ensure_weight_tying` explicitly triggers this warning, since PEFT cannot know whether you want the tying preserved.

Regarding your question: untying has the negative effect of doubling the memory needed for the embedding matrix. In models like Gemma, which feature a very large embedding matrix, this can already be the difference between running out of memory during training or not. As long as you do parameter-efficient fine-tuning (LoRA, MiSS, trainable tokens, ...), you can emulate the behavior of untying the embeddings by setting `ensure_weight_tying=False`: the base weights stay tied, but each side gets its own adapter, so the effective input and output embeddings can diverge.
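A minimal sketch of that setup, assuming `ensure_weight_tying` is accepted as a `LoraConfig` argument in recent PEFT versions (as the option names in this thread suggest); the checkpoint and module names are illustrative and vary by architecture:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint; any causal LM with tie_word_embeddings=True works.
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")

config = LoraConfig(
    r=16,
    # Adapters on the usual attention projections plus both ends of the
    # tied embedding (module names depend on the architecture).
    target_modules=["q_proj", "v_proj", "embed_tokens", "lm_head"],
    # Keep the base weights tied, but let the two embedding adapters train
    # independently, emulating untied embeddings at a fraction of the memory.
    ensure_weight_tying=False,
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```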
Option B: Untie embeddings (set `tie_word_embeddings=False`).
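A minimal sketch of what that untying amounts to for a standard transformers causal LM (the checkpoint name is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint with tied embeddings.
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")

# Give lm_head its own copy of the embedding weights and record the untying
# in the config so the weights are not re-tied later. Note this doubles the
# memory held by the embedding matrix, as the first reply warns.
tied_weight = model.get_input_embeddings().weight
model.get_output_embeddings().weight = torch.nn.Parameter(tied_weight.detach().clone())
model.config.tie_word_embeddings = False
```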
Context
When fine-tuning with LoRA and adding special tokens (e.g. a pad token via `tokenizer.add_special_tokens` + `model.resize_token_embeddings`), I encounter a warning about the tied word embeddings.
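For reference, a minimal sketch of the setup (the checkpoint name is a placeholder; any model with `tie_word_embeddings=True` applies):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b"  # placeholder; any checkpoint with tied embeddings
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Add a dedicated pad token, then grow the (tied) embedding matrix to match
# the new vocabulary size; this is the step that precedes the warning.
tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))
```

This raises a design question I'd like clarification on.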
Question
When adding special tokens (particularly a pad token) and applying LoRA, which approach is preferred?

Option A: Keep `tie_word_embeddings=True` and set `ensure_weight_tying=True` to maintain the pretrained model's architecture (a config sketch follows below).

Option B: Untie embeddings before fine-tuning (set `tie_word_embeddings=False`) so that `embed_tokens` and `lm_head` are trained independently.

If anyone has experience with this or can point me to relevant discussions, I'd really appreciate the guidance!
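For concreteness, a hypothetical sketch of the Option A config, under the assumption that `ensure_weight_tying` is accepted by `LoraConfig`:

```python
from peft import LoraConfig

# Option A, as I understand it: keep the pretrained tying and ask PEFT to
# actively preserve it, so embed_tokens and lm_head (and any adapters placed
# on them) stay in sync throughout training.
config_a = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj", "embed_tokens", "lm_head"],
    ensure_weight_tying=True,
)
```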