Summary
Propose adding FIM-guided adaptive LoRA rank allocation as a new init_lora_weights mode ('fim') — a data-driven method that uses the diagonal of the empirical Fisher Information Matrix (eFIM) to assign higher ranks to information-critical layers and lower ranks to less sensitive ones, subject to a global rank budget.
Motivation
LoRA uses a fixed rank r across all adapter matrices. Empirically, different layers have different sensitivity to fine-tuning data — attention layers in early transformer blocks often matter less than those in later blocks, and q/v projections often differ from k projections. Fixed-rank allocation wastes capacity on insensitive layers while under-allocating to critical ones.
EVA (already in PEFT) addresses this via SVD of the layer input activations. This proposal uses a complementary signal: the eFIM diagonal (mean squared gradient per parameter), which directly measures each parameter's contribution to the loss rather than its activation variance. The two methods are orthogonal: EVA supplies initialization directions, while the eFIM supplies a direct measure of loss sensitivity for deciding how much rank each layer receives.
Algorithm
The diagonal eFIM entry for parameter θ_i, accumulated over T calibration steps:
F_ii ≈ (1/T) Σ_t (∂ℓ_t / ∂θ_i)²
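For concreteness, a minimal sketch of this accumulation (the function name and loop structure are illustrative rather than the proposed implementation; it assumes an HF-style model whose forward pass returns a loss):

import torch

def efim_diagonal(model, calibration_loader, num_batches=8):
    # Running sum of squared per-parameter gradients, i.e. the eFIM diagonal before averaging.
    fim = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    seen = 0
    for batch in calibration_loader:
        if seen >= num_batches:
            break
        model.zero_grad()
        loss = model(**batch).loss      # assumes the forward pass returns a loss
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fim[n] += p.grad.detach() ** 2
        seen += 1
    # Average over the calibration steps to get the mean squared gradient per parameter.
    return {n: f / max(seen, 1) for n, f in fim.items()}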
Rank allocation: aggregate per-layer FIM scores → sort layers by mean eFIM magnitude → allocate rank proportionally under a budget constraint (total rank ≤ n_layers × r), with per-layer rank clamped to [r_min, r_max].
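A minimal sketch of the budgeted allocation step (the proportional rounding is a simplification, and clamping can leave the total slightly off budget; a real implementation would likely redistribute the clamped remainder):

def allocate_ranks(layer_scores, r=8, r_min=1, r_max=32):
    # layer_scores: {module_name: mean eFIM magnitude over that module's parameters}
    budget = len(layer_scores) * r            # global budget: n_layers * r
    total = sum(layer_scores.values())
    ranks = {}
    for name, score in layer_scores.items():
        share = budget * score / total        # rank proportional to eFIM mass
        ranks[name] = int(min(max(round(share), r_min), r_max))
    return ranks                              # usable as a LoraConfig rank_pattern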
Proposed API
from peft import LoraConfig, get_peft_model
from peft.tuners.lora.fim import FimConfig, initialize_lora_fim_ranks

fim_config = FimConfig(
    fim_calibration_batches=8,    # batches for eFIM accumulation
    r_min=1,                      # minimum rank per layer
    r_max=32,                     # maximum rank per layer
    adjust_scaling_factors=True,  # rescale lora_alpha after reallocation
)

config = LoraConfig(
    r=8,                          # mean/budget rank
    init_lora_weights="fim",
    fim_config=fim_config,
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(base_model, config)

# Calibration: call after the model is wrapped
for batch in calibration_loader:
    outputs = model(**batch)
    outputs.loss.backward()
    # FIM accumulation is handled internally via hooks

initialize_lora_fim_ranks(model, adapter_name="default")
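The "handled internally via hooks" step could look roughly like the following. This is only a sketch; which tensors are instrumented (adapter weights, or base weights temporarily given requires_grad) is an open design question:

import torch

def attach_efim_hooks(params):
    # params: {name: tensor with requires_grad=True} selected by the 'fim' init path
    buffers = {name: torch.zeros_like(p) for name, p in params.items()}

    def make_hook(name):
        def hook(grad):
            # accumulate the squared gradient contributed by this backward pass
            buffers[name] += grad.detach() ** 2
        return hook

    for name, p in params.items():
        p.register_hook(make_hook(name))
    return buffers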
Why this fits PEFT
- Follows the exact EVA pattern: a FimConfig dataclass, a public initialize_lora_fim_ranks() function, and an init_lora_weights='fim' trigger
- No new dependencies (pure PyTorch)
- Orthogonal to EVA: EVA improves the initialization directions; FIM improves rank allocation based on loss sensitivity
- Directly populates rank_pattern (already supported) and reinitializes the adapter layers; see the sketch after this list
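Since rank_pattern is an existing LoraConfig field, the FIM step only needs to produce a module-to-rank mapping such as the one below (module names are illustrative):

from peft import LoraConfig

config = LoraConfig(
    r=8,
    target_modules=["q_proj", "v_proj"],
    # per-module overrides of the default rank, e.g. produced by the eFIM allocation
    rank_pattern={
        "model.layers.0.self_attn.q_proj": 2,
        "model.layers.31.self_attn.v_proj": 16,
    },
)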
References
- Optimal Brain Damage (LeCun et al., NeurIPS 1990) — theoretical basis for eFIM diagonal importance
- AdaLoRA (Zhang et al., ICLR 2023) — adaptive rank via SVD importance scores (related but uses singular value truncation, not gradient-based FIM)
- Related implementation in torchao: pytorch/ao #4352 (FisherPruner, same eFIM diagonal approach applied to weight pruning)
Implementation plan
- src/peft/tuners/lora/fim.py — FimConfig dataclass + initialize_lora_fim_ranks() + hook-based eFIM accumulation
- src/peft/tuners/lora/config.py — add fim_config field to LoraConfig, add 'fim' to the init_lora_weights Literal
- src/peft/tuners/lora/__init__.py — export FimConfig, initialize_lora_fim_ranks
- src/peft/__init__.py — top-level export
- tests/test_lora_fim.py — unit tests (no GPU required; see the sketch below)
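As an example of the kind of CPU-only check intended here, reusing the hypothetical allocate_ranks sketch from the Algorithm section:

def test_allocate_ranks_respects_clamp_and_ordering():
    scores = {"layers.0.q_proj": 0.1, "layers.1.q_proj": 10.0}
    ranks = allocate_ranks(scores, r=8, r_min=1, r_max=32)
    assert all(1 <= v <= 32 for v in ranks.values())
    assert ranks["layers.1.q_proj"] > ranks["layers.0.q_proj"]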
Happy to submit a draft PR once there's alignment on the design.