
Proposal: FIM-guided adaptive LoRA rank allocation (FimConfig + initialize_lora_fim_ranks) #3203

@ramkrishs

Description


Summary

I propose adding FIM-guided adaptive LoRA rank allocation as a new init_lora_weights mode ('fim'): a data-driven method that uses the diagonal of the empirical Fisher Information Matrix (eFIM) to assign higher ranks to information-critical layers and lower ranks to less sensitive ones, subject to a global rank budget.

Motivation

LoRA uses a fixed rank r across all adapter matrices. Empirically, different layers have different sensitivity to fine-tuning data — attention layers in early transformer blocks often matter less than those in later blocks, and q/v projections often differ from k projections. Fixed-rank allocation wastes capacity on insensitive layers while under-allocating to critical ones.

EVA (already in PEFT) solves this via SVD of input activations. This proposal uses a complementary signal: the eFIM diagonal (mean squared gradient per parameter), which directly measures each parameter's contribution to the loss rather than its activation variance. The two signals serve different roles: EVA provides initialization directions from activation structure, while FIM measures loss sensitivity for rank allocation.

Algorithm

The diagonal eFIM entry for parameter θᵢ, averaged over T calibration examples:

F_ii ≈ (1/T) Σ_t (∂ℓ_t / ∂θ_i)²
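
As a minimal illustrative sketch (not the proposed hook-based implementation), the eFIM diagonal can be estimated by averaging squared gradients over a few calibration batches. The helper name estimate_efim_diagonal and the explicit parameter loop are assumptions for illustration; it presumes gradients are available for the parameters of the targeted modules:

import torch

def estimate_efim_diagonal(model, calibration_loader, num_batches=8):
    # Running sum of squared gradients per trainable parameter.
    fim = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    seen = 0
    for batch in calibration_loader:
        if seen >= num_batches:
            break
        model.zero_grad()
        outputs = model(**batch)
        outputs.loss.backward()
        for n, p in model.named_parameters():
            if p.requires_grad and p.grad is not None:
                fim[n] += p.grad.detach() ** 2
        seen += 1
    # Average over batches: F_ii ≈ (1/T) Σ_t (∂ℓ_t / ∂θ_i)²
    return {n: acc / max(seen, 1) for n, acc in fim.items()}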

Rank allocation: aggregate per-layer eFIM scores → sort layers by mean eFIM magnitude → allocate rank proportionally under a budget constraint (total rank ≤ n_layers × r), with per-layer rank clamped to [r_min, r_max].
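
The allocation step could look like the following sketch, assuming each targeted module's eFIM entries have already been reduced to a single scalar score; allocate_ranks and the rounding scheme are illustrative, not the proposed implementation:

def allocate_ranks(layer_scores, r=8, r_min=1, r_max=32):
    # layer_scores: module name -> mean eFIM magnitude.
    # Distribute the global budget (n_layers * r) proportionally to each score,
    # clamping every module's rank to [r_min, r_max].
    budget = len(layer_scores) * r
    total = sum(layer_scores.values()) or 1.0
    ranks = {}
    for name, score in sorted(layer_scores.items(), key=lambda kv: kv[1], reverse=True):
        ranks[name] = max(r_min, min(r_max, round(budget * score / total)))
    return ranks

Note that clamping can push the total slightly off the budget; a real implementation would need a redistribution pass to restore the constraint exactly.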

Proposed API

from peft import LoraConfig, get_peft_model
from peft.tuners.lora.fim import FimConfig, initialize_lora_fim_ranks

fim_config = FimConfig(
    fim_calibration_batches=8,   # batches for eFIM accumulation
    r_min=1,                     # minimum rank per layer
    r_max=32,                    # maximum rank per layer
    adjust_scaling_factors=True, # rescale lora_alpha after reallocation
)

config = LoraConfig(
    r=8,                          # mean/budget rank
    init_lora_weights="fim",
    fim_config=fim_config,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base_model, config)

# Calibration: call after the model is wrapped
for batch in calibration_loader:
    outputs = model(**batch)
    outputs.loss.backward()
    # FIM accumulation is handled internally via hooks

initialize_lora_fim_ranks(model, adapter_name="default")

Why this fits PEFT

  • Follows the exact EVA pattern: FimConfig dataclass + initialize_lora_fim_ranks() public function + init_lora_weights='fim' trigger
  • No new dependencies (pure PyTorch)
  • Orthogonal to EVA: EVA improves initialization directions; FIM informs rank allocation via loss sensitivity
  • Directly populates rank_pattern (already supported) and reinitializes adapter layers; see the sketch below
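
For illustration, the outcome of rank allocation could be expressed through the existing rank_pattern mechanism. The module names and ranks below are hypothetical, for a model wrapped with target_modules=["q_proj", "v_proj"], r=8, r_min=1, r_max=32:

# Hypothetical rank_pattern produced by initialize_lora_fim_ranks:
# sensitive late-block projections receive more rank than early ones.
config.rank_pattern = {
    "model.layers.0.self_attn.q_proj": 2,
    "model.layers.0.self_attn.v_proj": 4,
    "model.layers.31.self_attn.q_proj": 16,
    "model.layers.31.self_attn.v_proj": 24,
}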

Reference

  • Optimal Brain Damage (LeCun et al., NeurIPS 1990) — theoretical basis for eFIM diagonal importance
  • AdaLoRA (Zhang et al., ICLR 2023) — adaptive rank via SVD-parameterized updates (related, but prunes singular values during training rather than allocating ranks up front from a gradient-based eFIM calibration)
  • Related implementation in torchao: pytorch/ao #4352 (FisherPruner, same eFIM diagonal approach applied to weight pruning)

Implementation plan

  • src/peft/tuners/lora/fim.py — FimConfig dataclass + initialize_lora_fim_ranks() + hook-based eFIM accumulation
  • src/peft/tuners/lora/config.py — add fim_config field to LoraConfig, add 'fim' to init_lora_weights Literal
  • src/peft/tuners/lora/__init__.py — export FimConfig, initialize_lora_fim_ranks
  • src/peft/__init__.py — top-level export
  • tests/test_lora_fim.py — unit tests (no GPU required); see the test sketch below
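
As a sketch of the kind of GPU-free coverage intended for tests/test_lora_fim.py, using the hypothetical allocate_ranks helper from the Algorithm section:

def test_rank_allocation_respects_bounds_and_budget():
    scores = {
        "layer_0.q_proj": 0.01,
        "layer_0.v_proj": 0.05,
        "layer_1.q_proj": 0.4,
        "layer_1.v_proj": 1.0,
    }
    ranks = allocate_ranks(scores, r=8, r_min=1, r_max=32)
    # Every allocated rank stays within [r_min, r_max].
    assert all(1 <= rk <= 32 for rk in ranks.values())
    # A higher eFIM score never receives a smaller rank.
    assert ranks["layer_1.v_proj"] >= ranks["layer_0.q_proj"]
    # Total stays near the global budget (n_layers * r), up to clamping effects.
    assert abs(sum(ranks.values()) - 4 * 8) <= 4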

Happy to submit a draft PR once there's alignment on the design.
