Update on "Add support for flashinfer quantize kernel option for nvfp4"

jerryzh168 · jerryzh168 · commit 2880f5c208a2 · 2026-03-14T00:19:13.000-07:00
Summary:
Added a flashinfer option for the NVFP4 quantize kernel, intended to improve
performance on some of the workflows we are interested in. Also added a
numerical equivalence test across the different nvfp4_quantize_kernel_choice
options.

Test Plan:
pytest test/prototype/mx_formats/test_nvfp4_tensor.py -k test_kernel_preference_numerical_equivalence

Speedup has not been measured yet; we'll benchmark it in a follow-up.

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
diff --git a/torchao/prototype/mx_formats/config.py b/torchao/prototype/mx_formats/config.py
@@ -49,6 +49,7 @@ class QuantizeToNVFP4KernelChoice(str, Enum):
 
 torch.serialization.add_safe_globals([QuantizeToNVFP4KernelChoice])
 
+
 # register as pytree constant so we can use dynamo nonstrict trace in torchao.prototype.moe_training.ep
 @register_as_pytree_constant
 class ScaleCalculationMode(Enum):