improve mslk docs

vkuzo · vkuzo · commit d48f4a0d9655 · 2026-03-13T18:57:20.000Z
Summary:
1. Clearly call out MSLK in the main README and the docs index page.
2. Clearly call out MSLK in the `NVFP4DynamicActivationNVFP4WeightConfig` docstring.

Test Plan: CI

ghstack-source-id: 416ce8c
ghstack-comment-id: 4057318351
Pull-Request: #4077
diff --git a/README.md b/README.md
@@ -110,6 +110,17 @@ pip install torchao
 
 Please see the [torchao compability table](https://github.com/pytorch/ao/issues/2919) for version requirements for dependencies.
 
+### Optional Dependencies
+
+[MSLK](https://github.com/pytorch/MSLK) is an optional runtime dependency that provides accelerated kernels for some of the workflows in torchao. Stable MSLK should be used with stable torchao, and nightly MSLK with nightly torchao.
+```bash
+# Stable
+pip install mslk-cuda==1.0.0
+
+# Nightly
+pip install --pre mslk --index-url https://download.pytorch.org/whl/nightly/cu128
+```
+
 ## 🔎 Inference
 
 TorchAO delivers substantial performance gains with minimal code changes:
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -67,6 +67,19 @@ Other installation options:
 
 Please see the `torchao compatibility table <https://github.com/pytorch/ao/issues/2919>`__ for version requirements for dependencies.
 
+Optional Dependencies
+^^^^^^^^^^^^^^^^^^^^^
+
+`MSLK <https://github.com/pytorch/MSLK>`__ is an optional runtime dependency that provides accelerated kernels for some of the workflows in torchao. Stable MSLK should be used with stable torchao, and nightly MSLK with nightly torchao.
+
+.. code:: bash
+
+    # Stable
+    pip install mslk-cuda==1.0.0
+
+    # Nightly
+    pip install --pre mslk --index-url https://download.pytorch.org/whl/nightly/cu128
+
 .. toctree::
    :glob:
    :maxdepth: 1
diff --git a/torchao/prototype/mx_formats/inference_workflow.py b/torchao/prototype/mx_formats/inference_workflow.py
@@ -204,7 +204,8 @@ class NVFP4DynamicActivationNVFP4WeightConfig(AOBaseConfig):
     set to False.
 
     Configuration parameters:
-    - use_triton_kernel: bool, whether to use fused triton kernel for activation scaling (default: True)
+    - use_triton_kernel: bool, whether to use fused triton kernel for activation scaling (default: True).
+      Requires `MSLK <https://github.com/pytorch/MSLK>`__ to be installed.
     - use_dynamic_per_tensor_scale: bool, whether to dynamically compute per tensor scale (default: True)
     - step: Optional[QuantizationStep], the quantization step for observer-based flow
     - Data: float4_e2m1fn_x2
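
Note on the `use_triton_kernel` requirement documented in the docstring above: a caller who cannot guarantee that MSLK is installed could gate the flag on an import check. The following is a minimal sketch and is not part of this PR. It assumes MSLK is importable under the top-level module name `mslk` (inferred from the pip package name, not confirmed by this diff), and `mslk_available` is a hypothetical helper introduced only for illustration.

```python
import importlib.util


def mslk_available(module_name: str = "mslk") -> bool:
    """Return True if a top-level module with this name can be found.

    ASSUMPTION: MSLK is importable as `mslk`. The diff only shows the pip
    package names, so adjust the name if the actual import path differs.
    """
    # find_spec returns None for a missing top-level module and does not
    # import the module itself.
    return importlib.util.find_spec(module_name) is not None


# Enable the fused triton path only when the optional dependency is present.
use_triton_kernel = mslk_available()
print(f"use_triton_kernel={use_triton_kernel}")
```

The resulting flag could then be passed as `NVFP4DynamicActivationNVFP4WeightConfig(use_triton_kernel=use_triton_kernel)`, using the parameter named in the docstring.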