docs: add LoRA adapter documentation for LLMInferenceService by andresllh · Pull Request #654 · kserve/website

andresllh · 2026-04-10T15:01:27Z

Add comprehensive documentation for LoRA (Low-Rank Adaptation) adapter support in LLMInferenceService, which was implemented in kserve/kserve#5317.

Changes:

New page: docs/model-serving/generative-inference/llmisvc/lora-adapters.md
- Overview of LoRA adapters and benefits
- Supported URI schemes (hf://, s3://, pvc://)
- Configuration examples and field reference
- Usage at inference time with OpenAI-compatible API
- Monitoring and troubleshooting guide
Updated llmisvc-configuration.md with LoRA adapter configuration section
Updated llmisvc-overview.md with expanded LoRA mention and link
Updated sidebars.ts with new navigation entry

Implementation PR: feat(llmisvc): add reconciliation logic for static lora adapters kserve#5317
Sample docs: https://github.com/kserve/kserve/tree/master/docs/samples/llmisvc/lora-adapters

"Fixes #issue-number" or "Add description of the problem this PR solves"

Proposed Changes

netlify · 2026-04-10T15:01:37Z

✅ Deploy Preview for elastic-nobel-0aef7a ready!

Name	Link
🔨 Latest commit	`30e50d0`
🔍 Latest deploy log	https://app.netlify.com/projects/elastic-nobel-0aef7a/deploys/6a0cc79da30a0700080d5fd2
😎 Deploy Preview	https://deploy-preview-654--elastic-nobel-0aef7a.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Add comprehensive documentation for LoRA (Low-Rank Adaptation) adapter support in LLMInferenceService, which was implemented in kserve/kserve#5317. Changes: - New page: docs/model-serving/generative-inference/llmisvc/lora-adapters.md - Overview of LoRA adapters and benefits - Supported URI schemes (hf://, s3://, pvc://) - Configuration examples and field reference - Usage at inference time with OpenAI-compatible API - Monitoring and troubleshooting guide - Updated llmisvc-configuration.md with LoRA adapter configuration section - Updated llmisvc-overview.md with expanded LoRA mention and link - Updated sidebars.ts with new navigation entry Related: - Implementation PR: kserve/kserve#5317 - Sample docs: https://github.com/kserve/kserve/tree/master/docs/samples/llmisvc/lora-adapters Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Andres Llausas <allausas@redhat.com>

Signed-off-by: Andres Llausas <allausas@redhat.com>

The storage-initializer is an init container managed exclusively by the controller. Two reasons why spec.template.containers/initContainers env vars don't work for it: 1. spec.template.containers affects regular containers, not init containers 2. stripPriorControllerStorageInitializer strips any user-defined storage-initializer init container before the controller adds its own Credentials are injected via CreateSecretVolumeAndEnvFromServiceAccount, which reads secrets from the Kubernetes service account. Replace all three incorrect spec.template.containers snippets (HF auth, S3 auth, and Example 2) with the correct SA-based pattern — consistent with the official KServe HF storage docs. Add a note to both sections explaining the limitation. Also fix the S3 secret key names: the credential builder looks for awsAccessKeyID/awsSecretAccessKey, not AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY. Also fix the troubleshooting table to point to the correct credential setup steps instead of referencing environment variables. Addresses reviewer comment: "does this actually work?" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Andres Llausas <allausas@redhat.com>

spolti · 2026-05-18T19:20:54Z

+provided through the service account secret mechanism described above.
+:::
+
+**Supported S3-Compatible Providers**:


Did we test all of these?

Not all of them were individually validated end-to-end. Added a note that these providers are expected to work via standard S3 API compatibility but have not been individually validated.

I am only concerned by the ceph one, IIRC we never used it or tested.

Good point — clarified in the doc. Only hf:// was specifically tested with LoRA adapters. Updated the note to reflect that all KServe storage-initializer schemes are expected to work (minus OCI), but only hf:// has been validated end-to-end. S3-compatible providers including Ceph are listed as expected to work but not individually tested.

spolti · 2026-05-18T19:21:09Z

+[KServe storage credentials documentation](https://kserve.github.io/website/latest/modelserving/storage/s3/s3/).
+
+:::note
+Env vars cannot be added to the storage-initializer via `spec.template` — credentials must be


Does the CSC work here?

LLMInferenceService manages the storage-initializer directly as an init container and does not go through the ClusterStorageContainer webhook path — so CSC does not apply here. S3 credentials are injected via the Kubernetes service account secret mechanism.

any reason to ignore the CSC? I mean, it is the main configuration file for storate-init

the ClusterStorageContainer is not implemented for llmisvc

spolti · 2026-05-18T19:26:28Z

+
+**PVC Requirements**:
+- PVC must exist in the same namespace
+- Access mode: `ReadOnlyMany` or `ReadWriteMany` (for multiple replicas)


can we enforce only ReadOnly?

do you want to limit the permissions on the PVC the user creates? so only allow ReadOnly PVCs?
If so, that would need to be done at the controller level

Updated to recommend ReadOnlyMany. The controller always mounts PVC adapters with ReadOnly: true in the volume mount, so ReadOnlyMany is the appropriate access mode. It's documented as a recommendation since the controller doesn't validate the PVC's access mode.

I mentioned it here, because at some point of this doc you mention that the PVC should be readonlymany.

it's not enforced by the controller, but it can be recommended here

ok, it seems that lora does not need write permissions, might be better to go with read only by default, wdyt?

I think it's fine to leave it up to the user however they want to create their PVC, they could be using it for several other loras and mounted to different containers. They could also download an extra one to that PVC and continue using it.

- Add link to vLLM LoRA documentation in prerequisites - Add 1% GPU memory rule of thumb for adapter sizing - Add S3 provider validation caveat (expected to work via S3 API compatibility) - Update PVC access mode to ReadOnlyMany (recommended) with file structure details - Add explanation for OCI limitation (single modelcar per pod constraint) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Andres Llausas <allausas@redhat.com>

spolti · 2026-05-19T19:43:56Z

/lgtm
@pierDipi please review as well.

Only hf:// has been specifically validated end-to-end for LoRA adapters. Reframe S3 provider list as all KServe storage-initializer schemes (minus OCI) with a note that S3-compatible providers including Ceph are expected to work but not individually tested. Signed-off-by: Andres Llausas <allausas@redhat.com>

andresllh mentioned this pull request Apr 10, 2026

feat(llmisvc): add reconciliation logic for static lora adapters kserve/kserve#5317

Merged

7 tasks

andresllh force-pushed the docs/add-lora-adapter-documentation branch from 70ac1ab to 3895e79 Compare April 10, 2026 15:04

Updates made due to changes in lora implementation.

70e139c

Signed-off-by: Andres Llausas <allausas@redhat.com>

pierDipi reviewed May 14, 2026

View reviewed changes

Comment thread docs/model-serving/generative-inference/llmisvc/lora-adapters.md Outdated

andresllh requested a review from pierDipi May 18, 2026 16:38