docs: add LoRA adapter documentation for LLMInferenceService #654
andresllh wants to merge 5 commits into
Add comprehensive documentation for LoRA (Low-Rank Adaptation) adapter support in LLMInferenceService, which was implemented in kserve/kserve#5317.

Changes:
- New page: docs/model-serving/generative-inference/llmisvc/lora-adapters.md
  - Overview of LoRA adapters and benefits
  - Supported URI schemes (hf://, s3://, pvc://)
  - Configuration examples and field reference
  - Usage at inference time with OpenAI-compatible API
  - Monitoring and troubleshooting guide
- Updated llmisvc-configuration.md with LoRA adapter configuration section
- Updated llmisvc-overview.md with expanded LoRA mention and link
- Updated sidebars.ts with new navigation entry

Related:
- Implementation PR: kserve/kserve#5317
- Sample docs: https://github.com/kserve/kserve/tree/master/docs/samples/llmisvc/lora-adapters

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Andres Llausas <allausas@redhat.com>
70ac1ab to 3895e79
Signed-off-by: Andres Llausas <allausas@redhat.com>
The storage-initializer is an init container managed exclusively by the controller. Two reasons why spec.template.containers/initContainers env vars don't work for it:
1. spec.template.containers affects regular containers, not init containers
2. stripPriorControllerStorageInitializer strips any user-defined storage-initializer init container before the controller adds its own

Credentials are injected via CreateSecretVolumeAndEnvFromServiceAccount, which reads secrets from the Kubernetes service account.

Replace all three incorrect spec.template.containers snippets (HF auth, S3 auth, and Example 2) with the correct SA-based pattern, consistent with the official KServe HF storage docs. Add a note to both sections explaining the limitation.

Also fix the S3 secret key names: the credential builder looks for awsAccessKeyID/awsSecretAccessKey, not AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY.

Also fix the troubleshooting table to point to the correct credential setup steps instead of referencing environment variables.

Addresses reviewer comment: "does this actually work?"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andres Llausas <allausas@redhat.com>
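As a rough illustration of the SA-based pattern this commit describes, the sketch below builds a Secret and a ServiceAccount manifest as plain Python dicts. Only the two secret key names (`awsAccessKeyID`, `awsSecretAccessKey`) come from the commit message; the function name, resource names, namespace, and placeholder values are hypothetical, and any endpoint or region settings a real deployment needs are left out.

```python
import json


def build_s3_credentials(secret_name, sa_name, namespace, access_key_id, secret_access_key):
    """Return (Secret, ServiceAccount) manifests as plain dicts.

    Hypothetical helper for illustration only; it is not part of KServe.
    """
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": secret_name, "namespace": namespace},
        "type": "Opaque",
        # Key names the credential builder looks for, per the commit message.
        # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY would not be picked up.
        "stringData": {
            "awsAccessKeyID": access_key_id,
            "awsSecretAccessKey": secret_access_key,
        },
    }
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": sa_name, "namespace": namespace},
        # The secret is attached to the service account, not to container env vars.
        "secrets": [{"name": secret_name}],
    }
    return secret, service_account


# Placeholder names and values, assumed for this example.
secret, service_account = build_s3_credentials(
    "s3-adapter-creds", "lora-sa", "default", "EXAMPLE_KEY_ID", "EXAMPLE_SECRET"
)
print(json.dumps([secret, service_account], indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML.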
provided through the service account secret mechanism described above.
:::

**Supported S3-Compatible Providers**:
Not all of them were individually validated end-to-end. Added a note that these providers are expected to work via standard S3 API compatibility but have not been individually validated.
I am only concerned about the Ceph one; IIRC we never used or tested it.
Good point — clarified in the doc. Only hf:// was specifically tested with LoRA adapters. Updated the note to reflect that all KServe storage-initializer schemes are expected to work (minus OCI), but only hf:// has been validated end-to-end. S3-compatible providers including Ceph are listed as expected to work but not individually tested.
[KServe storage credentials documentation](https://kserve.github.io/website/latest/modelserving/storage/s3/s3/).

:::note
Env vars cannot be added to the storage-initializer via `spec.template` — credentials must be
LLMInferenceService manages the storage-initializer directly as an init container and does not go through the ClusterStorageContainer webhook path — so CSC does not apply here. S3 credentials are injected via the Kubernetes service account secret mechanism.
Any reason to ignore the CSC? I mean, it is the main configuration file for the storage-initializer.
The ClusterStorageContainer is not implemented for llmisvc.
**PVC Requirements**:
- PVC must exist in the same namespace
- Access mode: `ReadOnlyMany` or `ReadWriteMany` (for multiple replicas)
Can we enforce ReadOnly only?
Do you want to limit the permissions on the PVC the user creates, so that only ReadOnly PVCs are allowed?
If so, that would need to be done at the controller level.
Updated to recommend ReadOnlyMany. The controller always mounts PVC adapters with ReadOnly: true in the volume mount, so ReadOnlyMany is the appropriate access mode. It's documented as a recommendation since the controller doesn't validate the PVC's access mode.
I mentioned it here because at some point in this doc you say that the PVC should be `ReadOnlyMany`.
It's not enforced by the controller, but it can be recommended here.
OK, it seems that LoRA does not need write permissions, so it might be better to go with read-only by default, wdyt?
I think it's fine to leave it up to the user how they create their PVC. They could be using it for several other LoRA adapters and mounting it into different containers. They could also download an extra adapter to that PVC and continue using it.
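For reference, a minimal sketch of what a PVC following the `ReadOnlyMany` recommendation from this thread could look like, built as a Python dict. The helper name, PVC name, namespace, and size are hypothetical. The access mode stays a parameter because, as discussed above, the controller does not validate it and users may prefer another mode.

```python
import json

# Standard Kubernetes PVC access modes.
ACCESS_MODES = {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany", "ReadWriteOncePod"}


def build_adapter_pvc(name, namespace, size, access_mode="ReadOnlyMany"):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    Hypothetical helper for illustration; ReadOnlyMany is only the
    recommended default here, not something the controller enforces.
    """
    if access_mode not in ACCESS_MODES:
        raise ValueError(f"unknown access mode: {access_mode}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        # Must live in the same namespace as the LLMInferenceService.
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


# Placeholder name, namespace, and size, assumed for this example.
pvc = build_adapter_pvc("lora-adapters", "default", "10Gi")
print(json.dumps(pvc, indent=2))
```

Passing `access_mode="ReadWriteMany"` covers the case raised above where the same PVC is also used to download additional adapters.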
- Add link to vLLM LoRA documentation in prerequisites
- Add 1% GPU memory rule of thumb for adapter sizing
- Add S3 provider validation caveat (expected to work via S3 API compatibility)
- Update PVC access mode to ReadOnlyMany (recommended) with file structure details
- Add explanation for OCI limitation (single modelcar per pod constraint)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andres Llausas <allausas@redhat.com>
/lgtm
Only hf:// has been specifically validated end-to-end for LoRA adapters. Reframe S3 provider list as all KServe storage-initializer schemes (minus OCI) with a note that S3-compatible providers including Ceph are expected to work but not individually tested. Signed-off-by: Andres Llausas <allausas@redhat.com>
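The support status described in this commit can be summarized with a small classifier. This is only a sketch of the wording in this PR, not controller logic: the function name and status labels are hypothetical, and the "expected" set lists just the `s3://` and `pvc://` schemes named in the PR description, although the commit says other storage-initializer schemes are also expected to work.

```python
from urllib.parse import urlparse

# Status per scheme, taken from the statements in this PR.
VALIDATED = {"hf"}          # validated end-to-end for LoRA adapters
EXPECTED = {"s3", "pvc"}    # expected to work, not individually tested
UNSUPPORTED = {"oci"}       # excluded: single modelcar per pod constraint


def classify_adapter_uri(uri):
    """Return 'validated', 'expected', 'unsupported', or 'unknown' for an adapter URI."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in VALIDATED:
        return "validated"
    if scheme in EXPECTED:
        return "expected"
    if scheme in UNSUPPORTED:
        return "unsupported"
    return "unknown"


# Placeholder URIs, assumed for this example.
for uri in ("hf://org/adapter", "s3://bucket/adapters/a", "pvc://my-pvc/a", "oci://registry/a"):
    print(uri, "->", classify_adapter_uri(uri))
```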