docs: add LoRA adapter documentation for LLMInferenceService #654

Open
andresllh wants to merge 5 commits into kserve:main from andresllh:docs/add-lora-adapter-documentation

@andresllh

Add comprehensive documentation for LoRA (Low-Rank Adaptation) adapter support in LLMInferenceService, which was implemented in kserve/kserve#5317.

Changes:

  • New page: docs/model-serving/generative-inference/llmisvc/lora-adapters.md

    • Overview of LoRA adapters and benefits
    • Supported URI schemes (hf://, s3://, pvc://)
    • Configuration examples and field reference
    • Usage at inference time with OpenAI-compatible API
    • Monitoring and troubleshooting guide
  • Updated llmisvc-configuration.md with LoRA adapter configuration section

  • Updated llmisvc-overview.md with expanded LoRA mention and link

  • Updated sidebars.ts with new navigation entry

Related:

"Fixes #issue-number" or "Add description of the problem this PR solves"

Proposed Changes

@netlify

netlify Bot commented Apr 10, 2026

Deploy Preview for elastic-nobel-0aef7a ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 30e50d0 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/elastic-nobel-0aef7a/deploys/6a0cc79da30a0700080d5fd2 |
| 😎 Deploy Preview | https://deploy-preview-654--elastic-nobel-0aef7a.netlify.app |

Add comprehensive documentation for LoRA (Low-Rank Adaptation) adapter
support in LLMInferenceService, which was implemented in kserve/kserve#5317.

Changes:
- New page: docs/model-serving/generative-inference/llmisvc/lora-adapters.md
  - Overview of LoRA adapters and benefits
  - Supported URI schemes (hf://, s3://, pvc://)
  - Configuration examples and field reference
  - Usage at inference time with OpenAI-compatible API
  - Monitoring and troubleshooting guide

- Updated llmisvc-configuration.md with LoRA adapter configuration section
- Updated llmisvc-overview.md with expanded LoRA mention and link
- Updated sidebars.ts with new navigation entry

Related:
- Implementation PR: kserve/kserve#5317
- Sample docs: https://github.com/kserve/kserve/tree/master/docs/samples/llmisvc/lora-adapters

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Andres Llausas <allausas@redhat.com>
andresllh force-pushed the docs/add-lora-adapter-documentation branch from 70ac1ab to 3895e79 on April 10, 2026 15:04
Signed-off-by: Andres Llausas <allausas@redhat.com>
Comment thread docs/model-serving/generative-inference/llmisvc/lora-adapters.md Outdated
The storage-initializer is an init container managed exclusively by the
controller. Two reasons why spec.template.containers/initContainers env
vars don't work for it:
1. spec.template.containers affects regular containers, not init containers
2. stripPriorControllerStorageInitializer strips any user-defined
   storage-initializer init container before the controller adds its own

Credentials are injected via CreateSecretVolumeAndEnvFromServiceAccount,
which reads secrets from the Kubernetes service account. Replace all
three incorrect spec.template.containers snippets (HF auth, S3 auth,
and Example 2) with the correct SA-based pattern — consistent with the
official KServe HF storage docs. Add a note to both sections explaining
the limitation.

Also fix the S3 secret key names: the credential builder looks for
awsAccessKeyID/awsSecretAccessKey, not AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY.
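
As a rough illustration of the key-name fix above, the sketch below builds the two manifests as plain Python dicts. Only the key names `awsAccessKeyID` and `awsSecretAccessKey` come from this commit message. The function names, resource names, and placeholder values are hypothetical, and the `secrets` list on the ServiceAccount is assumed to follow the usual KServe service-account pattern; any annotations the storage-initializer may need are omitted.

```python
import json

# Key names the credential builder looks for, per the commit message above.
S3_ACCESS_KEY_FIELD = "awsAccessKeyID"
S3_SECRET_KEY_FIELD = "awsSecretAccessKey"


def build_s3_secret(name: str, access_key: str, secret_key: str) -> dict:
    """Return a Secret manifest (hypothetical helper) using the expected key names."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "stringData": {
            S3_ACCESS_KEY_FIELD: access_key,
            S3_SECRET_KEY_FIELD: secret_key,
        },
    }


def build_service_account(name: str, secret_name: str) -> dict:
    """Return a ServiceAccount manifest that references the Secret (assumed pattern)."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name},
        "secrets": [{"name": secret_name}],
    }


if __name__ == "__main__":
    # Placeholder names and values; replace with real ones before applying.
    secret = build_s3_secret("s3-lora-creds", "<access-key>", "<secret-key>")
    sa = build_service_account("lora-sa", "s3-lora-creds")
    print(json.dumps([secret, sa], indent=2))
```

The point of the sketch is only that the Secret carries `awsAccessKeyID`/`awsSecretAccessKey` rather than the `AWS_*` environment-variable names.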

Also fix the troubleshooting table to point to the correct credential
setup steps instead of referencing environment variables.

Addresses reviewer comment: "does this actually work?"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andres Llausas <allausas@redhat.com>
andresllh requested a review from pierDipi on May 18, 2026 16:38
Comment thread docs/model-serving/generative-inference/llmisvc/lora-adapters.md Outdated
Comment thread docs/model-serving/generative-inference/llmisvc/lora-adapters.md
Comment thread docs/model-serving/generative-inference/llmisvc/lora-adapters.md Outdated
provided through the service account secret mechanism described above.
:::

**Supported S3-Compatible Providers**:
Contributor

Did we test all of these?

Author

Not all of them were individually validated end-to-end. Added a note that these providers are expected to work via standard S3 API compatibility but have not been individually validated.

Contributor

I am only concerned about the Ceph one; IIRC we never used or tested it.

Author

Good point — clarified in the doc. Only hf:// was specifically tested with LoRA adapters. Updated the note to reflect that all KServe storage-initializer schemes are expected to work (minus OCI), but only hf:// has been validated end-to-end. S3-compatible providers including Ceph are listed as expected to work but not individually tested.
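
To make the validation status described in this reply concrete, here is a small Python sketch that maps an adapter URI to that status. The scheme groupings reflect only what this thread and the PR description state (`hf://` validated, `s3://` and `pvc://` expected to work, OCI not supported for adapters). The function name and the status strings are hypothetical, and other storage-initializer schemes are simply reported as unknown here.

```python
from urllib.parse import urlparse

# Groupings taken from this review thread; not an exhaustive list of KServe schemes.
VALIDATED_SCHEMES = {"hf"}
EXPECTED_SCHEMES = {"s3", "pvc"}
UNSUPPORTED_SCHEMES = {"oci"}


def adapter_uri_status(uri: str) -> str:
    """Classify a LoRA adapter URI by how well its scheme has been validated."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in VALIDATED_SCHEMES:
        return "validated"
    if scheme in EXPECTED_SCHEMES:
        return "expected-untested"
    if scheme in UNSUPPORTED_SCHEMES:
        return "unsupported"
    return "unknown"


if __name__ == "__main__":
    # Example URIs are placeholders.
    for uri in ("hf://org/adapter", "s3://bucket/adapter", "oci://registry/adapter"):
        print(uri, "->", adapter_uri_status(uri))
```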

[KServe storage credentials documentation](https://kserve.github.io/website/latest/modelserving/storage/s3/s3/).

:::note
Env vars cannot be added to the storage-initializer via `spec.template` — credentials must be
Contributor

Does the CSC work here?

Author (andresllh, May 19, 2026)

LLMInferenceService manages the storage-initializer directly as an init container and does not go through the ClusterStorageContainer webhook path — so CSC does not apply here. S3 credentials are injected via the Kubernetes service account secret mechanism.

Contributor

Any reason to ignore the CSC? I mean, it is the main configuration for the storage-initializer.

Author

The ClusterStorageContainer is not implemented for llmisvc.

Comment thread docs/model-serving/generative-inference/llmisvc/lora-adapters.md Outdated
Comment thread docs/model-serving/generative-inference/llmisvc/lora-adapters.md

**PVC Requirements**:
- PVC must exist in the same namespace
- Access mode: `ReadOnlyMany` or `ReadWriteMany` (for multiple replicas)
Contributor

Can we enforce only ReadOnly?

Author (andresllh, May 19, 2026)

Do you want to limit the permissions on the PVC the user creates, so that only ReadOnly PVCs are allowed?
If so, that would need to be done at the controller level.

Author

Updated to recommend ReadOnlyMany. The controller always mounts PVC adapters with ReadOnly: true in the volume mount, so ReadOnlyMany is the appropriate access mode. It's documented as a recommendation since the controller doesn't validate the PVC's access mode.
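
A minimal sketch of that recommendation, written as a Python dict rather than YAML: the PVC below requests `ReadOnlyMany`, and the check only reports whether a manifest follows the recommendation, since (as noted above) the controller does not validate the access mode. The helper names, PVC name, and storage size are hypothetical.

```python
def build_adapter_pvc(name: str, storage: str = "1Gi") -> dict:
    """Return a PVC manifest (hypothetical helper) with the recommended access mode."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadOnlyMany"],
            "resources": {"requests": {"storage": storage}},
        },
    }


def follows_recommendation(pvc: dict) -> bool:
    """True if the PVC lists ReadOnlyMany; advisory only, nothing enforces it."""
    return "ReadOnlyMany" in pvc.get("spec", {}).get("accessModes", [])


if __name__ == "__main__":
    pvc = build_adapter_pvc("lora-adapters")  # placeholder name
    print(follows_recommendation(pvc))
```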

Contributor

I mentioned it here because at some point in this doc you say that the PVC should be `ReadOnlyMany`.

Author

It's not enforced by the controller, but it can be recommended here.

Contributor

OK, it seems that LoRA does not need write permissions, so it might be better to go with read-only by default. WDYT?

Author

I think it's fine to leave it up to users to create their PVC however they want. They could be using it for several other LoRA adapters and mounting it in different containers. They could also download an extra adapter to that PVC and continue using it.

Comment thread docs/model-serving/generative-inference/llmisvc/lora-adapters.md Outdated
- Add link to vLLM LoRA documentation in prerequisites
- Add 1% GPU memory rule of thumb for adapter sizing
- Add S3 provider validation caveat (expected to work via S3 API compatibility)
- Update PVC access mode to ReadOnlyMany (recommended) with file structure details
- Add explanation for OCI limitation (single modelcar per pod constraint)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andres Llausas <allausas@redhat.com>
@spolti
Contributor

spolti commented May 19, 2026

/lgtm
@pierDipi please review as well.

Only hf:// has been specifically validated end-to-end for LoRA adapters.
Reframe S3 provider list as all KServe storage-initializer schemes (minus OCI)
with a note that S3-compatible providers including Ceph are expected to work
but not individually tested.

Signed-off-by: Andres Llausas <allausas@redhat.com>
3 participants