LLM - Self-hosted OpenAI-compatible endpoint support (vLLM, LM Studio, llama.cpp) — refs #3204 #4117

Open
tekgnosis-net wants to merge 2 commits into dgtlmoon:master from tekgnosis-net:llm-openai-compatible-provider

Conversation

@tekgnosis-net

Refs #3204 — implements the self-hosted OpenAI-compatible endpoint support requested in that thread. The broader vision / image-extraction discussion in #3204 stays as future work; see the Phase 2 roadmap section at the end.

Summary

Adds a new "OpenAI-compatible (vLLM, LM Studio, llama.cpp)" option in Settings → AI Provider for self-hosted endpoints that speak OpenAI's wire format. The form schema and litellm.completion() plumbing already supported custom api_base + api_key — the wiring is purely UI plus a small mapping in the model-list endpoint, plus an opt-in token-budget multiplier so reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) have room to think before they answer.

Why

Reasoning models emit chain-of-thought into message.reasoning_content before the final answer lands in message.content. The existing tight max_tokens caps truncate mid-thought (finish_reason='length') and the answer never lands — callers see an empty string and silently fall through to safe defaults (e.g. parse_eval_response() returns {'important': False, ...}). For users running self-hosted reasoning models, this manifests as "the AI feature seems broken — nothing fires."

Verified end-to-end on a vLLM endpoint serving a Qwen3-27B reasoning model: with the existing 200-token test cap, the model spent its entire output budget on reasoning and produced empty content. With the multiplier in place, the same call returns the answer reliably.
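
As an illustration of the failure mode (a hedged sketch only; the endpoint, key, and model name below are placeholders, not values from this PR):

import litellm

# Placeholder endpoint/model; shows the truncation failure mode described above.
resp = litellm.completion(
    model="openai/qwen3-27b",
    api_base="http://localhost:8000/v1",   # self-hosted vLLM speaking OpenAI's wire format
    api_key="sk-local",
    messages=[{"role": "user", "content": "Did anything important change on this page?"}],
    max_tokens=200,                        # the old tight cap
)
choice = resp.choices[0]
print(choice.finish_reason)    # 'length': output budget exhausted mid-thought
print(choice.message.content)  # '': callers fall through to safe defaults
# The chain-of-thought went into choice.message.reasoning_content instead of the answer.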

This PR's design also aligns with the (name, flavor, endpoint, model, auth) tuple pattern proposed by @kquinsland in #3204 (comment): "almost every endpoint will support if not default to flavor=openAI."

Design — opt-in, scoped (no behavior change for cloud users)

A new IntegerField llm_local_token_multiplier (default 5, range 1–20) appears in the UI only when the new provider option is selected. The helper apply_local_token_multiplier(base, cfg) wraps every completion() call site (setup, summary, preview, intent eval, restock fallback) and is a no-op for any other provider kind; a minimal sketch of its shape follows the list below.

  • Cloud users (OpenAI / Anthropic / Gemini / OpenRouter / Ollama) see no behavioral or cost change — original caps preserved unchanged.
  • Local self-hosted models cost no per-token money, so giving them headroom is essentially free.
  • Existing env-var configurations (LLM_MODEL etc.) are unaffected — without provider_kind, the helper short-circuits.
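
A minimal sketch of the helper's intended shape, assuming the config key names described above (provider_kind, local_token_multiplier); the real implementation lives in changedetectionio/llm/evaluator.py:

def apply_local_token_multiplier(base: int, llm_cfg: dict) -> int:
    """Scale a max_tokens cap for the self-hosted provider kind; no-op otherwise."""
    # Sketch only: key names follow the PR description, not the actual diff.
    if llm_cfg.get('provider_kind') != 'openai_compatible':
        return base                              # cloud and env-var-only configs keep their caps
    multiplier = int(llm_cfg.get('local_token_multiplier') or 1)
    multiplier = max(1, min(multiplier, 20))     # clamp to the form's NumberRange (1-20)
    return base * multiplier

# At a call site the existing cap is wrapped rather than replaced, e.g.:
# max_tokens = apply_local_token_multiplier(JSON_RESPONSE_MAX_TOKENS, llm_cfg)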

The opt-in mechanism is a hidden field llm_provider_kind driven by the provider dropdown JS — necessary because the dropdown was previously UX-only and not persisted, but we need the backend to know which mode to apply. detectCurrentProvider was extended to distinguish a saved openai/<model> + non-empty api_base (= local) from bare openai/<model> (= cloud) on page reload.

Files touched (22)

UI / form

  • changedetectionio/forms.py — adds HiddenField llm_provider_kind + IntegerField llm_local_token_multiplier
  • changedetectionio/blueprint/settings/templates/settings_llm_tab.html — new dropdown option, JS visibility toggle, hidden-field wiring, provider-detection update

Backend

  • changedetectionio/blueprint/settings/__init__.py — round-trip persistence of the two new fields
  • changedetectionio/blueprint/settings/llm.py — _LITELLM_PROVIDER mapping (openai_compatible → openai for litellm.get_valid_models); test-connection prompt simplified, max_tokens 200 → 4000, timeout 20 → 30 to give reasoning models room

Helper + call sites

  • changedetectionio/llm/evaluator.py — new apply_local_token_multiplier(base, cfg) and JSON_RESPONSE_MAX_TOKENS = 400 constant; wrapped at all four completion() sites
  • changedetectionio/processors/restock_diff/plugins/llm_restock.py — wraps the restock fallback's previously-hardcoded max_tokens=80 (which was catastrophic for reasoning models)

i18n — 3 new English msgids extracted to messages.pot and propagated to all 14 .po catalogs via setup.py update_catalog. No fragmentation; entire-sentence msgids per the project's translation contract.

README — adds vLLM / LM Studio / OpenAI-compatible mention alongside the existing Ollama line.

Test plan

  • Local end-to-end against vLLM: configured a vLLM endpoint serving a Qwen3-27B reasoning model, verified the new provider option appears, model list loads from /v1/models with the bearer token, Test connection returns ✓ Connected, settings round-trip through page reload
  • Translation gate: python setup.py extract_messages && update_catalog && compile_catalog produces the expected diff (only the 3 new msgids land in each .po, no fragment churn). Dennis lint clean.
  • Lint gate: ruff check . --select E9,F63,F7,F82,INT passes (matches the upstream CI gate)
  • Helper short-circuits correctly: verified for openai, anthropic, gemini, ollama provider kinds and for env-var-only configs — input unchanged
  • CI: full test matrix passed (3.10/3.11/3.12/3.13/3.14) on the contributor's fork after the changes (one transient flake on 3.14 / basic-tests cleared on rerun, unrelated to this change)

Review notes

  • The test-connection prompt was changed from "Reply with exactly five words confirming you are ready." to "Respond with just the word: ready". The word-count constraint in the original prompt was a thinking-trap for reasoning models (forced enumeration of candidate phrases). The simpler prompt is fine for the connectivity smoke test that this method actually is.
  • _LITELLM_PROVIDER translation only applies at the litellm.get_valid_models() call site — the UI-level identifier openai_compatible is stable in the datastore. If LiteLLM ever adds a native vllm provider, this becomes a one-line change.
  • apply_local_token_multiplier is intentionally simple — no model-name detection, no "this looks like a reasoning model" heuristics. The user opted in by picking the local provider; that's the only signal we use.

Known adjacent issues (not addressed here)

  • AI API key not valid #4107 ("AI API key not valid", recently closed): the underlying root cause — api_base value sticks in the form/datastore when switching from Ollama to a cloud provider — was not addressed by this PR. My new provider option doesn't reintroduce the bug for new flows, but the existing Ollama→Gemini sticky-value bug remains. Happy to file a follow-up that clears api_base on provider change for non-base-needing providers if useful.
  • finish_reason='length' is logged but not surfaced to callers (client.py:68-72): even with the multiplier, the rare truncation case is invisible to upstream code. A future PR could change the return tuple from 4 → 5 elements (adding finish_reason) so parsers in response_parser.py can distinguish "model said nothing" from "model truncated". Not addressed here to keep the diff focused.
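
A rough sketch of how that follow-up could look; the parser signature below is hypothetical and nothing in it is part of this PR:

# Hypothetical follow-up: once finish_reason is surfaced, parsers can distinguish
# "model said nothing" from "model was truncated".
def parse_eval_response(content: str, finish_reason: str) -> dict:
    """Hypothetical signature: today's parser receives only the content string."""
    if not content:
        truncated = (finish_reason == 'length')
        return {'important': False, 'truncated': truncated}  # today both cases collapse into one default
    return {'important': True, 'truncated': False}            # placeholder for the real JSON parsing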

Roadmap — Phase 2: vision support for change evaluation

The original feature request in #3204 explicitly discusses sending screenshots to vision-capable LLMs for structured extraction. This PR delivers the foundational endpoint plumbing — Phase 1. Phase 2 (vision) is a deliberate follow-up because:

  • The PR already touches 22 files / 443 lines for the foundational piece. Bundling vision would triple the surface and meaningfully slow review.
  • Vision needs new design opinions that deserve their own discussion: which screenshot to send (current? before/after? compressed?), per-watch vs. global opt-in, cost model (vision tokens are priced very differently from text), prompt structure for "look at this" vs. "diff this with that".
  • Many self-hosted users will run text-only local models — vision is "useful when available," not universal.

Phase 2 design sketch (intended as a follow-up PR, not in this one):

  • Models with vision: Qwen3-VL family / Gemma 3 multimodal / DeepSeek-VL2 / GPT-4o / Claude 3+ / Gemini 1.5+. Vision is opt-in, never assumed.
  • Where the image comes from: the existing per-watch screenshot bytes in watch.data_dir/last-screenshot.png (already produced by browser fetchers and consumed by processors/image_ssim_diff/).
  • Message shape: the existing prompt_builder.py functions return strings; introduce a parallel build_*_messages() variant returning OpenAI-format multipart [{type:"text"}, {type:"image_url"}]. client.completion() already accepts arbitrary messages — no signature change needed. A rough sketch of this message shape follows the list below.
  • Opt-in surface: a new llm_use_vision boolean on watch + tag + global, cascading like the existing LLM intent / summary fields.
  • Cost & truncation: image token costs vary by model and resolution; a vision-aware variant of _summary_max_tokens and apply_local_token_multiplier would account for the embedded image's ~85–1500 tokens depending on detail level.
  • Tests: at least one mocked LiteLLM vision call asserting message-shape correctness, plus a docs page noting which models are tested.
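
A rough illustration of that multipart message shape; build_intent_messages and the base64 data-URL handling are assumptions for the sketch, not code in this PR:

import base64

def build_intent_messages(prompt_text: str, screenshot_path: str) -> list:
    """Hypothetical vision variant of a prompt_builder function: same text, plus the watch screenshot."""
    with open(screenshot_path, 'rb') as f:
        image_b64 = base64.b64encode(f.read()).decode('ascii')
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt_text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }]

# client.completion() already accepts arbitrary messages, so a call site would only
# swap the string prompt for this list when llm_use_vision is enabled.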

Happy to open Phase 2 as its own PR once this lands — it builds naturally on the provider_kind + local_token_multiplier infrastructure introduced here.

…h token multiplier for reasoning models

Adds a new "OpenAI-compatible (vLLM, LM Studio, llama.cpp)" option in the
Settings → AI provider dropdown for self-hosted endpoints that speak
OpenAI's wire format. The form schema and litellm.completion() plumbing
already supported custom api_base + api_key — the wiring is purely UI
plus a small mapping in the model-list endpoint.

Reasoning-model token multiplier (opt-in, scoped to the new option):
Models like Qwen3 / DeepSeek-R1 / Gemma 3 emit chain-of-thought into
message.reasoning_content before the answer lands in message.content.
The original tight max_tokens caps truncate mid-thought
(finish_reason='length') and the answer never lands. A new IntegerField
llm_local_token_multiplier (default 5x, range 1-20) appears only when the
new provider is selected; the helper apply_local_token_multiplier() wraps
every completion() call site (setup, summary, preview, intent eval,
restock fallback) and is a no-op for any other provider kind. Cloud users
(OpenAI/Anthropic/Gemini/OpenRouter/Ollama) see no behavioral or cost
change — original caps are preserved unchanged. Local self-hosted models
cost no per-token money, so headroom is cheap.

UI / form
- New option under the existing Local / Self-hosted optgroup
- Hidden field llm_provider_kind (set by dropdown JS) +
  llm_local_token_multiplier IntegerField (rendered only when
  openai_compatible)
- LIVE_PROVIDERS, KEY_HINTS, api_base visibility, and detectCurrentProvider
  updated to recognize the new option

Backend
- llm_get_models maps openai_compatible -> openai for
  litellm.get_valid_models so vLLM's /v1/models is hit with the right
  provider semantics; results get an openai/ prefix so saved values route
  correctly through litellm.completion() later
- Test-connection: simpler prompt, max_tokens 200 -> 4000, timeout 20 -> 30
  to give reasoning models room
- Form persistence stores provider_kind + local_token_multiplier in
  datastore['settings']['application']['llm'] with round-trip
  pre-population

i18n: 3 new English msgids extracted to messages.pot and propagated to
all 14 .po catalogs via setup.py update_catalog.
README: mention vLLM / LM Studio / OpenAI-compatible alongside Ollama.
Copilot AI review requested due to automatic review settings May 2, 2026 13:30

Copilot AI left a comment


Pull request overview

Adds UI + backend wiring to support self-hosted OpenAI-compatible endpoints (vLLM / LM Studio / llama.cpp) as a first-class LLM provider option, including an opt-in output-token multiplier to avoid truncation for reasoning models.

Changes:

  • Adds a new “OpenAI-compatible” provider option in Settings → AI, persisting provider kind + a configurable local token multiplier.
  • Applies the token multiplier to LLM call sites (evaluator flows and restock fallback) for self-hosted OpenAI-compatible endpoints only.
  • Updates model-list endpoint/provider mapping and propagates new translatable strings across catalogs; README updated accordingly.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Per-file summary:
changedetectionio/blueprint/settings/templates/settings_llm_tab.html Adds provider dropdown option, persists llm_provider_kind, shows local token multiplier UI, and updates provider detection JS.
changedetectionio/forms.py Adds llm_provider_kind (hidden) and llm_local_token_multiplier fields to the global LLM settings form.
changedetectionio/blueprint/settings/__init__.py Persists the new provider-kind and multiplier settings into datastore LLM config.
changedetectionio/blueprint/settings/llm.py Maps openai_compatible to LiteLLM openai for model listing; adjusts test prompt/timeout/max_tokens.
changedetectionio/llm/evaluator.py Introduces apply_local_token_multiplier() and applies it to multiple completion call sites; adds JSON_RESPONSE_MAX_TOKENS.
changedetectionio/processors/restock_diff/plugins/llm_restock.py Applies the local token multiplier to the restock fallback completion’s max_tokens.
README.md Documents using vLLM/LM Studio/OpenAI-compatible self-hosted endpoints via the provider dropdown.
changedetectionio/translations/messages.pot Adds new msgids for the new provider option/help text; updates POT creation date.
changedetectionio/translations/cs/LC_MESSAGES/messages.po Propagates new msgids into Czech catalog.
changedetectionio/translations/de/LC_MESSAGES/messages.po Propagates new msgids into German catalog.
changedetectionio/translations/en_GB/LC_MESSAGES/messages.po Propagates new msgids into en_GB catalog.
changedetectionio/translations/en_US/LC_MESSAGES/messages.po Propagates new msgids into en_US catalog.
changedetectionio/translations/es/LC_MESSAGES/messages.po Propagates new msgids into Spanish catalog.
changedetectionio/translations/fr/LC_MESSAGES/messages.po Propagates new msgids into French catalog.
changedetectionio/translations/it/LC_MESSAGES/messages.po Propagates new msgids into Italian catalog.
changedetectionio/translations/ja/LC_MESSAGES/messages.po Propagates new msgids into Japanese catalog.
changedetectionio/translations/ko/LC_MESSAGES/messages.po Propagates new msgids into Korean catalog.
changedetectionio/translations/pt_BR/LC_MESSAGES/messages.po Propagates new msgids into Brazilian Portuguese catalog.
changedetectionio/translations/tr/LC_MESSAGES/messages.po Propagates new msgids into Turkish catalog.
changedetectionio/translations/uk/LC_MESSAGES/messages.po Propagates new msgids into Ukrainian catalog.
changedetectionio/translations/zh/LC_MESSAGES/messages.po Propagates new msgids into Simplified Chinese catalog.
changedetectionio/translations/zh_Hant_TW/LC_MESSAGES/messages.po Propagates new msgids into Traditional Chinese (Taiwan) catalog.
Comments suppressed due to low confidence (1)

changedetectionio/blueprint/settings/templates/settings_llm_tab.html:555

  • detectCurrentProvider() ignores the persisted hidden field (llm_provider_kind) and re-infers provider from model + whether api_base is non-empty. Given the known “sticky api_base” behavior, this can mis-select openai_compatible (or hide it) on reload and therefore show the wrong UI and potentially apply the wrong token-multiplier behavior. Prefer using the stored hidden field value first (if present/valid) and only falling back to heuristics when it’s blank/unknown.
  // On page load: detect and pre-select provider from current model
  (function detectCurrentProvider() {
    const modelField = document.querySelector('[name="llm-llm_model"]');
    if (!modelField) return;
    const m = modelField.value.trim();
    if (!m) return;

    let guessed = '';
    if (m.startsWith('gemini/'))       guessed = 'gemini';
    else if (m.startsWith('ollama/'))  guessed = 'ollama';
    else if (m.startsWith('openrouter/')) guessed = 'openrouter';
    else if (m.startsWith('openai/')) {
      // openai/<model> + custom api_base = self-hosted OpenAI-compatible (vLLM etc.)
      const baseField = document.querySelector('[name="llm-llm_api_base"]');
      guessed = (baseField && baseField.value.trim()) ? 'openai_compatible' : 'openai';
    }
    else if (m.startsWith('claude'))   guessed = 'anthropic';
    else if (m.startsWith('gpt') || m.startsWith('o1') || m.startsWith('o3')) guessed = 'openai';

    if (guessed) {
      const sel = document.getElementById('llm-provider');
      if (sel) { sel.value = guessed; llmOnProviderChange(guessed); }
    }
  })();


Comment thread changedetectionio/blueprint/settings/llm.py
Comment thread changedetectionio/llm/evaluator.py Outdated
…er, clamp upper bound

- blueprint/settings/llm.py: llm_test() now routes its max_tokens through
  apply_local_token_multiplier(200, llm_cfg) instead of a hardcoded 4000.
  Cloud providers stay on the small 200-token base (matching upstream's
  pre-existing test behavior); only openai_compatible endpoints opt into
  the multiplier's reasoning headroom. Avoids unintentionally giving
  cloud reasoning models (o1/o3, Gemini 2.5 thinking) a large unbounded
  budget on a one-word smoke test.

- llm/evaluator.py: apply_local_token_multiplier() now clamps the
  multiplier to [1, 20], matching the form's NumberRange validator.
  Defense-in-depth against corrupted datastore values that bypassed
  form validation (manual JSON edits, future migrations, plugins).
Owner

@dgtlmoon dgtlmoon left a comment


Two small things before we merge:

1. JS fix — use stored llm_provider_kind before falling back to the heuristic

detectCurrentProvider() re-runs its model-string heuristic on every page load and overwrites the hidden field, so the stored value never gets a chance to be authoritative. Fix: check the stored field first, only guess when it's blank (i.e. configs saved before this PR):

(function detectCurrentProvider() {
  // Prefer the persisted provider kind; fall back to heuristics only for
  // configs saved before llm_provider_kind was introduced.
  const kindField = document.querySelector('[name="llm-llm_provider_kind"]');
  if (kindField && kindField.value.trim()) {
    const sel = document.getElementById('llm-provider');
    if (sel) { sel.value = kindField.value.trim(); llmOnProviderChange(kindField.value.trim()); }
    return;
  }
  // … existing heuristic unchanged …

2. update_N — populate provider_kind for existing configs

Without a migration, old configs have no provider_kind in the datastore, so the JS always falls through to the heuristic. An update_22 (or next available number) in store/updates.py fixes that by inferring from the already-stored model + api_base:

def update_22(self):
    """Infer llm.provider_kind for configs saved before it was introduced."""
    llm = self.data['settings']['application'].get('llm') or {}
    if llm.get('provider_kind'):
        return
    model    = (llm.get('model')    or '').strip()
    api_base = (llm.get('api_base') or '').strip()

    PREFIX_MAP = {'gemini': 'gemini', 'ollama': 'ollama', 'openrouter': 'openrouter', 'openai': 'openai'}
    prefix = model.split('/')[0]
    kind = PREFIX_MAP.get(prefix)

    # Models without a provider prefix (gpt-4o, o1, claude-3-sonnet, etc.)
    if not kind:
        if prefix.startswith(('gpt', 'o1', 'o3')): kind = 'openai'
        elif prefix.startswith('claude'):           kind = 'anthropic'

    if kind == 'openai' and api_base:
        kind = 'openai_compatible'

    if kind:
        self.data['settings']['application']['llm']['provider_kind'] = kind

The two work together: the migration stamps the correct value for old installs, the JS fix makes sure it's respected on page load rather than overwritten by the heuristic.

