From 4a5af6da9930230f1ac3b76d87f8ed949901f833 Mon Sep 17 00:00:00 2001
From: tekgnosis-net <6506223+tekgnosis-net@users.noreply.github.com>
Date: Sat, 2 May 2026 21:41:34 +1000
Subject: [PATCH 1/2] LLM - Add OpenAI-compatible provider (vLLM, LM Studio,
llama.cpp) with token multiplier for reasoning models
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a new "OpenAI-compatible (vLLM, LM Studio, llama.cpp)" option in the
Settings → AI provider dropdown for self-hosted endpoints that speak
OpenAI's wire format. The form schema and litellm.completion() plumbing
already supported custom api_base + api_key — the wiring is purely UI
plus a small mapping in the model-list endpoint.

Reasoning-model token multiplier (opt-in, scoped to the new option):
Models like Qwen3 and DeepSeek-R1 emit their chain-of-thought into
message.reasoning_content before the answer lands in message.content.
The original tight max_tokens caps truncate generation mid-thought
(finish_reason='length'), so the answer never arrives. A new IntegerField
llm_local_token_multiplier (default 5x, range 1-20) appears only when the
new provider is selected; the helper apply_local_token_multiplier() wraps
every completion() call site (setup, summary, preview, intent eval,
restock fallback) and is a no-op for every other provider kind. Users of
other providers (OpenAI/Anthropic/Gemini/OpenRouter/Ollama) see no
behavioral or cost change; their original caps are preserved unchanged.
Local self-hosted models incur no per-token cost, so the extra headroom
is cheap.
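
A minimal sketch of the idea behind the helper (the real signature and
placement in llm/evaluator.py may differ):

    def apply_local_token_multiplier(max_tokens, llm_cfg):
        # No-op unless the self-hosted OpenAI-compatible provider kind is active.
        if (llm_cfg.get('provider_kind') or '') != 'openai_compatible':
            return max_tokens
        try:
            multiplier = int(llm_cfg.get('local_token_multiplier') or 5)
        except (TypeError, ValueError):
            multiplier = 5
        multiplier = max(1, min(multiplier, 20))  # clamp to the form's 1-20 range
        return max_tokens * multiplier
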
UI / form
- New option under the existing Local / Self-hosted optgroup
- Hidden field llm_provider_kind (set by dropdown JS) +
llm_local_token_multiplier IntegerField (rendered only when
openai_compatible); a rough sketch of both fields follows this list
- LIVE_PROVIDERS, KEY_HINTS, api_base visibility, and detectCurrentProvider
updated to recognize the new option
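
A rough sketch of the form-side fields described above (class name, label
text, and validators here are assumptions, not the actual forms.py code):

    from wtforms import Form, HiddenField, IntegerField
    from wtforms.validators import NumberRange, Optional

    class LLMSettingsForm(Form):  # illustrative container name only
        # Set by the provider dropdown's JS so the backend can tell the
        # self-hosted OpenAI-compatible option apart from cloud OpenAI.
        llm_provider_kind = HiddenField(default='')
        # Only rendered in the template when provider_kind == 'openai_compatible'.
        llm_local_token_multiplier = IntegerField(
            'Token multiplier for reasoning models',
            default=5,
            validators=[Optional(), NumberRange(min=1, max=20)],
        )
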
Backend
- llm_get_models maps openai_compatible -> openai for
litellm.get_valid_models so vLLM's /v1/models is hit with the right
provider semantics; results get an openai/ prefix so saved values route
correctly through litellm.completion() later (illustrated after this list)
- Test-connection: simpler prompt, max_tokens 200 -> 4000, timeout 20 -> 30
to give reasoning models room
- Form persistence stores provider_kind + local_token_multiplier in
datastore['settings']['application']['llm'] with round-trip
pre-population
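
For illustration, a saved model value then routes through litellm against
the self-hosted base URL roughly like this (the model name, URL, and key
are placeholders, not values from this change):

    import litellm

    response = litellm.completion(
        model='openai/qwen3-30b-a3b',         # 'openai/' prefix added by llm_get_models
        api_base='http://localhost:8000/v1',  # vLLM / LM Studio / llama.cpp server
        api_key='sk-local',                   # many local servers accept any non-empty key
        messages=[{'role': 'user', 'content': 'Respond with just the word: ready'}],
        max_tokens=4000,
        timeout=30,
    )
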
i18n: 3 new English msgids extracted to messages.pot and propagated to
all 14 .po catalogs via setup.py update_catalog.
README: mention vLLM / LM Studio / OpenAI-compatible alongside Ollama.
---
README.md | 2 +-
.../blueprint/settings/__init__.py | 6 +++
changedetectionio/blueprint/settings/llm.py | 20 ++++---
.../settings/templates/settings_llm_tab.html | 54 ++++++++++++++-----
changedetectionio/forms.py | 19 +++++++
changedetectionio/llm/evaluator.py | 43 +++++++++++++--
.../restock_diff/plugins/llm_restock.py | 6 ++-
.../translations/cs/LC_MESSAGES/messages.po | 21 ++++++++
.../translations/de/LC_MESSAGES/messages.po | 21 ++++++++
.../en_GB/LC_MESSAGES/messages.po | 21 ++++++++
.../en_US/LC_MESSAGES/messages.po | 21 ++++++++
.../translations/es/LC_MESSAGES/messages.po | 21 ++++++++
.../translations/fr/LC_MESSAGES/messages.po | 21 ++++++++
.../translations/it/LC_MESSAGES/messages.po | 21 ++++++++
.../translations/ja/LC_MESSAGES/messages.po | 21 ++++++++
.../translations/ko/LC_MESSAGES/messages.po | 21 ++++++++
changedetectionio/translations/messages.pot | 23 +++++++-
.../pt_BR/LC_MESSAGES/messages.po | 21 ++++++++
.../translations/tr/LC_MESSAGES/messages.po | 21 ++++++++
.../translations/uk/LC_MESSAGES/messages.po | 21 ++++++++
.../translations/zh/LC_MESSAGES/messages.po | 21 ++++++++
.../zh_Hant_TW/LC_MESSAGES/messages.po | 21 ++++++++
22 files changed, 443 insertions(+), 24 deletions(-)
diff --git a/README.md b/README.md
index db10fe57581..ce1336724f9 100644
--- a/README.md
+++ b/README.md
@@ -30,7 +30,7 @@ Stop drowning in noise. Connect any LLM (OpenAI, Gemini, Anthropic, Ollama and m
**AI change summaries** — instead of staring at a raw diff, your notification reads _"Price dropped from $89.99 to $67.00"_ or _"3 new products added to the listing"_. Works globally or per-watch, with full control over the prompt.
-Works with any model you already pay for — GPT-4o-mini and Gemini Flash handle this well at fractions of a cent per check. Or run it entirely locally with Ollama. Powered by [LiteLLM](https://github.com/BerriAI/litellm), giving you seamless access to [100+ supported providers and models](https://docs.litellm.ai/docs/providers).
+Works with any model you already pay for — GPT-4o-mini and Gemini Flash handle this well at fractions of a cent per check. Or run it entirely locally with **Ollama**, **vLLM**, **LM Studio**, or any **OpenAI-compatible self-hosted endpoint** — pick the *OpenAI-compatible (vLLM, LM Studio, llama.cpp)* option in the provider dropdown and point it at your server's `/v1` URL. Powered by [LiteLLM](https://github.com/BerriAI/litellm), giving you seamless access to [100+ supported providers and models](https://docs.litellm.ai/docs/providers).
[
](https://changedetection.io?src=github)
diff --git a/changedetectionio/blueprint/settings/__init__.py b/changedetectionio/blueprint/settings/__init__.py
index 7e4e57507be..74af6b71287 100644
--- a/changedetectionio/blueprint/settings/__init__.py
+++ b/changedetectionio/blueprint/settings/__init__.py
@@ -36,6 +36,8 @@ def settings_page():
default['llm'] = {
'llm_model': _stored_llm.get('model', ''),
'llm_api_base': _stored_llm.get('api_base', ''),
+ 'llm_provider_kind': _stored_llm.get('provider_kind', ''),
+ 'llm_local_token_multiplier': _stored_llm.get('local_token_multiplier', 5),
'llm_change_summary_default': datastore.data['settings']['application'].get('llm_change_summary_default', ''),
'llm_override_diff_with_summary': datastore.data['settings']['application'].get('llm_override_diff_with_summary', True),
'llm_restock_use_fallback_extract': datastore.data['settings']['application'].get('llm_restock_use_fallback_extract', True),
@@ -148,6 +150,10 @@ def settings_page():
'model': (llm_data.get('llm_model') or '').strip(),
'api_key': effective_api_key,
'api_base': (llm_data.get('llm_api_base') or '').strip(),
+ # Identifies a self-hosted OpenAI-compatible endpoint so reasoning-friendly
+ # token caps can be applied conditionally (cloud-LLM defaults stay tight).
+ 'provider_kind': (llm_data.get('llm_provider_kind') or '').strip(),
+ 'local_token_multiplier': int(llm_data.get('llm_local_token_multiplier') or 5),
'token_budget_month': existing_llm.get('token_budget_month', 0),
'max_input_chars': existing_llm.get('max_input_chars', 0),
**preserved_counters,
diff --git a/changedetectionio/blueprint/settings/llm.py b/changedetectionio/blueprint/settings/llm.py
index 2658633ebf0..8be993c9944 100644
--- a/changedetectionio/blueprint/settings/llm.py
+++ b/changedetectionio/blueprint/settings/llm.py
@@ -30,15 +30,20 @@ def llm_get_models():
api_key = (datastore.data['settings']['application'].get('llm') or {}).get('api_key', '')
logger.debug("LLM model list: no api_key in request, using stored key")
- _PREFIXES = {'gemini': 'gemini/', 'ollama': 'ollama/', 'openrouter': 'openrouter/'}
+ _PREFIXES = {'gemini': 'gemini/', 'ollama': 'ollama/', 'openrouter': 'openrouter/',
+ 'openai_compatible': 'openai/'}
+ # vLLM / LM Studio / llama.cpp speak OpenAI's wire format — route through litellm's
+ # 'openai' provider but keep the UI-level name distinct from cloud OpenAI.
+ _LITELLM_PROVIDER = {'openai_compatible': 'openai'}
prefix = _PREFIXES.get(provider, '')
+ litellm_provider = _LITELLM_PROVIDER.get(provider, provider)
try:
import litellm
- logger.debug(f"LLM model list: calling litellm.get_valid_models provider={provider!r} api_base={api_base!r}")
+ logger.debug(f"LLM model list: calling litellm.get_valid_models provider={provider!r} (litellm={litellm_provider!r}) api_base={api_base!r}")
raw = litellm.get_valid_models(
check_provider_endpoint=True,
- custom_llm_provider=provider,
+ custom_llm_provider=litellm_provider,
api_key=api_key or None,
api_base=api_base or None,
) or []
@@ -70,11 +75,14 @@ def llm_test():
text, total_tokens, input_tokens, output_tokens = completion(
model=model,
messages=[{'role': 'user', 'content':
- 'Reply with exactly five words confirming you are ready.'}],
+ 'Respond with just the word: ready'}],
api_key=llm_cfg.get('api_key') or None,
api_base=api_base or None,
- timeout=20,
- max_tokens=200,
+ timeout=30,
+ # Sized for reasoning models (Qwen3, DeepSeek-R1, o1/o3, Gemini 2.5 thinking)
+ # which emit chain-of-thought into message.reasoning_content before the answer
+ # lands in message.content — a small cap truncates mid-thought and yields no answer.
+ max_tokens=4000,
)
reply = text.strip()
if not reply:
diff --git a/changedetectionio/blueprint/settings/templates/settings_llm_tab.html b/changedetectionio/blueprint/settings/templates/settings_llm_tab.html
index 636b348d66a..18233e7f6f5 100644
--- a/changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+++ b/changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -111,6 +111,7 @@