Feature Request: Support for Multiple LLM Models (Text + Vision)
Issue Description
Currently, the memu-bot configuration system supports only a single LLM model via the `customModel` setting. This is a significant limitation for users who want to:
- Use one model for text reasoning (e.g., GLM-4.7, Claude Opus)
- Use a separate model for vision/multimodal tasks (e.g., GLM-4.6V, GPT-4V)
With the current single-model architecture, users must trade off:
- Text model quality vs. vision capabilities
- Cost optimization (cheaper text models) vs. more capable vision models
Proposed Solution
Option A: Separate Configuration Objects (Recommended)
```json
{
  "llmProvider": "custom",
  "customApiKey": "...",
  "customBaseUrl": "...",
  "customModel": "glm-4.7",
  "visionProvider": "custom",
  "visionApiKey": "your-vision-api-key",
  "visionBaseUrl": "...",
  "visionModel": "glm-4v",
  "autoSelectModel": true
}
```
Behavior:
- Image tasks automatically use `visionModel`
- Text-only tasks use the primary `customModel`
- Backward compatible with existing configs
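The fallback behavior could be sketched as follows (hypothetical `resolveVisionConfig` helper, not existing memu-bot code; field names follow the proposed schema above — vision-specific values take precedence, and any missing ones inherit from the primary `custom*` fields):

```javascript
// Resolve the effective vision settings, falling back to the primary
// custom* fields when a vision-specific value is not configured.
function resolveVisionConfig(config) {
  return {
    provider: config.visionProvider || config.llmProvider,
    model: config.visionModel || config.customModel,
    apiKey: config.visionApiKey || config.customApiKey,
    baseUrl: config.visionBaseUrl || config.customBaseUrl,
  };
}

const resolved = resolveVisionConfig({
  llmProvider: "custom",
  customApiKey: "shared-key",
  customBaseUrl: "https://example.invalid/v1",
  customModel: "glm-4.7",
  visionProvider: "custom",
  visionModel: "glm-4v",
});
console.log(resolved.model);  // "glm-4v"
console.log(resolved.apiKey); // "shared-key" (inherited from customApiKey)
```

This keeps existing configs valid: when no vision fields are set, every resolved value falls through to the current single-model settings.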
Option C: Minimal Change - Single Vision Field
```json
{
  "llmProvider": "custom",
  "customApiKey": "...",
  "customBaseUrl": "...",
  "customModel": "glm-4.7",
  "visionModel": "glm-4v"
}
```
Implementation Details
Changes Required
- Configuration Schema Update (`config/settings.json`)
  - Add `visionProvider`, `visionModel`, `visionApiKey`, `visionBaseUrl` fields
- LLM Service Layer
  - Detect task type (text-only vs. multimodal)
  - Route requests to the appropriate model based on configuration
Task Detection Logic
```javascript
// Choose the LLM target for a request: route to the vision model when any
// message carries image content and a vision provider is configured.
function selectLLM(messages) {
  const hasImage = messages.some((msg) =>
    // content may be a plain string for text-only messages, so guard with
    // Array.isArray before scanning for image parts
    Array.isArray(msg.content) && msg.content.some((c) => c.type === 'image')
  );
  if (hasImage && config.visionProvider) {
    return {
      provider: config.visionProvider,
      model: config.visionModel,
      // fall back to the primary credentials when no vision-specific
      // values are configured
      apiKey: config.visionApiKey || config.customApiKey,
      baseUrl: config.visionBaseUrl || config.customBaseUrl
    };
  }
  // default: the existing single-model path
  return {
    provider: config.llmProvider,
    model: config.customModel,
    apiKey: config.customApiKey,
    baseUrl: config.customBaseUrl
  };
}
```
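For testability, the same routing could be written with `config` passed in explicitly rather than read from a module-level variable. A self-contained sketch (hypothetical `routeRequest` name, message shape following an OpenAI-style content array):

```javascript
// Same routing as selectLLM above, but config is an explicit parameter
// so the logic can be exercised in isolation.
function routeRequest(messages, config) {
  const hasImage = messages.some(
    (msg) => Array.isArray(msg.content) && msg.content.some((c) => c.type === "image")
  );
  if (hasImage && config.visionProvider) {
    return {
      provider: config.visionProvider,
      model: config.visionModel,
      apiKey: config.visionApiKey || config.customApiKey,
      baseUrl: config.visionBaseUrl || config.customBaseUrl,
    };
  }
  return {
    provider: config.llmProvider,
    model: config.customModel,
    apiKey: config.customApiKey,
    baseUrl: config.customBaseUrl,
  };
}

const cfg = {
  llmProvider: "custom",
  customModel: "glm-4.7",
  customApiKey: "key",
  customBaseUrl: "https://example.invalid/v1",
  visionProvider: "custom",
  visionModel: "glm-4v",
};
const imageMsg = [{ role: "user", content: [{ type: "image", image_url: "cat.png" }] }];
const textMsg = [{ role: "user", content: "hello" }];
console.log(routeRequest(imageMsg, cfg).model); // "glm-4v"
console.log(routeRequest(textMsg, cfg).model);  // "glm-4.7"
```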
Priority
Medium-High (enhances usability for power users)