GenieBot is a local-first Telegram assistant that supports:
- document-grounded Q&A with RAG
- image captioning with tags
- quick summary of the latest chat or image interaction
It uses Ollama for generation, SentenceTransformers for embeddings, BLIP for vision, and SQLite for persistent vector storage.
- Telegram bot: @mygenie_ai_bot
- Service check: http://68.183.85.47:8080/ to confirm whether the bot service is running
- You do not need to run the full setup locally if the service is already up; you can use the bot directly on Telegram
- The models are open source, so feel free to use and test the bot as much as you want
- RAG retrieval with persistent embeddings in SQLite
- Query and embedding caches in RAM for fast repeated calls (see the sketch after this list)
- Vision pipeline for image caption + tags
- User memory that keeps the last 3 interactions per user
- Health/status page at `/` and `/health`
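The caches are plain in-process structures. A minimal sketch of what the QueryCache might look like; the `max_entries` limit, SHA-256 keying, and eviction policy here are assumptions, not documented project behavior:

```python
import hashlib

class QueryCache:
    """RAM-only map from a normalized question to a previously generated answer."""

    def __init__(self, max_entries: int = 256):  # max_entries is an assumed limit
        self.max_entries = max_entries
        self._store: dict[str, str] = {}

    def _key(self, question: str) -> str:
        # Normalize so a repeated question with different spacing/case still hits.
        return hashlib.sha256(question.strip().lower().encode()).hexdigest()

    def get(self, question: str) -> str | None:
        return self._store.get(self._key(question))

    def put(self, question: str, answer: str) -> None:
        if len(self._store) >= self.max_entries:
            # Evict the oldest entry; dicts preserve insertion order.
            self._store.pop(next(iter(self._store)))
        self._store[self._key(question)] = answer
```

An EmbeddingCache can follow the same pattern with vectors as values.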
- Python 3.10+
- python-telegram-bot
- Ollama (local LLM runtime)
- sentence-transformers/all-MiniLM-L6-v2 (embeddings)
- Salesforce/blip-image-captioning-base (vision; see the sketch after this list)
- SQLite (`data/rag_embeddings.db`)
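The vision model is used through the standard Hugging Face `transformers` API. A minimal captioning sketch; the tag heuristic at the end is only an illustration, not necessarily what `vision/processor.py` does:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(out[0], skip_special_tokens=True)

# Naive tag extraction: keep the longer content words of the caption
# (assumption; the real processor may extract tags differently).
tags = [w for w in caption.split() if len(w) > 3]
print(caption, tags)
```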
- retrieval `top_k`: 2 (see the retrieval sketch after this list)
- `chunk_size`: 200
- `max_tokens`: 250
- user history: last 3 interactions per user
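A sketch of how these defaults fit together at query time, assuming word-based chunking and cosine-similarity search (the real logic lives in `rag/system.py`):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows (sizes match the configured defaults)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question (cosine similarity)."""
    q_emb = model.encode(question, convert_to_tensor=True)
    c_emb = model.encode(chunks, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, c_emb, top_k=top_k)[0]
    return [chunks[hit["corpus_id"]] for hit in hits]
```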
```
app.py                 # Bot bootstrap + status server
bot/handlers.py        # Telegram command handlers
rag/system.py          # Chunking, embedding, retrieval, SQLite persistence
rag/qa.py              # QA orchestration and prompt flow
rag/llm.py             # Ollama client + model fallback handling
vision/processor.py    # Image captioning and tag extraction
utils/cache.py         # QueryCache + EmbeddingCache (RAM)
utils/memory.py        # Per-user short history (RAM)
utils/logger.py        # File + console logging
data/                  # Knowledge documents + SQLite DB file
media/                 # Assignment screenshots used below
docs/diagrams/system-design.mmd
```
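A minimal sketch of the per-user history that `utils/memory.py` implies; the `deque`-based shape is an assumption, and only the 3-interaction limit comes from the defaults above:

```python
from collections import defaultdict, deque

# Per-user rolling history: each user keeps at most the 3 latest interactions.
_history: dict[int, deque] = defaultdict(lambda: deque(maxlen=3))

def remember(user_id: int, kind: str, content: str) -> None:
    """Record one interaction, e.g. kind="chat" or kind="image"."""
    _history[user_id].append({"kind": kind, "content": content})

def latest(user_id: int, kind: str | None = None):
    """Return the most recent interaction, optionally filtered by kind (for /summarize)."""
    for item in reversed(_history[user_id]):
        if kind is None or item["kind"] == kind:
            return item
    return None
```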
Windows (PowerShell):

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

macOS/Linux:

```bash
python -m venv .venv
source .venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Start Ollama and pull the models:

```bash
ollama serve
ollama pull gemma3:4b
ollama pull mistral
ollama pull tinyllama
```

Create `.env` from `.env.example` and set:

```
TELEGRAM_BOT_TOKEN=your_token_here
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gemma3:4b
OLLAMA_MODEL_PRIORITY=mistral,phi3
OLLAMA_FALLBACK_MODELS=tinyllama
LOG_LEVEL=INFO
PORT=8080
```

Build the vector database:

```bash
python scripts/build_vector_db.py --data-dir data --db-path data/rag_embeddings.db --chunk-size 200 --chunk-overlap 50
```

Run the bot:

```bash
python app.py
```

Bot commands:
- `/start` - quick intro and commands
- `/help` - usage guidance
- `/ask <question>` - document-grounded answer
- `/image` - upload an image for caption + tags
- `/summarize [chat|image]` - summarize the latest interaction
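Wiring the commands above with python-telegram-bot (v20+ async API) looks roughly like this; the handler bodies are placeholders, not the project's actual code:

```python
from telegram import Update
from telegram.ext import Application, CommandHandler, ContextTypes

async def start(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text("Hi! Try /ask <question>, /image, or /summarize.")

async def ask(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    question = " ".join(context.args)  # text after "/ask"
    answer = "..."  # placeholder: rag/qa.py would retrieve chunks and call Ollama here
    await update.message.reply_text(answer)

def main() -> None:
    app = Application.builder().token("YOUR_TELEGRAM_BOT_TOKEN").build()
    app.add_handler(CommandHandler("start", start))
    app.add_handler(CommandHandler("ask", ask))
    app.run_polling()

if __name__ == "__main__":
    main()
```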
Example `/ask` queries that retrieve from the loaded documents before answering:

```
/ask What is the return policy?
/ask What are the pricing plans and included features?
/ask What are the support hours and escalation process?
/ask How do I reset my account password?
/ask What are the main security/compliance points?
```
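Each `/ask` boils down to: retrieve the top chunks, build a grounded prompt, and call Ollama's `/api/generate` endpoint. A sketch using the documented defaults; the prompt wording is an assumption:

```python
import requests

OLLAMA_BASE_URL = "http://localhost:11434"

def answer(question: str, context_chunks: list[str], model: str = "gemma3:4b") -> str:
    context = "\n".join(context_chunks)
    prompt = (
        "Answer the question using only the context below. Be concise.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False,
              "options": {"num_predict": 250}},  # max_tokens: 250 from the defaults above
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```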
Source: `docs/diagrams/system-design.mmd`

```mermaid
flowchart TD
U[Telegram User] --> TG[Telegram API]
TG --> APP["app.py - Bot Runtime"]
WEB[Browser Render Ping] --> STATUS["Status endpoints: / and /health"]
STATUS --> APP
APP --> H["bot/handlers.py - Command Handlers"]
H --> MEM["utils/memory.py - Last 3 interactions per user"]
H --> QA["rag/qa.py - RAG QA Orchestrator"]
H --> VISION["vision/processor.py - BLIP Caption + Tags"]
QA --> RAG["rag/system.py - RAG Retrieval"]
QA --> LLM["rag/llm.py - Ollama LLM + Fallback"]
QA --> QCACHE["utils/cache.py - QueryCache (RAM only)"]
RAG --> DOCS["Knowledge Documents (md txt files)"]
RAG --> ECACHE["utils/cache.py - EmbeddingCache (RAM only)"]
RAG --> ST["sentence transformers model all MiniLM L6 v2"]
RAG --> SQLITE["SQLite DB - data/rag_embeddings.db"]
BLD["scripts/build_vector_db.py - One-time or manual DB build"] --> SQLITE
VISION --> BLIP["Salesforce BLIP - Image Caption Model"]
APP --> LOGS["logs/geniebot YYYYMMDD.log"]
APP --> ENV[".env configuration"]
```
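The fallback path in `rag/llm.py` presumably walks the configured models in order; a sketch consistent with the `.env` example above (the exact retry policy is an assumption):

```python
import requests

def generate_with_fallback(prompt: str, base_url: str = "http://localhost:11434") -> str:
    # Order implied by the .env example: primary, then priority list, then fallbacks.
    models = ["gemma3:4b", "mistral", "phi3", "tinyllama"]
    last_error: Exception | None = None
    for model in models:
        try:
            resp = requests.post(
                f"{base_url}/api/generate",
                json={"model": model, "prompt": prompt, "stream": False},
                timeout=120,
            )
            resp.raise_for_status()
            return resp.json()["response"]
        except requests.RequestException as exc:
            last_error = exc  # try the next model
    raise RuntimeError(f"All models failed: {last_error}")
```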
- Persistent: document chunk embeddings in SQLite (`data/rag_embeddings.db`)
- RAM only: QueryCache, EmbeddingCache, user interaction history
- Logs: `logs/geniebot_YYYYMMDD.log` (daily file name, no automatic rotation or cleanup; see the logging sketch below)
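A minimal file + console logger matching that filename pattern; the handler and format details are assumptions, and the real setup lives in `utils/logger.py`:

```python
import logging
from datetime import datetime
from pathlib import Path

def setup_logger(level: str = "INFO") -> logging.Logger:
    Path("logs").mkdir(exist_ok=True)
    logfile = f"logs/geniebot_{datetime.now():%Y%m%d}.log"  # new file each day, never rotated
    logging.basicConfig(
        level=getattr(logging, level),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=[logging.FileHandler(logfile), logging.StreamHandler()],
    )
    return logging.getLogger("geniebot")
```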
- This project runs without Docker.
- RAG is optimized for concise answers with grounded context.
- Repeated questions are served from the in-memory query cache when available.