GenieBot - RAG + Vision Telegram Assistant

GenieBot is a local-first Telegram assistant that supports:

  • document-grounded Q&A with RAG
  • image captioning with tags
  • quick summary of the latest chat or image interaction

It uses Ollama for generation, SentenceTransformers for embeddings, BLIP for vision, and SQLite for persistent vector storage.
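
Generation is a single HTTP call to the local Ollama server. A minimal sketch of what rag/llm.py might do, using Ollama's standard /api/generate endpoint (the function name and defaults here are illustrative, not the project's actual code):

import requests

OLLAMA_BASE_URL = "http://localhost:11434"

def generate(prompt: str, model: str = "gemma3:4b", max_tokens: int = 250) -> str:
    """Ask the local Ollama server for a non-streaming completion."""
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"num_predict": max_tokens},  # cap the response length
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]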

Quick Access

  • Telegram bot: @mygenie_ai_bot
  • Service check: visit http://68.183.85.47:8080/ to confirm the bot service is running
  • You do not need to run the full setup locally if the service is already up; you can use the bot directly on Telegram
  • The models are open source, so feel free to use and test the bot as much as you want

Highlights

  • RAG retrieval with persistent embeddings in SQLite
  • Query and embedding caches in RAM for fast repeated calls
  • Vision pipeline for image caption + tags
  • User memory that keeps the last 3 interactions per user
  • Health/status page at / and /health
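
The status page is simple enough to serve next to the bot with the standard library alone. A rough sketch of how app.py could expose / and /health (handler and function names are illustrative, not the project's actual code):

import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Both / and /health answer with a small OK payload.
        if self.path in ("/", "/health"):
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(b'{"status": "ok"}')
        else:
            self.send_response(404)
            self.end_headers()

def start_status_server(port: int = 8080) -> None:
    server = HTTPServer(("0.0.0.0", port), StatusHandler)
    # Daemon thread so the status page never blocks the Telegram polling loop.
    threading.Thread(target=server.serve_forever, daemon=True).start()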

Tech Stack

  • Python 3.10+
  • python-telegram-bot
  • Ollama (local LLM runtime)
  • sentence-transformers/all-MiniLM-L6-v2 (embeddings)
  • Salesforce/blip-image-captioning-base (vision; a captioning sketch follows this list)
  • SQLite (data/rag_embeddings.db)
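
The vision model is the standard Hugging Face BLIP checkpoint, so captioning reduces to a few transformers calls. A hedged sketch of the step vision/processor.py likely wraps (tag extraction omitted; the function name is illustrative):

from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_image(path: str) -> str:
    """Return a short caption for a local image file."""
    image = Image.open(path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)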

Current Runtime Settings

  • retrieval top_k: 2
  • chunk_size: 200
  • max_tokens: 250
  • user history: last 3 interactions per user
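
These values are small enough to keep in one place. A hedged sketch of a settings object that mirrors them (the actual code may store them as module constants or read them from config instead):

from dataclasses import dataclass

@dataclass(frozen=True)
class RuntimeSettings:
    top_k: int = 2          # chunks retrieved per question
    chunk_size: int = 200   # size of each document chunk (matches --chunk-size below)
    max_tokens: int = 250   # generation cap passed to the LLM
    history_size: int = 3   # interactions remembered per user

SETTINGS = RuntimeSettings()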

Project Structure

app.py                    # Bot bootstrap + status server
bot/handlers.py           # Telegram command handlers
rag/system.py             # Chunking, embedding, retrieval, SQLite persistence
rag/qa.py                 # QA orchestration and prompt flow
rag/llm.py                # Ollama client + model fallback handling
vision/processor.py       # Image captioning and tag extraction
utils/cache.py            # QueryCache + EmbeddingCache (RAM)
utils/memory.py           # Per-user short history (RAM)
utils/logger.py           # File + console logging
data/                     # Knowledge documents + SQLite DB file
media/                    # Assignment screenshots used below
docs/diagrams/system-design.mmd

Setup and Run

1) Create and activate virtual environment

Windows (PowerShell):

python -m venv .venv
.\.venv\Scripts\Activate.ps1

macOS/Linux:

python -m venv .venv
source .venv/bin/activate

2) Install dependencies

pip install -r requirements.txt

3) Start Ollama and pull models

ollama serve
ollama pull gemma3:4b
ollama pull mistral
ollama pull tinyllama

4) Configure environment

Create .env from .env.example and set:

TELEGRAM_BOT_TOKEN=your_token_here
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gemma3:4b
OLLAMA_MODEL_PRIORITY=mistral,phi3
OLLAMA_FALLBACK_MODELS=tinyllama
LOG_LEVEL=INFO
PORT=8080
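
A minimal sketch of reading these variables at startup with python-dotenv; the variable names come from the .env above, while the candidate-list helper is purely illustrative of how the priority and fallback models might be combined:

import os
from dotenv import load_dotenv

load_dotenv()  # read .env from the project root

TELEGRAM_BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]  # required
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "gemma3:4b")

# Models in the order the bot should try them: primary, then priority, then fallbacks.
_extra = os.getenv("OLLAMA_MODEL_PRIORITY", "") + "," + os.getenv("OLLAMA_FALLBACK_MODELS", "")
MODEL_CANDIDATES = [OLLAMA_MODEL] + [m.strip() for m in _extra.split(",") if m.strip()]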

5) Optional: prebuild the embedding DB

python scripts/build_vector_db.py --data-dir data --db-path data/rag_embeddings.db --chunk-size 200 --chunk-overlap 50
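
For reference, the build step amounts to: split each document into overlapping chunks, embed the chunks with all-MiniLM-L6-v2, and persist the vectors in SQLite. A hedged sketch under those assumptions (the real scripts/build_vector_db.py and its table schema may differ):

import sqlite3
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size character chunks with overlap (illustrative splitter)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def build_db(data_dir: str = "data", db_path: str = "data/rag_embeddings.db") -> None:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS chunks (source TEXT, text TEXT, embedding BLOB)")
    for doc in sorted(Path(data_dir).iterdir()):
        if doc.suffix not in (".md", ".txt"):
            continue
        chunks = chunk_text(doc.read_text(encoding="utf-8"))
        vectors = model.encode(chunks)  # one vector per chunk
        for text, vec in zip(chunks, vectors):
            conn.execute(
                "INSERT INTO chunks VALUES (?, ?, ?)",
                (doc.name, text, np.asarray(vec, dtype=np.float32).tobytes()),
            )
    conn.commit()
    conn.close()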

6) Run the bot

python app.py

Bot Commands

  • /start - quick intro and commands
  • /help - usage guidance
  • /ask <question> - document-grounded answer
  • /image - upload image for caption + tags
  • /summarize [chat|image] - summarize latest interaction
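
With python-telegram-bot v20+, wiring these commands up typically looks like the sketch below; the handler functions are assumed to live in bot/handlers.py and their names here are illustrative:

from telegram.ext import Application, CommandHandler, MessageHandler, filters

from bot.handlers import start, help_command, ask, summarize, handle_photo  # assumed names

def build_application(token: str) -> Application:
    app = Application.builder().token(token).build()
    app.add_handler(CommandHandler("start", start))
    app.add_handler(CommandHandler("help", help_command))
    app.add_handler(CommandHandler("ask", ask))                   # /ask <question>
    app.add_handler(CommandHandler("summarize", summarize))       # /summarize [chat|image]
    app.add_handler(MessageHandler(filters.PHOTO, handle_photo))  # photos sent for captioning
    return app

# build_application(TELEGRAM_BOT_TOKEN).run_polling()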

Demo Screenshots

Start (screenshot in media/)

Ask (screenshot in media/)

These example /ask queries fetch information from the loaded documents and then answer:

  • /ask What is the return policy?
  • /ask What are the pricing plans and included features?
  • /ask What are the support hours and escalation process?
  • /ask How do I reset my account password?
  • /ask What are the main security/compliance points?

Image (screenshot in media/)

Summarize (screenshot in media/)

Architecture Diagram

Source: docs/diagrams/system-design.mmd

flowchart TD
    U[Telegram User] --> TG[Telegram API]
    TG --> APP["app.py - Bot Runtime"]
    WEB[Browser Render Ping] --> STATUS["Status endpoints: / and /health"]
    STATUS --> APP

    APP --> H["bot/handlers.py - Command Handlers"]
    H --> MEM["utils/memory.py - Last 3 interactions per user"]
    H --> QA["rag/qa.py - RAG QA Orchestrator"]
    H --> VISION["vision/processor.py - BLIP Caption + Tags"]

    QA --> RAG["rag/system.py - RAG Retrieval"]
    QA --> LLM["rag/llm.py - Ollama LLM + Fallback"]
    QA --> QCACHE["utils/cache.py - QueryCache (RAM only)"]

    RAG --> DOCS["Knowledge Documents (md/txt files)"]
    RAG --> ECACHE["utils/cache.py - EmbeddingCache (RAM only)"]
    RAG --> ST["sentence-transformers - all-MiniLM-L6-v2"]
    RAG --> SQLITE["SQLite DB - data/rag_embeddings.db"]

    BLD["scripts/build_vector_db.py - One-time or manual DB build"] --> SQLITE
    VISION --> BLIP["Salesforce BLIP - Image Caption Model"]

    APP --> LOGS["logs/geniebot_YYYYMMDD.log"]
    APP --> ENV[".env configuration"]

Storage and Caching

  • Persistent: document chunk embeddings in SQLite (data/rag_embeddings.db)
  • RAM only: QueryCache, EmbeddingCache, and per-user interaction history (a cache sketch follows this list)
  • Logs: logs/geniebot_YYYYMMDD.log (one file per day; no automatic rotation or cleanup)
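
A hedged sketch of the RAM-only query cache described above (utils/cache.py may differ, e.g. by adding TTLs or a different eviction policy):

class QueryCache:
    """In-memory answer cache keyed by the normalized question text."""

    def __init__(self, max_items: int = 256):
        self.max_items = max_items
        self._store: dict[str, str] = {}

    @staticmethod
    def _key(question: str) -> str:
        return " ".join(question.lower().split())

    def get(self, question: str) -> str | None:
        return self._store.get(self._key(question))

    def put(self, question: str, answer: str) -> None:
        if len(self._store) >= self.max_items:
            self._store.pop(next(iter(self._store)))  # evict the oldest insertion
        self._store[self._key(question)] = answer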

Assignment Notes

  • This project runs without Docker.
  • RAG is optimized for concise answers with grounded context.
  • Repeated questions are served from the in-memory query cache when available.