Live Demo · API Docs · Architecture · RAG Pipeline · Portfolio
Spiritual AI Guide is a production-deployed, full-stack Retrieval-Augmented Generation (RAG) system that semantically searches a personal knowledge base of 1,649 Obsidian notes (~300,000 words, spanning 75+ books on spirituality, psychology, philosophy, and neuroscience) and generates cited responses via large language models. The system implements a five-stage RAG pipeline: vault ingestion, structure-aware semantic chunking, 384-dimensional sentence-transformer embedding into ChromaDB, hybrid retrieval (dense vectors plus a BM25-style keyword signal) with composite re-ranking, and multi-LLM generation (GPT-4 Turbo primary, Ollama Llama 3.1 local). It covers end-to-end applied NLP engineering, from a raw Markdown corpus to a streaming, citation-grounded chat interface. The architecture is containerised with Docker and deployed on Vercel (frontend) and Railway (backend) as a publicly accessible demonstration of retrieval-augmented AI.
┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACE                          │
│             Next.js 14 · TypeScript · Tailwind CSS              │
│        Chat · Semantic Search · Note Browser · Tree View        │
└──────────────────────────┬──────────────────────────────────────┘
                           │ REST / SSE (NEXT_PUBLIC_API_URL)
┌──────────────────────────▼──────────────────────────────────────┐
│                         FASTAPI BACKEND                         │
│        /api/chat · /api/search · /api/notes · /api/tree         │
│                    RAG Engine (Orchestrator)                    │
└───┬────────────────┬───────────────────┬────────────────────────┘
    │                │                   │
    ▼                ▼                   ▼
┌───────────┐ ┌──────────────┐ ┌─────────────────────────────┐
│ ChromaDB  │ │ Embedding    │ │ LLM Providers               │
│ Vector DB │ │ Service      │ │ OpenAI GPT-4 Turbo (prod)   │
│ 1,772 ch. │ │ all-MiniLM   │ │ Ollama Llama 3.1 (local)    │
│ cosine sim│ │ L6-v2 (384D) │ │ Anthropic / Google (opt.)   │
└───────────┘ └──────────────┘ └─────────────────────────────┘
Data Pipeline (offline, run once):
Obsidian Vault (.md) → Parser → Chunker → EmbeddingService → ChromaDB
| Layer | Technology | Purpose |
|---|---|---|
| Backend API | Python 3.11, FastAPI, Uvicorn | Async REST API with SSE streaming |
| Vector Database | ChromaDB (persistent) | Embedding storage & ANN retrieval (cosine) |
| Embedding Model | all-MiniLM-L6-v2 (sentence-transformers) | 384D semantic embeddings |
| Primary LLM | OpenAI GPT-4 Turbo | Response generation + citation injection |
| Local LLM | Ollama Llama 3.1 8B | Free, offline fallback |
| Optional LLMs | Anthropic Claude 3, Google Gemini | Multi-provider abstraction |
| Frontend | Next.js 14, TypeScript, Tailwind CSS | React SSR with streaming chat UI |
| Data Source | Obsidian Markdown vault (1,649 notes) | Personal curated knowledge base |
| Containerisation | Docker, docker-compose | Reproducible local deployment |
| Deployment | Vercel (frontend), Railway (backend) | Production cloud hosting |
Full technical deep-dive: docs/rag-pipeline.md
The ObsidianParser walks the vault directory tree, extracts Markdown content, and preserves bidirectional [[WikiLink]] relationships between notes. Each note is tagged with category, book/source, and file path metadata, which is critical for citation accuracy.
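The link-preservation step can be sketched as follows. This is a minimal illustration that assumes standard Obsidian link syntax (`[[Target]]`, `[[Target|alias]]`, `[[Target#Heading]]`); the names `extract_wikilinks` and `build_backlinks` are hypothetical and are not taken from the project's ObsidianParser.

```python
import re
from collections import defaultdict

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]];
# only the target note title is captured.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_wikilinks(markdown: str) -> list[str]:
    """Return the unique note titles linked from one note, in first-seen order."""
    seen: dict[str, None] = {}
    for match in WIKILINK_RE.finditer(markdown):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def build_backlinks(notes: dict[str, str]) -> dict[str, set[str]]:
    """Invert outgoing links so each title maps to the notes that link to it."""
    backlinks: dict[str, set[str]] = defaultdict(set)
    for title, body in notes.items():
        for target in extract_wikilinks(body):
            backlinks[target].add(title)
    return dict(backlinks)
```

Storing both directions (outgoing links per note and the inverted backlink map) is one way to make the relationships bidirectional; the real parser may represent them differently.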
ChunkingService splits notes hierarchically: first by Markdown headers (#, ##, ###), then by double-newline paragraph boundaries if sections exceed the target size. Parameters: 800-token target chunks, 150-token overlap (implemented as word-count proxies). The overlap strategy (appending the final N words of the preceding chunk) preserves cross-boundary semantic continuity. Notes shorter than the minimum threshold (100 tokens) are kept as a single chunk.
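A rough sketch of this header-then-paragraph strategy is shown below. It uses word counts as the token proxy described above; the function names, the greedy paragraph packing, and the choice to leave a single oversized paragraph unsplit are assumptions for illustration, not the actual ChunkingService code.

```python
import re

# Zero-width split point before a level 1-3 Markdown header at the start of a line.
HEADER_RE = re.compile(r"(?m)^(?=#{1,3} )")


def split_sections(note: str) -> list[str]:
    """Split a note at Markdown headers, keeping each header with its section."""
    return [part.strip() for part in HEADER_RE.split(note) if part.strip()]


def chunk_note(note: str, target: int = 800, overlap: int = 150, minimum: int = 100) -> list[str]:
    """Chunk one note; all sizes are word counts standing in for tokens."""
    if len(note.split()) < minimum:
        return [note.strip()]  # short notes are kept as a single chunk

    pieces: list[str] = []
    for section in split_sections(note):
        if len(section.split()) <= target:
            pieces.append(section)
            continue
        # Oversized section: pack paragraphs greedily up to the target size.
        # A single paragraph longer than the target is left whole in this sketch.
        current: list[str] = []
        for para in re.split(r"\n\s*\n", section):
            if current and len(" ".join(current + [para]).split()) > target:
                pieces.append("\n\n".join(current))
                current = []
            current.append(para.strip())
        if current:
            pieces.append("\n\n".join(current))

    # Carry the final `overlap` words of the preceding piece into each later chunk.
    chunks = pieces[:1]
    for prev, piece in zip(pieces, pieces[1:]):
        tail = " ".join(prev.split()[-overlap:]) if overlap > 0 else ""
        chunks.append(f"{tail}\n\n{piece}" if tail else piece)
    return chunks
```

Because the overlap is taken from the previous piece before any overlap was added, the carried context never compounds from chunk to chunk.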
All 1,772 chunks are encoded with sentence-transformers/all-MiniLM-L6-v2, producing L2-normalised 384-dimensional embeddings stored in a ChromaDB persistent collection (hnsw:space=cosine). Batch encoding (batch_size=32) is used for efficiency. The same model encodes queries at inference time, so queries and chunks share one embedding space.
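The sketch below illustrates the two ideas in this stage, batching and L2 normalisation, without requiring the model itself. `toy_encoder` is a hypothetical stand-in for the sentence-transformer (it returns 3-dimensional vectors, not 384), and `embed_all` is an illustrative name rather than the project's EmbeddingService API.

```python
import math
from typing import Callable, Iterator, Sequence

Vector = list[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def l2_normalise(vec: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def embed_all(
    texts: Sequence[str],
    encode_batch: Callable[[Sequence[str]], list[Vector]],
    batch_size: int = 32,
) -> list[Vector]:
    """Encode texts batch by batch and normalise every embedding."""
    out: list[Vector] = []
    for batch in batched(texts, batch_size):
        out.extend(l2_normalise(v) for v in encode_batch(batch))
    return out


def cosine_distance(a: Vector, b: Vector) -> float:
    """For unit vectors, cosine similarity reduces to a plain dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def toy_encoder(batch: Sequence[str]) -> list[Vector]:
    """Hypothetical stand-in encoder: [length, vowel count, bias]."""
    return [[float(len(t)), float(sum(c in "aeiou" for c in t)), 1.0] for t in batch]
```

Normalising at write time means that query-time cosine comparisons need no further scaling, which is why the same normalised model must also encode the queries.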
Query processing uses a composite scoring strategy combining three signals:
- Semantic similarity (70%): ChromaDB cosine distance → similarity score from the HNSW index (top-10 candidates retrieved)
- Keyword overlap (20%): Jaccard overlap between query tokens and chunk tokens (a BM25-style lexical signal that needs no full BM25 index)
- Link density (10%): Notes with more [[WikiLink]] connections are treated as more semantically central and receive a bonus (capped at 10%)
The top candidates are re-sorted by this composite score before context assembly.
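As a hedged sketch, the composite score could be computed as below. The 70/20/10 weights and the 10% cap come from the description above; the `1 - distance` conversion, the tokenizer, and the `per_link_bonus` of 0.01 per link are assumptions made for illustration, and `Candidate` is a hypothetical container rather than a project class.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    text: str
    cosine_distance: float  # as returned by the vector index; 0.0 means same direction
    link_count: int         # number of [[WikiLink]] connections on the source note


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens as a set (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def composite_score(query: str, cand: Candidate, per_link_bonus: float = 0.01) -> float:
    semantic = max(0.0, 1.0 - cand.cosine_distance)  # distance -> similarity
    keyword = jaccard(tokenize(query), tokenize(cand.text))
    link_bonus = min(0.10, per_link_bonus * cand.link_count)  # capped at 10%
    return 0.70 * semantic + 0.20 * keyword + link_bonus


def rerank(query: str, candidates: list[Candidate]) -> list[Candidate]:
    """Re-sort retrieved candidates by composite score, best first."""
    return sorted(candidates, key=lambda c: composite_score(query, c), reverse=True)
```

With these weights, a chunk that is slightly further away in embedding space can still outrank a closer one if it shares query terms or comes from a heavily linked note.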
A structured prompt injects retrieved chunks with [Source: Title] attribution labels. The system prompt instructs the LLM to maintain these citations in its response. After generation, a regex parser (\[Source:\s*([^\]]+)\]) extracts cited titles for display in the citation panel. Both streaming (SSE) and non-streaming endpoints are supported.
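The labelling and extraction steps can be illustrated with the regex quoted above. `build_context` and `extract_citations` are hypothetical helper names, and the exact layout of the context block is an assumption; only the `[Source: Title]` label and the pattern itself come from the description.

```python
import re

# Pattern quoted in the pipeline description.
CITATION_RE = re.compile(r"\[Source:\s*([^\]]+)\]")


def build_context(chunks: list[tuple[str, str]]) -> str:
    """Label each (title, text) chunk so the model can cite it by title."""
    return "\n\n".join(f"[Source: {title}]\n{text}" for title, text in chunks)


def extract_citations(answer: str) -> list[str]:
    """Return the unique cited titles in order of first mention."""
    titles = (title.strip() for title in CITATION_RE.findall(answer))
    return list(dict.fromkeys(titles))


if __name__ == "__main__":
    answer = "Stillness helps [Source: Note A]. See also [Source: Note B] and [Source: Note A]."
    print(extract_citations(answer))
```

Deduplicating while preserving order keeps the citation panel aligned with the sequence in which sources appear in the answer.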
| Model | Provider | Cost | Avg Latency | Response Quality | Privacy |
|---|---|---|---|---|---|
| GPT-4 Turbo | OpenAI API | ~$0.01–0.03/query | ~12–18s | Excellent | Cloud |
| Claude 3 Sonnet | Anthropic API | ~$0.015/query | ~8–12s | Excellent | Cloud |
| Gemini Pro | Google API | ~$0.001/query | ~5–8s | Very Good | Cloud |
| Llama 3.1 8B | Ollama (local) | Free | ~4–8s* | Good | On-device |
* On Apple M2 MacBook Air 8GB RAM. GPT-4 Turbo is ~22× slower than Llama 3.1 locally due to API network overhead, but produces substantially higher quality citations and reasoning.
Full evaluation methodology: docs/evaluation.md
- Retrieval-Augmented Generation (RAG): Full end-to-end pipeline from raw corpus to cited LLM responses
- Semantic chunking: Structure-aware text segmentation preserving Markdown header hierarchy and paragraph boundaries with configurable overlap
- Dense vector retrieval: ANN search via ChromaDB HNSW index with cosine similarity on L2-normalised sentence-transformer embeddings (384D)
- Hybrid retrieval: Composite re-ranking combining dense similarity (70%) + BM25-style keyword overlap (20%) + graph centrality heuristic (10%)
- Multi-LLM provider abstraction: Abstract base class pattern with interchangeable OpenAI, Anthropic, Google, and Ollama backends
- Async streaming generation: FastAPI SSE streaming with AsyncGenerator for token-by-token response delivery
- Prompt engineering: Structured system prompt with context injection, source attribution format, and persona constraints for a spiritual guidance persona
- Citation extraction: Regex-based post-processing to parse and surface inline [Source: Title] citations from LLM output
- Vector database management: ChromaDB schema design with category/book/path metadata for filtered retrieval
- NLP data pipeline: Obsidian [[WikiLink]] graph preservation, Unicode-safe Markdown parsing, batch embedding with progress tracking
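The provider abstraction and streaming items in the list above could look roughly like this. It is a self-contained sketch: `LLMProvider`, `EchoProvider`, and `stream_sse` are hypothetical names, the `[DONE]` sentinel is an assumed convention, and a real backend would wrap the generator in a FastAPI streaming response instead of collecting it.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncGenerator


class LLMProvider(ABC):
    """Common interface that each backend (OpenAI, Anthropic, Google, Ollama) would implement."""

    @abstractmethod
    def stream(self, prompt: str) -> AsyncGenerator[str, None]:
        """Yield response tokens one at a time."""


class EchoProvider(LLMProvider):
    """Toy provider that streams the prompt back word by word."""

    async def stream(self, prompt: str) -> AsyncGenerator[str, None]:
        for word in prompt.split():
            await asyncio.sleep(0)  # hand control back to the event loop, as a network call would
            yield word + " "


def sse_event(token: str) -> str:
    """Format one token as a Server-Sent Events message (tokens must not contain newlines)."""
    return f"data: {token}\n\n"


async def stream_sse(provider: LLMProvider, prompt: str) -> AsyncGenerator[str, None]:
    """Wrap any provider's token stream as SSE messages, ending with a sentinel."""
    async for token in provider.stream(prompt):
        yield sse_event(token)
    yield "data: [DONE]\n\n"  # assumed end-of-stream marker
```

Because `stream_sse` depends only on the abstract interface, swapping one provider for another does not change the streaming endpoint.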
- Python 3.11+
- Node.js 18+
- Ollama (optional, for local LLM)
- OpenAI API key (for GPT-4 Turbo)
# Clone the repository
git clone https://github.com/FrancescoCavina02/Spiritual-chatbot.git
cd Spiritual-chatbot
# Create virtual environment
cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp env.example .env
# Edit .env and add your OPENAI_API_KEY
# Start the API server
uvicorn app.main:app --reload --port 8000
# Set your Obsidian vault path
export OBSIDIAN_VAULT_PATH=/path/to/your/obsidian/vault
# Run the ingestion pipeline (one-time setup)
python scripts/ingest_notes.py
# Load embeddings into ChromaDB
python scripts/load_chromadb.py
cd frontend
npm install
# Configure API URL
cp env.example .env.local
# Set NEXT_PUBLIC_API_URL=http://localhost:8000
npm run dev
# Open http://localhost:3000
# Copy and configure environment
cp docker-compose.env.example .env
# Edit .env with your API keys and vault path
docker-compose up --build
Full deployment guide: docs/deployment.md
| Component | Platform | URL |
|---|---|---|
| Frontend | Netlify | https://spiritualchatbot1.netlify.app |
| Backend API | Railway | https://spiritual-chatbot-api.onrender.com/api |
| API Docs (Swagger) | Railway | https://spiritual-chatbot-api.onrender.com/docs |
Architecture: The Next.js frontend is deployed to Vercel's edge network. The FastAPI backend (with pre-seeded ChromaDB embeddings) runs on Railway with a persistent volume mount for the vector database. Environment variables are configured via each platform's dashboard.
├── backend/
│   ├── app/
│   │   ├── api/              # FastAPI route handlers (chat, search, notes, tree)
│   │   ├── models/           # Pydantic request/response schemas
│   │   ├── services/         # Core services (RAG engine, embedding, LLM, ChromaDB)
│   │   └── main.py           # Application entry point & lifespan management
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── app/                  # Next.js 14 App Router pages
│   ├── components/           # Chat, layout, and notes UI components
│   ├── lib/                  # API client, storage, markdown utilities
│   └── hooks/                # Custom React hooks (useChat)
├── scripts/
│   ├── ingest_notes.py       # Stages 1–3: Parse → Chunk → Embed
│   └── load_chromadb.py      # Stage 4: Load embeddings into ChromaDB
├── docs/
│   ├── architecture.md       # Full system architecture
│   ├── rag-pipeline.md       # RAG pipeline deep-dive
│   ├── evaluation.md         # Model evaluation & benchmarks
│   └── deployment.md         # Production deployment guide
├── data/
│   ├── raw/                  # Source Obsidian notes (gitignored)
│   ├── processed/            # Parsed & chunked JSON (gitignored)
│   └── embeddings/           # ChromaDB persistent store (gitignored)
├── docker-compose.yml
└── railway.toml
MIT License. See LICENSE for details.
Built by Francesco Cavina · Powered by RAG + GPT-4 Turbo + ChromaDB