Spiritual AI Guide

RAG-Powered Knowledge Chatbot

Python FastAPI Next.js ChromaDB OpenAI License: MIT

Live Demo · API Docs · Architecture · RAG Pipeline · Portfolio


Abstract

Spiritual AI Guide is a production-deployed, full-stack Retrieval-Augmented Generation (RAG) system that semantically searches a personal knowledge base of 1,649 Obsidian notes (~300,000 words, spanning 75+ books on spirituality, psychology, philosophy, and neuroscience) and generates cited responses via large language models. The system implements a five-stage RAG pipeline: vault ingestion, structure-aware semantic chunking, 384-dimensional sentence-transformer embedding into ChromaDB, hybrid retrieval (dense vector search plus BM25-style keyword overlap) with composite re-ranking, and multi-LLM generation (GPT-4 Turbo primary, Ollama Llama 3.1 local). It covers the full path from raw Markdown corpus to a streaming, citation-grounded chat interface. The architecture is containerised with Docker and deployed on Vercel (frontend) and Railway (backend) as a publicly accessible demonstration.


System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACE                          │
│              Next.js 14  ·  TypeScript  ·  Tailwind CSS         │
│        Chat · Semantic Search · Note Browser · Tree View        │
└──────────────────────────┬──────────────────────────────────────┘
                           │  REST / SSE  (NEXT_PUBLIC_API_URL)
┌──────────────────────────▼──────────────────────────────────────┐
│                      FASTAPI BACKEND                            │
│  /api/chat  ·  /api/search  ·  /api/notes  ·  /api/tree         │
│                   RAG Engine (Orchestrator)                     │
└───┬────────────────┬───────────────────┬────────────────────────┘
    │                │                   │
    ▼                ▼                   ▼
┌───────────┐  ┌──────────────┐  ┌─────────────────────────────┐
│ ChromaDB  │  │  Embedding   │  │       LLM Providers         │
│ Vector DB │  │  Service     │  │  OpenAI GPT-4 Turbo (prod)  │
│ 1,772 ch. │  │ all-MiniLM   │  │  Ollama Llama 3.1 (local)   │
│ cosine sim│  │ L6-v2 (384D) │  │  Anthropic / Google (opt.)  │
└───────────┘  └──────────────┘  └─────────────────────────────┘

Data Pipeline (offline, run once):
Obsidian Vault (.md) → Parser → Chunker → EmbeddingService → ChromaDB

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Backend API | Python 3.11, FastAPI, Uvicorn | Async REST API with SSE streaming |
| Vector Database | ChromaDB (persistent) | Embedding storage & ANN retrieval (cosine) |
| Embedding Model | all-MiniLM-L6-v2 (sentence-transformers) | 384D semantic embeddings |
| Primary LLM | OpenAI GPT-4 Turbo | Response generation + citation injection |
| Local LLM | Ollama Llama 3.1 8B | Free, offline fallback |
| Optional LLMs | Anthropic Claude 3, Google Gemini | Multi-provider abstraction |
| Frontend | Next.js 14, TypeScript, Tailwind CSS | React SSR with streaming chat UI |
| Data Source | Obsidian Markdown vault (1,649 notes) | Personal curated knowledge base |
| Containerisation | Docker, docker-compose | Reproducible local deployment |
| Deployment | Vercel (frontend), Railway (backend) | Production cloud hosting |

RAG Pipeline: 5-Stage Architecture

Full technical deep-dive: docs/rag-pipeline.md

Stage 1: Obsidian Vault Ingestion

The ObsidianParser walks the vault directory tree, extracts Markdown content, and preserves bidirectional [[WikiLink]] relationships between notes. Each note is tagged with category, book/source, and file path metadata, which is critical for citation accuracy.
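The link and metadata extraction described above can be sketched in a few lines. This is a minimal illustration, not the repository's ObsidianParser: the function names, the regular expression, and the assumed vault layout (`<category>/<book>/<note>.md`) are all hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]];
# only the target note name is captured.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_wikilinks(markdown: str) -> list[str]:
    """Return unique linked note names in order of first appearance."""
    seen: dict[str, None] = {}
    for match in WIKILINK_RE.finditer(markdown):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def build_backlinks(outgoing: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert an outgoing-link map so each note also knows who links to it."""
    backlinks: dict[str, list[str]] = {}
    for source, targets in outgoing.items():
        for target in targets:
            backlinks.setdefault(target, []).append(source)
    return backlinks


def note_metadata(path: PurePosixPath, vault_root: PurePosixPath) -> dict[str, str]:
    """Derive citation metadata from the path (assumed layout: category/book/note.md)."""
    parts = path.relative_to(vault_root).parts
    return {
        "title": path.stem,
        "category": parts[0] if len(parts) > 1 else "",
        "book": parts[1] if len(parts) > 2 else "",
        "path": path.as_posix(),
    }
```

Keeping both the outgoing and the inverted link map is one simple way to make the relationships bidirectional; the link counts also feed the link-density signal used later in retrieval.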

Stage 2: Structure-Aware Semantic Chunking

ChunkingService splits notes hierarchically: first by Markdown headers (#, ##, ###), then by double-newline paragraph boundaries if sections exceed the target size. Parameters: 800-token target chunks, 150-token overlap (implemented as word-count proxies). The overlap strategy (appending the final N words of the preceding chunk) preserves cross-boundary semantic continuity. Notes shorter than the minimum threshold (100 tokens) are kept as a single chunk.
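A rough sketch of this strategy follows, using word counts as the token proxy mentioned above. It is not the actual ChunkingService: the function names are hypothetical, and details such as keeping an oversized single paragraph whole are assumptions.

```python
import re

# A section starts at a level 1-3 Markdown header at the beginning of a line.
HEADER_RE = re.compile(r"^#{1,3}[ \t]+\S", re.MULTILINE)


def split_by_headers(markdown: str) -> list[str]:
    """Split a note into sections, each beginning at a #, ## or ### header."""
    starts = [m.start() for m in HEADER_RE.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first header
    bounds = starts + [len(markdown)]
    sections = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]


def pack_paragraphs(section: str, target_words: int) -> list[str]:
    """Greedily pack paragraphs into pieces of at most target_words words.
    A single paragraph longer than the target is kept whole (assumption)."""
    pieces: list[str] = []
    current: list[str] = []
    count = 0
    for para in re.split(r"\n\s*\n", section):
        n = len(para.split())
        if current and count + n > target_words:
            pieces.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        pieces.append("\n\n".join(current))
    return pieces


def chunk_note(markdown: str, target_words: int = 800,
               overlap_words: int = 150, min_words: int = 100) -> list[str]:
    """Header-first, paragraph-second chunking with word-level overlap."""
    text = markdown.strip()
    if len(text.split()) < min_words:
        return [text]  # short notes stay as a single chunk
    pieces: list[str] = []
    for section in split_by_headers(text):
        if len(section.split()) <= target_words:
            pieces.append(section)
        else:
            pieces.extend(pack_paragraphs(section, target_words))
    chunks = pieces[:1]
    for prev, cur in zip(pieces, pieces[1:]):
        # Carry the final overlap_words words of the preceding piece forward.
        tail = " ".join(prev.split()[-overlap_words:])
        chunks.append(f"{tail}\n\n{cur}")
    return chunks
```

With the defaults, a note of fewer than 100 words comes back as one chunk, and every later chunk opens with up to 150 words carried over from the piece before it.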

Stage 3: Dense Vector Embedding

All 1,772 chunks are encoded with sentence-transformers/all-MiniLM-L6-v2, producing L2-normalised 384-dimensional embeddings stored in a ChromaDB persistent collection (hnsw:space=cosine). Batch encoding (batch_size=32) is used for efficiency. The same model encodes queries at inference time, so queries and chunks share one embedding space.
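The arithmetic behind this stage can be shown without loading the model. The sketch below uses only the standard library and hypothetical helper names; it does not call sentence-transformers or ChromaDB. It illustrates L2 normalisation, the cosine distance (1 minus cosine similarity) that a cosine-space collection reports, and batching at size 32.

```python
import math
from typing import Iterator, Sequence


def l2_normalise(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def batches(items: Sequence[str], batch_size: int = 32) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# For unit-length vectors the cosine similarity is just the dot product.
a = l2_normalise([3.0, 4.0])
b = l2_normalise([4.0, 3.0])
dot = sum(x * y for x, y in zip(a, b))
assert abs(cosine_distance(a, b) - (1.0 - dot)) < 1e-12
```

At a batch size of 32, the 1,772 chunks split into 56 batches, the last one holding 12 chunks.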

Stage 4: Hybrid Retrieval with Composite Re-ranking

Query processing uses a composite scoring strategy combining three signals:

  • Semantic similarity (70%): ChromaDB cosine distance → similarity score from the HNSW index (top-10 candidates retrieved)
  • Keyword overlap (20%): Jaccard overlap between query tokens and chunk tokens (BM25-style lexical signal without full BM25 index)
  • Link density (10%): Notes with more [[WikiLink]] connections are treated as more semantically central and receive a bonus (capped at 10%)

The top candidates are re-sorted by this composite score before context assembly.
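The weighting above can be sketched as follows. The 70/20/10 weights come from the description; everything else is an assumption made for illustration, including the `Candidate` type, the tokeniser, and the `link_saturation` parameter (the number of links at which the bonus reaches its cap).

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    text: str
    distance: float   # cosine distance returned by the vector store
    link_count: int   # number of [[WikiLink]] connections on the note


def tokenize(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens as a set (hypothetical tokeniser)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def composite_score(query: str, cand: Candidate, link_saturation: int = 10) -> float:
    """0.7 * semantic + 0.2 * keyword + 0.1 * link bonus, each term in [0, 1]."""
    semantic = max(0.0, 1.0 - cand.distance)
    keyword = jaccard(tokenize(query), tokenize(cand.text))
    link_bonus = min(cand.link_count / link_saturation, 1.0)
    return 0.7 * semantic + 0.2 * keyword + 0.1 * link_bonus


def rerank(query: str, candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates by composite score, best first."""
    return sorted(candidates, key=lambda c: composite_score(query, c), reverse=True)
```

Because the link term contributes at most 0.1, a well-connected note can overtake a slightly closer semantic match but cannot outrank a clearly better one.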

Stage 5: Citation-Grounded LLM Generation

A structured prompt injects retrieved chunks with [Source: Title] attribution labels. The system prompt instructs the LLM to maintain these citations in its response. After generation, a regex parser (\[Source:\s*([^\]]+)\]) extracts cited titles for display in the citation panel. Both streaming (SSE) and non-streaming endpoints are supported.
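A small sketch of the two ends of this stage: labelling context chunks before generation and extracting citations afterwards. The regular expression is the one quoted above; the function names and the de-duplication behaviour are assumptions, not the repository's code.

```python
import re

# Pattern quoted in the pipeline description.
CITATION_RE = re.compile(r"\[Source:\s*([^\]]+)\]")


def build_context(chunks: list[tuple[str, str]]) -> str:
    """Prefix each (title, text) chunk with its [Source: Title] label."""
    return "\n\n".join(f"[Source: {title}]\n{text}" for title, text in chunks)


def extract_citations(response: str) -> list[str]:
    """Return cited titles in order of first mention, without duplicates."""
    titles: list[str] = []
    for raw in CITATION_RE.findall(response):
        title = raw.strip()
        if title not in titles:
            titles.append(title)
    return titles


answer = (
    "Presence dissolves the ego [Source: The Power of Now]. "
    "See also [Source:  Waking Up ] and [Source: The Power of Now]."
)
print(extract_citations(answer))
```

Stripping whitespace and de-duplicating keeps the citation panel stable when the model repeats a source or pads the label.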


Model Comparison

| Model | Provider | Cost | Avg Latency | Response Quality | Privacy |
| --- | --- | --- | --- | --- | --- |
| GPT-4 Turbo | OpenAI API | ~$0.01–0.03/query | ~12–18s | Excellent | Cloud |
| Claude 3 Sonnet | Anthropic API | ~$0.015/query | ~8–12s | Excellent | Cloud |
| Gemini Pro | Google API | ~$0.001/query | ~5–8s | Very Good | Cloud |
| Llama 3.1 8B | Ollama (local) | Free | ~4–8s* | Good | On-device |

* On Apple M2 MacBook Air 8GB RAM. By the latencies above, GPT-4 Turbo is roughly 2–3× slower than Llama 3.1 running locally, attributed to API network overhead, but it produces substantially higher-quality citations and reasoning.

Full evaluation methodology: docs/evaluation.md


AI Concepts and Skills Involved

  • Retrieval-Augmented Generation (RAG): Full end-to-end pipeline from raw corpus to cited LLM responses
  • Semantic chunking: Structure-aware text segmentation preserving Markdown header hierarchy and paragraph boundaries with configurable overlap
  • Dense vector retrieval: ANN search via ChromaDB HNSW index with cosine similarity on L2-normalised sentence-transformer embeddings (384D)
  • Hybrid retrieval: Composite re-ranking combining dense similarity (70%) + BM25-style keyword overlap (20%) + graph centrality heuristic (10%)
  • Multi-LLM provider abstraction: Abstract base class pattern with interchangeable OpenAI, Anthropic, Google, and Ollama backends
  • Async streaming generation: FastAPI SSE streaming with AsyncGenerator for token-by-token response delivery
  • Prompt engineering: Structured system prompt with context injection, source attribution format, and persona constraints for a spiritual guidance persona
  • Citation extraction: Regex-based post-processing to parse and surface inline [Source: Title] citations from LLM output
  • Vector database management: ChromaDB schema design with category/book/path metadata for filtered retrieval
  • NLP data pipeline: Obsidian [[WikiLink]] graph preservation, Unicode-safe Markdown parsing, batch embedding with progress tracking
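The provider abstraction and streaming bullets above can be illustrated together. This is a sketch under stated assumptions: `LLMProvider`, `EchoProvider`, `sse_event`, and the JSON payload shape are hypothetical names chosen for the example, and the stand-in provider echoes the prompt instead of calling a real model.

```python
import asyncio
import json
from abc import ABC, abstractmethod
from typing import AsyncGenerator


class LLMProvider(ABC):
    """Common interface that each backend (OpenAI, Ollama, ...) would implement."""

    @abstractmethod
    def stream(self, prompt: str) -> AsyncGenerator[str, None]:
        """Yield the response token by token."""


class EchoProvider(LLMProvider):
    """Stand-in provider that streams the prompt back one word at a time."""

    async def stream(self, prompt: str) -> AsyncGenerator[str, None]:
        for word in prompt.split():
            await asyncio.sleep(0)  # yield control, as a real network call would
            yield word + " "


def sse_event(data: dict) -> str:
    """Format one Server-Sent Events frame: a data line followed by a blank line."""
    return f"data: {json.dumps(data)}\n\n"


async def sse_stream(provider: LLMProvider, prompt: str) -> AsyncGenerator[str, None]:
    """Wrap any provider's token stream in SSE frames, ending with a done marker."""
    async for token in provider.stream(prompt):
        yield sse_event({"token": token})
    yield sse_event({"done": True})


async def collect(provider: LLMProvider, prompt: str) -> list[str]:
    """Gather all frames into a list (useful for testing)."""
    return [frame async for frame in sse_stream(provider, prompt)]


frames = asyncio.run(collect(EchoProvider(), "be here now"))
```

In a FastAPI route, a generator like `sse_stream` would typically be passed to `StreamingResponse` with `media_type="text/event-stream"`; swapping providers then requires no change to the route itself.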

Getting Started

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Ollama (optional, for local LLM)
  • OpenAI API key (for GPT-4 Turbo)

Backend Setup

# Clone the repository
git clone https://github.com/FrancescoCavina02/Spiritual-chatbot.git
cd Spiritual-chatbot

# Create virtual environment
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp env.example .env
# Edit .env and add your OPENAI_API_KEY

# Start the API server
uvicorn app.main:app --reload --port 8000

Load Your Knowledge Base

# Set your Obsidian vault path
export OBSIDIAN_VAULT_PATH=/path/to/your/obsidian/vault

# Run the ingestion pipeline (one-time setup)
python scripts/ingest_notes.py

# Load embeddings into ChromaDB
python scripts/load_chromadb.py

Frontend Setup

cd frontend
npm install

# Configure API URL
cp env.example .env.local
# Set NEXT_PUBLIC_API_URL=http://localhost:8000

npm run dev
# Open http://localhost:3000

Docker (Full Stack)

# Copy and configure environment
cp docker-compose.env.example .env
# Edit .env with your API keys and vault path

docker-compose up --build

Deployment

Full deployment guide: docs/deployment.md

| Component | Platform | URL |
| --- | --- | --- |
| Frontend | Netlify | https://spiritualchatbot1.netlify.app |
| Backend API | Railway | https://spiritual-chatbot-api.onrender.com/api |
| API Docs (Swagger) | Railway | https://spiritual-chatbot-api.onrender.com/docs |

Architecture: The Next.js frontend is deployed to Vercel's edge network. The FastAPI backend (with pre-seeded ChromaDB embeddings) runs on Railway with a persistent volume mount for the vector database. Environment variables are configured via each platform's dashboard.


Repository Structure

├── backend/
│   ├── app/
│   │   ├── api/          # FastAPI route handlers (chat, search, notes, tree)
│   │   ├── models/       # Pydantic request/response schemas
│   │   ├── services/     # Core services (RAG engine, embedding, LLM, ChromaDB)
│   │   └── main.py       # Application entry point & lifespan management
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── app/              # Next.js 14 App Router pages
│   ├── components/       # Chat, layout, and notes UI components
│   ├── lib/              # API client, storage, markdown utilities
│   └── hooks/            # Custom React hooks (useChat)
├── scripts/
│   ├── ingest_notes.py   # Stage 1–3: Parse → Chunk → Embed
│   └── load_chromadb.py  # Stage 4: Load embeddings into ChromaDB
├── docs/
│   ├── architecture.md   # Full system architecture
│   ├── rag-pipeline.md   # RAG pipeline deep-dive
│   ├── evaluation.md     # Model evaluation & benchmarks
│   └── deployment.md     # Production deployment guide
├── data/
│   ├── raw/              # Source Obsidian notes (gitignored)
│   ├── processed/        # Parsed & chunked JSON (gitignored)
│   └── embeddings/       # ChromaDB persistent store (gitignored)
├── docker-compose.yml
└── railway.toml

License

MIT License. See LICENSE for details.


Built by Francesco Cavina · Powered by RAG + GPT-4 Turbo + ChromaDB
GitHub · Live Demo · Portfolio
