This repository documents my technical evolution from writing basic LLM prompts to engineering a production-ready Retrieval-Augmented Generation (RAG) Proof of Concept (PoC). Each folder and notebook represents a critical milestone in mastering LlamaIndex, open-source model deployment, and intelligent document orchestration.
The foundational concepts for building a fully functional RAG application:
- RAG Fundamentals: Gain a deep understanding of how RAG pipelines work and the core concepts of information retrieval and processing in the context of LLMs.
- LlamaIndex Mastery: Learn to effectively use LlamaIndex for organizing, indexing, and efficiently searching through large, unstructured document sets.
- Model Integration: Practice integrating open-source Large Language Models (LLMs) into the RAG workflow.
- Practical Application: Build a simple, functional chatbot that uses the RAG pipeline to retrieve and synthesize relevant data for accurate, context-aware responses.
Focus: Establishing basic interaction, initial RAG logic, and environment setup.
- Simple Chatbot with LlamaIndex Colab Notebook
  - Core Logic: Configuring the LLM engine (LlamaIndex + Gemini) for basic chat loops.
  - Review Demo: WATCH HERE
- Build And Optimize A RAG Pipeline For Document Retrieval
  - Core Logic: Moving from simple chat to basic document-grounded answers using local data.
  - Review Data: HERE
Focus: Optimizing how data is processed, stored, and retrieved.
- RAG Optimization: Implementing Chunking & Embeddings in LlamaIndex and Gemini
  - Core Logic: Evaluating how various chunk sizes and embedding strategies impact accuracy.
- Comparing Open-Source Embedding Models for RAG
  - Core Logic: Systematic comparison of open-source vs. proprietary embeddings for domain-specific tasks.
  - Review Data: HERE
- Advanced PDF Retrieval and Optimization with LlamaIndex
  - Core Logic: Implementing query expansion and hybrid search for complex PDF layouts.
  - Review Data: HERE
- Optimized RAG Pipeline with Interactive RAG Chatbot for Document Retrieval
  - Core Logic: Integrating PyMuPDF and HuggingFace embeddings for high-speed retrieval.
  - Review Data: HERE
  - Review Demo: WATCH HERE
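
To make the chunk-size experiments concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is an illustration only, not the code from the notebooks: the `chunk_text` helper, the word-based splitting, and the sample text are all assumptions, and LlamaIndex's own splitters work on tokens and sentences rather than raw words.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical sample text: 100 numbered words standing in for a document.
sample = " ".join(f"w{i}" for i in range(100))

# Compare how many chunks each (chunk_size, overlap) setting produces.
chunk_counts = {
    (size, overlap): len(chunk_text(sample, size, overlap))
    for size, overlap in [(50, 0), (50, 10), (20, 5)]
}
print(chunk_counts)
```

Smaller chunks and larger overlaps both increase the number of chunks to embed and search, which is the trade-off the chunking experiments measure against answer accuracy.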
Focus: Managing multi-document "blobs," metadata, and local open-source deployment.
- RAG with Open-Source Model: Mistral 7B
  - Core Logic: Transitioning to self-contained, local GGUF models on GPU to ensure data privacy.
  - Review Data: HERE
- Core Logic: Comparative accuracy testing across Gemini, Mistral, Phi-2, and TinyLlama.
  - Review Data: HERE
- Designing A Page-Level Detection Strategy Using RAG
  - Core Logic: Developing the logic to separate and classify different documents inside a single PDF.
  - Review Data: HERE
- Tagging Chunks with Metadata in LlamaIndex
  - Core Logic: Enhancing retrieval precision through document attributes (page number, doc type).
  - Review Data: HERE
- Core Logic: Building a "Router" to automatically direct questions to specific document types.
  - Review Data:
- End-to-End RAG Pipeline with Page-Level
  - Core Logic: Finalizing the back-end architecture for multi-document stream processing.
  - Review Data: HERE
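
The metadata tagging and routing ideas in this phase can be sketched together in plain Python. This is a simplified stand-in rather than the notebooks' implementation: the `Chunk` class, the `ROUTES` keyword table, and the sample pages are hypothetical, each page is assumed to be classified already, and a keyword-overlap rule replaces the LLM-based semantic routing used in the project.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical routing table: keywords that hint at each document type.
ROUTES = {
    "invoice": {"invoice", "amount", "due", "payment"},
    "contract": {"contract", "clause", "termination", "party"},
}


def tag_pages(pages: list[tuple[str, str]]) -> list[Chunk]:
    """Turn (doc_type, page_text) pairs into chunks tagged with page number and doc type."""
    return [
        Chunk(text=text, metadata={"page": number, "doc_type": doc_type})
        for number, (doc_type, text) in enumerate(pages, start=1)
    ]


def route(question: str) -> Optional[str]:
    """Pick the doc type whose keywords overlap the question most; None if nothing overlaps."""
    words = set(question.lower().replace("?", " ").split())
    best = max(ROUTES, key=lambda doc_type: len(ROUTES[doc_type] & words))
    return best if ROUTES[best] & words else None


def retrieve(question: str, chunks: list[Chunk]) -> list[Chunk]:
    """Keep only chunks whose doc_type matches the routed type (all chunks if unrouted)."""
    doc_type = route(question)
    if doc_type is None:
        return chunks
    return [c for c in chunks if c.metadata["doc_type"] == doc_type]


# Hypothetical pages from a single multi-document PDF.
pages = [
    ("invoice", "Invoice 001: total amount due is 1,200 USD."),
    ("contract", "Clause 4: either party may request termination."),
    ("invoice", "Payment is due within 30 days."),
]
chunks = tag_pages(pages)
hits = retrieve("What is the amount due on the invoice?", chunks)
print([c.metadata for c in hits])
```

Filtering on `doc_type` before retrieval is what keeps chunks from unrelated documents out of the context, and the `page` tag is what later allows a response to cite its source page.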
Focus: Creating user-friendly interfaces for research and deployment.
- Core Logic: Learning to map Python functions to web-based UI components.
- Gradio Chatbot with Lite RAG Implementation
  - Core Logic: Creating a minimalist workspace for high-speed indexing and keyword-based retrieval.
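
As a rough illustration of mapping a Python function to a web UI, the sketch below pairs a toy keyword-based retriever with a Gradio `Interface`. The `DOCUMENTS` corpus, the `answer` function, and the `build_demo` helper are hypothetical names chosen for this example, and the term-overlap scoring is a deliberately simple assumption rather than the indexing used in the notebook.

```python
import re

# Hypothetical in-memory corpus standing in for indexed documents.
DOCUMENTS = {
    "leave_policy": "Employees accrue paid leave every month of service.",
    "expense_policy": "Submit expense reports with receipts within 30 days.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer(question: str) -> str:
    """Return the document that shares the most terms with the question."""
    query_terms = tokenize(question)
    scores = {name: len(query_terms & tokenize(body)) for name, body in DOCUMENTS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return "No matching document found."
    return f"[{best}] {DOCUMENTS[best]}"


def build_demo():
    """Wrap `answer` in a text-in, text-out Gradio interface (needs `pip install gradio`)."""
    import gradio as gr  # imported lazily so `answer` works without Gradio installed

    return gr.Interface(fn=answer, inputs="text", outputs="text", title="Lite RAG sketch")


print(answer("How do I submit expense receipts?"))
```

Calling `build_demo().launch()` would start a local web page with one text box wired to `answer`; the function itself stays an ordinary Python callable that can be tested without the UI.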
Focus: Integrating all modules into a unified, enterprise-grade platform.
- Full RAG Pipeline with Interactive Gradio Chatbot
  - View Presentation: HERE
  - Core Logic: Merging the high-performance retrieval back-end with the interactive Gradio front-end.
- POC - AI-Powered Document Automation Platform
  - A complete system featuring parallel ingestion, computer vision preprocessing, and semantic routing for complex document portfolios.
  - POC Presentation: HERE
  - View Web-based POC: HERE
  - To see additional enhancements and the final deliverable for the AI-Powered Document Intelligence Automation Platform, view the repo: AI-Powered Document Automation Platform
- Orchestration: LlamaIndex
- Models: Google Gemini, Mistral 7B (Local), Phi-2, TinyLlama
- Vector DB/Indices: FAISS, VectorStoreIndex
- Embeddings: BGE, HuggingFace Transformers
- UI/UX: Gradio 5.x
- Processing: PyMuPDF, OpenCV, Multithreading
| Feature | Technical Solution | Benefit |
|---|---|---|
| Context Isolation | Semantic Routing via Gemini | Prevents "Context Contamination" from irrelevant docs. |
| High-Speed Ingestion | Parallel ThreadPool Processing | 5x faster parsing of 100+ page documents. |
| Source Trust | Metadata Tagging & Citations | Real-time source badges for every AI response. |
| Open-Source Ready | GGUF & LlamaCPP Integration | Zero-cost, private deployment options. |
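
The "High-Speed Ingestion" row can be illustrated with a small thread-pool sketch from the Python standard library. The `parse_page` and `ingest` functions and the sample pages are assumptions made for this example; in the real pipeline the per-page work would be PDF text extraction or image preprocessing, and any speedup depends on how much of that work is I/O-bound or releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_page(page: tuple[int, str]) -> dict:
    """Stand-in for per-page work such as text extraction or image cleanup."""
    number, raw_text = page
    return {"page": number, "text": raw_text.strip().lower()}


def ingest(pages: list[tuple[int, str]], max_workers: int = 4) -> list[dict]:
    """Parse pages concurrently; executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(parse_page, pages))


# Hypothetical five-page document.
pages = [(n, f"  Page {n} CONTENT ") for n in range(1, 6)]
parsed = ingest(pages)
print([p["page"] for p in parsed])
```

Because `executor.map` preserves input order, page numbers stay aligned with their text even though pages may finish parsing in a different order.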