ombharatiya/ai-system-design-guide


🧠 AI System Design Guide

The Complete Interview & Production Reference

GitHub Twitter LinkedIn


The living reference for production AI systems. Continuously updated. Interview-ready depth.


📚 Quick Navigation

| I want to... | Start here |
| --- | --- |
| Prepare for interviews | Question Bank, Answer Frameworks |
| Learn AI systems fast | LLM Internals, RAG Fundamentals |
| Build production RAG | Chunking, Vector DBs, Reranking, Production RAG |
| Advanced retrieval | Contextual Retrieval, ColBERT, Multi-modal RAG |
| Design multi-tenant AI | Isolation Patterns, Case Study |
| Build agents | Agent Fundamentals, MCP & A2A, LangGraph |
| Tool-use & computer agents | Landscape, OpenClaw, Safety |
| Autonomous coding agents | Claude Code, OpenCoder Landscape |
| Pick the right model (2026) | Model Taxonomy, Pricing |
| Evaluate AI in production | AI Evals Guide (Phoenix/Langfuse), AI Evals Guide (LangWatch/Langfuse) |
| Find the best courses to learn AI | Recommended Courses & Learning Paths |
| Transition from my current role to AI | Role Transition Guide |

🎯 Why This Guide

Traditional books are outdated before they ship. This is a living document: it is updated when new models are released and when patterns evolve.

| This Guide | Printed Books |
| --- | --- |
| April 2026 models (Claude Opus 4.6, GPT-5.4, Gemini 3.1, Llama 4, Grok 4) | Stuck on GPT-4 |
| MCP 2.0, A2A protocol, OpenClaw, Computer Use, Agentic RAG, ColBERT | Does not exist |
| Real pricing with verification dates | Already wrong |
| Staff-level interview Q&A | Generic questions |

📖 Guide Structure

├── 00-interview-prep/           # Questions, frameworks, exercises
├── 01-foundations/              # Transformers, attention, embeddings
├── 02-model-landscape/          # Claude Opus 4.6, GPT-5.4, Gemini 3.1, Llama 4, Grok 4
├── 03-training-and-adaptation/  # Fine-tuning, LoRA, DPO, distillation
├── 04-inference-optimization/   # KV cache, PagedAttention, vLLM
├── 05-prompting-and-context/    # CoT, Extended Thinking, DSPy, prompt injection
├── 06-retrieval-systems/        # RAG, chunking, GraphRAG, Agentic RAG, ColBERT, Contextual Retrieval
├── 07-agentic-systems/          # MCP 2.0, A2A protocol, multi-agent, computer-use
├── 08-memory-and-state/         # L1-L3 memory tiers, Mem0, caching
├── 09-frameworks-and-tools/     # LangGraph, DSPy, LlamaIndex, Claude Code, OpenCoder
├── 10-document-processing/      # Vision-LLM OCR, multimodal parsing
├── 11-infrastructure-and-mlops/ # GPU clusters, LLMOps, cost management
├── 12-security-and-access/      # RBAC, ABAC, multi-tenant isolation
├── 13-reliability-and-safety/   # Guardrails, red-teaming
├── 14-evaluation-and-observability/ # RAGAS, LangSmith, drift detection
├── 15-ai-design-patterns/       # Pattern catalog, anti-patterns
├── 16-case-studies/             # Real-world architectures with diagrams
├── 17-tool-use-and-computer-agents/ # OpenClaw, Computer Use, tool agents, safety
├── GLOSSARY.md                  # Every term defined
│
├── ai_evals_comprehensive_study_guide.md      # 🔬 Deep-dive: AI Evals (Phoenix + Langfuse)
├── ai_evals_complete_guide_langwatch_langfuse.md  # 🔬 Deep-dive: AI Evals (LangWatch + Langfuse)
├── COURSES.md                   # 🎓 Recommended courses & learning paths
└── TRANSITION_GUIDE.md          # 🔄 Transition from Backend/QA/PM/EM to AI roles

🔥 Featured Case Studies

Real interview problems with complete solutions and diagrams:

| Case Study | Problem | Key Patterns |
| --- | --- | --- |
| Real-Time Search | 5-minute data freshness at scale | Streaming + Hybrid Search |
| Coding Agent | Autonomous multi-file changes | Sandboxing + Self-Correction |
| Multi-Tenant SaaS | Coca-Cola and Pepsi on same infra | Defense-in-Depth Isolation |
| Customer Support | 60% auto-resolution rate | Tiered Routing + Escalation |
| Document Intelligence | 50K contracts/month extraction | Vision-LLM + Parallel Extractors |
| Recommendation Engine | Personalized explanations at 50M users | ML Ranking + LLM Explanations |
| Compliance Automation | FDA regulation pre-screening | Claim Extraction + Precedent DB |
| Voice Healthcare | Real-time clinical note generation | On-Prem ASR + HIPAA |
| Fraud Detection | 100ms decision with explainability | ML + Rules Hybrid |
| Knowledge Management | 2M docs with access control | Permission-Aware RAG |

🔬 Bonus Deep-Dive Guides

Two companion guides (3,000+ lines each) cover AI evaluation end to end, written for engineers, PMs, and QA engineers:

| Guide | Platforms Covered | What's Inside |
| --- | --- | --- |
| AI Evals: Comprehensive Study Guide | Arize Phoenix + Langfuse | LLM-as-a-Judge, RAG eval, multi-turn eval, production safety, statistical correction with judgy, 30-day learning path |
| AI Evals: LangWatch + Langfuse Guide | LangWatch + Langfuse | Same syllabus with LangWatch's 40+ built-in evaluators, side-by-side platform comparisons, platform choice guidance |

Topics covered across both guides:

  • Tracing and observability setup (Phoenix, LangWatch, Langfuse)
  • Error analysis: open coding → axial coding → failure mode taxonomy
  • Building LLM judges with Train/Dev/Test split and ground truth calibration
  • Code-based evaluators (regex, JSON schema, format validators)
  • RAG-specific evals: faithfulness, context recall, answer relevance
  • Multi-step pipeline evaluation and multi-turn conversation eval
  • Production guardrails, safety monitoring, real-time drift detection
  • Statistical correction with the judgy library
  • Human annotation best practices and inter-rater reliability
  • Cost/latency optimization for eval pipelines at scale
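
To make the "code-based evaluators" item above concrete, here is a minimal sketch in plain Python using only the standard library. It is not taken from either guide: the function names, the `ORD-` order ID pattern, the required keys, and the 50-word limit are all hypothetical choices for illustration.

```python
import json
import re


def eval_regex(output: str, pattern: str) -> bool:
    """Pass if the output contains at least one match for the pattern."""
    return re.search(pattern, output) is not None


def eval_json_keys(output: str, required_keys: set) -> bool:
    """Pass if the output parses as a JSON object containing every required key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data)


def eval_max_words(output: str, limit: int) -> bool:
    """Pass if the output has at most `limit` whitespace-separated words."""
    return len(output.split()) <= limit


def run_evaluators(output: str) -> dict:
    """Run every check and return a name -> pass/fail mapping.

    The pattern, keys, and limit below are hypothetical examples.
    """
    return {
        "has_order_id": eval_regex(output, r"ORD-\d{6}"),
        "valid_json": eval_json_keys(output, {"order_id", "status"}),
        "short_enough": eval_max_words(output, 50),
    }
```

Checks like these are deterministic and cheap, so they are usually run before any LLM judge; a model output that fails a format validator rarely needs a more expensive evaluation.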

🎓 For Interview Prep

AI system design interviews ask questions like:

"Design a multi-tenant RAG system where competitors cannot see each other's data."

"Your agent takes 15 steps for a 3-step task. How do you debug it?"

This guide gives you concrete patterns, real tradeoffs, and production failure modes: the depth interviewers expect at senior levels.
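
As one illustration of the first question, the sketch below shows a single isolation layer: partitioning an in-memory index by tenant so that a query only ever scans its own tenant's chunks. It is an assumption-laden toy, not the guide's reference design. `Chunk`, `TenantScopedIndex`, and the two-dimensional embeddings are hypothetical, and a production system would add further layers such as authenticated tenant identity and storage-level controls.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: tuple


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TenantScopedIndex:
    """Hypothetical in-memory index with one partition per tenant."""

    def __init__(self):
        self._by_tenant = {}

    def add(self, chunk: Chunk) -> None:
        self._by_tenant.setdefault(chunk.tenant_id, []).append(chunk)

    def search(self, tenant_id: str, query_embedding, k: int = 3) -> list:
        # Select the tenant's partition before ranking, so chunks from other
        # tenants are never scored or returned, even if they are more similar.
        candidates = self._by_tenant.get(tenant_id, [])
        ranked = sorted(
            candidates,
            key=lambda c: cosine(c.embedding, query_embedding),
            reverse=True,
        )
        return ranked[:k]
```

The design point worth stating in an interview is that the tenant filter runs before similarity ranking rather than after it; filtering the top-k results afterwards can both leak data through bugs and return fewer than k results.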

➡️ Start with Interview Prep


🔄 Living Book

This guide tracks:

  • New model releases and real-world performance
  • Emerging patterns (MCP, Agentic RAG, Flow Engineering)
  • Updated pricing and rate limits
  • Deprecations and best practice changes

⭐ Star and Watch to get notified when updates are pushed.


🤝 Contributing

Found outdated info? Have production experience to share? PRs welcome. See Contributing Guide.


📄 License

MIT License. See LICENSE.


Built by Om Bharatiya

Last updated: April 2026