The living reference for production AI systems. Continuously updated. Interview-ready depth.
| I want to... | Start here |
|---|---|
| Prepare for interviews | Question Bank → Answer Frameworks |
| Learn AI systems fast | LLM Internals → RAG Fundamentals |
| Build production RAG | Chunking → Vector DBs → Reranking → Production RAG |
| Advanced retrieval | Contextual Retrieval → ColBERT → Multi-modal RAG |
| Design multi-tenant AI | Isolation Patterns → Case Study |
| Build agents | Agent Fundamentals → MCP & A2A → LangGraph |
| Tool-use & computer agents | Landscape → OpenClaw → Safety |
| Autonomous coding agents | Claude Code → OpenCoder Landscape |
| Pick the right model (2026) | Model Taxonomy → Pricing |
| Evaluate AI in production | AI Evals Guide (Phoenix/Langfuse) → AI Evals Guide (LangWatch/Langfuse) |
| Find the best courses to learn AI | Recommended Courses & Learning Paths |
| Transition from my current role to AI | Role Transition Guide |
Traditional books are outdated before they ship. This is a living document: when new models release and patterns evolve, it updates.
| This Guide | Printed Books |
|---|---|
| April 2026 models (Claude Opus 4.6, GPT-5.4, Gemini 3.1, Llama 4, Grok 4) | Stuck on GPT-4 |
| MCP 2.0, A2A protocol, OpenClaw, Computer Use, Agentic RAG, ColBERT | Not covered |
| Real pricing with verification dates | Already wrong |
| Staff-level interview Q&A | Generic questions |
├── 00-interview-prep/ # Questions, frameworks, exercises
├── 01-foundations/ # Transformers, attention, embeddings
├── 02-model-landscape/ # Claude Opus 4.6, GPT-5.4, Gemini 3.1, Llama 4, Grok 4
├── 03-training-and-adaptation/ # Fine-tuning, LoRA, DPO, distillation
├── 04-inference-optimization/ # KV cache, PagedAttention, vLLM
├── 05-prompting-and-context/ # CoT, Extended Thinking, DSPy, prompt injection
├── 06-retrieval-systems/ # RAG, chunking, GraphRAG, Agentic RAG, ColBERT, Contextual Retrieval
├── 07-agentic-systems/ # MCP 2.0, A2A protocol, multi-agent, computer-use
├── 08-memory-and-state/ # L1-L3 memory tiers, Mem0, caching
├── 09-frameworks-and-tools/ # LangGraph, DSPy, LlamaIndex, Claude Code, OpenCoder
├── 10-document-processing/ # Vision-LLM OCR, multimodal parsing
├── 11-infrastructure-and-mlops/ # GPU clusters, LLMOps, cost management
├── 12-security-and-access/ # RBAC, ABAC, multi-tenant isolation
├── 13-reliability-and-safety/ # Guardrails, red-teaming
├── 14-evaluation-and-observability/ # RAGAS, LangSmith, drift detection
├── 15-ai-design-patterns/ # Pattern catalog, anti-patterns
├── 16-case-studies/           # Real-world architectures with diagrams
├── 17-tool-use-and-computer-agents/ # OpenClaw, Computer Use, tool agents, safety
├── GLOSSARY.md # Every term defined
│
├── ai_evals_comprehensive_study_guide.md # 🔬 Deep-dive: AI Evals (Phoenix + Langfuse)
├── ai_evals_complete_guide_langwatch_langfuse.md # 🔬 Deep-dive: AI Evals (LangWatch + Langfuse)
├── COURSES.md                 # 🎓 Recommended courses & learning paths
└── TRANSITION_GUIDE.md        # 🔄 Transition from Backend/QA/PM/EM to AI roles
Real interview problems with complete solutions and diagrams:
| Case Study | Problem | Key Patterns |
|---|---|---|
| Real-Time Search | 5-minute data freshness at scale | Streaming + Hybrid Search |
| Coding Agent | Autonomous multi-file changes | Sandboxing + Self-Correction |
| Multi-Tenant SaaS | Coca-Cola and Pepsi on same infra | Defense-in-Depth Isolation |
| Customer Support | 60% auto-resolution rate | Tiered Routing + Escalation |
| Document Intelligence | 50K contracts/month extraction | Vision-LLM + Parallel Extractors |
| Recommendation Engine | Personalized explanations at 50M users | ML Ranking + LLM Explanations |
| Compliance Automation | FDA regulation pre-screening | Claim Extraction + Precedent DB |
| Voice Healthcare | Real-time clinical note generation | On-Prem ASR + HIPAA |
| Fraud Detection | 100ms decision with explainability | ML + Rules Hybrid |
| Knowledge Management | 2M docs with access control | Permission-Aware RAG |
Two companion guides (3,000+ lines each) covering AI evaluation end-to-end — for Engineers, PMs, and QAs:
| Guide | Platforms Covered | What's Inside |
|---|---|---|
| AI Evals: Comprehensive Study Guide | Arize Phoenix + Langfuse | LLM-as-a-Judge, RAG eval, multi-turn eval, production safety, statistical correction with judgy, 30-day learning path |
| AI Evals: LangWatch + Langfuse Guide | LangWatch + Langfuse | Same syllabus with LangWatch's 40+ built-in evaluators, side-by-side platform comparisons, platform choice guidance |
Topics covered across both guides:
- Tracing and observability setup (Phoenix, LangWatch, Langfuse)
- Error analysis: open coding → axial coding → failure mode taxonomy
- Building LLM judges with Train/Dev/Test split and ground truth calibration
- Code-based evaluators (regex, JSON schema, format validators)
- RAG-specific evals: faithfulness, context recall, answer relevance
- Multi-step pipeline evaluation and multi-turn conversation eval
- Production guardrails, safety monitoring, real-time drift detection
- Statistical correction with the `judgy` library
- Human annotation best practices and inter-rater reliability
- Cost/latency optimization for eval pipelines at scale
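To make the "code-based evaluators" item above concrete, here is a minimal sketch of the pattern: deterministic checks (a regex filter plus JSON-structure validation) that run before any LLM-as-a-Judge step. The function name, the forbidden phrase, and the required keys are illustrative assumptions, not an API from any of the guides.

```python
import json
import re

def evaluate_response(text: str) -> dict:
    """Illustrative code-based evaluator: regex check + JSON format check.
    Cheap, deterministic checks like these typically gate expensive
    LLM-judge evals. All names and rules here are hypothetical."""
    results = {}
    # Regex check: flag boilerplate phrasing we never want in output
    results["no_boilerplate"] = re.search(r"\bas an AI\b", text, re.I) is None
    # Format check: output must parse as JSON and contain required keys
    try:
        payload = json.loads(text)
        results["valid_json"] = True
        results["has_required_keys"] = {"answer", "sources"} <= payload.keys()
    except json.JSONDecodeError:
        results["valid_json"] = False
        results["has_required_keys"] = False
    results["passed"] = all(results.values())
    return results
```

Because these checks are pure functions, they can run on every production trace at negligible cost, reserving LLM judges for the samples that pass.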
AI system design interviews ask questions like:
"Design a multi-tenant RAG system where competitors cannot see each other's data."
"Your agent takes 15 steps for a 3-step task. How do you debug it?"
This guide gives you concrete patterns, real tradeoffs, and production failure modes: the depth interviewers expect at senior levels.
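For the multi-tenant question above, one core pattern is filtering by tenant *before* similarity ranking, so a cross-tenant chunk can never surface no matter how well it scores. A minimal sketch under assumed names (`Chunk`, precomputed `score` from a vector index), not the guide's reference implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    text: str
    tenant_id: str
    score: float  # similarity score, assumed precomputed by a vector index

def tenant_scoped_topk(chunks: list[Chunk], tenant_id: str, k: int = 3) -> list[Chunk]:
    """First layer of defense-in-depth isolation: hard-filter on tenant_id
    before ranking. Ranking after filtering means another tenant's data is
    excluded structurally, not just outranked."""
    candidates = [c for c in chunks if c.tenant_id == tenant_id]
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:k]
```

In production this filter usually lives in the vector database's metadata query (or in per-tenant namespaces/indexes) rather than application code, with additional layers such as row-level security and post-retrieval audits on top.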
➡️ Start with Interview Prep
This guide tracks:
- New model releases and real-world performance
- Emerging patterns (MCP, Agentic RAG, Flow Engineering)
- Updated pricing and rate limits
- Deprecations and best practice changes
⭐ Star and Watch to get notified when updates are pushed.
Found outdated info? Have production experience to share? PRs welcome. See Contributing Guide.
MIT License. See LICENSE.
Built by Om Bharatiya
Last updated: April 2026