Skip to content

robcost/nvidia-stack-tutor-guide

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NVIDIA Stack Tutor — Workshop Guide

A hands-on learning project that builds a working agentic RAG chat app on the NVIDIA AI stack, phase by phase. Each phase ends with a working checkpoint. By the end of Phase 7 you have a complete, locally-hosted application running against NVIDIA's hosted endpoints. Phases 8–11 are stretch goals.

The corpus you ingest is the NVIDIA developer docs themselves — so by the end you have an agent, built on NIM, that knows the NVIDIA stack and can explain it.

No local GPU required for Phases 0–9. All inference runs against hosted endpoints. Total cost for a working build is typically <$20 including pay-as-you-go after free credits.

What you'll build

A chat application that:

  • Accepts natural-language questions about the NVIDIA stack through a Next.js chat UI
  • Streams responses token-by-token from a hosted NIM chat model
  • Retrieves relevant chunks from a local Chroma store of NVIDIA documentation, embedded with llama-3.2-nv-embedqa-1b-v2 and reranked with llama-3.2-nv-rerankqa-1b-v2
  • Enforces topic scope with NeMo Guardrails (Colang 2.x) — a two-tier input rail (deterministic keyword regex + LLM-backed semantic intent) and a self_check_output policy gate
  • Runs a multi-turn agent loop with tool use (local RAG, optional web search)
  • Optionally exposes a second agent backend powered by NeMo Agent Toolkit, selectable from the same UI via a dropdown, backed by the same Chroma corpus via a NAT plugin

Prerequisites

Tool Install
Python 3.12 via uvcurl -LsSf https://astral.sh/uv/install.sh | sh
Node 22+ and pnpm brew install node pnpm (or equivalent)
Xcode CLT (macOS only) xcode-select --install
Docker Desktop Only for Phase 9 (Phoenix) and Phase 10 (self-host NIM)

Accounts:

  • NVIDIA Developer Program — required. Sign in at build.nvidia.com, open any model card, "Get API Key". Keys start with nvapi-.
  • Tavily — optional, for the web-search fallback tool. Free tier ~1,000 searches/month at tavily.com.

How to use this repo

  1. Clone this repo.
  2. Copy .env.example to .env and fill in your NVIDIA_API_KEY.
  3. Follow the phases in order. Each phase doc is self-contained: a goal, step-by-step instructions with code blocks, and a checkpoint to prove it works.
  4. Write the code yourself. If you get stuck, check your work against the matching file in solutions/.

The backend/ and frontend/ directories start nearly empty — you scaffold them yourself following the phase instructions. The only pre-populated content is the corpus fetch script.

Phases

Core (Phases 0–7) — a working end-to-end agent

Phase Focus What you build
Phase 0 Accounts & first API call Prove the laptop can talk to NIM
Phase 1 Project skeleton backend/ (uv) + frontend/ (Next.js)
Phase 2 Chroma vector store Persistent embedded client, store.py
Phase 3 Ingest a corpus llm.py (NIM client) + ingest.py (chunk → embed → upsert)
Phase 4 Retrieval pipeline retrieval.py (embed query → ANN search → rerank → top-N)
Phase 5 Hand-rolled agent loop agent.py (multi-turn tool use) + api.py (FastAPI)
Phase 6 NeMo Guardrails Colang 2.x, rails-as-gates, two-tier topic check
Phase 7 Next.js chat UI page.tsx + chat-client.ts → streaming chat

Stretch (Phases 8–11)

Phase Focus What you build
Phase 8 NeMo Agent Toolkit NAT plugin wrapping Chroma, /chat-nat proxy, UI dropdown
Phase 9 Phoenix observability OTLP traces with OpenInference LLM-semantic spans
Phase 10 Self-host NIM NIM container on a rented GPU VM
Phase 11 PersonaPlex voice Full-duplex speech-to-speech interface

Architecture

Browser :3000
  └─ Next.js UI with backend dropdown
        │ HTTP/SSE
        ▼
FastAPI :8000
  ├─ POST /chat      → input rail → hand-rolled agent → tools → output rail
  ├─ POST /chat-nat  → input rail → proxy to nat serve → output rail
  ├─ agent.py        hand-rolled tool-use loop (max 6 steps)
  ├─ runtime.py      check_input_rail + check_output_rail (Colang 2.x)
  ├─ nat_plugin.py   search_nvidia_docs NAT tool wrapping retrieve()
  └─ Chroma (embedded) ./chroma_db/

nat serve :8001 (Phase 8)
  ├─ tool_calling_agent (Llama 3.3 70B)
  └─ tools: search_nvidia_docs (plugin), tavily_internet_search, current_datetime

Phoenix :6006 (Phase 9, optional)  ◄─── OTLP ─── FastAPI + NAT

NVIDIA hosted endpoints
  ├─ integrate.api.nvidia.com/v1  chat + embeddings
  └─ ai.api.nvidia.com/v1         reranking

Repo layout

nvidia-stack-tutor-guide/
├── README.md                     # this file
├── .env.example
├── .gitignore
├── backend/
│   └── data/
│       └── corpus/
│           └── fetch-corpus.sh   # curl commands for starter docs
├── frontend/
│   └── .gitkeep                  # empty — learner scaffolds it
├── phases/                       # one focused doc per phase
│   ├── phase-0-hello-nim.md
│   ├── phase-1-skeleton.md
│   ├── …
│   └── phase-11-personaplex.md
└── solutions/                    # complete solution files per phase
    ├── README.md
    ├── phase-2-store.py
    ├── phase-3-llm.py
    └── …

Companion reference repo

The complete working application (all phases committed, ready to run) lives in the companion repo: nvidia-stack-tutor. Clone that if you want to skip to a running app; use this repo if you want to learn by building.

Cost notes

  • Free tier: 1,000 inference credits on a new NVIDIA Developer Program account.
  • Phases 0–7: ~$5–10 per week of active iteration after free credits.
  • GPU rental for Phases 10–11: budget 6 hours of Lambda Labs H100 (~$20).

License

The guide and solution code are provided as learning material. The seed corpus under backend/data/corpus/ is Apache 2.0 content from the NVIDIA-AI-Blueprints and NVIDIA-NeMo GitHub organisations.

About

Workshop guide for building an agentic RAG chat app on the NVIDIA AI stack — phase-by-phase instructions, solution files, and checkpoints. Companion to nvidia-stack-tutor.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors