A hands-on learning project that builds a working agentic RAG chat app on the NVIDIA AI stack, phase by phase. Each phase ends with a working checkpoint. By the end of Phase 7 you have a complete, locally-hosted application running against NVIDIA's hosted endpoints. Phases 8–11 are stretch goals.
The corpus you ingest is the NVIDIA developer docs themselves — so by the end you have an agent, built on NIM, that knows the NVIDIA stack and can explain it.
No local GPU required for Phases 0–9; all inference runs against hosted endpoints. Total cost for a working build is typically under $20, including pay-as-you-go usage after the free credits run out.
A chat application that:
- Accepts natural-language questions about the NVIDIA stack through a Next.js chat UI
- Streams responses token-by-token from a hosted NIM chat model
- Retrieves relevant chunks from a local Chroma store of NVIDIA documentation, embedded with `llama-3.2-nv-embedqa-1b-v2` and reranked with `llama-3.2-nv-rerankqa-1b-v2` (see the retrieval sketch after this list)
- Enforces topic scope with NeMo Guardrails (Colang 2.x): a two-tier input rail (deterministic keyword regex + LLM-backed semantic intent) and a `self_check_output` policy gate
- Runs a multi-turn agent loop with tool use (local RAG, optional web search)
- Optionally exposes a second agent backend powered by NeMo Agent Toolkit, selectable from the same UI via a dropdown, backed by the same Chroma corpus via a NAT plugin
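As a preview of the Phase 4 pipeline, the whole retrieval path is short. A minimal sketch, assuming a `nvidia_docs` collection and a `retrieve()` helper shaped like the one the phases build up to; verify the rerank invoke URL and payload shape against the model card on build.nvidia.com:

```python
# retrieval sketch: embed query -> Chroma ANN search -> NIM rerank -> top-N
# (illustrative names; not the guide's exact solution code)
import os
import requests
import chromadb
from openai import OpenAI

nim = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
             api_key=os.environ["NVIDIA_API_KEY"])
collection = chromadb.PersistentClient(path="./chroma_db") \
                     .get_or_create_collection("nvidia_docs")

# invoke URL pattern from the rerank model card; double-check before relying on it
RERANK_URL = ("https://ai.api.nvidia.com/v1/retrieval/nvidia/"
              "llama-3_2-nv-rerankqa-1b-v2/reranking")

def retrieve(query: str, k: int = 20, top_n: int = 5) -> list[str]:
    # 1. embed the query (nv-embedqa models require input_type="query" here)
    emb = nim.embeddings.create(
        model="nvidia/llama-3.2-nv-embedqa-1b-v2",
        input=[query],
        extra_body={"input_type": "query", "truncate": "END"},
    ).data[0].embedding
    # 2. approximate nearest-neighbour search over the local store
    passages = collection.query(query_embeddings=[emb], n_results=k)["documents"][0]
    # 3. rerank the k candidates; rankings come back sorted by relevance
    resp = requests.post(
        RERANK_URL,
        headers={"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"},
        json={"model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
              "query": {"text": query},
              "passages": [{"text": p} for p in passages]},
        timeout=30,
    )
    resp.raise_for_status()
    best = [r["index"] for r in resp.json()["rankings"][:top_n]]
    return [passages[i] for i in best]
```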
| Tool | Install |
|---|---|
| Python 3.12 | via uv: `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Node 22+ and pnpm | `brew install node pnpm` (or equivalent) |
| Xcode CLT (macOS only) | `xcode-select --install` |
| Docker Desktop | Only for Phase 9 (Phoenix) and Phase 10 (self-host NIM) |
Accounts:
- NVIDIA Developer Program (required). Sign in at build.nvidia.com, open any model card, and click "Get API Key". Keys start with `nvapi-`.
- Tavily (optional), for the web-search fallback tool. Free tier of ~1,000 searches/month at tavily.com.
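With a key in hand, a smoke test in the spirit of Phase 0 is a few lines. This sketch assumes `NVIDIA_API_KEY` is exported and uses the hosted Llama 3.3 70B model id; any chat model card on build.nvidia.com works:

```python
# if this streams tokens, your nvapi- key and the path to NIM are good
import os
from openai import OpenAI

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
                api_key=os.environ["NVIDIA_API_KEY"])

stream = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Say hello from NIM."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```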
- Clone this repo.
- Copy `.env.example` to `.env` and fill in your `NVIDIA_API_KEY`.
- Follow the phases in order. Each phase doc is self-contained: a goal, step-by-step instructions with code blocks, and a checkpoint to prove it works.
- Write the code yourself. If you get stuck, check your work against the matching file in `solutions/`.
The `backend/` and `frontend/` directories start nearly empty; you scaffold them yourself following the phase instructions. The only pre-populated content is the corpus fetch script.
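Everything downstream of that script you write yourself. For a sense of scale, the chunk → embed → upsert core of the ingest phase looks roughly like this; the file layout, chunk sizes, and `nvidia_docs` collection name are illustrative, not the solution files':

```python
# chunk -> embed -> upsert into a persistent embedded Chroma collection
import os
import pathlib
import chromadb
from openai import OpenAI

nim = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
             api_key=os.environ["NVIDIA_API_KEY"])
collection = chromadb.PersistentClient(path="./chroma_db") \
                     .get_or_create_collection("nvidia_docs")

def chunks(text: str, size: int = 1500, overlap: int = 200) -> list[str]:
    # naive fixed-size character chunking with overlap
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

for path in pathlib.Path("backend/data/corpus").glob("*.md"):
    pieces = chunks(path.read_text())
    # passages use input_type="passage"; queries use "query".
    # one batch per file here; split further if the endpoint caps batch size
    embs = nim.embeddings.create(
        model="nvidia/llama-3.2-nv-embedqa-1b-v2",
        input=pieces,
        extra_body={"input_type": "passage", "truncate": "END"},
    )
    collection.upsert(
        ids=[f"{path.stem}-{i}" for i in range(len(pieces))],
        documents=pieces,
        embeddings=[d.embedding for d in embs.data],
        metadatas=[{"source": path.name}] * len(pieces),
    )
```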
| Phase | Focus | What you build |
|---|---|---|
| Phase 0 | Accounts & first API call | Prove the laptop can talk to NIM |
| Phase 1 | Project skeleton | backend/ (uv) + frontend/ (Next.js) |
| Phase 2 | Chroma vector store | Persistent embedded client, store.py |
| Phase 3 | Ingest a corpus | llm.py (NIM client) + ingest.py (chunk → embed → upsert) |
| Phase 4 | Retrieval pipeline | retrieval.py (embed query → ANN search → rerank → top-N) |
| Phase 5 | Hand-rolled agent loop | agent.py (multi-turn tool use) + api.py (FastAPI) |
| Phase 6 | NeMo Guardrails | Colang 2.x, rails-as-gates, two-tier topic check |
| Phase 7 | Next.js chat UI | page.tsx + chat-client.ts → streaming chat |
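Phase 5's agent is ordinary OpenAI-style tool calling with a step cap. A compressed sketch; the real `agent.py` adds streaming and error handling, and `tool_impls` here is an assumed mapping from tool names to plain Python functions:

```python
# minimal multi-turn tool-use loop in the spirit of Phase 5 (illustrative)
import json

MAX_STEPS = 6  # mirrors the cap noted in the architecture overview

def run_agent(client, model, messages, tools, tool_impls):
    for _ in range(MAX_STEPS):
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:          # model answered directly: done
            return msg.content
        for call in msg.tool_calls:     # execute each requested tool
            result = tool_impls[call.function.name](
                **json.loads(call.function.arguments))
            messages.append({"role": "tool",
                             "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return "Stopped after reaching the step limit."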
| Phase | Focus | What you build |
|---|---|---|
| Phase 8 | NeMo Agent Toolkit | NAT plugin wrapping Chroma, /chat-nat proxy, UI dropdown |
| Phase 9 | Phoenix observability | OTLP traces with OpenInference LLM-semantic spans |
| Phase 10 | Self-host NIM | NIM container on a rented GPU VM |
| Phase 11 | PersonaPlex voice | Full-duplex speech-to-speech interface |
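For Phase 9, pointing the backend at Phoenix is stock OpenTelemetry setup. A sketch assuming Phoenix's default OTLP/HTTP endpoint on port 6006; the OpenInference conventions layer LLM-semantic attributes on top of ordinary spans:

```python
# export OTLP traces to a local Phoenix instance
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces")))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("nvidia-stack-tutor")
with tracer.start_as_current_span("retrieval") as span:
    # OpenInference span-kind attribute so Phoenix renders this as a retriever step
    span.set_attribute("openinference.span.kind", "RETRIEVER")
    ...
```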
```
Browser :3000
└─ Next.js UI with backend dropdown
        │ HTTP/SSE
        ▼
FastAPI :8000
├─ POST /chat       → input rail → hand-rolled agent → tools → output rail
├─ POST /chat-nat   → input rail → proxy to nat serve → output rail
├─ agent.py         hand-rolled tool-use loop (max 6 steps)
├─ runtime.py       check_input_rail + check_output_rail (Colang 2.x)
├─ nat_plugin.py    search_nvidia_docs NAT tool wrapping retrieve()
└─ Chroma (embedded)  ./chroma_db/

nat serve :8001 (Phase 8)
├─ tool_calling_agent (Llama 3.3 70B)
└─ tools: search_nvidia_docs (plugin), tavily_internet_search, current_datetime

Phoenix :6006 (Phase 9, optional)  ◄─── OTLP ─── FastAPI + NAT

NVIDIA hosted endpoints
├─ integrate.api.nvidia.com/v1   chat + embeddings
└─ ai.api.nvidia.com/v1          reranking
```
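The rails-as-gates flow in the diagram maps onto a FastAPI handler along these lines. `check_input_rail`, `check_output_rail`, and `run_agent` below are simplified stand-ins for the real `runtime.py` and `agent.py` code, with only the tier-1 keyword regex shown:

```python
# sketch of the POST /chat path: input rail -> agent -> output rail -> SSE
import re
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

ON_TOPIC = re.compile(r"\b(nvidia|nim|nemo|cuda|gpu|riva|tensorrt)\b", re.I)

async def check_input_rail(text: str) -> bool:
    # tier 1: deterministic keyword regex; tier 2 (LLM intent check) elided
    return bool(ON_TOPIC.search(text))

async def check_output_rail(text: str) -> bool:
    return True                   # stands in for the self_check_output gate

async def run_agent(messages: list[dict]) -> str:
    return "stub answer"          # stands in for the Phase 5 tool-use loop

class ChatRequest(BaseModel):
    messages: list[dict]

@app.post("/chat")
async def chat(req: ChatRequest):
    if not await check_input_rail(req.messages[-1]["content"]):
        refusal = "I can only help with questions about the NVIDIA stack."
        return StreamingResponse(iter([f"data: {refusal}\n\n"]),
                                 media_type="text/event-stream")

    async def stream():
        answer = await run_agent(req.messages)
        if await check_output_rail(answer):
            yield f"data: {answer}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(stream(), media_type="text/event-stream")
```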
```
nvidia-stack-tutor-guide/
├── README.md                    # this file
├── .env.example
├── .gitignore
├── backend/
│   └── data/
│       └── corpus/
│           └── fetch-corpus.sh  # curl commands for starter docs
├── frontend/
│   └── .gitkeep                 # empty; learner scaffolds it
├── phases/                      # one focused doc per phase
│   ├── phase-0-hello-nim.md
│   ├── phase-1-skeleton.md
│   ├── …
│   └── phase-11-personaplex.md
└── solutions/                   # complete solution files per phase
    ├── README.md
    ├── phase-2-store.py
    ├── phase-3-llm.py
    └── …
```
The complete working application (all phases committed, ready to run)
lives in the companion repo:
nvidia-stack-tutor.
Clone that if you want to skip to a running app; use this repo if you
want to learn by building.
- Free tier: 1,000 inference credits on a new NVIDIA Developer Program account.
- Phases 0–7: ~$5–10 per week of active iteration after free credits.
- GPU rental for Phases 10–11: budget 6 hours of Lambda Labs H100 (~$20).
The guide and solution code are provided as learning material. The seed
corpus under backend/data/corpus/ is Apache 2.0 content from the
NVIDIA-AI-Blueprints and NVIDIA-NeMo GitHub organisations.