@@ -42,6 +42,9 @@
"ORACLE_DSN": "localhost:1521/FREEPDB1",
"ORACLE_USER": "VECTOR",
"ORACLE_PASSWORD": "VectorPwd_2025",
"LANGSMITH_TRACING": "true",
"LANGSMITH_PROJECT": "agent-memory-workshop",
"LANGSMITH_API_KEY": "${localEnv:LANGSMITH_API_KEY}",
"TAVILY_API_KEY": "${localEnv:TAVILY_API_KEY}",
"OCI_GENAI_API_KEY": "${localEnv:OCI_GENAI_API_KEY}",
"OCI_GENAI_ENDPOINT": "${localEnv:OCI_GENAI_ENDPOINT}"
3 changes: 2 additions & 1 deletion workshops/agent_memory_workshop/.devcontainer/setup_build.sh
@@ -27,7 +27,8 @@ pip install -q --no-cache-dir \
ipywidgets \
matplotlib \
tiktoken \
pydantic
pydantic \
langsmith

echo ""
echo "[2/2] Registering Jupyter kernel..."
12 changes: 10 additions & 2 deletions workshops/agent_memory_workshop/README.md
@@ -18,8 +18,9 @@ A **Research Paper Assistant** — an AI agent that searches, retrieves, and rea
| 4 | Context engineering: summarisation and offloading | [Part 4 Guide](docs/part-4-context-engineering.md) |
| 5 | Web access with Tavily | [Part 5 Guide](docs/part-5-web-search.md) |
| 6 | Agent execution and memory vs no-memory comparison | [Part 6 Guide](docs/part-6-agent-execution.md) |
| 7 | Agent observability with LangSmith | [Part 7 Guide](docs/part-7-observability.md) |

> **[TODO Checklist](docs/TODO-checklist.md)** — all 16 tasks at a glance with links to their guide sections.
> **[TODO Checklist](docs/TODO-checklist.md)** — all 19 tasks at a glance with links to their guide sections.

## Getting Started

@@ -41,6 +42,11 @@ cd workshops/agent_memory_workshop
# Start Oracle AI Database
docker compose -f .devcontainer/docker-compose.yml up -d oracle

# Optional for Part 7: export your LangSmith key
export LANGSMITH_API_KEY="lsv2_..."
export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=agent-memory-workshop

# Install dependencies
pip install -r requirements.txt

@@ -74,7 +80,8 @@ agent-memory-workshop/
│ ├── part-4-context-engineering.md
│ ├── part-5-web-search.md
│ ├── part-6-agent-execution.md
│ ├── TODO-checklist.md All 16 tasks at a glance
│ ├── part-7-observability.md
│ ├── TODO-checklist.md All 19 tasks at a glance
│ └── troubleshooting.md Common issues and solutions
├── images/ Screenshots and architecture diagrams
└── README.md
@@ -88,6 +95,7 @@ agent-memory-workshop/
- `openai` — OCI GenAI (xAI Grok 3 Fast) via OpenAI-compatible endpoint
- `tavily-python` — web search for agents
- `oracledb` — Python Oracle driver
- LangSmith — agent trace collection and inspection for Part 7

## Where to Next?

8 changes: 7 additions & 1 deletion workshops/agent_memory_workshop/docs/TODO-checklist.md
@@ -1,6 +1,6 @@
# Workshop TODO Checklist

16 hands-on tasks across Parts 2–6. Complete them in order — each builds on the last.
19 hands-on tasks across Parts 2–7. Complete them in order — each builds on the last.

Part 1 (Oracle setup) is pre-built — just run the cells to connect.

@@ -36,3 +36,9 @@ Part 1 (Oracle setup) is pre-built — just run the cells to connect.

15. Assemble `build_context()` from all memory types (TODO 15)
16. Run 5 test questions before memory recall (TODO 16)

### Part 7 — Agent Observability ([Guide](part-7-observability.md))

17. Configure LangSmith tracing (TODO 17)
18. Create `call_agent_observed()` with trace runs around the agent loop (TODO 18)
19. Run observed turns and inspect the trace in LangSmith (TODO 19)
132 changes: 132 additions & 0 deletions workshops/agent_memory_workshop/docs/observability-tool-comparison.md
@@ -0,0 +1,132 @@
# Part 7 Observability Tool Comparison

## Recommendation

Use **LangSmith as the Part 7 trace backend**.

This workshop now traces the existing memory-aware agent with manual LangSmith trace runs. That is the best fit for Part 7, which focuses on AI agent observability: LangSmith provides a purpose-built trace view for agent steps, LLM calls, tool calls, and metadata, and it does not add a local trace container or collector to the workshop.

The tradeoff is that LangSmith requires a hosted account and API key. For this lab, that is acceptable because it removes the local observability service and gives learners an AI-native trace UI. Oracle AI Database remains the durable memory and vector search system of record; LangSmith only shows what happened during execution.

## Selection Criteria

The backend should optimize for this workshop, not for general production observability.

Scoring scale (higher is better in every column, so a high score in a "burden" column means low burden):

- `5`: excellent fit
- `3`: workable with tradeoffs
- `1`: poor fit for this lab

| Option | AI agent fit | Setup burden | Trace teaching value | Local service burden | Privacy control | Overall |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| LangSmith | 5 | 4 | 5 | 5 | 4 | 23 |
| Arize Phoenix | 5 | 3 | 5 | 3 | 4 | 20 |
| Langfuse | 5 | 2 | 4 | 2 | 4 | 17 |
| SigNoz | 4 | 2 | 4 | 2 | 4 | 16 |
| OpenLIT | 5 | 3 | 3 | 3 | 3 | 17 |

## Candidate Notes

### LangSmith

LangSmith is the selected backend.

Why it fits:

- It is designed for LLM and agent traces.
- It integrates naturally with LangChain-oriented workflows.
- It supports manual tracing, so the lab can show the exact Part 6 agent architecture instead of hiding it behind automatic instrumentation.
- It avoids running a local trace container beside Oracle Database in Codespaces.
- It gives learners a clear project-based UI for inspecting `agent.run` and child runs.

Tradeoffs:

- It requires a LangSmith API key.
- It sends trace metadata to the LangSmith service.
- The lab must be explicit about privacy and avoid recording full prompts, retrieved documents, API keys, raw Tavily output, or database connection strings.

Best Part 7 use:

- Set `LANGSMITH_TRACING=true`, `LANGSMITH_PROJECT=agent-memory-workshop`, and `LANGSMITH_API_KEY`.
- Use manual LangSmith trace runs for `agent.run`, memory reads, tool selection, LLM calls, tool execution, context checks, and memory writes.
- Record lengths, counts, model names, tool names, memory types, and status only.

### Arize Phoenix

Phoenix remains a strong AI-native alternative if the workshop needs local-first tracing plus evaluation.

Why it fits:

- Phoenix has LLM tracing, sessions, projects, evaluation, datasets, and prompts in the product surface.
- It aligns well with OpenInference and LangChain-style instrumentation.
- It gives a future path to evaluation workflows.

Tradeoffs:

- It would require explaining Phoenix/OpenInference concepts in addition to agent memory concepts.
- The local service path adds more setup than LangSmith for this requested revision.

### Langfuse

Langfuse is a strong LLM observability product, but its self-hosted path is broader than this lab needs.

Why it fits:

- It supports LLM traces, prompt management, evaluations, datasets, sessions, and metadata.
- It can support OpenTelemetry-oriented workflows.

Tradeoffs:

- Its self-hosted footprint is likely heavier than this workshop needs.
- Its product surface may distract from the narrower Part 7 goal: inspect one memory-aware agent run.

### SigNoz

SigNoz is a strong observability platform, but it is platform-shaped rather than agent-workshop-shaped.

Why it fits:

- It is credible for traces, metrics, logs, dashboards, and production observability.
- It has LLM observability documentation and integrations.

Tradeoffs:

- The local stack is more than the workshop needs.
- It introduces APM concepts that are not required for this lab.

### OpenLIT

OpenLIT is attractive as an AI observability and automatic instrumentation layer.

Why it fits:

- It focuses on OpenTelemetry-native AI instrumentation.
- It can instrument LLMs, agents, frameworks, vector databases, MCP, and GPUs.

Tradeoffs:

- Automatic instrumentation can obscure the manual span/run structure learners need to understand first.
- It has more product surface than Part 7 needs.

## Privacy Defaults

Part 7 should default to safe telemetry:

- Do not capture full prompts.
- Do not capture full responses.
- Do not capture retrieved documents.
- Do not capture raw Tavily output.
- Do not capture API keys, database passwords, DSNs with credentials, or environment values.
- Do capture lengths, counts, status, model names, memory types, selected tool names, and errors.

If the notebook includes a prompt/response capture toggle, make it clearly opt-in and label it as unsafe for shared environments.
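
The defaults above can be sketched as a small filter that runs before anything is attached to a trace run. This is a minimal illustration, not the workshop's implementation: `to_safe_metadata`, `SECRET_KEY_HINTS`, `SAFE_STRING_KEYS`, and the `capture_content` toggle are hypothetical names chosen for this example.

```python
from typing import Any, Mapping

# Hypothetical key fragments that mark a value as a secret (assumption for this sketch).
SECRET_KEY_HINTS = ("api_key", "password", "dsn", "secret")

# Hypothetical allow-list of string fields that are names or categories, not content.
SAFE_STRING_KEYS = {"llm.model", "tool.name", "memory.type", "status", "agent.thread_id"}


def to_safe_metadata(raw: Mapping[str, Any], capture_content: bool = False) -> dict:
    """Return a copy of `raw` that keeps lengths and counts instead of content."""
    safe = {}
    for key, value in raw.items():
        # Secrets are dropped unconditionally, even when content capture is on.
        if any(hint in key.lower() for hint in SECRET_KEY_HINTS):
            continue
        if capture_content or key in SAFE_STRING_KEYS:
            safe[key] = value
        elif isinstance(value, str):
            safe[f"{key}.length"] = len(value)   # record the length, not the text
        elif isinstance(value, (list, tuple, set, dict)):
            safe[f"{key}.count"] = len(value)    # record the count, not the items
        else:
            safe[key] = value                    # numbers and booleans pass through
    return safe


raw = {
    "prompt": "hello world",
    "llm.model": "xai.grok-3-fast",
    "LANGSMITH_API_KEY": "lsv2_example",
    "docs": ["a", "b"],
    "context.estimated_tokens": 1320,
}
print(to_safe_metadata(raw))
```

Keeping the toggle as an explicit argument that defaults to `False` makes content capture a deliberate, visible choice at each call site.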

## Sources

- LangSmith tracing with LangChain: https://docs.smith.langchain.com/observability/how_to_guides/trace_with_langchain
- LangSmith manual instrumentation: https://docs.langchain.com/langsmith/annotate-code
- Arize Phoenix tracing setup: https://arize.com/docs/phoenix/tracing/how-to-tracing/setup-tracing
- OpenLIT overview: https://docs.openlit.io/
- Langfuse OpenTelemetry integration: https://langfuse.com/docs/opentelemetry
- SigNoz LLM observability overview: https://signoz.io/docs/llm-observability/
146 changes: 146 additions & 0 deletions workshops/agent_memory_workshop/docs/part-7-observability.md
@@ -0,0 +1,146 @@
# Part 7: Agent Observability

## Three TODOs in This Part

Part 7 adds LangSmith tracing to the agent you built in Part 6. You will keep the original `call_agent()` function unchanged and create an observed wrapper that sends traces to a LangSmith project.

Before running the Part 7 notebook cells, set a LangSmith API key:

```bash
export LANGSMITH_API_KEY="lsv2_..."
export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=agent-memory-workshop
```

Then open your project in LangSmith:

```text
https://smith.langchain.com
```

---

## TODO 17: Configure LangSmith

LangSmith has three moving parts in this lab:

- **Client** - connects the notebook to LangSmith
- **Project** - groups traces for this workshop
- **Trace runs** - record parent and child operations for one agent turn

**Complete solution:**

```python
import os

import langsmith as ls
from langsmith import Client

def configure_agent_observability(
    project_name: str = "agent-memory-workshop",
):
    os.environ.setdefault("LANGSMITH_TRACING", "true")
    os.environ.setdefault("LANGSMITH_PROJECT", project_name)

    if not os.environ.get("LANGSMITH_API_KEY"):
        raise RuntimeError(
            "Set LANGSMITH_API_KEY before running Part 7. "
            "Create an API key in LangSmith, then export it in your shell or set it in this notebook."
        )

    client = Client()
    return {"client": client, "project_name": project_name}

observability = configure_agent_observability()
tracer = ls
```
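
Before running the solution, it can help to confirm which LangSmith variables the notebook kernel actually sees. The following is a small pre-flight sketch using only the standard library; `langsmith_env_status` is a hypothetical helper, not part of the workshop or of the LangSmith SDK. It reports presence only, so the key itself is never printed.

```python
import os

LANGSMITH_VARS = ("LANGSMITH_API_KEY", "LANGSMITH_TRACING", "LANGSMITH_PROJECT")


def langsmith_env_status(env=None) -> dict:
    """Report which LangSmith variables are set, without revealing their values."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in LANGSMITH_VARS}


# Prints True/False per variable; values are never shown.
print(langsmith_env_status())
```

Because `configure_agent_observability()` uses `os.environ.setdefault`, any `LANGSMITH_TRACING` or `LANGSMITH_PROJECT` value already exported in the shell takes precedence over the defaults in the function.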

**Privacy default:** This lab records metadata, not content. Trace inputs, outputs, and metadata should include lengths, counts, model names, tool names, memory types, and error status - not full prompts, retrieved documents, API keys, raw tool output, or database connection strings.

---

## TODO 18: `call_agent_observed()`

The original `call_agent()` remains your working agent harness. In Part 7, you create a second function, `call_agent_observed()`, that follows the same flow but wraps each major operation in LangSmith trace runs.

**Trace shape:**

```text
agent.run
├── agent.context.build
│ ├── agent.memory.read conversational
│ ├── agent.memory.read knowledge_base
│ ├── agent.memory.read workflow
│ ├── agent.memory.read entity
│ └── agent.memory.read summary
├── agent.context.check
├── agent.toolbox.read
├── agent.memory.write user_message
├── agent.llm.call
├── agent.tool.execute
├── agent.tool.log
├── agent.memory.write workflow
├── agent.memory.write entity
└── agent.memory.write assistant_message
```
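
The parent and child structure above comes from nesting: a run opened while another run is still open becomes its child. The sketch below shows that mechanism with an in-memory stand-in so it runs without an API key. `RunRecorder` is a hypothetical class written for this illustration; in the notebook, the LangSmith SDK's own tracing context manager would take its place, and its exact arguments should be checked against the LangSmith manual instrumentation guide.

```python
import time
from contextlib import contextmanager


class RunRecorder:
    """In-memory stand-in for a trace backend; it only records the run tree."""

    def __init__(self):
        self.runs = []    # every run, in the order it was started
        self._stack = []  # runs that are currently open, outermost first

    @contextmanager
    def run(self, name, **metadata):
        record = {
            "name": name,
            "parent": self._stack[-1]["name"] if self._stack else None,
            "depth": len(self._stack),
            "metadata": metadata,  # lengths, counts, and names only
            "status": "ok",
        }
        self.runs.append(record)
        self._stack.append(record)
        started = time.perf_counter()
        try:
            yield record
        except Exception:
            record["status"] = "error"  # record the failure, then re-raise
            raise
        finally:
            record["duration_s"] = time.perf_counter() - started
            self._stack.pop()


recorder = RunRecorder()
with recorder.run("agent.run", query_length=74):
    with recorder.run("agent.context.build"):
        for memory_type in ("conversational", "knowledge_base"):
            with recorder.run("agent.memory.read", memory_type=memory_type):
                pass  # the real agent would read from Oracle here
    with recorder.run("agent.llm.call", model="xai.grok-3-fast"):
        pass  # the real agent would call the model here

# Print the recorded tree, indented by depth.
for r in recorder.runs:
    print("  " * r["depth"] + r["name"])
```

Closing each run in a `finally` block means a failed tool call or LLM call still produces a run with a duration and an error status, which is usually the run you most want to inspect.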

**Important metadata to record:**

| Field | Example | Why it is safe |
|---|---|---|
| `agent.thread_id` | `0022` | Identifier, not content |
| `query.length` | `74` | Length only |
| `context.estimated_tokens` | `1320` | Count only |
| `memory.type` | `knowledge_base` | Category only |
| `memory.result_length` | `540` | Length only |
| `tool.name` | `search_tavily` | Tool name only |
| `tool.result_length` | `1800` | Length only |
| `llm.model` | `xai.grok-3-fast` | Model name only |
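
As a rough sketch, the fields in this table can be derived from the raw values without storing the values themselves. `build_run_metadata` and `estimate_tokens` are hypothetical helpers for illustration, and the four-characters-per-token estimate is an assumption, not a real tokenizer; the notebook may count tokens differently.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about 4 characters per token (heuristic only)."""
    return len(text) // 4


def build_run_metadata(thread_id, query, context, model, tool_name=None, tool_result=None):
    """Build trace metadata from raw values, keeping identifiers, names, and sizes only."""
    metadata = {
        "agent.thread_id": thread_id,
        "query.length": len(query),
        "context.estimated_tokens": estimate_tokens(context),
        "llm.model": model,
    }
    if tool_name is not None:
        metadata["tool.name"] = tool_name
        metadata["tool.result_length"] = len(tool_result or "")
    return metadata


example = build_run_metadata(
    thread_id="0022",
    query="Find papers",
    context="x" * 400,
    model="xai.grok-3-fast",
    tool_name="search_tavily",
    tool_result="r" * 1800,
)
print(example)
```

The query, context, and tool result are passed in but only their sizes leave the function, which keeps the privacy decision in one place.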

**Why manual trace runs first:** LangSmith can trace LangChain applications automatically, but manual runs make this notebook's Part 6 architecture visible. Once you understand that trace, automatic instrumentation is easier to reason about.

---

## TODO 19: Run and Inspect the Trace

Run a short observed conversation using a fresh thread ID:

```python
observed_thread = "observed-0022"

for q in [
    "Find papers about memory in AI agents",
    "What did we just discuss?",
    "Search the web for recent agent observability ideas",
]:
    call_agent_observed(q, thread_id=observed_thread, max_iterations=5)
```

Then open LangSmith:

1. Open `https://smith.langchain.com`
2. Select the `agent-memory-workshop` project
3. Open the most recent `agent.run` trace
4. Expand the child runs

You should see where the agent spent time and which operations happened during the turn.

![LangSmith trace for an observed agent run](../images/part7-langsmith-trace.png)

## What to Look For

**Context build runs:** These show which memory systems were read before the LLM call.

**Tool runs:** These show whether the model called Tavily or summary tools.

**Context check runs:** These show estimated context window size without exposing the full prompt.

**Memory write runs:** These show the durable writes that make the next turn memory-aware.

## Key Takeaways

**Observability makes agent behavior inspectable.** The Part 6 chart shows that the memory-aware agent controls context growth. Part 7 shows the operational path behind that chart.

**The trace is not the memory store.** Oracle AI Database still stores the agent's memory. LangSmith shows what happened during execution.

**Safe traces are designed.** A useful trace does not need full prompts or raw tool results. In most labs and production systems, counts, names, durations, statuses, and sanitized IDs are enough to debug the flow.