diff --git a/workshops/agent_memory_workshop/.devcontainer/devcontainer.json b/workshops/agent_memory_workshop/.devcontainer/devcontainer.json index 3460ef2a..df90b406 100644 --- a/workshops/agent_memory_workshop/.devcontainer/devcontainer.json +++ b/workshops/agent_memory_workshop/.devcontainer/devcontainer.json @@ -42,6 +42,9 @@ "ORACLE_DSN": "localhost:1521/FREEPDB1", "ORACLE_USER": "VECTOR", "ORACLE_PASSWORD": "VectorPwd_2025", + "LANGSMITH_TRACING": "true", + "LANGSMITH_PROJECT": "agent-memory-workshop", + "LANGSMITH_API_KEY": "${localEnv:LANGSMITH_API_KEY}", "TAVILY_API_KEY": "${localEnv:TAVILY_API_KEY}", "OCI_GENAI_API_KEY": "${localEnv:OCI_GENAI_API_KEY}", "OCI_GENAI_ENDPOINT": "${localEnv:OCI_GENAI_ENDPOINT}" diff --git a/workshops/agent_memory_workshop/.devcontainer/setup_build.sh b/workshops/agent_memory_workshop/.devcontainer/setup_build.sh index d168d7cf..88a95a71 100644 --- a/workshops/agent_memory_workshop/.devcontainer/setup_build.sh +++ b/workshops/agent_memory_workshop/.devcontainer/setup_build.sh @@ -27,7 +27,8 @@ pip install -q --no-cache-dir \ ipywidgets \ matplotlib \ tiktoken \ - pydantic + pydantic \ + langsmith echo "" echo "[2/2] Registering Jupyter kernel..." 
diff --git a/workshops/agent_memory_workshop/README.md b/workshops/agent_memory_workshop/README.md index 1444566d..efd592c2 100644 --- a/workshops/agent_memory_workshop/README.md +++ b/workshops/agent_memory_workshop/README.md @@ -18,8 +18,9 @@ A **Research Paper Assistant** — an AI agent that searches, retrieves, and rea | 4 | Context engineering: summarisation and offloading | [Part 4 Guide](docs/part-4-context-engineering.md) | | 5 | Web access with Tavily | [Part 5 Guide](docs/part-5-web-search.md) | | 6 | Agent execution and memory vs no-memory comparison | [Part 6 Guide](docs/part-6-agent-execution.md) | +| 7 | Agent observability with LangSmith | [Part 7 Guide](docs/part-7-observability.md) | -> **[TODO Checklist](docs/TODO-checklist.md)** — all 16 tasks at a glance with links to their guide sections. +> **[TODO Checklist](docs/TODO-checklist.md)** — all 19 tasks at a glance with links to their guide sections. ## Getting Started @@ -41,6 +42,11 @@ cd workshops/agent_memory_workshop # Start Oracle AI Database docker compose -f .devcontainer/docker-compose.yml up -d oracle +# Optional for Part 7: export your LangSmith key +export LANGSMITH_API_KEY="lsv2_..." +export LANGSMITH_TRACING=true +export LANGSMITH_PROJECT=agent-memory-workshop + # Install dependencies pip install -r requirements.txt @@ -74,7 +80,8 @@ agent-memory-workshop/ │ ├── part-4-context-engineering.md │ ├── part-5-web-search.md │ ├── part-6-agent-execution.md -│ ├── TODO-checklist.md All 16 tasks at a glance +│ ├── part-7-observability.md +│ ├── TODO-checklist.md All 19 tasks at a glance │ └── troubleshooting.md Common issues and solutions ├── images/ Screenshots and architecture diagrams └── README.md @@ -88,6 +95,7 @@ agent-memory-workshop/ - `openai` — OCI GenAI (xAI Grok 3 Fast) via OpenAI-compatible endpoint - `tavily-python` — web search for agents - `oracledb` — Python Oracle driver +- LangSmith — agent trace collection and inspection for Part 7 ## Where to Next? 
diff --git a/workshops/agent_memory_workshop/docs/TODO-checklist.md b/workshops/agent_memory_workshop/docs/TODO-checklist.md index 7b227d8f..aa0059c3 100644 --- a/workshops/agent_memory_workshop/docs/TODO-checklist.md +++ b/workshops/agent_memory_workshop/docs/TODO-checklist.md @@ -1,6 +1,6 @@ # Workshop TODO Checklist -16 hands-on tasks across Parts 2–6. Complete them in order — each builds on the last. +19 hands-on tasks across Parts 2–7. Complete them in order — each builds on the last. Part 1 (Oracle setup) is pre-built — just run the cells to connect. @@ -36,3 +36,9 @@ Part 1 (Oracle setup) is pre-built — just run the cells to connect. 15. Assemble `build_context()` from all memory types (TODO 15) 16. Run 5 test questions before memory recall (TODO 16) + +### Part 7 — Agent Observability ([Guide](part-7-observability.md)) + +17. Configure LangSmith tracing (TODO 17) +18. Create `call_agent_observed()` with trace runs around the agent loop (TODO 18) +19. Run observed turns and inspect the trace in LangSmith (TODO 19) diff --git a/workshops/agent_memory_workshop/docs/observability-tool-comparison.md b/workshops/agent_memory_workshop/docs/observability-tool-comparison.md new file mode 100644 index 00000000..fa1cbb0d --- /dev/null +++ b/workshops/agent_memory_workshop/docs/observability-tool-comparison.md @@ -0,0 +1,132 @@ +# Part 7 Observability Tool Comparison + +## Recommendation + +Use **LangSmith as the Part 7 trace backend**. + +This workshop now traces the existing memory-aware agent with LangSmith manual runs. That is the best fit for the requested revision because Part 7 is focused on AI agent observability, and LangSmith provides a purpose-built trace view for agent steps, LLM calls, tool calls, and metadata without adding a local trace container or collector to the workshop. + +The tradeoff is that LangSmith requires a hosted account and API key. 
For this lab, that is acceptable because it removes the local observability service and gives learners an AI-native trace UI. Oracle AI Database remains the durable memory and vector search system of record; LangSmith only shows what happened during execution. + +## Selection Criteria + +The backend should optimize for this workshop, not for general production observability. + +Scoring scale: + +- `5`: excellent fit +- `3`: workable with tradeoffs +- `1`: poor fit for this lab + +Higher is better in every column, so a high "burden" score means little burden. `Overall` is the sum of the five scores. + +| Option | AI agent fit | Setup burden | Trace teaching value | Local service burden | Privacy control | Overall | +| --- | ---: | ---: | ---: | ---: | ---: | ---: | +| LangSmith | 5 | 4 | 5 | 5 | 4 | 23 | +| Arize Phoenix | 5 | 3 | 5 | 3 | 4 | 20 | +| Langfuse | 5 | 2 | 4 | 2 | 4 | 17 | +| SigNoz | 4 | 2 | 4 | 2 | 4 | 16 | +| OpenLIT | 5 | 3 | 3 | 3 | 3 | 17 | + +## Candidate Notes + +### LangSmith + +LangSmith is the selected backend. + +Why it fits: + +- It is designed for LLM and agent traces. +- It integrates naturally with LangChain-oriented workflows. +- It supports manual tracing, so the lab can show the exact Part 6 agent architecture instead of hiding it behind automatic instrumentation. +- It avoids running a local trace container beside Oracle Database in Codespaces. +- It gives learners a clear project-based UI for inspecting `agent.run` and child runs. + +Tradeoffs: + +- It requires a LangSmith API key. +- It sends trace metadata to the LangSmith service. +- The lab must be explicit about privacy and avoid recording full prompts, retrieved documents, API keys, raw Tavily output, or database connection strings. + +Best Part 7 use: + +- Set `LANGSMITH_TRACING=true`, `LANGSMITH_PROJECT=agent-memory-workshop`, and `LANGSMITH_API_KEY`. +- Use manual LangSmith trace runs for `agent.run`, memory reads, tool selection, LLM calls, tool execution, context checks, and memory writes. +- Record lengths, counts, model names, tool names, memory types, and status only. 
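
The "lengths, counts, names, and status only" rule can be sketched as a small helper that turns raw values into safe trace metadata. This is a minimal illustration, not the notebook's code: `build_safe_metadata` and its parameters are hypothetical names, and the keys simply follow the dotted style this workshop uses for fields such as `query.length` and `tool.name`.

```python
from typing import Optional, Sequence


def build_safe_metadata(
    query: str,
    memory_type: str,
    memory_results: Sequence[str],
    model: str,
    tool_name: Optional[str] = None,
    status: str = "ok",
) -> dict:
    """Summarize one agent step as lengths, counts, and names (hypothetical helper)."""
    metadata = {
        "query.length": len(query),                     # length, not the query text
        "memory.type": memory_type,                     # category only
        "memory.result_count": len(memory_results),     # how many items came back
        "memory.result_length": sum(len(r) for r in memory_results),
        "llm.model": model,                             # model name only
        "status": status,
    }
    if tool_name is not None:
        metadata["tool.name"] = tool_name               # tool name, not tool output
    return metadata


example = build_safe_metadata(
    query="Find papers about memory in AI agents",
    memory_type="knowledge_base",
    memory_results=["first abstract text", "second abstract text"],
    model="xai.grok-3-fast",
    tool_name="search_tavily",
)
print(example)
```

The raw query and the retrieved text never enter the returned dictionary, so whatever is attached to a trace run from it carries no content.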
+ +### Arize Phoenix + +Phoenix remains a strong AI-native alternative if the workshop needs local-first tracing plus evaluation. + +Why it fits: + +- Phoenix has LLM tracing, sessions, projects, evaluation, datasets, and prompts in the product surface. +- It aligns well with OpenInference and LangChain-style instrumentation. +- It gives a future path to evaluation workflows. + +Tradeoffs: + +- It would require explaining Phoenix/OpenInference concepts in addition to agent memory concepts. +- The local service path adds more setup than LangSmith for this requested revision. + +### Langfuse + +Langfuse is a strong LLM observability product, but its self-hosted path is broader than this lab needs. + +Why it fits: + +- It supports LLM traces, prompt management, evaluations, datasets, sessions, and metadata. +- It can support OpenTelemetry-oriented workflows. + +Tradeoffs: + +- Its self-hosted footprint is likely heavier than this workshop needs. +- Its product surface may distract from the narrower Part 7 goal: inspect one memory-aware agent run. + +### SigNoz + +SigNoz is a strong observability platform, but it is platform-shaped rather than agent-workshop-shaped. + +Why it fits: + +- It is credible for traces, metrics, logs, dashboards, and production observability. +- It has LLM observability documentation and integrations. + +Tradeoffs: + +- The local stack is more than the workshop needs. +- It introduces APM concepts that are not required for this lab. + +### OpenLIT + +OpenLIT is attractive as an AI observability and automatic instrumentation layer. + +Why it fits: + +- It focuses on OpenTelemetry-native AI instrumentation. +- It can instrument LLMs, agents, frameworks, vector databases, MCP, and GPUs. + +Tradeoffs: + +- Automatic instrumentation can obscure the manual span/run structure learners need to understand first. +- It has more product surface than Part 7 needs. 
+ +## Privacy Defaults + +Part 7 should default to safe telemetry: + +- Do not capture full prompts. +- Do not capture full responses. +- Do not capture retrieved documents. +- Do not capture raw Tavily output. +- Do not capture API keys, database passwords, DSNs with credentials, or environment values. +- Do capture lengths, counts, status, model names, memory types, selected tool names, and errors. + +If the notebook includes a prompt/response capture toggle, make it clearly opt-in and label it as unsafe for shared environments. + +## Sources + +- LangSmith tracing with LangChain: https://docs.smith.langchain.com/observability/how_to_guides/trace_with_langchain +- LangSmith manual instrumentation: https://docs.langchain.com/langsmith/annotate-code +- Arize Phoenix tracing setup: https://arize.com/docs/phoenix/tracing/how-to-tracing/setup-tracing +- OpenLIT overview: https://docs.openlit.io/ +- Langfuse OpenTelemetry integration: https://langfuse.com/docs/opentelemetry +- SigNoz LLM observability overview: https://signoz.io/docs/llm-observability/ diff --git a/workshops/agent_memory_workshop/docs/part-7-observability.md b/workshops/agent_memory_workshop/docs/part-7-observability.md new file mode 100644 index 00000000..78afcb70 --- /dev/null +++ b/workshops/agent_memory_workshop/docs/part-7-observability.md @@ -0,0 +1,146 @@ +# Part 7: Agent Observability + +## Three TODOs in This Part + +Part 7 adds LangSmith tracing to the agent you built in Part 6. You will keep the original `call_agent()` function unchanged and create an observed wrapper that sends traces to a LangSmith project. + +Before running the Part 7 notebook cells, set a LangSmith API key: + +```bash +export LANGSMITH_API_KEY="lsv2_..." 
+export LANGSMITH_TRACING=true +export LANGSMITH_PROJECT=agent-memory-workshop +``` + +Then open your project in LangSmith: + +```text +https://smith.langchain.com +``` + +--- + +## TODO 17: Configure LangSmith + +LangSmith has three moving parts in this lab: + +- **Client** - connects the notebook to LangSmith +- **Project** - groups traces for this workshop +- **Trace runs** - records parent and child operations for one agent turn + +**Complete solution:** + +```python +import os + +import langsmith as ls +from langsmith import Client + +def configure_agent_observability( + project_name: str = "agent-memory-workshop", +): + os.environ.setdefault("LANGSMITH_TRACING", "true") + os.environ.setdefault("LANGSMITH_PROJECT", project_name) + + if not os.environ.get("LANGSMITH_API_KEY"): + raise RuntimeError( + "Set LANGSMITH_API_KEY before running Part 7. " + "Create an API key in LangSmith, then export it in your shell or set it in this notebook." + ) + + client = Client() + return {"client": client, "project_name": project_name} + +observability = configure_agent_observability() +tracer = ls +``` + +**Privacy default:** This lab records metadata, not content. Trace inputs, outputs, and metadata should include lengths, counts, model names, tool names, memory types, and error status - not full prompts, retrieved documents, API keys, raw tool output, or database connection strings. + +--- + +## TODO 18: `call_agent_observed()` + +The original `call_agent()` remains your working agent harness. In Part 7, you create a second function, `call_agent_observed()`, that follows the same flow but wraps each major operation in LangSmith trace runs. 
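
A minimal sketch of that wrapper pattern, under stated assumptions: `langsmith.trace` is used as a context manager that takes a run name, a `run_type`, and `inputs`/`metadata` keyword arguments, and yields a run whose `end(outputs=...)` sets the outputs. To stay runnable without an API key, the tracer is passed in as a parameter and a stand-in recorder is used here. `observed_turn`, `RecordingTrace`, `RecordedRun`, and `fake_agent` are hypothetical names, and the real `call_agent_observed()` wraps more steps than the two shown.

```python
from contextlib import contextmanager
from typing import Callable, List


class RecordedRun:
    """Stand-in for the run object a tracer yields (illustration only)."""

    def __init__(self, name, run_type, inputs, metadata, depth):
        self.name, self.run_type = name, run_type
        self.inputs, self.metadata, self.depth = inputs, metadata, depth
        self.outputs = None

    def end(self, outputs=None):
        self.outputs = outputs


class RecordingTrace:
    """Records runs in memory using the call shape trace(name, run_type=..., ...)."""

    def __init__(self):
        self.runs: List[RecordedRun] = []
        self._depth = 0

    @contextmanager
    def trace(self, name, run_type="chain", inputs=None, metadata=None):
        run = RecordedRun(name, run_type, inputs or {}, metadata or {}, self._depth)
        self.runs.append(run)
        self._depth += 1          # runs opened inside this block are children
        try:
            yield run
        finally:
            self._depth -= 1


def observed_turn(query: str, thread_id: str, agent_fn: Callable[[str], str], trace) -> str:
    """Wrap one agent turn in a parent run and one child run, recording lengths only."""
    with trace(
        "agent.run",
        run_type="chain",
        inputs={"query.length": len(query)},
        metadata={"agent.thread_id": thread_id},
    ) as parent:
        with trace(
            "agent.llm.call",
            run_type="llm",
            inputs={"query.length": len(query)},
            metadata={"llm.model": "xai.grok-3-fast"},
        ) as child:
            answer = agent_fn(query)
            child.end(outputs={"response.length": len(answer)})
        parent.end(outputs={"response.length": len(answer)})
    return answer


def fake_agent(q: str) -> str:
    # Placeholder for the unchanged agent function; returns a canned reply.
    return "stub answer for: " + q


recorder = RecordingTrace()
answer = observed_turn("What did we just discuss?", "observed-0022", fake_agent, recorder.trace)
for run in recorder.runs:
    print("  " * run.depth + run.name, run.outputs)
```

With the `langsmith` package installed and `LANGSMITH_API_KEY` set, passing `langsmith.trace` in place of `recorder.trace` should produce the same parent and child runs in the configured project, since nested `trace` blocks are linked as children.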
+ +**Trace shape:** + +```text +agent.run +├── agent.context.build +│ ├── agent.memory.read conversational +│ ├── agent.memory.read knowledge_base +│ ├── agent.memory.read workflow +│ ├── agent.memory.read entity +│ └── agent.memory.read summary +├── agent.context.check +├── agent.toolbox.read +├── agent.memory.write user_message +├── agent.llm.call +├── agent.tool.execute +├── agent.tool.log +├── agent.memory.write workflow +├── agent.memory.write entity +└── agent.memory.write assistant_message +``` + +**Important metadata to record:** + +| Field | Example | Why it is safe | +|---|---|---| +| `agent.thread_id` | `0022` | Identifier, not content | +| `query.length` | `74` | Length only | +| `context.estimated_tokens` | `1320` | Count only | +| `memory.type` | `knowledge_base` | Category only | +| `memory.result_length` | `540` | Length only | +| `tool.name` | `search_tavily` | Tool name only | +| `tool.result_length` | `1800` | Length only | +| `llm.model` | `xai.grok-3-fast` | Model name only | + +**Why manual trace runs first:** LangSmith can trace LangChain applications automatically, but manual runs make this notebook's Part 6 architecture visible. Once you understand that trace, automatic instrumentation is easier to reason about. + +--- + +## TODO 19: Run and Inspect the Trace + +Run a short observed conversation using a fresh thread ID: + +```python +observed_thread = "observed-0022" + +for q in [ + "Find papers about memory in AI agents", + "What did we just discuss?", + "Search the web for recent agent observability ideas", +]: + call_agent_observed(q, thread_id=observed_thread, max_iterations=5) +``` + +Then open LangSmith: + +1. Open `https://smith.langchain.com` +2. Select the `agent-memory-workshop` project +3. Open the most recent `agent.run` trace +4. Expand the child runs + +You should see where the agent spent time and which operations happened during the turn. 
+ +![LangSmith trace for an observed agent run](../images/part7-langsmith-trace.png) + +## What to Look For + +**Context build runs:** These show which memory systems were read before the LLM call. + +**Tool runs:** These show whether the model called Tavily or summary tools. + +**Context check runs:** These show estimated context window size without exposing the full prompt. + +**Memory write runs:** These show the durable writes that make the next turn memory-aware. + +## Key Takeaways + +**Observability makes agent behavior inspectable.** The Part 6 chart shows that the memory-aware agent controls context growth. Part 7 shows the operational path behind that chart. + +**The trace is not the memory store.** Oracle AI Database still stores the agent's memory. LangSmith shows what happened during execution. + +**Safe traces are designed.** A useful trace does not need full prompts or raw tool results. In most labs and production systems, counts, names, durations, statuses, and sanitized IDs are enough to debug the flow. diff --git a/workshops/agent_memory_workshop/docs/troubleshooting.md b/workshops/agent_memory_workshop/docs/troubleshooting.md index 1f8c37e7..364c7936 100644 --- a/workshops/agent_memory_workshop/docs/troubleshooting.md +++ b/workshops/agent_memory_workshop/docs/troubleshooting.md @@ -146,6 +146,27 @@ Do not commit this to git. --- +### OCI GenAI returns `Authorization failed or requested resource not found` + +**Symptom:** A model call fails with: + +```text +Error code: 404 - {'code': '404', 'message': 'Authorization failed or requested resource not found.'} +``` + +**Cause:** The OCI Generative AI API key cannot access the requested model in the configured region, or `OCI_GENAI_ENDPOINT` points at the service root instead of the OpenAI-compatible path. 
+ +**Fix:** Confirm the endpoint and model access: + +```bash +export OCI_GENAI_ENDPOINT=https://inference.generativeai.us-phoenix-1.oci.oraclecloud.com/openai/v1 +export OCI_GENAI_API_KEY= +``` + +If your endpoint omits `/openai/v1`, the notebook appends it automatically. If the error remains, check that the API key was created in the same region, the key has an IAM policy that allows Generative AI use, and the selected model is available to your tenancy. + +--- + ### `ipywidgets` rendering error in output cell **Symptom:** A cell output shows `Error rendering output item using jupyter-ipywidget-renderer`. @@ -207,6 +228,62 @@ os.environ["TAVILY_API_KEY"] = "tvly-..." --- +## Observability and LangSmith Issues + +### LangSmith project does not open + +**Symptom:** You cannot find the `agent-memory-workshop` project in LangSmith. + +**Cause:** The notebook has not sent a trace yet, or it is using a different `LANGSMITH_PROJECT` value. + +**Fix:** + +```python +import os + +print(os.environ.get("LANGSMITH_PROJECT", "agent-memory-workshop")) +``` + +Rerun the Part 7 LangSmith configuration cell and one observed agent call. Then open `https://smith.langchain.com` and select the same project name. + +--- + +### No traces appear in LangSmith + +**Symptom:** Part 7 runs, but LangSmith has no new `agent.run` trace. + +**Cause:** `LANGSMITH_API_KEY` is missing, `LANGSMITH_TRACING` is not enabled, or the notebook is using a different LangSmith workspace/project than expected. 
+ +**Fix:** Confirm the LangSmith environment variables, then rerun the Part 7 configuration cell and observed agent call: + +```python +import os + +print("LANGSMITH_API_KEY:", "SET" if os.environ.get("LANGSMITH_API_KEY") else "NOT SET") +print("LANGSMITH_TRACING:", os.environ.get("LANGSMITH_TRACING", "NOT SET")) +print("LANGSMITH_PROJECT:", os.environ.get("LANGSMITH_PROJECT", "agent-memory-workshop")) +``` + +If the key is missing, set it in your shell before launching Jupyter: + +```bash +export LANGSMITH_API_KEY="lsv2_..." +export LANGSMITH_TRACING=true +export LANGSMITH_PROJECT=agent-memory-workshop +``` + +--- + +### Traces contain too much information + +**Symptom:** You see prompt text, tool output, or document text in trace inputs, outputs, or metadata. + +**Cause:** A custom trace run captured raw content. + +**Fix:** Use lengths and counts instead of content. For example, record `query.length`, `response.length`, `memory.result_count`, and `tool.result_length`. Do not record API keys, raw prompts, retrieved documents, full Tavily output, or database connection strings. 
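
One way to catch this before a run is recorded is a small guard over the payload. The sketch below is an assumption-laden illustration rather than part of the notebook: `find_unsafe_fields`, `BLOCKED_KEY_PARTS`, and the 64-character threshold are hypothetical choices, and it only inspects top-level string values.

```python
from typing import Any, List, Mapping

# Key fragments that suggest raw content or secrets (hypothetical list).
BLOCKED_KEY_PARTS = ("prompt", "password", "api_key", "dsn", "document", "raw")


def find_unsafe_fields(payload: Mapping[str, Any], max_text_length: int = 64) -> List[str]:
    """Return keys whose string values look like content rather than metadata."""
    unsafe = []
    for key, value in payload.items():
        if not isinstance(value, str):
            continue  # numbers and booleans (lengths, counts, flags) are fine
        suspicious_key = any(part in key.lower() for part in BLOCKED_KEY_PARTS)
        if suspicious_key or len(value) > max_text_length:
            unsafe.append(key)
    return unsafe


safe = {"query.length": 74, "tool.name": "search_tavily", "memory.result_count": 3}
leaky = {
    "query.length": 74,
    "prompt": "You are a research assistant...",
    "tool.result": "x" * 1800,
}

print(find_unsafe_fields(safe))
print(find_unsafe_fields(leaky))
```

Calling the guard on `inputs`, `outputs`, and `metadata` right before opening a trace run, and raising or dropping the flagged keys, keeps an accidental raw-content capture out of the trace.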
+ +--- + ## Checking System Status If something is not working and you are not sure where the problem is, run this diagnostic cell: diff --git a/workshops/agent_memory_workshop/images/part7-jaeger-trace.png b/workshops/agent_memory_workshop/images/part7-jaeger-trace.png new file mode 100644 index 00000000..302097eb Binary files /dev/null and b/workshops/agent_memory_workshop/images/part7-jaeger-trace.png differ diff --git a/workshops/agent_memory_workshop/images/part7-langsmith-trace.png b/workshops/agent_memory_workshop/images/part7-langsmith-trace.png new file mode 100644 index 00000000..1ce1311a Binary files /dev/null and b/workshops/agent_memory_workshop/images/part7-langsmith-trace.png differ diff --git a/workshops/agent_memory_workshop/requirements.txt b/workshops/agent_memory_workshop/requirements.txt index e63e5916..1629838e 100644 --- a/workshops/agent_memory_workshop/requirements.txt +++ b/workshops/agent_memory_workshop/requirements.txt @@ -16,3 +16,4 @@ ipywidgets matplotlib tiktoken pydantic +langsmith diff --git a/workshops/agent_memory_workshop/workshop/notebook_complete.ipynb b/workshops/agent_memory_workshop/workshop/notebook_complete.ipynb index 4666ba6e..2f74f9d3 100644 --- a/workshops/agent_memory_workshop/workshop/notebook_complete.ipynb +++ b/workshops/agent_memory_workshop/workshop/notebook_complete.ipynb @@ -16,7 +16,7 @@ "\n", "\n", "In this notebook, you'll learn how to engineer memory systems that give AI agents the ability to remember, learn, and adapt across conversations. \n", - "Moving beyond simple RAG, we implement a complete **Memory Manager** with six distinct memory types—each serving a specific cognitive function." + "Moving beyond simple RAG, we implement a complete **Memory Manager** with six distinct memory types\u2014each serving a specific cognitive function." 
] }, { @@ -26,10 +26,10 @@ "source": [ "## The Use Case: A Research Paper Assistant\n", "\n", - "Throughout this workshop you will build a **Research Paper Assistant** — an AI agent that can search, retrieve, and reason over arxiv research papers. \n", + "Throughout this workshop you will build a **Research Paper Assistant** \u2014 an AI agent that can search, retrieve, and reason over arxiv research papers. \n", "It ingests 50 papers into Oracle AI Database as vectors, answers multi-turn questions using memory that persists across conversations, \n", "and reaches the live web via Tavily when its knowledge base isn't enough. \n", - "The assistant is the vehicle — the real goal is learning the memory and context engineering patterns that make any agent reliable at scale." + "The assistant is the vehicle \u2014 the real goal is learning the memory and context engineering patterns that make any agent reliable at scale." ] }, { @@ -59,7 +59,7 @@ "source": [ "## The End Result\n", "\n", - "By the end of this workshop, your memory-engineered agent will keep its context window flat and stable — while a naive agent without memory or context engineering spirals toward the token limit within a few turns.\n", + "By the end of this workshop, your memory-engineered agent will keep its context window flat and stable \u2014 while a naive agent without memory or context engineering spirals toward the token limit within a few turns.\n", "\n", "![Context Window Growth: Engineered vs Naive Agent](../images/end_result.png)\n", "\n", @@ -113,7 +113,7 @@ "metadata": {}, "outputs": [], "source": [ - "! pip install -qU langchain-oracledb sentence-transformers langchain-openai langchain tavily-python datasets oracledb openai matplotlib" + "! 
pip install -qU langchain-oracledb sentence-transformers langchain-community langchain-openai langchain-huggingface langchain tavily-python datasets oracledb openai matplotlib langsmith\n" ] }, { @@ -125,7 +125,7 @@ "\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-1-oracle-setup.md](../docs/part-1-oracle-setup.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-1-oracle-setup.md](../docs/part-1-oracle-setup.md)\n" ] }, { @@ -133,7 +133,7 @@ "id": "792a6485", "metadata": {}, "source": [ - "This section walks you through connecting to **Oracle AI Database**. Oracle AI Database is a converged database that combines relational, document, graph, and vector data in a single engine—making it ideal for AI applications that need semantic search, embeddings storage, and vector similarity queries.\n", + "This section walks you through connecting to **Oracle AI Database**. Oracle AI Database is a converged database that combines relational, document, graph, and vector data in a single engine\u2014making it ideal for AI applications that need semantic search, embeddings storage, and vector similarity queries.\n", "\n", "**What you'll do:**\n", "1. Pull and run the Oracle Database Docker container\n", @@ -187,7 +187,7 @@ " dsn=dsn,\n", " program=program\n", " )\n", - " print(\"✓ Connected successfully!\")\n", + " print(\"\u2713 Connected successfully!\")\n", " \n", " # Test the connection\n", " with conn.cursor() as cur:\n", @@ -199,10 +199,10 @@ " \n", " except oracledb.OperationalError as e:\n", " error_msg = str(e)\n", - " print(f\"✗ Connection failed (attempt {attempt}/{max_retries})\")\n", + " print(f\"\u2717 Connection failed (attempt {attempt}/{max_retries})\")\n", " \n", " if \"DPY-4011\" in error_msg or \"Connection reset by peer\" in error_msg:\n", - " print(\" → This usually means:\")\n", + " print(\" \u2192 This usually means:\")\n", " print(\" 1. Database is still starting up (wait 2-3 minutes)\")\n", " print(\" 2. 
Listener configuration issue\")\n", " print(\" 3. Container is not running\")\n", @@ -215,7 +215,7 @@ " else:\n", " raise\n", " except Exception as e:\n", - " print(f\"✗ Unexpected error: {e}\")\n", + " print(f\"\u2717 Unexpected error: {e}\")\n", " raise\n", " \n", " raise ConnectionError(\"Failed to connect after all retries\")" @@ -226,7 +226,7 @@ "id": "1f8bacbe", "metadata": {}, "source": [ - "> Connect as the `VECTOR` user — a dedicated schema created for storing embeddings and agent memory. All workshop operations use this user rather than `SYS` to follow the principle of least privilege.\n" + "> Connect as the `VECTOR` user \u2014 a dedicated schema created for storing embeddings and agent memory. All workshop operations use this user rather than `SYS` to follow the principle of least privilege.\n" ] }, { @@ -263,13 +263,13 @@ " cur.execute(f'DROP INDEX \"{idx}\"')\n", " dropped.append(idx)\n", " except Exception as e:\n", - " print(f\" ⚠️ Could not drop index {idx}: {e}\")\n", + " print(f\" \u26a0\ufe0f Could not drop index {idx}: {e}\")\n", "\n", " conn.commit()\n", " if dropped:\n", - " print(f\"🧹 One-time cleanup: dropped {len(dropped)} old index(es): {', '.join(dropped)}\")\n", + " print(f\"\ud83e\uddf9 One-time cleanup: dropped {len(dropped)} old index(es): {', '.join(dropped)}\")\n", " else:\n", - " print(\"🧹 One-time cleanup: no existing user-created indexes on VECTOR_SEARCH_DEMO\")\n", + " print(\"\ud83e\uddf9 One-time cleanup: no existing user-created indexes on VECTOR_SEARCH_DEMO\")\n", "\n", "one_time_cleanup_vector_demo_indexes(vector_conn)\n" ] @@ -279,7 +279,7 @@ "id": "365bcafb", "metadata": {}, "source": [ - "✅ **Connection established!** Oracle AI Database is running in this Codespace and your `vector_conn` is active.\n", + "\u2705 **Connection established!** Oracle AI Database is running in this Codespace and your `vector_conn` is active.\n", "\n", "Next, we will create vector-enabled SQL tables using **LangChain's OracleVS integration** 
to store embeddings and metadata for semantic search.\n" ] @@ -289,9 +289,9 @@ "id": "05aa0378", "metadata": {}, "source": [ - "> 💡 **Key Insight — Part 1**\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 1**\n", ">\n", - "> Oracle AI Database is not a separate vector database — it is a converged engine where vectors, relational data, and SQL queries coexist. This means your agent's memory lives in one ACID-compliant system, not scattered across specialised stores." + "> Oracle AI Database is not a separate vector database \u2014 it is a converged engine where vectors, relational data, and SQL queries coexist. This means your agent's memory lives in one ACID-compliant system, not scattered across specialised stores." ] }, { @@ -303,7 +303,7 @@ "\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-2-vector-search.md](../docs/part-2-vector-search.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-2-vector-search.md](../docs/part-2-vector-search.md)\n" ] }, { @@ -339,9 +339,9 @@ "id": "73cf543b", "metadata": {}, "source": [ - "> 💡 **Key Definition — Vector Search**\n", + "> \ud83d\udca1 **Key Definition \u2014 Vector Search**\n", ">\n", - "> Vector search finds documents by *meaning*, not keywords. Text is converted to a numeric vector (embedding), and retrieval is based on distance between vectors in high-dimensional space. Two documents about the same topic will be close together — even if they share no words." + "> Vector search finds documents by *meaning*, not keywords. Text is converted to a numeric vector (embedding), and retrieval is based on distance between vectors in high-dimensional space. Two documents about the same topic will be close together \u2014 even if they share no words." ] }, { @@ -358,7 +358,7 @@ "\n", "**Your task:** Initialise `OracleVS` in the cell below using the provided parameters. Then run the next two cells to create the HNSW index and ingest 50 research paper abstracts. 
You need data in the table before the search cells in Step 3 will return results.\n", "\n", - "> 📖 Open **docs/part-2-vector-search.md** if you need guidance.\n" + "> \ud83d\udcd6 Open **docs/part-2-vector-search.md** if you need guidance.\n" ] }, { @@ -412,11 +412,11 @@ " vector_store=vs,\n", " params={\"idx_name\": idx_name, \"idx_type\": \"HNSW\"}\n", " )\n", - " print(f\" ✅ Created index: {idx_name}\")\n", + " print(f\" \u2705 Created index: {idx_name}\")\n", " except Exception as e:\n", " err = str(e)\n", " if \"ORA-00955\" in err:\n", - " print(f\" ⏭️ Index already exists: {idx_name} (skipped)\")\n", + " print(f\" \u23ed\ufe0f Index already exists: {idx_name} (skipped)\")\n", " else:\n", " raise\n" ] @@ -463,7 +463,7 @@ "source": [ "Rather than downloading all papers at once into memory, streaming=True pulls them one at a time as you loop over them. \n", "\n", - "This is important because the full dataset could be millions of rows — streaming means you only ever hold one paper in memory at a time. `MAX_PAPERS = 50` then acts as an early exit so you only take the first 50.\n" + "This is important because the full dataset could be millions of rows \u2014 streaming means you only ever hold one paper in memory at a time. `MAX_PAPERS = 50` then acts as an early exit so you only take the first 50.\n" ] }, { @@ -489,7 +489,7 @@ "The loop builds three lists simultaneously from the same 50 papers:\n", "\n", "- `sampled_papers`: a full copy of each paper's raw fields, kept for inspection and reuse later in the notebook.\n", - "- `texts`: just the title and abstract combined into a single string. This is what gets embedded — the actual content that Oracle will turn into a vector\n", + "- `texts`: just the title and abstract combined into a single string. This is what gets embedded \u2014 the actual content that Oracle will turn into a vector\n", "- `metadata`: the identifiers and labels (arxiv ID, subject, authors) stored alongside the vector but not embedded. 
This is what you get back when you search, so you know which paper matched" ] }, @@ -562,7 +562,7 @@ " metadatas=metadata,\n", ")\n", "\n", - "print(f\"✅ Ingested {len(texts)} research papers into VECTOR_SEARCH_DEMO\")\n" + "print(f\"\u2705 Ingested {len(texts)} research papers into VECTOR_SEARCH_DEMO\")\n" ] }, { @@ -597,7 +597,7 @@ "source": [ "### Basic Similarity Search\n", "\n", - "Run a semantic search against the 50 ingested papers. Notice that the query does not need to share any keywords with the documents — Oracle finds papers whose *meaning* is closest to your query.\n", + "Run a semantic search against the 50 ingested papers. Notice that the query does not need to share any keywords with the documents \u2014 Oracle finds papers whose *meaning* is closest to your query.\n", "\n", "**Your task:** Complete the cell below using `similarity_search()`. Return the top 3 results and print `page_content` and `metadata` for each.\n" ] @@ -654,12 +654,12 @@ "source": [ "**Filtered Similarity Search**\n", "\n", - "Sometimes you want semantic search within a specific category — not across all documents. The cell below combines a natural language query with an exact metadata filter, restricting results to papers whose `primary_subject` matches a specific value.\n", + "Sometimes you want semantic search within a specific category \u2014 not across all documents. The cell below combines a natural language query with an exact metadata filter, restricting results to papers whose `primary_subject` matches a specific value.\n", "\n", - "This runs entirely inside Oracle as a single operation: the vector similarity and the metadata filter are evaluated together, not in sequence. Oracle does not fetch all matching vectors first and then filter — it applies both conditions simultaneously, which keeps it fast at scale.\n", + "This runs entirely inside Oracle as a single operation: the vector similarity and the metadata filter are evaluated together, not in sequence. 
Oracle does not fetch all matching vectors first and then filter \u2014 it applies both conditions simultaneously, which keeps it fast at scale.\n", "\n", "\n", - "`{\"primary_subject\": {\"$eq\": value}}` is the filter syntax. $eq means exact match — equivalent to WHERE primary_subject = value in SQL." + "`{\"primary_subject\": {\"$eq\": value}}` is the filter syntax. $eq means exact match \u2014 equivalent to WHERE primary_subject = value in SQL." ] }, { @@ -715,11 +715,11 @@ "source": [ "**Why would you ever do this?**\n", "\n", - "This pattern is useful in agent memory when you already know which document you want but you want to retrieve it through the same vector search pipeline that retrieves everything else. For example, an agent might know the arxiv ID of a paper the user mentioned earlier and want to pull it back into context — using $in filter lets you do that without writing a separate SQL query.\n", + "This pattern is useful in agent memory when you already know which document you want but you want to retrieve it through the same vector search pipeline that retrieves everything else. For example, an agent might know the arxiv ID of a paper the user mentioned earlier and want to pull it back into context \u2014 using $in filter lets you do that without writing a separate SQL query.\n", "\n", - "**The`$in` operator** works like SQL's IN clause — it matches any document whose id is in the provided list. You could pass multiple IDs: {\"id\": {\"$in\": [id1, id2, id3]}} to restrict the search to a specific set of documents.\n", + "**The`$in` operator** works like SQL's IN clause \u2014 it matches any document whose id is in the provided list. 
You could pass multiple IDs: {\"id\": {\"$in\": [id1, id2, id3]}} to restrict the search to a specific set of documents.\n", "\n", - "**The broader point for the workshop:** Oracle's vector search is not separate from its SQL capabilities — filters run as SQL predicates inside Oracle, meaning you get the full expressiveness of SQL filtering combined with semantic vector search in a single query. This is one of the key advantages of using a converged database over a standalone vector store." + "**The broader point for the workshop:** Oracle's vector search is not separate from its SQL capabilities \u2014 filters run as SQL predicates inside Oracle, meaning you get the full expressiveness of SQL filtering combined with semantic vector search in a single query. This is one of the key advantages of using a converged database over a standalone vector store." ] }, { @@ -727,9 +727,9 @@ "id": "ec892c84", "metadata": {}, "source": [ - "> 💡 **Key Insight — Part 2**\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 2**\n", ">\n", - "> Vector search retrieves by meaning, not keywords — but metadata filters let you combine semantic similarity with exact constraints in a single query. This hybrid approach is what makes vector search practical for agent memory: find what's *relevant* and *scoped* at the same time." + "> Vector search retrieves by meaning, not keywords \u2014 but metadata filters let you combine semantic similarity with exact constraints in a single query. This hybrid approach is what makes vector search practical for agent memory: find what's *relevant* and *scoped* at the same time." 
] }, { @@ -740,7 +740,7 @@ "# Part 3: Memory Engineering and Agent Memory\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-3-memory-engineering.md](../docs/part-3-memory-engineering.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-3-memory-engineering.md](../docs/part-3-memory-engineering.md)\n" ] }, { @@ -749,10 +749,10 @@ "metadata": {}, "source": [ "\n", - "**`Agent Memory`** is the exocortex that augments an LLM—capturing, encoding, storing, linking, and retrieving information beyond the model's parametric and contextual limits. \n", + "**`Agent Memory`** is the exocortex that augments an LLM\u2014capturing, encoding, storing, linking, and retrieving information beyond the model's parametric and contextual limits. \n", "It provides the persistence and structure required for long-horizon reasoning and reliable behaviour.\n", "\n", - "**`Memory Engineering`** is the scaffolding and control harness that we design to move information optimally and efficiently into, through, and across all components of an AI system(databases, LLMs, applications etc). It ensures that data is captured, transformed, organized, and retrieved in the right way at the right time—so agents can behave reliably, believably, and capabaly.\n", + "**`Memory Engineering`** is the scaffolding and control harness that we design to move information optimally and efficiently into, through, and across all components of an AI system(databases, LLMs, applications etc). It ensures that data is captured, transformed, organized, and retrieved in the right way at the right time\u2014so agents can behave reliably, believably, and capabaly.\n", "\n", "This is the core section of the notebook where we build a complete **`Memory Manager`** for AI agents. 
\n", "\n", @@ -784,7 +784,7 @@ "| **Summary** | Compressed memory | Condensed context for long conversations | Vector-Enabled SQL Table |\n", "| **Tool Log** | Episodic memory | Raw tool call outputs offloaded from context | SQL Table |\n", "\n", - "> **Note on Tool Log:** Tool Log is a form of episodic memory — it records *what happened* during each tool execution. Beyond keeping the context window lean, tool logs can serve as a source from which **procedural memories** (workflow patterns) and **semantic memories** (knowledge base entries) can be distilled over time.\n", + "> **Note on Tool Log:** Tool Log is a form of episodic memory \u2014 it records *what happened* during each tool execution. Beyond keeping the context window lean, tool logs can serve as a source from which **procedural memories** (workflow patterns) and **semantic memories** (knowledge base entries) can be distilled over time.\n", "\n", "## Steps in This Section\n", "\n", @@ -802,7 +802,7 @@ "id": "52a4e0fb", "metadata": {}, "source": [ - "> 💡 **Key Definition — Agent Memory**\n", + "> \ud83d\udca1 **Key Definition \u2014 Agent Memory**\n", ">\n", "> Agent memory is the persistent infrastructure that gives a stateless LLM the ability to remember across turns, sessions, and tasks. Without it, every inference starts from scratch." ] @@ -820,7 +820,7 @@ "id": "6820db4c", "metadata": {}, "source": [ - "> 💡 **Key Definition — Memory Engineering**\n", + "> \ud83d\udca1 **Key Definition \u2014 Memory Engineering**\n", ">\n", "> Memory engineering is the design of *what* to store, *where* to store it, and *when* to retrieve it. It is the scaffolding that moves information into, through, and across an AI system so agents behave reliably over long horizons." 
] @@ -863,7 +863,7 @@ " if \"ORA-00942\" in str(e):\n", " print(f\" - {table} (not exists)\")\n", " else:\n", - " print(f\" ✗ {table}: {e}\")\n", + " print(f\" \u2717 {table}: {e}\")\n", " \n", "vector_conn.commit()" ] @@ -903,7 +903,7 @@ "- Adds an index on `thread_id` for fast conversation lookups\n", "- Adds an index on `timestamp` for chronological ordering\n", "\n", - "The `summary_id` column is critical for Part 4 — it links messages to their summaries when the context window is compacted." + "The `summary_id` column is critical for Part 4 \u2014 it links messages to their summaries when the context window is compacted." ] }, { @@ -975,11 +975,11 @@ "source": [ "### Step 1b: Create Tool Log Table (Experimental Memory)\n", "\n", - "Tool call outputs during agent execution can **bloat the context window** quickly — a single web search might return thousands of tokens that are only needed once. \n", + "Tool call outputs during agent execution can **bloat the context window** quickly \u2014 a single web search might return thousands of tokens that are only needed once. \n", "\n", "The `TOOL_LOG` table acts as an **experimental memory**: full tool outputs are persisted to the database and replaced in the context window with a compact one-line reference. The agent can retrieve full outputs later if needed via `read_tool_log`.\n", "\n", - "This is a form of **context offloading** — keeping the working memory lean while preserving full fidelity in durable storage." + "This is a form of **context offloading** \u2014 keeping the working memory lean while preserving full fidelity in durable storage." ] }, { @@ -1023,7 +1023,7 @@ "source": [ "### Step 1c: Create Vector-Enabled Tables for Each Memory Type\n", "\n", - "Here we create 5 separate OracleVS-backed vector-enabled SQL tables—one for each memory type. \n", + "Here we create 5 separate OracleVS-backed vector-enabled SQL tables\u2014one for each memory type. 
\n", "\n", "Each semantic memory is backed by its own Oracle table with a VECTOR column and uses the same embedding model for consistency.\n", "\n", @@ -1131,7 +1131,7 @@ " for p in sampled_papers\n", " ]\n", " knowledge_base_vs.add_texts(kb_texts, kb_meta)\n", - " print(f\"✅ Seeded knowledge base memory with {len(kb_texts)} arXiv papers\")\n" + " print(f\"\u2705 Seeded knowledge base memory with {len(kb_texts)} arXiv papers\")\n" ] }, { @@ -1153,21 +1153,21 @@ "\n", "| Operation | Programmatic | Agent-Triggered | Notes |\n", "|-----------|:------------:|:---------------:|-------|\n", - "| `read_conversational_memory()` | ✅ | ❌ | Always loaded at loop start (unsummarized units only) |\n", - "| `read_knowledge_base()` | ✅ | ❌ | Always loaded at loop start |\n", - "| `read_workflow()` | ✅ | ❌ | Always loaded at loop start |\n", - "| `read_entity()` | ✅ | ❌ | Always loaded at loop start |\n", - "| `read_summary_context()` | ✅ | ❌ | Always loaded at loop start (IDs + descriptions) |\n", - "| `read_toolbox()` | ✅ | ❌ | Tool schemas are retrieved before model reasoning |\n", - "| `write_conversational_memory()` | ✅ | ❌ | User message (pre-loop) + assistant answer (post-loop) |\n", - "| `write_workflow()` | ✅ | ❌ | Persisted after loop when tool steps exist |\n", - "| `write_entity()` | ✅ | ❌ | Best-effort extraction around user/final assistant text |\n", - "| `write_tool_log()` | ✅ | ❌ | Full tool output offloaded to DB after every tool execution |\n", - "| Tool-call decision (`tool_choice=auto`) | ❌ | ✅ | Model decides whether to call tools |\n", - "| `search_tavily()` | ❌ | ✅ | Agent-triggered external retrieval |\n", - "| `expand_summary()` | ❌ | ✅ | Agent-triggered just-in-time summary expansion |\n", - "| `summarize_and_store()` | ❌ | ✅ | Agent-triggered context compaction primitive |\n", - "| `summarize_conversation()` | ❌ | ✅ | Agent-triggered conversation compaction for active thread |\n", + "| `read_conversational_memory()` | \u2705 | \u274c | Always loaded at 
loop start (unsummarized units only) |\n", + "| `read_knowledge_base()` | \u2705 | \u274c | Always loaded at loop start |\n", + "| `read_workflow()` | \u2705 | \u274c | Always loaded at loop start |\n", + "| `read_entity()` | \u2705 | \u274c | Always loaded at loop start |\n", + "| `read_summary_context()` | \u2705 | \u274c | Always loaded at loop start (IDs + descriptions) |\n", + "| `read_toolbox()` | \u2705 | \u274c | Tool schemas are retrieved before model reasoning |\n", + "| `write_conversational_memory()` | \u2705 | \u274c | User message (pre-loop) + assistant answer (post-loop) |\n", + "| `write_workflow()` | \u2705 | \u274c | Persisted after loop when tool steps exist |\n", + "| `write_entity()` | \u2705 | \u274c | Best-effort extraction around user/final assistant text |\n", + "| `write_tool_log()` | \u2705 | \u274c | Full tool output offloaded to DB after every tool execution |\n", + "| Tool-call decision (`tool_choice=auto`) | \u274c | \u2705 | Model decides whether to call tools |\n", + "| `search_tavily()` | \u274c | \u2705 | Agent-triggered external retrieval |\n", + "| `expand_summary()` | \u274c | \u2705 | Agent-triggered just-in-time summary expansion |\n", + "| `summarize_and_store()` | \u274c | \u2705 | Agent-triggered context compaction primitive |\n", + "| `summarize_conversation()` | \u274c | \u2705 | Agent-triggered conversation compaction for active thread |\n", "\n", "### What Is Programmatic in This Harness\n", "\n", @@ -1177,7 +1177,7 @@ "2. **Tool schema retrieval** before each model call.\n", "3. **Memory persistence** around the loop (store user turn, store assistant turn, persist workflow/entity updates).\n", "4. **Tool execution dispatch** after a tool call is chosen (once selected by the model, execution is deterministic in code).\n", - "5. **Tool output offloading** via `write_tool_log()` — full outputs are persisted to the database and replaced with compact references in the context window.\n", + "5. 
**Tool output offloading** via `write_tool_log()` \u2014 full outputs are persisted to the database and replaced with compact references in the context window.\n", "\n", "### What Is Agent-Triggered in This Harness\n", "\n", @@ -1190,9 +1190,9 @@ "\n", "### Why This Split Works for Memory-Centric Agents\n", "\n", - "1. **Reliability from programmatic memory** — critical memory load/save behavior never depends on the model remembering to do it.\n", - "2. **Adaptivity from agent-triggered tools** — the model can selectively fetch/expand/compact only when needed.\n", - "3. **Clear control boundaries** — the harness owns state integrity; the model owns strategy inside those boundaries." + "1. **Reliability from programmatic memory** \u2014 critical memory load/save behavior never depends on the model remembering to do it.\n", + "2. **Adaptivity from agent-triggered tools** \u2014 the model can selectively fetch/expand/compact only when needed.\n", + "3. **Clear control boundaries** \u2014 the harness owns state integrity; the model owns strategy inside those boundaries." 
] }, { @@ -1225,11 +1225,11 @@ "\n", "### Key Features\n", "\n", - "- **Thread-based conversations** — Messages are organized by `thread_id` for multi-conversation support\n", - "- **Semantic search** — Vector-enabled SQL tables enable finding relevant content by meaning, not just keywords\n", - "- **Metadata filtering** — Workflows filter by `num_steps > 0`, summaries filter by `id`\n", - "- **LLM-powered entity extraction** — Automatically extracts people, places, and systems from text\n", - "- **Formatted context output** — Each read method returns formatted text ready for the LLM context\n" + "- **Thread-based conversations** \u2014 Messages are organized by `thread_id` for multi-conversation support\n", + "- **Semantic search** \u2014 Vector-enabled SQL tables enable finding relevant content by meaning, not just keywords\n", + "- **Metadata filtering** \u2014 Workflows filter by `num_steps > 0`, summaries filter by `id`\n", + "- **LLM-powered entity extraction** \u2014 Automatically extracts people, places, and systems from text\n", + "- **Formatted context output** \u2014 Each read method returns formatted text ready for the LLM context\n" ] }, { @@ -1350,7 +1350,7 @@ " \"\"\", {\"summary_id\": summary_id, \"thread_id\": thread_id})\n", " count = cur.rowcount\n", " self.conn.commit()\n", - " print(f\" 📦 Marked {count} messages as summarized (summary_id: {summary_id})\")\n", + " print(f\" \ud83d\udce6 Marked {count} messages as summarized (summary_id: {summary_id})\")\n", "\n", " # ==================== KNOWLEDGE BASE (Vector-Enabled SQL Table) ====================\n", " \n", @@ -1507,13 +1507,13 @@ " if not results:\n", " return \"## Entity Memory\\nNo entities found.\"\n", " \n", - " entities = [f\"• {doc.metadata.get('name', '?')}: {doc.metadata.get('description', '')}\" \n", + " entities = [f\"\u2022 {doc.metadata.get('name', '?')}: {doc.metadata.get('description', '')}\" \n", " for doc in results if hasattr(doc, 'metadata')]\n", " entities_formatted = 
'\\n'.join(entities)\n", " return f\"\"\"## Entity Memory\n", "### Purpose: Named entities (people, places, systems, paper titles) extracted from conversations.\n", "### When to use: Use these to resolve references like \"that author\" or \"the system we discussed\".\n", - "### Entity memory provides continuity across turns — ground your answers in known entities\n", + "### Entity memory provides continuity across turns \u2014 ground your answers in known entities\n", "### rather than guessing or re-asking the user for names and details already mentioned.\n", "\n", "{entities_formatted}\"\"\"\n", @@ -1551,13 +1551,13 @@ " \"### Purpose: Compressed snapshots of older conversations and context windows.\",\n", " \"### When to use: These are lightweight pointers. If a summary looks relevant,\",\n", " \"### call expand_summary(summary_id) to retrieve the full content just-in-time.\",\n", - " \"### Do NOT expand all summaries — only expand when you need specific details.\",\n", + " \"### Do NOT expand all summaries \u2014 only expand when you need specific details.\",\n", " \"\"\n", " ]\n", " for doc in results:\n", " sid = doc.metadata.get('id', '?')\n", " desc = doc.metadata.get('description', 'No description')\n", - " lines.append(f\" • [ID: {sid}] {desc}\")\n", + " lines.append(f\" \u2022 [ID: {sid}] {desc}\")\n", " return \"\\n\".join(lines)\n", " \n", " # ==================== TOOL LOG (SQL - Experimental Memory) ====================\n", @@ -1666,7 +1666,7 @@ "source": [ "### The Scalability Problem with Tools\n", "\n", - "As your AI system grows, you might have **hundreds of tools** available—APIs, database queries, calculators, search engines, and more. However, passing all tools to the LLM at inference time creates serious problems:\n", + "As your AI system grows, you might have **hundreds of tools** available\u2014APIs, database queries, calculators, search engines, and more. 
However, passing all tools to the LLM at inference time creates serious problems:\n", "\n", "| Problem | Impact |\n", "|---------|--------|\n", @@ -1681,9 +1681,9 @@ "\n", "The `Toolbox` class solves this by treating tools as a **searchable memory**:\n", "\n", - "1. **Register hundreds of tools** — Store all available tools with their descriptions and embeddings\n", - "2. **Retrieve only relevant tools** — At inference time, use vector search to find tools semantically relevant to the current query\n", - "3. **Pass a focused toolset** — Only the retrieved tools (typically 3-5) are passed to the LLM\n", + "1. **Register hundreds of tools** \u2014 Store all available tools with their descriptions and embeddings\n", + "2. **Retrieve only relevant tools** \u2014 At inference time, use vector search to find tools semantically relevant to the current query\n", + "3. **Pass a focused toolset** \u2014 Only the retrieved tools (typically 3-5) are passed to the LLM\n", "\n", "This approach means your system can **scale to hundreds of tools** while the LLM only sees the most relevant ones for each query.\n", "\n", @@ -1692,7 +1692,7 @@ "The `Toolbox` class uses **docstrings as the retrieval key**:\n", "\n", "```\n", - "User Query → Embed Query → Vector Search → Find tools with similar docstrings → Return relevant tools\n", + "User Query \u2192 Embed Query \u2192 Vector Search \u2192 Find tools with similar docstrings \u2192 Return relevant tools\n", "```\n", "\n", "| Component | Purpose |\n", @@ -1725,18 +1725,18 @@ "|---|---|---|\n", "| `__init__` | Initialises the Toolbox with a `MemoryManager`, OpenAI client, and model name | Sets up internal dicts `_tools` and `_tools_by_name` to track registered callables |\n", "| `get_embedding` | Converts a text string into a 768-dimensional vector using the configured embedding model | Used to embed tool descriptions so they can be stored and retrieved by semantic similarity |\n", - "| `_augment_docstring` | Sends a tool's docstring 
to the LLM and returns an improved, more detailed version | Makes tools more discoverable — a richer description embeds better and matches more user queries |\n", + "| `_augment_docstring` | Sends a tool's docstring to the LLM and returns an improved, more detailed version | Makes tools more discoverable \u2014 a richer description embeds better and matches more user queries |\n", "| `_generate_queries` | Uses the LLM to generate synthetic example queries a user might ask when needing this tool | Embeds the *usage intent* alongside the tool description, increasing retrieval accuracy |\n", "| `_get_tool_metadata` | Extracts function name, signature, parameters, and return type using Python's `inspect` module | Produces a structured `ToolMetadata` object used to build the OpenAI-compatible tool schema |\n", - "| `register_tool` | Registers a function as a tool — can be used as a plain decorator or with `augment=True` | The core method: embeds the tool description, stores it in Oracle via `MemoryManager`, and keeps a reference to the callable for execution |\n", + "| `register_tool` | Registers a function as a tool \u2014 can be used as a plain decorator or with `augment=True` | The core method: embeds the tool description, stores it in Oracle via `MemoryManager`, and keeps a reference to the callable for execution |\n", "\n", "\n", "### Key Insight\n", "\n", "The `augment=True` flag in `@toolbox.register_tool(augment=True)` triggers:\n", - "1. **Docstring augmentation** — LLM rewrites the docstring to be clearer and more searchable\n", - "2. **Synthetic query generation** — LLM generates example queries that would need this tool\n", - "3. **Rich embedding** — Combines name + augmented docstring + signature + queries for better retrieval\n", + "1. **Docstring augmentation** \u2014 LLM rewrites the docstring to be clearer and more searchable\n", + "2. **Synthetic query generation** \u2014 LLM generates example queries that would need this tool\n", + "3. 
**Rich embedding** \u2014 Combines name + augmented docstring + signature + queries for better retrieval\n", "\n", "This means a simple one-line docstring like `\"Search the web\"` becomes a rich, detailed description that's much more likely to be retrieved when the user asks something like `\"What's the latest news about AI?\"`" ] @@ -1813,7 +1813,7 @@ "\n", " # NOTE: The role description of a technical writer below is a prompt engineering technique that is used to improve the quality of the docstring\n", " # Athough there are research that suggest that role description doesn't realy affect the quality of the LLM's output, it is still a useful technique\n", - " # and it is a good [prompt engineering] technique to know.\n", + " #\u00a0and it is a good [prompt engineering] technique to know.\n", " prompt = f\"\"\"You are a technical writer. Improve the following function docstring to be more clear, \n", " comprehensive, and useful. Include:\n", " 1. A clear concise summary\n", @@ -1945,7 +1945,7 @@ " object_id_str = str(object_id)\n", "\n", " # NOTE: Augmentation is a technique that is used to improve the quality of the tool's docstring\n", - " # by using the LLM to enhance the tool's discoverability and retrieval this is a [memory engineering] technique\n", + " #\u00a0by using the LLM to enhance the tool's discoverability and retrieval this is a [memory engineering] technique\n", " if augment:\n", " # Use LLM to enhance the tool's discoverability\n", " augmented_docstring = self._augment_docstring(docstring)\n", @@ -1999,12 +1999,12 @@ "id": "api-key-warning", "metadata": {}, "source": [ - "### ⚠️ API Keys\n", + "### \u26a0\ufe0f API Keys\n", "\n", "Your API keys are pre-configured as environment variables in this Codespace:\n", "\n", - "- **`OCI_GENAI_API_KEY`** — OCI GenAI (xAI Grok 3 Fast) access\n", - "- **`TAVILY_API_KEY`** — Tavily web search (free tier, 1,000 searches/month)\n", + "- **`OCI_GENAI_API_KEY`** \u2014 OCI GenAI (xAI Grok 3 Fast) access\n", + "- 
**`TAVILY_API_KEY`** \u2014 Tavily web search (free tier, 1,000 searches/month)\n", "\n", "The next cell loads them from the environment. If you are running locally, set these environment variables before launching Jupyter.\n" ] @@ -2030,11 +2030,11 @@ "outputs": [], "source": [ "# Verify OCI GenAI key is available\n", - "assert oci_genai_api_key, \"OCI_GENAI_API_KEY not set — check your Codespace environment variables\"\n", + "assert oci_genai_api_key, \"OCI_GENAI_API_KEY not set \u2014 check your Codespace environment variables\"\n", "print(f\"OCI GenAI API key loaded (length: {len(oci_genai_api_key)})\")\n", "\n", "# Verify Tavily key is available\n", - "assert tavily_api_key, \"TAVILY_API_KEY not set — check your Codespace environment variables\"\n", + "assert tavily_api_key, \"TAVILY_API_KEY not set \u2014 check your Codespace environment variables\"\n", "print(f\"Tavily API key loaded (length: {len(tavily_api_key)})\")\n" ] }, @@ -2052,7 +2052,9 @@ "OCI_GENAI_ENDPOINT = os.environ.get(\n", " \"OCI_GENAI_ENDPOINT\",\n", " \"https://inference.generativeai.us-phoenix-1.oci.oraclecloud.com/openai/v1\"\n", - ")\n", + ").rstrip(\"/\")\n", + "if not OCI_GENAI_ENDPOINT.endswith(\"/openai/v1\"):\n", + " OCI_GENAI_ENDPOINT = f\"{OCI_GENAI_ENDPOINT}/openai/v1\"\n", "OCI_GENAI_API_KEY = os.environ[\"OCI_GENAI_API_KEY\"] # set via Codespaces secret\n", "\n", "client = OpenAI(base_url=OCI_GENAI_ENDPOINT, api_key=OCI_GENAI_API_KEY)\n", @@ -2066,7 +2068,7 @@ "id": "d16158d8", "metadata": {}, "source": [ - "> 💡 **Key Insight — Part 3**\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 3**\n", ">\n", "> Different types of information need different storage and retrieval strategies. Chat history needs exact, ordered recall (SQL). Knowledge and workflows need relevance-ranked retrieval (vectors). Getting the storage type wrong means either missing context or flooding the window with noise." 
] @@ -2080,7 +2082,7 @@ "\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-4-context-engineering.md](../docs/part-4-context-engineering.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-4-context-engineering.md](../docs/part-4-context-engineering.md)\n" ] }, { @@ -2090,7 +2092,7 @@ "source": [ "> **Context engineering** refers to the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference, including all the other information that may land there outside of the prompts.\n", "> \n", - "> — *Anthropic*\n", + "> \u2014 *Anthropic*\n", "\n", "While memory engineering focuses on *what to store and retrieve*, context engineering focuses on *how to manage what's in the context window right now*. This includes monitoring usage, compressing information, and providing just-in-time access to details.\n", "\n", @@ -2103,14 +2105,14 @@ "| **3. Compact** | `summarize_conversation()` / `summarize_and_store()` | Agent-triggered compaction when context gets long |\n", "| **4. Just-in-Time Retrieval** | `expand_summary()` tool | Let agent expand summaries on demand |\n", "\n", - "**`Just-In-Time (JIT)`** retrieval is the process of fetching only the information needed at the exact moment the agent requires it, based on the current task, query, or reasoning step. Instead of loading pre-computed or pre-cached context upfront, the system dynamically retrieves the minimal, most relevant data on demand, ensuring efficiency and reducing context overload. In the context of agent memory JIT is a retrieval-control strategy where memory access is triggered by the agent’s current goal, query, or reasoning step. Rather than preloading large histories or the full knowledge base, the system dynamically filters, ranks, and injects only the information that materially influences the next token. 
This reduces context saturation, improves attention allocation, and increases reasoning fidelity.\n", + "**`Just-In-Time (JIT)`** retrieval is the process of fetching only the information needed at the exact moment the agent requires it, based on the current task, query, or reasoning step. Instead of loading pre-computed or pre-cached context upfront, the system dynamically retrieves the minimal, most relevant data on demand, ensuring efficiency and reducing context overload. In the context of agent memory JIT is a retrieval-control strategy where memory access is triggered by the agent\u2019s current goal, query, or reasoning step. Rather than preloading large histories or the full knowledge base, the system dynamically filters, ranks, and injects only the information that materially influences the next token. This reduces context saturation, improves attention allocation, and increases reasoning fidelity.\n", "\n", "## The Context Management Flow\n", "\n", "```\n", - "Context built → Check usage % → Agent may compact (summarize) → Store summary with ID\n", - " ↓\n", - "Agent sees: [Summary ID: abc123] Brief description ← Agent can call expand_summary(\"abc123\") if needed\n", + "Context built \u2192 Check usage % \u2192 Agent may compact (summarize) \u2192 Store summary with ID\n", + " \u2193\n", + "Agent sees: [Summary ID: abc123] Brief description \u2190 Agent can call expand_summary(\"abc123\") if needed\n", "```\n", "\n", "This approach keeps the context lean while giving the agent access to full details when required.\n", @@ -2121,7 +2123,7 @@ "\n", "**Your task:** Implement `calculate_context_usage()` in the cell below. The agent harness depends on the return dict having exactly three keys: `tokens`, `max`, and `percent`. 
The return dict must have exactly three keys: `tokens`, `max`, and `percent`.\n", "\n", - "> 📖 Open **docs/part-4-context-engineering.md** if you need guidance.\n" + "> \ud83d\udcd6 Open **docs/part-4-context-engineering.md** if you need guidance.\n" ] }, { @@ -2129,9 +2131,9 @@ "id": "af6ef9b4", "metadata": {}, "source": [ - "> 💡 **Key Definition — Context Engineering**\n", + "> \ud83d\udca1 **Key Definition \u2014 Context Engineering**\n", ">\n", - "> Context engineering is the discipline of deciding exactly which tokens enter the LLM's context window on each inference call. While memory engineering focuses on *what to store and retrieve*, context engineering focuses on *what's in the window right now* — monitoring, compressing, and curating it." + "> Context engineering is the discipline of deciding exactly which tokens enter the LLM's context window on each inference call. While memory engineering focuses on *what to store and retrieve*, context engineering focuses on *what's in the window right now* \u2014 monitoring, compressing, and curating it." ] }, { @@ -2157,48 +2159,48 @@ "source": [ "### Why We Need to Summarise the Context Window\n", "\n", - "Every time the agent processes a turn, it assembles a context window — a single block of text containing conversation history, retrieved knowledge, tool outputs, entity memory, and workflow patterns. This is what gets sent to the LLM on each inference call.\n", + "Every time the agent processes a turn, it assembles a context window \u2014 a single block of text containing conversation history, retrieved knowledge, tool outputs, entity memory, and workflow patterns. This is what gets sent to the LLM on each inference call.\n", "\n", "The problem is that this block grows with every turn. 
Left unchecked, it follows a predictable trajectory:\n", "\n", "```\n", - "Turn 1: [system prompt] + [query] → ~500 tokens\n", - "Turn 5: [system prompt] + [4 prior turns] + [tool outputs] → ~8,000 tokens\n", - "Turn 15: [system prompt] + [14 prior turns] + [tool outputs] → ~40,000 tokens\n", - "Turn 30: Context limit exceeded → API call fails\n", + "Turn 1: [system prompt] + [query] \u2192 ~500 tokens\n", + "Turn 5: [system prompt] + [4 prior turns] + [tool outputs] \u2192 ~8,000 tokens\n", + "Turn 15: [system prompt] + [14 prior turns] + [tool outputs] \u2192 ~40,000 tokens\n", + "Turn 30: Context limit exceeded \u2192 API call fails\n", "```\n", "\n", "This is called **context window bloat**, and it is one of the most common failure modes in production agent systems.\n", "\n", "#### The Two Failure Modes\n", "\n", - "**Hard failure** — the context exceeds the model's token limit and the API call errors. The agent crashes mid-task with no recovery path.\n", + "**Hard failure** \u2014 the context exceeds the model's token limit and the API call errors. The agent crashes mid-task with no recovery path.\n", "\n", - "**Soft failure** — the context is large but still within the limit. However, research has shown that LLMs suffer from the [\"lost in the middle\" problem](https://arxiv.org/abs/2307.03172): when relevant information is buried in a long context, models struggle to attend to it. The agent appears to \"forget\" things it was told earlier in the same session — not because the tokens were removed, but because the model's attention is diluted.\n", + "**Soft failure** \u2014 the context is large but still within the limit. However, research has shown that LLMs suffer from the [\"lost in the middle\" problem](https://arxiv.org/abs/2307.03172): when relevant information is buried in a long context, models struggle to attend to it. 
The agent appears to \"forget\" things it was told earlier in the same session \u2014 not because the tokens were removed, but because the model's attention is diluted.\n", "\n", "#### Summarisation as the Solution\n", "\n", "Context summarisation is the primary technique for managing this growth. Rather than appending every message and tool output indefinitely, the agent periodically compresses older context into a compact summary and stores it in Oracle's summary memory table.\n", "\n", - "The original content is not lost — it is stored in full in the database. Only a short reference pointer (`[Summary ID: abc-123]`) remains in the active context. If the agent needs the full detail later, it can call `expand_summary(summary_id)` to retrieve it on demand.\n", + "The original content is not lost \u2014 it is stored in full in the database. Only a short reference pointer (`[Summary ID: abc-123]`) remains in the active context. If the agent needs the full detail later, it can call `expand_summary(summary_id)` to retrieve it on demand.\n", "\n", "This gives you a **flat context growth curve** instead of an unbounded one:\n", "\n", "```\n", - "Without summarisation: context grows linearly → eventually fails\n", - "With summarisation: context stays bounded → runs indefinitely\n", + "Without summarisation: context grows linearly \u2192 eventually fails\n", + "With summarisation: context stays bounded \u2192 runs indefinitely\n", "```\n", "\n", "#### What a Good Summary Preserves\n", "\n", "A summarisation prompt is only useful if it faithfully compresses the right information. 
A well-written prompt ensures the summary retains:\n", "\n", - "- The **user's goal** — so the agent stays on task across turns\n", - "- **Key facts and findings** — so the agent does not re-discover what it already knows\n", - "- **Named entities** — paper titles, arXiv IDs, authors — so the agent can refer back to specific sources\n", - "- **Unresolved questions** — so the agent knows what still needs to be done\n", + "- The **user's goal** \u2014 so the agent stays on task across turns\n", + "- **Key facts and findings** \u2014 so the agent does not re-discover what it already knows\n", + "- **Named entities** \u2014 paper titles, arXiv IDs, authors \u2014 so the agent can refer back to specific sources\n", + "- **Unresolved questions** \u2014 so the agent knows what still needs to be done\n", "\n", - "This is why the summarisation prompt matters. A vague or poorly structured prompt produces a summary that loses critical detail — and once the original context is replaced, that detail is gone from the active window.\n", + "This is why the summarisation prompt matters. A vague or poorly structured prompt produces a summary that loses critical detail \u2014 and once the original context is replaced, that detail is gone from the active window.\n", "\n", "> The cell below implements `summarise_context_window()`. Your task is to write the prompt that instructs the LLM on exactly what to preserve and how to format the output.\n" ] @@ -2261,31 +2263,31 @@ "\n", "Summarisation alone is not the complete picture. The real architectural insight is **where the summary goes**.\n", "\n", - "When the agent compresses its context window, it does not simply discard the older content. It **offloads it to Oracle AI Database** — the agent's memory core — and replaces it in the active context with a compact reference pointer:\n", + "When the agent compresses its context window, it does not simply discard the older content. 
It **offloads it to Oracle AI Database** \u2014 the agent's memory core \u2014 and replaces it in the active context with a compact reference pointer:\n", "\n", "```\n", "Before offload:\n", - " [Full conversation history — 40,000 tokens]\n", - " [Retrieved papers — 8,000 tokens]\n", - " [Tool outputs — 12,000 tokens]\n", - " ──────────────────────────────────────────\n", - " Total: ~60,000 tokens → approaching limit\n", + " [Full conversation history \u2014 40,000 tokens]\n", + " [Retrieved papers \u2014 8,000 tokens]\n", + " [Tool outputs \u2014 12,000 tokens]\n", + " \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n", + " Total: ~60,000 tokens \u2192 approaching limit\n", "\n", "After offload:\n", " [Summary ID: a3f9c1b2] Agent researched planetary exploration papers,\n", " identified three relevant arXiv submissions, user asked follow-up on\n", " mission timelines. Next: retrieve funding data.\n", - " ──────────────────────────────────────────\n", - " Total: ~80 tokens → context reset, agent continues\n", + " \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n", + " Total: ~80 tokens \u2192 context reset, agent continues\n", "```\n", "\n", - "The original 60,000 tokens are not gone. They are stored in full in Oracle's `SUMMARY_MEMORY` table, indexed by the summary ID. The reference pointer in the active context — `[Summary ID: a3f9c1b2]` — is both a retrieval key and a human-readable label describing what was compressed.\n", + "The original 60,000 tokens are not gone. They are stored in full in Oracle's `SUMMARY_MEMORY` table, indexed by the summary ID. 
The reference pointer in the active context \u2014 `[Summary ID: a3f9c1b2]` \u2014 is both a retrieval key and a human-readable label describing what was compressed.\n", "\n", "#### Just-in-Time (JiT) Retrieval\n", "\n", "The agent does not expand every summary on every turn. Expanding all summaries would immediately re-inflate the context, defeating the purpose. Instead, the agent reads the description in the reference pointer and decides whether it needs the full content for the current task.\n", "\n", - "If it does, it calls `expand_summary(summary_id)` — a tool registered in the toolbox — which pulls the full original content from Oracle on demand. This is **Just-in-Time retrieval**: the agent fetches detail only when it needs it, not speculatively.\n", + "If it does, it calls `expand_summary(summary_id)` \u2014 a tool registered in the toolbox \u2014 which pulls the full original content from Oracle on demand. This is **Just-in-Time retrieval**: the agent fetches detail only when it needs it, not speculatively.\n", "\n", "```\n", "Active context contains:\n", @@ -2295,21 +2297,21 @@ "User asks: \"What were the arXiv IDs from earlier?\"\n", "\n", "Agent decides:\n", - " → Summary a3f9c1b2 is relevant → calls expand_summary(\"a3f9c1b2\")\n", - " → Summary d7e2f091 is not relevant → leaves it compressed\n", + " \u2192 Summary a3f9c1b2 is relevant \u2192 calls expand_summary(\"a3f9c1b2\")\n", + " \u2192 Summary d7e2f091 is not relevant \u2192 leaves it compressed\n", "\n", - "Oracle returns full content of a3f9c1b2 → agent answers accurately\n", + "Oracle returns full content of a3f9c1b2 \u2192 agent answers accurately\n", "```\n", "\n", "#### The Memory Core Architecture\n", "\n", "This is the reason Oracle AI Database is positioned as the **agent memory core** rather than just a storage backend. 
It provides three things simultaneously that no standalone vector store or cache can match:\n", "\n", - "- **Persistence** — summaries survive across sessions, container restarts, and Codespace rebuilds\n", - "- **Semantic indexing** — summary descriptions are embedded, so the agent can find relevant summaries by meaning, not just by exact ID\n", - "- **Faithful reconstruction** — the full original content is stored in a `CLOB` column alongside the summary, meaning the uncompressed version is always recoverable — the agent never permanently loses context, it only defers it\n", "\n", - "The cell below implements `offload_to_summary()` — the function that triggers this entire flow when the context window crosses the threshold. Notice it is deliberately simple: it checks the usage percentage, calls `summarise_context_window()`, and returns the compact reference. The complexity lives in the database and the retrieval tools, not in this function.\n" + "- **Persistence** \u2014 summaries survive across sessions, container restarts, and Codespace rebuilds\n", + "- **Semantic indexing** \u2014 summary descriptions are embedded, so the agent can find relevant summaries by meaning, not just by exact ID\n", + "- **Faithful reconstruction** \u2014 the full original content is stored in a `CLOB` column alongside the summary, meaning the uncompressed version is always recoverable \u2014 the agent never permanently loses context, it only defers it\n", "\n", + "The cell below implements `offload_to_summary()` \u2014 the function that triggers this entire flow when the context window crosses the threshold. Notice it is deliberately simple: it checks the usage percentage, calls `summarise_context_window()`, and returns the compact reference. The complexity lives in the database and the retrieval tools, not in this function.\n" ] }, { @@ -2355,24 +2357,24 @@ "\n", "**Our intuition:** Memory should be *compressed* or *forgotten*, not *erased*.
By marking messages with a `summary_id` instead of deleting them:\n", "\n", - "1. **Full history is preserved** — Original messages remain in the database for auditing, debugging, or reprocessing\n", - "2. **Linkage is maintained** — Each summary knows which messages it represents (via `summary_id`)\n", - "3. **Reversible** — If a summary is deleted, you could \"unsummarize\" by clearing the `summary_id`\n", + "1. **Full history is preserved** \u2014 Original messages remain in the database for auditing, debugging, or reprocessing\n", + "2. **Linkage is maintained** \u2014 Each summary knows which messages it represents (via `summary_id`)\n", + "3. **Reversible** \u2014 If a summary is deleted, you could \"unsummarize\" by clearing the `summary_id`\n", "\n", "#### The Flow\n", "\n", "```\n", - "Thread has 50 messages → Context too large → summarize_conversation(thread_id)\n", - " ↓\n", + "Thread has 50 messages \u2192 Context too large \u2192 summarize_conversation(thread_id)\n", + " \u2193\n", " 1. Read unsummarized messages\n", " 2. LLM summarizes them\n", " 3. Store summary with unique ID\n", " 4. UPDATE messages SET summary_id = 'abc123'\n", - " ↓\n", + " \u2193\n", " Next read: Only new messages appear + Summary ID reference\n", "```\n", "\n", - "This is a form of **log compaction** — a pattern borrowed from databases and message queues where old entries are compressed but not lost." + "This is a form of **log compaction** \u2014 a pattern borrowed from databases and message queues where old entries are compressed but not lost." ] }, { @@ -2426,9 +2428,9 @@ "id": "e5decfcf", "metadata": {}, "source": [ - "> 💡 **Key Insight — Part 4**\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 4**\n", ">\n", - "> An agent's context window is a finite budget. Without active management, it fills up within a few turns and the agent breaks. Summarisation and JIT retrieval are the two levers that keep context flat — compress what you can, and only fetch what you need." 
+ "> An agent's context window is a finite budget. Without active management, it fills up within a few turns and the agent breaks. Summarisation and JIT retrieval are the two levers that keep context flat \u2014 compress what you can, and only fetch what you need." ] }, { @@ -2440,7 +2442,7 @@ "\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-5-web-search.md](../docs/part-5-web-search.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-5-web-search.md](../docs/part-5-web-search.md)\n" ] }, { @@ -2454,23 +2456,23 @@ "\n", "## What This Section Does\n", "\n", - "1. **Initialize the Tavily client** — Set up the search API with an API key\n", - "2. **Register `search_tavily` as a tool** — Use `@toolbox.register_tool(augment=True)` to make it discoverable\n", - "3. **Implement the search-and-store pattern** — Results are automatically written to knowledge base memory\n", - "4. **Test tool retrieval** — Verify the tool can be found via semantic search\n", + "1. **Initialize the Tavily client** \u2014 Set up the search API with an API key\n", + "2. **Register `search_tavily` as a tool** \u2014 Use `@toolbox.register_tool(augment=True)` to make it discoverable\n", + "3. **Implement the search-and-store pattern** \u2014 Results are automatically written to knowledge base memory\n", + "4. 
**Test tool retrieval** \u2014 Verify the tool can be found via semantic search\n", "\n", "## The Search-and-Store Pattern\n", "\n", "One thing to note is that we not only get external context that is not available to the Agent at execution time, but also persist it to the knowledge base memory, so the Agent can reuse it in subsequent iterations.\n", - "When the agent calls `search_tavily()`, it doesn't just return results—it **persists them to the knowledge base**:\n", + "When the agent calls `search_tavily()`, it doesn't just return results\u2014it **persists them to the knowledge base**:\n", "\n", "```\n", "Agent calls search_tavily(\"latest AI news\")\n", - " ↓\n", + " \u2193\n", "Tavily API returns results\n", - " ↓\n", + " \u2193\n", "Each result is written to knowledge_base_vs with metadata (title, URL, timestamp)\n", - " ↓\n", + " \u2193\n", "Future queries can retrieve this information without searching again\n", "```\n", "\n", @@ -2482,9 +2484,9 @@ "id": "4efe33cb", "metadata": {}, "source": [ - "> 💡 **Key Definition — Agentic Tool**\n", + "> \ud83d\udca1 **Key Definition \u2014 Agentic Tool**\n", ">\n", - "> An agentic tool is a function the LLM can choose to call during its reasoning loop. Unlike programmatic operations (which the harness always runs), agentic tools are invoked only when the model decides they are needed — giving the agent autonomy over *when* to act." + "> An agentic tool is a function the LLM can choose to call during its reasoning loop. Unlike programmatic operations (which the harness always runs), agentic tools are invoked only when the model decides they are needed \u2014 giving the agent autonomy over *when* to act."
] }, { @@ -2495,8 +2497,8 @@ "outputs": [], "source": [ "# Verify Tavily key is available\n", - "assert tavily_api_key, \"TAVILY_API_KEY not set — check your Codespace environment variables\"\n", - "print(\"Tavily API key loaded ✓\")" + "assert tavily_api_key, \"TAVILY_API_KEY not set \u2014 check your Codespace environment variables\"\n", + "print(\"Tavily API key loaded \u2713\")" ] }, { @@ -2557,7 +2559,7 @@ "id": "72cdcc3e", "metadata": {}, "source": [ - "> 💡 **Key Insight — Part 5**\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 5**\n", ">\n", "> Giving an agent web access turns it from a closed system into an open one. But the real pattern here is the *toolbox*: registering tools into a vector store so the agent discovers them by relevance rather than receiving every tool on every call. This scales to hundreds of tools without bloating the context." ] @@ -2571,7 +2573,7 @@ "\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-6-agent-execution.md](../docs/part-6-agent-execution.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-6-agent-execution.md](../docs/part-6-agent-execution.md)\n" ] }, { @@ -2596,7 +2598,7 @@ "id": "efac6d69", "metadata": {}, "source": [ - "> 💡 **Key Definition — Agent Harness**\n", + "> \ud83d\udca1 **Key Definition \u2014 Agent Harness**\n", ">\n", "> An agent harness is the runtime scaffold that orchestrates the LLM's reasoning loop: building context, dispatching tool calls, persisting memory, and deciding when to stop. The LLM generates intent; the harness executes it." 
] @@ -2613,7 +2615,7 @@ "\n", "client = OpenAI(base_url=OCI_GENAI_ENDPOINT, api_key=OCI_GENAI_API_KEY)\n", "\n", - "# Persistent context-window tracker — survives across call_agent() invocations\n", + "# Persistent context-window tracker \u2014 survives across call_agent() invocations\n", "context_size_history = [] # list of (run_label, iteration, estimated_tokens)\n", "\n", "# ==================== SYSTEM PROMPT ====================\n", @@ -2622,19 +2624,19 @@ "You are a Research Paper Assistant with access to memory and tools.\n", "\n", "IMPORTANT: The user's input contains CONTEXT retrieved from multiple memory systems.\n", - "Each memory section has a Purpose and When-to-use guide — follow them.\n", + "Each memory section has a Purpose and When-to-use guide \u2014 follow them.\n", "\n", "## Memory Priority Order\n", - "1. **Conversation Memory** — check what the user already asked and what you already answered.\n", - "2. **Knowledge Base Memory** — cite facts from stored papers/documents before searching externally.\n", - "3. **Entity Memory** — resolve named references (\"that author\", \"the system\") from here.\n", - "4. **Workflow Memory** — reuse proven tool sequences for similar past queries.\n", - "5. **Summary Memory** — expand a summary ID only when you need specific details from older context.\n", + "1. **Conversation Memory** \u2014 check what the user already asked and what you already answered.\n", + "2. **Knowledge Base Memory** \u2014 cite facts from stored papers/documents before searching externally.\n", + "3. **Entity Memory** \u2014 resolve named references (\"that author\", \"the system\") from here.\n", + "4. **Workflow Memory** \u2014 reuse proven tool sequences for similar past queries.\n", + "5. 
**Summary Memory** \u2014 expand a summary ID only when you need specific details from older context.\n", "\n", "## Tool Output Handling\n", "Tool call outputs are logged to a Tool Log table and replaced with compact references in context.\n", "The preview in each [Tool Log ...] reference contains enough to reason about the result.\n", - "If you need the full output, it can be retrieved from the database — but prefer working with\n", + "If you need the full output, it can be retrieved from the database \u2014 but prefer working with\n", "the preview and the knowledge base (where search results are also stored).\n", "\n", "## Context Management\n", @@ -2686,31 +2688,31 @@ "\n", "```\n", "1. BUILD CONTEXT (programmatic)\n", - " ├── Read conversational memory (unsummarized chat units)\n", - " ├── Read knowledge base (relevant documents)\n", - " ├── Read workflow memory (past action patterns)\n", - " ├── Read entity memory (people, places, systems)\n", - " └── Read summary context (available summary IDs + descriptions)\n", + " \u251c\u2500\u2500 Read conversational memory (unsummarized chat units)\n", + " \u251c\u2500\u2500 Read knowledge base (relevant documents)\n", + " \u251c\u2500\u2500 Read workflow memory (past action patterns)\n", + " \u251c\u2500\u2500 Read entity memory (people, places, systems)\n", + " \u2514\u2500\u2500 Read summary context (available summary IDs + descriptions)\n", "\n", "2. GET TOOLS (programmatic)\n", - " └── Retrieve semantically relevant tools from toolbox\n", + " \u2514\u2500\u2500 Retrieve semantically relevant tools from toolbox\n", "\n", "3. STORE USER MESSAGE (programmatic)\n", - " └── Persist the user message + best-effort entity extraction\n", + " \u2514\u2500\u2500 Persist the user message + best-effort entity extraction\n", "\n", "4. 
WITHIN-RUN TOOL-CALL LOOP (up to max_iterations and within max_execution_time_s)\n", - " ├── Call LLM with context + tool schemas\n", - " ├── If tool calls → execute tools and append tool outputs\n", - " ├── If tools changed memory (search/compaction) → rebuild context for the next iteration\n", - " └── If no tool calls → finalize answer\n", + " \u251c\u2500\u2500 Call LLM with context + tool schemas\n", + " \u251c\u2500\u2500 If tool calls \u2192 execute tools and append tool outputs\n", + " \u251c\u2500\u2500 If tools changed memory (search/compaction) \u2192 rebuild context for the next iteration\n", + " \u2514\u2500\u2500 If no tool calls \u2192 finalize answer\n", "\n", "5. GUARDED STOP\n", - " └── If iteration/time budget is hit → force a final best-effort answer (no tools)\n", + " \u2514\u2500\u2500 If iteration/time budget is hit \u2192 force a final best-effort answer (no tools)\n", "\n", "6. SAVE RESULTS (programmatic)\n", - " ├── Write workflow (if tools were used)\n", - " ├── Best-effort entity extraction on final answer\n", - " └── Store assistant response in conversational memory\n", + " \u251c\u2500\u2500 Write workflow (if tools were used)\n", + " \u251c\u2500\u2500 Best-effort entity extraction on final answer\n", + " \u2514\u2500\u2500 Store assistant response in conversational memory\n", "```\n", "\n", "## Key Design Decisions\n", @@ -2746,7 +2748,7 @@ "\n", " # 1. Build context from memory\n", " print(\"\\n\" + \"=\"*50)\n", - " print(\"🧠 BUILDING CONTEXT...\")\n", + " print(\"\ud83e\udde0 BUILDING CONTEXT...\")\n", "\n", " def build_context() -> str:\n", " \"\"\"Rebuild the full context from the current memory state.\"\"\"\n", @@ -2765,9 +2767,9 @@ "\n", " # 2. 
Check context usage (agent decides whether to summarize via tools)\n", " usage = calculate_context_usage(context)\n", - " print(f\"📊 Context: {usage['percent']}% ({usage['tokens']}/{usage['max']} tokens)\")\n", + " print(f\"\ud83d\udcca Context: {usage['percent']}% ({usage['tokens']}/{usage['max']} tokens)\")\n", " if usage['percent'] > 80:\n", - " print(\"⚠️ Context >80% - agent may call summarize_conversation(thread_id) for compaction.\")\n", + " print(\"\u26a0\ufe0f Context >80% - agent may call summarize_conversation(thread_id) for compaction.\")\n", "\n", " # 3. Get tools\n", " dynamic_tools = memory_manager.read_toolbox(query, k=5)\n", @@ -2785,7 +2787,7 @@ " dynamic_tools.append(tool)\n", " existing.add(name)\n", "\n", - " print(f\"🔧 Tools: {[t['function']['name'] for t in dynamic_tools]}\")\n", + " print(f\"\ud83d\udd27 Tools: {[t['function']['name'] for t in dynamic_tools]}\")\n", "\n", " # 4. Store user message & extract entities\n", " memory_manager.write_conversational_memory(query, \"user\", thread_id)\n", @@ -2801,7 +2803,7 @@ " # Estimate tool schema tokens (sent with every API call)\n", " tool_schema_tokens = len(json_lib.dumps(dynamic_tools)) // 4 if dynamic_tools else 0\n", "\n", - " print(\"\\n🤖 TOOL-CALL LOOP\")\n", + " print(\"\\n\ud83e\udd16 TOOL-CALL LOOP\")\n", " for iteration in range(max_iterations):\n", " print(f\"\\n--- Iteration {iteration + 1} ---\")\n", "\n", @@ -2814,7 +2816,7 @@ " elapsed = time.time() - start_time\n", " if elapsed > max_execution_time_s:\n", " timed_out = True\n", - " print(f\"\\n⏱️ Time limit reached ({elapsed:.1f}s > {max_execution_time_s:.1f}s). Finalizing...\")\n", + " print(f\"\\n\u23f1\ufe0f Time limit reached ({elapsed:.1f}s > {max_execution_time_s:.1f}s). 
Finalizing...\")\n", " break\n", "\n", " response = call_openai_chat(messages, tools=dynamic_tools)\n", @@ -2833,15 +2835,15 @@ " tool_args = json_lib.loads(raw_args)\n", " except Exception as e:\n", " result = f\"Error: invalid JSON tool arguments for {tool_name}: {e}. Raw: {raw_args}\"\n", - " print(f\"🛠️ {tool_name}()\")\n", - " steps.append(f\"{tool_name}() → failed\")\n", + " print(f\"\ud83d\udee0\ufe0f {tool_name}()\")\n", + " steps.append(f\"{tool_name}() \u2192 failed\")\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", " continue\n", "\n", " if not isinstance(tool_args, dict):\n", " result = f\"Error: tool arguments for {tool_name} must be a JSON object. Got {type(tool_args).__name__}.\"\n", - " print(f\"🛠️ {tool_name}()\")\n", - " steps.append(f\"{tool_name}() → failed\")\n", + " print(f\"\ud83d\udee0\ufe0f {tool_name}()\")\n", + " steps.append(f\"{tool_name}() \u2192 failed\")\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", " continue\n", "\n", @@ -2851,26 +2853,26 @@ "\n", " args_display = {k: (v[:50] + '...' 
if isinstance(v, str) and len(v) > 50 else v)\n", " for k, v in tool_args.items()}\n", - " print(f\"🛠️ {tool_name}({args_display})\")\n", + " print(f\"\ud83d\udee0\ufe0f {tool_name}({args_display})\")\n", "\n", " if max_execution_time_s is not None:\n", " elapsed = time.time() - start_time\n", " if elapsed > max_execution_time_s:\n", " timed_out = True\n", " result = f\"Error: time limit reached before executing tool {tool_name}.\"\n", - " steps.append(f\"{tool_name}({args_display}) → failed\")\n", - " print(f\" → {result}\")\n", + " steps.append(f\"{tool_name}({args_display}) \u2192 failed\")\n", + " print(f\" \u2192 {result}\")\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", " break\n", "\n", " try:\n", " result = execute_tool(tool_name, tool_args)\n", - " steps.append(f\"{tool_name}({args_display}) → success\")\n", + " steps.append(f\"{tool_name}({args_display}) \u2192 success\")\n", " except Exception as e:\n", " result = f\"Error: {e}\"\n", - " steps.append(f\"{tool_name}({args_display}) → failed\")\n", + " steps.append(f\"{tool_name}({args_display}) \u2192 failed\")\n", "\n", - " print(f\" → {result[:200]}...\")\n", + " print(f\" \u2192 {result[:200]}...\")\n", "\n", " # Offload tool output to TOOL_LOG table (experimental memory).\n", " # Full output is persisted in the DB; only a compact reference\n", @@ -2892,12 +2894,12 @@ " break\n", " else:\n", " final_answer = msg.content or \"\"\n", - " print(f\"\\n✅ DONE ({len(steps)} tool calls)\")\n", + " print(f\"\\n\u2705 DONE ({len(steps)} tool calls)\")\n", " break\n", "\n", " if not final_answer:\n", " reason = \"time limit\" if timed_out else \"iteration limit\"\n", - " print(f\"\\n⚠️ Stopped due to {reason}. Generating best-effort final answer (no tools)...\")\n", + " print(f\"\\n\u26a0\ufe0f Stopped due to {reason}. 
Generating best-effort final answer (no tools)...\")\n", " try:\n", " final_messages = messages + [{\"role\": \"user\", \"content\": \"Finalize your answer using the context and tool outputs so far. Do not call tools.\"}]\n", " final_resp = call_openai_chat(final_messages, tools=None)\n", @@ -2914,7 +2916,7 @@ " pass\n", " memory_manager.write_conversational_memory(final_answer, \"assistant\", thread_id)\n", "\n", - " print(\"\\n\" + \"=\"*50 + f\"\\n💬 ANSWER:\\n{final_answer}\\n\" + \"=\"*50)\n", + " print(\"\\n\" + \"=\"*50 + f\"\\n\ud83d\udcac ANSWER:\\n{final_answer}\\n\" + \"=\"*50)\n", " return final_answer" ] }, @@ -2934,7 +2936,7 @@ "call_agent(\"Which authors appear most frequently in this research area?\", thread_id=\"0022\")\n", "call_agent(\"Summarise everything we have discussed so far\", thread_id=\"0022\")\n", "\n", - "# Final question — tests whether conversational memory is working correctly.\n", + "# Final question \u2014 tests whether conversational memory is working correctly.\n", "call_agent(\"What was my first question to you\", thread_id=\"0022\")\n" ] }, @@ -2958,7 +2960,7 @@ " plt.tight_layout()\n", " plt.show()\n", "else:\n", - " print(\"No iterations recorded — run call_agent() first.\")" + " print(\"No iterations recorded \u2014 run call_agent() first.\")" ] }, { @@ -2966,9 +2968,9 @@ "id": "7299fbef", "metadata": {}, "source": [ - "> 💡 **Key Insight — Memory-Aware vs Memory-Augmented**\n", + "> \ud83d\udca1 **Key Insight \u2014 Memory-Aware vs Memory-Augmented**\n", ">\n", - "> A **memory-augmented** agent has access to memory stores — it can read and write. But that alone is not enough. A **memory-aware** agent also has *context engineering*: it monitors its context budget, summarises proactively, offloads tool outputs, and retrieves just-in-time. The naive agent below is memory-augmented (it uses the same LLM and tools) but not memory-aware — it has no strategy for managing what accumulates in its context window. 
The difference is what you are about to see in the chart." + "> A **memory-augmented** agent has access to memory stores \u2014 it can read and write. But that alone is not enough. A **memory-aware** agent also has *context engineering*: it monitors its context budget, summarises proactively, offloads tool outputs, and retrieves just-in-time. The naive agent below is memory-augmented (it uses the same LLM and tools) but not memory-aware \u2014 it has no strategy for managing what accumulates in its context window. The difference is what you are about to see in the chart." ] }, { @@ -2976,7 +2978,7 @@ "id": "1cdasgb4qzj", "metadata": {}, "source": [ - "## Step 2: Baseline — Agent Without Context Engineering\n", + "## Step 2: Baseline \u2014 Agent Without Context Engineering\n", "\n", "To appreciate the impact of the memory and context engineering techniques we've built, it helps to see what happens **without them**.\n", "\n", @@ -2985,13 +2987,13 @@ "| Technique Removed | What Happens Instead | Effect on Context Window |\n", "|---|---|---|\n", "| **Tool output offloading** (`write_tool_log`) | Full raw tool outputs stay in the `messages` list | Each tool call adds thousands of tokens (e.g. 
a web search returns ~2-4k tokens of results) |\n", - "| **Summarisation tools** (`summarize_conversation`, `summarize_and_store`) | Excluded from the tool list — the agent has no way to compact context | Context only grows, never shrinks |\n", + "| **Summarisation tools** (`summarize_conversation`, `summarize_and_store`) | Excluded from the tool list \u2014 the agent has no way to compact context | Context only grows, never shrinks |\n", "| **Context refresh after search** | No rebuild from memory after tool calls | Stale + bloated context persists across iterations |\n", - "| **Memory-backed context rebuild** | Messages persist as one flat list across calls | No separation of concerns — everything accumulates |\n", + "| **Memory-backed context rebuild** | Messages persist as one flat list across calls | No separation of concerns \u2014 everything accumulates |\n", "\n", "### Why This Matters\n", "\n", - "In a real agent loop, the LLM is called **once per iteration** with the full `messages` list. Without offloading, every tool output ever produced sits in that list. After just 3 web searches, the context could grow by 10,000+ tokens — consuming budget that could be used for reasoning.\n", + "In a real agent loop, the LLM is called **once per iteration** with the full `messages` list. Without offloading, every tool output ever produced sits in that list. After just 3 web searches, the context could grow by 10,000+ tokens \u2014 consuming budget that could be used for reasoning.\n", "\n", "The comparison chart below plots both approaches on the same axis so you can see the divergence." 
] @@ -3005,18 +3007,18 @@ "source": [ "# Separate tracker for the naive agent\n", "naive_context_size_history = []\n", - "# Persistent messages per thread — simulates no context management across runs\n", + "# Persistent messages per thread \u2014 simulates no context management across runs\n", "_naive_messages_by_thread = {}\n", "\n", "def call_agent_naive(query: str, thread_id: str = \"naive_1\", dynamic_tools_override: list = None, max_iterations: int = 10, max_execution_time_s: float = 60.0) -> str:\n", - " \"\"\"Naive agent harness — NO context engineering.\n", + " \"\"\"Naive agent harness \u2014 NO context engineering.\n", " \n", " Differences from call_agent:\n", " 1. Full raw tool outputs stay in messages (no write_tool_log offloading)\n", " 2. No summarisation tools available (agent cannot compact context)\n", " 3. No context refresh after memory-mutating tools\n", - " 4. Messages persist across calls — context only grows, never shrinks\n", - " 5. No memory reads — conversation history IS the raw messages list\n", + " 4. Messages persist across calls \u2014 context only grows, never shrinks\n", + " 5. 
No memory reads \u2014 conversation history IS the raw messages list\n", " \"\"\"\n", " thread_id = str(thread_id)\n", " steps = []\n", @@ -3024,7 +3026,7 @@ " start_time = time.time()\n", " timed_out = False\n", "\n", - " # Get tools — but exclude summarisation tools\n", + " # Get tools \u2014 but exclude summarisation tools\n", " if dynamic_tools_override is not None:\n", " dynamic_tools = dynamic_tools_override\n", " else:\n", @@ -3034,7 +3036,7 @@ " {\"summarize_conversation\", \"summarize_and_store\", \"expand_summary\"}]\n", "\n", " # Initialize or reuse persistent messages for this thread.\n", - " # No memory reads — the raw messages list IS the only context.\n", + " # No memory reads \u2014 the raw messages list IS the only context.\n", " # This is the naive approach: everything accumulates in one flat list.\n", " if thread_id not in _naive_messages_by_thread:\n", " _naive_messages_by_thread[thread_id] = [\n", @@ -3042,7 +3044,7 @@ " ]\n", " messages = _naive_messages_by_thread[thread_id]\n", "\n", - " # Just append the raw query — no build_context(), no memory reads.\n", + " # Just append the raw query \u2014 no build_context(), no memory reads.\n", " # Prior turns, tool outputs, and assistant responses are already in messages.\n", " messages.append({\"role\": \"user\", \"content\": query})\n", " final_answer = \"\"\n", @@ -3072,10 +3074,10 @@ " tool_args = json_lib.loads(tc.function.arguments or \"{}\")\n", " try:\n", " result = execute_tool(tc.function.name, tool_args)\n", - " steps.append(f\"{tc.function.name} → success\")\n", + " steps.append(f\"{tc.function.name} \u2192 success\")\n", " except Exception as e:\n", " result = f\"Error: {e}\"\n", - " steps.append(f\"{tc.function.name} → failed\")\n", + " steps.append(f\"{tc.function.name} \u2192 failed\")\n", "\n", " # KEY DIFFERENCE: raw tool output goes straight into messages (no offloading)\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": str(result)})\n", @@ -3093,7 
+3095,7 @@ "\n", " # Append assistant answer to persistent messages (it stays for the next call)\n", " messages.append({\"role\": \"assistant\", \"content\": final_answer})\n", - " print(f\"✅ Naive agent done ({len(steps)} tool calls, {len(messages)} messages in context)\")\n", + " print(f\"\u2705 Naive agent done ({len(steps)} tool calls, {len(messages)} messages in context)\")\n", " return final_answer" ] }, @@ -3115,7 +3117,7 @@ "eng_thread = str(uuid.uuid4())[:8]\n", "naive_thread = str(uuid.uuid4())[:8]\n", "\n", - "# Five progressive queries that build on each other — tests memory continuity\n", + "# Five progressive queries that build on each other \u2014 tests memory continuity\n", "queries = [\n", " \"Search for recent papers on AI agent memory published in 2026\",\n", " \"Pick the 3rd paper from the list and give me the key takeaways\",\n", @@ -3126,7 +3128,7 @@ "\n", "for i, q in enumerate(queries, 1):\n", " print(\"=\" * 60)\n", - " print(f\"QUERY {i}/5 — WITH CONTEXT ENGINEERING (thread: {eng_thread})\")\n", + " print(f\"QUERY {i}/5 \u2014 WITH CONTEXT ENGINEERING (thread: {eng_thread})\")\n", " print(f\" >> {q}\")\n", " print(\"=\" * 60)\n", " call_agent(q, thread_id=eng_thread)\n", @@ -3134,7 +3136,7 @@ "\n", "for i, q in enumerate(queries, 1):\n", " print(\"=\" * 60)\n", - " print(f\"QUERY {i}/5 — NAIVE / NO CONTEXT ENGINEERING (thread: {naive_thread})\")\n", + " print(f\"QUERY {i}/5 \u2014 NAIVE / NO CONTEXT ENGINEERING (thread: {naive_thread})\")\n", " print(f\" >> {q}\")\n", " print(\"=\" * 60)\n", " call_agent_naive(q, thread_id=naive_thread)\n", @@ -3175,7 +3177,7 @@ "\n", "In OpenAI-style framing:\n", "- An **agent run** (one user turn handled) is what `call_agent(...)` executes.\n", - "- Within a run, the **tool-call loop** repeats: model reasoning → optional tool calls → harness executes tools → model observes results → repeat until a final answer.\n", + "- Within a run, the **tool-call loop** repeats: model reasoning \u2192 optional tool 
calls \u2192 harness executes tools \u2192 model observes results \u2192 repeat until a final answer.\n", "\n", "An **agent harness** is the runtime scaffolding around that loop. In this notebook, it is a **memory-based agent harness** where:\n", "- context is assembled from multiple memory types each run\n", @@ -3196,7 +3198,7 @@ "id": "7sp42fx6618", "metadata": {}, "source": [ - "## Step 3: LLM-as-a-Judge — Response Quality Evaluation\n", + "## Step 3: LLM-as-a-Judge \u2014 Response Quality Evaluation\n", "\n", "We've seen the **context window efficiency** difference between the two agents. But does better context engineering actually produce **better answers**?\n", "\n", @@ -3208,7 +3210,7 @@ "| Response A (memory-engineered agent) | A preference: **A**, **B**, or **Tie** |\n", "| Response B (naive agent) | A short explanation of its reasoning |\n", "\n", - "> **Why a warmup phase?** The memory agent's advantage is **cumulative** — it stores conversational memory, entities, and workflows across turns while managing context size. On a brand-new conversation, both agents perform similarly. We first run 5 warmup queries to build up conversation state, then evaluate on queries that specifically test **recall, continuity, and synthesis** — the capabilities that memory engineering enables." + "> **Why a warmup phase?** The memory agent's advantage is **cumulative** \u2014 it stores conversational memory, entities, and workflows across turns while managing context size. On a brand-new conversation, both agents perform similarly. We first run 5 warmup queries to build up conversation state, then evaluate on queries that specifically test **recall, continuity, and synthesis** \u2014 the capabilities that memory engineering enables." 
] }, { @@ -3218,7 +3220,7 @@ "metadata": {}, "outputs": [], "source": [ - "# ── Warmup phase: build up conversation history so the memory agent has state to leverage ──\n", + "# \u2500\u2500 Warmup phase: build up conversation history so the memory agent has state to leverage \u2500\u2500\n", "eval_thread_eng = str(uuid.uuid4())[:8]\n", "eval_thread_naive = str(uuid.uuid4())[:8]\n", "\n", @@ -3230,13 +3232,13 @@ " \"Compare the findings from the two searches we did\",\n", "]\n", "\n", - "print(\"🔄 WARMUP — building conversation history on both agents...\\n\")\n", + "print(\"\ud83d\udd04 WARMUP \u2014 building conversation history on both agents...\\n\")\n", "for i, q in enumerate(warmup_queries, 1):\n", " print(f\" Warmup {i}/{len(warmup_queries)}: {q[:60]}...\")\n", " call_agent(q, thread_id=eval_thread_eng)\n", " call_agent_naive(q, thread_id=eval_thread_naive)\n", "\n", - "# ── Evaluation phase: these queries test memory recall and continuity ──\n", + "# \u2500\u2500 Evaluation phase: these queries test memory recall and continuity \u2500\u2500\n", "eval_queries = [\n", " \"What was the very first paper we discussed and what were its key points?\",\n", " \"Summarize the full arc of our conversation so far\",\n", @@ -3245,16 +3247,16 @@ "\n", "eval_results = []\n", "\n", - "print(f\"\\n{'='*60}\\n📋 EVALUATION — collecting response pairs for judging\\n{'='*60}\")\n", + "print(f\"\\n{'='*60}\\n\ud83d\udccb EVALUATION \u2014 collecting response pairs for judging\\n{'='*60}\")\n", "for q in eval_queries:\n", " print(f\"\\nEVAL: {q}\")\n", - " print(\" ▶ Memory-engineered agent...\")\n", + " print(\" \u25b6 Memory-engineered agent...\")\n", " eng_resp = call_agent(q, thread_id=eval_thread_eng)\n", - " print(\" ▶ Naive agent...\")\n", + " print(\" \u25b6 Naive agent...\")\n", " naive_resp = call_agent_naive(q, thread_id=eval_thread_naive)\n", " eval_results.append((q, eng_resp, naive_resp))\n", "\n", - "print(f\"\\n✅ Collected {len(eval_results)} response pairs for 
judging.\")" + "print(f\"\\n\u2705 Collected {len(eval_results)} response pairs for judging.\")" ] }, { @@ -3275,10 +3277,10 @@ "{response_b}\n", "\n", "Evaluate both responses on:\n", - "1. **Accuracy** — Are the facts correct and claims well-supported?\n", - "2. **Completeness** — Does the response fully address the query?\n", - "3. **Relevance** — Does it stay on-topic and use context appropriately?\n", - "4. **Coherence** — Is it well-structured and easy to follow?\n", + "1. **Accuracy** \u2014 Are the facts correct and claims well-supported?\n", + "2. **Completeness** \u2014 Does the response fully address the query?\n", + "3. **Relevance** \u2014 Does it stay on-topic and use context appropriately?\n", + "4. **Coherence** \u2014 Is it well-structured and easy to follow?\n", "\n", "Reply with EXACTLY this JSON format (no other text):\n", "{{\"winner\": \"A\" or \"B\" or \"Tie\", \"reason\": \"one sentence explanation\"}}\"\"\"\n", @@ -3301,7 +3303,7 @@ " verdict = judge_responses(query, eng_resp, naive_resp)\n", " verdict[\"query\"] = query\n", " judgments.append(verdict)\n", - " label = {\"A\": \"Memory Agent ✅\", \"B\": \"Naive Agent\", \"Tie\": \"Tie 🤝\"}\n", + " label = {\"A\": \"Memory Agent \u2705\", \"B\": \"Naive Agent\", \"Tie\": \"Tie \ud83e\udd1d\"}\n", " print(f\"Query: {query[:60]}...\")\n", " print(f\" Winner: {label.get(verdict['winner'], verdict['winner'])}\")\n", " print(f\" Reason: {verdict['reason']}\\n\")" @@ -3340,6 +3342,502 @@ "print(f\"\\nMemory Agent wins {wins['Memory Agent']}/{len(judgments)} queries.\")" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 7: Agent Observability with LangSmith\n", + "\n", + "--------\n", + "\n", + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-7-observability.md](../docs/part-7-observability.md)\n", + "\n", + "In Part 6, you proved that memory and context engineering keep the agent's context window under control.\n", + "\n", + "Now you will make the agent's runtime behavior 
visible. You will add LangSmith trace runs around the same agent loop so you can inspect one turn: context assembly, Oracle memory reads, tool retrieval, LLM calls, tool execution, context checks, summarisation, and memory writes.\n", + "\n", + "Before running this part, set your LangSmith API key in a terminal or in this notebook:\n", + "\n", + "```bash\n", + "export LANGSMITH_API_KEY=\"lsv2_...\"\n", + "export LANGSMITH_TRACING=true\n", + "export LANGSMITH_PROJECT=agent-memory-workshop\n", + "```\n", + "\n", + "Then open `https://smith.langchain.com` and select the `agent-memory-workshop` project after you run an observed turn.\n", + "\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 7**\n", + ">\n", + "> Observability is not the memory store. Oracle AI Database still persists memory; LangSmith shows what happened during an agent run.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## TODO 17: Configure LangSmith Tracing\n", + "\n", + "LangSmith sends traces from this notebook to a project. 
Configure observability with:\n", + "\n", + "- `LANGSMITH_TRACING = \"true\"`\n", + "- `LANGSMITH_PROJECT = \"agent-memory-workshop\"`\n", + "- a LangSmith `Client`\n", + "- `tracer = ls` (the `langsmith` module, imported as `ls`) so later cells can call `tracer.trace(...)`\n", + "\n", + "Do not capture full prompts, responses, retrieved documents, API keys, raw tool output, or database connection strings in trace inputs, outputs, or metadata.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "import langsmith as ls\n", + "from langsmith import Client\n", + "\n", + "def configure_agent_observability(\n", + " project_name: str = \"agent-memory-workshop\",\n", + "):\n", + " \"\"\"Configure LangSmith tracing for Part 7.\"\"\"\n", + " global _agent_observability\n", + " if globals().get(\"_agent_observability\") is not None:\n", + " return _agent_observability\n", + "\n", + " os.environ.setdefault(\"LANGSMITH_TRACING\", \"true\")\n", + " os.environ.setdefault(\"LANGSMITH_PROJECT\", project_name)\n", + "\n", + " if not os.environ.get(\"LANGSMITH_API_KEY\"):\n", + " raise RuntimeError(\n", + " \"Set LANGSMITH_API_KEY before running Part 7. \"\n", + " \"Create an API key in LangSmith, then export it in your shell or set it in this notebook.\"\n", + " )\n", + "\n", + " client = Client()\n", + " _agent_observability = {\"client\": client, \"project_name\": project_name}\n", + " return _agent_observability\n", + "\n", + "observability = configure_agent_observability()\n", + "tracer = ls\n", + "\n", + "with tracer.trace(\n", + " \"part7.connection.test\",\n", + " run_type=\"chain\",\n", + " inputs={\"test.kind\": \"notebook_setup\"},\n", + " project_name=observability[\"project_name\"],\n", + " client=observability[\"client\"],\n", + ") as test_run:\n", + " test_run.add_outputs({\"status\": \"configured\"})\n", + "\n", + "print(\"LangSmith tracing configured.
Open https://smith.langchain.com and select project: agent-memory-workshop\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Checkpoint: TODO 17\n", + "assert observability is not None and observability.get(\"client\") is not None, \"TODO 17 incomplete \u2014 LangSmith tracing is not configured.\"\n", + "assert tracer is not None, \"TODO 17 incomplete \u2014 tracer is not configured.\"\n", + "print(\"TODO 17 passed \u2014 LangSmith tracing configured\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## TODO 18: Build an Observed Agent Wrapper\n", + "\n", + "Create `call_agent_observed()` as a wrapper-style copy of the Part 6 harness. Keep `call_agent()` unchanged.\n", + "\n", + "Your observed version should create trace runs for:\n", + "\n", + "- `agent.run`\n", + "- `agent.context.build`\n", + "- `agent.memory.read`\n", + "- `agent.context.check`\n", + "- `agent.toolbox.read`\n", + "- `agent.memory.write`\n", + "- `agent.llm.call`\n", + "- `agent.tool.execute`\n", + "- `agent.tool.log`\n", + "\n", + "Record safe metadata such as lengths, counts, memory type, tool name, model name, and status. 
Do not record full prompts or raw tool output.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def _safe_len(value) -> int:\n", + " return len(str(value or \"\"))\n", + "\n", + "def _trace_run(name: str, run_type: str = \"chain\", inputs: dict | None = None, metadata: dict | None = None):\n", + " return tracer.trace(\n", + " name,\n", + " run_type=run_type,\n", + " inputs=inputs or {},\n", + " metadata=metadata or {},\n", + " project_name=observability[\"project_name\"],\n", + " client=observability[\"client\"],\n", + " )\n", + "\n", + "def _mark_run_error(run, error: Exception):\n", + " run.add_metadata({\"error.type\": type(error).__name__})\n", + " run.end(error=f\"{type(error).__name__}: {error}\")\n", + "\n", + "def _span_status_value(result: str) -> str:\n", + " return \"error\" if str(result).lower().startswith(\"error:\") else \"ok\"\n", + "\n", + "def _observed_memory_read(memory_type: str, reader, *args, **kwargs) -> str:\n", + " with _trace_run(\"agent.memory.read\", metadata={\"memory.type\": memory_type}) as run:\n", + " try:\n", + " value = reader(*args, **kwargs)\n", + " run.add_outputs({\"memory.result_length\": _safe_len(value)})\n", + " return value\n", + " except Exception as exc:\n", + " _mark_run_error(run, exc)\n", + " raise\n", + "\n", + "def _build_observed_context(query: str, thread_id: str) -> str:\n", + " with _trace_run(\n", + " \"agent.context.build\",\n", + " inputs={\"query.length\": _safe_len(query)},\n", + " metadata={\"agent.thread_id\": thread_id},\n", + " ) as run:\n", + " ctx = f\"# Question\\n{query}\\n\\n\"\n", + " ctx += _observed_memory_read(\"conversational\", memory_manager.read_conversational_memory, thread_id) + \"\\n\\n\"\n", + " ctx += _observed_memory_read(\"knowledge_base\", memory_manager.read_knowledge_base, query) + \"\\n\\n\"\n", + " ctx += _observed_memory_read(\"workflow\", memory_manager.read_workflow, query) + \"\\n\\n\"\n", + " ctx += 
_observed_memory_read(\"entity\", memory_manager.read_entity, query) + \"\\n\\n\"\n", + " ctx += _observed_memory_read(\"summary\", memory_manager.read_summary_context, query) + \"\\n\\n\"\n", + "\n", + " run.add_outputs({\n", + " \"context.length\": len(ctx),\n", + " \"context.estimated_tokens\": len(ctx) // 4,\n", + " })\n", + " return ctx\n", + "\n", + "def _read_observed_tools(query: str) -> list:\n", + " with _trace_run(\"agent.toolbox.read\", inputs={\"query.length\": _safe_len(query)}) as run:\n", + " dynamic_tools = memory_manager.read_toolbox(query, k=5)\n", + " summary_tool_candidates = memory_manager.read_toolbox(\n", + " \"summarize conversation compact context expand summary memory\", k=5\n", + " )\n", + " must_have = {\"expand_summary\", \"summarize_conversation\", \"summarize_and_store\"}\n", + " existing = {t.get(\"function\", {}).get(\"name\") for t in dynamic_tools}\n", + "\n", + " for tool in summary_tool_candidates:\n", + " name = tool.get(\"function\", {}).get(\"name\")\n", + " if name in must_have and name not in existing:\n", + " dynamic_tools.append(tool)\n", + " existing.add(name)\n", + "\n", + " tool_names = [t.get(\"function\", {}).get(\"name\", \"unknown\") for t in dynamic_tools]\n", + " run.add_outputs({\"tool.count\": len(tool_names)})\n", + " run.add_metadata({\"tool.names\": \",\".join(tool_names)})\n", + " return dynamic_tools\n", + "\n", + "def call_agent_observed(query: str, thread_id: str = \"observed-1\", max_iterations: int = 5, max_execution_time_s: float = 60.0) -> str:\n", + " \"\"\"Observed version of call_agent() that sends LangSmith trace runs.\"\"\"\n", + " thread_id = str(thread_id)\n", + " steps = []\n", + " run_label = f\"Observed Run {len(set(r for r, _, _ in context_size_history)) + 1}\"\n", + "\n", + " import time\n", + "\n", + " start_time = time.time()\n", + " timed_out = False\n", + " final_answer = \"\"\n", + "\n", + " with _trace_run(\n", + " \"agent.run\",\n", + " inputs={\"query.length\": 
_safe_len(query)},\n", + " metadata={\n", + " \"agent.thread_id\": thread_id,\n", + " \"agent.mode\": \"memory_engineered_observed\",\n", + " \"llm.model\": \"xai.grok-3-fast\",\n", + " },\n", + " ) as run:\n", + " print(\"\\n\" + \"=\"*50)\n", + " print(\"BUILDING OBSERVED CONTEXT...\")\n", + " context = _build_observed_context(query, thread_id)\n", + "\n", + " with _trace_run(\"agent.context.check\") as check_run:\n", + " usage = calculate_context_usage(context)\n", + " check_run.add_outputs({\n", + " \"context.percent\": usage[\"percent\"],\n", + " \"context.estimated_tokens\": usage[\"tokens\"],\n", + " \"context.max_tokens\": usage[\"max\"],\n", + " })\n", + " print(f\"Context: {usage['percent']}% ({usage['tokens']}/{usage['max']} tokens)\")\n", + "\n", + " dynamic_tools = _read_observed_tools(query)\n", + " print(f\"Tools: {[t['function']['name'] for t in dynamic_tools]}\")\n", + "\n", + " with _trace_run(\"agent.memory.write\", metadata={\"memory.type\": \"conversational\", \"memory.role\": \"user\", \"agent.thread_id\": thread_id}) as write_run:\n", + " record_id = memory_manager.write_conversational_memory(query, \"user\", thread_id)\n", + " write_run.add_outputs({\"memory.record_id\": str(record_id), \"memory.content_length\": _safe_len(query)})\n", + "\n", + " with _trace_run(\"agent.memory.write\", metadata={\"memory.type\": \"entity\", \"memory.source\": \"user_message\"}) as write_run:\n", + " try:\n", + " memory_manager.write_entity(\"\", \"\", \"\", llm_client=client, text=query)\n", + " write_run.add_outputs({\"memory.write.status\": \"ok\"})\n", + " except Exception as exc:\n", + " write_run.add_outputs({\"memory.write.status\": \"skipped\"})\n", + " write_run.add_metadata({\"error.type\": type(exc).__name__})\n", + "\n", + " messages = [{\"role\": \"system\", \"content\": AGENT_SYSTEM_PROMPT}, {\"role\": \"user\", \"content\": context}]\n", + " tool_schema_tokens = len(json_lib.dumps(dynamic_tools)) // 4 if dynamic_tools else 0\n", + "\n", + " 
print(\"\\nOBSERVED TOOL-CALL LOOP\")\n", + " for iteration in range(max_iterations):\n", + " print(f\"\\n--- Iteration {iteration + 1} ---\")\n", + " run.add_metadata({\"agent.iteration\": iteration + 1})\n", + "\n", + " total_chars = sum(len(m.get(\"content\", \"\") or \"\") for m in messages)\n", + " est_tokens = (total_chars // 4) + tool_schema_tokens\n", + " context_size_history.append((run_label, iteration + 1, est_tokens))\n", + "\n", + " if max_execution_time_s is not None:\n", + " elapsed = time.time() - start_time\n", + " if elapsed > max_execution_time_s:\n", + " timed_out = True\n", + " run.add_metadata({\"agent.stop_reason\": \"time_limit\"})\n", + " print(f\"\\nTime limit reached ({elapsed:.1f}s > {max_execution_time_s:.1f}s). Finalizing...\")\n", + " break\n", + "\n", + " with _trace_run(\n", + " \"agent.llm.call\",\n", + " run_type=\"llm\",\n", + " inputs={\n", + " \"llm.message_count\": len(messages),\n", + " \"tool.count\": len(dynamic_tools),\n", + " \"context.estimated_tokens\": est_tokens,\n", + " },\n", + " metadata={\"llm.model\": \"xai.grok-3-fast\"},\n", + " ) as llm_run:\n", + " try:\n", + " response = call_openai_chat(messages, tools=dynamic_tools)\n", + " msg = response.choices[0].message\n", + " tool_calls = msg.tool_calls or []\n", + " llm_run.add_outputs({\n", + " \"response.length\": _safe_len(msg.content),\n", + " \"agent.tool_call_count\": len(tool_calls),\n", + " })\n", + " except Exception as exc:\n", + " _mark_run_error(llm_run, exc)\n", + " raise\n", + "\n", + " run.add_metadata({\"agent.tool_call_count\": len(tool_calls)})\n", + "\n", + " if tool_calls:\n", + " messages.append({\"role\": \"assistant\", \"content\": msg.content, \"tool_calls\": [\n", + " {\"id\": tc.id, \"type\": \"function\", \"function\": {\"name\": tc.function.name, \"arguments\": tc.function.arguments}}\n", + " for tc in tool_calls\n", + " ]})\n", + "\n", + " for tc in tool_calls:\n", + " tool_name = tc.function.name\n", + " raw_args = tc.function.arguments 
or \"{}\"\n", + " try:\n", + " tool_args = json_lib.loads(raw_args)\n", + " except Exception as exc:\n", + " result = f\"Error: invalid JSON tool arguments for {tool_name}: {exc}. Raw length: {_safe_len(raw_args)}\"\n", + " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", + " steps.append(f\"{tool_name}() -> failed\")\n", + " continue\n", + "\n", + " if not isinstance(tool_args, dict):\n", + " result = f\"Error: tool arguments for {tool_name} must be a JSON object. Got {type(tool_args).__name__}.\"\n", + " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", + " steps.append(f\"{tool_name}() -> failed\")\n", + " continue\n", + "\n", + " if tool_name == \"summarize_conversation\":\n", + " tool_args[\"thread_id\"] = thread_id\n", + "\n", + " args_display = {k: (v[:50] + '...' if isinstance(v, str) and len(v) > 50 else v)\n", + " for k, v in tool_args.items()}\n", + " print(f\"Tool: {tool_name}({args_display})\")\n", + "\n", + " with _trace_run(\n", + " \"agent.tool.execute\",\n", + " run_type=\"tool\",\n", + " inputs={\"tool.argument_length\": _safe_len(raw_args)},\n", + " metadata={\"tool.name\": tool_name},\n", + " ) as tool_run:\n", + " try:\n", + " result = execute_tool(tool_name, tool_args)\n", + " steps.append(f\"{tool_name}({args_display}) -> success\")\n", + " except Exception as exc:\n", + " result = f\"Error: {exc}\"\n", + " steps.append(f\"{tool_name}({args_display}) -> failed\")\n", + " _mark_run_error(tool_run, exc)\n", + " tool_run.add_outputs({\n", + " \"tool.result_length\": _safe_len(result),\n", + " \"tool.status\": _span_status_value(result),\n", + " })\n", + "\n", + " print(f\" -> {str(result)[:200]}...\")\n", + "\n", + " with _trace_run(\"agent.tool.log\", metadata={\"tool.name\": tool_name, \"agent.thread_id\": thread_id}) as tool_log_run:\n", + " compact_result = memory_manager.write_tool_log(\n", + " thread_id, tc.id, tool_name, raw_args, str(result)\n", + " )\n", + " 
tool_log_run.add_outputs({\n", + " \"tool.result_length\": _safe_len(result),\n", + " \"tool.log_reference_length\": _safe_len(compact_result),\n", + " })\n", + " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": compact_result})\n", + "\n", + " if tool_name in {\"search_tavily\", \"summarize_conversation\", \"summarize_and_store\"}:\n", + " context = _build_observed_context(query, thread_id)\n", + " if len(messages) >= 2 and messages[1].get(\"role\") == \"user\":\n", + " messages[1][\"content\"] = context\n", + " with _trace_run(\"agent.context.check\") as check_run:\n", + " usage = calculate_context_usage(context)\n", + " check_run.add_outputs({\n", + " \"context.percent\": usage[\"percent\"],\n", + " \"context.estimated_tokens\": usage[\"tokens\"],\n", + " \"context.max_tokens\": usage[\"max\"],\n", + " })\n", + " print(f\" Refreshed context: {usage['percent']}% ({usage['tokens']}/{usage['max']} tokens)\")\n", + " else:\n", + " final_answer = msg.content or \"\"\n", + " run.add_metadata({\"agent.stop_reason\": \"final_answer\"})\n", + " print(f\"\\nDONE ({len(steps)} tool calls)\")\n", + " break\n", + "\n", + " if not final_answer:\n", + " reason = \"time limit\" if timed_out else \"iteration limit\"\n", + " run.add_metadata({\"agent.stop_reason\": reason.replace(\" \", \"_\")})\n", + " print(f\"\\nStopped due to {reason}. Generating best-effort final answer (no tools)...\")\n", + " try:\n", + " final_messages = messages + [{\"role\": \"user\", \"content\": \"Finalize your answer using the context and tool outputs so far.
Do not call tools.\"}]\n", + " with _trace_run(\n", + " \"agent.llm.call\",\n", + " run_type=\"llm\",\n", + " inputs={\"llm.message_count\": len(final_messages), \"tool.count\": 0},\n", + " metadata={\"llm.model\": \"xai.grok-3-fast\"},\n", + " ) as llm_run:\n", + " final_resp = call_openai_chat(final_messages, tools=None)\n", + " final_answer = final_resp.choices[0].message.content or \"\"\n", + " llm_run.add_outputs({\"response.length\": _safe_len(final_answer)})\n", + " except Exception as exc:\n", + " final_answer = f\"Error: unable to finalize answer: {exc}\"\n", + " _mark_run_error(run, exc)\n", + "\n", + " if steps:\n", + " with _trace_run(\"agent.memory.write\", metadata={\"memory.type\": \"workflow\"}) as write_run:\n", + " memory_manager.write_workflow(query, steps, final_answer)\n", + " write_run.add_outputs({\"workflow.step_count\": len(steps)})\n", + "\n", + " with _trace_run(\"agent.memory.write\", metadata={\"memory.type\": \"entity\", \"memory.source\": \"assistant_message\"}) as write_run:\n", + " try:\n", + " memory_manager.write_entity(\"\", \"\", \"\", llm_client=client, text=final_answer)\n", + " write_run.add_outputs({\"memory.write.status\": \"ok\"})\n", + " except Exception as exc:\n", + " write_run.add_outputs({\"memory.write.status\": \"skipped\"})\n", + " write_run.add_metadata({\"error.type\": type(exc).__name__})\n", + "\n", + " with _trace_run(\"agent.memory.write\", metadata={\"memory.type\": \"conversational\", \"memory.role\": \"assistant\", \"agent.thread_id\": thread_id}) as write_run:\n", + " record_id = memory_manager.write_conversational_memory(final_answer, \"assistant\", thread_id)\n", + " write_run.add_outputs({\"memory.record_id\": str(record_id), \"memory.content_length\": _safe_len(final_answer)})\n", + "\n", + " run.add_outputs({\n", + " \"response.length\": _safe_len(final_answer),\n", + " \"tool.call_steps\": len(steps),\n", + " })\n", + "\n", + " try:\n", + " observability[\"client\"].flush()\n", + " except 
Exception:\n", + " pass\n", + "\n", + " print(\"\\n\" + \"=\"*50 + f\"\\nANSWER:\\n{final_answer}\\n\" + \"=\"*50)\n", + " return final_answer\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Checkpoint: TODO 18\n", + "assert callable(call_agent_observed), \"TODO 18 incomplete \u2014 call_agent_observed is not defined.\"\n", + "print(\"TODO 18 passed \u2014 call_agent_observed is defined\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## TODO 19: Run Observed Turns and Inspect LangSmith\n", + "\n", + "Run a few short turns with a fresh thread ID. Then open LangSmith, select project `agent-memory-workshop`, and inspect the latest `agent.run` traces.\n", + "\n", + "Look for child runs that show memory reads, tool calls, LLM calls, context checks, and memory writes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "observed_thread = \"observed-0022\"\n", + "observed_queries = [\n", + " \"Find papers about memory in AI agents\",\n", + " \"What did we just discuss?\",\n", + " \"Search the web for recent agent observability ideas\",\n", + "]\n", + "\n", + "for q in observed_queries:\n", + " call_agent_observed(q, thread_id=observed_thread, max_iterations=5)\n", + "\n", + "print(\"Open LangSmith and select project: agent-memory-workshop\")\n", + "print(\"LangSmith: https://smith.langchain.com\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Checkpoint: TODO 19\n", + "print(\"TODO 19 checkpoint \u2014 after running observed turns, open LangSmith and inspect project 'agent-memory-workshop'.\")\n", + "print(\"LangSmith: https://smith.langchain.com\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What You Should See in LangSmith\n", + "\n", + "Open the latest `agent.run` trace. 
The child runs show the agent's execution path in order.\n", + "\n", + "The most useful runs are usually:\n", + "\n", + "- `agent.context.build` \u2014 how much context the agent assembled\n", + "- `agent.memory.read` \u2014 which memory systems were queried\n", + "- `agent.toolbox.read` \u2014 which tools were selected\n", + "- `agent.llm.call` \u2014 where model time is spent\n", + "- `agent.tool.execute` \u2014 whether tools ran and how large their outputs were\n", + "- `agent.memory.write` \u2014 what the agent persisted for the next turn\n", + "\n", + "This is the operational view behind the Part 6 context growth chart. The chart shows the outcome; the trace shows the path.\n" + ] + }, { "cell_type": "markdown", "id": "187a3a2a", @@ -3351,9 +3849,9 @@ "\n", "Now that you've built a complete memory and context engineering system, here are resources to keep going:\n", "\n", - "- **[Agent Memory: Building Memory-Aware Agents](https://www.deeplearning.ai/short-courses/agent-memory-building-memory-aware-agents/)** — DeepLearning.AI short course for deeper exploration of agent memory patterns\n", - "- **[Oracle AI Developer Hub](https://github.com/oracle-devrel/oracle-ai-developer-hub)** — More technical assets, samples, and projects with Oracle AI\n", - "- **[Oracle Developer Resource](https://www.oracle.com/developer/)** — Documentation, tools, and community for Oracle developers" + "- **[Agent Memory: Building Memory-Aware Agents](https://www.deeplearning.ai/short-courses/agent-memory-building-memory-aware-agents/)** \u2014 DeepLearning.AI short course for deeper exploration of agent memory patterns\n", + "- **[Oracle AI Developer Hub](https://github.com/oracle-devrel/oracle-ai-developer-hub)** \u2014 More technical assets, samples, and projects with Oracle AI\n", + "- **[Oracle Developer Resource](https://www.oracle.com/developer/)** \u2014 Documentation, tools, and community for Oracle developers" ] } ], diff --git 
a/workshops/agent_memory_workshop/workshop/notebook_student.ipynb b/workshops/agent_memory_workshop/workshop/notebook_student.ipynb index 569f59af..0ba14a1c 100644 --- a/workshops/agent_memory_workshop/workshop/notebook_student.ipynb +++ b/workshops/agent_memory_workshop/workshop/notebook_student.ipynb @@ -16,7 +16,7 @@ "\n", "\n", "In this notebook, you'll learn how to engineer memory systems that give AI agents the ability to remember, learn, and adapt across conversations. \n", - "Moving beyond simple RAG, we implement a complete **Memory Manager** with six distinct memory types—each serving a specific cognitive function." + "Moving beyond simple RAG, we implement a complete **Memory Manager** with six distinct memory types\u2014each serving a specific cognitive function." ] }, { @@ -26,10 +26,10 @@ "source": [ "## The Use Case: A Research Paper Assistant\n", "\n", - "Throughout this workshop you will build a **Research Paper Assistant** — an AI agent that can search, retrieve, and reason over arxiv research papers. \n", + "Throughout this workshop you will build a **Research Paper Assistant** \u2014 an AI agent that can search, retrieve, and reason over arxiv research papers. \n", "It ingests 50 papers into Oracle AI Database as vectors, answers multi-turn questions using memory that persists across conversations, \n", "and reaches the live web via Tavily when its knowledge base isn't enough. \n", - "The assistant is the vehicle — the real goal is learning the memory and context engineering patterns that make any agent reliable at scale." + "The assistant is the vehicle \u2014 the real goal is learning the memory and context engineering patterns that make any agent reliable at scale." 
] }, { @@ -59,7 +59,7 @@ "source": [ "## The End Result\n", "\n", - "By the end of this workshop, your memory-engineered agent will keep its context window flat and stable — while a naive agent without memory or context engineering spirals toward the token limit within a few turns.\n", + "By the end of this workshop, your memory-engineered agent will keep its context window flat and stable \u2014 while a naive agent without memory or context engineering spirals toward the token limit within a few turns.\n", "\n", "![Context Window Growth: Engineered vs Naive Agent](../images/end_result.png)\n", "\n", @@ -116,7 +116,7 @@ "# If you are running this notebook locally (outside Codespaces), uncommenet the code below and run this cell first.\n", "# Otherwise you can skip it.\n", "\n", - "# ! pip install -qU langchain-oracledb sentence-transformers langchain-openai langchain tavily-python datasets oracledb openai matplotlib" + "# ! pip install -qU langchain-oracledb sentence-transformers langchain-community langchain-openai langchain-huggingface langchain tavily-python datasets oracledb openai matplotlib langsmith\n" ] }, { @@ -128,7 +128,7 @@ "\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-1-oracle-setup.md](../docs/part-1-oracle-setup.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-1-oracle-setup.md](../docs/part-1-oracle-setup.md)\n" ] }, { @@ -140,7 +140,7 @@ "\n", "In this Codespace, **Oracle AI Database** is already running as a Docker service alongside your development environment. You do not need to install or start Docker manually.\n", "\n", - "Oracle AI Database is a converged database that combines relational, document, graph, and vector data in a single engine. 
It supports native `VECTOR` column types, HNSW vector indexes, and SQL-based similarity search — making it purpose-built for AI agent memory infrastructure.\n", + "Oracle AI Database is a converged database that combines relational, document, graph, and vector data in a single engine. It supports native `VECTOR` column types, HNSW vector indexes, and SQL-based similarity search \u2014 making it purpose-built for AI agent memory infrastructure.\n", "\n", "**What is already running:**\n", "- Oracle AI Database (`gvenzl/oracle-free`) on port `1521`\n", @@ -198,7 +198,7 @@ " dsn=dsn,\n", " program=program\n", " )\n", - " print(\"✓ Connected successfully!\")\n", + " print(\"\u2713 Connected successfully!\")\n", " \n", " # Test the connection\n", " with conn.cursor() as cur:\n", @@ -210,10 +210,10 @@ " \n", " except oracledb.OperationalError as e:\n", " error_msg = str(e)\n", - " print(f\"✗ Connection failed (attempt {attempt}/{max_retries})\")\n", + " print(f\"\u2717 Connection failed (attempt {attempt}/{max_retries})\")\n", " \n", " if \"DPY-4011\" in error_msg or \"Connection reset by peer\" in error_msg:\n", - " print(\" → This usually means:\")\n", + " print(\" \u2192 This usually means:\")\n", " print(\" 1. Database is still starting up (wait 2-3 minutes)\")\n", " print(\" 2. Listener configuration issue\")\n", " print(\" 3. Container is not running\")\n", @@ -226,7 +226,7 @@ " else:\n", " raise\n", " except Exception as e:\n", - " print(f\"✗ Unexpected error: {e}\")\n", + " print(f\"\u2717 Unexpected error: {e}\")\n", " raise\n", " \n", " raise ConnectionError(\"Failed to connect after all retries\")" @@ -237,7 +237,7 @@ "id": "1f8bacbe", "metadata": {}, "source": [ - "> Connect as the `VECTOR` user — a dedicated schema created for storing embeddings and agent memory. 
All workshop operations use this user rather than `SYS` to follow the principle of least privilege.\n" + "> Connect as the `VECTOR` user \u2014 a dedicated schema created for storing embeddings and agent memory. All workshop operations use this user rather than `SYS` to follow the principle of least privilege.\n" ] }, { @@ -282,13 +282,13 @@ " cur.execute(f'DROP INDEX \"{idx}\"')\n", " dropped.append(idx)\n", " except Exception as e:\n", - " print(f\" ⚠️ Could not drop index {idx}: {e}\")\n", + " print(f\" \u26a0\ufe0f Could not drop index {idx}: {e}\")\n", "\n", " conn.commit()\n", " if dropped:\n", - " print(f\"🧹 One-time cleanup: dropped {len(dropped)} old index(es): {', '.join(dropped)}\")\n", + " print(f\"\ud83e\uddf9 One-time cleanup: dropped {len(dropped)} old index(es): {', '.join(dropped)}\")\n", " else:\n", - " print(\"🧹 One-time cleanup: no existing user-created indexes on VECTOR_SEARCH_DEMO\")\n", + " print(\"\ud83e\uddf9 One-time cleanup: no existing user-created indexes on VECTOR_SEARCH_DEMO\")\n", "\n", "one_time_cleanup_vector_demo_indexes(vector_conn)\n" ] @@ -298,7 +298,7 @@ "id": "365bcafb", "metadata": {}, "source": [ - "✅ **Connection established!** Oracle AI Database is running in this Codespace and your `vector_conn` is active.\n", + "\u2705 **Connection established!** Oracle AI Database is running in this Codespace and your `vector_conn` is active.\n", "\n", "Next, we will create vector-enabled SQL tables using **LangChain's OracleVS integration** to store embeddings and metadata for semantic search.\n" ] @@ -308,9 +308,9 @@ "id": "cd4509ad", "metadata": {}, "source": [ - "> 💡 **Key Insight — Part 1**\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 1**\n", ">\n", - "> Oracle AI Database is not a separate vector database — it is a converged engine where vectors, relational data, and SQL queries coexist. This means your agent's memory lives in one ACID-compliant system, not scattered across specialised stores." 
+ "> Oracle AI Database is not a separate vector database \u2014 it is a converged engine where vectors, relational data, and SQL queries coexist. This means your agent's memory lives in one ACID-compliant system, not scattered across specialised stores." ] }, { @@ -322,7 +322,7 @@ "\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-2-vector-search.md](../docs/part-2-vector-search.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-2-vector-search.md](../docs/part-2-vector-search.md)\n" ] }, { @@ -358,9 +358,9 @@ "id": "d8bf7a8c", "metadata": {}, "source": [ - "> 💡 **Key Definition — Vector Search**\n", + "> \ud83d\udca1 **Key Definition \u2014 Vector Search**\n", ">\n", - "> Vector search finds documents by *meaning*, not keywords. Text is converted to a numeric vector (embedding), and retrieval is based on distance between vectors in high-dimensional space. Two documents about the same topic will be close together — even if they share no words." + "> Vector search finds documents by *meaning*, not keywords. Text is converted to a numeric vector (embedding), and retrieval is based on distance between vectors in high-dimensional space. Two documents about the same topic will be close together \u2014 even if they share no words." ] }, { @@ -377,7 +377,7 @@ "\n", "**Your task:** Initialise `OracleVS` in the cell below using the provided parameters. Then run the next two cells to create the HNSW index and ingest 50 research paper abstracts. 
You need data in the table before the search cells in Step 3 will return results.\n", "\n", - "> 📖 Open **docs/part-2-vector-search.md** if you need guidance.\n" + "> \ud83d\udcd6 Open **docs/part-2-vector-search.md** if you need guidance.\n" ] }, { @@ -422,12 +422,12 @@ "metadata": {}, "outputs": [], "source": [ - "# ✅ Checkpoint: TODO 1\n", + "# \u2705 Checkpoint: TODO 1\n", "assert \"vector_store\" in dir() and vector_store is not None, (\n", - " \"❌ TODO 1 incomplete — vector_store is not defined.\\n\"\n", + " \"\u274c TODO 1 incomplete \u2014 vector_store is not defined.\\n\"\n", " \"Go back and initialise OracleVS in the cell above.\"\n", ")\n", - "print(\"✅ TODO 1 passed — vector_store is initialised\")" + "print(\"\u2705 TODO 1 passed \u2014 vector_store is initialised\")" ] }, { @@ -446,11 +446,11 @@ " vector_store=vs,\n", " params={\"idx_name\": idx_name, \"idx_type\": \"HNSW\"}\n", " )\n", - " print(f\" ✅ Created index: {idx_name}\")\n", + " print(f\" \u2705 Created index: {idx_name}\")\n", " except Exception as e:\n", " err = str(e)\n", " if \"ORA-00955\" in err:\n", - " print(f\" ⏭️ Index already exists: {idx_name} (skipped)\")\n", + " print(f\" \u23ed\ufe0f Index already exists: {idx_name} (skipped)\")\n", " else:\n", " raise\n" ] @@ -497,7 +497,7 @@ "source": [ "Rather than downloading all papers at once into memory, streaming=True pulls them one at a time as you loop over them. \n", "\n", - "This is important because the full dataset could be millions of rows — streaming means you only ever hold one paper in memory at a time. `MAX_PAPERS = 50` then acts as an early exit so you only take the first 50.\n" + "This is important because the full dataset could be millions of rows \u2014 streaming means you only ever hold one paper in memory at a time. 
`MAX_PAPERS = 50` then acts as an early exit so you only take the first 50.\n" ] }, { @@ -523,7 +523,7 @@ "The loop builds three lists simultaneously from the same 50 papers:\n", "\n", "- `sampled_papers`: a full copy of each paper's raw fields, kept for inspection and reuse later in the notebook.\n", - "- `texts`: just the title and abstract combined into a single string. This is what gets embedded — the actual content that Oracle will turn into a vector\n", + "- `texts`: just the title and abstract combined into a single string. This is what gets embedded \u2014 the actual content that Oracle will turn into a vector\n", "- `metadata`: the identifiers and labels (arxiv ID, subject, authors) stored alongside the vector but not embedded. This is what you get back when you search, so you know which paper matched" ] }, @@ -567,16 +567,16 @@ "\n", " # TODO 2: Append to the three lists that feed the vector store.\n", " #\n", - " # 1. Append to sampled_papers — a dict with keys:\n", + " # 1. Append to sampled_papers \u2014 a dict with keys:\n", " # arxiv_id, title, abstract, primary_subject, authors\n", " #\n", - " # 2. Append to texts — the combined title + abstract string\n", + " # 2. Append to texts \u2014 the combined title + abstract string\n", " # (already stored in the variable `text`)\n", " #\n", - " # 3. Append to metadata — a dict with keys:\n", + " # 3. Append to metadata \u2014 a dict with keys:\n", " # id (same as arxiv_id), arxiv_id, title, primary_subject, authors\n", " #\n", - " # These three lists must stay in sync — each index i refers to the same paper.\n", + " # These three lists must stay in sync \u2014 each index i refers to the same paper.\n", " # texts[i] is what gets embedded. 
metadata[i] is what gets returned on search.\n", " #\n", " # YOUR CODE HERE" @@ -588,16 +588,16 @@ "metadata": {}, "outputs": [], "source": [ - "# ✅ Checkpoint: TODO 2\n", + "# \u2705 Checkpoint: TODO 2\n", "assert \"sampled_papers\" in dir() and len(sampled_papers) > 0, (\n", - " \"❌ TODO 2 incomplete — sampled_papers is empty.\\n\"\n", + " \"\u274c TODO 2 incomplete \u2014 sampled_papers is empty.\\n\"\n", " \"Go back and complete the data loading loop.\"\n", ")\n", "assert \"texts\" in dir() and len(texts) > 0, (\n", - " \"❌ TODO 2 incomplete — texts list is empty.\\n\"\n", + " \"\u274c TODO 2 incomplete \u2014 texts list is empty.\\n\"\n", " \"Go back and build the texts list from the streamed data.\"\n", ")\n", - "print(f\"✅ TODO 2 passed — {len(sampled_papers)} papers loaded, {len(texts)} texts ready\")" + "print(f\"\u2705 TODO 2 passed \u2014 {len(sampled_papers)} papers loaded, {len(texts)} texts ready\")" ] }, { @@ -610,7 +610,7 @@ "1. Passes the text through the HuggingFace embedding model to produce a 768-dimensional vector\n", "Inserts both the vector and the metadata into the `VECTOR_SEARCH_DEMO` table in Oracle\n", "\n", - "2. After this cell completes, Oracle contains 50 rows — each with a vector representing the semantic meaning of that paper's title and abstract, plus the metadata needed to identify it. Every similarity search you run from this point is searching across those 50 vectors." + "2. After this cell completes, Oracle contains 50 rows \u2014 each with a vector representing the semantic meaning of that paper's title and abstract, plus the metadata needed to identify it. Every similarity search you run from this point is searching across those 50 vectors." 
] }, { @@ -626,7 +626,7 @@ " metadatas=metadata,\n", ")\n", "\n", - "print(f\"✅ Ingested {len(texts)} research papers into VECTOR_SEARCH_DEMO\")" + "print(f\"\u2705 Ingested {len(texts)} research papers into VECTOR_SEARCH_DEMO\")" ] }, { @@ -661,7 +661,7 @@ "source": [ "### Basic Similarity Search\n", "\n", - "Run a semantic search against the 50 ingested papers. Notice that the query does not need to share any keywords with the documents — Oracle finds papers whose *meaning* is closest to your query.\n", + "Run a semantic search against the 50 ingested papers. Notice that the query does not need to share any keywords with the documents \u2014 Oracle finds papers whose *meaning* is closest to your query.\n", "\n", "**Your task:** Complete the cell below using `similarity_search()`. Return the top 3 results and print `page_content` and `metadata` for each.\n" ] @@ -689,10 +689,10 @@ "metadata": {}, "outputs": [], "source": [ - "# ✅ Checkpoint: TODO 3\n", + "# \u2705 Checkpoint: TODO 3\n", "# If you reached this cell without errors, similarity_search() ran successfully.\n", "# Verify: did you see paper titles and metadata printed above?\n", - "print(\"✅ TODO 3 passed — similarity_search executed\")" + "print(\"\u2705 TODO 3 passed \u2014 similarity_search executed\")" ] }, { @@ -727,10 +727,10 @@ "metadata": {}, "outputs": [], "source": [ - "# ✅ Checkpoint: TODO 4\n", + "# \u2705 Checkpoint: TODO 4\n", "# If you reached this cell without errors, similarity_search_with_score() ran successfully.\n", "# Verify: did you see scores alongside each result above?\n", - "print(\"✅ TODO 4 passed — similarity_search_with_score executed\")" + "print(\"\u2705 TODO 4 passed \u2014 similarity_search_with_score executed\")" ] }, { @@ -740,12 +740,12 @@ "source": [ "**Filtered Similarity Search**\n", "\n", - "Sometimes you want semantic search within a specific category — not across all documents. 
The cell below combines a natural language query with an exact metadata filter, restricting results to papers whose `primary_subject` matches a specific value.\n", + "Sometimes you want semantic search within a specific category \u2014 not across all documents. The cell below combines a natural language query with an exact metadata filter, restricting results to papers whose `primary_subject` matches a specific value.\n", "\n", - "This runs entirely inside Oracle as a single operation: the vector similarity and the metadata filter are evaluated together, not in sequence. Oracle does not fetch all matching vectors first and then filter — it applies both conditions simultaneously, which keeps it fast at scale.\n", + "This runs entirely inside Oracle as a single operation: the vector similarity and the metadata filter are evaluated together, not in sequence. Oracle does not fetch all matching vectors first and then filter \u2014 it applies both conditions simultaneously, which keeps it fast at scale.\n", "\n", "\n", - "`{\"primary_subject\": {\"$eq\": value}}` is the filter syntax. $eq means exact match — equivalent to WHERE primary_subject = value in SQL." + "`{\"primary_subject\": {\"$eq\": value}}` is the filter syntax. $eq means exact match \u2014 equivalent to WHERE primary_subject = value in SQL." 
] }, { @@ -808,18 +808,18 @@ "metadata": {}, "outputs": [], "source": [ - "# ✅ Checkpoint: TODO 5\n", + "# \u2705 Checkpoint: TODO 5\n", "assert len(docs) > 0, (\n", - " \"❌ TODO 5 incomplete — no results returned.\\n\"\n", + " \"\u274c TODO 5 incomplete \u2014 no results returned.\\n\"\n", " \"Did you add the filter= argument with the sample_arxiv_id?\"\n", ")\n", - "# Check the filter worked — all results should have the same arxiv_id\n", + "# Check the filter worked \u2014 all results should have the same arxiv_id\n", "_ids = [d.metadata.get(\"id\") or d.metadata.get(\"arxiv_id\") for d in docs]\n", "assert all(i == sample_arxiv_id for i in _ids if i), (\n", - " \"❌ TODO 5 — filter didn't restrict to sample_arxiv_id.\\n\"\n", + " \"\u274c TODO 5 \u2014 filter didn't restrict to sample_arxiv_id.\\n\"\n", " \"Use: filter={\\\"id\\\": {\\\"$in\\\": [sample_arxiv_id]}}\"\n", ")\n", - "print(f\"✅ TODO 5 passed — fetched {len(docs)} doc(s) filtered to {sample_arxiv_id}\")" + "print(f\"\u2705 TODO 5 passed \u2014 fetched {len(docs)} doc(s) filtered to {sample_arxiv_id}\")" ] }, { @@ -829,11 +829,11 @@ "source": [ "**Why would you ever do this?**\n", "\n", - "This pattern is useful in agent memory when you already know which document you want but you want to retrieve it through the same vector search pipeline that retrieves everything else. For example, an agent might know the arxiv ID of a paper the user mentioned earlier and want to pull it back into context — using $in filter lets you do that without writing a separate SQL query.\n", + "This pattern is useful in agent memory when you already know which document you want but you want to retrieve it through the same vector search pipeline that retrieves everything else. 
For example, an agent might know the arxiv ID of a paper the user mentioned earlier and want to pull it back into context \u2014 using $in filter lets you do that without writing a separate SQL query.\n", "\n", - "**The`$in` operator** works like SQL's IN clause — it matches any document whose id is in the provided list. You could pass multiple IDs: {\"id\": {\"$in\": [id1, id2, id3]}} to restrict the search to a specific set of documents.\n", + "**The`$in` operator** works like SQL's IN clause \u2014 it matches any document whose id is in the provided list. You could pass multiple IDs: {\"id\": {\"$in\": [id1, id2, id3]}} to restrict the search to a specific set of documents.\n", "\n", - "**The broader point for the workshop:** Oracle's vector search is not separate from its SQL capabilities — filters run as SQL predicates inside Oracle, meaning you get the full expressiveness of SQL filtering combined with semantic vector search in a single query. This is one of the key advantages of using a converged database over a standalone vector store." + "**The broader point for the workshop:** Oracle's vector search is not separate from its SQL capabilities \u2014 filters run as SQL predicates inside Oracle, meaning you get the full expressiveness of SQL filtering combined with semantic vector search in a single query. This is one of the key advantages of using a converged database over a standalone vector store." ] }, { @@ -841,9 +841,9 @@ "id": "c52e3f8f", "metadata": {}, "source": [ - "> 💡 **Key Insight — Part 2**\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 2**\n", ">\n", - "> Vector search retrieves by meaning, not keywords — but metadata filters let you combine semantic similarity with exact constraints in a single query. This hybrid approach is what makes vector search practical for agent memory: find what's *relevant* and *scoped* at the same time." 
+ "> Vector search retrieves by meaning, not keywords \u2014 but metadata filters let you combine semantic similarity with exact constraints in a single query. This hybrid approach is what makes vector search practical for agent memory: find what's *relevant* and *scoped* at the same time." ] }, { @@ -854,7 +854,7 @@ "# Part 3: Memory Engineering and Agent Memory\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-3-memory-engineering.md](../docs/part-3-memory-engineering.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-3-memory-engineering.md](../docs/part-3-memory-engineering.md)\n" ] }, { @@ -863,10 +863,10 @@ "metadata": {}, "source": [ "\n", - "**`Agent Memory`** is the exocortex that augments an LLM—capturing, encoding, storing, linking, and retrieving information beyond the model's parametric and contextual limits. \n", + "**`Agent Memory`** is the exocortex that augments an LLM\u2014capturing, encoding, storing, linking, and retrieving information beyond the model's parametric and contextual limits. \n", "It provides the persistence and structure required for long-horizon reasoning and reliable behaviour.\n", "\n", - "**`Memory Engineering`** is the scaffolding and control harness that we design to move information optimally and efficiently into, through, and across all components of an AI system(databases, LLMs, applications etc). It ensures that data is captured, transformed, organized, and retrieved in the right way at the right time—so agents can behave reliably, believably, and capabaly.\n", + "**`Memory Engineering`** is the scaffolding and control harness that we design to move information optimally and efficiently into, through, and across all components of an AI system(databases, LLMs, applications etc). 
It ensures that data is captured, transformed, organized, and retrieved in the right way at the right time\u2014so agents can behave reliably, believably, and capably.\n", "\n", "This is the core section of the notebook where we build a complete **`Memory Manager`** for AI agents. \n", "\n", @@ -898,7 +898,7 @@ "| **Summary** | Compressed memory | Condensed context for long conversations | Vector-Enabled SQL Table |\n", "| **Tool Log** | Episodic memory | Raw tool call outputs offloaded from context | SQL Table |\n", "\n", - "> **Note on Tool Log:** Tool Log is a form of episodic memory — it records *what happened* during each tool execution. Beyond keeping the context window lean, tool logs can serve as a source from which **procedural memories** (workflow patterns) and **semantic memories** (knowledge base entries) can be distilled over time.\n", + "> **Note on Tool Log:** Tool Log is a form of episodic memory \u2014 it records *what happened* during each tool execution. Beyond keeping the context window lean, tool logs can serve as a source from which **procedural memories** (workflow patterns) and **semantic memories** (knowledge base entries) can be distilled over time.\n", "\n", "## Steps in This Section\n", "\n", @@ -916,7 +916,7 @@ "id": "eca462ff", "metadata": {}, "source": [ - "> 💡 **Key Definition — Agent Memory**\n", + "> \ud83d\udca1 **Key Definition \u2014 Agent Memory**\n", ">\n", "> Agent memory is the persistent infrastructure that gives a stateless LLM the ability to remember across turns, sessions, and tasks. Without it, every inference starts from scratch." ] @@ -934,7 +934,7 @@ "id": "4b95d6d4", "metadata": {}, "source": [ - "> 💡 **Key Definition — Memory Engineering**\n", + "> \ud83d\udca1 **Key Definition \u2014 Memory Engineering**\n", ">\n", "> Memory engineering is the design of *what* to store, *where* to store it, and *when* to retrieve it.
It is the scaffolding that moves information into, through, and across an AI system so agents behave reliably over long horizons." ] @@ -985,7 +985,7 @@ " if \"ORA-00942\" in str(e):\n", " print(f\" - {table} (not exists)\")\n", " else:\n", - " print(f\" ✗ {table}: {e}\")\n", + " print(f\" \u2717 {table}: {e}\")\n", " \n", "vector_conn.commit()" ] @@ -1025,7 +1025,7 @@ "- Adds an index on `thread_id` for fast conversation lookups\n", "- Adds an index on `timestamp` for chronological ordering\n", "\n", - "The `summary_id` column is critical for Part 4 — it links messages to their summaries when the context window is compacted." + "The `summary_id` column is critical for Part 4 \u2014 it links messages to their summaries when the context window is compacted." ] }, { @@ -1092,13 +1092,13 @@ "metadata": {}, "outputs": [], "source": [ - "# ✅ Checkpoint: TODO 6\n", + "# \u2705 Checkpoint: TODO 6\n", "_test_result = create_conversational_history_table.__name__ if callable(create_conversational_history_table) else None\n", "assert _test_result is not None, (\n", - " \"❌ TODO 6 incomplete — create_conversational_history_table is not callable.\\n\"\n", + " \"\u274c TODO 6 incomplete \u2014 create_conversational_history_table is not callable.\\n\"\n", " \"Go back and complete the function.\"\n", ")\n", - "print(\"✅ TODO 6 passed — function is defined\")" + "print(\"\u2705 TODO 6 passed \u2014 function is defined\")" ] }, { @@ -1119,11 +1119,11 @@ "source": [ "### Step 1b: Create Tool Log Table (Experimental Memory)\n", "\n", - "Tool call outputs during agent execution can **bloat the context window** quickly — a single web search might return thousands of tokens that are only needed once. \n", + "Tool call outputs during agent execution can **bloat the context window** quickly \u2014 a single web search might return thousands of tokens that are only needed once. 
\n", "\n", "The `TOOL_LOG` table acts as an **experimental memory**: full tool outputs are persisted to the database and replaced in the context window with a compact one-line reference. The agent can retrieve full outputs later if needed via `read_tool_log`.\n", "\n", - "This is a form of **context offloading** — keeping the working memory lean while preserving full fidelity in durable storage." + "This is a form of **context offloading** \u2014 keeping the working memory lean while preserving full fidelity in durable storage." ] }, { @@ -1167,7 +1167,7 @@ "source": [ "### Step 1c: Create Vector-Enabled Tables for Each Memory Type\n", "\n", - "Here we create 5 separate OracleVS-backed vector-enabled SQL tables—one for each memory type. \n", + "Here we create 5 separate OracleVS-backed vector-enabled SQL tables\u2014one for each memory type. \n", "\n", "Each semantic memory is backed by its own Oracle table with a VECTOR column and uses the same embedding model for consistency.\n", "\n", @@ -1259,7 +1259,7 @@ " for p in sampled_papers\n", " ]\n", " knowledge_base_vs.add_texts(kb_texts, kb_meta)\n", - " print(f\"✅ Seeded knowledge base memory with {len(kb_texts)} arXiv papers\")\n" + " print(f\"\u2705 Seeded knowledge base memory with {len(kb_texts)} arXiv papers\")\n" ] }, { @@ -1281,21 +1281,21 @@ "\n", "| Operation | Programmatic | Agent-Triggered | Notes |\n", "|-----------|:------------:|:---------------:|-------|\n", - "| `read_conversational_memory()` | ✅ | ❌ | Always loaded at loop start (unsummarized units only) |\n", - "| `read_knowledge_base()` | ✅ | ❌ | Always loaded at loop start |\n", - "| `read_workflow()` | ✅ | ❌ | Always loaded at loop start |\n", - "| `read_entity()` | ✅ | ❌ | Always loaded at loop start |\n", - "| `read_summary_context()` | ✅ | ❌ | Always loaded at loop start (IDs + descriptions) |\n", - "| `read_toolbox()` | ✅ | ❌ | Tool schemas are retrieved before model reasoning |\n", - "| `write_conversational_memory()` | ✅ | ❌ | User message 
(pre-loop) + assistant answer (post-loop) |\n", - "| `write_workflow()` | ✅ | ❌ | Persisted after loop when tool steps exist |\n", - "| `write_entity()` | ✅ | ❌ | Best-effort extraction around user/final assistant text |\n", - "| `write_tool_log()` | ✅ | ❌ | Full tool output offloaded to DB after every tool execution |\n", - "| Tool-call decision (`tool_choice=auto`) | ❌ | ✅ | Model decides whether to call tools |\n", - "| `search_tavily()` | ❌ | ✅ | Agent-triggered external retrieval |\n", - "| `expand_summary()` | ❌ | ✅ | Agent-triggered just-in-time summary expansion |\n", - "| `summarize_and_store()` | ❌ | ✅ | Agent-triggered context compaction primitive |\n", - "| `summarize_conversation()` | ❌ | ✅ | Agent-triggered conversation compaction for active thread |\n", + "| `read_conversational_memory()` | \u2705 | \u274c | Always loaded at loop start (unsummarized units only) |\n", + "| `read_knowledge_base()` | \u2705 | \u274c | Always loaded at loop start |\n", + "| `read_workflow()` | \u2705 | \u274c | Always loaded at loop start |\n", + "| `read_entity()` | \u2705 | \u274c | Always loaded at loop start |\n", + "| `read_summary_context()` | \u2705 | \u274c | Always loaded at loop start (IDs + descriptions) |\n", + "| `read_toolbox()` | \u2705 | \u274c | Tool schemas are retrieved before model reasoning |\n", + "| `write_conversational_memory()` | \u2705 | \u274c | User message (pre-loop) + assistant answer (post-loop) |\n", + "| `write_workflow()` | \u2705 | \u274c | Persisted after loop when tool steps exist |\n", + "| `write_entity()` | \u2705 | \u274c | Best-effort extraction around user/final assistant text |\n", + "| `write_tool_log()` | \u2705 | \u274c | Full tool output offloaded to DB after every tool execution |\n", + "| Tool-call decision (`tool_choice=auto`) | \u274c | \u2705 | Model decides whether to call tools |\n", + "| `search_tavily()` | \u274c | \u2705 | Agent-triggered external retrieval |\n", + "| `expand_summary()` | \u274c | \u2705 | 
Agent-triggered just-in-time summary expansion |\n", + "| `summarize_and_store()` | \u274c | \u2705 | Agent-triggered context compaction primitive |\n", + "| `summarize_conversation()` | \u274c | \u2705 | Agent-triggered conversation compaction for active thread |\n", "\n", "### What Is Programmatic in This Harness\n", "\n", @@ -1305,7 +1305,7 @@ "2. **Tool schema retrieval** before each model call.\n", "3. **Memory persistence** around the loop (store user turn, store assistant turn, persist workflow/entity updates).\n", "4. **Tool execution dispatch** after a tool call is chosen (once selected by the model, execution is deterministic in code).\n", - "5. **Tool output offloading** via `write_tool_log()` — full outputs are persisted to the database and replaced with compact references in the context window.\n", + "5. **Tool output offloading** via `write_tool_log()` \u2014 full outputs are persisted to the database and replaced with compact references in the context window.\n", "\n", "### What Is Agent-Triggered in This Harness\n", "\n", @@ -1318,9 +1318,9 @@ "\n", "### Why This Split Works for Memory-Centric Agents\n", "\n", - "1. **Reliability from programmatic memory** — critical memory load/save behavior never depends on the model remembering to do it.\n", - "2. **Adaptivity from agent-triggered tools** — the model can selectively fetch/expand/compact only when needed.\n", - "3. **Clear control boundaries** — the harness owns state integrity; the model owns strategy inside those boundaries." + "1. **Reliability from programmatic memory** \u2014 critical memory load/save behavior never depends on the model remembering to do it.\n", + "2. **Adaptivity from agent-triggered tools** \u2014 the model can selectively fetch/expand/compact only when needed.\n", + "3. **Clear control boundaries** \u2014 the harness owns state integrity; the model owns strategy inside those boundaries." 
] }, { @@ -1353,11 +1353,11 @@ "\n", "### Key Features\n", "\n", - "- **Thread-based conversations** — Messages are organized by `thread_id` for multi-conversation support\n", - "- **Semantic search** — Vector-enabled SQL tables enable finding relevant content by meaning, not just keywords\n", - "- **Metadata filtering** — Workflows filter by `num_steps > 0`, summaries filter by `id`\n", - "- **LLM-powered entity extraction** — Automatically extracts people, places, and systems from text\n", - "- **Formatted context output** — Each read method returns formatted text ready for the LLM context\n" + "- **Thread-based conversations** \u2014 Messages are organized by `thread_id` for multi-conversation support\n", + "- **Semantic search** \u2014 Vector-enabled SQL tables enable finding relevant content by meaning, not just keywords\n", + "- **Metadata filtering** \u2014 Workflows filter by `num_steps > 0`, summaries filter by `id`\n", + "- **LLM-powered entity extraction** \u2014 Automatically extracts people, places, and systems from text\n", + "- **Formatted context output** \u2014 Each read method returns formatted text ready for the LLM context\n" ] }, { @@ -1371,7 +1371,7 @@ "\n", "### Bind Variables in Oracle SQL\n", "\n", - "In the TODO below, you'll use **bind variables** — placeholders like `:thread_id`, `:role`, and `:content` in your SQL strings that get filled in safely at execution time. Instead of building SQL with f-strings (which is vulnerable to SQL injection), you write:\n", + "In the TODO below, you'll use **bind variables** \u2014 placeholders like `:thread_id`, `:role`, and `:content` in your SQL strings that get filled in safely at execution time. 
Instead of building SQL with f-strings (which is vulnerable to SQL injection), you write:\n", "\n", "```python\n", "cur.execute(\"INSERT INTO my_table (name) VALUES (:name)\", {\"name\": user_input})\n", @@ -1379,7 +1379,7 @@ "\n", "Oracle parses the SQL template once and reuses it, which is both **safer** (no injection) and **faster** (cached execution plans).\n", "\n", - "You'll also see an **output bind variable** — a special pattern for getting data back from an INSERT:\n", + "You'll also see an **output bind variable** \u2014 a special pattern for getting data back from an INSERT:\n", "\n", "```python\n", "id_var = cur.var(str) # create a placeholder to receive a value\n", @@ -1510,7 +1510,7 @@ " \"\"\", {\"summary_id\": summary_id, \"thread_id\": thread_id})\n", " count = cur.rowcount\n", " self.conn.commit()\n", - " print(f\" 📦 Marked {count} messages as summarized (summary_id: {summary_id})\")\n", + " print(f\" \ud83d\udce6 Marked {count} messages as summarized (summary_id: {summary_id})\")\n", "\n", " # ==================== KNOWLEDGE BASE (Vector-Enabled SQL Table) ====================\n", " \n", @@ -1671,7 +1671,7 @@ " When called with llm_client + text, extracts entities from the text using the LLM.\n", "\n", " TODO 11: Implement the direct storage branch (no LLM extraction).\n", - " Steps (for the else branch — direct storage):\n", + " Steps (for the else branch \u2014 direct storage):\n", " 1. 
Call self.entity_vs.add_texts() with:\n", " texts = [f\"{name} ({entity_type}): {description}\"]\n", " metadatas = [{\"name\": name, \"type\": entity_type, \"description\": description}]\n", @@ -1679,7 +1679,7 @@ " The LLM extraction branch (if text and llm_client) is provided for you below.\n", " \"\"\"\n", " if text and llm_client:\n", - " # LLM extraction branch — provided, do not modify\n", + " # LLM extraction branch \u2014 provided, do not modify\n", " entities = self.extract_entities(text, llm_client)\n", " for e in entities:\n", " self.entity_vs.add_texts(\n", @@ -1697,13 +1697,13 @@ " if not results:\n", " return \"## Entity Memory\\nNo entities found.\"\n", " \n", - " entities = [f\"• {doc.metadata.get('name', '?')}: {doc.metadata.get('description', '')}\" \n", + " entities = [f\"\u2022 {doc.metadata.get('name', '?')}: {doc.metadata.get('description', '')}\" \n", " for doc in results if hasattr(doc, 'metadata')]\n", " entities_formatted = '\\n'.join(entities)\n", " return f\"\"\"## Entity Memory\n", "### Purpose: Named entities (people, places, systems, paper titles) extracted from conversations.\n", "### When to use: Use these to resolve references like \"that author\" or \"the system we discussed\".\n", - "### Entity memory provides continuity across turns — ground your answers in known entities\n", + "### Entity memory provides continuity across turns \u2014 ground your answers in known entities\n", "### rather than guessing or re-asking the user for names and details already mentioned.\n", "\n", "{entities_formatted}\"\"\"\n", @@ -1741,13 +1741,13 @@ " \"### Purpose: Compressed snapshots of older conversations and context windows.\",\n", " \"### When to use: These are lightweight pointers. 
If a summary looks relevant,\",\n", " \"### call expand_summary(summary_id) to retrieve the full content just-in-time.\",\n", - " \"### Do NOT expand all summaries — only expand when you need specific details.\",\n", + " \"### Do NOT expand all summaries \u2014 only expand when you need specific details.\",\n", " \"\"\n", " ]\n", " for doc in results:\n", " sid = doc.metadata.get('id', '?')\n", " desc = doc.metadata.get('description', 'No description')\n", - " lines.append(f\" • [ID: {sid}] {desc}\")\n", + " lines.append(f\" \u2022 [ID: {sid}] {desc}\")\n", " return \"\\n\".join(lines)\n", " \n", " # ==================== TOOL LOG (SQL - Experimental Memory) ====================\n", @@ -1856,7 +1856,7 @@ "source": [ "### The Scalability Problem with Tools\n", "\n", - "As your AI system grows, you might have **hundreds of tools** available—APIs, database queries, calculators, search engines, and more. However, passing all tools to the LLM at inference time creates serious problems:\n", + "As your AI system grows, you might have **hundreds of tools** available\u2014APIs, database queries, calculators, search engines, and more. However, passing all tools to the LLM at inference time creates serious problems:\n", "\n", "| Problem | Impact |\n", "|---------|--------|\n", @@ -1871,9 +1871,9 @@ "\n", "The `Toolbox` class solves this by treating tools as a **searchable memory**:\n", "\n", - "1. **Register hundreds of tools** — Store all available tools with their descriptions and embeddings\n", - "2. **Retrieve only relevant tools** — At inference time, use vector search to find tools semantically relevant to the current query\n", - "3. **Pass a focused toolset** — Only the retrieved tools (typically 3-5) are passed to the LLM\n", + "1. **Register hundreds of tools** \u2014 Store all available tools with their descriptions and embeddings\n", + "2. 
**Retrieve only relevant tools** \u2014 At inference time, use vector search to find tools semantically relevant to the current query\n", + "3. **Pass a focused toolset** \u2014 Only the retrieved tools (typically 3-5) are passed to the LLM\n", "\n", "This approach means your system can **scale to hundreds of tools** while the LLM only sees the most relevant ones for each query.\n", "\n", @@ -1882,7 +1882,7 @@ "The `Toolbox` class uses **docstrings as the retrieval key**:\n", "\n", "```\n", - "User Query → Embed Query → Vector Search → Find tools with similar docstrings → Return relevant tools\n", + "User Query \u2192 Embed Query \u2192 Vector Search \u2192 Find tools with similar docstrings \u2192 Return relevant tools\n", "```\n", "\n", "| Component | Purpose |\n", @@ -1915,18 +1915,18 @@ "|---|---|---|\n", "| `__init__` | Initialises the Toolbox with a `MemoryManager`, OpenAI client, and model name | Sets up internal dicts `_tools` and `_tools_by_name` to track registered callables |\n", "| `get_embedding` | Converts a text string into a 768-dimensional vector using the configured embedding model | Used to embed tool descriptions so they can be stored and retrieved by semantic similarity |\n", - "| `_augment_docstring` | Sends a tool's docstring to the LLM and returns an improved, more detailed version | Makes tools more discoverable — a richer description embeds better and matches more user queries |\n", + "| `_augment_docstring` | Sends a tool's docstring to the LLM and returns an improved, more detailed version | Makes tools more discoverable \u2014 a richer description embeds better and matches more user queries |\n", "| `_generate_queries` | Uses the LLM to generate synthetic example queries a user might ask when needing this tool | Embeds the *usage intent* alongside the tool description, increasing retrieval accuracy |\n", "| `_get_tool_metadata` | Extracts function name, signature, parameters, and return type using Python's `inspect` module | Produces a 
structured `ToolMetadata` object used to build the OpenAI-compatible tool schema |\n", - "| `register_tool` | Registers a function as a tool — can be used as a plain decorator or with `augment=True` | The core method: embeds the tool description, stores it in Oracle via `MemoryManager`, and keeps a reference to the callable for execution |\n", + "| `register_tool` | Registers a function as a tool \u2014 can be used as a plain decorator or with `augment=True` | The core method: embeds the tool description, stores it in Oracle via `MemoryManager`, and keeps a reference to the callable for execution |\n", "\n", "\n", "### Key Insight\n", "\n", "The `augment=True` flag in `@toolbox.register_tool(augment=True)` triggers:\n", - "1. **Docstring augmentation** — LLM rewrites the docstring to be clearer and more searchable\n", - "2. **Synthetic query generation** — LLM generates example queries that would need this tool\n", - "3. **Rich embedding** — Combines name + augmented docstring + signature + queries for better retrieval\n", + "1. **Docstring augmentation** \u2014 LLM rewrites the docstring to be clearer and more searchable\n", + "2. **Synthetic query generation** \u2014 LLM generates example queries that would need this tool\n", + "3. 
**Rich embedding** \u2014 Combines name + augmented docstring + signature + queries for better retrieval\n", "\n", "This means a simple one-line docstring like `\"Search the web\"` becomes a rich, detailed description that's much more likely to be retrieved when the user asks something like `\"What's the latest news about AI?\"`" ] @@ -2003,7 +2003,7 @@ "\n", " # NOTE: The role description of a technical writer below is a prompt engineering technique that is used to improve the quality of the docstring\n", " # Athough there are research that suggest that role description doesn't realy affect the quality of the LLM's output, it is still a useful technique\n", - " # and it is a good [prompt engineering] technique to know.\n", + " #\u00a0and it is a good [prompt engineering] technique to know.\n", " prompt = f\"\"\"You are a technical writer. Improve the following function docstring to be more clear, \n", " comprehensive, and useful. Include:\n", " 1. A clear concise summary\n", @@ -2135,7 +2135,7 @@ " object_id_str = str(object_id)\n", "\n", " # NOTE: Augmentation is a technique that is used to improve the quality of the tool's docstring\n", - " # by using the LLM to enhance the tool's discoverability and retrieval this is a [memory engineering] technique\n", + " #\u00a0by using the LLM to enhance the tool's discoverability and retrieval this is a [memory engineering] technique\n", " if augment:\n", " # Use LLM to enhance the tool's discoverability\n", " augmented_docstring = self._augment_docstring(docstring)\n", @@ -2189,12 +2189,12 @@ "id": "api-key-warning", "metadata": {}, "source": [ - "### ⚠️ API Keys\n", + "### \u26a0\ufe0f API Keys\n", "\n", "Your API keys are pre-configured as environment variables in this Codespace:\n", "\n", - "- **`OCI_GENAI_API_KEY`** — OCI GenAI (xAI Grok 3 Fast) access\n", - "- **`TAVILY_API_KEY`** — Tavily web search (free tier, 1,000 searches/month)\n", + "- **`OCI_GENAI_API_KEY`** \u2014 OCI GenAI (xAI Grok 3 Fast) access\n", + "- 
**`TAVILY_API_KEY`** \u2014 Tavily web search (free tier, 1,000 searches/month)\n", "\n", "The next cell loads them from the environment. If you are running locally, set these environment variables before launching Jupyter.\n" ] @@ -2220,11 +2220,11 @@ "outputs": [], "source": [ "# Verify OCI GenAI key is available\n", - "assert oci_genai_api_key, \"OCI_GENAI_API_KEY not set — check your Codespace environment variables\"\n", + "assert oci_genai_api_key, \"OCI_GENAI_API_KEY not set \u2014 check your Codespace environment variables\"\n", "print(f\"OCI GenAI API key loaded (length: {len(oci_genai_api_key)})\")\n", "\n", "# Verify Tavily key is available\n", - "assert tavily_api_key, \"TAVILY_API_KEY not set — check your Codespace environment variables\"\n", + "assert tavily_api_key, \"TAVILY_API_KEY not set \u2014 check your Codespace environment variables\"\n", "print(f\"Tavily API key loaded (length: {len(tavily_api_key)})\")\n" ] }, @@ -2242,7 +2242,9 @@ "OCI_GENAI_ENDPOINT = os.environ.get(\n", " \"OCI_GENAI_ENDPOINT\",\n", " \"https://inference.generativeai.us-phoenix-1.oci.oraclecloud.com/openai/v1\"\n", - ")\n", + ").rstrip(\"/\")\n", + "if not OCI_GENAI_ENDPOINT.endswith(\"/openai/v1\"):\n", + " OCI_GENAI_ENDPOINT = f\"{OCI_GENAI_ENDPOINT}/openai/v1\"\n", "OCI_GENAI_API_KEY = os.environ[\"OCI_GENAI_API_KEY\"] # set via Codespaces secret\n", "\n", "client = OpenAI(base_url=OCI_GENAI_ENDPOINT, api_key=OCI_GENAI_API_KEY)\n", @@ -2256,7 +2258,7 @@ "id": "76a43a07", "metadata": {}, "source": [ - "> 💡 **Key Insight — Part 3**\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 3**\n", ">\n", "> Different types of information need different storage and retrieval strategies. Chat history needs exact, ordered recall (SQL). Knowledge and workflows need relevance-ranked retrieval (vectors). Getting the storage type wrong means either missing context or flooding the window with noise." 
] @@ -2270,7 +2272,7 @@ "\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-4-context-engineering.md](../docs/part-4-context-engineering.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-4-context-engineering.md](../docs/part-4-context-engineering.md)\n" ] }, { @@ -2280,7 +2282,7 @@ "source": [ "> **Context engineering** refers to the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference, including all the other information that may land there outside of the prompts.\n", "> \n", - "> — *Anthropic*\n", + "> \u2014 *Anthropic*\n", "\n", "While memory engineering focuses on *what to store and retrieve*, context engineering focuses on *how to manage what's in the context window right now*. This includes monitoring usage, compressing information, and providing just-in-time access to details.\n", "\n", @@ -2293,14 +2295,14 @@ "| **3. Compact** | `summarize_conversation()` / `summarize_and_store()` | Agent-triggered compaction when context gets long |\n", "| **4. Just-in-Time Retrieval** | `expand_summary()` tool | Let agent expand summaries on demand |\n", "\n", - "**`Just-In-Time (JIT)`** retrieval is the process of fetching only the information needed at the exact moment the agent requires it, based on the current task, query, or reasoning step. Instead of loading pre-computed or pre-cached context upfront, the system dynamically retrieves the minimal, most relevant data on demand, ensuring efficiency and reducing context overload. In the context of agent memory JIT is a retrieval-control strategy where memory access is triggered by the agent’s current goal, query, or reasoning step. Rather than preloading large histories or the full knowledge base, the system dynamically filters, ranks, and injects only the information that materially influences the next token. 
This reduces context saturation, improves attention allocation, and increases reasoning fidelity.\n", + "**`Just-In-Time (JIT)`** retrieval is the process of fetching only the information needed at the exact moment the agent requires it, based on the current task, query, or reasoning step. Instead of loading pre-computed or pre-cached context upfront, the system dynamically retrieves the minimal, most relevant data on demand, ensuring efficiency and reducing context overload. In the context of agent memory JIT is a retrieval-control strategy where memory access is triggered by the agent\u2019s current goal, query, or reasoning step. Rather than preloading large histories or the full knowledge base, the system dynamically filters, ranks, and injects only the information that materially influences the next token. This reduces context saturation, improves attention allocation, and increases reasoning fidelity.\n", "\n", "## The Context Management Flow\n", "\n", "```\n", - "Context built → Check usage % → Agent may compact (summarize) → Store summary with ID\n", - " ↓\n", - "Agent sees: [Summary ID: abc123] Brief description ← Agent can call expand_summary(\"abc123\") if needed\n", + "Context built \u2192 Check usage % \u2192 Agent may compact (summarize) \u2192 Store summary with ID\n", + " \u2193\n", + "Agent sees: [Summary ID: abc123] Brief description \u2190 Agent can call expand_summary(\"abc123\") if needed\n", "```\n", "\n", "This approach keeps the context lean while giving the agent access to full details when required.\n", @@ -2311,7 +2313,7 @@ "\n", "**Your task:** Implement `calculate_context_usage()` in the cell below. The agent harness depends on the return dict having exactly three keys: `tokens`, `max`, and `percent`. 
See the TODO comment for the full specification.\n", "\n", - "> 📖 Open **docs/part-4-context-engineering.md** if you need guidance.\n" + "> \ud83d\udcd6 Open **docs/part-4-context-engineering.md** if you need guidance.\n" ] }, { @@ -2319,9 +2321,9 @@ "id": "35d2bf67", "metadata": {}, "source": [ - "> 💡 **Key Definition — Context Engineering**\n", + "> \ud83d\udca1 **Key Definition \u2014 Context Engineering**\n", ">\n", - "> Context engineering is the discipline of deciding exactly which tokens enter the LLM's context window on each inference call. While memory engineering focuses on *what to store and retrieve*, context engineering focuses on *what's in the window right now* — monitoring, compressing, and curating it." + "> Context engineering is the discipline of deciding exactly which tokens enter the LLM's context window on each inference call. While memory engineering focuses on *what to store and retrieve*, context engineering focuses on *what's in the window right now* \u2014 monitoring, compressing, and curating it." 
] }, { @@ -2351,17 +2353,17 @@ "metadata": {}, "outputs": [], "source": [ - "# ✅ Checkpoint: TODO 12\n", + "# \u2705 Checkpoint: TODO 12\n", "assert callable(calculate_context_usage), (\n", - " \"❌ TODO 12 incomplete — calculate_context_usage is not defined.\\n\"\n", + " \"\u274c TODO 12 incomplete \u2014 calculate_context_usage is not defined.\\n\"\n", " \"Go back and implement the function.\"\n", ")\n", "_test = calculate_context_usage(\"test \" * 100, \"xai.grok-3-fast\")\n", "assert isinstance(_test, dict) and \"percent_used\" in _test, (\n", - " \"❌ TODO 12 incomplete — function must return a dict with 'percent_used' key.\\n\"\n", + " \"\u274c TODO 12 incomplete \u2014 function must return a dict with 'percent_used' key.\\n\"\n", " \"Check that you return the correct dictionary shape.\"\n", ")\n", - "print(f\"✅ TODO 12 passed — returns {list(_test.keys())}\")" + "print(f\"\u2705 TODO 12 passed \u2014 returns {list(_test.keys())}\")" ] }, { @@ -2371,48 +2373,48 @@ "source": [ "### Why We Need to Summarise the Context Window\n", "\n", - "Every time the agent processes a turn, it assembles a context window — a single block of text containing conversation history, retrieved knowledge, tool outputs, entity memory, and workflow patterns. This is what gets sent to the LLM on each inference call.\n", + "Every time the agent processes a turn, it assembles a context window \u2014 a single block of text containing conversation history, retrieved knowledge, tool outputs, entity memory, and workflow patterns. This is what gets sent to the LLM on each inference call.\n", "\n", "The problem is that this block grows with every turn. 
Left unchecked, it follows a predictable trajectory:\n", "\n", "```\n", - "Turn 1: [system prompt] + [query] → ~500 tokens\n", - "Turn 5: [system prompt] + [4 prior turns] + [tool outputs] → ~8,000 tokens\n", - "Turn 15: [system prompt] + [14 prior turns] + [tool outputs] → ~40,000 tokens\n", - "Turn 30: Context limit exceeded → API call fails\n", + "Turn 1: [system prompt] + [query] \u2192 ~500 tokens\n", + "Turn 5: [system prompt] + [4 prior turns] + [tool outputs] \u2192 ~8,000 tokens\n", + "Turn 15: [system prompt] + [14 prior turns] + [tool outputs] \u2192 ~40,000 tokens\n", + "Turn 30: Context limit exceeded \u2192 API call fails\n", "```\n", "\n", "This is called **context window bloat**, and it is one of the most common failure modes in production agent systems.\n", "\n", "#### The Two Failure Modes\n", "\n", - "**Hard failure** — the context exceeds the model's token limit and the API call errors. The agent crashes mid-task with no recovery path.\n", + "**Hard failure** \u2014 the context exceeds the model's token limit and the API call errors. The agent crashes mid-task with no recovery path.\n", "\n", - "**Soft failure** — the context is large but still within the limit. However, research has shown that LLMs suffer from the [\"lost in the middle\" problem](https://arxiv.org/abs/2307.03172): when relevant information is buried in a long context, models struggle to attend to it. The agent appears to \"forget\" things it was told earlier in the same session — not because the tokens were removed, but because the model's attention is diluted.\n", + "**Soft failure** \u2014 the context is large but still within the limit. However, research has shown that LLMs suffer from the [\"lost in the middle\" problem](https://arxiv.org/abs/2307.03172): when relevant information is buried in a long context, models struggle to attend to it. 
The agent appears to \"forget\" things it was told earlier in the same session \u2014 not because the tokens were removed, but because the model's attention is diluted.\n", "\n", "#### Summarisation as the Solution\n", "\n", "Context summarisation is the primary technique for managing this growth. Rather than appending every message and tool output indefinitely, the agent periodically compresses older context into a compact summary and stores it in Oracle's summary memory table.\n", "\n", - "The original content is not lost — it is stored in full in the database. Only a short reference pointer (`[Summary ID: abc-123]`) remains in the active context. If the agent needs the full detail later, it can call `expand_summary(summary_id)` to retrieve it on demand.\n", + "The original content is not lost \u2014 it is stored in full in the database. Only a short reference pointer (`[Summary ID: abc-123]`) remains in the active context. If the agent needs the full detail later, it can call `expand_summary(summary_id)` to retrieve it on demand.\n", "\n", "This gives you a **flat context growth curve** instead of an unbounded one:\n", "\n", "```\n", - "Without summarisation: context grows linearly → eventually fails\n", - "With summarisation: context stays bounded → runs indefinitely\n", + "Without summarisation: context grows linearly \u2192 eventually fails\n", + "With summarisation: context stays bounded \u2192 runs indefinitely\n", "```\n", "\n", "#### What a Good Summary Preserves\n", "\n", "A summarisation prompt is only useful if it faithfully compresses the right information. 
A well-written prompt ensures the summary retains:\n", "\n", - "- The **user's goal** — so the agent stays on task across turns\n", - "- **Key facts and findings** — so the agent does not re-discover what it already knows\n", - "- **Named entities** — paper titles, arXiv IDs, authors — so the agent can refer back to specific sources\n", - "- **Unresolved questions** — so the agent knows what still needs to be done\n", + "- The **user's goal** \u2014 so the agent stays on task across turns\n", + "- **Key facts and findings** \u2014 so the agent does not re-discover what it already knows\n", + "- **Named entities** \u2014 paper titles, arXiv IDs, authors \u2014 so the agent can refer back to specific sources\n", + "- **Unresolved questions** \u2014 so the agent knows what still needs to be done\n", "\n", - "This is why the summarisation prompt matters. A vague or poorly structured prompt produces a summary that loses critical detail — and once the original context is replaced, that detail is gone from the active window.\n", + "This is why the summarisation prompt matters. A vague or poorly structured prompt produces a summary that loses critical detail \u2014 and once the original context is replaced, that detail is gone from the active window.\n", "\n", "> The cell below implements `summarise_context_window()`. Your task is to write the prompt that instructs the LLM on exactly what to preserve and how to format the output.\n" ] @@ -2441,7 +2443,7 @@ " # important entities (paper titles, arXiv IDs, authors),\n", " # and unresolved questions or next actions\n", " # 3. Output 4-7 short bullet points\n", - " # 4. Be faithful to the source — do not add new facts\n", + " # 4. Be faithful to the source \u2014 do not add new facts\n", " # 5. 
Include the content to summarise at the end (use content[:3000] to avoid token overflow)\n", " #\n", " # Assign your completed prompt string to the variable: summary_prompt\n", @@ -2475,12 +2477,12 @@ "metadata": {}, "outputs": [], "source": [ - "# ✅ Checkpoint: TODO 13\n", + "# \u2705 Checkpoint: TODO 13\n", "assert callable(summarise_context_window), (\n", - " \"❌ TODO 13 incomplete — summarise_context_window is not defined.\\n\"\n", + " \"\u274c TODO 13 incomplete \u2014 summarise_context_window is not defined.\\n\"\n", " \"Go back and implement the summarisation prompt.\"\n", ")\n", - "print(\"✅ TODO 13 passed — function is defined\")" + "print(\"\u2705 TODO 13 passed \u2014 function is defined\")" ] }, { @@ -2492,31 +2494,31 @@ "\n", "Summarisation alone is not the complete picture. The real architectural insight is **where the summary goes**.\n", "\n", - "When the agent compresses its context window, it does not simply discard the older content. It **offloads it to Oracle AI Database** — the agent's memory core — and replaces it in the active context with a compact reference pointer:\n", + "When the agent compresses its context window, it does not simply discard the older content. 
It **offloads it to Oracle AI Database** \u2014 the agent's memory core \u2014 and replaces it in the active context with a compact reference pointer:\n", "\n", "```\n", "Before offload:\n", - " [Full conversation history — 40,000 tokens]\n", - " [Retrieved papers — 8,000 tokens]\n", - " [Tool outputs — 12,000 tokens]\n", - " ──────────────────────────────────────────\n", - " Total: ~60,000 tokens → approaching limit\n", + " [Full conversation history \u2014 40,000 tokens]\n", + " [Retrieved papers \u2014 8,000 tokens]\n", + " [Tool outputs \u2014 12,000 tokens]\n", + " \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n", + " Total: ~60,000 tokens \u2192 approaching limit\n", "\n", "After offload:\n", " [Summary ID: a3f9c1b2] Agent researched planetary exploration papers,\n", " identified three relevant arXiv submissions, user asked follow-up on\n", " mission timelines. Next: retrieve funding data.\n", - " ──────────────────────────────────────────\n", - " Total: ~80 tokens → context reset, agent continues\n", + " \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n", + " Total: ~80 tokens \u2192 context reset, agent continues\n", "```\n", "\n", - "The original 60,000 tokens are not gone. They are stored in full in Oracle's `SUMMARY_MEMORY` table, indexed by the summary ID. The reference pointer in the active context — `[Summary ID: a3f9c1b2]` — is both a retrieval key and a human-readable label describing what was compressed.\n", + "The original 60,000 tokens are not gone. They are stored in full in Oracle's `SUMMARY_MEMORY` table, indexed by the summary ID. 
The reference pointer in the active context \u2014 `[Summary ID: a3f9c1b2]` \u2014 is both a retrieval key and a human-readable label describing what was compressed.\n", "\n", "#### Just-in-Time (JiT) Retrieval\n", "\n", "The agent does not expand every summary on every turn. Expanding all summaries would immediately re-inflate the context, defeating the purpose. Instead, the agent reads the description in the reference pointer and decides whether it needs the full content for the current task.\n", "\n", - "If it does, it calls `expand_summary(summary_id)` — a tool registered in the toolbox — which pulls the full original content from Oracle on demand. This is **Just-in-Time retrieval**: the agent fetches detail only when it needs it, not speculatively.\n", + "If it does, it calls `expand_summary(summary_id)` \u2014 a tool registered in the toolbox \u2014 which pulls the full original content from Oracle on demand. This is **Just-in-Time retrieval**: the agent fetches detail only when it needs it, not speculatively.\n", "\n", "```\n", "Active context contains:\n", @@ -2526,21 +2528,21 @@ "User asks: \"What were the arXiv IDs from earlier?\"\n", "\n", "Agent decides:\n", - " → Summary a3f9c1b2 is relevant → calls expand_summary(\"a3f9c1b2\")\n", - " → Summary d7e2f091 is not relevant → leaves it compressed\n", + " \u2192 Summary a3f9c1b2 is relevant \u2192 calls expand_summary(\"a3f9c1b2\")\n", + " \u2192 Summary d7e2f091 is not relevant \u2192 leaves it compressed\n", "\n", - "Oracle returns full content of a3f9c1b2 → agent answers accurately\n", + "Oracle returns full content of a3f9c1b2 \u2192 agent answers accurately\n", "```\n", "\n", "#### The Memory Core Architecture\n", "\n", "This is the reason Oracle AI Database is positioned as the **agent memory core** rather than just a storage backend. 
It provides three things simultaneously that no standalone vector store or cache can match:\n", "\n", - "- **Persistence** — summaries survive across sessions, container restarts, and Codespace rebuilds\n", - "- **Semantic indexing** — summary descriptions are embedded, so the agent can find relevant summaries by meaning, not just by exact ID\n", - "- **Faithful reconstruction** — the full original content is stored in a `CLOB` column alongside the summary, meaning the uncompressed version is always recoverable — the agent never permanently loses context, it only defers it\n", + "- **Persistence** \u2014 summaries survive across sessions, container restarts, and Codespace rebuilds\n", + "- **Semantic indexing** \u2014 summary descriptions are embedded, so the agent can find relevant summaries by meaning, not just by exact ID\n", + "- **Faithful reconstruction** \u2014 the full original content is stored in a `CLOB` column alongside the summary, meaning the uncompressed version is always recoverable \u2014 the agent never permanently loses context, it only defers it\n", "\n", - "The cell below implements `offload_to_summary()` — the function that triggers this entire flow when the context window crosses the threshold. Notice it is deliberately simple: it checks the usage percentage, calls `summarise_context_window()`, and returns the compact reference. The complexity lives in the database and the retrieval tools, not in this function.\n" + "The cell below implements `offload_to_summary()` \u2014 the function that triggers this entire flow when the context window crosses the threshold. Notice it is deliberately simple: it checks the usage percentage, calls `summarise_context_window()`, and returns the compact reference. The complexity lives in the database and the retrieval tools, not in this function.\n" ] }, { @@ -2586,24 +2588,24 @@ "\n", "**Our intuition:** Memory should be *compressed*, or *forgotten* not *erased*. 
By marking messages with a `summary_id` instead of deleting them:\n", "\n", - "1. **Full history is preserved** — Original messages remain in the database for auditing, debugging, or reprocessing\n", - "2. **Linkage is maintained** — Each summary knows which messages it represents (via `summary_id`)\n", - "3. **Reversible** — If a summary is deleted, you could \"unsummarize\" by clearing the `summary_id`\n", + "1. **Full history is preserved** \u2014 Original messages remain in the database for auditing, debugging, or reprocessing\n", + "2. **Linkage is maintained** \u2014 Each summary knows which messages it represents (via `summary_id`)\n", + "3. **Reversible** \u2014 If a summary is deleted, you could \"unsummarize\" by clearing the `summary_id`\n", "\n", "#### The Flow\n", "\n", "```\n", - "Thread has 50 messages → Context too large → summarize_conversation(thread_id)\n", - " ↓\n", + "Thread has 50 messages \u2192 Context too large \u2192 summarize_conversation(thread_id)\n", + " \u2193\n", " 1. Read unsummarized messages\n", " 2. LLM summarizes them\n", " 3. Store summary with unique ID\n", " 4. UPDATE messages SET summary_id = 'abc123'\n", - " ↓\n", + " \u2193\n", " Next read: Only new messages appear + Summary ID reference\n", "```\n", "\n", - "This is a form of **log compaction** — a pattern borrowed from databases and message queues where old entries are compressed but not lost." + "This is a form of **log compaction** \u2014 a pattern borrowed from databases and message queues where old entries are compressed but not lost." ] }, { @@ -2657,9 +2659,9 @@ "id": "510596c2", "metadata": {}, "source": [ - "> 💡 **Key Insight — Part 4**\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 4**\n", ">\n", - "> An agent's context window is a finite budget. Without active management, it fills up within a few turns and the agent breaks. Summarisation and JIT retrieval are the two levers that keep context flat — compress what you can, and only fetch what you need." 
+ "> An agent's context window is a finite budget. Without active management, it fills up within a few turns and the agent breaks. Summarisation and JIT retrieval are the two levers that keep context flat \u2014 compress what you can, and only fetch what you need." ] }, { @@ -2671,7 +2673,7 @@ "\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-5-web-search.md](../docs/part-5-web-search.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-5-web-search.md](../docs/part-5-web-search.md)\n" ] }, { @@ -2685,23 +2687,23 @@ "\n", "## What This Section Does\n", "\n", - "1. **Initialize the Tavily client** — Set up the search API with an API key\n", - "2. **Register `search_tavily` as a tool** — Use `@toolbox.register_tool(augment=True)` to make it discoverable\n", - "3. **Implement the search-and-store pattern** — Results are automatically written to knowledge base memory\n", - "4. **Test tool retrieval** — Verify the tool can be found via semantic search\n", + "1. **Initialize the Tavily client** \u2014 Set up the search API with an API key\n", + "2. **Register `search_tavily` as a tool** \u2014 Use `@toolbox.register_tool(augment=True)` to make it discoverable\n", + "3. **Implement the search-and-store pattern** \u2014 Results are automatically written to knowledge base memory\n", + "4. 
**Test tool retrieval** \u2014 Verify the tool can be found via semantic search\n", "\n", "## The Search-and-Store Pattern\n", "\n", "One thing to note is that not only do we get external context that is not available to the Agent at execution, but we persists this to the knowledge base memory and the Agent can reuse this information in subsequent iteration.\n", - "When the agent calls `search_tavily()`, it doesn't just return results—it **persists them to the knowledge base**:\n", + "When the agent calls `search_tavily()`, it doesn't just return results\u2014it **persists them to the knowledge base**:\n", "\n", "```\n", "Agent calls search_tavily(\"latest AI news\")\n", - " ↓\n", + " \u2193\n", "Tavily API returns results\n", - " ↓\n", + " \u2193\n", "Each result is written to knowledge_base_vs with metadata (title, URL, timestamp)\n", - " ↓\n", + " \u2193\n", "Future queries can retrieve this information without searching again\n", "```\n", "\n", @@ -2713,9 +2715,9 @@ "id": "d6450b13", "metadata": {}, "source": [ - "> 💡 **Key Definition — Agentic Tool**\n", + "> \ud83d\udca1 **Key Definition \u2014 Agentic Tool**\n", ">\n", - "> An agentic tool is a function the LLM can choose to call during its reasoning loop. Unlike programmatic operations (which the harness always runs), agentic tools are invoked only when the model decides they are needed — giving the agent autonomy over *when* to act." + "> An agentic tool is a function the LLM can choose to call during its reasoning loop. Unlike programmatic operations (which the harness always runs), agentic tools are invoked only when the model decides they are needed \u2014 giving the agent autonomy over *when* to act." 
] }, { @@ -2726,8 +2728,8 @@ "outputs": [], "source": [ "# Verify Tavily key is available\n", - "assert tavily_api_key, \"TAVILY_API_KEY not set — check your Codespace environment variables\"\n", - "print(\"Tavily API key loaded ✓\")" + "assert tavily_api_key, \"TAVILY_API_KEY not set \u2014 check your Codespace environment variables\"\n", + "print(\"Tavily API key loaded \u2713\")" ] }, { @@ -2750,7 +2752,7 @@ "# 4. Return the results list\n", "# Include a docstring describing the tool clearly for the agent\n", "#\n", - "# IMPORTANT: The function MUST be named search_tavily — the agent harness\n", + "# IMPORTANT: The function MUST be named search_tavily \u2014 the agent harness\n", "# references this name for context refresh after web searches.\n", "\n", "# YOUR CODE HERE\n" @@ -2762,12 +2764,12 @@ "metadata": {}, "outputs": [], "source": [ - "# ✅ Checkpoint: TODO 14\n", + "# \u2705 Checkpoint: TODO 14\n", "assert \"search_tavily\" in dir() and callable(search_tavily), (\n", - " \"❌ TODO 14 incomplete — search_tavily is not defined.\\n\"\n", + " \"\u274c TODO 14 incomplete \u2014 search_tavily is not defined.\\n\"\n", " \"Go back and register the tool with @toolbox.register_tool.\"\n", ")\n", - "print(\"✅ TODO 14 passed — search_tavily tool is registered\")" + "print(\"\u2705 TODO 14 passed \u2014 search_tavily tool is registered\")" ] }, { @@ -2787,7 +2789,7 @@ "id": "25ef8b59", "metadata": {}, "source": [ - "> 💡 **Key Insight — Part 5**\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 5**\n", ">\n", "> Giving an agent web access turns it from a closed system into an open one. But the real pattern here is the *toolbox*: registering tools into a vector store so the agent discovers them by relevance rather than receiving every tool on every call. This scales to hundreds of tools without bloating the context." 
] @@ -2801,7 +2803,7 @@ "\n", "--------\n", "\n", - "> 📖 **Workshop Guide:** [docs/part-6-agent-execution.md](../docs/part-6-agent-execution.md)\n" + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-6-agent-execution.md](../docs/part-6-agent-execution.md)\n" ] }, { @@ -2826,7 +2828,7 @@ "id": "50661128", "metadata": {}, "source": [ - "> 💡 **Key Definition — Agent Harness**\n", + "> \ud83d\udca1 **Key Definition \u2014 Agent Harness**\n", ">\n", "> An agent harness is the runtime scaffold that orchestrates the LLM's reasoning loop: building context, dispatching tool calls, persisting memory, and deciding when to stop. The LLM generates intent; the harness executes it." ] @@ -2843,7 +2845,7 @@ "\n", "client = OpenAI(base_url=OCI_GENAI_ENDPOINT, api_key=OCI_GENAI_API_KEY)\n", "\n", - "# Persistent context-window tracker — survives across call_agent() invocations\n", + "# Persistent context-window tracker \u2014 survives across call_agent() invocations\n", "context_size_history = [] # list of (run_label, iteration, estimated_tokens)\n", "\n", "# ==================== SYSTEM PROMPT ====================\n", @@ -2852,19 +2854,19 @@ "You are a Research Paper Assistant with access to memory and tools.\n", "\n", "IMPORTANT: The user's input contains CONTEXT retrieved from multiple memory systems.\n", - "Each memory section has a Purpose and When-to-use guide — follow them.\n", + "Each memory section has a Purpose and When-to-use guide \u2014 follow them.\n", "\n", "## Memory Priority Order\n", - "1. **Conversation Memory** — check what the user already asked and what you already answered.\n", - "2. **Knowledge Base Memory** — cite facts from stored papers/documents before searching externally.\n", - "3. **Entity Memory** — resolve named references (\"that author\", \"the system\") from here.\n", - "4. **Workflow Memory** — reuse proven tool sequences for similar past queries.\n", - "5. 
**Summary Memory** — expand a summary ID only when you need specific details from older context.\n", + "1. **Conversation Memory** \u2014 check what the user already asked and what you already answered.\n", + "2. **Knowledge Base Memory** \u2014 cite facts from stored papers/documents before searching externally.\n", + "3. **Entity Memory** \u2014 resolve named references (\"that author\", \"the system\") from here.\n", + "4. **Workflow Memory** \u2014 reuse proven tool sequences for similar past queries.\n", + "5. **Summary Memory** \u2014 expand a summary ID only when you need specific details from older context.\n", "\n", "## Tool Output Handling\n", "Tool call outputs are logged to a Tool Log table and replaced with compact references in context.\n", "The preview in each [Tool Log ...] reference contains enough to reason about the result.\n", - "If you need the full output, it can be retrieved from the database — but prefer working with\n", + "If you need the full output, it can be retrieved from the database \u2014 but prefer working with\n", "the preview and the knowledge base (where search results are also stored).\n", "\n", "## Context Management\n", @@ -2916,31 +2918,31 @@ "\n", "```\n", "1. BUILD CONTEXT (programmatic)\n", - " ├── Read conversational memory (unsummarized chat units)\n", - " ├── Read knowledge base (relevant documents)\n", - " ├── Read workflow memory (past action patterns)\n", - " ├── Read entity memory (people, places, systems)\n", - " └── Read summary context (available summary IDs + descriptions)\n", + " \u251c\u2500\u2500 Read conversational memory (unsummarized chat units)\n", + " \u251c\u2500\u2500 Read knowledge base (relevant documents)\n", + " \u251c\u2500\u2500 Read workflow memory (past action patterns)\n", + " \u251c\u2500\u2500 Read entity memory (people, places, systems)\n", + " \u2514\u2500\u2500 Read summary context (available summary IDs + descriptions)\n", "\n", "2. 
GET TOOLS (programmatic)\n", - " └── Retrieve semantically relevant tools from toolbox\n", + " \u2514\u2500\u2500 Retrieve semantically relevant tools from toolbox\n", "\n", "3. STORE USER MESSAGE (programmatic)\n", - " └── Persist the user message + best-effort entity extraction\n", + " \u2514\u2500\u2500 Persist the user message + best-effort entity extraction\n", "\n", "4. WITHIN-RUN TOOL-CALL LOOP (up to max_iterations and within max_execution_time_s)\n", - " ├── Call LLM with context + tool schemas\n", - " ├── If tool calls → execute tools and append tool outputs\n", - " ├── If tools changed memory (search/compaction) → rebuild context for the next iteration\n", - " └── If no tool calls → finalize answer\n", + " \u251c\u2500\u2500 Call LLM with context + tool schemas\n", + " \u251c\u2500\u2500 If tool calls \u2192 execute tools and append tool outputs\n", + " \u251c\u2500\u2500 If tools changed memory (search/compaction) \u2192 rebuild context for the next iteration\n", + " \u2514\u2500\u2500 If no tool calls \u2192 finalize answer\n", "\n", "5. GUARDED STOP\n", - " └── If iteration/time budget is hit → force a final best-effort answer (no tools)\n", + " \u2514\u2500\u2500 If iteration/time budget is hit \u2192 force a final best-effort answer (no tools)\n", "\n", "6. SAVE RESULTS (programmatic)\n", - " ├── Write workflow (if tools were used)\n", - " ├── Best-effort entity extraction on final answer\n", - " └── Store assistant response in conversational memory\n", + " \u251c\u2500\u2500 Write workflow (if tools were used)\n", + " \u251c\u2500\u2500 Best-effort entity extraction on final answer\n", + " \u2514\u2500\u2500 Store assistant response in conversational memory\n", "```\n", "\n", "## Key Design Decisions\n", @@ -2976,7 +2978,7 @@ "\n", " # 1. 
Build context from memory\n", " print(\"\\n\" + \"=\"*50)\n", - " print(\"🧠 BUILDING CONTEXT...\")\n", + " print(\"\ud83e\udde0 BUILDING CONTEXT...\")\n", "\n", " def build_context() -> str:\n", " \"\"\"Rebuild the full context from the current memory state.\n", @@ -2990,12 +2992,12 @@ " Build ctx by concatenating the following in order, each followed by \"\\n\\n\":\n", " 1. The current question:\n", " f\"# Question\\n{query}\\n\\n\"\n", - " 2. Conversational memory → memory_manager.read_conversational_memory(thread_id)\n", - " 3. Knowledge base → memory_manager.read_knowledge_base(query)\n", - " 4. Workflow memory → memory_manager.read_workflow(query)\n", - " 5. Entity memory → memory_manager.read_entity(query)\n", - " 6. Summary context → memory_manager.read_summary_context(query)\n", - " (this returns summary IDs + descriptions only — not full content)\n", + " 2. Conversational memory \u2192 memory_manager.read_conversational_memory(thread_id)\n", + " 3. Knowledge base \u2192 memory_manager.read_knowledge_base(query)\n", + " 4. Workflow memory \u2192 memory_manager.read_workflow(query)\n", + " 5. Entity memory \u2192 memory_manager.read_entity(query)\n", + " 6. Summary context \u2192 memory_manager.read_summary_context(query)\n", + " (this returns summary IDs + descriptions only \u2014 not full content)\n", "\n", " Return the assembled ctx string.\n", " \"\"\"\n", @@ -3009,9 +3011,9 @@ "\n", " # 2. Check context usage (agent decides whether to summarize via tools)\n", " usage = calculate_context_usage(context)\n", - " print(f\"📊 Context: {usage['percent']}% ({usage['tokens']}/{usage['max']} tokens)\")\n", + " print(f\"\ud83d\udcca Context: {usage['percent']}% ({usage['tokens']}/{usage['max']} tokens)\")\n", " if usage['percent'] > 80:\n", - " print(\"⚠️ Context >80% - agent may call summarize_conversation(thread_id) for compaction.\")\n", + " print(\"\u26a0\ufe0f Context >80% - agent may call summarize_conversation(thread_id) for compaction.\")\n", "\n", " # 3. 
Get tools\n", " dynamic_tools = memory_manager.read_toolbox(query, k=5)\n", @@ -3029,7 +3031,7 @@ " dynamic_tools.append(tool)\n", " existing.add(name)\n", "\n", - " print(f\"🔧 Tools: {[t['function']['name'] for t in dynamic_tools]}\")\n", + " print(f\"\ud83d\udd27 Tools: {[t['function']['name'] for t in dynamic_tools]}\")\n", "\n", " # 4. Store user message & extract entities\n", " memory_manager.write_conversational_memory(query, \"user\", thread_id)\n", @@ -3045,7 +3047,7 @@ " # Estimate tool schema tokens (sent with every API call)\n", " tool_schema_tokens = len(json_lib.dumps(dynamic_tools)) // 4 if dynamic_tools else 0\n", "\n", - " print(\"\\n🤖 TOOL-CALL LOOP\")\n", + " print(\"\\n\ud83e\udd16 TOOL-CALL LOOP\")\n", " for iteration in range(max_iterations):\n", " print(f\"\\n--- Iteration {iteration + 1} ---\")\n", "\n", @@ -3058,7 +3060,7 @@ " elapsed = time.time() - start_time\n", " if elapsed > max_execution_time_s:\n", " timed_out = True\n", - " print(f\"\\n⏱️ Time limit reached ({elapsed:.1f}s > {max_execution_time_s:.1f}s). Finalizing...\")\n", + " print(f\"\\n\u23f1\ufe0f Time limit reached ({elapsed:.1f}s > {max_execution_time_s:.1f}s). Finalizing...\")\n", " break\n", "\n", " response = call_openai_chat(messages, tools=dynamic_tools)\n", @@ -3077,15 +3079,15 @@ " tool_args = json_lib.loads(raw_args)\n", " except Exception as e:\n", " result = f\"Error: invalid JSON tool arguments for {tool_name}: {e}. Raw: {raw_args}\"\n", - " print(f\"🛠️ {tool_name}()\")\n", - " steps.append(f\"{tool_name}() → failed\")\n", + " print(f\"\ud83d\udee0\ufe0f {tool_name}()\")\n", + " steps.append(f\"{tool_name}() \u2192 failed\")\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", " continue\n", "\n", " if not isinstance(tool_args, dict):\n", " result = f\"Error: tool arguments for {tool_name} must be a JSON object. 
Got {type(tool_args).__name__}.\"\n", - " print(f\"🛠️ {tool_name}()\")\n", - " steps.append(f\"{tool_name}() → failed\")\n", + " print(f\"\ud83d\udee0\ufe0f {tool_name}()\")\n", + " steps.append(f\"{tool_name}() \u2192 failed\")\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", " continue\n", "\n", @@ -3095,26 +3097,26 @@ "\n", " args_display = {k: (v[:50] + '...' if isinstance(v, str) and len(v) > 50 else v)\n", " for k, v in tool_args.items()}\n", - " print(f\"🛠️ {tool_name}({args_display})\")\n", + " print(f\"\ud83d\udee0\ufe0f {tool_name}({args_display})\")\n", "\n", " if max_execution_time_s is not None:\n", " elapsed = time.time() - start_time\n", " if elapsed > max_execution_time_s:\n", " timed_out = True\n", " result = f\"Error: time limit reached before executing tool {tool_name}.\"\n", - " steps.append(f\"{tool_name}({args_display}) → failed\")\n", - " print(f\" → {result}\")\n", + " steps.append(f\"{tool_name}({args_display}) \u2192 failed\")\n", + " print(f\" \u2192 {result}\")\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", " break\n", "\n", " try:\n", " result = execute_tool(tool_name, tool_args)\n", - " steps.append(f\"{tool_name}({args_display}) → success\")\n", + " steps.append(f\"{tool_name}({args_display}) \u2192 success\")\n", " except Exception as e:\n", " result = f\"Error: {e}\"\n", - " steps.append(f\"{tool_name}({args_display}) → failed\")\n", + " steps.append(f\"{tool_name}({args_display}) \u2192 failed\")\n", "\n", - " print(f\" → {result[:200]}...\")\n", + " print(f\" \u2192 {result[:200]}...\")\n", "\n", " # Offload tool output to TOOL_LOG table (experimental memory).\n", " # Full output is persisted in the DB; only a compact reference\n", @@ -3136,12 +3138,12 @@ " break\n", " else:\n", " final_answer = msg.content or \"\"\n", - " print(f\"\\n✅ DONE ({len(steps)} tool calls)\")\n", + " print(f\"\\n\u2705 DONE ({len(steps)} tool calls)\")\n", " 
break\n", "\n", " if not final_answer:\n", " reason = \"time limit\" if timed_out else \"iteration limit\"\n", - " print(f\"\\n⚠️ Stopped due to {reason}. Generating best-effort final answer (no tools)...\")\n", + " print(f\"\\n\u26a0\ufe0f Stopped due to {reason}. Generating best-effort final answer (no tools)...\")\n", " try:\n", " final_messages = messages + [{\"role\": \"user\", \"content\": \"Finalize your answer using the context and tool outputs so far. Do not call tools.\"}]\n", " final_resp = call_openai_chat(final_messages, tools=None)\n", @@ -3158,7 +3160,7 @@ " pass\n", " memory_manager.write_conversational_memory(final_answer, \"assistant\", thread_id)\n", "\n", - " print(\"\\n\" + \"=\"*50 + f\"\\n💬 ANSWER:\\n{final_answer}\\n\" + \"=\"*50)\n", + " print(\"\\n\" + \"=\"*50 + f\"\\n\ud83d\udcac ANSWER:\\n{final_answer}\\n\" + \"=\"*50)\n", " return final_answer" ] }, @@ -3168,12 +3170,12 @@ "metadata": {}, "outputs": [], "source": [ - "# ✅ Checkpoint: TODO 15\n", + "# \u2705 Checkpoint: TODO 15\n", "assert callable(call_agent), (\n", - " \"❌ TODO 15 incomplete — call_agent is not defined.\\n\"\n", + " \"\u274c TODO 15 incomplete \u2014 call_agent is not defined.\\n\"\n", " \"Go back and implement the agent harness.\"\n", ")\n", - "print(\"✅ TODO 15 passed — call_agent is defined\")" + "print(\"\u2705 TODO 15 passed \u2014 call_agent is defined\")" ] }, { @@ -3186,7 +3188,7 @@ "# TODO 16: Run 5 questions through the agent before the final memory recall question.\n", "#\n", "# Use the same thread_id=\"0022\" for all calls so the agent builds up\n", - "# conversational memory across turns — this is what allows it to answer\n", + "# conversational memory across turns \u2014 this is what allows it to answer\n", "# \"What was my first question to you?\" correctly at the end.\n", "#\n", "# Choose any 5 questions related to the workshop topic, for example:\n", @@ -3197,7 +3199,7 @@ "# The more varied your questions, the richer the memory state you will\n", "# 
observe in the context window visualisation that follows.\n", "#\n", - "# YOUR CODE HERE — replace each placeholder with a real call_agent() call\n", + "# YOUR CODE HERE \u2014 replace each placeholder with a real call_agent() call\n", "\n", "# Question 1\n", "# call_agent(\"YOUR QUESTION HERE\", thread_id=\"0022\")\n", @@ -3222,7 +3224,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Final question — tests whether conversational memory is working correctly.\n", + "# Final question \u2014 tests whether conversational memory is working correctly.\n", "# The agent should recall your first question from the memory it has built above.\n", "call_agent(\"What was my first question to you\", thread_id=\"0022\")" ] @@ -3247,7 +3249,7 @@ " plt.tight_layout()\n", " plt.show()\n", "else:\n", - " print(\"No iterations recorded — run call_agent() first.\")" + " print(\"No iterations recorded \u2014 run call_agent() first.\")" ] }, { @@ -3255,7 +3257,7 @@ "id": "1cdasgb4qzj", "metadata": {}, "source": [ - "## Step 2: Baseline — Agent Without Context Engineering\n", + "## Step 2: Baseline \u2014 Agent Without Context Engineering\n", "\n", "To appreciate the impact of the memory and context engineering techniques we've built, it helps to see what happens **without them**.\n", "\n", @@ -3264,13 +3266,13 @@ "| Technique Removed | What Happens Instead | Effect on Context Window |\n", "|---|---|---|\n", "| **Tool output offloading** (`write_tool_log`) | Full raw tool outputs stay in the `messages` list | Each tool call adds thousands of tokens (e.g. 
a web search returns ~2-4k tokens of results) |\n", - "| **Summarisation tools** (`summarize_conversation`, `summarize_and_store`) | Excluded from the tool list — the agent has no way to compact context | Context only grows, never shrinks |\n", + "| **Summarisation tools** (`summarize_conversation`, `summarize_and_store`) | Excluded from the tool list \u2014 the agent has no way to compact context | Context only grows, never shrinks |\n", "| **Context refresh after search** | No rebuild from memory after tool calls | Stale + bloated context persists across iterations |\n", - "| **Memory-backed context rebuild** | Messages persist as one flat list across calls | No separation of concerns — everything accumulates |\n", + "| **Memory-backed context rebuild** | Messages persist as one flat list across calls | No separation of concerns \u2014 everything accumulates |\n", "\n", "### Why This Matters\n", "\n", - "In a real agent loop, the LLM is called **once per iteration** with the full `messages` list. Without offloading, every tool output ever produced sits in that list. After just 3 web searches, the context could grow by 10,000+ tokens — consuming budget that could be used for reasoning.\n", + "In a real agent loop, the LLM is called **once per iteration** with the full `messages` list. Without offloading, every tool output ever produced sits in that list. After just 3 web searches, the context could grow by 10,000+ tokens \u2014 consuming budget that could be used for reasoning.\n", "\n", "The comparison chart below plots both approaches on the same axis so you can see the divergence." ] @@ -3280,9 +3282,9 @@ "id": "36f532a6", "metadata": {}, "source": [ - "> 💡 **Key Insight — Memory-Aware vs Memory-Augmented**\n", + "> \ud83d\udca1 **Key Insight \u2014 Memory-Aware vs Memory-Augmented**\n", ">\n", - "> A **memory-augmented** agent has access to memory stores — it can read and write. But that alone is not enough. 
A **memory-aware** agent also has *context engineering*: it monitors its context budget, summarises proactively, offloads tool outputs, and retrieves just-in-time. The naive agent below is memory-augmented (it uses the same LLM and tools) but not memory-aware — it has no strategy for managing what accumulates in its context window. The difference is what you are about to see in the chart." + "> A **memory-augmented** agent has access to memory stores \u2014 it can read and write. But that alone is not enough. A **memory-aware** agent also has *context engineering*: it monitors its context budget, summarises proactively, offloads tool outputs, and retrieves just-in-time. The naive agent below is memory-augmented (it uses the same LLM and tools) but not memory-aware \u2014 it has no strategy for managing what accumulates in its context window. The difference is what you are about to see in the chart." ] }, { @@ -3294,18 +3296,18 @@ "source": [ "# Separate tracker for the naive agent\n", "naive_context_size_history = []\n", - "# Persistent messages per thread — simulates no context management across runs\n", + "# Persistent messages per thread \u2014 simulates no context management across runs\n", "_naive_messages_by_thread = {}\n", "\n", "def call_agent_naive(query: str, thread_id: str = \"naive_1\", dynamic_tools_override: list = None, max_iterations: int = 10, max_execution_time_s: float = 60.0) -> str:\n", - " \"\"\"Naive agent harness — NO context engineering.\n", + " \"\"\"Naive agent harness \u2014 NO context engineering.\n", " \n", " Differences from call_agent:\n", " 1. Full raw tool outputs stay in messages (no write_tool_log offloading)\n", " 2. No summarisation tools available (agent cannot compact context)\n", " 3. No context refresh after memory-mutating tools\n", - " 4. Messages persist across calls — context only grows, never shrinks\n", - " 5. No memory reads — conversation history IS the raw messages list\n", + " 4. 
Messages persist across calls \u2014 context only grows, never shrinks\n", + " 5. No memory reads \u2014 conversation history IS the raw messages list\n", " \"\"\"\n", " thread_id = str(thread_id)\n", " steps = []\n", @@ -3313,7 +3315,7 @@ " start_time = time.time()\n", " timed_out = False\n", "\n", - " # Get tools — but exclude summarisation tools\n", + " # Get tools \u2014 but exclude summarisation tools\n", " if dynamic_tools_override is not None:\n", " dynamic_tools = dynamic_tools_override\n", " else:\n", @@ -3323,7 +3325,7 @@ " {\"summarize_conversation\", \"summarize_and_store\", \"expand_summary\"}]\n", "\n", " # Initialize or reuse persistent messages for this thread.\n", - " # No memory reads — the raw messages list IS the only context.\n", + " # No memory reads \u2014 the raw messages list IS the only context.\n", " # This is the naive approach: everything accumulates in one flat list.\n", " if thread_id not in _naive_messages_by_thread:\n", " _naive_messages_by_thread[thread_id] = [\n", @@ -3331,7 +3333,7 @@ " ]\n", " messages = _naive_messages_by_thread[thread_id]\n", "\n", - " # Just append the raw query — no build_context(), no memory reads.\n", + " # Just append the raw query \u2014 no build_context(), no memory reads.\n", " # Prior turns, tool outputs, and assistant responses are already in messages.\n", " messages.append({\"role\": \"user\", \"content\": query})\n", " final_answer = \"\"\n", @@ -3361,10 +3363,10 @@ " tool_args = json_lib.loads(tc.function.arguments or \"{}\")\n", " try:\n", " result = execute_tool(tc.function.name, tool_args)\n", - " steps.append(f\"{tc.function.name} → success\")\n", + " steps.append(f\"{tc.function.name} \u2192 success\")\n", " except Exception as e:\n", " result = f\"Error: {e}\"\n", - " steps.append(f\"{tc.function.name} → failed\")\n", + " steps.append(f\"{tc.function.name} \u2192 failed\")\n", "\n", " # KEY DIFFERENCE: raw tool output goes straight into messages (no offloading)\n", " 
messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": str(result)})\n", @@ -3382,7 +3384,7 @@ "\n", " # Append assistant answer to persistent messages (it stays for the next call)\n", " messages.append({\"role\": \"assistant\", \"content\": final_answer})\n", - " print(f\"✅ Naive agent done ({len(steps)} tool calls, {len(messages)} messages in context)\")\n", + " print(f\"\u2705 Naive agent done ({len(steps)} tool calls, {len(messages)} messages in context)\")\n", " return final_answer" ] }, @@ -3404,7 +3406,7 @@ "eng_thread = str(uuid.uuid4())[:8]\n", "naive_thread = str(uuid.uuid4())[:8]\n", "\n", - "# Five progressive queries that build on each other — tests memory continuity\n", + "# Five progressive queries that build on each other \u2014 tests memory continuity\n", "queries = [\n", " \"Search for recent papers on AI agent memory published in 2026\",\n", " \"Pick the 3rd paper from the list and give me the key takeaways\",\n", @@ -3415,7 +3417,7 @@ "\n", "for i, q in enumerate(queries, 1):\n", " print(\"=\" * 60)\n", - " print(f\"QUERY {i}/5 — WITH CONTEXT ENGINEERING (thread: {eng_thread})\")\n", + " print(f\"QUERY {i}/5 \u2014 WITH CONTEXT ENGINEERING (thread: {eng_thread})\")\n", " print(f\" >> {q}\")\n", " print(\"=\" * 60)\n", " call_agent(q, thread_id=eng_thread)\n", @@ -3423,7 +3425,7 @@ "\n", "for i, q in enumerate(queries, 1):\n", " print(\"=\" * 60)\n", - " print(f\"QUERY {i}/5 — NAIVE / NO CONTEXT ENGINEERING (thread: {naive_thread})\")\n", + " print(f\"QUERY {i}/5 \u2014 NAIVE / NO CONTEXT ENGINEERING (thread: {naive_thread})\")\n", " print(f\" >> {q}\")\n", " print(\"=\" * 60)\n", " call_agent_naive(q, thread_id=naive_thread)\n", @@ -3464,7 +3466,7 @@ "\n", "In OpenAI-style framing:\n", "- An **agent run** (one user turn handled) is what `call_agent(...)` executes.\n", - "- Within a run, the **tool-call loop** repeats: model reasoning → optional tool calls → harness executes tools → model observes results → repeat until a 
final answer.\n", + "- Within a run, the **tool-call loop** repeats: model reasoning \u2192 optional tool calls \u2192 harness executes tools \u2192 model observes results \u2192 repeat until a final answer.\n", "\n", "An **agent harness** is the runtime scaffolding around that loop. In this notebook, it is a **memory-based agent harness** where:\n", "- context is assembled from multiple memory types each run\n", @@ -3485,7 +3487,7 @@ "id": "7sp42fx6618", "metadata": {}, "source": [ - "## Step 3: LLM-as-a-Judge — Response Quality Evaluation\n", + "## Step 3: LLM-as-a-Judge \u2014 Response Quality Evaluation\n", "\n", "We've seen the **context window efficiency** difference between the two agents. But does better context engineering actually produce **better answers**?\n", "\n", @@ -3497,7 +3499,7 @@ "| Response A (memory-engineered agent) | A preference: **A**, **B**, or **Tie** |\n", "| Response B (naive agent) | A short explanation of its reasoning |\n", "\n", - "> **Why a warmup phase?** The memory agent's advantage is **cumulative** — it stores conversational memory, entities, and workflows across turns while managing context size. On a brand-new conversation, both agents perform similarly. We first run 5 warmup queries to build up conversation state, then evaluate on queries that specifically test **recall, continuity, and synthesis** — the capabilities that memory engineering enables." + "> **Why a warmup phase?** The memory agent's advantage is **cumulative** \u2014 it stores conversational memory, entities, and workflows across turns while managing context size. On a brand-new conversation, both agents perform similarly. We first run 5 warmup queries to build up conversation state, then evaluate on queries that specifically test **recall, continuity, and synthesis** \u2014 the capabilities that memory engineering enables." 
] }, { @@ -3507,7 +3509,7 @@ "metadata": {}, "outputs": [], "source": [ - "# ── Warmup phase: build up conversation history so the memory agent has state to leverage ──\n", + "# \u2500\u2500 Warmup phase: build up conversation history so the memory agent has state to leverage \u2500\u2500\n", "eval_thread_eng = str(uuid.uuid4())[:8]\n", "eval_thread_naive = str(uuid.uuid4())[:8]\n", "\n", @@ -3519,13 +3521,13 @@ " \"Compare the findings from the two searches we did\",\n", "]\n", "\n", - "print(\"🔄 WARMUP — building conversation history on both agents...\\n\")\n", + "print(\"\ud83d\udd04 WARMUP \u2014 building conversation history on both agents...\\n\")\n", "for i, q in enumerate(warmup_queries, 1):\n", " print(f\" Warmup {i}/{len(warmup_queries)}: {q[:60]}...\")\n", " call_agent(q, thread_id=eval_thread_eng)\n", " call_agent_naive(q, thread_id=eval_thread_naive)\n", "\n", - "# ── Evaluation phase: these queries test memory recall and continuity ──\n", + "# \u2500\u2500 Evaluation phase: these queries test memory recall and continuity \u2500\u2500\n", "eval_queries = [\n", " \"What was the very first paper we discussed and what were its key points?\",\n", " \"Summarize the full arc of our conversation so far\",\n", @@ -3534,16 +3536,16 @@ "\n", "eval_results = []\n", "\n", - "print(f\"\\n{'='*60}\\n📋 EVALUATION — collecting response pairs for judging\\n{'='*60}\")\n", + "print(f\"\\n{'='*60}\\n\ud83d\udccb EVALUATION \u2014 collecting response pairs for judging\\n{'='*60}\")\n", "for q in eval_queries:\n", " print(f\"\\nEVAL: {q}\")\n", - " print(\" ▶ Memory-engineered agent...\")\n", + " print(\" \u25b6 Memory-engineered agent...\")\n", " eng_resp = call_agent(q, thread_id=eval_thread_eng)\n", - " print(\" ▶ Naive agent...\")\n", + " print(\" \u25b6 Naive agent...\")\n", " naive_resp = call_agent_naive(q, thread_id=eval_thread_naive)\n", " eval_results.append((q, eng_resp, naive_resp))\n", "\n", - "print(f\"\\n✅ Collected {len(eval_results)} response pairs for 
judging.\")" + "print(f\"\\n\u2705 Collected {len(eval_results)} response pairs for judging.\")" ] }, { @@ -3564,10 +3566,10 @@ "{response_b}\n", "\n", "Evaluate both responses on:\n", - "1. **Accuracy** — Are the facts correct and claims well-supported?\n", - "2. **Completeness** — Does the response fully address the query?\n", - "3. **Relevance** — Does it stay on-topic and use context appropriately?\n", - "4. **Coherence** — Is it well-structured and easy to follow?\n", + "1. **Accuracy** \u2014 Are the facts correct and claims well-supported?\n", + "2. **Completeness** \u2014 Does the response fully address the query?\n", + "3. **Relevance** \u2014 Does it stay on-topic and use context appropriately?\n", + "4. **Coherence** \u2014 Is it well-structured and easy to follow?\n", "\n", "Reply with EXACTLY this JSON format (no other text):\n", "{{\"winner\": \"A\" or \"B\" or \"Tie\", \"reason\": \"one sentence explanation\"}}\"\"\"\n", @@ -3590,7 +3592,7 @@ " verdict = judge_responses(query, eng_resp, naive_resp)\n", " verdict[\"query\"] = query\n", " judgments.append(verdict)\n", - " label = {\"A\": \"Memory Agent ✅\", \"B\": \"Naive Agent\", \"Tie\": \"Tie 🤝\"}\n", + " label = {\"A\": \"Memory Agent \u2705\", \"B\": \"Naive Agent\", \"Tie\": \"Tie \ud83e\udd1d\"}\n", " print(f\"Query: {query[:60]}...\")\n", " print(f\" Winner: {label.get(verdict['winner'], verdict['winner'])}\")\n", " print(f\" Reason: {verdict['reason']}\\n\")" @@ -3629,6 +3631,240 @@ "print(f\"\\nMemory Agent wins {wins['Memory Agent']}/{len(judgments)} queries.\")" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 7: Agent Observability with LangSmith\n", + "\n", + "--------\n", + "\n", + "> \ud83d\udcd6 **Workshop Guide:** [docs/part-7-observability.md](../docs/part-7-observability.md)\n", + "\n", + "In Part 6, you proved that memory and context engineering keep the agent's context window under control.\n", + "\n", + "Now you will make the agent's runtime behavior 
visible. You will add LangSmith trace runs around the same agent loop so you can inspect one turn: context assembly, Oracle memory reads, tool retrieval, LLM calls, tool execution, context checks, summarisation, and memory writes.\n", + "\n", + "Before running this part, set your LangSmith API key in a terminal or in this notebook:\n", + "\n", + "```bash\n", + "export LANGSMITH_API_KEY=\"lsv2_...\"\n", + "export LANGSMITH_TRACING=true\n", + "export LANGSMITH_PROJECT=agent-memory-workshop\n", + "```\n", + "\n", + "Then open `https://smith.langchain.com` and select the `agent-memory-workshop` project after you run an observed turn.\n", + "\n", + "> \ud83d\udca1 **Key Insight \u2014 Part 7**\n", + ">\n", + "> Observability is not the memory store. Oracle AI Database still persists memory; LangSmith shows what happened during an agent run.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## TODO 17: Configure LangSmith Tracing\n", + "\n", + "LangSmith sends traces from this notebook to a project. 
Configure observability with:\n", + "\n", + "- `LANGSMITH_TRACING = \"true\"`\n", + "- `LANGSMITH_PROJECT = \"agent-memory-workshop\"`\n", + "- a LangSmith `Client`\n", + "- `tracer = langsmith` so later cells can call `tracer.trace(...)`\n", + "\n", + "Do not capture full prompts, responses, retrieved documents, API keys, raw tool output, or database connection strings in trace inputs, outputs, or metadata.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "import langsmith as ls\n", + "from langsmith import Client\n", + "\n", + "def configure_agent_observability(\n", + " project_name: str = \"agent-memory-workshop\",\n", + "):\n", + " \"\"\"Configure LangSmith tracing for Part 7.\"\"\"\n", + " # TODO 17: Enable LangSmith tracing and return a dict with the client and project name.\n", + " #\n", + " # Hints:\n", + " # os.environ.setdefault(\"LANGSMITH_TRACING\", \"true\")\n", + " # os.environ.setdefault(\"LANGSMITH_PROJECT\", project_name)\n", + " # if not os.environ.get(\"LANGSMITH_API_KEY\"):\n", + " # raise RuntimeError(\"Set LANGSMITH_API_KEY before running Part 7.\")\n", + " # client = Client()\n", + " # return {\"client\": client, \"project_name\": project_name}\n", + " pass\n", + "\n", + "observability = configure_agent_observability()\n", + "tracer = ls\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Checkpoint: TODO 17\n", + "assert observability is not None and observability.get(\"client\") is not None, (\n", + " \"TODO 17 incomplete \u2014 LangSmith tracing is not configured.\\n\"\n", + " \"Open docs/part-7-observability.md and complete configure_agent_observability().\"\n", + ")\n", + "assert tracer is not None, \"TODO 17 incomplete \u2014 tracer is not configured.\"\n", + "print(\"TODO 17 passed \u2014 LangSmith tracing configured\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": 
{}, + "source": [ + "## TODO 18: Build an Observed Agent Wrapper\n", + "\n", + "Create `call_agent_observed()` as a wrapper-style copy of the Part 6 harness. Keep `call_agent()` unchanged.\n", + "\n", + "Your observed version should create trace runs for:\n", + "\n", + "- `agent.run`\n", + "- `agent.context.build`\n", + "- `agent.memory.read`\n", + "- `agent.context.check`\n", + "- `agent.toolbox.read`\n", + "- `agent.memory.write`\n", + "- `agent.llm.call`\n", + "- `agent.tool.execute`\n", + "- `agent.tool.log`\n", + "\n", + "Record safe metadata such as lengths, counts, memory type, tool name, model name, and status. Do not record full prompts or raw tool output.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def _safe_len(value) -> int:\n", + " return len(str(value or \"\"))\n", + "\n", + "def _trace_run(name: str, run_type: str = \"chain\", inputs: dict | None = None, metadata: dict | None = None):\n", + " return tracer.trace(\n", + " name,\n", + " run_type=run_type,\n", + " inputs=inputs or {},\n", + " metadata=metadata or {},\n", + " project_name=observability[\"project_name\"],\n", + " client=observability[\"client\"],\n", + " )\n", + "\n", + "def _mark_run_error(run, error: Exception):\n", + " run.add_metadata({\"error.type\": type(error).__name__})\n", + " run.end(error=f\"{type(error).__name__}: {error}\")\n", + "\n", + "def call_agent_observed(query: str, thread_id: str = \"observed-1\", max_iterations: int = 5, max_execution_time_s: float = 60.0) -> str:\n", + " \"\"\"Observed version of call_agent() that sends LangSmith trace runs.\"\"\"\n", + " # TODO 18: Copy the Part 6 call_agent() flow and wrap the major operations in trace runs.\n", + " #\n", + " # Minimum trace shape:\n", + " # with _trace_run(\"agent.run\", inputs={\"query.length\": _safe_len(query)}) as run:\n", + " # with _trace_run(\"agent.context.build\"):\n", + " # ... 
read conversational, knowledge, workflow, entity, and summary memory ...\n", + " # with _trace_run(\"agent.toolbox.read\"):\n", + " # ... read tools ...\n", + " # with _trace_run(\"agent.llm.call\", run_type=\"llm\"):\n", + " # ... call_openai_chat(...) ...\n", + " # with _trace_run(\"agent.tool.execute\", run_type=\"tool\"):\n", + " # ... execute selected tools ...\n", + " # with _trace_run(\"agent.memory.write\"):\n", + " # ... write user and assistant messages ...\n", + " #\n", + " # Remember: store lengths and counts, not raw prompt/tool/document content.\n", + " pass\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Checkpoint: TODO 18\n", + "assert callable(call_agent_observed), (\n", + " \"TODO 18 incomplete \u2014 call_agent_observed is not defined.\\n\"\n", + " \"Open docs/part-7-observability.md and build the observed wrapper.\"\n", + ")\n", + "print(\"TODO 18 passed \u2014 call_agent_observed is defined\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## TODO 19: Run Observed Turns and Inspect LangSmith\n", + "\n", + "Run a few short turns with a fresh thread ID. 
Then open LangSmith, select project `agent-memory-workshop`, and inspect the latest `agent.run` traces.\n", + "\n", + "Look for child runs that show memory reads, tool calls, LLM calls, context checks, and memory writes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO 19: Run a short observed conversation.\n", + "#\n", + "# observed_thread = \"observed-0022\"\n", + "# observed_queries = [\n", + "# \"Find papers about memory in AI agents\",\n", + "# \"What did we just discuss?\",\n", + "# \"Search the web for recent agent observability ideas\",\n", + "# ]\n", + "#\n", + "# for q in observed_queries:\n", + "# call_agent_observed(q, thread_id=observed_thread, max_iterations=5)\n", + "#\n", + "# After this runs, open LangSmith and select project: agent-memory-workshop\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Checkpoint: TODO 19\n", + "print(\"TODO 19 checkpoint \u2014 after running observed turns, open LangSmith and inspect project 'agent-memory-workshop'.\")\n", + "print(\"LangSmith: https://smith.langchain.com\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What You Should See in LangSmith\n", + "\n", + "Open the latest `agent.run` trace. The child runs show the agent's execution path in order.\n", + "\n", + "The most useful runs are usually:\n", + "\n", + "- `agent.context.build` \u2014 how much context the agent assembled\n", + "- `agent.memory.read` \u2014 which memory systems were queried\n", + "- `agent.toolbox.read` \u2014 which tools were selected\n", + "- `agent.llm.call` \u2014 where model time is spent\n", + "- `agent.tool.execute` \u2014 whether tools ran and how large their outputs were\n", + "- `agent.memory.write` \u2014 what the agent persisted for the next turn\n", + "\n", + "This is the operational view behind the Part 6 context growth chart. 
The chart shows the outcome; the trace shows the path.\n" + ] + }, { "cell_type": "markdown", "id": "cac9f9dc", @@ -3640,9 +3876,9 @@ "\n", "Now that you've built a complete memory and context engineering system, here are resources to keep going:\n", "\n", - "- **[Agent Memory: Building Memory-Aware Agents](https://www.deeplearning.ai/short-courses/agent-memory-building-memory-aware-agents/)** — DeepLearning.AI short course for deeper exploration of agent memory patterns\n", - "- **[Oracle AI Developer Hub](https://github.com/oracle-devrel/oracle-ai-developer-hub)** — More technical assets, samples, and projects with Oracle AI\n", - "- **[Oracle Developer Resource](https://www.oracle.com/developer/)** — Documentation, tools, and community for Oracle developers" + "- **[Agent Memory: Building Memory-Aware Agents](https://www.deeplearning.ai/short-courses/agent-memory-building-memory-aware-agents/)** \u2014 DeepLearning.AI short course for deeper exploration of agent memory patterns\n", + "- **[Oracle AI Developer Hub](https://github.com/oracle-devrel/oracle-ai-developer-hub)** \u2014 More technical assets, samples, and projects with Oracle AI\n", + "- **[Oracle Developer Resource](https://www.oracle.com/developer/)** \u2014 Documentation, tools, and community for Oracle developers" ] } ],