diff --git a/cookbook/integration_ejentum_cognitive_harness.ipynb b/cookbook/integration_ejentum_cognitive_harness.ipynb
new file mode 100644
index 0000000000..8d7a5189d7
--- /dev/null
+++ b/cookbook/integration_ejentum_cognitive_harness.ipynb
@@ -0,0 +1,212 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Tracing an Ejentum cognitive harness agent loop in Langfuse\n",
+    "\n",
+    "This cookbook shows how to instrument an agent that calls the [Ejentum cognitive harness](https://ejentum.com) REST API before each LLM call, so each turn shows up in Langfuse as a nested trace containing the harness retrieval and the model call with its token usage.\n",
+    "\n",
+    "The pattern is reusable. Any in-loop REST call to a third-party tool can be wrapped the same way; Ejentum is the worked example because the agent calls one endpoint with a `mode` argument, which keeps the cookbook short.\n",
+    "\n",
+    "## What you'll set up\n",
+    "\n",
+    "1. The Langfuse v3 client and a project to send traces to.\n",
+    "2. The `from langfuse.openai import openai` drop-in so OpenAI calls are auto-instrumented.\n",
+    "3. An `@observe`-decorated function around each Ejentum REST call so the harness retrieval appears as a child span, with the `mode` and the scaffold length attached via `update_current_span(metadata=...)`.\n",
+    "4. A two-turn agent that calls the harness in `reasoning` mode, then in `anti-deception` mode, and answers a reasoning-heavy task each turn.\n",
+    "\n",
+    "Open the trace in your Langfuse project to inspect the tree per turn."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Install"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -q \"langfuse>=3.0\" openai requests"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Configure Langfuse\n",
+    "\n",
+    "Set `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST` from your Langfuse project. The host defaults to `https://cloud.langfuse.com`; self-hosted users should set this to their instance URL."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "# Replace with your project's keys, or load from a .env file.\n",
+    "os.environ.setdefault(\"LANGFUSE_HOST\", \"https://cloud.langfuse.com\")\n",
+    "assert os.environ.get(\"LANGFUSE_PUBLIC_KEY\"), \"LANGFUSE_PUBLIC_KEY is not set.\"\n",
+    "assert os.environ.get(\"LANGFUSE_SECRET_KEY\"), \"LANGFUSE_SECRET_KEY is not set.\"\n",
+    "assert os.environ.get(\"OPENAI_API_KEY\"), \"OPENAI_API_KEY is not set.\"\n",
+    "assert os.environ.get(\"EJENTUM_API_KEY\"), \"EJENTUM_API_KEY is not set. Get one at https://ejentum.com/dashboard\"\n",
+    "\n",
+    "from langfuse import get_client\n",
+    "\n",
+    "langfuse = get_client()\n",
+    "assert langfuse.auth_check(), \"Langfuse auth check failed; verify your keys.\"\n",
+    "print(\"Langfuse client authenticated. Host:\", os.environ[\"LANGFUSE_HOST\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Auto-instrument OpenAI and wrap the Ejentum REST call\n",
+    "\n",
+    "Langfuse ships a drop-in for OpenAI (`from langfuse.openai import openai`). It captures the request, response, model, and token usage automatically. For Ejentum we add a thin `@observe`-decorated wrapper so each harness retrieval shows up as a child span in the same trace, with the `mode` and scaffold length surfaced via `langfuse.update_current_span(metadata=...)`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "from langfuse import observe\n",
+    "from langfuse.openai import openai\n",
+    "\n",
+    "EJENTUM_URL = \"https://ejentum-main-ab125c3.zuplo.app/logicv1/\"\n",
+    "\n",
+    "\n",
+    "@observe(name=\"ejentum.call\")\n",
+    "def call_ejentum(query: str, mode: str = \"reasoning\") -> str:\n",
+    "    \"\"\"Fetch a cognitive scaffold from the Ejentum REST gateway.\n",
+    "\n",
+    "    Emits a Langfuse span with input, output, and metadata\n",
+    "    (mode, query_length, scaffold_length).\n",
+    "    \"\"\"\n",
+    "    r = requests.post(\n",
+    "        EJENTUM_URL,\n",
+    "        headers={\n",
+    "            \"Authorization\": f\"Bearer {os.environ['EJENTUM_API_KEY']}\",\n",
+    "            \"Content-Type\": \"application/json\",\n",
+    "        },\n",
+    "        json={\"query\": query, \"mode\": mode},\n",
+    "        timeout=10,\n",
+    "    )\n",
+    "    r.raise_for_status()\n",
+    "    payload = r.json()\n",
+    "    scaffold = payload[0].get(mode, \"\") if isinstance(payload, list) and payload else \"\"\n",
+    "    langfuse.update_current_span(\n",
+    "        metadata={\n",
+    "            \"mode\": mode,\n",
+    "            \"query_length\": len(query),\n",
+    "            \"scaffold_length\": len(scaffold),\n",
+    "        },\n",
+    "    )\n",
+    "    return scaffold"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Run a two-turn agent and view the trace\n",
+    "\n",
+    "Each turn has the same shape:\n",
+    "\n",
+    "1. `call_ejentum(...)` (Langfuse span with mode + scaffold length).\n",
+    "2. `openai.chat.completions.create(...)` from the auto-instrumented client.\n",
+    "\n",
+    "Both spans sit inside a single trace per turn, so you can drill from the trace summary into the harness retrieval, the system message it produced, and the model call that consumed it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@observe(name=\"harness_augmented_turn\")\n",
+    "def harness_augmented_turn(task: str, mode: str = \"reasoning\") -> str:\n",
+    "    scaffold = call_ejentum(task, mode=mode)\n",
+    "    completion = openai.chat.completions.create(\n",
+    "        model=\"gpt-4o-mini\",\n",
+    "        messages=[\n",
+    "            {\n",
+    "                \"role\": \"system\",\n",
+    "                \"content\": (\n",
+    "                    \"Apply the cognitive scaffold below before answering.\\n\\n\"\n",
+    "                    f\"[SCAFFOLD]\\n{scaffold}\\n[END]\"\n",
+    "                ),\n",
+    "            },\n",
+    "            {\"role\": \"user\", \"content\": task},\n",
+    "        ],\n",
+    "        temperature=0,\n",
+    "    )\n",
+    "    return completion.choices[0].message.content or \"\"\n",
+    "\n",
+    "\n",
+    "print(harness_augmented_turn(\n",
+    "    \"We have 50M users and want to add a NOT NULL column. \"\n",
+    "    \"Walk through the trade-offs and recommend a migration path.\",\n",
+    "    mode=\"reasoning\",\n",
+    "))\n",
+    "print(\"---\")\n",
+    "print(harness_augmented_turn(\n",
+    "    \"A user insists their DB is the bottleneck and refuses other diagnostics. \"\n",
+    "    \"Walk through whether the framing is sound before recommending.\",\n",
+    "    mode=\"anti-deception\",\n",
+    "))\n",
+    "\n",
+    "# Flush so the traces show up immediately rather than on process exit.\n",
+    "langfuse.flush()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## What to look at in Langfuse\n",
+    "\n",
+    "Open your Langfuse project. For each turn you should see one trace with two child observations:\n",
+    "\n",
+    "- `ejentum.call`: the harness retrieval. Open the metadata panel to confirm `mode` and `scaffold_length`.\n",
+    "- The OpenAI generation (created by `langfuse.openai` for `openai.chat.completions.create`): the model call. Token usage, model name, prompt, and response are captured automatically.\n",
+    "\n",
+    "Useful filters in the Traces view:\n",
+    "\n",
+    "- `metadata.mode = \"anti-deception\"` to see only deception-resistance turns.\n",
+    "- `metadata.scaffold_length > 0` to confirm every harness call returned a non-empty scaffold.\n",
+    "\n",
+    "## Adapting the pattern\n",
+    "\n",
+    "Any in-loop REST call to a third-party tool can be instrumented the same way. The four steps in this cookbook (drop-in OpenAI client + `@observe`-decorated REST wrapper + metadata via `update_current_span` + scaffold-aware system message) generalise to harness calls in any agent loop. For larger agent frameworks, swap step 4 for the framework's native control flow; the span shape stays the same."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}