self-rag

A self-improving RAG system for legal questions against a Victorian bench book. An autonomous agent loop evaluates retrieval quality using Arize, then iteratively improves the LangGraph agent and indexing pipeline until recall targets are met.

Repository Structure

agent/              LangGraph RAG agent (OpenSearch retriever + GPT-4o-mini)
index/              OpenSearch indexing pipeline (chunking, embedding, bulk indexing)
scripts/ralph/      Autonomous improvement loop (ralph.sh + Claude Code)
scripts/            Evaluation and re-indexing scripts
skills/             Skill definitions for the Ralph agent (PRD generation, story execution)

Prerequisites

Arize account — sign up at arize.com (free tier available)
Python 3.10+ and uv
Claude Code — install via npm install -g @anthropic-ai/claude-code

Arize AX CLI:

pip install arize-ax-cli
ax config set --space-id <your-space-id> --api-key <your-arize-api-key>
ax config show  # verify profile

Find your Space ID and API key in the Arize UI under Settings > Space Settings > API Keys.

Arize Skills plugin for Claude Code:

claude /plugin marketplace add Arize-ai/arize-skills
claude /plugin install arize-skills@Arize-ai-arize-skills

OpenSearch instance with kNN enabled
API keys — OpenAI or Anthropic, Arize, and OpenSearch credentials (see Environment Variables)

Getting Started

1. Clone and install dependencies

git clone <repo-url> && cd self-rag-2

# Agent
cd agent && uv sync --dev && cd ..

# Index pipeline
cd index && uv sync --dev && cd ..

2. Configure environment variables

cp agent/.env.example agent/.env
cp index/.env.example index/.env
# Edit both .env files with your credentials

3. Upload the QA dataset to Arize

Inside Claude Code, use arize-skills to download the qa split from isaacus/legal-rag-bench and upload it as an Arize dataset. This dataset is used by the self-improvement loop to evaluate retrieval recall.

4. Index the corpus

cd index
python index.py

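This loads the legal-rag-bench corpus, chunks it, generates 1024-dimensional embeddings with text-embedding-3-large, and bulk-indexes the chunks into OpenSearch. The model's default output is 3072 dimensions, so the shorter 1024-dimension size has to be requested explicitly.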
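To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplified stand-in for illustration, not the pipeline's actual splitter; the function name `chunk_text` and the default sizes are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of indexing some text twice.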

5. Start the LangGraph agent

cd agent
langgraph dev

The agent runs on port 2024.

6. Run the self-improvement loop

Start Claude Code with --dangerously-skip-permissions so the autonomous agent can freely edit code and run commands:

./scripts/ralph/ralph.sh --tool claude

Ralph drives the improvement loop:

1. Reads a PRD (scripts/ralph/prd.json) and picks the highest-priority failing user story
2. Implements changes in agent/ (retrieval logic) and/or index/ (indexing pipeline)
3. Runs quality checks (lint, typecheck, tests)
4. Commits passing changes
5. Runs an Arize experiment against the QA dataset to measure recall@1, recall@5, recall@10
6. Analyzes failures and adds new improvement stories if recall@5 < 80%
7. Repeats until recall targets are met or max iterations reached
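For reference, recall@k can be sketched as below. This uses the standard definition (the share of relevant passages found in the top k results); how the repository's experiment actually computes it is not stated here, and the function names and passage IDs are illustrative assumptions.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of relevant passages that appear in the top-k retrieved results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per question."""
    if not runs:
        return 0.0
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


# Two hypothetical questions: the first finds its passage at rank 2, the second misses.
runs = [
    (["p3", "p1", "p9"], {"p1"}),
    (["p4", "p5", "p6"], {"p7"}),
]
recall_1 = mean_recall_at_k(runs, k=1)
recall_5 = mean_recall_at_k(runs, k=5)
```

In this toy run recall@5 is 0.5, which is below the 80% target, so the loop would add new stories.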

Architecture

Agent

A LangGraph StateGraph with two nodes:

retrieve — kNN search against OpenSearch using text-embedding-3-large (1024 dims)
call_model — RAG prompt answered by GPT-4o-mini
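The data flow through the two nodes can be sketched in plain Python. This is not the repository's LangGraph code: the state keys, the vector field name `embedding`, the prompt wording, and the injected `embed`, `search`, and `llm` callables are all hypothetical stand-ins. Only the shape of the OpenSearch k-NN query body follows the documented query DSL.

```python
from typing import Any, Callable

State = dict[str, Any]


def build_knn_query(vector: list[float], k: int = 5, field: str = "embedding") -> dict[str, Any]:
    """Build an OpenSearch k-NN search body for a query embedding."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}


def retrieve(
    state: State,
    embed: Callable[[str], list[float]],
    search: Callable[[dict[str, Any]], list[str]],
) -> State:
    """Node 1: embed the question and fetch the nearest chunks."""
    vector = embed(state["question"])
    documents = search(build_knn_query(vector))
    return {**state, "documents": documents}


def call_model(state: State, llm: Callable[[str], str]) -> State:
    """Node 2: answer the question from the retrieved context."""
    context = "\n\n".join(state["documents"])
    prompt = (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {state['question']}"
    )
    return {**state, "answer": llm(prompt)}
```

Passing the embedding, search, and model calls in as arguments keeps the sketch runnable without network access; in the real agent these would be the OpenAI and OpenSearch clients.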

Traces are sent to Arize over OpenTelemetry (OTel) using LangChainInstrumentor.

Index Pipeline

Loads the legal-rag-bench corpus, chunks with RecursiveCharacterTextSplitter, embeds with text-embedding-3-large (1024 dims), and bulk-indexes into OpenSearch with HNSW cosine similarity.
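As an illustration of the indexing side, the sketch below builds an index definition for a 1024-dimension HNSW vector field with cosine similarity, plus a newline-delimited payload for the bulk API. The field names `text` and `embedding`, the `lucene` engine choice, the sequential document IDs, and both function names are assumptions; the repository's real mapping may differ.

```python
import json
from typing import Any


def build_index_body(dimension: int = 1024, vector_field: str = "embedding") -> dict[str, Any]:
    """Index settings and mapping for k-NN search with HNSW and cosine similarity."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN on this index
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "lucene",  # assumed engine, not confirmed by the repo
                    },
                },
            }
        },
    }


def to_bulk_payload(index_name: str, docs: list[dict[str, Any]]) -> str:
    """Serialize docs as bulk API NDJSON: one action line, then one source line."""
    lines: list[str] = []
    for i, doc in enumerate(docs):
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(i)}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)
```

The `dimension` value has to match the embedding size, so changing the embedding model or its output size means recreating the index.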

Self-Improvement Loop

                    ┌─────────────────────────┐
                    │  ralph.sh (loop driver) │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │ Pick next failing story │
                    │ from prd.json           │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │ Implement changes in    │
                    │ agent/ and/or index/    │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │ Lint, typecheck, test   │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │ Run Arize experiment    │
                    │ (recall@1, @5, @10)     │
                    └────────────┬────────────┘
                                 │
                        ┌────────▼────────┐
                        │ Recall@5 ≥ 80%? │
                        └────┬───────┬────┘
                          no │       │ yes
                    ┌────────▼──┐  ┌─▼───────────┐
                    │ Add new   │  │   Done      │
                    │ stories   │  └─────────────┘
                    └─────┬─────┘
                          │
                          └──── (next iteration)
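The control flow of the loop can be sketched as follows. The story schema (`id`, `priority` where a lower number is more urgent, `passes`), the callables, and the default of 10 iterations are hypothetical; the actual format of prd.json and the behaviour of ralph.sh are not specified in this README. The sketch treats recall@5 of at least 80% as done, consistent with adding stories only when recall@5 is below 80%.

```python
from typing import Any, Callable, Optional

Story = dict[str, Any]


def pick_next_story(stories: list[Story]) -> Optional[Story]:
    """Return the most urgent story that has not passed yet, or None."""
    failing = [s for s in stories if not s.get("passes", False)]
    return min(failing, key=lambda s: s["priority"]) if failing else None


def run_loop(
    stories: list[Story],
    implement: Callable[[Story], None],
    measure_recall_at_5: Callable[[], float],
    propose_stories: Callable[[], list[Story]],
    target: float = 0.80,
    max_iterations: int = 10,
) -> bool:
    """Iterate until recall@5 reaches the target; return True if it did."""
    for _ in range(max_iterations):
        story = pick_next_story(stories)
        if story is not None:
            implement(story)  # edit code, run checks, commit
            story["passes"] = True
        if measure_recall_at_5() >= target:
            return True
        # Below target: analyze failures and queue new improvement stories.
        stories.extend(propose_stories())
    return False
```

Capping the loop with `max_iterations` guarantees termination even if recall never reaches the target.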

Environment Variables

Agent (`agent/.env`)

Variable	Required	Description
`OPENAI_API_KEY`	Yes	OpenAI API key
`HOST`	Yes	OpenSearch host
`USERNAME`	Yes	OpenSearch username
`PASSWORD`	Yes	OpenSearch password
`INDEX`	Yes	OpenSearch index name
`ARIZE_SPACE_ID`	No	Arize space ID
`ARIZE_API_KEY`	No	Arize API key
`ARIZE_PROJECT_NAME`	No	Arize project name
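A startup check along these lines can fail fast when a required variable is missing. It is a sketch only: the variable names come from the table above, while the function name `missing_required` and the idea of running such a check at startup are assumptions, not something the agent is known to do.

```python
import os
from typing import Mapping

# Variables marked "Yes" in the agent table above.
REQUIRED_AGENT_VARS = ("OPENAI_API_KEY", "HOST", "USERNAME", "PASSWORD", "INDEX")


def missing_required(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_AGENT_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_required()
    if missing:
        raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")
```

Note that many shells already define `USERNAME`, so a passing check does not prove the value came from agent/.env.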

Index Pipeline (`index/.env`)

Variable	Required	Description
`OPENSEARCH_HOST`	Yes	OpenSearch host URL
`OPENSEARCH_USER`	Yes	OpenSearch username
`OPENSEARCH_PASS`	Yes	OpenSearch password
`OPENAI_API_KEY`	Yes	OpenAI API key for embeddings

Development

# Agent
cd agent
make test              # unit tests
make lint              # ruff + mypy --strict
make format            # auto-fix

# Index (run from agent/, so step back up to the repository root first)
cd ../index
uv sync --dev

Both Python packages use Ruff (pycodestyle, pyflakes, isort, pydocstyle) and mypy --strict.
