A conversational AI agent with persistent semantic memory, a two-layer semantic cache, web search, cost tracking, and a browser chat UI — all running on Harper with Claude (via the Anthropic API or Google Cloud Vertex AI).
Live demo: agent-example.stephen-demo-org.harperfabric.com/Chat
- Chat with Claude via a REST endpoint (`POST /Agent`) or the built-in browser chat UI (`GET /Chat`)
- Semantic cache — two-layer cache catches repeated and rephrased questions before they reach Claude, returning answers instantly at zero LLM cost
- Web search — Anthropic's built-in server-side web search (`web_search_20250305`, up to 5 uses per turn); no external API key required
- Persistent memory — every message is embedded and stored in Harper; semantic recall surfaces relevant context from past conversations automatically
- Local embeddings — `bge-small-en-v1.5` runs via `harper-fabric-embeddings` (llama.cpp wrapper), entirely in-process; no embedding API key or billing
- Per-response metadata — every API response includes latency, token counts, cost breakdown, web searches used, and vector context stats
- Global savings tracker — cache hits accumulate a running total of USD saved and hit count in a `Stats` table, displayed live in the chat sidebar
- Auto-generated REST APIs — full CRUD on the `Conversation`, `Message`, and `Stats` tables, generated from the GraphQL schema with zero route code
```
User Query
    │
    ▼
┌──────────────────────────────────────────────────────────┐
│                          Harper                          │
│                                                          │
│  1. Embed user message                                   │
│     ┌─────────────────────┐                              │
│     │   EmbeddingCache    │  ← normalized text → vector  │
│     │  hit: ~1ms lookup   │    miss: SLM generates it,   │
│     │     (skip SLM)      │    then stores for next time │
│     └─────────────────────┘                              │
│     Local SLM: bge-small-en-v1.5 (llama.cpp, in-process) │
│                                                          │
│  2. Store user message + embedding                       │
│  3. HNSW semantic cache check (cosine distance < 0.12)   │
│        │                │                                │
│    Cache HIT        Cache MISS                           │
│        │                │                                │
│   Return $0.00      Call Claude ─────────────────────────┼──► Anthropic API
│   + saved $X            │                                │    + Web Search
│   Embed response (via cache/SLM)                         │◄──────────┘
│   Store in Harper                                        │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
Every request is standalone. Ask once, pay for Claude. Ask again — or rephrase the same question — and Harper serves the cached answer instantly at $0. The embedding cache eliminates the SLM cost on repeated text (~2.3s on Fabric → ~1ms).
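The first cache layer (exact-text embedding reuse) can be sketched roughly like this. It is illustrative only: `normalize`, the in-memory `Map`, and the synchronous `embedFn` are stand-ins; the project persists this cache in a Harper table and embeds asynchronously.

```javascript
// Sketch of the embedding-cache layer (hypothetical helper names).
// The real cache lives in Harper's EmbeddingCache table; a Map stands
// in here, and the real embed call is async (simplified to sync).
const embeddingCache = new Map();

// Normalize so trivial variants ("What is Harper?" vs " what is HARPER? ")
// share one cache key.
function normalize(text) {
  return text.trim().toLowerCase().replace(/\s+/g, ' ');
}

function getEmbedding(text, embedFn) {
  const key = normalize(text);
  if (embeddingCache.has(key)) {
    return embeddingCache.get(key); // hit: ~1ms, SLM skipped
  }
  const vector = embedFn(key);      // miss: run bge-small-en-v1.5
  embeddingCache.set(key, vector);  // store for next time
  return vector;
}
```

Because the key is normalized before lookup, casing and whitespace variations of the same text never pay the SLM cost twice.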
Before calling Claude, the agent searches Harper's HNSW vector index for semantically similar past questions:
```javascript
tables.Message.search({
  conditions: {
    attribute: 'embedding',
    comparator: 'lt',
    value: 0.12, // cosine distance < 0.12 ≡ cosine similarity ≥ 0.88
    target: userEmbedding,
  },
  limit: 10,
})
```

Harper's HNSW index evaluates the distance threshold internally — no full table scan, no in-memory cosine math. When a match is found, the agent looks up the assistant reply that followed it and returns that directly. No Claude call, no tokens, no cost.
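Since cosine distance is just `1 - cosine similarity`, the `0.12` threshold admits only matches whose similarity is at least 0.88. For intuition only (Harper computes this inside the HNSW index, never in application JS):

```javascript
// Illustration of the distance metric behind the threshold above.
// Harper evaluates this inside the index; this is not project code.
function cosineDistance(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Same threshold as the search call: smaller distance = closer meaning.
const isCacheHit = (distance) => distance < 0.12;
```

Identical vectors have distance 0 (always a hit); orthogonal (unrelated) vectors have distance 1 (never a hit).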
Cache hits return `cost.total: 0` and include a `cost.saved` field showing what the call would have cost. The saved amount is added to the global `Stats` record (`totalSaved`, `cacheHits`).
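The arithmetic behind those fields is straightforward. A sketch, assuming Claude Sonnet rates of $3 per million input tokens and $15 per million output tokens (an assumption chosen to match the sample `meta.cost` numbers shown later, not code from the project):

```javascript
// Assumed per-million-token USD rates for Claude Sonnet; verify against
// current Anthropic pricing before relying on these numbers.
const INPUT_PER_MTOK = 3;
const OUTPUT_PER_MTOK = 15;

function computeCost(inputTokens, outputTokens, searchCost = 0) {
  const input = (inputTokens * INPUT_PER_MTOK) / 1e6;
  const output = (outputTokens * OUTPUT_PER_MTOK) / 1e6;
  return { input, output, search: searchCost, total: input + output + searchCost };
}

// On a cache hit the call itself is free; the avoided cost is banked
// in the global Stats record instead.
function recordCacheHit(stats, avoidedCost) {
  return {
    ...stats,
    totalSaved: stats.totalSaved + avoidedCost,
    cacheHits: stats.cacheHits + 1,
  };
}
```

With the sample response's 312 input and 148 output tokens, this yields the `0.003156` total that later reappears as `cost.saved` on the cache hit.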
- Node.js 22+
- Harper CLI: `npm install -g harperdb`
- One of:
  - Anthropic API key (direct API — default)
  - Google Cloud project with Vertex AI enabled (GCP Vertex AI)
No embedding API key needed — embeddings run in-process.
```shell
# Clone the repo
git clone https://github.com/stephengoldberg/agent-example-harper.git
cd agent-example-harper

# Install dependencies
npm install

# Configure environment
cp dot-env.example .env
# Edit .env — see "LLM Provider Setup" below

# Start the dev server
npm run dev
```

This agent supports two LLM backends — the direct Anthropic API and Google Cloud Vertex AI. Set `LLM_PROVIDER` in your `.env` to choose which one to use.
The simplest path. You just need an API key from console.anthropic.com.
```
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
```

Web search is included automatically via Anthropic's server-side `web_search_20250305` tool — no additional API keys required.
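For reference, enabling the server-side tool amounts to one entry in the Messages API request body. A sketch of the relevant fragment (shape per Anthropic's web search tool documentation; verify field names against the current docs):

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 1024,
  "tools": [
    { "type": "web_search_20250305", "name": "web_search", "max_uses": 5 }
  ],
  "messages": [{ "role": "user", "content": "What is Harper?" }]
}
```

The `max_uses: 5` value corresponds to the "up to 5 uses per turn" limit this agent applies.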
Run Claude through your own GCP project. Useful for enterprise environments, org-level billing, data residency, and keeping everything inside Google Cloud.
1. Enable the Vertex AI API in your GCP project:
https://console.developers.google.com/apis/api/aiplatform.googleapis.com/overview?project=YOUR_PROJECT_ID
2. Enable a Claude model in the Vertex AI Model Garden — search for "Claude" and enable the model you want.
3. Request quota — new projects start with 0 tokens/min. Go to IAM & Admin → Quotas, filter for your Claude model, and request an increase.
4. Create a service account with the Vertex AI User role, download the JSON key, and place it in the project root.
5. Configure .env:
```
LLM_PROVIDER=vertex
VERTEX_PROJECT_ID=my-gcp-project
VERTEX_REGION=us-east5
GOOGLE_APPLICATION_CREDENTIALS=./your-service-account-key.json
```

Note: Web search is not available on Vertex AI by default (requires an org policy change). The agent automatically disables it when running on Vertex.
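Internally, the switch can be as simple as reading the environment once at startup. An illustrative sketch (the function and return fields are hypothetical; the defaults mirror the environment-variable table in this README):

```javascript
// Hypothetical provider-selection helper; names are illustrative.
// Defaults mirror the environment-variable reference table.
function resolveProvider(env) {
  if ((env.LLM_PROVIDER ?? 'anthropic') === 'vertex') {
    return {
      provider: 'vertex',
      model: env.VERTEX_MODEL ?? 'claude-sonnet-4-6',
      region: env.VERTEX_REGION ?? 'global',
      projectId: env.VERTEX_PROJECT_ID,
      webSearch: false, // server-side web search is disabled on Vertex
    };
  }
  return {
    provider: 'anthropic',
    model: env.CLAUDE_MODEL ?? 'claude-sonnet-4-5-20250929',
    apiKey: env.ANTHROPIC_API_KEY,
    webSearch: true, // web_search_20250305 needs no extra key
  };
}
```

Centralizing the decision like this keeps the agent loop provider-agnostic: the rest of the code only sees a model ID and a `webSearch` flag.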
| Variable | Required | Default | Description |
|---|---|---|---|
| `LLM_PROVIDER` | No | `anthropic` | `anthropic` or `vertex` |
| `ANTHROPIC_API_KEY` | When `anthropic` | — | Anthropic API key |
| `VERTEX_PROJECT_ID` | When `vertex` | — | GCP project ID |
| `VERTEX_REGION` | No | `global` | Vertex AI region (e.g. `us-east5`, `global`) |
| `VERTEX_MODEL` | No | `claude-sonnet-4-6` | Vertex model ID |
| `GOOGLE_APPLICATION_CREDENTIALS` | When `vertex` | — | Path to GCP service account JSON key |
| `CLAUDE_MODEL` | No | `claude-sonnet-4-5-20250929` | Anthropic direct API model ID |
First run: On startup, `bge-small-en-v1.5` (~24 MB) is auto-downloaded into `./models/`. This only happens once.
The server starts at http://localhost:9926. Open http://localhost:9926/Chat in your browser.
Open the chat UI:
http://localhost:9926/Chat
Send a message via API:
```shell
curl -X POST http://localhost:9926/Agent \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Harper?"}'
```

Response:
```json
{
  "conversationId": "abc-123",
  "message": { "role": "assistant", "content": "Harper is..." },
  "meta": {
    "latencyMs": 1842,
    "tokens": { "input": 312, "output": 148, "total": 460 },
    "cost": { "input": 0.000936, "output": 0.00222, "search": 0, "total": 0.003156 },
    "webSearches": 0,
    "vectorContext": { "hit": false, "count": 0, "cached": false }
  }
}
```

Continue a conversation:
```shell
curl -X POST http://localhost:9926/Agent \
  -H "Content-Type: application/json" \
  -d '{
    "conversationId": "abc-123",
    "message": "Tell me more about its vector search"
  }'
```

Ask the same question again (cache hit — free and instant):
```shell
curl -X POST http://localhost:9926/Agent \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Harper?"}'
# meta.cost.total = 0, meta.cost.saved = 0.003156
```

Check global savings:
```shell
curl http://localhost:9926/PublicStats/global
# { "id": "global", "totalSaved": 0.003156, "cacheHits": 1, "updatedAt": "..." }
```

Auto-generated CRUD (from schema, no route code written):
```shell
# List all conversations
curl http://localhost:9926/Conversation

# Get messages for a conversation
curl "http://localhost:9926/Message?conversationId=abc-123"
```

```
├── config.yaml           # Harper app configuration (6 lines)
├── schemas/
│   └── schema.graphql    # Database schema (Conversation, Message, Stats + HNSW index)
├── resources/
│   ├── Agent.js          # POST /Agent (agent loop + semantic cache + web search)
│   │                     # GET /PublicStats/:id (public stats endpoint)
│   └── Chat.js           # GET /Chat (full browser chat UI served as HTML)
├── lib/
│   ├── config.js         # Environment variable helpers
│   └── embeddings.js     # Local llama.cpp embeddings via harper-fabric-embeddings
├── models/               # Auto-downloaded GGUF model (gitignored)
├── .env.example          # Environment template
└── package.json
```
```graphql
type Conversation @table @export {
  id: ID @primaryKey
  title: String
  createdAt: String
  updatedAt: String
}

type Message @table @export {
  id: ID @primaryKey
  conversationId: String @indexed
  role: String
  content: String
  cost: Float
  embedding: [Float] @indexed(type: "HNSW", distance: "cosine")
  createdAt: String
}

type Stats @table @export {
  id: ID @primaryKey
  totalSaved: Float
  cacheHits: Int
  updatedAt: String
}
```

`@table` creates the database table. `@export` generates the full REST CRUD API. `@indexed(type: "HNSW", distance: "cosine")` adds the HNSW vector index used for both semantic cache lookup and context retrieval.
```shell
# 1. Create a cluster at https://fabric.harper.fast/

# 2. Add credentials to .env
CLI_TARGET=https://your-instance.your-org.harperfabric.com:9925/
CLI_TARGET_USERNAME=your-username
CLI_TARGET_PASSWORD=your-password

# 3. Deploy
npm run deploy
```

Rolling restarts and replication are handled automatically.
Public access note: To make endpoints accessible without authentication, set `target.checkPermission = false` inside the handler method. This is the V2 Resource API pattern (`loadAsInstance = false`). The V1 method `allowRead()` is ignored in V2 Resources and has no effect.
| Concern | Traditional Stack | Harper |
|---|---|---|
| Database | Postgres / MongoDB | Built in |
| Vector search | Pinecone / Weaviate | Built in (HNSW — one schema directive) |
| Semantic cache | Redis + custom logic | Built in (native HNSW threshold filter) |
| API server | Express / Fastify | Auto-generated from schema |
| Chat UI server | Vite / Next.js | Resource returning `Response(html)` |
| Embeddings | Voyage / OpenAI API | Local via `harper-fabric-embeddings` (24 MB, in-process) |
| Deployment | Docker + K8s + cloud | `harperdb deploy .` |
Key insights from building this:
- Native HNSW conditions search scales. Passing
comparator: 'lt'to Harper's vector search evaluates the distance threshold inside the index. No JS cosine math, no full scans. - Everything in one process means no network hops. Database, vector index, cache, API, and agent code share the same runtime. No Redis round-trip, no vector DB round-trip.
- The schema is the only config you need. One
@indexed(type: "HNSW", distance: "cosine")directive creates the vector index. One@exportgenerates the CRUD API. One@indexedonconversationIdcreates the secondary index. - Resources can return anything. A
Resourcesubclass can return aResponsewith any content type — JSON, HTML, plain text. The chat UI lives in the same project and deploy as the agent logic. - Local embeddings eliminate a cost center.
bge-small-en-v1.5runs in-process via llama.cpp. No per-embedding billing, no embedding service SLA to worry about.
Apache 2.0 — see LICENSE