
Debriefer

Multi-source research orchestration with Wikipedia-grade reliability scoring.


You need facts from the open web, but every source has a different API, a different credibility profile, and a different cost. Debriefer queries dozens of sources in parallel — news wires, digital archives, structured databases, search engines — scores each one using Wikipedia's Reliable Sources editorial methodology, and stops early once it has enough high-quality findings. You define the subject, the output shape, and the quality bar. Debriefer handles the orchestration, the budget, and the trust math.

Extracted from a production enrichment pipeline that uses it to research thousands of records across dozens of sources.

The Hook

What does a research run actually look like? Here's the CLI researching Audrey Hepburn across structured data and news sources (all free, no API keys needed):

$ debriefer debrief "Audrey Hepburn" --no-synthesis --categories structured,news

Subject: Audrey Hepburn
Sources: 18/20  Cost: $0.0000  Duration: 3.1s
Stopped at phase 1

--- Findings (18) ---

  Source: Wikidata
  Tier: structured_data  Confidence: 0.95
  URL: https://www.wikidata.org/wiki/Q41282
  Belgian-British actress (1929-1993). Known for Roman Holiday, Breakfast at...

  Source: Wikipedia
  Tier: secondary  Confidence: 0.92
  URL: https://en.wikipedia.org/wiki/Audrey_Hepburn
  Audrey Hepburn (born Audrey Kathleen Ruston; 4 May 1929 – 20 January 1993)...

  Source: AP News
  Tier: tier_1_news  Confidence: 0.88
  URL: https://apnews.com/search?q=Audrey+Hepburn
  ...

Structured data from Wikidata. Tier-1 news from AP, BBC, and Reuters. Wikipedia compilation. Twenty sources scored for reliability and queried in parallel — all in one call. Add API keys for Guardian, NYT, and archive sources to expand coverage further.

How It Works

Subject ──> Orchestrator ──> Phase 1 (free / free-tier) ──> Phase 2 (paid search) ──> Synthesis
                 │                 │                              │                       │
                 ├─ Cost Tracker   ├─ Wikidata                   ├─ Google Search         v
                 ├─ Rate Limiter   ├─ Wikipedia                  ├─ Bing Search       Structured
                 ├─ Cache          ├─ Guardian, NYT (free key)   ├─ Brave Search      output with
                 └─ Telemetry      └─ 20+ site-search (no key)   └─ ...               citations

The orchestrator runs phases in order — cheap sources first, expensive sources later. After each phase, it checks whether the early stop threshold has been met (enough distinct source families returned high-quality findings) or the cost limit has been exceeded. If either is true, remaining phases are skipped and synthesis runs on what's been collected.
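The control flow described above can be sketched as follows. This is a simplified illustration, not the actual @debriefer/core implementation; names such as `Phase.run`, `minFamilies`, and `costLimit` are assumptions for clarity:

```typescript
// Illustrative phase loop: cheap phases first, early stop when enough
// distinct source families have returned high-quality findings, hard
// stop when the cost limit is reached.
interface Finding { sourceFamily: string; reliabilityScore: number; confidence: number; }
interface Phase { name: string; cost: number; run: () => Promise<Finding[]>; }

async function debrief(
  phases: Phase[],
  opts: { costLimit: number; minFamilies: number; minQuality: number }
): Promise<Finding[]> {
  const findings: Finding[] = [];
  let spent = 0;

  for (const phase of phases) {
    findings.push(...(await phase.run())); // sources within a phase run in parallel
    spent += phase.cost;

    // Early stop: count distinct families among high-quality findings only.
    const families = new Set(
      findings
        .filter(f => f.reliabilityScore >= opts.minQuality && f.confidence >= opts.minQuality)
        .map(f => f.sourceFamily)
    );
    if (families.size >= opts.minFamilies || spent >= opts.costLimit) break;
  }
  return findings;
}
```

If phase 1 alone satisfies the threshold, the paid search phases never run, which is why the CLI run above stopped at phase 1 with zero cost.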

Quality is measured on two independent axes: source reliability (how trustworthy is the publisher?) and content confidence (does this result actually answer the query?). See Reliability Scoring & Sources for the full tier table and methodology.
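The two-axis gate can be sketched as a simple predicate. The threshold values here are assumptions for illustration, not Debriefer's actual defaults:

```typescript
// Two-axis quality gate (illustrative thresholds).
interface Finding { reliabilityScore: number; confidence: number; }

const MIN_RELIABILITY = 0.6; // publisher trust, derived from the RSP tier table
const MIN_CONFIDENCE = 0.5;  // does this specific result answer the query?

function qualifies(f: Finding): boolean {
  // Both axes must clear their threshold independently: a trusted source
  // returning an irrelevant page fails on confidence and does not count.
  return f.reliabilityScore >= MIN_RELIABILITY && f.confidence >= MIN_CONFIDENCE;
}
```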

Quick Start

git clone https://github.com/chenders/debriefer.git
cd debriefer && npm install && npm run build

import { ResearchOrchestrator, NoopSynthesizer } from "@debriefer/core"
import { wikipedia, wikidata, openLibrary } from "@debriefer/sources"

const orchestrator = new ResearchOrchestrator(
  [{ phase: 1, name: "Free Sources", sources: [wikidata(), wikipedia(), openLibrary()] }],
  new NoopSynthesizer()
)

const result = await orchestrator.debrief({ id: "nm0000030", name: "Audrey Hepburn" })

for (const finding of result.findings) {
  console.log(`[${finding.sourceName}] (reliability: ${finding.reliabilityScore}) ${finding.url}`)
}

No API keys required — Wikipedia, Wikidata, and Open Library are free and open.

Use Cases

  • RAG with provenance — Feed your LLM only trusted context with reliability scores attached, so it cites real sources instead of hallucinating URLs
  • Database enrichment at scale — Pull data for thousands of records across dozens of APIs, with per-subject cost caps keeping the bill predictable
  • Cross-archive research — Query digitized newspaper archives across multiple countries and institutions in one call
  • AI agent tooling — Give AI agents structured access to research sources via the MCP server, with built-in cost guardrails

See Integration Examples for full code examples across RAG pipelines, historical research, pharmaceutical data, corporate due diligence, and more.
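For the RAG case, the findings returned by `debrief` can be turned into cited context directly. This sketch reuses the `sourceName`, `reliabilityScore`, and `url` fields from the Quick Start; the `summary` field and the 0.8 cutoff are assumptions for illustration:

```typescript
// Sketch: building cited LLM context from debrief findings (RAG with provenance).
interface Finding { sourceName: string; reliabilityScore: number; url: string; summary: string; }

function buildContext(findings: Finding[], minReliability = 0.8): string {
  return findings
    .filter(f => f.reliabilityScore >= minReliability) // drop low-trust sources
    .map((f, i) => `[${i + 1}] ${f.summary} (${f.sourceName}, ${f.url})`)
    .join("\n");
}
```

Prompting the model to cite the `[n]` markers means every citation traces back to a real URL from a scored source.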

Packages

  • @debriefer/core: Orchestration engine — phased execution, early stopping, cost control
  • @debriefer/sources: Built-in source integrations (news, archives, structured data, search)
  • @debriefer/ai: AI-first defaults — Claude synthesis, confidence scoring, section filtering
  • @debriefer/browser: Browser stealth, CAPTCHA solving, and archive fallbacks
  • @debriefer/cli: Command-line interface
  • @debriefer/server: REST API server + Docker
  • @debriefer/mcp: Model Context Protocol for AI assistants
  • debriefer (Python): Python HTTP client

Deploy

  • Library: import @debriefer/core and @debriefer/sources directly (Core README)
  • CLI: debriefer debrief "Marie Curie" --categories structured,news (CLI README)
  • HTTP: REST API with Docker support (Server README)
  • MCP: research tools for AI assistants (Claude, etc.) (MCP README)
  • Python: AsyncDebriefer HTTP client (Python README)

Interesting Implementation Details

  • Two-axis quality model — Source reliability (publisher trust) and content confidence (query relevance) are scored independently. A trusted source returning an irrelevant page doesn't count. Both must exceed thresholds for a finding to matter.
  • One hard dependency — The entire core package depends only on p-limit. Cache, telemetry, rate limiting, and synthesis are all injected interfaces. Swap in Redis, Datadog, or your own implementation without touching orchestration code.
  • Wikipedia RSP scoring — Reliability tiers are derived from the same Perennial Sources classification system that Wikipedia editors use to settle sourcing disputes. Not invented metrics — borrowed editorial standards.
  • AI is optional — The Anthropic SDK is an optional peer dependency. Use ClaudeSynthesizer to distill findings into structured output, or use NoopSynthesizer and process raw findings yourself. The engine doesn't care.
  • Browser fallback chain — When a source blocks automated requests, @debriefer/browser provides a stealth browser with CAPTCHA solving and archive-specific fallback strategies. The browser package itself is optional — sources degrade gracefully without it.
  • Supply chain provenance — Published via GitHub Releases with provenance attestations. What you see is what you get.
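The injected-interface point above can be made concrete with a cache example. The interface shape here is a hypothetical sketch; the actual interface names in @debriefer/core may differ:

```typescript
// Hypothetical injected dependency: the orchestrator depends on an
// interface, not an implementation.
interface Cache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds?: number): Promise<void>;
}

// Default in-memory implementation. Swapping in a Redis-backed class that
// implements the same interface requires no changes to orchestration code.
class MemoryCache implements Cache {
  private store = new Map<string, string>();
  async get(key: string) { return this.store.get(key); }
  async set(key: string, value: string) { this.store.set(key, value); }
}
```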

Project History

Debriefer was extracted from Dead on Film, a site that researches the lives and deaths of people in film and television. The enrichment pipeline behind that project — querying dozens of sources, scoring reliability, managing costs, stopping early when quality thresholds are met — turned out to be completely domain-agnostic.

The orchestration logic, reliability scoring, phased execution, and cost control had nothing to do with film or mortality. So it was extracted into a standalone engine that works with any subject type, any output schema, and any source you can wrap in a class. Debriefer is the general-purpose tool; Dead on Film is the first consumer.
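"Any source you can wrap in a class" might look like the sketch below. The real source interface in @debriefer/sources may differ; the class shape, field names, and the museum API are all hypothetical:

```typescript
// Hypothetical custom source wrapper.
interface Subject { id: string; name: string; }
interface Finding { sourceName: string; reliabilityScore: number; url: string; summary: string; }

class MuseumApiSource {
  readonly name = "Example Museum API";  // hypothetical source
  readonly reliabilityScore = 0.85;      // assigned from the RSP-derived tier table

  async fetch(subject: Subject): Promise<Finding[]> {
    const url = `https://api.example.org/search?q=${encodeURIComponent(subject.name)}`;
    // A real implementation would fetch and parse the response; stubbed here.
    return [{
      sourceName: this.name,
      reliabilityScore: this.reliabilityScore,
      url,
      summary: `Records matching ${subject.name}`,
    }];
  }
}
```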

Contributing

git clone https://github.com/chenders/debriefer.git
cd debriefer && npm install
npm run build && npm test
cd clients/python && pip install -e ".[dev]" && pytest

License

MIT


About

Multi-source research orchestration engine. Queries 35+ sources with Wikipedia RSP reliability scoring, phased execution with early stopping, per-query cost control, and pluggable AI synthesis.
