AI Web Intelligence Platform

Production-oriented FastAPI prototype for AI-native web research, verification, extraction, persistent memory, and monitored web intelligence.

The system is designed around three priorities:

Truthfulness before speed
Reasoning quality before verbosity
Learning capability before static responses

The default configuration runs without paid API keys. It uses DuckDuckGo HTML search, Wikipedia search, deterministic local embeddings, FAISS when available, SQLite memory, and heuristic reasoning. Commercial search APIs, hosted LLMs, browser rendering, distributed task queues, and external vector databases can be enabled through the adapter layers.
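
The README does not say how the deterministic local embeddings are computed. One common key-free approach is feature hashing, sketched below under that assumption. The names `embed`, `cosine`, and `DIM` are hypothetical and not taken from the project.

```python
import hashlib
import math

DIM = 64  # hypothetical embedding width, not taken from the project


def embed(text: str, dim: int = DIM) -> list[float]:
    """Map text to a unit-length vector using stable token hashing."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # sha256 is stable across processes, unlike Python's built-in hash().
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # An empty or fully cancelled vector is returned as all zeros.
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))
```

Because the same text always yields the same vector, a scheme like this keeps vector recall reproducible across restarts without calling a hosted embedding model.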

Core Capabilities

Multi-source web search aggregation
Ranking by semantic relevance, recency, source credibility, and historical source accuracy
Dynamic multi-path research loops with decomposition, reflection, retry, and final path selection
Claim-level verification with confidence scores and explicit disagreement surfacing
Evidence basis for every verified claim or structured field
Persistent semantic and structured memory
Autonomous webpage extraction with metadata, JSON-LD, table extraction, and schema inference
Processor-tier task runs with blocking and async modes
Entity discovery from natural-language criteria
Persistent web monitors for change detection
Evaluation harness for truthfulness, citation coverage, and three-source support
Metrics for latency, confidence, feedback, calibration, and estimated hallucination risk
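
The four ranking signals listed above could be combined as a weighted sum. This is only a sketch: the weights, the `RankSignals` class, and the `rank` helper are illustrative assumptions, not the project's actual ranking engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankSignals:
    relevance: float      # semantic similarity to the query, 0..1
    recency: float        # 1.0 for fresh content, decaying toward 0
    credibility: float    # prior trust in the source domain, 0..1
    past_accuracy: float  # share of earlier claims from this source that verified, 0..1


# Hypothetical weights; the project's real values are not documented here.
WEIGHTS = {
    "relevance": 0.40,
    "recency": 0.20,
    "credibility": 0.25,
    "past_accuracy": 0.15,
}


def rank_score(signals: RankSignals) -> float:
    """Weighted sum of the four signals."""
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def rank(results: dict[str, RankSignals]) -> list[str]:
    """Return result keys (for example URLs) ordered from best to worst."""
    return sorted(results, key=lambda key: rank_score(results[key]), reverse=True)
```

A linear combination keeps each signal's contribution easy to audit, which fits the emphasis on explainable scores elsewhere in this document.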

Architecture

FastAPI
  app/api/routes.py
    /search
    /deep-research
    /verify
    /extract
    /memory
    /tasks
    /tasks/{run_id}
    /find-all
    /monitor
    /eval/run
    /metrics

Agents
  ReasoningAgent       -> query decomposition, draft synthesis, reflection
  ResearchOrchestrator -> multi-path loop, retries, trace storage, path voting
  TaskRunner           -> processor-tier research and enrichment runs
  FindAllAgent         -> criteria-to-entity discovery and verification
  MonitorEngine        -> persistent change detection over search snapshots

Core Systems
  search/              -> providers, aggregation, credibility-aware ranking
  verification/        -> claim extraction, triangulation, contradiction detection
  memory/              -> SQLite facts/queries/traces/monitors + FAISS vector recall
  extraction/          -> webpage fetch, HTML parsing, metadata and schema inference
  models/              -> LLM provider interfaces
  evaluation/          -> benchmark scoring and calibration harness
  observability/       -> latency, confidence, feedback, and calibration metrics

Evidence Basis

Every claim-level or field-level result can include:

confidence: calibrated score from 0 to 1
verification_status: verified, partially_supported, contradicted, or insufficient_evidence
independent_source_count: number of independent supporting web domains
source_agreement: support versus disagreement ratio
freshness_score: recency signal from source dates when available
citations: supporting sources
contradictions: conflicting sources
reasoning: concise audit rationale

This makes outputs auditable by both machines and humans, rather than returning prose alone.
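
To make the fields above concrete, here is a minimal sketch of how a basis record might be derived from supporting and contradicting URLs. It assumes that source_agreement means support / (support + contradiction), that independence is counted per domain, and that three independent sources are needed for "verified". The function names and thresholds are hypothetical, and confidence and freshness are left out.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Naive registrable domain: last two host labels (wrong for e.g. co.uk)."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def build_basis(field: str, supporting: list[str], contradicting: list[str],
                min_sources: int = 3) -> dict:
    """Summarize evidence for one claim or field (assumed rules, see lead-in)."""
    support = {domain_of(u) for u in supporting}
    contra = {domain_of(u) for u in contradicting}
    total = len(support) + len(contra)
    agreement = len(support) / total if total else 0.0

    if not support and not contra:
        status = "insufficient_evidence"
    elif len(contra) > len(support):
        status = "contradicted"
    elif len(support) >= min_sources and not contra:
        status = "verified"
    else:
        status = "partially_supported"

    return {
        "field": field,
        "independent_source_count": len(support),
        "source_agreement": agreement,
        "verification_status": status,
        "citations": sorted(supporting),
        "contradictions": sorted(contradicting),
    }
```

Counting domains rather than URLs prevents several pages from one site from inflating the support count.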

Install

cd ai_web_intelligence_platform
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Optional production extras:

pip install playwright celery redis beautifulsoup4
python -m playwright install chromium

Configuration

Copy .env.example to .env if you want to customize runtime behavior.

Important environment variables:

AIWI_DATA_DIR: local data directory for SQLite and vector memory
AIWI_ENABLE_NETWORK: enable or disable live web requests
AIWI_SEARCH_TIMEOUT_SECONDS: request timeout for search and extraction
AIWI_LLM_PROVIDER: heuristic or an OpenAI-compatible provider
OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL: hosted or local OpenAI-compatible model settings
BRAVE_SEARCH_API_KEY, SERPAPI_API_KEY: optional authenticated search providers
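
As an illustration of how these variables might be read at startup, the sketch below parses a subset of them into a typed settings object. The `Settings` class, `load_settings` function, and every default value are assumptions made for this example; the project's own settings live under app/core and may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    data_dir: str
    enable_network: bool
    search_timeout_seconds: float
    llm_provider: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables; defaults here are hypothetical."""
    env = os.environ if env is None else env
    return Settings(
        data_dir=env.get("AIWI_DATA_DIR", "./data"),
        enable_network=env.get("AIWI_ENABLE_NETWORK", "true").strip().lower() in TRUTHY,
        search_timeout_seconds=float(env.get("AIWI_SEARCH_TIMEOUT_SECONDS", "10")),
        llm_provider=env.get("AIWI_LLM_PROVIDER", "heuristic"),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to unit test, for example `load_settings({"AIWI_ENABLE_NETWORK": "false"})`.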

Run

uvicorn app.main:app --host 127.0.0.1 --port 8010 --reload

Open:

http://127.0.0.1:8010/docs
http://127.0.0.1:8010/health

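API Examples

The commands below use Windows cmd.exe syntax: ^ continues a command onto the next line, and \" escapes a double quote inside the JSON body. On other shells, adjust the line continuation and quoting.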

Search

curl -X POST http://127.0.0.1:8010/search ^
  -H "Content-Type: application/json" ^
  -d "{\"query\":\"latest evidence on sodium-ion batteries commercialization\",\"max_results\":8}"

Deep Research

curl -X POST http://127.0.0.1:8010/deep-research ^
  -H "Content-Type: application/json" ^
  -d "{\"query\":\"Are sodium-ion batteries commercially viable for grid storage?\",\"max_iterations\":2,\"strategy_count\":3}"

Verify Claims

curl -X POST http://127.0.0.1:8010/verify ^
  -H "Content-Type: application/json" ^
  -d "{\"query\":\"sodium-ion batteries\",\"claims\":[\"Sodium-ion batteries are being commercialized for stationary storage.\"]}"

Extract Webpage Data

curl -X POST http://127.0.0.1:8010/extract ^
  -H "Content-Type: application/json" ^
  -d "{\"url\":\"https://en.wikipedia.org/wiki/Sodium-ion_battery\"}"

Task Run

curl -X POST http://127.0.0.1:8010/tasks ^
  -H "Content-Type: application/json" ^
  -d "{\"input\":\"Research sodium-ion battery grid storage commercialization\",\"processor\":\"pro\",\"mode\":\"blocking\"}"

Async Task Polling

curl -X POST http://127.0.0.1:8010/tasks ^
  -H "Content-Type: application/json" ^
  -d "{\"input\":\"Research AI-native search systems\",\"processor\":\"lite\",\"mode\":\"async\"}"

curl http://127.0.0.1:8010/tasks/<run_id>
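
The submit-then-poll flow above can also be driven from Python. The sketch below assumes the run record carries a "status" field whose terminal values are "completed" and "failed"; neither the field name nor the values are confirmed by this README, so check them against the live /docs page before relying on them.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal states; verify against the real API schema.
TERMINAL_STATUSES = {"completed", "failed"}


def http_fetch(run_id: str, base_url: str = "http://127.0.0.1:8010") -> dict:
    """GET /tasks/{run_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/tasks/{run_id}") as response:
        return json.load(response)


def poll_task(run_id: str,
              fetch: Callable[[str], dict] = http_fetch,
              interval_seconds: float = 1.0,
              max_attempts: int = 30,
              sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the run reaches a terminal status or attempts run out."""
    for _ in range(max_attempts):
        run = fetch(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        sleep(interval_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_attempts} polls")
```

Injecting `fetch` and `sleep` keeps the loop testable without a running server; in normal use, `poll_task(run_id)` with the defaults is enough.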

Structured Enrichment

curl -X POST http://127.0.0.1:8010/tasks ^
  -H "Content-Type: application/json" ^
  -d "{\"input\":{\"company\":\"CATL\",\"website\":\"catl.com\"},\"processor\":\"core\",\"task_spec\":{\"output_schema\":{\"type\":\"json\",\"json_schema\":{\"type\":\"object\",\"properties\":{\"sodium_ion_products\":{\"type\":\"string\",\"description\":\"Sodium-ion battery products or commercialization status\"}}}}}}"
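
Deeply nested payloads like this one are easier to read and maintain as a Python dict than as an escaped shell string. The sketch below builds the same request body with the standard library; `build_task_request` is a hypothetical helper, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_task_request(payload: dict,
                       base_url: str = "http://127.0.0.1:8010") -> urllib.request.Request:
    """Prepare (but do not send) a POST /tasks request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same body as the curl command above, without manual quote escaping.
payload = {
    "input": {"company": "CATL", "website": "catl.com"},
    "processor": "core",
    "task_spec": {
        "output_schema": {
            "type": "json",
            "json_schema": {
                "type": "object",
                "properties": {
                    "sodium_ion_products": {
                        "type": "string",
                        "description": "Sodium-ion battery products or commercialization status",
                    }
                },
            },
        }
    },
}

request = build_task_request(payload)
# With the server running: urllib.request.urlopen(request)
```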

Entity Discovery

curl -X POST http://127.0.0.1:8010/find-all ^
  -H "Content-Type: application/json" ^
  -d "{\"criteria\":\"companies commercializing sodium-ion batteries for grid storage\",\"entity_type\":\"company\",\"max_entities\":5}"

Monitor

curl -X POST http://127.0.0.1:8010/monitor ^
  -H "Content-Type: application/json" ^
  -d "{\"action\":\"create\",\"query\":\"sodium-ion battery grid storage announcements\",\"cadence_minutes\":1440}"

Run a monitor:

curl -X POST http://127.0.0.1:8010/monitor ^
  -H "Content-Type: application/json" ^
  -d "{\"action\":\"run\",\"monitor_id\":\"<monitor_id>\"}"
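
Conceptually, running a monitor compares the latest search snapshot with the previous one. The README does not describe the comparison MonitorEngine actually performs, so the sketch below is an assumption: each snapshot is modeled as a mapping from result URL to title, and `diff_snapshots` is a hypothetical name.

```python
def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two search snapshots, each mapping result URL -> title."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # A URL present in both snapshots counts as changed when its title differs.
    changed = sorted(
        url for url in current.keys() & previous.keys()
        if current[url] != previous[url]
    )
    return {"added": added, "removed": removed, "changed": changed}


def has_changes(diff: dict[str, list[str]]) -> bool:
    """True when any bucket of the diff is non-empty."""
    return any(diff.values())
```

A real monitor would likely compare content fingerprints rather than titles alone, but the added/removed/changed split shows the basic shape of change detection.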

Evaluation

curl -X POST http://127.0.0.1:8010/eval/run ^
  -H "Content-Type: application/json" ^
  -d "{\"items\":[{\"query\":\"sodium-ion batteries\",\"expected_claims\":[\"Sodium-ion batteries are used for stationary storage.\"],\"forbidden_claims\":[\"Sodium-ion batteries are always more energy dense than lithium-ion batteries.\"]}]}"
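
The expected_claims and forbidden_claims lists suggest a simple per-item score: how many expected claims were produced, and whether any forbidden claim slipped through. The sketch below assumes exact matching after light normalization, which is a simplification; the project's harness may match claims semantically. `normalize` and `score_item` are hypothetical names.

```python
def normalize(claim: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(claim.lower().split()).rstrip(".")


def score_item(produced: list[str], expected: list[str], forbidden: list[str]) -> dict:
    """Score one evaluation item against the claims a run produced."""
    got = {normalize(c) for c in produced}
    hits = [c for c in expected if normalize(c) in got]
    violations = [c for c in forbidden if normalize(c) in got]
    recall = len(hits) / len(expected) if expected else 1.0
    return {
        "expected_recall": recall,
        "forbidden_violations": len(violations),
        # Pass only when every expected claim appears and no forbidden one does.
        "passed": recall == 1.0 and not violations,
    }
```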

Response Shape

Every main endpoint returns structured JSON with:

reasoning trace
citations
claim or field confidence scores
evidence basis where applicable
overall confidence
latency metadata

Abbreviated deep-research response:

{
  "query": "Are sodium-ion batteries commercially viable for grid storage?",
  "answer": "Best-supported answer...",
  "overall_confidence": 0.74,
  "claims": [
    {
      "claim": "Sodium-ion batteries are being commercialized for stationary storage.",
      "confidence": 0.81,
      "support_count": 3,
      "contradiction_count": 0,
      "verification_status": "verified"
    }
  ],
  "basis": [
    {
      "field": "Sodium-ion batteries are being commercialized for stationary storage.",
      "confidence": 0.81,
      "independent_source_count": 3,
      "source_agreement": 1.0,
      "verification_status": "verified"
    }
  ],
  "citations": [
    {
      "title": "Sodium-ion battery",
      "url": "https://en.wikipedia.org/wiki/Sodium-ion_battery",
      "domain": "wikipedia.org",
      "credibility": 0.72
    }
  ],
  "reasoning_trace": [
    {
      "agent": "reasoning",
      "step_type": "reflection",
      "content": "Reflection: needs more independent sources for 1 claim."
    }
  ]
}
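
A client can use this structure to flag claims that need a second look. The sketch below assumes, as in the abbreviated example, that each basis entry's "field" equals the text of the claim it supports. The thresholds and the `weak_claims` helper are illustrative only.

```python
def weak_claims(response: dict,
                min_confidence: float = 0.75,
                min_sources: int = 3) -> list[str]:
    """Return claim texts that fall below the confidence or source thresholds."""
    basis_by_field = {b["field"]: b for b in response.get("basis", [])}
    weak = []
    for claim in response.get("claims", []):
        basis = basis_by_field.get(claim["claim"], {})
        too_uncertain = claim["confidence"] < min_confidence
        # A claim with no matching basis entry counts as having zero sources.
        too_few_sources = basis.get("independent_source_count", 0) < min_sources
        if too_uncertain or too_few_sources:
            weak.append(claim["claim"])
    return weak
```

Applied to the example above with the default thresholds, the single claim passes (confidence 0.81, three independent sources), so the function returns an empty list.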

Testing

python -m pytest

Current test coverage includes:

extraction and schema inference
structured memory and vector recall
search ranking
claim extraction and truth scoring
entity candidate ranking
benchmark scoring

Production Upgrade Path

Replace the heuristic LLMClient with OpenAI, Claude, or local model adapters.
Add authenticated search providers in app/search/providers.py.
Enable Playwright extraction for JavaScript-heavy pages and PDFs.
Swap SQLite for Postgres.
Replace local FAISS files with Weaviate, Qdrant, Milvus, or another managed vector database.
Replace the in-process task run store with durable queue-backed task persistence.
Route long-running research through Celery/Redis or another distributed worker system.
Add SSE streaming for progress updates.
Add human feedback labels to calibrate confidence and update source reliability.
Add offline benchmark datasets and scheduled regression runs.

Repository Layout

app/
  agents/          task, research, monitor, and entity-discovery agents
  api/             FastAPI routes and dependency wiring
  core/            settings, schemas, scoring, task abstractions
  evaluation/      benchmark runner
  extraction/      webpage extraction and schema inference
  memory/          SQLite and vector memory
  models/          LLM client adapters
  observability/   metrics registry
  search/          providers, credibility scoring, ranking engine
  verification/    claim extraction and truth scoring
examples/          HTTP and Python examples
tests/             unit tests
