rakeshguptak/ai-web-intelligence-platform

AI Web Intelligence Platform

A production-oriented FastAPI prototype for AI-native web research, claim verification, webpage extraction, persistent memory, and web monitoring.

The system is designed around three priorities:

  1. Truthfulness before speed
  2. Reasoning quality before verbosity
  3. Learning capability before static responses

The default configuration runs without paid API keys. It uses DuckDuckGo HTML search, Wikipedia search, deterministic local embeddings, FAISS when available, SQLite memory, and heuristic reasoning. Commercial search APIs, hosted LLMs, browser rendering, distributed task queues, and external vector databases can be enabled through the adapter layers.
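
To illustrate what "deterministic local embeddings" can mean, here is a minimal sketch of a hashed bag-of-words vector. It is one common technique, not necessarily what this project does; the names `embed` and `cosine` and the 64-dimension size are hypothetical.

```python
import hashlib
import math


def embed(text: str, dim: int = 64) -> list[float]:
    """Hash each whitespace token into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # sha256 is stable across processes, unlike Python's built-in hash().
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for vectors returned by embed()."""
    return sum(x * y for x, y in zip(a, b))
```

The same text always maps to the same vector, so results are reproducible without a model download or API key.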

Core Capabilities

  • Multi-source web search aggregation
  • Ranking by semantic relevance, recency, source credibility, and historical source accuracy
  • Dynamic multi-path research loops with decomposition, reflection, retry, and final path selection
  • Claim-level verification with confidence scores and explicit disagreement surfacing
  • Evidence basis for every verified claim or structured field
  • Persistent semantic and structured memory
  • Autonomous webpage extraction with metadata, JSON-LD, table extraction, and schema inference
  • Processor-tier task runs with blocking and async modes
  • Entity discovery from natural-language criteria
  • Persistent web monitors for change detection
  • Evaluation harness for truthfulness, citation coverage, and three-source support
  • Metrics for latency, confidence, feedback, calibration, and estimated hallucination risk
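
The ranking capability above combines four signals. A minimal sketch of one way to blend them is a weighted sum; the `Candidate` type, the field names, and the weights are illustrative assumptions, not the project's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    relevance: float      # semantic similarity to the query, 0..1
    recency: float        # 1.0 = very recent, 0.0 = old or undated
    credibility: float    # prior trust in the source domain, 0..1
    past_accuracy: float  # share of the source's earlier claims that held up, 0..1


# Hypothetical weights; they sum to 1 so the blended score stays in 0..1.
WEIGHTS = {"relevance": 0.5, "recency": 0.15, "credibility": 0.2, "past_accuracy": 0.15}


def score(c: Candidate) -> float:
    """Weighted sum of the four ranking signals."""
    return sum(weight * getattr(c, name) for name, weight in WEIGHTS.items())


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates ordered from highest to lowest blended score."""
    return sorted(candidates, key=score, reverse=True)
```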

Architecture

FastAPI
  app/api/routes.py
    /search
    /deep-research
    /verify
    /extract
    /memory
    /tasks
    /tasks/{run_id}
    /find-all
    /monitor
    /eval/run
    /metrics

Agents
  ReasoningAgent       -> query decomposition, draft synthesis, reflection
  ResearchOrchestrator -> multi-path loop, retries, trace storage, path voting
  TaskRunner           -> processor-tier research and enrichment runs
  FindAllAgent         -> criteria-to-entity discovery and verification
  MonitorEngine        -> persistent change detection over search snapshots

Core Systems
  search/              -> providers, aggregation, credibility-aware ranking
  verification/        -> claim extraction, triangulation, contradiction detection
  memory/              -> SQLite facts/queries/traces/monitors + FAISS vector recall
  extraction/          -> webpage fetch, HTML parsing, metadata and schema inference
  models/              -> LLM provider interfaces
  evaluation/          -> benchmark scoring and calibration harness
  observability/       -> latency, confidence, feedback, and calibration metrics

Evidence Basis

Every claim-level or field-level result can include:

  • confidence: calibrated score from 0 to 1
  • verification_status: verified, partially_supported, contradicted, or insufficient_evidence
  • independent_source_count: number of independent supporting web domains
  • source_agreement: support versus disagreement ratio
  • freshness_score: recency signal from source dates when available
  • citations: supporting sources
  • contradictions: conflicting sources
  • reasoning: concise audit rationale

This makes outputs auditable by both machines and humans, rather than returning prose alone.
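
As a rough sketch of how some of these fields could be derived from raw citation lists: the function name `summarize`, the host-based notion of an independent domain, the agreement formula, and the three-source threshold are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def summarize(citations: list[str], contradictions: list[str]) -> dict:
    """Derive basis fields from supporting and conflicting source URLs."""
    # Assumption: "independent" means distinct hostnames, ignoring a leading "www.".
    domains = {urlparse(u).netloc.lower().removeprefix("www.") for u in citations}
    total = len(citations) + len(contradictions)
    # Assumption: agreement = supporting sources / all sources that took a side.
    agreement = len(citations) / total if total else 0.0

    if total == 0:
        status = "insufficient_evidence"
    elif len(contradictions) > len(citations):
        status = "contradicted"
    elif len(domains) >= 3 and not contradictions:
        status = "verified"  # hypothetical three-source rule
    else:
        status = "partially_supported"

    return {
        "independent_source_count": len(domains),
        "source_agreement": agreement,
        "verification_status": status,
    }
```

Counting hostnames is a crude proxy for independence, since syndicated copies of one article can appear on many domains.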

Install

cd C:\Users\rakes\Downloads\ai_web_intelligence_platform
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Optional production extras:

pip install playwright celery redis beautifulsoup4
python -m playwright install chromium

Configuration

Copy .env.example to .env if you want to customize runtime behavior.

Important environment variables:

  • AIWI_DATA_DIR: local data directory for SQLite and vector memory
  • AIWI_ENABLE_NETWORK: enable or disable live web requests
  • AIWI_SEARCH_TIMEOUT_SECONDS: request timeout for search and extraction
  • AIWI_LLM_PROVIDER: heuristic or an OpenAI-compatible provider
  • OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL: hosted or local OpenAI-compatible model settings
  • BRAVE_SEARCH_API_KEY, SERPAPI_API_KEY: optional authenticated search providers
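
A minimal sketch of reading these variables in Python follows. The `Settings` class, the `load_settings` helper, and every default value except the `heuristic` provider are assumptions; check .env.example for the real defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    data_dir: str
    enable_network: bool
    search_timeout_seconds: float
    llm_provider: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    truthy = {"1", "true", "yes", "on"}
    return Settings(
        data_dir=env.get("AIWI_DATA_DIR", "./data"),  # assumed default
        enable_network=env.get("AIWI_ENABLE_NETWORK", "true").strip().lower() in truthy,
        search_timeout_seconds=float(env.get("AIWI_SEARCH_TIMEOUT_SECONDS", "10")),  # assumed default
        llm_provider=env.get("AIWI_LLM_PROVIDER", "heuristic"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.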

Run

uvicorn app.main:app --host 127.0.0.1 --port 8010 --reload

Open:

  • http://127.0.0.1:8010/docs
  • http://127.0.0.1:8010/health

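API Examples

The commands below use Windows Command Prompt syntax: ^ continues a line and inner double quotes are escaped as \". In Windows PowerShell, call curl.exe explicitly, because curl is an alias for Invoke-WebRequest there.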

Search

curl -X POST http://127.0.0.1:8010/search ^
  -H "Content-Type: application/json" ^
  -d "{\"query\":\"latest evidence on sodium-ion batteries commercialization\",\"max_results\":8}"
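
The same request can be sent from Python with only the standard library. This is a sketch: the helper names `build_post` and `post_json` are hypothetical, and the base URL assumes the server is running as shown in the Run section.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8010"


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Create a JSON POST request without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_post(path, payload), timeout=timeout) as resp:
        return json.load(resp)


# Usage (requires the server to be running):
# result = post_json("/search", {
#     "query": "latest evidence on sodium-ion batteries commercialization",
#     "max_results": 8,
# })
```

The other POST endpoints in this section take the same form with a different path and payload.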

Deep Research

curl -X POST http://127.0.0.1:8010/deep-research ^
  -H "Content-Type: application/json" ^
  -d "{\"query\":\"Are sodium-ion batteries commercially viable for grid storage?\",\"max_iterations\":2,\"strategy_count\":3}"

Verify Claims

curl -X POST http://127.0.0.1:8010/verify ^
  -H "Content-Type: application/json" ^
  -d "{\"query\":\"sodium-ion batteries\",\"claims\":[\"Sodium-ion batteries are being commercialized for stationary storage.\"]}"

Extract Webpage Data

curl -X POST http://127.0.0.1:8010/extract ^
  -H "Content-Type: application/json" ^
  -d "{\"url\":\"https://en.wikipedia.org/wiki/Sodium-ion_battery\"}"

Task Run

curl -X POST http://127.0.0.1:8010/tasks ^
  -H "Content-Type: application/json" ^
  -d "{\"input\":\"Research sodium-ion battery grid storage commercialization\",\"processor\":\"pro\",\"mode\":\"blocking\"}"

Async Task Polling

curl -X POST http://127.0.0.1:8010/tasks ^
  -H "Content-Type: application/json" ^
  -d "{\"input\":\"Research AI-native search systems\",\"processor\":\"lite\",\"mode\":\"async\"}"

curl http://127.0.0.1:8010/tasks/<run_id>
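
Polling can be wrapped in a small loop. The sketch below is hedged: this README does not show the async response shape, so the `status` field and the `completed` and `failed` values are assumptions, and `fetch` is a caller-supplied function that performs the GET on /tasks/<run_id>.

```python
import time
from typing import Callable

# Assumed terminal states; adjust to whatever the API actually returns.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_run(
    fetch: Callable[[str], dict],
    run_id: str,
    interval: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Call fetch(run_id) until the run reports a terminal status."""
    for _ in range(max_attempts):
        run = fetch(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_attempts} attempts")
```

Injecting `fetch` keeps the loop independent of the HTTP client and lets it be tested without a live server.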

Structured Enrichment

curl -X POST http://127.0.0.1:8010/tasks ^
  -H "Content-Type: application/json" ^
  -d "{\"input\":{\"company\":\"CATL\",\"website\":\"catl.com\"},\"processor\":\"core\",\"task_spec\":{\"output_schema\":{\"type\":\"json\",\"json_schema\":{\"type\":\"object\",\"properties\":{\"sodium_ion_products\":{\"type\":\"string\",\"description\":\"Sodium-ion battery products or commercialization status\"}}}}}}"

Entity Discovery

curl -X POST http://127.0.0.1:8010/find-all ^
  -H "Content-Type: application/json" ^
  -d "{\"criteria\":\"companies commercializing sodium-ion batteries for grid storage\",\"entity_type\":\"company\",\"max_entities\":5}"

Monitor

curl -X POST http://127.0.0.1:8010/monitor ^
  -H "Content-Type: application/json" ^
  -d "{\"action\":\"create\",\"query\":\"sodium-ion battery grid storage announcements\",\"cadence_minutes\":1440}"

Run a monitor:

curl -X POST http://127.0.0.1:8010/monitor ^
  -H "Content-Type: application/json" ^
  -d "{\"action\":\"run\",\"monitor_id\":\"<monitor_id>\"}"
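
Conceptually, running a monitor compares the latest search snapshot with the previous one. A minimal sketch of such a diff follows, assuming each snapshot is a list of results with `url` and `title` keys; the function name and the snapshot shape are hypothetical, not the MonitorEngine's actual logic.

```python
def diff_snapshots(previous: list[dict], current: list[dict]) -> dict:
    """Report URLs that appeared, disappeared, or changed title between snapshots."""
    prev = {r["url"]: r for r in previous}
    curr = {r["url"]: r for r in current}
    added = sorted(u for u in curr if u not in prev)
    removed = sorted(u for u in prev if u not in curr)
    changed = sorted(
        u for u in curr
        if u in prev and curr[u].get("title") != prev[u].get("title")
    )
    return {
        "added": added,
        "removed": removed,
        "changed": changed,
        "has_changes": bool(added or removed or changed),
    }
```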

Evaluation

curl -X POST http://127.0.0.1:8010/eval/run ^
  -H "Content-Type: application/json" ^
  -d "{\"items\":[{\"query\":\"sodium-ion batteries\",\"expected_claims\":[\"Sodium-ion batteries are used for stationary storage.\"],\"forbidden_claims\":[\"Sodium-ion batteries are always more energy dense than lithium-ion batteries.\"]}]}"
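
To show how expected_claims and forbidden_claims might be scored, here is a simplified sketch. It matches claims by normalized exact text, which is an assumption made for brevity; a real harness would more likely use semantic matching. The function names and result keys are hypothetical.

```python
def normalize(claim: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(claim.lower().split()).rstrip(".")


def score_item(produced: list[str], expected: list[str], forbidden: list[str]) -> dict:
    """Compare the claims a run produced against expected and forbidden claims."""
    got = {normalize(c) for c in produced}
    hits = [c for c in expected if normalize(c) in got]
    violations = [c for c in forbidden if normalize(c) in got]
    return {
        "expected_recall": len(hits) / len(expected) if expected else 1.0,
        "forbidden_violations": len(violations),
        "passed": len(hits) == len(expected) and not violations,
    }
```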

Response Shape

Every main endpoint returns structured JSON with:

  • reasoning trace
  • citations
  • claim or field confidence scores
  • evidence basis where applicable
  • overall confidence
  • latency metadata

Abbreviated deep-research response:

{
  "query": "Are sodium-ion batteries commercially viable for grid storage?",
  "answer": "Best-supported answer...",
  "overall_confidence": 0.74,
  "claims": [
    {
      "claim": "Sodium-ion batteries are being commercialized for stationary storage.",
      "confidence": 0.81,
      "support_count": 3,
      "contradiction_count": 0,
      "verification_status": "verified"
    }
  ],
  "basis": [
    {
      "field": "Sodium-ion batteries are being commercialized for stationary storage.",
      "confidence": 0.81,
      "independent_source_count": 3,
      "source_agreement": 1.0,
      "verification_status": "verified"
    }
  ],
  "citations": [
    {
      "title": "Sodium-ion battery",
      "url": "https://en.wikipedia.org/wiki/Sodium-ion_battery",
      "domain": "wikipedia.org",
      "credibility": 0.72
    }
  ],
  "reasoning_trace": [
    {
      "agent": "reasoning",
      "step_type": "reflection",
      "content": "Reflection: needs more independent sources for 1 claim."
    }
  ]
}
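
A client can use these fields to flag claims that need review. The sketch below relies only on keys shown in the abbreviated response above; the function name and both thresholds are hypothetical.

```python
def weak_claims(response: dict, min_confidence: float = 0.75, min_sources: int = 3) -> list[str]:
    """Return claim texts that are unverified, low-confidence, or thinly sourced."""
    # In the sample response, each basis entry's "field" repeats the claim text.
    sources = {
        b["field"]: b.get("independent_source_count", 0)
        for b in response.get("basis", [])
    }
    flagged = []
    for claim in response.get("claims", []):
        text = claim["claim"]
        if (
            claim.get("verification_status") != "verified"
            or claim.get("confidence", 0.0) < min_confidence
            or sources.get(text, 0) < min_sources
        ):
            flagged.append(text)
    return flagged
```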

Testing

python -m pytest

Current test coverage includes:

  • extraction and schema inference
  • structured memory and vector recall
  • search ranking
  • claim extraction and truth scoring
  • entity candidate ranking
  • benchmark scoring

Production Upgrade Path

  • Replace the heuristic LLMClient with OpenAI, Claude, or local model adapters.
  • Add authenticated search providers in app/search/providers.py.
  • Enable Playwright extraction for JavaScript-heavy pages and PDFs.
  • Swap SQLite for Postgres.
  • Replace local FAISS files with Weaviate, Qdrant, Milvus, or another managed vector database.
  • Replace the in-process task run store with durable queue-backed task persistence.
  • Route long-running research through Celery/Redis or another distributed worker system.
  • Add SSE streaming for progress updates.
  • Add human feedback labels to calibrate confidence and update source reliability.
  • Add offline benchmark datasets and scheduled regression runs.

Repository Layout

app/
  agents/          task, research, monitor, and entity-discovery agents
  api/             FastAPI routes and dependency wiring
  core/            settings, schemas, scoring, task abstractions
  evaluation/      benchmark runner
  extraction/      webpage extraction and schema inference
  memory/          SQLite and vector memory
  models/          LLM client adapters
  observability/   metrics registry
  search/          providers, credibility scoring, ranking engine
  verification/    claim extraction and truth scoring
examples/          HTTP and Python examples
tests/             unit tests
