Production-oriented FastAPI prototype for AI-native web research, verification, extraction, persistent memory, and monitored web intelligence.
The system is designed around three priorities:
- Truthfulness before speed
- Reasoning quality before verbosity
- Learning capability before static responses
The default configuration runs without paid API keys. It uses DuckDuckGo HTML search, Wikipedia search, deterministic local embeddings, FAISS when available, SQLite memory, and heuristic reasoning. Commercial search APIs, hosted LLMs, browser rendering, distributed task queues, and external vector databases can be enabled through the adapter layers.
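"Deterministic local embeddings" can be read as embeddings that need no model download and always return the same vector for the same text. The sketch below shows one way to get that property with feature hashing. It is an illustration only: the function names, the 64-dimension width, and the hashing scheme are assumptions, not the project's actual implementation.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical embedding width


def embed(text: str, dim: int = DIM) -> list[float]:
    """Hash each token into a signed bucket, then L2-normalize the result."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # An empty or fully cancelled vector is returned as all zeros.
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Because the hash is stable across processes, vectors written to a FAISS index in one run remain comparable with vectors computed in a later run.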
- Multi-source web search aggregation
- Ranking by semantic relevance, recency, source credibility, and historical source accuracy
- Dynamic multi-path research loops with decomposition, reflection, retry, and final path selection
- Claim-level verification with confidence scores and explicit disagreement surfacing
- Evidence basis for every verified claim or structured field
- Persistent semantic and structured memory
- Autonomous webpage extraction with metadata, JSON-LD, table extraction, and schema inference
- Processor-tier task runs with blocking and async modes
- Entity discovery from natural-language criteria
- Persistent web monitors for change detection
- Evaluation harness for truthfulness, citation coverage, and three-source support
- Metrics for latency, confidence, feedback, calibration, and estimated hallucination risk
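The ranking feature above combines four signals: semantic relevance, recency, source credibility, and historical source accuracy. A minimal sketch of such a combination follows. The weights, the 180-day half-life, and the `Candidate` type are hypothetical values chosen for illustration; the project's real weighting is not documented in this README.

```python
from dataclasses import dataclass

# Hypothetical weights that sum to 1.0.
WEIGHTS = {"relevance": 0.5, "recency": 0.15, "credibility": 0.2, "history": 0.15}
HALF_LIFE_DAYS = 180.0  # assumed: recency halves every 180 days


@dataclass
class Candidate:
    url: str
    relevance: float            # semantic similarity to the query, 0..1
    age_days: float             # days since publication
    credibility: float          # source credibility, 0..1
    historical_accuracy: float  # past accuracy of this source, 0..1


def recency(age_days: float) -> float:
    """Exponential decay from 1.0 (new) toward 0.0 (old)."""
    return 0.5 ** (max(age_days, 0.0) / HALF_LIFE_DAYS)


def score(c: Candidate) -> float:
    """Weighted sum of the four ranking signals."""
    return (
        WEIGHTS["relevance"] * c.relevance
        + WEIGHTS["recency"] * recency(c.age_days)
        + WEIGHTS["credibility"] * c.credibility
        + WEIGHTS["history"] * c.historical_accuracy
    )


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)
```

A linear blend keeps each signal's contribution easy to audit, which fits the stated priority of truthfulness before speed.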
FastAPI
  app/api/routes.py
    /search
    /deep-research
    /verify
    /extract
    /memory
    /tasks
    /tasks/{run_id}
    /find-all
    /monitor
    /eval/run
    /metrics
Agents
  ReasoningAgent -> query decomposition, draft synthesis, reflection
  ResearchOrchestrator -> multi-path loop, retries, trace storage, path voting
  TaskRunner -> processor-tier research and enrichment runs
  FindAllAgent -> criteria-to-entity discovery and verification
  MonitorEngine -> persistent change detection over search snapshots
Core Systems
  search/ -> providers, aggregation, credibility-aware ranking
  verification/ -> claim extraction, triangulation, contradiction detection
  memory/ -> SQLite facts/queries/traces/monitors + FAISS vector recall
  extraction/ -> webpage fetch, HTML parsing, metadata and schema inference
  models/ -> LLM provider interfaces
  evaluation/ -> benchmark scoring and calibration harness
  observability/ -> latency, confidence, feedback, and calibration metrics
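MonitorEngine is described above as change detection over search snapshots. The sketch below shows one plausible shape for that comparison, assuming each snapshot is stored as a mapping from result URL to a content fingerprint. The function names and the snapshot format are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash whitespace- and case-normalized text so trivial edits are ignored."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {url: fingerprint} snapshots and report what changed."""
    prev_urls, curr_urls = set(previous), set(current)
    return {
        "added": sorted(curr_urls - prev_urls),
        "removed": sorted(prev_urls - curr_urls),
        "changed": sorted(
            url for url in prev_urls & curr_urls if previous[url] != current[url]
        ),
    }
```

Storing fingerprints rather than full page text keeps the persisted snapshot small while still detecting content edits.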
Every claim-level or field-level result can include:
- confidence: calibrated score from 0 to 1
- verification_status: verified, partially_supported, contradicted, or insufficient_evidence
- independent_source_count: number of independent supporting web domains
- source_agreement: support versus disagreement ratio
- freshness_score: recency signal from source dates when available
- citations: supporting sources
- contradictions: conflicting sources
- reasoning: concise audit rationale
This makes outputs auditable by machines and humans instead of only returning prose.
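To make the basis fields concrete, the sketch below derives independent_source_count, source_agreement, and verification_status from lists of supporting and contradicting URLs. The thresholds (three independent domains for verified) and the helper names are assumptions for illustration. Domain handling is simplified to the hostname with a leading `www.` removed; a production system would likely reduce hosts to their registrable domain.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Hostname without a leading 'www.'; empty string if the URL has none."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def build_basis(supporting_urls: list[str], contradicting_urls: list[str]) -> dict:
    """Summarize evidence for one claim using the field names listed above."""
    support = {domain_of(u) for u in supporting_urls} - {""}
    against = {domain_of(u) for u in contradicting_urls} - {""}
    total = len(support) + len(against)
    agreement = len(support) / total if total else 0.0

    # Hypothetical decision rules, not the project's calibrated logic.
    if total == 0:
        status = "insufficient_evidence"
    elif len(against) > len(support):
        status = "contradicted"
    elif len(support) >= 3 and not against:
        status = "verified"
    else:
        status = "partially_supported"

    return {
        "independent_source_count": len(support),
        "source_agreement": agreement,
        "verification_status": status,
    }
```

Counting distinct domains instead of raw URLs keeps several pages from the same site from inflating the apparent support for a claim.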
cd C:\Users\rakes\Downloads\ai_web_intelligence_platform
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
Optional production extras:
pip install playwright celery redis beautifulsoup4
python -m playwright install chromium
Copy .env.example to .env if you want to customize runtime behavior.
Important environment variables:
- AIWI_DATA_DIR: local data directory for SQLite and vector memory
- AIWI_ENABLE_NETWORK: enable or disable live web requests
- AIWI_SEARCH_TIMEOUT_SECONDS: request timeout for search and extraction
- AIWI_LLM_PROVIDER: heuristic or an OpenAI-compatible provider
- OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL: hosted or local OpenAI-compatible model settings
- BRAVE_SEARCH_API_KEY, SERPAPI_API_KEY: optional authenticated search providers
uvicorn app.main:app --host 127.0.0.1 --port 8010 --reload
Open:
- http://127.0.0.1:8010/docs
- http://127.0.0.1:8010/health
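The API can also be called from Python instead of curl. The sketch below uses only the standard library and assumes the server is running at the address given in the uvicorn command. The helper names `build_post` and `post_json` are hypothetical; the example payload fields (`query`, `max_results`) match the /search request documented in this README.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8010"  # matches the uvicorn host and port


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_post(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `post_json("/search", {"query": "sodium-ion batteries", "max_results": 8})` would return the decoded response as a dictionary.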
curl -X POST http://127.0.0.1:8010/search ^
-H "Content-Type: application/json" ^
-d "{\"query\":\"latest evidence on sodium-ion batteries commercialization\",\"max_results\":8}"

curl -X POST http://127.0.0.1:8010/deep-research ^
-H "Content-Type: application/json" ^
-d "{\"query\":\"Are sodium-ion batteries commercially viable for grid storage?\",\"max_iterations\":2,\"strategy_count\":3}"

curl -X POST http://127.0.0.1:8010/verify ^
-H "Content-Type: application/json" ^
-d "{\"query\":\"sodium-ion batteries\",\"claims\":[\"Sodium-ion batteries are being commercialized for stationary storage.\"]}"

curl -X POST http://127.0.0.1:8010/extract ^
-H "Content-Type: application/json" ^
-d "{\"url\":\"https://en.wikipedia.org/wiki/Sodium-ion_battery\"}"

curl -X POST http://127.0.0.1:8010/tasks ^
-H "Content-Type: application/json" ^
-d "{\"input\":\"Research sodium-ion battery grid storage commercialization\",\"processor\":\"pro\",\"mode\":\"blocking\"}"

curl -X POST http://127.0.0.1:8010/tasks ^
-H "Content-Type: application/json" ^
-d "{\"input\":\"Research AI-native search systems\",\"processor\":\"lite\",\"mode\":\"async\"}"
curl http://127.0.0.1:8010/tasks/<run_id>

curl -X POST http://127.0.0.1:8010/tasks ^
-H "Content-Type: application/json" ^
-d "{\"input\":{\"company\":\"CATL\",\"website\":\"catl.com\"},\"processor\":\"core\",\"task_spec\":{\"output_schema\":{\"type\":\"json\",\"json_schema\":{\"type\":\"object\",\"properties\":{\"sodium_ion_products\":{\"type\":\"string\",\"description\":\"Sodium-ion battery products or commercialization status\"}}}}}}"

curl -X POST http://127.0.0.1:8010/find-all ^
-H "Content-Type: application/json" ^
-d "{\"criteria\":\"companies commercializing sodium-ion batteries for grid storage\",\"entity_type\":\"company\",\"max_entities\":5}"

curl -X POST http://127.0.0.1:8010/monitor ^
-H "Content-Type: application/json" ^
-d "{\"action\":\"create\",\"query\":\"sodium-ion battery grid storage announcements\",\"cadence_minutes\":1440}"

Run a monitor:
curl -X POST http://127.0.0.1:8010/monitor ^
-H "Content-Type: application/json" ^
-d "{\"action\":\"run\",\"monitor_id\":\"<monitor_id>\"}"

curl -X POST http://127.0.0.1:8010/eval/run ^
-H "Content-Type: application/json" ^
-d "{\"items\":[{\"query\":\"sodium-ion batteries\",\"expected_claims\":[\"Sodium-ion batteries are used for stationary storage.\"],\"forbidden_claims\":[\"Sodium-ion batteries are always more energy dense than lithium-ion batteries.\"]}]}"

Every main endpoint returns structured JSON with:
- reasoning trace
- citations
- claim or field confidence scores
- evidence basis where applicable
- overall confidence
- latency metadata
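Because these fields are structured, a client can act on them without parsing prose. The sketch below flags claims that fall under a confidence threshold or lack a verified status. It relies on the `claims`, `confidence`, and `verification_status` keys shown in this README's deep-research response; the function name and the 0.7 threshold are hypothetical.

```python
def weak_claims(response: dict, threshold: float = 0.7) -> list[dict]:
    """Return claims that are unverified or below the confidence threshold."""
    return [
        claim
        for claim in response.get("claims", [])
        if claim.get("confidence", 0.0) < threshold
        or claim.get("verification_status") != "verified"
    ]


def needs_review(response: dict, threshold: float = 0.7) -> bool:
    """True when overall confidence is low or any individual claim is weak."""
    overall = response.get("overall_confidence", 0.0)
    return overall < threshold or bool(weak_claims(response, threshold))
```

A caller could route responses for which `needs_review` returns True to a human reviewer or to another research iteration.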
Abbreviated deep-research response:
{
"query": "Are sodium-ion batteries commercially viable for grid storage?",
"answer": "Best-supported answer...",
"overall_confidence": 0.74,
"claims": [
{
"claim": "Sodium-ion batteries are being commercialized for stationary storage.",
"confidence": 0.81,
"support_count": 3,
"contradiction_count": 0,
"verification_status": "verified"
}
],
"basis": [
{
"field": "Sodium-ion batteries are being commercialized for stationary storage.",
"confidence": 0.81,
"independent_source_count": 3,
"source_agreement": 1.0,
"verification_status": "verified"
}
],
"citations": [
{
"title": "Sodium-ion battery",
"url": "https://en.wikipedia.org/wiki/Sodium-ion_battery",
"domain": "wikipedia.org",
"credibility": 0.72
}
],
"reasoning_trace": [
{
"agent": "reasoning",
"step_type": "reflection",
"content": "Reflection: needs more independent sources for 1 claim."
}
]
}
python -m pytest
Current test coverage includes:
- extraction and schema inference
- structured memory and vector recall
- search ranking
- claim extraction and truth scoring
- entity candidate ranking
- benchmark scoring
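For readers curious what benchmark scoring could look like, the sketch below scores one evaluation item using the `expected_claims` and `forbidden_claims` fields from the /eval/run request. Matching by normalized exact text is a deliberate simplification and an assumption; the project's harness may use semantic matching, and the function names here are hypothetical.

```python
def _norm(claim: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(claim.lower().split()).rstrip(".")


def score_item(produced: list[str], expected: list[str], forbidden: list[str]) -> dict:
    """Score produced claims against expected and forbidden claims."""
    got = {_norm(c) for c in produced}
    hits = [c for c in expected if _norm(c) in got]
    violations = [c for c in forbidden if _norm(c) in got]
    return {
        "expected_recall": len(hits) / len(expected) if expected else 1.0,
        "forbidden_hits": len(violations),
        "passed": len(hits) == len(expected) and not violations,
    }
```

Tracking forbidden claims separately from expected ones lets a regression run catch a system that answers correctly but also asserts something known to be false.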
- Replace heuristic LLMClient with OpenAI, Claude, or local model adapters.
- Add authenticated search providers in app/search/providers.py.
- Enable Playwright extraction for JavaScript-heavy pages and PDFs.
- Swap SQLite for Postgres.
- Replace local FAISS files with Weaviate, Qdrant, Milvus, or another managed vector database.
- Replace the in-process task run store with durable queue-backed task persistence.
- Route long-running research through Celery/Redis or another distributed worker system.
- Add SSE streaming for progress updates.
- Add human feedback labels to calibrate confidence and update source reliability.
- Add offline benchmark datasets and scheduled regression runs.
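The first item above, swapping the heuristic LLMClient for a hosted or local model, is easiest when callers depend on an interface rather than a concrete class. The sketch below illustrates that adapter pattern with `typing.Protocol`. Only the name `LLMClient` comes from this README; the `complete` method, the two example classes, and `draft_answer` are hypothetical.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Assumed interface: one method that maps a prompt to a completion."""

    def complete(self, prompt: str) -> str: ...


class HeuristicLLMClient:
    """No-API-key fallback: returns the first non-empty evidence line."""

    def complete(self, prompt: str) -> str:
        _, _, evidence = prompt.partition("Evidence:")
        lines = [line.strip() for line in evidence.splitlines() if line.strip()]
        return lines[0] if lines else "insufficient_evidence"


class CannedLLMClient:
    """Stand-in for a hosted adapter; a real one would call a provider SDK here."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


def draft_answer(client: LLMClient, question: str, evidence: list[str]) -> str:
    """Build a prompt and delegate to whichever client was injected."""
    prompt = f"Question: {question}\nEvidence:\n" + "\n".join(evidence)
    return client.complete(prompt)
```

Because `draft_answer` only requires a `complete` method, a new provider adapter can be added without changing any calling code.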
app/
  agents/         task, research, monitor, and entity-discovery agents
  api/            FastAPI routes and dependency wiring
  core/           settings, schemas, scoring, task abstractions
  evaluation/     benchmark runner
  extraction/     webpage extraction and schema inference
  memory/         SQLite and vector memory
  models/         LLM client adapters
  observability/  metrics registry
  search/         providers, credibility scoring, ranking engine
  verification/   claim extraction and truth scoring
examples/         HTTP and Python examples
tests/            unit tests