InsightRush | Approximate Query Processing Engine

High-performance approximate query processing (AQP) platform for interactive analytics on large datasets with configurable accuracy.

Overview

InsightRush is a local-first analytics system that enables fast query execution on large CSV datasets using statistical sampling techniques. It supports both exact and approximate execution paths, allowing users to trade accuracy for performance with measurable confidence.

The system is designed for experimentation, benchmarking, and understanding query-performance tradeoffs rather than production deployment.

Key Features

- Approximate Query Processing (AQP): execute SUM, AVG, COUNT, MIN, MAX with configurable error bounds and confidence levels
- Exact vs Approximate Comparison: side-by-side execution to evaluate speed vs accuracy tradeoffs
- CSV Data Ingestion Pipeline: upload and process large datasets into DuckDB
- Interactive Query Workbench: UI-driven query building with execution metadata
- Statistical Estimation Engine: sampling + estimators with confidence-aware outputs
- System Monitoring: track execution time, sampling rate, and engine behavior
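
To make "configurable error bounds and confidence levels" concrete, here is a minimal sketch of a sample-based AVG and SUM estimator with a normal-approximation confidence interval. It is an illustration only, not the code in backend/engine; the function name `estimate_avg_and_sum` and its parameters are hypothetical, and it assumes a simple random sample drawn without replacement.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def estimate_avg_and_sum(sample, population_size, confidence=0.95):
    """Estimate AVG and SUM of a population from a simple random sample.

    Hypothetical helper for illustration; uses a normal approximation.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled rows")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    avg = mean(sample)
    # Finite population correction for sampling without replacement.
    if population_size > 1:
        fpc = math.sqrt((population_size - n) / (population_size - 1))
    else:
        fpc = 0.0
    margin = z * stdev(sample) / math.sqrt(n) * fpc
    return {
        "avg": avg,
        "avg_ci": (avg - margin, avg + margin),
        # SUM is the estimated mean scaled up to the full row count.
        "sum": avg * population_size,
        "sum_ci": ((avg - margin) * population_size,
                   (avg + margin) * population_size),
    }


random.seed(0)
population = [random.gauss(100, 15) for _ in range(100_000)]
sample = random.sample(population, 2_000)  # a 2% sample
result = estimate_avg_and_sum(sample, len(population))
print(result["avg"], result["avg_ci"])
```

Note that this kind of interval applies to SUM, AVG, and (with a similar scaling argument) COUNT; MIN and MAX cannot be bounded this way from a uniform sample, since the extreme rows may simply not be sampled.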

Tech Stack

Backend

FastAPI (API layer)
DuckDB (analytical database)
Pandas (data ingestion)
Uvicorn (ASGI server)

Frontend

Next.js (App Router)
React
Tailwind CSS
Recharts (visualizations)

Architecture

The AQP engine follows a layered execution model:

- Ingestion Layer: parses and loads CSV datasets into DuckDB
- Validation Layer: validates schema, columns, and query parameters
- Sampling Layer: generates sampled subsets sized to meet the requested accuracy target
- Estimation Layer: computes aggregate estimates together with confidence-aware error bounds
- Execution Layer: chooses between exact and approximate execution
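
As a rough sketch of how the Sampling and Execution layers could interact, the snippet below sizes a sample from an accuracy target and then decides whether sampling is worthwhile. The names `required_sample_size`, `choose_path`, and the `max_fraction` threshold are assumptions made for illustration; the actual engine may use different rules. The standard deviation and mean are assumed to come from a small pilot sample.

```python
import math
from statistics import NormalDist


def required_sample_size(stddev, mean, rel_error, confidence=0.95):
    """Rows needed so an AVG estimate lands within rel_error of the mean.

    Uses the normal-approximation formula n = (z * stddev / margin)^2.
    """
    margin = rel_error * abs(mean)
    if margin <= 0:
        raise ValueError("rel_error and mean must give a positive margin")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * stddev / margin) ** 2)


def choose_path(total_rows, needed_rows, max_fraction=0.5):
    """Pick 'exact' when the sample would cover most of the table anyway."""
    if needed_rows >= total_rows * max_fraction:
        return "exact"
    return "approximate"


# Pilot statistics (assumed values): stddev 15, mean 100, 1% target error.
needed = required_sample_size(stddev=15, mean=100, rel_error=0.01)
print(needed, choose_path(1_000_000, needed), choose_path(1_000, needed))
```

With these inputs the required sample is under a thousand rows, so a million-row table takes the approximate path while a thousand-row table falls back to exact execution.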

Project Structure

insightRush/
├── backend/
│   ├── api/           # FastAPI routes
│   ├── engine/        # AQP engine (sampler, estimator, executor)
│   ├── storage/       # Ingestion + DB handling
│   └── main.py
├── src/               # Next.js frontend
├── data/              # Sample datasets
├── tmp_uploads/       # Uploaded CSVs
└── generate_data.py   # Synthetic dataset generator

Running Locally

Backend

python -m venv venv
venv\Scripts\activate        (Windows; on macOS/Linux use: source venv/bin/activate)
pip install -r requirements.txt
uvicorn backend.main:app --reload --port 8000

Frontend

npm install
npm run dev

App runs on:

Frontend: http://localhost:3000
Backend: http://localhost:8000

Usage

Upload a CSV dataset
Select aggregation type (SUM / AVG / COUNT / MIN / MAX)
Configure accuracy target
Execute query
Compare approximate vs exact results
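
The final comparison step can be summarized with a few simple metrics. The sketch below is one possible way to score an approximate result against the exact one; `Comparison` and `compare_results` are hypothetical names, not part of the project's API.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    abs_error: float
    rel_error: float
    within_interval: bool  # did the confidence interval contain the exact value?
    speedup: float         # exact time divided by approximate time


def compare_results(exact, approx, ci, exact_seconds, approx_seconds):
    """Score an approximate aggregate against its exact counterpart."""
    abs_error = abs(approx - exact)
    if exact != 0:
        rel_error = abs_error / abs(exact)
    else:
        # Relative error is undefined for a zero exact value.
        rel_error = 0.0 if abs_error == 0 else float("inf")
    low, high = ci
    speedup = exact_seconds / approx_seconds if approx_seconds > 0 else float("inf")
    return Comparison(abs_error, rel_error, low <= exact <= high, speedup)


report = compare_results(
    exact=1000.0, approx=990.0, ci=(970.0, 1010.0),
    exact_seconds=2.0, approx_seconds=0.25,
)
print(report)
```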

Benchmarking & Scale

Designed for local experimentation with large datasets
Supports testing with synthetic data via generate_data.py
Performance depends on:
- dataset size
- sampling rate
- system memory and CPU
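
To explore how sampling rate affects error and run time, a small sweep like the following can help. This is a pure-Python stand-in written for illustration: it does not call generate_data.py or DuckDB, the `benchmark` and `time_call` helpers are hypothetical, and its timings say nothing about the real engine.

```python
import random
import time


def time_call(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    value = fn()
    return value, time.perf_counter() - start


def benchmark(values, sampling_rates, seed=0):
    """Compare an exact AVG with sampled AVGs at several sampling rates."""
    rng = random.Random(seed)
    exact, exact_seconds = time_call(lambda: sum(values) / len(values))
    rows = []
    for rate in sampling_rates:
        k = max(1, round(len(values) * rate))

        def approx_avg():
            sample = rng.sample(values, k)  # uniform, without replacement
            return sum(sample) / k

        approx, seconds = time_call(approx_avg)
        rows.append({
            "rate": rate,
            "sample_rows": k,
            "rel_error": abs(approx - exact) / abs(exact),
            "seconds": seconds,
        })
    return exact_seconds, rows


data_rng = random.Random(42)
values = [data_rng.uniform(0, 100) for _ in range(200_000)]
exact_seconds, rows = benchmark(values, [0.01, 0.05, 0.10])
for row in rows:
    print(row)
```

For reproducible comparisons, fix the seed, repeat each measurement several times, and report the dataset size alongside the timings.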

This project focuses on query performance tradeoffs, not production-scale distributed systems.

Security Notes

Input validation is implemented for schema and query parameters
Not production-hardened:
- No authentication
- No rate limiting
- Open local usage model
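
As an illustration of what validating schema and query parameters can look like, here is a minimal sketch. The function `validate_query`, its parameter names, and the specific rules are assumptions for this example rather than the project's actual checks.

```python
import re

# Aggregations listed under Key Features.
ALLOWED_AGGREGATIONS = {"SUM", "AVG", "COUNT", "MIN", "MAX"}
# Conservative identifier pattern: letters, digits, underscores only.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_query(aggregation, column, schema_columns, confidence, max_rel_error):
    """Return a list of validation errors; an empty list means the query is OK."""
    errors = []
    if aggregation.upper() not in ALLOWED_AGGREGATIONS:
        errors.append(f"unsupported aggregation: {aggregation!r}")
    if not IDENTIFIER.fullmatch(column):
        errors.append(f"invalid column name: {column!r}")
    elif column not in schema_columns:
        errors.append(f"unknown column: {column!r}")
    if not 0 < confidence < 1:
        errors.append("confidence must be between 0 and 1 (exclusive)")
    if not 0 < max_rel_error < 1:
        errors.append("max_rel_error must be between 0 and 1 (exclusive)")
    return errors


ok = validate_query("avg", "price", {"price", "qty"}, 0.95, 0.05)
bad = validate_query("DROP", "price; --", {"price"}, 1.5, 0.05)
print(ok, bad)
```

Checking column names against the ingested schema, instead of interpolating user input into SQL, is one common way to reduce injection risk even in a local-only tool.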

Limitations

Local deployment only (not hosted)
Single-node execution (DuckDB)
No persistent multi-user data isolation
Limited query types (aggregations only)

Future Improvements

Add authentication and dataset isolation
Introduce query caching
Add benchmarking scripts with reproducible results
Improve validation and query safety controls
Extend to additional query types and joins

License

MIT License
