Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
5bc1d13
chore: add pypdf dependency for PDF document support
soneylegal Jun 4, 2026
84c268f
feat: compile StateGraph with MemorySaver and update AgentState with …
soneylegal Jun 4, 2026
0916ade
feat: partition VectorStoreService by tenant using dynamic collections
soneylegal Jun 4, 2026
a0eec09
feat: isolate semantic cache lookup and updates by tenant_id
soneylegal Jun 4, 2026
e5b26e6
feat: refactor all graph nodes to be async with history support and t…
soneylegal Jun 4, 2026
44a0c37
feat: add tenant_id and session_id to ChatRequest schema and add Docu…
soneylegal Jun 4, 2026
4cfe694
feat: add post chat/stream sse endpoint and support session checkpoin…
soneylegal Jun 4, 2026
ac8c7d1
feat: add api/v1/documents endpoint supporting markdown and PDF uploa…
soneylegal Jun 4, 2026
a61ddf1
feat: register documents router in main.py
soneylegal Jun 4, 2026
3a9c8e0
chore: implement multi-stage Docker build targeting development stage
soneylegal Jun 4, 2026
bfde717
chore: add docker targets to Makefile for unified verification inside…
soneylegal Jun 4, 2026
8aedfd0
chore: add langchain-text-splitters explicitly to dependencies
soneylegal Jun 4, 2026
d563b5e
chore: add python-multipart to dependencies
soneylegal Jun 4, 2026
b6d4cfd
test: add test suite v0.5.0 and fix semantic cache lookup filters
soneylegal Jun 4, 2026
2b33ca7
test: refactor fallback test to be async
soneylegal Jun 4, 2026
4a50ed6
test: align chaos tests with async workflow changes
soneylegal Jun 4, 2026
138e71c
test: use real Document in chaos generation test for serialization
soneylegal Jun 4, 2026
bc70d7d
docs: update developer guide and readme, fix style formatting, mypy t…
soneylegal Jun 5, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 11 additions & 8 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
FROM python:3.12-slim
FROM python:3.12-slim AS base

WORKDIR /app

Expand All @@ -7,16 +7,19 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*

# Copy project files first, then install
COPY pyproject.toml README.md ./
COPY src/ ./src/

# Install the package (non-editable for production)
RUN pip install --no-cache-dir .

# Copy remaining files (data, scripts, etc.)
# ---- Development stage ----
FROM base AS development
RUN pip install --no-cache-dir -e ".[dev]"
COPY . .

EXPOSE 8000
CMD ["uvicorn", "src.app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

# ---- Production stage ----
FROM base AS production
COPY src/ ./src/
RUN pip install --no-cache-dir .
COPY . .
EXPOSE 8000
CMD ["uvicorn", "src.app.main:app", "--host", "0.0.0.0", "--port", "8000"]
26 changes: 25 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
.PHONY: install test lint format run ingest ci docker clean
.PHONY: install test lint format run ingest ci docker clean docker-lint docker-format docker-typecheck docker-test docker-ci docs docker-docs

# ── Development ─────────────────────────────────────────────────────────

Expand Down Expand Up @@ -43,6 +43,30 @@ up:
down:
docker compose down

docker-lint:
docker compose run --rm api ruff check .

docker-format:
docker compose run --rm api ruff format .
docker compose run --rm api ruff check --fix .

docker-typecheck:
docker compose run --rm api mypy src/ --ignore-missing-imports

docker-test:
docker compose run --rm api pytest tests/ -v

docker-ci: docker-lint docker-typecheck docker-test
@echo "✓ All Docker CI checks passed."

# ── Documentation ───────────────────────────────────────────────────────

docs:
mkdocs serve

docker-docs:
docker compose run --rm -p 8001:8001 api mkdocs serve -a 0.0.0.0:8001

# ── Cleanup ─────────────────────────────────────────────────────────────

clean:
Expand Down
69 changes: 25 additions & 44 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,8 +60,11 @@ graph TD
## 🚀 Key Features

* **Self-Corrective RAG (CRAG)**: LangGraph state machine dynamically grades retrieved documentation relevance, filters out noise, and initiates rewrite loops for failed queries.
* **Real-time SSE Streaming**: High-performance Server-Sent Events (SSE) streaming API (`POST /api/v1/chat/stream`) yielding token-by-token generation chunks and final structured references.
* **Multi-Tenant Isolation**: Physical namespace partitioning for ChromaDB collections (`vortex_kb_{tenant_id}`) and dynamic cache lookup isolating tenant context.
* **In-Memory Ingestion API**: Endpoint (`POST /api/v1/documents`) supporting hot-loading of `.md` and `.pdf` files directly into tenant vector stores.
* **Bring Your Own Key (BYOK)**: Supports dynamic, request-level API credentials and provider routing via HTTP headers (`Authorization`, `X-API-Key`, `X-Provider`).
* **Zero-Cost Local Semantic Cache**: Persistent ChromaDB-backed similarity cache using a shared local Sentence-Transformers embeddings model. It features provider-level partition isolation to avoid cross-provider context leaks.
* **Zero-Cost Local Semantic Cache**: Persistent ChromaDB-backed similarity cache using a shared local Sentence-Transformers embeddings model. It features provider-level and tenant-level partition isolation to avoid context leaks.
* **Chaos Engineering Resilience**: Complete exception shielding across all graph nodes (ChromaDB down, LLM timeouts, grading exceptions) with fast-bypass routing to fallbacks, guaranteeing zero HTTP 500 crashes.
* **Model-Agnostic Engine**: Native support for Google Gemini, Anthropic Claude, and local Ollama models with seamless environment-level fallback.
* **Observability**: Integrated OpenTelemetry/OpenInference telemetry compatible with Arize Phoenix for step-by-step agent execution tracing.
Expand All @@ -84,78 +87,56 @@ graph TD

## 📖 Live Documentation Portal

Vortex includes a fully-featured documentation portal built with **MkDocs-Material**. To view the complete architectural guides, step-by-step setups, caching metrics, and API design specifications:
Vortex includes a fully-featured documentation portal built with **MkDocs-Material**:

```bash
# Run local live-reload documentation server
mkdocs serve
make docs # Local live-reload server (requires venv)
make docker-docs # Serve docs inside Docker container
```
Then visit [http://localhost:8000](http://localhost:8000) in your browser.
Then visit [http://localhost:8001](http://localhost:8001) in your browser.

---

## 🏁 Quick Start

### 1. Installation
Clone the repository and install the development dependencies in editable mode:

```bash
# 1. Clone and configure
git clone https://github.com/soneylegal/vortex.git
cd vortex
cp .env.example .env # Fill in your LLM API keys

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install requirements
pip install -e ".[dev]"
```

### 2. Configuration
Copy the configuration template and fill in your model API credentials:
```bash
cp .env.example .env
```

### 3. Ingest Documentation
Incorporate the Cortex serverless data pipeline and Sentinel monitoring manuals into the vector database:
```bash
make ingest
```

### 4. Run the API Server
Start the FastAPI server:
```bash
make run
# 2. Start the full stack
docker compose up -d
```
* **API Endpoint**: `http://localhost:8000`
* **Interactive Swagger UI**: `http://localhost:8000/docs`
* **Phoenix Tracing Dashboard**: `http://localhost:6006`

> For local development without Docker (venv, pip install, manual server), see the [Developer Guide](docs/developer-guide.md).

---

## 🧪 Development & Quality Checks

Maintain repository standards by executing the local check suite before committing changes:
Maintain repository standards by executing the check suite:

### Local Quality Tools
```bash
make lint # Run Ruff code linter
make format # Run Ruff code formatter check
make format # Run Ruff code formatter and check
make typecheck # Run Mypy static type verification
make test # Run Pytest suite (including chaos and caching tests)
make test # Run Pytest suite
make ci # Run all linting, typechecking, and tests in one command
```

---

## 🐳 Docker Compose (With Observability Stack)

Spin up the entire stack including the API and the Arize Phoenix tracing UI:

### 🐳 Docker Quality Tools
```bash
docker compose up -d
make docker-lint # Lint inside container
make docker-format # Format inside container
make docker-typecheck # Typecheck inside container
make docker-test # Run tests inside container
make docker-ci # Run all linting, typechecking, and tests inside container
```
* **API Portal**: `http://localhost:8000`
* **Phoenix Dashboard**: `http://localhost:6006`

---

Expand Down
5 changes: 3 additions & 2 deletions docker-compose.yml
Original file line number Diff line number Diff line change
@@ -1,15 +1,16 @@
version: '3.8'

services:
api:
build:
context: .
dockerfile: Dockerfile
target: development
ports:
- "8000:8000"
volumes:
- ./src:/app/src
- ./data:/app/data
- ./tests:/app/tests
- ./pyproject.toml:/app/pyproject.toml
env_file:
- .env
environment:
Expand Down
36 changes: 24 additions & 12 deletions docs/architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,33 +42,38 @@ graph TD

## 🧠 State Representation (`AgentState`)

The workflow maintains a single state object, `AgentState` (`TypedDict`), which is passed and updated between nodes:
The workflow maintains a single state object, `AgentState` (`TypedDict`), which is passed and updated between nodes. The graph is compiled with a `MemorySaver` checkpointer to persist state across multi-turn conversations:

```python
class AgentState(TypedDict):
question: str # The user's query (potentially rewritten)
generation: str # The final generated response
documents: list[Document] # Retrieved and filtered knowledge documents
steps: list[str] # Execution steps for auditability and tracing
route: str # Router classification ("retrieve", "direct", "fallback")
retry_count: int # Counter protecting against infinite loops
api_key: NotRequired[str] # Dynamic client-provided API key
provider: NotRequired[str] # Selected model provider (gemini, anthropic, ollama)
error: NotRequired[str] # Error message propagated to fallback nodes
question: str # The user's query (potentially rewritten)
generation: str # The final generated response
documents: list[Document] # Retrieved and filtered knowledge documents
steps: list[str] # Execution steps for auditability and tracing
route: str # Router classification ("retrieve", "direct", "fallback")
retry_count: int # Counter protecting against infinite loops
api_key: NotRequired[str] # Dynamic client-provided API key
provider: NotRequired[str] # Selected model provider (gemini, anthropic, ollama)
error: NotRequired[str] # Error message propagated to fallback nodes
tenant_id: NotRequired[str | None] # Tenant namespace for collection isolation
history: NotRequired[list[dict[str, str]]] # Conversation history for multi-turn sessions
```

---

## 🧱 Workflow Nodes & Operations

All nodes are **fully asynchronous** (`async def`) and use `await` for LLM and I/O operations.

### 1. Router Node (`router_node`)
Classifies incoming queries into technical support questions needing retrieval (`retrieve`) or general conversational questions (`direct`).
* **Implementation**: Utilizes structured LLM outputs (`RouteDecision` schema).
* **State Reset**: Clears execution keys (`steps`, `retry_count`, `documents`, `generation`, `error`) at each invocation to prevent carry-over from previous checkpointed turns.
* **Error Handling**: If the LLM call fails, the node sets `route` to `"fallback"` and populates the `error` state.

### 2. Retrieve Node (`retrieve_node`)
Queries the local ChromaDB vector store for documents related to the current question.
* **Implementation**: Performs vector search using `HuggingFaceEmbeddings` distance search.
* **Implementation**: Performs async vector search via `asyncio.to_thread`, targeting the tenant-specific collection (`vortex_kb_{tenant_id}`).
* **Error Handling**: Captures any connection errors to ChromaDB, clears document state, and sets `error` to trigger fallback routing.

### 3. Grade Documents Node (`grade_documents_node`)
Expand All @@ -84,7 +89,14 @@ Optimizes the search query when retrieved documents are found irrelevant.
### 5. Generate Node (`generate_node`)
Synthesizes the final response using the user's question and relevant retrieved context.
* **System Prompt**: Enforces boundaries, instructing the LLM to only answer based on context and state if it doesn't know the answer.
* **Tags**: Generation calls are tagged with `"generate_answer"` for SSE stream filtering.
* **History**: Appends the current turn to the conversation `history` list.

### 6. Direct Response Node (`direct_response_node`)
Handles general queries that don't require document retrieval.
* **History**: Appends the current turn to the conversation `history` list.

### 6. Fallback Node (`fallback_node`)
### 7. Fallback Node (`fallback_node`)
The terminal safety node. Returns a polite, helpful explanation to the user.
* **Dual Mode**: Differentiates between a *knowledge base miss* (no docs found after retries) and a *system exception* (e.g., ChromaDB offline, LLM api timeouts).

62 changes: 56 additions & 6 deletions docs/developer-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -96,14 +96,13 @@ Once running, the interactive Swagger documentation is available at:

## 🧪 Running Tests

### Unit & Integration Tests
Vortex is equipped with 26 automated tests verifying routing logic, BYOK headers, semantic cache, vector store, and chaos fallbacks:
### Local Quality Tools
Vortex is equipped with automated tests verifying routing logic, BYOK headers, semantic cache, vector store, and chaos fallbacks:

```bash
pytest tests/ -v
```

### Style & Types Check
Ensure style and type alignment using Ruff and Mypy before pushing code:

```bash
Expand All @@ -117,18 +116,69 @@ ruff format --check .
mypy src/ --ignore-missing-imports
```

### 🐳 Docker Quality Tools
Run the quality suite inside standard development containers:

```bash
# Lint inside container
make docker-lint

# Format and check inside container
make docker-format

# Typecheck inside container
make docker-typecheck

# Run tests inside container
make docker-test

# Run full CI inside container
make docker-ci
```

---

## 👥 Multi-Tenancy & Streaming APIs

Vortex supports dynamic namespace isolation and real-time streaming:

### 1. Document Ingestion API (`POST /api/v1/documents`)
Upload PDF or Markdown files isolated under a specific `tenant_id`:

```bash
curl -X POST http://localhost:8000/api/v1/documents \
-F "file=@manual.pdf" \
-F "tenant_id=tenant-alpha"
```

This chunks the document and indexes it into the isolated `vortex_kb_tenant-alpha` collection namespace in ChromaDB.

### 2. Conversation Chat Streaming API (`POST /api/v1/chat/stream`)
Stream responses token-by-token using Server-Sent Events (SSE), maintaining separate session context with `session_id` and isolating data via `tenant_id`:

```bash
curl -X POST http://localhost:8000/api/v1/chat/stream \
-H "Content-Type: application/json" \
-d '{
"query": "How do I configure Sentinel?",
"tenant_id": "tenant-alpha",
"session_id": "session-123"
}'
```

---

## 📚 Building the Documentation Portal

To edit and preview this documentation portal locally:

```bash
# Run local live-reload server
mkdocs serve
# Via Makefile (recommended)
make docs # Local live-reload server (requires venv)
make docker-docs # Serve docs inside Docker container
```

Access the preview at [http://127.0.0.1:8000](http://127.0.0.1:8000).
Access the preview at [http://localhost:8001](http://localhost:8001).

To compile the documentation into static HTML files (suitable for GitHub Pages):

Expand Down
13 changes: 9 additions & 4 deletions docs/features/caching.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,16 @@ To maintain a zero-cost stack, the semantic cache does not require Redis or clou

---

## 🔒 Provider Partition Isolation
## 🔒 Provider & Tenant Partition Isolation

To prevent returning a response generated by one model provider to a client executing a different provider, the cache collection partitions entries:
* Entries are tagged with `provider` metadata.
* Lookup queries apply a metadata filter ensuring only matches from the *active provider* are checked.
To prevent returning a response generated by one model provider or tenant to another, the cache collection partitions entries using a compound filter:
* Entries are tagged with `provider` and `tenant_id` metadata.
* Lookup queries apply a `$and` metadata filter ensuring only matches from the *active provider* **and** the *active tenant* are checked.
* If no `tenant_id` is provided, lookups default to the `"default"` namespace.

### Session-Based Cache Bypassing

For multi-turn conversations (requests with a `session_id`), the semantic cache is **automatically bypassed**. This prevents stale cached responses from polluting ongoing dialogues where context evolves with each turn.

---

Expand Down
Loading
Loading