1 change: 1 addition & 0 deletions .gitignore
@@ -54,6 +54,7 @@ Thumbs.db

# Docs
docs/_build/
docs/superpowers/
claude_notes.md
claude.md

43 changes: 32 additions & 11 deletions README.md
@@ -8,7 +8,8 @@ This package follows [LangChain's official integration guidelines](https://pytho

- **LangChain & LangGraph Integration**: First-class support for modern LLM frameworks
- **Vector Store Agnostic**: Compatible with Pinecone, FAISS, Weaviate, Chroma, and more
- **Post-Filter Authorization**: Filters retrieved documents based on SpiceDB permissions
- **Post-Filter Authorization**: Retrieve semantically, then filter by SpiceDB permissions
- **Pre-Filter Authorization**: Fetch authorized resource IDs via LookupResources first, then run a filtered vector store search — ideal when users have access to a small fraction of a large corpus
- **Efficient Bulk Permissions**: Uses SpiceDB's native bulk API for optimal performance
- **Observable**: Returns detailed metrics about authorization decisions
- **Type-Safe**: Full type hints for better IDE support
@@ -19,24 +20,30 @@ This package follows [LangChain's official integration guidelines](https://pytho
Most RAG pipelines retrieve documents without considering user permissions. This package solves that by:

1. **Post-retrieval filtering**: Retrieve best semantic matches first, then filter by permissions
2. **Deterministic authorization**: Every document is checked against SpiceDB before being used
3. **Framework integration**: Native LangChain and LangGraph components for seamless integration
4. **Vector store agnostic**: Not tied to any specific vector database
2. **Pre-retrieval filtering**: Fetch all resource IDs the user can access via SpiceDB's `LookupResources` API, then run a filtered vector store search — no unauthorized documents are retrieved
3. **Deterministic authorization**: Every document is checked against SpiceDB before being used
4. **Framework integration**: Native LangChain and LangGraph components for seamless integration
5. **Vector store agnostic**: Not tied to any specific vector database
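
To make the difference between the two filtering orders concrete, here is a minimal sketch on a toy in-memory corpus. It is not the package's implementation: `SCORES`, `ALLOWED`, and the helper functions are hypothetical names, and similarity search is faked with fixed relevance scores.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy data: document ID -> relevance score for one query.
SCORES: Dict[str, float] = {"123": 0.9, "456": 0.8, "789": 0.7, "101": 0.6}
# Hypothetical set of IDs the user may view (what SpiceDB would report).
ALLOWED: Set[str] = {"789", "101"}


def top_k(ids: Iterable[str], k: int) -> List[str]:
    """Return the k highest-scoring IDs, best first."""
    return sorted(ids, key=lambda doc_id: SCORES[doc_id], reverse=True)[:k]


def post_filter(k: int) -> List[str]:
    """Search the whole corpus first, then drop unauthorized hits."""
    candidates = top_k(SCORES, k)
    return [doc_id for doc_id in candidates if doc_id in ALLOWED]


def pre_filter(k: int) -> List[str]:
    """Restrict the search to authorized IDs, then rank."""
    return top_k(ALLOWED, k)


post_hits = post_filter(2)  # both top-2 hits are unauthorized, so nothing survives
pre_hits = pre_filter(2)    # the search only ever sees authorized documents
```

With `k=2`, the post-filter path returns nothing because both top matches are unauthorized, while the pre-filter path still fills its result set from the documents the user can see. This is the situation the pre-filter pattern targets: a user with access to a small slice of the corpus.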

## Which Component Should I Use?

Choose the right component based on your use case:

| Component | Use Case | Best For |
|-----------|----------|----------|
| **SpiceDBRetriever** | Simple RAG pipelines | Drop-in replacement for any retriever. Wraps your existing retriever with authorization. |
| **SpiceDBAuthFilter** | LangChain chains with middleware | Filtering documents in the middle of a chain. Reusable across different users via `config`. |
| **create_auth_node** | LangGraph workflows | Complex multi-step workflows with state management. Provides authorization metrics in state. |
| **SpiceDBPermissionTool** | Agentic workflows | Give agents the ability to check permissions before taking actions. |
| **SpiceDBBulkPermissionTool** | Agentic workflows (batch) | Same as above but for checking multiple resources at once. |
| Component | Pattern | Use Case |
|-----------|---------|----------|
| **SpiceDBRetriever** | Post-filter | Simple RAG pipelines. Drop-in replacement for any retriever. Retrieves semantically then filters by permission. Best when users have broad access. |
| **SpiceDBPreFilterRetriever** | Pre-filter | Use when users can only access a small fraction of a large corpus. Fetches authorized IDs from SpiceDB first, then runs a filtered vector search. Requires a `filter_factory` matching your vector store's filter syntax. |
| **SpiceDBAuthFilter** | Post-filter | LangChain chains with middleware. Filtering documents in the middle of a chain. Reusable across different users via `config`. |
| **create_auth_node** | Post-filter | LangGraph workflows. Complex multi-step workflows with state management. Provides authorization metrics in state. |
| **SpiceDBPermissionTool** | Check | Agentic workflows. Give agents the ability to check a single permission before taking actions. |
| **SpiceDBBulkPermissionTool** | Check | Agentic workflows (batch). Same as above but for checking multiple resources at once. |

### Quick Decision Guide

**Pre-filter vs Post-filter:**
- Use **post-filter** (`SpiceDBRetriever`, `SpiceDBAuthFilter`) when users have access to most documents. Semantic search quality is highest because all documents are candidates.
- Use **pre-filter** (`SpiceDBPreFilterRetriever`) when users have access to a small subset of a large corpus. Avoids retrieving unauthorized content entirely. Requires knowing your vector store's filter syntax.
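
Since the pre-filter path depends on the vector store's filter syntax, a small helper for building a `filter_factory` may be useful. The sketch below assumes the `$in`-style metadata filter shown in this PR's README example. `make_in_filter_factory` is a hypothetical helper, not part of the package, and other stores may expect a different filter shape.

```python
from typing import Any, Callable, Dict, Iterable, List

# A filter factory maps authorized resource IDs to search keyword arguments.
FilterFactory = Callable[[List[str]], Dict[str, Any]]


def make_in_filter_factory(metadata_key: str) -> FilterFactory:
    """Build a factory for stores that accept {"key": {"$in": [...]}} filters."""

    def factory(ids: Iterable[str]) -> Dict[str, Any]:
        # De-duplicate while preserving the order in which IDs arrived.
        unique_ids = list(dict.fromkeys(ids))
        return {"filter": {metadata_key: {"$in": unique_ids}}}

    return factory


article_filter = make_in_filter_factory("article_id")
search_kwargs = article_filter(["123", "101", "123"])
```

The result has the same shape as the inline lambda in the README, so it could be passed as `filter_factory=article_filter` if the metadata key matches the one stored with your documents.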

**Use SpiceDBRetriever if:**
- You have a simple RAG pipeline
- You always use the same user per retriever instance and you don't need to reuse the retriever across different users
@@ -91,6 +98,20 @@ agent = create_agent(llm, tools, system_prompt="You are a helpful assistant.")
# Agent can check "Can user alice delete document 123?" and explain the result
```

**Pattern 5: SpiceDBPreFilterRetriever (pre-filter)**
```python
retriever = SpiceDBPreFilterRetriever(
    vector_store=vector_store,
    filter_factory=lambda ids: {"filter": {"article_id": {"$in": ids}}},
    subject_id="tim",
    resource_type="article",
    permission="view",
    spicedb_endpoint="localhost:50051",
    spicedb_token="sometoken",
)
chain = retriever | prompt | llm
```
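
The retriever's internals are not part of this diff, so the following is only a sketch of the flow that the example file's comments describe: look up authorized IDs, build the filter, then search. `fake_lookup_resources`, `ToyStore`, and `pre_filter_search` are hypothetical stand-ins. Two things here are assumptions: that the factory's dict is unpacked as keyword arguments into `asimilarity_search`, and that the search is skipped when no IDs are authorized.

```python
import asyncio
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for SpiceDB's LookupResources results.
GRANTS: Dict[str, List[str]] = {"tim": ["123", "101"]}


async def fake_lookup_resources(subject_id: str) -> List[str]:
    """Return the article IDs the subject may view (in-memory stub)."""
    return GRANTS.get(subject_id, [])


class ToyStore:
    """Toy store that understands a {"article_id": {"$in": [...]}} filter."""

    DOCS = {"123": "Python Basics", "456": "JavaScript Guide", "101": "SpiceDB Overview"}

    async def asimilarity_search(
        self, query: str, k: int = 4, filter: Optional[dict] = None
    ) -> List[str]:
        allowed = filter["article_id"]["$in"] if filter else list(self.DOCS)
        return [title for doc_id, title in self.DOCS.items() if doc_id in allowed][:k]


async def pre_filter_search(
    store: ToyStore,
    filter_factory: Callable[[List[str]], Dict[str, Any]],
    subject_id: str,
    query: str,
    k: int = 4,
) -> List[str]:
    ids = await fake_lookup_resources(subject_id)
    if not ids:
        return []  # assumption: nothing is authorized, so skip the search
    # Assumption: the factory's dict is passed through as keyword arguments.
    return await store.asimilarity_search(query, k=k, **filter_factory(ids))


factory = lambda ids: {"filter": {"article_id": {"$in": ids}}}
docs_for_tim = asyncio.run(pre_filter_search(ToyStore(), factory, "tim", "Tell me about SpiceDB"))
docs_for_eve = asyncio.run(pre_filter_search(ToyStore(), factory, "eve", "Tell me about SpiceDB"))
```

In this sketch `tim` gets only the two articles granted to him, and `eve`, who has no grants, gets an empty list without the store being queried.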

## Installation

```bash
215 changes: 215 additions & 0 deletions examples/pre_filter_example.py
@@ -0,0 +1,215 @@
"""
SpiceDBPreFilterRetriever Example - Pre-Filter Authorization RAG Pipeline

This example demonstrates how to use SpiceDBPreFilterRetriever to pre-filter
vector store searches using SpiceDB's LookupResources API.

Unlike SpiceDBRetriever (post-filter), this approach:
1. Calls SpiceDB first to get all resource IDs the user can access
2. Passes those IDs as a filter into the vector store search
3. Only retrieves documents the user is authorized to see

Use this pattern when users have access to a small fraction of a large corpus.
"""

import asyncio
import os
from typing import List
from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI

from langchain_spicedb import SpiceDBPreFilterRetriever

load_dotenv()


class MockVectorStore:
    """
    Mock vector store simulating Pinecone with metadata filter support.

    In a real application, replace this with:
        from langchain_pinecone import PineconeVectorStore
        knowledge = PineconeVectorStore.from_existing_index(
            index_name="my-index",
            embedding=OpenAIEmbeddings(...),
        )
    """

    async def asimilarity_search(
        self, query: str, k: int = 4, filter: dict = None
    ) -> List[Document]:
        """Return mock documents, filtered by article_id if filter is provided."""
        all_docs = [
            Document(
                page_content="Python is a high-level programming language known for simplicity.",
                metadata={"article_id": "123", "title": "Python Basics"},
            ),
            Document(
                page_content="JavaScript is the language of the web.",
                metadata={"article_id": "456", "title": "JavaScript Guide"},
            ),
            Document(
                page_content="Machine learning models can be trained on large datasets.",
                metadata={"article_id": "789", "title": "ML Introduction"},
            ),
            Document(
                page_content="SpiceDB is a database for fine-grained authorization.",
                metadata={"article_id": "101", "title": "SpiceDB Overview"},
            ),
        ]

        if filter and "article_id" in filter:
            authorized = filter["article_id"].get("$in", [])
            return [d for d in all_docs if d.metadata["article_id"] in authorized][:k]

        return all_docs[:k]


async def main():
    print("=" * 80)
    print("SpiceDBPreFilterRetriever Example - Pre-Filter Authorization RAG")
    print("=" * 80)
    print()

    spicedb_endpoint = os.getenv("SPICEDB_ENDPOINT", "localhost:50051")
    spicedb_token = os.getenv("SPICEDB_TOKEN", "somerandomkeyhere")
    subject_id = os.getenv("SUBJECT_ID", "tim")

    print("Configuration:")
    print(f" SpiceDB Endpoint: {spicedb_endpoint}")
    print(f" Subject (User): {subject_id}")
    print(" Resource Type: article")
    print(" Permission: view")
    print()
    print("Pattern: LookupResources → authorized IDs → vector store filter → docs")
    print()

    vector_store = MockVectorStore()

    # SpiceDBPreFilterRetriever:
    #   1. Calls LookupResources(subject=tim, permission=view, resource_type=article)
    #   2. Gets back e.g. ["123", "101"] (articles tim can view)
    #   3. Calls filter_factory(["123", "101"]) → {"filter": {"article_id": {"$in": ["123", "101"]}}}
    #   4. Calls vector_store.asimilarity_search(query, k=4, filter=...)
    #   5. Returns only authorized + semantically relevant documents
    retriever = SpiceDBPreFilterRetriever(
        vector_store=vector_store,
        filter_factory=lambda ids: {"filter": {"article_id": {"$in": ids}}},
        subject_id=subject_id,
        resource_type="article",
        permission="view",
        spicedb_endpoint=spicedb_endpoint,
        spicedb_token=spicedb_token,
        k=4,
    )

    llm = ChatOpenAI(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o-mini", temperature=0)

    prompt = ChatPromptTemplate.from_messages([
        (
            "system",
            "Answer questions based only on the provided context. "
            "If the context doesn't contain enough information, say so.",
        ),
        ("human", "Question: {question}\n\nContext:\n{context}"),
    ])

    def format_docs(docs):
        if not docs:
            return "No authorized documents found."
        return "\n\n".join(
            f"Document {i + 1}:\n{doc.page_content}" for i, doc in enumerate(docs)
        )

    rag_chain = (
        RunnableParallel({
            "context": retriever | format_docs,
            "question": RunnablePassthrough(),
        })
        | prompt
        | llm
        | StrOutputParser()
    )

    query = "Tell me about SpiceDB"
    print(f"Query: {query}")
    print("-" * 40)

    print(f"\nDocuments after pre-filter (user: {subject_id}):")
    authorized_docs = await retriever.ainvoke(query)
    if authorized_docs:
        for doc in authorized_docs:
            print(f" ✓ {doc.metadata['title']} (ID: {doc.metadata['article_id']})")
    else:
        print(" ✗ No authorized documents")

    print("\nLLM Answer:")
    answer = await rag_chain.ainvoke(query)
    print(answer)
    print()
    print("=" * 80)


async def demo_without_openai():
    """Demo showing document pre-filtering without requiring an LLM."""
    print("=" * 80)
    print("SpiceDBPreFilterRetriever Demo - Pre-Filter Only")
    print("=" * 80)
    print()

    spicedb_endpoint = os.getenv("SPICEDB_ENDPOINT", "localhost:50051")
    spicedb_token = os.getenv("SPICEDB_TOKEN", "somerandomkeyhere")
    subject_id = os.getenv("SUBJECT_ID", "tim")

    print(f"Looking up authorized articles for user: {subject_id}")
    print()

    vector_store = MockVectorStore()

    retriever = SpiceDBPreFilterRetriever(
        vector_store=vector_store,
        filter_factory=lambda ids: {"filter": {"article_id": {"$in": ids}}},
        subject_id=subject_id,
        resource_type="article",
        permission="view",
        spicedb_endpoint=spicedb_endpoint,
        spicedb_token=spicedb_token,
    )

    query = "programming languages"
    docs = await retriever.ainvoke(query)

    print(f"Documents returned for query '{query}':")
    if docs:
        for doc in docs:
            print(f" ✓ {doc.metadata['title']} (ID: {doc.metadata['article_id']})")
    else:
        print(" ✗ No authorized documents found")
    print()


if __name__ == "__main__":
    print()
    print("Prerequisites:")
    print("1. SpiceDB running on localhost:50051 (or set SPICEDB_ENDPOINT)")
    print("2. Set SPICEDB_TOKEN environment variable")
    print("3. SpiceDB schema with 'article' resource type and 'view' permission")
    print("4. Create relationships: zed relationship create article:123 viewer user:tim")
    print()
    print("Optional:")
    print("5. Set OPENAI_API_KEY for full RAG demo")
    print("6. Set SUBJECT_ID to test different users (default: tim)")
    print()
    print("=" * 80)
    print()

    if os.getenv("OPENAI_API_KEY"):
        asyncio.run(main())
    else:
        print("OpenAI API key not found. Running pre-filter demo without LLM...")
        print()
        asyncio.run(demo_without_openai())
4 changes: 2 additions & 2 deletions langchain_spicedb/__init__.py
@@ -46,7 +46,7 @@

# Import LangChain standard components (retrievers, tools)
try:
    from .retrievers import SpiceDBRetriever  # noqa: F401
    from .retrievers import SpiceDBRetriever, SpiceDBPreFilterRetriever  # noqa: F401

    _has_retrievers = True
except ImportError:
@@ -74,7 +74,7 @@
    __all__.extend(["SpiceDBAuthFilter", "SpiceDBAuthLambda"])

if _has_retrievers:
    __all__.extend(["SpiceDBRetriever"])
    __all__.extend(["SpiceDBRetriever", "SpiceDBPreFilterRetriever"])

if _has_tools:
    __all__.extend(["SpiceDBPermissionTool", "SpiceDBBulkPermissionTool"])