Commit c134511

feat: more querying implementation (#4)
2 parents 12dfc39 + 75411ce commit c134511

31 files changed

Lines changed: 2076 additions & 40 deletions

.github/workflows/ci.yml

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+name: CI
+
+on:
+  push:
+    branches: ["**"]
+  pull_request:
+
+jobs:
+  lint-type-unit:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install .[test,dev]
+      - name: Ruff
+        run: ruff check .
+      - name: Mypy
+        run: mypy paradedb
+      - name: Unit tests
+        run: python -m pytest tests/unit
+
+  integration:
+    runs-on: ubuntu-latest
+    services:
+      paradedb:
+        image: paradedb/paradedb:latest
+        ports:
+          - 5432:5432
+        env:
+          POSTGRES_PASSWORD: postgres
+          POSTGRES_USER: postgres
+          POSTGRES_DB: postgres
+        options: >-
+          --health-cmd "pg_isready -U postgres"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 12
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install .[test,dev]
+      - name: Install pg client
+        run: sudo apt-get update && sudo apt-get install -y postgresql-client
+      - name: Wait for ParadeDB
+        env:
+          PGPASSWORD: postgres
+        run: |
+          for i in {1..30}; do
+            pg_isready -h localhost -p 5432 -U postgres && exit 0
+            sleep 2
+          done
+          echo "ParadeDB did not become ready" >&2
+          exit 1
+      - name: Integration tests
+        env:
+          PARADEDB_TEST_DSN: postgres://postgres:postgres@localhost:5432/postgres
+        run: python -m pytest -m integration
+      - name: Run examples
+        env:
+          DATABASE_URL: postgresql+psycopg://postgres:postgres@localhost:5432/postgres
+        run: |
+          python examples/quickstart.py
+          python examples/autocomplete.py
+          python examples/more_like_this.py
+          python examples/faceted_search.py
+          python examples/hybrid_rrf.py
+          python examples/rag.py
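The "Wait for ParadeDB" step polls `pg_isready` up to 30 times, two seconds apart, and fails the job if the server never answers. A minimal Python sketch of the same bounded-polling pattern follows. `wait_until_ready` and `tcp_probe` are hypothetical helpers, not part of this repository, and a plain TCP connect is a weaker check than `pg_isready`: it only shows that the port accepts connections, not that PostgreSQL is ready for sessions.

```python
from __future__ import annotations

import socket
import time
from typing import Callable


def tcp_probe(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (weaker than pg_isready)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper mirroring the CI loop: call `probe` up to `attempts` times.

    Returns True as soon as the probe succeeds, False if every attempt fails.
    `sleep` is injectable so the loop can be tested without real waiting.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            # Sleep only between attempts; the shell loop also sleeps after the last failure.
            sleep(delay)
    return False
```

A local script could call `wait_until_ready(tcp_probe)` before running the examples; injecting `sleep` keeps the retry logic unit-testable without a database.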

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -4,3 +4,6 @@
 __pycache__/
 *.py[cod]
 *$py.class
+
+# macOS
+.DS_Store

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Changelog
+
+## Unreleased
+
+### Added
+
+- Full BM25 search/query helper set with advanced operators.
+- Facet and aggregation builders plus rows+facets helper.
+- Alembic custom operations and autogenerate render hooks.
+- Centralized validation helpers and expanded runtime guard errors.
+- Unit/integration suites for indexing, querying, facets, and migrations.
+- CI workflow for lint, typing, unit, and integration checks.
+- Example scripts for quickstart, facets, autocomplete, MLT, hybrid RRF, and RAG retrieval.

CONTRIBUTING.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Contributing
+
+## Setup
+
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -e ".[test,dev]"
+```
+
+## Run Checks
+
+```bash
+ruff check .
+mypy paradedb
+python -m pytest tests/unit
+PARADEDB_TEST_DSN=postgres://postgres:postgres@localhost:5432/postgres python -m pytest -m integration
+```
+
+## Guidelines
+
+- Keep helpers typed and composable with standard SQLAlchemy expressions.
+- Add integration tests for runtime behavior changes.
+- Add unit tests for SQL compilation and validation paths.
+- Preserve PostgreSQL-only safeguards for ParadeDB-specific expressions.
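The integration tests read `PARADEDB_TEST_DSN` in libpq form (`postgres://...`), while the examples use SQLAlchemy's `postgresql+psycopg://...` form. SQLAlchemy 1.4 and later no longer accept `postgres` as a dialect name, so a libpq-style DSN must be rewritten before it reaches `create_engine`. Whether the test suite already does this is not visible in this diff. The sketch below assumes a hypothetical `to_sqlalchemy_url` helper, uses only `urllib.parse`, and defaults to the `psycopg` (v3) driver named in `examples/common.py`.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit

# Scheme names libpq accepts for PostgreSQL connection URIs.
_LIBPQ_SCHEMES = {"postgres", "postgresql"}


def to_sqlalchemy_url(dsn: str, driver: str = "psycopg") -> str:
    """Rewrite a libpq-style DSN (postgres://...) into a SQLAlchemy URL (postgresql+driver://...).

    URLs that already name a driver (postgresql+<driver>://...) are returned unchanged.
    User, password, host, port, path, and query string are preserved verbatim.
    """
    parts = urlsplit(dsn)
    if parts.scheme not in _LIBPQ_SCHEMES:
        if parts.scheme.startswith("postgresql+"):
            return dsn
        raise ValueError(f"not a PostgreSQL DSN: {dsn!r}")
    return urlunsplit(parts._replace(scheme=f"postgresql+{driver}"))
```

Keeping the conversion in one place lets the CI workflow and local runs share a single libpq-style DSN.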

README.md

Lines changed: 128 additions & 1 deletion
@@ -1,3 +1,130 @@
 # sqlalchemy-paradedb
 
-SQLAlchemy query helpers for ParadeDB.
+Typed SQLAlchemy helpers for ParadeDB BM25 indexing and query composition.
+
+## Requirements
+
+- Python 3.10+
+- PostgreSQL with ParadeDB (`pg_search`) available
+- SQLAlchemy 2.x
+
+## Install
+
+```bash
+pip install sqlalchemy-paradedb
+```
+
+For local development:
+
+```bash
+pip install -e ".[test,dev]"
+```
+
+## Core Modules
+
+- `paradedb.sqlalchemy.indexing`: BM25 field definitions and tokenizer specs.
+- `paradedb.sqlalchemy.search`: ParadeDB predicates (`match_all`, `fuzzy`, `parse`, `more_like_this`, etc.).
+- `paradedb.sqlalchemy.pdb`: function wrappers (`score`, `snippet`, `snippets`, `agg`).
+- `paradedb.sqlalchemy.facets`: aggregate/facet JSON builders and rows+facets helper.
+- `paradedb.sqlalchemy.select_with`: select decorators for score/snippet columns.
+- `paradedb.sqlalchemy.alembic`: Alembic operations for BM25 index lifecycle.
+
+## Quickstart
+
+```python
+from sqlalchemy import Index, select
+from paradedb.sqlalchemy import indexing, search
+
+products_bm25_idx = Index(
+    "products_bm25_idx",
+    indexing.BM25Field(Product.id),
+    indexing.BM25Field(Product.description, tokenizer=indexing.tokenize.unicode(lowercase=True)),
+    indexing.BM25Field(Product.category, tokenizer=indexing.tokenize.literal()),
+    postgresql_using="bm25",
+    postgresql_with={"key_field": "id"},
+)
+
+products_bm25_idx.create(engine)
+
+stmt = select(Product.id, Product.description).where(search.match_any(Product.description, "running", "shoes"))
+```
+
+## Query APIs
+
+- Basic predicates: `match_all`, `match_any`, `term`, `phrase`, `fuzzy`, `regex`, `all`
+- Advanced predicates: `parse`, `phrase_prefix`, `regex_phrase`, `near`, `proximity`, `more_like_this`
+- Scoring/snippets: `pdb.score`, `pdb.snippet`, `pdb.snippets`, `select_with.score`, `select_with.snippet`
+- Aggregations/facets: `facets.*` builders and `pdb.agg(...)`
+- Rows + facets: `facets.with_rows(...)`
+
+## Facets
+
+```python
+from sqlalchemy import select
+from paradedb.sqlalchemy import facets, pdb, search
+
+stmt = (
+    select(
+        pdb.agg(facets.value_count(field="id")).label("count"),
+        pdb.agg(facets.avg(field="rating")).label("avg_rating"),
+    )
+    .select_from(Product)
+    .where(search.match_all(Product.description, "running"))
+)
+```
+
+## Alembic Operations
+
+Import the module once at migration environment startup (for example in `env.py`) so the operations are registered:
+
+```python
+import paradedb.sqlalchemy.alembic  # noqa: F401
+```
+
+Usage:
+
+```python
+op.create_bm25_index("products_bm25_idx", "products", ["id", "description"], key_field="id")
+op.reindex_bm25("products_bm25_idx", concurrently=True)
+op.drop_bm25_index("products_bm25_idx", if_exists=True)
+```
+
+## Validation and Guardrails
+
+- Search and facet builders validate option bounds and shapes at build time.
+- `select_with.snippet*` raises `SnippetWithFuzzyPredicateError` when used with fuzzy predicates.
+- `facets.with_rows` requires both `ORDER BY` and `LIMIT`, and can auto-inject a ParadeDB sentinel (`pdb.all()`).
+
+## Examples
+
+See `examples/`:
+
+- `quickstart.py`
+- `faceted_search.py`
+- `autocomplete.py`
+- `more_like_this.py`
+- `hybrid_rrf.py`
+- `rag.py`
+
+## Testing
+
+Unit tests:
+
+```bash
+python -m pytest tests/unit
+```
+
+Integration tests (require a running ParadeDB instance):
+
+```bash
+PARADEDB_TEST_DSN=postgres://postgres:postgres@localhost:5432/postgres python -m pytest -m integration
+```
+
+## CI
+
+The GitHub Actions workflow at `.github/workflows/ci.yml` runs:
+
+- Ruff lint
+- Mypy type check
+- Unit tests
+- Integration tests and example scripts against a ParadeDB service container
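When reviewing migrations it helps to know roughly which SQL the three Alembic operations correspond to. The renderer below is a hypothetical, stdlib-only sketch, not the library's implementation. The `CREATE INDEX ... USING bm25 (...) WITH (key_field='id')` shape is taken from the raw SQL in `examples/common.py`. `REINDEX INDEX CONCURRENTLY` is standard PostgreSQL 12+ syntax. The exact statements that `op.create_bm25_index`, `op.reindex_bm25`, and `op.drop_bm25_index` emit may differ, for example in quoting or extra options.

```python
from __future__ import annotations

from collections.abc import Sequence


def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a SQL string literal, escaping embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def render_create_bm25_index(name: str, table: str, columns: Sequence[str], key_field: str) -> str:
    """Hypothetical rendering of op.create_bm25_index(...) as a CREATE INDEX statement."""
    if not columns:
        raise ValueError("a BM25 index needs at least one column")
    cols = ", ".join(quote_ident(c) for c in columns)
    return (
        f"CREATE INDEX {quote_ident(name)} ON {quote_ident(table)} "
        f"USING bm25 ({cols}) WITH (key_field={quote_literal(key_field)})"
    )


def render_reindex(name: str, concurrently: bool = False) -> str:
    """Hypothetical rendering of op.reindex_bm25(...).

    PostgreSQL rejects REINDEX ... CONCURRENTLY inside a transaction block. In Alembic
    that usually means `with op.get_context().autocommit_block():`, unless the
    library already handles it.
    """
    return f"REINDEX INDEX {'CONCURRENTLY ' if concurrently else ''}{quote_ident(name)}"


def render_drop(name: str, if_exists: bool = False) -> str:
    """Hypothetical rendering of op.drop_bm25_index(...)."""
    return f"DROP INDEX {'IF EXISTS ' if if_exists else ''}{quote_ident(name)}"
```

Quoting every identifier keeps names that collide with SQL keywords safe, at the cost of making them case-sensitive.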

examples/autocomplete.py

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+from __future__ import annotations
+
+from sqlalchemy import Integer, String, Text, select
+from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column
+
+from common import engine_from_env, setup_products
+from paradedb.sqlalchemy import search
+
+
+class Base(DeclarativeBase):
+    pass
+
+
+class Product(Base):
+    __tablename__ = "products"
+
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    description: Mapped[str] = mapped_column(Text, nullable=False)
+    category: Mapped[str] = mapped_column(String(120), nullable=False)
+
+
+def main() -> None:
+    engine = engine_from_env()
+    setup_products(engine)
+
+    stmt = (
+        select(Product.id, Product.description)
+        .where(search.phrase_prefix(Product.description, ["running", "sh"]))
+        .order_by(Product.id)
+        .limit(10)
+    )
+
+    with Session(engine) as session:
+        for row in session.execute(stmt):
+            print(dict(row._mapping))
+
+
+if __name__ == "__main__":
+    main()
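`search.phrase_prefix(Product.description, ["running", "sh"])` matches documents in which the listed terms appear as a phrase and the last term is treated as a prefix, which is what makes it suitable for search-as-you-type. The pure-Python model below illustrates those semantics on the sample rows from `examples/common.py`. It assumes a simplified tokenizer (lowercase, split on non-alphanumeric characters) and ignores ParadeDB details such as prefix-expansion limits, so it illustrates the matching rule rather than how `pg_search` evaluates the query.

```python
from __future__ import annotations

import re
from collections.abc import Sequence

# Sample descriptions copied from PRODUCT_ROWS in examples/common.py.
DESCRIPTIONS = {
    1: "Sleek running shoes for daily training",
    2: "Trail running shoes with durable grip",
    3: "Wireless noise-canceling headphones",
    4: "Budget walking sneakers",
    5: "Artistic ceramic vase",
}


def tokenize(text: str) -> list[str]:
    """Simplified tokenizer: lowercase, then split on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def matches_phrase_prefix(text: str, terms: Sequence[str]) -> bool:
    """True if terms[:-1] occur consecutively and are followed by a token starting with terms[-1]."""
    if not terms:
        return False
    *exact, prefix = [t.lower() for t in terms]
    tokens = tokenize(text)
    n = len(exact)
    # Each candidate start needs n exact tokens plus one more token for the prefix.
    for start in range(len(tokens) - n):
        if tokens[start : start + n] == exact and tokens[start + n].startswith(prefix):
            return True
    return False


def phrase_prefix_ids(terms: Sequence[str]) -> list[int]:
    """Return the sorted ids of sample descriptions matching the phrase prefix."""
    return sorted(doc_id for doc_id, text in DESCRIPTIONS.items() if matches_phrase_prefix(text, terms))
```

With the example's terms, only the two "running shoes" rows qualify. A single-term list such as `["sne"]` degenerates to a plain prefix match.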

examples/common.py

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+from __future__ import annotations
+
+import os
+
+from sqlalchemy import text
+from sqlalchemy import create_engine
+from sqlalchemy.engine import Engine
+
+
+PRODUCT_ROWS = [
+    (1, "Sleek running shoes for daily training", "Footwear", 5),
+    (2, "Trail running shoes with durable grip", "Footwear", 4),
+    (3, "Wireless noise-canceling headphones", "Electronics", 5),
+    (4, "Budget walking sneakers", "Footwear", 2),
+    (5, "Artistic ceramic vase", "Home", 3),
+]
+
+DOCUMENT_ROWS = [
+    (1, "ParadeDB is a Postgres extension for full-text search."),
+    (2, "BM25 ranking helps relevance-based retrieval in PostgreSQL."),
+    (3, "RAG pipelines combine retrieval with LLM generation."),
+]
+
+
+def engine_from_env() -> Engine:
+    dsn = os.getenv("DATABASE_URL", "postgresql+psycopg://postgres:postgres@localhost:5432/postgres")
+    return create_engine(dsn)
+
+
+def setup_products(engine: Engine) -> None:
+    with engine.begin() as conn:
+        conn.execute(text("DROP INDEX IF EXISTS products_bm25_idx"))
+        conn.execute(text("DROP TABLE IF EXISTS products"))
+        conn.execute(
+            text(
+                """
+                CREATE TABLE products (
+                    id int primary key,
+                    description text not null,
+                    category text not null,
+                    rating int not null
+                )
+                """
+            )
+        )
+        conn.execute(
+            text(
+                "CREATE INDEX products_bm25_idx ON products USING bm25 (id, description, category, rating) WITH (key_field='id')"
+            )
+        )
+        for row in PRODUCT_ROWS:
+            conn.execute(
+                text(
+                    "INSERT INTO products (id, description, category, rating) VALUES (:id, :description, :category, :rating)"
+                ),
+                {
+                    "id": row[0],
+                    "description": row[1],
+                    "category": row[2],
+                    "rating": row[3],
+                },
+            )
+
+
+def setup_documents(engine: Engine) -> None:
+    with engine.begin() as conn:
+        conn.execute(text("DROP INDEX IF EXISTS documents_bm25_idx"))
+        conn.execute(text("DROP TABLE IF EXISTS documents"))
+        conn.execute(
+            text(
+                """
+                CREATE TABLE documents (
+                    id int primary key,
+                    content text not null
+                )
+                """
+            )
+        )
+        conn.execute(text("CREATE INDEX documents_bm25_idx ON documents USING bm25 (id, content) WITH (key_field='id')"))
+        for row in DOCUMENT_ROWS:
+            conn.execute(
+                text("INSERT INTO documents (id, content) VALUES (:id, :content)"),
+                {"id": row[0], "content": row[1]},
+            )
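`setup_products` and `setup_documents` issue one `INSERT` per row. In SQLAlchemy 2.x, passing a list of parameter dictionaries to `Connection.execute()` runs the statement in executemany mode, so each loop could collapse into a single call. The helpers below (`rows_to_params`, `insert_sql`, `PRODUCT_COLUMNS`) are hypothetical and stdlib-only. The `conn.execute` call appears only in a comment because it needs a live database.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence
from typing import Any

# Hypothetical column tuple matching the products table created in setup_products.
PRODUCT_COLUMNS = ("id", "description", "category", "rating")


def rows_to_params(
    rows: Iterable[Sequence[Any]], columns: Sequence[str] = PRODUCT_COLUMNS
) -> list[dict[str, Any]]:
    """Turn positional row tuples into the named-parameter dicts that text() binds expect."""
    params: list[dict[str, Any]] = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}: {row!r}")
        params.append(dict(zip(columns, row)))
    return params


def insert_sql(table: str, columns: Sequence[str]) -> str:
    """Build an INSERT with :name placeholders matching the keys from rows_to_params.

    Table and column names are interpolated directly, so pass only trusted names.
    """
    cols = ", ".join(columns)
    binds = ", ".join(f":{c}" for c in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({binds})"


# Usage inside setup_products (needs a live connection, so shown as a comment):
#     conn.execute(text(insert_sql("products", PRODUCT_COLUMNS)), rows_to_params(PRODUCT_ROWS))
```

The generated statement is identical to the one in `setup_products`, so only the loop changes. The length check catches a malformed row before anything is sent to the database.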
