Commit 33845e0

Align ParadeDB scripts to 5443 and close DSL parity gaps
1 parent 5104bf2 commit 33845e0

18 files changed

Lines changed: 316 additions & 26 deletions

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -14,7 +14,7 @@ pip install -e .[test,dev]
 ruff check .
 mypy paradedb
 python -m pytest tests/unit
-PARADEDB_TEST_DSN=postgres://postgres:postgres@localhost:5432/postgres python -m pytest -m integration
+PARADEDB_TEST_DSN=postgres://postgres:postgres@localhost:5443/postgres python -m pytest -m integration
 ```
 
 ## Guidelines
````
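
The only functional change in this file is the port. Before running `pytest -m integration`, you can confirm which host, port, and database a DSN points at by splitting it with the standard library. This is a minimal sketch; `describe_dsn` is a hypothetical helper and is not part of the repository.

```python
from urllib.parse import urlsplit

# DSN copied verbatim from the CONTRIBUTING.md command after this commit.
TEST_DSN = "postgres://postgres:postgres@localhost:5443/postgres"


def describe_dsn(dsn: str) -> dict[str, object]:
    """Split a libpq-style URL into the pieces the integration scripts configure (hypothetical helper)."""
    parts = urlsplit(dsn)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # urlsplit returns the port as an int, or None if absent
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(describe_dsn(TEST_DSN))
```

The same helper also accepts the SQLAlchemy-style `postgresql+psycopg://` URLs used by the example and script defaults, because `urlsplit` allows `+` in the scheme.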

README.md

Lines changed: 26 additions & 3 deletions
````diff
@@ -26,7 +26,7 @@ pip install -e .[test,dev]
 - `paradedb.sqlalchemy.search`: ParadeDB predicates (`match_all`, `fuzzy`, `parse`, `more_like_this`, etc.).
 - `paradedb.sqlalchemy.pdb`: function wrappers (`score`, `snippet`, `snippets`, `agg`).
 - `paradedb.sqlalchemy.facets`: aggregate/facet JSON builders and rows+facets helper.
-- `paradedb.sqlalchemy.select_with`: select decorators for score/snippet columns.
+- `paradedb.sqlalchemy.select_with`: select decorators for score/snippet/snippet_positions columns.
 - `paradedb.sqlalchemy.alembic`: Alembic operations for BM25 index lifecycle.
 
 ## Quickstart
@@ -49,11 +49,32 @@ products_bm25_idx.create(engine)
 stmt = select(Product.id, Product.description).where(search.match_any(Product.description, "running", "shoes"))
 ```
 
+Index JSON keys using BM25Field expressions:
+
+```python
+from paradedb.sqlalchemy import expr
+
+products_bm25_idx = Index(
+    "products_bm25_idx",
+    indexing.BM25Field(Product.id),
+    indexing.BM25Field(
+        expr.json_text(Product.metadata, "color"),
+        tokenizer=indexing.tokenize.literal(alias="metadata_color"),
+    ),
+    indexing.BM25Field(
+        expr.json_text(Product.metadata, "location"),
+        tokenizer=indexing.tokenize.literal(alias="metadata_location"),
+    ),
+    postgresql_using="bm25",
+    postgresql_with={"key_field": "id"},
+)
+```
+
 ## Query APIs
 
 - Basic predicates: `match_all`, `match_any`, `term`, `phrase`, `fuzzy`, `regex`, `all`
 - Advanced predicates: `parse`, `phrase_prefix`, `regex_phrase`, `near`, `proximity`, `more_like_this`
-- Scoring/snippets: `pdb.score`, `pdb.snippet`, `pdb.snippets`, `select_with.score`, `select_with.snippet`
+- Scoring/snippets: `pdb.score`, `pdb.snippet`, `pdb.snippets`, `pdb.snippet_positions`, `select_with.score`, `select_with.snippet`, `select_with.snippet_positions`
 - Aggregations/facets: `facets.*` builders + `pdb.agg(...)`
 - Rows + facets: `facets.with_rows(...)`
 
@@ -67,6 +88,8 @@ stmt = (
     select(
         pdb.agg(facets.value_count(field="id")).label("count"),
         pdb.agg(facets.avg(field="rating")).label("avg_rating"),
+        pdb.agg(facets.percentiles(field="rating", percents=[50, 95])).label("rating_percentiles"),
+        pdb.agg(facets.top_hits(size=2, sort=[{"rating": "desc"}], docvalue_fields=["id", "rating"])).label("top_hits"),
     )
     .select_from(Product)
     .where(search.match_all(Product.description, "running"))
@@ -117,7 +140,7 @@ python -m pytest tests/unit
 Integration tests (requires running ParadeDB):
 
 ```bash
-PARADEDB_TEST_DSN=postgres://postgres:postgres@localhost:5432/postgres python -m pytest -m integration
+PARADEDB_TEST_DSN=postgres://postgres:postgres@localhost:5443/postgres python -m pytest -m integration
 ```
 
 ## CI
````

examples/common.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -23,7 +23,7 @@
 
 
 def engine_from_env() -> Engine:
-    dsn = os.getenv("DATABASE_URL", "postgresql+psycopg://postgres:postgres@localhost:5432/postgres")
+    dsn = os.getenv("DATABASE_URL", "postgresql+psycopg://postgres:postgres@localhost:5443/postgres")
     return create_engine(dsn)
 
 
```
examples/faceted_search.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -28,6 +28,14 @@ def main() -> None:
         select(
             pdb.agg(facets.value_count(field="id")).label("count"),
             pdb.agg(facets.avg(field="rating")).label("avg_rating"),
+            pdb.agg(facets.percentiles(field="rating", percents=[50, 95])).label("rating_percentiles"),
+            pdb.agg(
+                facets.top_hits(
+                    size=2,
+                    sort=[{"rating": "desc"}],
+                    docvalue_fields=["id", "rating"],
+                )
+            ).label("top_hits"),
         )
         .select_from(Product)
         .where(search.match_all(Product.description, "running"))
```
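
`facets.percentiles` and `facets.top_hits` are the new builders in this example, but the diff does not show what they emit. The sketch below assumes they produce Elasticsearch-style aggregation JSON, which is the format ParadeDB's `pdb.agg` consumes. `percentiles_agg` and `top_hits_agg` are hypothetical stand-ins that show the JSON shape that would reach `pdb.agg(...)` for the arguments used above.

```python
import json
from typing import Any


def percentiles_agg(field: str, percents: list[float]) -> dict[str, Any]:
    # Hypothetical stand-in for facets.percentiles(field=..., percents=...).
    return {"percentiles": {"field": field, "percents": percents}}


def top_hits_agg(size: int, sort: list[dict[str, str]], docvalue_fields: list[str]) -> dict[str, Any]:
    # Hypothetical stand-in for facets.top_hits(size=..., sort=..., docvalue_fields=...).
    return {"top_hits": {"size": size, "sort": sort, "docvalue_fields": docvalue_fields}}


if __name__ == "__main__":
    # Under this assumption, these JSON documents are what pdb.agg(...) would receive.
    print(json.dumps(percentiles_agg("rating", [50, 95])))
    print(json.dumps(top_hits_agg(2, [{"rating": "desc"}], ["id", "rating"])))
```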

paradedb/sqlalchemy/indexing.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -123,7 +123,7 @@ def _compile_bm25_field(element: BM25Field, compiler, **kw: Any) -> str:
     expr_sql = compiler.process(element.expr, **kw)
     if element.tokenizer is None:
         return expr_sql
-    return f"({expr_sql}::{element.tokenizer.render()})"
+    return f"(({expr_sql})::{element.tokenizer.render()})"
 
 
 @compiles(BM25Field)
```
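
The extra parentheses matter for expression fields such as the README's `expr.json_text(Product.metadata, "color")`. In PostgreSQL, `::` binds more tightly than `->>`, so `metadata ->> 'color'::tokenizer` casts the key literal instead of the extracted value. The sketch below reproduces only the two f-string templates. The compiled JSON expression and the tokenizer string are hypothetical placeholders, because the diff does not show the real `render()` output.

```python
def cast_before(expr_sql: str, tokenizer_sql: str) -> str:
    # Pre-commit template: "::" attaches to the last token of expr_sql.
    return f"({expr_sql}::{tokenizer_sql})"


def cast_after(expr_sql: str, tokenizer_sql: str) -> str:
    # Post-commit template: expr_sql is grouped first, so the whole expression is cast.
    return f"(({expr_sql})::{tokenizer_sql})"


# Hypothetical placeholders; the real compiled SQL and render() output may differ.
JSON_TEXT_SQL = "products.metadata ->> 'color'"
TOKENIZER_SQL = "pdb.literal('alias=metadata_color')"

if __name__ == "__main__":
    print(cast_before(JSON_TEXT_SQL, TOKENIZER_SQL))  # PostgreSQL reads: metadata ->> ('color'::...)
    print(cast_after(JSON_TEXT_SQL, TOKENIZER_SQL))   # PostgreSQL reads: (metadata ->> 'color')::...
```

For plain columns such as `Product.id`, the extra parentheses are redundant but harmless.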

paradedb/sqlalchemy/search.py

Lines changed: 41 additions & 4 deletions
```diff
@@ -102,8 +102,17 @@ class ProximityExpr:
     def __init__(self, expr: ClauseElement) -> None:
         self.expr = expr
 
-    def near(self, other: str | ClauseElement | ProximityExpr, *, distance: int, ordered: bool = False) -> ProximityExpr:
-        return ProximityExpr(_near_chain(self.expr, other, distance=distance, ordered=ordered))
+    def near(
+        self,
+        other: str | ClauseElement | ProximityExpr | None = None,
+        *,
+        distance: int,
+        ordered: bool = False,
+        right_pattern: str | None = None,
+        max_expansions: int = 100,
+    ) -> ProximityExpr:
+        right = _resolve_near_operand(other, right_pattern=right_pattern, max_expansions=max_expansions)
+        return ProximityExpr(_near_chain(self.expr, right, distance=distance, ordered=ordered))
 
 
 def _to_proximity_operand(value: str | ClauseElement | ProximityExpr) -> ClauseElement:
```
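
The reworked `near` takes either an explicit right operand or a `right_pattern`, which becomes a `prox_regex(...)` operand. In addition, `_near_chain` in this file now rejects a negative `distance`. What follows is a string-only model of those rules, not the library's SQLAlchemy implementation. `Prox`, `term`, and `resolve_near_operand` are hypothetical stand-ins. The `##` / `##>` spellings for `_NEAR` / `_NEAR_ORDERED` are an assumption based on ParadeDB's proximity syntax, since the diff does not show their values. Real SQLAlchemy output may also group nested chains differently.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed operator spellings; the diff only names the constants _NEAR and _NEAR_ORDERED.
NEAR = "##"
NEAR_ORDERED = "##>"


class InvalidArgumentError(ValueError):
    """Local stand-in for the library's InvalidArgumentError."""


def require_non_negative(value: int, *, field_name: str) -> None:
    # Assumed behaviour of the helper the commit calls.
    if value < 0:
        raise InvalidArgumentError(f"{field_name} must be non-negative")


def sql_literal(text: str) -> str:
    # Quote a term as a SQL string literal, doubling embedded single quotes.
    return "'" + text.replace("'", "''") + "'"


@dataclass(frozen=True)
class Prox:
    """Hypothetical string-only stand-in for ProximityExpr."""

    sql: str
    compound: bool = False

    def operand(self) -> str:
        # Parenthesize an existing chain so it nests as a single operand.
        return f"({self.sql})" if self.compound else self.sql

    def near(
        self,
        other: str | Prox | None = None,
        *,
        distance: int,
        ordered: bool = False,
        right_pattern: str | None = None,
        max_expansions: int = 100,
    ) -> Prox:
        require_non_negative(distance, field_name="distance")
        right = resolve_near_operand(other, right_pattern=right_pattern, max_expansions=max_expansions)
        op = NEAR_ORDERED if ordered else NEAR
        return Prox(f"{self.operand()} {op} {distance} {op} {right.operand()}", compound=True)


def term(text: str) -> Prox:
    return Prox(sql_literal(text))


def resolve_near_operand(right: str | Prox | None, *, right_pattern: str | None, max_expansions: int) -> Prox:
    # Same rules as _resolve_near_operand: exactly one of right / right_pattern.
    if right_pattern is not None:
        if right is not None:
            raise InvalidArgumentError("right and right_pattern cannot be used together")
        require_non_negative(max_expansions, field_name="max_expansions")
        return Prox(f"pdb.prox_regex({sql_literal(right_pattern)}, {max_expansions})")
    if right is None:
        raise InvalidArgumentError("right is required unless right_pattern is provided")
    return term(right) if isinstance(right, str) else right


if __name__ == "__main__":
    print(term("running").near(distance=2, right_pattern="sho.*").sql)
    print(term("running").near("shoes", distance=1).near("trail", distance=3, ordered=True).sql)
```

Chaining a second `near` call nests the first chain as one operand. This mirrors how `ProximityExpr.near` wraps `_near_chain(self.expr, ...)`.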
```diff
@@ -122,12 +131,29 @@ def _to_proximity_clause(value: str | ClauseElement | ProximityExpr) -> ClauseEl
 
 
 def _near_chain(left: str | ClauseElement | ProximityExpr, right: str | ClauseElement | ProximityExpr, *, distance: int, ordered: bool = False) -> ClauseElement:
+    require_non_negative(distance, field_name="distance")
     left_expr = _to_proximity_clause(left)
     right_expr = _to_proximity_clause(right)
     op = _NEAR_ORDERED if ordered else _NEAR
     return left_expr.operate(op, literal(distance)).operate(op, right_expr)
 
 
+def _resolve_near_operand(
+    right: str | ClauseElement | ProximityExpr | None,
+    *,
+    right_pattern: str | None,
+    max_expansions: int,
+) -> str | ClauseElement | ProximityExpr:
+    if right_pattern is not None:
+        if right is not None:
+            raise InvalidArgumentError("right and right_pattern cannot be used together")
+        require_non_negative(max_expansions, field_name="max_expansions")
+        return prox_regex(right_pattern, max_expansions)
+    if right is None:
+        raise InvalidArgumentError("right is required unless right_pattern is provided")
+    return right
+
+
 def parse(field: ColumnElement, query: str, *, lenient: bool = False, conjunction_mode: bool = False) -> ColumnElement[bool]:
     return field.operate(_QUERY, func.pdb.parse(query, lenient, conjunction_mode))
 
@@ -153,11 +179,22 @@ def regex_phrase(
     return field.operate(_QUERY, func.pdb.regex_phrase(array(terms, type_=Text()), slop, max_expansions))
 
 
-def near(field: ColumnElement, left: str | ClauseElement, right: str | ClauseElement, *, distance: int, ordered: bool = False) -> ColumnElement[bool]:
-    return field.operate(_QUERY, _near_chain(left, right, distance=distance, ordered=ordered))
+def near(
+    field: ColumnElement,
+    left: str | ClauseElement | ProximityExpr,
+    right: str | ClauseElement | ProximityExpr | None = None,
+    *,
+    distance: int,
+    ordered: bool = False,
+    right_pattern: str | None = None,
+    max_expansions: int = 100,
+) -> ColumnElement[bool]:
+    right_operand = _resolve_near_operand(right, right_pattern=right_pattern, max_expansions=max_expansions)
+    return field.operate(_QUERY, _near_chain(left, right_operand, distance=distance, ordered=ordered))
 
 
 def prox_regex(pattern: str, max_expansions: int = 100) -> ProximityExpr:
+    require_non_negative(max_expansions, field_name="max_expansions")
     return ProximityExpr(func.pdb.prox_regex(pattern, max_expansions))
 
 
```
paradedb/sqlalchemy/select_with.py

Lines changed: 10 additions & 0 deletions
```diff
@@ -57,3 +57,13 @@ def snippets(
             sort_by=sort_by,
         ).label(label)
     )
+
+
+def snippet_positions(
+    stmt: Select,
+    field: ColumnElement,
+    *,
+    label: str = "snippet_positions",
+) -> Select:
+    _assert_snippet_supported(stmt)
+    return stmt.add_columns(pdb.snippet_positions(field).label(label))
```
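
`select_with.snippet_positions` appends a `pdb.snippet_positions(field)` column, labelled `snippet_positions` by default, to an existing `Select`. How a caller consumes that column is left to the application. The sketch below assumes each entry is a `[start, end)` pair of UTF-8 byte offsets into the indexed text and wraps those spans in tags. `highlight` is a hypothetical helper, and the offset convention should be checked against the ParadeDB version in use.

```python
from collections.abc import Sequence


def highlight(text: str, positions: Sequence[Sequence[int]], start_tag: str = "<b>", end_tag: str = "</b>") -> str:
    """Wrap each [start, end) span in tags; positions are assumed to be UTF-8 byte offsets."""
    data = text.encode("utf-8")
    out = bytearray()
    cursor = 0
    for start, end in sorted(positions):
        # Reject overlapping, inverted, or out-of-range spans instead of guessing.
        if start < cursor or end < start or end > len(data):
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        out += data[cursor:start]
        out += start_tag.encode("utf-8") + data[start:end] + end_tag.encode("utf-8")
        cursor = end
    out += data[cursor:]
    return out.decode("utf-8")


if __name__ == "__main__":
    print(highlight("Sleek running shoes", [[6, 13]]))
```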

scripts/run_examples.sh

Lines changed: 6 additions & 1 deletion
```diff
@@ -8,7 +8,12 @@ if [[ "${SKIP_PARADEDB_START:-0}" != "1" ]]; then
   source "${SCRIPT_DIR}/run_paradedb.sh"
 fi
 
-export DATABASE_URL="${DATABASE_URL:-postgresql+psycopg://postgres:postgres@localhost:${PARADEDB_PORT:-5443}/postgres}"
+PORT="${PARADEDB_PORT:-5443}"
+USER="${PARADEDB_USER:-postgres}"
+PASSWORD="${PARADEDB_PASSWORD:-postgres}"
+DB="${PARADEDB_DB:-postgres}"
+
+export DATABASE_URL="${DATABASE_URL:-postgresql+psycopg://${USER}:${PASSWORD}@localhost:${PORT}/${DB}}"
 
 ./.venv/bin/python examples/quickstart.py
 ./.venv/bin/python examples/autocomplete.py
```

scripts/run_integration_tests.sh

Lines changed: 4 additions & 4 deletions
```diff
@@ -6,10 +6,10 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 
 source "${SCRIPT_DIR}/run_paradedb.sh"
 
-PORT="${PARADEDB_PORT:-5443}"
-USER="${PARADEDB_USER:-postgres}"
-PASSWORD="${PARADEDB_PASSWORD:-postgres}"
-DB="${PARADEDB_DB:-postgres}"
+PORT="${PARADEDB_PORT}"
+USER="${PARADEDB_USER}"
+PASSWORD="${PARADEDB_PASSWORD}"
+DB="${PARADEDB_DB}"
 
 export PARADEDB_INTEGRATION=1
 export PARADEDB_TEST_DSN="postgresql+psycopg://${USER}:${PASSWORD}@localhost:${PORT}/${DB}"
```

scripts/run_paradedb.sh

Lines changed: 19 additions & 6 deletions
```diff
@@ -8,13 +8,26 @@ else
   RUNNING=0
 fi
 
-IMAGE="${PARADEDB_IMAGE:-paradedb/paradedb:0.21.4-pg18}"
-CONTAINER_NAME="${PARADEDB_CONTAINER_NAME:-paradedb-sqlalchemy-integration}"
+PARADEDB_IMAGE="${PARADEDB_IMAGE:-paradedb/paradedb:0.21.8-pg18}"
+PARADEDB_CONTAINER_NAME="${PARADEDB_CONTAINER_NAME:-paradedb-sqlalchemy-integration}"
+PARADEDB_PORT="${PARADEDB_PORT:-5443}"
+PARADEDB_USER="${PARADEDB_USER:-postgres}"
+PARADEDB_PASSWORD="${PARADEDB_PASSWORD:-postgres}"
+PARADEDB_DB="${PARADEDB_DB:-postgres}"
 
-PORT="${PARADEDB_PORT:-5443}"
-USER="${PARADEDB_USER:-postgres}"
-PASSWORD="${PARADEDB_PASSWORD:-postgres}"
-DB="${PARADEDB_DB:-postgres}"
+export PARADEDB_IMAGE
+export PARADEDB_CONTAINER_NAME
+export PARADEDB_PORT
+export PARADEDB_USER
+export PARADEDB_PASSWORD
+export PARADEDB_DB
+
+IMAGE="${PARADEDB_IMAGE}"
+CONTAINER_NAME="${PARADEDB_CONTAINER_NAME}"
+PORT="${PARADEDB_PORT}"
+USER="${PARADEDB_USER}"
+PASSWORD="${PARADEDB_PASSWORD}"
+DB="${PARADEDB_DB}"
 
 DATABASE_URL="${DATABASE_URL:-postgresql+psycopg://${USER}:${PASSWORD}@localhost:${PORT}/${DB}}"
 export DATABASE_URL
```
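
After this change, `run_paradedb.sh` resolves each `PARADEDB_*` setting once, with port 5443 as the default, and exports it. As a result, `run_integration_tests.sh` can read plain `${PARADEDB_PORT}` after sourcing the script, while an explicit `DATABASE_URL` still takes precedence. The Python sketch below models that resolution order. `resolve_database_url` is hypothetical and only mirrors the shell `${NAME:-default}` rule, under which an unset or empty variable falls back to its default.

```python
from __future__ import annotations

from collections.abc import Mapping

# Defaults as written in scripts/run_paradedb.sh after this commit (order matters below).
DEFAULTS: dict[str, str] = {
    "PARADEDB_PORT": "5443",
    "PARADEDB_USER": "postgres",
    "PARADEDB_PASSWORD": "postgres",
    "PARADEDB_DB": "postgres",
}


def shell_default(env: Mapping[str, str], name: str, default: str) -> str:
    # Mirrors "${NAME:-default}": unset *or* empty values fall back to the default.
    value = env.get(name, "")
    return value if value else default


def resolve_database_url(env: Mapping[str, str]) -> str:
    # Resolve each PARADEDB_* variable the way the script does before exporting it.
    port, user, password, db = (shell_default(env, name, default) for name, default in DEFAULTS.items())
    built = f"postgresql+psycopg://{user}:{password}@localhost:{port}/{db}"
    # DATABASE_URL="${DATABASE_URL:-...}": an explicit, non-empty value wins.
    return shell_default(env, "DATABASE_URL", built)


if __name__ == "__main__":
    import os

    print(resolve_database_url(os.environ))
```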
