Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
* Added support for numeric `minimum_should_match` values in `TermsSet` query [#2293](https://github.com/ruflin/Elastica/pull/2293)
* Added support for "search after" based pagination [#1645](https://github.com/ruflin/Elastica/issues/1645)
* Added support for the `seq_no_primary_term` search option and the `if_seq_no` / `if_primary_term` index options to enable optimistic concurrency control [#2284](https://github.com/ruflin/Elastica/pull/2284)
* Added `Elastica\Query\Knn` and `Elastica\Query::setKnn()` to support top-level kNN search (single or multiple kNN searches), with optional filters, similarity threshold and boost
### Changed
* `Elastica\Util::convertDate()` now throws `Elastica\Exception\InvalidException` when given a string that `strtotime()` cannot parse, instead of silently producing `1970-01-01T00:00:00Z`. The return type has also been narrowed to `string` and a typo-prone `null` argument is no longer accepted at runtime.
### Deprecated
Expand Down
26 changes: 25 additions & 1 deletion src/Query.php
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
use Elastica\Aggregation\AbstractAggregation;
use Elastica\Exception\InvalidException;
use Elastica\Query\AbstractQuery;
use Elastica\Query\Knn;
use Elastica\Query\MatchAll;
use Elastica\Query\QueryString;
use Elastica\Rescore\Query as QueryRescore;
Expand Down Expand Up @@ -52,6 +53,7 @@
* from?: int,
* highlight?: THighlightArgs,
* indices_boost?: array<string, float>,
* knn?: array<string, mixed>|list<array<string, mixed>>,
* min_score?: float,
* pit?: PointInTime,
* post_filter?: AbstractQuery,
Expand Down Expand Up @@ -336,7 +338,7 @@ public function addAggregation(AbstractAggregation $agg): self
*/
public function toArray(): array
{
if (!$this->hasSuggest && !isset($this->_params['query'])) {
if (!isset($this->_params['knn']) && !$this->hasSuggest && !isset($this->_params['query'])) {
$this->setQuery(new MatchAll());
}

Expand Down Expand Up @@ -494,4 +496,26 @@ public function setSearchAfter(array $searchAfter): self

return $this;
}

/**
* Sets a top-level kNN search.
*
* Pass a list of {@see Knn} to combine several kNN searches in the same request.
* The Knn is serialized at call time, so further mutations on the passed object
* are not reflected in the query - compose it fully before calling setKnn().
*
* @param Knn|list<Knn> $knn
*
* @see https://www.elastic.co/docs/solutions/search/vector/knn
*/
public function setKnn(Knn|array $knn): self
{
if (\is_array($knn)) {
$value = \array_map(static fn (Knn $k): array => $k->toArray()['knn'], $knn);
} else {
$value = $knn->toArray()['knn'];
}
Comment on lines +511 to +517

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate setKnn() array input before mapping.

Line 513 accepts any array, but the mapper assumes list<Knn>. Empty/non-list/non-Knn entries fail late (or build invalid knn payloads). Add explicit guards and throw InvalidException with a clear message.

Proposed fix
 public function setKnn(Knn|array $knn): self
 {
     if (\is_array($knn)) {
+        if ([] === $knn || !\array_is_list($knn)) {
+            throw new InvalidException('Knn must be a non-empty list of Knn instances.');
+        }
+        foreach ($knn as $entry) {
+            if (!$entry instanceof Knn) {
+                throw new InvalidException('Each knn entry must be an instance of '.Knn::class.'.');
+            }
+        }
         $value = \array_map(static fn (Knn $k): array => $k->toArray()['knn'], $knn);
     } else {
         $value = $knn->toArray()['knn'];
     }

     return $this->setParam('knn', $value);
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/Query.php` around lines 511 - 517, The setKnn method currently accepts
any array and then maps assuming list<Knn>; update setKnn(Knn|array $knn) to
validate array inputs before mapping: if $knn is an array ensure it is a
non-empty list (no associative keys), and each element is an instance of Knn; if
validation fails throw InvalidException with a clear message like "setKnn
expects a non-empty list of Knn instances"; after validation proceed to map each
Knn to its ->toArray()['knn'] value (preserving existing mapping behavior) so
only valid Knn payloads are produced.


return $this->setParam('knn', $value);
}
}
55 changes: 55 additions & 0 deletions src/Query/Knn.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
<?php

declare(strict_types=1);

namespace Elastica\Query;

use Elastica\Param;

/**
* Top-level kNN search.
*
* Note: `knn` is a sibling of `query` in the search request body, not a clause inside it.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the part I stumble over. Should we make it part of Query or make it it's own top level thing extending Parms instead?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, knn is a sibling of query in the search body.

Nothing today prevents a user from passing a Knn to Query::setQuery() or inside a BoolQuery, where it would silently produce an invalid request.

I'll switch to extending Param indeed. Pushing the change shortly !

* Attach it via {@see \Elastica\Query::setKnn()} rather than putting it inside a BoolQuery.
*
* @see https://www.elastic.co/docs/solutions/search/vector/knn
*/
class Knn extends Param
{
/**
* @param float[] $queryVector
*/
public function __construct(string $field, array $queryVector, int $k, int $numCandidates)
{
$this->setParam('field', $field);
$this->setParam('query_vector', $queryVector);
$this->setParam('k', $k);
$this->setParam('num_candidates', $numCandidates);
}

/**
* Adds a Query DSL filter applied before the kNN search.
*
* Filters are ANDed together by Elasticsearch.
*/
public function addFilter(AbstractQuery $filter): self
{
return $this->addParam('filter', $filter);
}

/**
* Sets the minimum similarity required for a document to be considered a match.
*/
public function setSimilarity(float $similarity): self
{
return $this->setParam('similarity', $similarity);
}

/**
* Boost applied to the kNN score before it is combined with other clauses.
*/
public function setBoost(float $boost): self
{
return $this->setParam('boost', $boost);
}
}
134 changes: 134 additions & 0 deletions tests/Query/KnnTest.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
<?php

declare(strict_types=1);

namespace Elastica\Test\Query;

use Elastica\Document;
use Elastica\Mapping;
use Elastica\Query;
use Elastica\Query\Knn;
use Elastica\Query\Range;
use Elastica\Query\Terms;
use Elastica\Test\Base as BaseTest;
use PHPUnit\Framework\Attributes\Group;

/**
* @internal
*/
class KnnTest extends BaseTest
{
#[Group('unit')]
public function testToArray(): void
{
$knn = new Knn('vector', [0.1, 0.2, 0.3], 100, 200);

$expected = [
'knn' => [
'field' => 'vector',
'query_vector' => [0.1, 0.2, 0.3],
'k' => 100,
'num_candidates' => 200,
],
];

$this->assertSame($expected, $knn->toArray());
}

#[Group('unit')]
public function testToArrayWithFiltersSimilarityAndBoost(): void
{
$knn = new Knn('vector', [0.5, 0.5], 10, 20);
$knn->addFilter(new Terms('tag', ['foo']));
$knn->addFilter(new Range('age', ['gte' => 20]));
$knn->setSimilarity(0.7);
$knn->setBoost(1.5);

$expected = [
'knn' => [
'field' => 'vector',
'query_vector' => [0.5, 0.5],
'k' => 10,
'num_candidates' => 20,
'filter' => [
['terms' => ['tag' => ['foo']]],
['range' => ['age' => ['gte' => 20]]],
],
'similarity' => 0.7,
'boost' => 1.5,
],
];

$this->assertSame($expected, $knn->toArray());
}

#[Group('unit')]
public function testQuerySetKnnEmbedsSingleKnnAtTopLevel(): void
{
$query = new Query();
$query->setKnn(new Knn('vector', [0.1, 0.2], 5, 10));

$body = $query->toArray();

$this->assertSame([
'field' => 'vector',
'query_vector' => [0.1, 0.2],
'k' => 5,
'num_candidates' => 10,
], $body['knn']);
$this->assertArrayNotHasKey('query', $body, 'knn-only requests must not be auto-padded with a match_all query');
}

#[Group('unit')]
public function testQuerySetKnnAcceptsListOfKnnForMultipleKnnSearches(): void
{
$query = new Query();
$query->setKnn([
new Knn('a.vector', [0.1], 5, 10),
new Knn('b.vector', [0.2], 5, 10),
]);

$body = $query->toArray();

$this->assertCount(2, $body['knn']);
$this->assertSame('a.vector', $body['knn'][0]['field']);
$this->assertSame('b.vector', $body['knn'][1]['field']);
$this->assertArrayNotHasKey('query', $body, 'multi-knn requests must not be auto-padded with a match_all query');
}

#[Group('functional')]
public function testKnnSearchAgainstDenseVectorField(): void
{
$index = $this->_createIndex();
$index->setMapping(new Mapping([
'tag' => ['type' => 'keyword'],
'vector' => [
'type' => 'dense_vector',
'dims' => 3,
'index' => true,
'similarity' => 'cosine',
],
]));

$index->addDocuments([
new Document('1', ['tag' => 'foo', 'vector' => [1.0, 0.0, 0.0]]),
new Document('2', ['tag' => 'foo', 'vector' => [0.9, 0.1, 0.0]]),
new Document('3', ['tag' => 'bar', 'vector' => [0.0, 0.0, 1.0]]),
]);
$index->refresh();

$knn = new Knn('vector', [1.0, 0.0, 0.0], 2, 10);
$knn->addFilter(new Terms('tag', ['foo']));

$query = new Query();
$query->setKnn($knn);

$results = $index->search($query);

$ids = \array_map(static fn ($r): string => $r->getId(), $results->getResults());

$this->assertContains('1', $ids);
$this->assertContains('2', $ids);
$this->assertNotContains('3', $ids, 'tag filter must exclude documents with another tag value');
}
}