`RetrievalRuntime`: Streaming pipeline for ingestion and retrieval by Amir-R25 · Pull Request #1109 · RailtownAI/railtracks

Amir-R25 · 2026-05-20T18:31:33Z

Summary

Adds the RetrievalRuntime orchestrator and the supporting Store / loader
changes needed to drive the full ingest → retrieve flow end-to-end. Also
removes the legacy railtracks.vector_stores package now that
railtracks.retrieval supersedes it.

┌────────┐   ┌─────────┐   ┌──────────┐   ┌────────┐   ┌───────────┐
│ Loader │ → │ Chunker │ → │ Embedder │ → │ Store  │ → │ Retrieval │
└────────┘   └─────────┘   └──────────┘   └────────┘   └───────────┘
     ▲             ▲             ▲             ▲              ▲
     └─────────────┴─────────────┴─────────────┴──────────────┘
                                 │
                       ┌─────────────────────┐
                       │  RetrievalRuntime   │
                       │  (the orchestrator) │
                       └─────────────────────┘

`RetrievalRuntime`

The orchestrator that wires a chunker + embedder + Store (+ optional scope)
into the ingest/retrieve flow.

Loader is passed to ingest(), not the constructor: one runtime captures
how to process (chunker/embedder/store/scope); the loader decides what.
A single runtime can ingest from many sources and re-ingest to update.
Streaming + aggregate APIs: ingest(loader) is an async generator yielding
per-batch events; ingest_all(loader) drains it and returns IngestionStats.
Events: BatchIngested (carries per-batch EmbeddingMetrics — tokens, cost,
latency, vector count), EmbeddingFailure, DocumentFailed, DocumentSkipped.
batch_index is per-document, not run-global.
Upsert semantics: the runtime clears a document's prior version with
store.delete_where({"document_id": str(doc.id)}) just before it writes that
document's first chunk. The delete runs only after a batch has embedded
successfully, so a total embedding failure preserves the previous version.
Writes are per-chunk and not transactional: a crash mid-write leaves a partial
document (recovered on the next ingest, see below).
Count-aware staleness (skip unchanged docs): a document is skipped only when
the store already holds a complete copy — matched on source_path +
content_hash and the persisted doc_chunk_count. A partially-written document
(fewer chunks than expected after an interrupted run) is re-ingested rather than
left broken. Counting is done via find() rather than a count() call so the
runtime depends only on the Store protocol.
Token-size guard: when max_tokens is set, chunks over the per-item limit
are dropped before embedding and surfaced as EmbeddingFailure instead of
causing provider 4xx errors. Uses TiktokenTokenizer by default. (Partial fix
for the embedding per-item token-cap gap — see Known limitations.)
Embedding-model consistency: the model is captured from the first successful
batch; a later retrieve() with a different embedder raises
EmbeddingModelMismatchError (cross-model similarity scores are meaningless).
Note: capture is in-process only. A fresh runtime over an existing store
won't enforce the check until its own first ingest.
on_ingest / on_retrieve callbacks for logging/observability;
delete_document(id) convenience wrapper.
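
The count-aware staleness rule above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the runtime's actual code: `Entry`, `FindFn`, and `should_skip` are hypothetical names, and the lookup is modelled as a plain synchronous callable. Only the matching criteria (source_path, content_hash, the persisted doc_chunk_count, and counting through find rather than count) come from the description.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Entry:
    """Hypothetical stand-in for a stored chunk: metadata only, no vector."""
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical shape of a metadata-only lookup: (filters, limit) -> entries.
FindFn = Callable[[dict[str, Any], int], list[Entry]]


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def should_skip(find: FindFn, source_path: str, text: str) -> bool:
    """Return True only when the store holds a complete, unchanged copy."""
    filters = {"source_path": source_path, "content_hash": content_hash(text)}
    first = find(filters, 1)
    if not first:
        return False  # never ingested, or the content changed
    expected = first[0].metadata.get("doc_chunk_count")
    if not isinstance(expected, int) or expected <= 0:
        return False  # no usable count persisted: re-ingest to be safe
    # Ask for one more than expected so the lookup stays bounded while
    # still revealing both missing and surplus chunks.
    found = find(filters, expected + 1)
    return len(found) == expected
```

A document whose earlier ingest stopped after two of three chunks fails the final comparison and is re-ingested, which is the self-healing behavior the description relies on.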

`stores` module

Store protocol:

added delete_where(filters) and find(filters, limit=1) (metadata-only
lookup, no vector search) — both required by the runtime's upsert/staleness paths.
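
To make the two new protocol methods concrete, here is a toy dictionary-backed store with flat-equality filters. Only the method names and the limit=1 default on find come from the description. The class `ToyStore`, its `add` helper, the return types, and the synchronous signatures are assumptions made for illustration (the real protocol may well be async).

```python
from typing import Any


class ToyStore:
    """Hypothetical metadata store illustrating find / delete_where semantics."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, Any]] = {}

    def add(self, entry_id: str, metadata: dict[str, Any]) -> None:
        self._rows[entry_id] = dict(metadata)

    @staticmethod
    def _matches(metadata: dict[str, Any], filters: dict[str, Any]) -> bool:
        # Flat equality: every filter key must be present with an equal value.
        return all(k in metadata and metadata[k] == v for k, v in filters.items())

    def find(self, filters: dict[str, Any], limit: int = 1) -> list[dict[str, Any]]:
        """Metadata-only lookup: no vectors involved, at most `limit` rows."""
        hits = [m for m in self._rows.values() if self._matches(m, filters)]
        return hits[:limit]

    def delete_where(self, filters: dict[str, Any]) -> int:
        """Delete every row matching the filters; return how many were removed."""
        doomed = [i for i, m in self._rows.items() if self._matches(m, filters)]
        for entry_id in doomed:
            del self._rows[entry_id]
        return len(doomed)
```

With this shape, an upsert is a delete_where on document_id followed by fresh writes, and a staleness check is a find on source_path and content_hash.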

StoreEntry:

vector is now list[float] | None. Read results no longer round-trip the
vector (was [], now None) — the backend owns the stored vector; callers must
not rely on this field on retrieved entries.

StoreQuery:

scope is now optional (StoreScope | None) for single-tenant callers.
metadata_filters retyped to dict[str, Any] (was dict[str, str]).
removed the unused strategies field and the RetrievalStrategy enum.

VectorStore (base) / VectorBackend:

VectorBackend protocol gained list_where(filters, limit) and count(filters).
count lives on the backend, not Store — keeps the runtime's dependency
surface to the Store protocol alone.
VectorStore now implements find, delete_where, and count.
Payload encoding spreads scalar chunk_metadata values to the top level (in
addition to the JSON-encoded blob) so flat-equality metadata_filters / find
work against them.
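
The payload encoding described above might look roughly like the following. This is a sketch, assuming a blob key named `chunk_metadata` and a helper called `encode_payload`; both names, and the exact set of types treated as scalars, are guesses rather than the backend's real layout.

```python
import json
from typing import Any

BLOB_KEY = "chunk_metadata"  # hypothetical name for the JSON-encoded blob
_SCALARS = (str, int, float, bool, type(None))


def encode_payload(chunk_metadata: dict[str, Any]) -> dict[str, Any]:
    """Keep the full metadata as a JSON blob and also spread scalars to the top level."""
    payload: dict[str, Any] = {BLOB_KEY: json.dumps(chunk_metadata)}
    for key, value in chunk_metadata.items():
        # Only scalars are spread; nested values stay inside the blob only.
        # A key that would shadow the blob itself is skipped.
        if isinstance(value, _SCALARS) and key != BLOB_KEY:
            payload[key] = value
    return payload
```

Spreading the scalars is what lets a flat filter such as {"page": 3} match without the backend having to parse the blob.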

Backend implementations (chroma, in_memory, pgvector) all implement
list_where + count. Plus:

pgvector _build_where now compares JSONB-to-JSONB
(payload->$k::text = $v::jsonb) so non-string scalars (int/bool/None) keep
their JSON type instead of being stringified. Filters are parameterized; LIMIT
is int-cast before interpolation. Added pool_kwargs passthrough to
asyncpg.create_pool for tuning min_size/max_size/etc.
in_memory _flush is now async — JSON encode happens under the lock, the
disk write is offloaded to a thread so the event loop isn't blocked. Search now
sanitizes non-finite scores (NaN/inf from a misbehaving embedder): they're
logged and sorted/dropped to the end instead of corrupting the ranking.
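
The score sanitization in the in_memory backend can be illustrated with a short ranking helper. NaN compares false against everything, so sorting a list that contains it gives an unreliable order. The sketch below assumes hits are (id, score) tuples and chooses to move non-finite scores to the end rather than drop them; `rank_hits` and that choice are illustrative, not taken from the PR's code.

```python
import logging
import math

logger = logging.getLogger(__name__)


def rank_hits(hits: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sort (id, score) pairs by score descending, pushing NaN/inf scores last."""
    finite = [h for h in hits if math.isfinite(h[1])]
    bad = [h for h in hits if not math.isfinite(h[1])]
    for entry_id, score in bad:
        # Surface the problem instead of silently hiding a misbehaving embedder.
        logger.warning("non-finite score %r for entry %s", score, entry_id)
    finite.sort(key=lambda h: h[1], reverse=True)
    return finite + bad
```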

`loaders` module

Document:

id is now derived deterministically from source via
uuid5(NAMESPACE_URL, source) so re-ingesting the same source yields the same
id across processes. Fixes a silent upsert bug where modified files left their
prior chunks orphaned in the store, because delete_where({"document_id": ...})
was keyed on a fresh random UUID each pass. Sourceless documents fall back to
uuid4() (no stable identity ⇒ no upsert semantics).
added content_hash (SHA-256, computed by the runtime at ingest time; loaders
leave it None) used by staleness detection. type now defaults to
DocumentType.TEXT.
Sanitizer protocol for PII redaction (sync or async sanitize; errors
propagate, no logic baked into the framework).
SanitizingLoader wraps any BaseDocumentLoader + a Sanitizer, running every
yielded document through it.

Removals / cleanup

Deleted the legacy railtracks.vector_stores package (chroma, chunking/,
filter, vector_store_base) and its tests — fully superseded by
railtracks.retrieval.stores and railtracks.retrieval.chunking (~7.5k lines).
retrieval.__init__ now exports the public surface: RetrievalRuntime, the
ingestion event/stats types, Store, StoreEntry, StoreQuery, StoreScope,
VectorStore, EmbeddingFailure, EmbeddingModelMismatchError.

Type of change

Checklist

Lint & format pass (ruff check . && ruff format .)
Tests added/updated and pass locally (pytest tests)
Docs updated if user-facing behavior changed
Breaking changes include migration notes

Notes

Review callouts

Ingest upsert is not transactional (per-chunk writes); count-aware staleness is
what makes an interrupted ingest self-heal on the next run.
_captured_model is in-process only — model-mismatch enforcement doesn't
survive a fresh runtime over a pre-populated store until its first ingest.
pgvector list_where interpolates LIMIT {int(limit)} (int-cast, safe); all
filter values stay parameterized.
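
For reviewers checking the pgvector callout, the query construction can be sketched without a database. The clause shape (payload->$k::text = $v::jsonb), the parameterized filter values, and the int-cast LIMIT follow the description. The function names, the table name `chunks`, and the `TRUE` fallback for empty filters are assumptions for illustration only.

```python
import json
from typing import Any


def build_where(filters: dict[str, Any]) -> tuple[str, list[Any]]:
    """Build a parameterized JSONB-equality WHERE clause and its argument list."""
    clauses: list[str] = []
    params: list[Any] = []
    for key, value in filters.items():
        params.append(key)
        key_pos = len(params)
        # JSON text keeps int/bool/null typed instead of stringifying them.
        params.append(json.dumps(value))
        val_pos = len(params)
        clauses.append(f"payload->${key_pos}::text = ${val_pos}::jsonb")
    where = " AND ".join(clauses) if clauses else "TRUE"
    return where, params


def build_list_query(filters: dict[str, Any], limit: int) -> tuple[str, list[Any]]:
    """Return (sql, params); only the int-cast limit is interpolated into the text."""
    where, params = build_where(filters)
    # int() raises on anything non-numeric, so no arbitrary text reaches the SQL.
    return f"SELECT payload FROM chunks WHERE {where} LIMIT {int(limit)}", params
```

Neither keys nor values appear in the SQL string itself; they travel only in the parameter list.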

Known limitations / follow-ups

The max_tokens guard enforces a per-item token cap (drops oversized chunks
pre-embedding); it does not do batch-level token-budget packing. Batches are
still sized by count (default_batch_size), so a batch of in-spec chunks can
still exceed a provider's per-request token limit (OpenAI's 8191 is the
per-input cap; the total across a request is limited separately). Worth a
follow-up for token-aware batch packing.
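
One possible shape for that follow-up is a greedy packer that closes a batch when either the count limit or the token budget would be exceeded. This is a sketch of the idea, not code from this PR: `pack_batches` and all of its parameters are hypothetical, and `count_tokens` stands in for whatever tokenizer the runtime is configured with.

```python
from typing import Callable, Iterable, Iterator


def pack_batches(
    chunks: Iterable[str],
    count_tokens: Callable[[str], int],
    max_item_tokens: int,
    max_batch_tokens: int,
    max_batch_size: int,
) -> Iterator[list[str]]:
    """Greedily group chunks so each batch respects both count and token limits."""
    batch: list[str] = []
    used = 0
    for chunk in chunks:
        n = count_tokens(chunk)
        if n > max_item_tokens:
            # Per-item guard; this sketch just skips the chunk, whereas the
            # description says the runtime reports it as an EmbeddingFailure.
            continue
        if batch and (used + n > max_batch_tokens or len(batch) >= max_batch_size):
            yield batch
            batch, used = [], 0
        batch.append(chunk)
        used += n
    if batch:
        yield batch
```

The packer assumes max_item_tokens does not exceed max_batch_tokens; otherwise a single admitted chunk can still form a batch that is over budget.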

Amir-R25 · 2026-05-21T19:02:48Z

A few example scripts to play with. All three share the same wiring — only the
backend and whether you ingest differ.

In-memory ingest + retrieve

from rich import print

from railtracks.retrieval import RetrievalRuntime, VectorStore
from railtracks.retrieval.chunking import SentenceChunker
from railtracks.retrieval.loaders import TextLoader
from railtracks.retrieval.stores import InMemoryVectorBackend
from railtracks.retrieval.embedding import OpenAIEmbedding
from railtracks.retrieval.embedding.models import EmbeddingFailure
from railtracks.retrieval.runtime import BatchIngested, DocumentFailed, DocumentSkipped


async def main() -> None:
    docs_path = "path/to/directory"
    rr = RetrievalRuntime(
        chunker=SentenceChunker(chunk_size=5, overlap=2),
        embedder=OpenAIEmbedding(model="text-embedding-3-small"),
        store=VectorStore(InMemoryVectorBackend()),
        batch_size=64,
    )

    loader = TextLoader(str(docs_path))

    async for event in rr.ingest(loader):
        match event:
            case BatchIngested(document_id=did, embedded_chunks=ch, batch_index=i):
                print(f"  + doc={str(did)[:8]} batch={i} chunks={len(ch)}")
            case EmbeddingFailure(errors=errs):
                print(f"  ! embedding failed: {errs[0]}")
            case DocumentFailed(document_id=did):
                print(f"  ! doc {str(did)[:8]} partially failed")
            case DocumentSkipped(source=src):
                print(f"  ~ skipped (unchanged): {src}")

    result = await rr.retrieve("query text")
    print(f"\nQuery: {result.query}")
    for hit in result.chunks:
        snippet = hit.chunk.content.replace("\n", " ")
        print(f"  [score={hit.score:.3f}] {snippet}")


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Persistent ingest + retrieve with Chroma

from rich import print

from railtracks.retrieval import RetrievalRuntime, VectorStore
from railtracks.retrieval.chunking import SentenceChunker
from railtracks.retrieval.loaders import TextLoader
from railtracks.retrieval.stores import ChromaBackend
from railtracks.retrieval.embedding import OpenAIEmbedding
from railtracks.retrieval.embedding.models import EmbeddingFailure
from railtracks.retrieval.runtime import BatchIngested, DocumentFailed, DocumentSkipped


async def main() -> None:
    docs_path = "path/to/directory"
    vsb = ChromaBackend("my_collection", path="retrieval-demos/stores")
    await vsb.initialize()

    rr = RetrievalRuntime(
        chunker=SentenceChunker(chunk_size=5, overlap=2),
        embedder=OpenAIEmbedding(model="text-embedding-3-small"),
        store=VectorStore(vsb),
        batch_size=64,
    )

    loader = TextLoader(str(docs_path))

    async for event in rr.ingest(loader):
        match event:
            case BatchIngested(document_id=did, embedded_chunks=ch, batch_index=i):
                print(f"  + doc={str(did)[:8]} batch={i} chunks={len(ch)}")
            case EmbeddingFailure(errors=errs):
                print(f"  ! embedding failed: {errs[0]}")
            case DocumentFailed(document_id=did):
                print(f"  ! doc {str(did)[:8]} partially failed")
            case DocumentSkipped(source=src):
                print(f"  ~ skipped (unchanged): {src}")

    result = await rr.retrieve("query text")
    print(f"\nQuery: {result.query}")
    for hit in result.chunks:
        snippet = hit.chunk.content.replace("\n", " ")
        print(f"  [score={hit.score:.3f}] {snippet}")


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Retrieve against a previously-ingested Chroma store

Re-running ingest on unchanged sources is a no-op (see count-aware staleness), so
a retrieve-only script just opens the existing collection — no loader needed.

from rich import print

from railtracks.retrieval import RetrievalRuntime, VectorStore
from railtracks.retrieval.chunking import SentenceChunker
from railtracks.retrieval.stores import ChromaBackend
from railtracks.retrieval.embedding import OpenAIEmbedding


async def main() -> None:
    vsb = await ChromaBackend.create("my_collection", path="retrieval-demos/stores")

    rr = RetrievalRuntime(
        chunker=SentenceChunker(chunk_size=5, overlap=2),
        embedder=OpenAIEmbedding(model="text-embedding-3-small"),
        store=VectorStore(vsb),
        batch_size=64,
    )

    result = await rr.retrieve("query text")
    print(f"\nQuery: {result.query}")
    for hit in result.chunks:
        snippet = hit.chunk.content.replace("\n", " ")
        print(f"  [score={hit.score:.3f}] {snippet}")


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Amir-R25 added 8 commits May 8, 2026 16:14

Refactor retrieval subsystem: update module docstring and add runtime classes

9244f59

update from feature-branch-rag

c35dcc0

Merge branch 'feature-branch-rag' into rag/retrievalruntime

d94f959

Add RetrievalRuntime. Remove obsolete unit tests for ChromaVectorStore, filter, and vector store base classes.

298f922

Remove count method from Store protocol and its test implementation

ad2f8a4

deterministic Document ID generation from source and update tests

d247d57

implement count-aware staleness checks for document ingestion and add corresponding tests

4b5762f

formatting and linting fixes

958404d

Amir-R25 marked this pull request as ready for review May 21, 2026 19:04

Amir-R25 requested a review from soulFood5632 as a code owner May 21, 2026 19:04

Amir-R25 requested a review from Pooria90 May 21, 2026 19:04

Amir-R25 wants to merge 8 commits into feature-branch-rag from rag/retrievalruntime
