feat: add remote storage loaders and writers (S3, GCS, Azure Blob, SQL) by CoronRing · Pull Request #1102 · RailtownAI/railtracks

CoronRing · 2026-05-19T18:28:24Z

Summary

Adds S3Loader / S3Writer, GCSLoader / GCSWriter, AzureBlobLoader / AzureBlobWriter, and SQLLoader / SQLWriter under railtracks.loaders and railtracks.writers
All providers use optional extras (railtracks[aws], railtracks[gcp], railtracks[azure-blob], railtracks[sql]) so the core package stays lean
All loaders/writers expose sync and async interfaces (load/aload, write/awrite)
SQL classes include a context-manager (with SQLLoader(...) as l) and explicit close() for engine lifecycle management
SQL identifier arguments validated against a strict allowlist at construction time to prevent injection
Full unit test coverage across all 8 classes (127 tests, all passing)
Comprehensive developer docs under docs/integrations/storage/ with pip + uv install tabs, security callouts, and provider-specific auth guidance
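To illustrate the optional-extras idea from the summary, here is a minimal sketch of an import guard that keeps provider SDKs out of the core install. The helper name `require_extra` and the exact message wording are assumptions for illustration, not taken from the PR; only the extras names (`aws`, `gcp`, `azure-blob`, `sql`) and the pip/uv hint come from the description.

```python
import importlib


def require_extra(module_name: str, extra: str):
    """Import an optional dependency, or fail with an install hint.

    Hypothetical helper: the PR may structure this differently.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Name both installer forms, as the PR says its ImportError messages do.
        raise ImportError(
            f"'{module_name}' is required for this provider. Install it with "
            f"`pip install 'railtracks[{extra}]'` or `uv add 'railtracks[{extra}]'`."
        ) from exc
```

A loader would call something like `require_extra("boto3", "aws")` inside its constructor, so importing the core package never touches the provider SDK.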

Security hardening

SQL table/column names validated at __init__ time — raises ValueError on any metacharacter ([A-Za-z_][A-Za-z0-9_$]* allowlist, supports schema.table)
Helpful ValueError when content_column is missing from query results (was a bare KeyError)
__repr__ on all classes exposes only non-sensitive fields (bucket/container name); credentials never appear in repr
UserWarning emitted when prefix is passed to SQLLoader.load() or SQLWriter.write() (unsupported, silently ignored before)
All ImportError messages include both pip install and uv add forms
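The identifier allowlist above can be sketched roughly as follows. The function name `validate_identifier` is hypothetical, and limiting names to a single optional `schema.` prefix is an assumption based on the "supports schema.table" note; the character classes are the ones quoted in the description.

```python
import re

# One identifier part, exactly as quoted in the PR description.
_PART = r"[A-Za-z_][A-Za-z0-9_$]*"
# Assumption: at most one dot, i.e. "table" or "schema.table".
_IDENTIFIER = re.compile(rf"{_PART}(?:\.{_PART})?")


def validate_identifier(name: str) -> str:
    """Return name unchanged if it matches the allowlist, else raise ValueError."""
    if not isinstance(name, str) or _IDENTIFIER.fullmatch(name) is None:
        raise ValueError(f"Invalid SQL identifier: {name!r}")
    return name
```

Using `fullmatch` rather than `match` means any trailing metacharacter, including a newline, causes rejection. Identifiers cannot be bound as query parameters, which is why they are checked against an allowlist before being interpolated into SQL text.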

Limitations documented

CTE (WITH …) queries not supported as table_or_query; workaround shown in docs
aload/awrite are thread-backed (asyncio.to_thread), not true async; the docs note this and give guidance for high-concurrency cases
SQLWriter.write() is all-or-nothing (single transaction); partial-failure pattern documented
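The last two limitations can be illustrated together. This is a sketch using the standard-library `sqlite3` module and a stand-in class named `SketchWriter`; it is not the PR's `SQLWriter`, and the table layout is assumed. It shows a single-transaction write that rolls back completely on failure, plus an `awrite` that only offloads the blocking call to a worker thread.

```python
import asyncio
import sqlite3


class SketchWriter:
    """Hypothetical stand-in for a SQL writer; not the PR's implementation."""

    def __init__(self, db_path: str, table: str = "docs"):
        self.db_path = db_path
        self.table = table  # assumed to be validated against the allowlist already

    def write(self, rows: list) -> int:
        # Open the connection inside the call so it is used on the calling thread.
        conn = sqlite3.connect(self.db_path)
        try:
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {self.table} (content TEXT NOT NULL)"
            )
            with conn:  # commits on success, rolls back if any row fails
                conn.executemany(
                    f"INSERT INTO {self.table} (content) VALUES (?)",
                    [(row,) for row in rows],
                )
            return len(rows)
        finally:
            conn.close()

    async def awrite(self, rows: list) -> int:
        # Thread-backed: the event loop stays free, but the work itself is
        # still blocking I/O running on a worker thread.
        return await asyncio.to_thread(self.write, rows)
```

Because every `awrite` call occupies a thread from the default executor, a caller that fans out many concurrent writes is bounded by the pool size, which is presumably the high-concurrency case the docs warn about.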

Test plan

127 unit tests passing across all 4 providers × loader + writer
SQL tests use real in-memory SQLite (no mocks for correctness)
Cloud tests (S3/GCS/Azure) use provider SDK mocks
Async variants covered for all classes
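As an illustration of the in-memory SQLite testing style, the sketch below exercises the missing-content-column case from the security list. `load_contents` is a hypothetical helper standing in for the loader's row handling; the PR's actual class and test names are not shown on this page.

```python
import sqlite3


def load_contents(conn, query: str, content_column: str = "content") -> list:
    """Hypothetical helper: return the content column from each result row."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    if content_column not in columns:
        # A descriptive ValueError instead of a bare KeyError.
        raise ValueError(
            f"content_column {content_column!r} not found in query results; "
            f"available columns: {columns}"
        )
    idx = columns.index(content_column)
    return [row[idx] for row in cursor.fetchall()]


def test_missing_content_column_raises_value_error():
    conn = sqlite3.connect(":memory:")  # real database engine, no mocks
    conn.execute("CREATE TABLE docs (id INTEGER, body TEXT)")
    conn.execute("INSERT INTO docs VALUES (1, 'hello')")
    try:
        load_contents(conn, "SELECT id, body FROM docs")
    except ValueError as exc:
        assert "content" in str(exc)
    else:
        raise AssertionError("expected ValueError")
    # Pointing at the real column succeeds.
    assert load_contents(conn, "SELECT id, body FROM docs", "body") == ["hello"]
    conn.close()
```

Running against a real engine means the test checks actual column metadata rather than a mocked cursor.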

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Pooria90

Good work @CoronRing

I went over the code. A few substantial changes need to be applied:

We need to move the loaders and writers modules to packages/railtracks/src/railtracks/retrieval/loaders, since they build on Amir's work on loaders. I recommend keeping all classes for a given provider in a single module: for example, merging loaders/s3.py and writers/s3.py into one s3.py.
Alternatively, we could place them under a loaders/cloud folder. We can discuss the structure.
The modules return Chunk, which lives in our old vector_stores module. That module is now deprecated and will be removed from the framework. The new type for our retrieval module is Document, located in packages/railtracks/src/railtracks/retrieval/models.py. Please refer to Amir's loaders for examples.
Please adjust the base classes, data models, and docs accordingly.

CoronRing requested review from Amir-R25 and soulFood5632 as code owners May 19, 2026 18:28

CoronRing changed the base branch from main to feature-branch-rag May 19, 2026 18:32

CoronRing force-pushed the guan/1090/remote_store branch from 4cdece1 to b354919 Compare May 19, 2026 18:36

Amir-R25 assigned Amir-R25 and Pooria90 May 21, 2026

Pooria90 reviewed May 21, 2026

Comment thread uv.lock

Pooria90 reviewed May 21, 2026

Comment thread pyproject.toml

Pooria90 reviewed May 21, 2026

Comment thread .gitignore Outdated

CoronRing force-pushed the guan/1090/remote_store branch 2 times, most recently from 5a4c7fe to 57be8ec Compare May 21, 2026 23:37

CoronRing and others added 4 commits May 21, 2026 16:44

feat: add remote storage loaders and writers (S3, GCS, Azure Blob, SQL)

8439560

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: ruff lint — remove unused imports in s3 loader and azure blob wr…

fc779cb

…iter Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: remove unused asyncio import in sql loader

5818bea

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

REset git ignore and uv

e0e6f9e

CoronRing force-pushed the guan/1090/remote_store branch from 57be8ec to e0e6f9e Compare May 21, 2026 23:45

Adding RAG notebooks

2524581

Pooria90 requested changes May 22, 2026

CoronRing wants to merge 5 commits into feature-branch-rag from guan/1090/remote_store

3 participants
