RailtownAI · CoronRing · May 19, 2026 · May 19, 2026 · May 19, 2026 · May 21, 2026
@@ -0,0 +1,125 @@
+# Azure Blob Storage
+
+`AzureBlobLoader` fetches blobs from an Azure Blob Storage container and returns them as
+[`Chunk`](../../rag/vector_stores/vector_store_info.md) objects containing
+UTF-8 decoded content plus source metadata (`source`, `account_url`, `container`,
+`blob_name`).
+
+## Installation
+
+=== "pip"
+
+    ```bash
+    pip install railtracks[azure-blob]
+    ```
+
+=== "uv"
+
+    ```bash
+    uv add railtracks[azure-blob]
+    ```
+
+## Authentication
+
+Authentication defaults to **`DefaultAzureCredential`**, which tries the following
+credential sources in order and uses the first one that succeeds:
+
+1. Environment variables (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET`)
+2. Workload identity (Kubernetes)
+3. Managed identity (Azure-hosted compute)
+4. Azure CLI (`az login`)
+5. Azure PowerShell, then Azure Developer CLI (`azd auth login`)
+
+The list is abridged; the exact chain depends on the installed `azure-identity` version.
+
+Pass an explicit `credential` to override.
+
+!!! tip "Prefer managed identity over connection strings"
+    Managed identity is the recommended authentication method for Azure-hosted
+    workloads: it requires no stored secrets, and Azure rotates the underlying
+    credentials automatically. Avoid embedding storage account keys or SAS
+    tokens in source code; store them in Azure Key Vault or environment
+    variables instead.
+
+## Basic usage
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:azure_basic"
+```
+
+## Load by prefix
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:azure_prefix"
+```
+
+## Load specific blobs
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:azure_load_keys"
+```
+
+## Async usage
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:azure_async"
+```
+
+!!! note "Async is thread-backed"
+    `aload()` and `aload_keys()` run the synchronous `azure-storage-blob`
+    client on a worker thread via `asyncio.to_thread()`. This is sufficient
+    for most workloads; for very high concurrency, consider the native async
+    Azure SDK (`azure.storage.blob.aio`).
+
+## Override credentials
+
+**SAS token**
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:azure_sas"
+```
+
+**System-assigned or user-assigned managed identity**
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:azure_managed_identity"
+```
+
+## Chunk metadata
+
+Each returned `Chunk` carries:
+
+| Key | Value |
+|---|---|
+| `source` | Full blob URL: `https://<account>.blob.core.windows.net/<container>/<blob>` |
+| `account_url` | Storage account URL |
+| `container` | Container name |
+| `blob_name` | Blob name (path within the container) |
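+
+As a rough illustration of how these four keys relate, the sketch below assembles
+them from an account URL, a container and a blob name. The helper
+`blob_metadata` is hypothetical and not part of Railtracks; it only mirrors the
+URL layout shown in the table and applies no URL encoding.
+
+```python
+def blob_metadata(account_url: str, container: str, blob_name: str) -> dict[str, str]:
+    """Hypothetical helper: build the metadata keys listed in the table above."""
+    base = account_url.rstrip("/")  # tolerate a trailing slash on the account URL
+    return {
+        "source": f"{base}/{container}/{blob_name}",
+        "account_url": base,
+        "container": container,
+        "blob_name": blob_name,
+    }
+
+
+meta = blob_metadata(
+    "https://myaccount.blob.core.windows.net/", "docs", "reports/2024/q1.txt"
+)
+```
+
+Here `meta["source"]` is the account URL, container and blob name joined by `/`,
+which is convenient for citing the origin of a retrieved chunk.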
+
+## Full RAG pipeline example
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:pipeline_azure_to_rag"
+```
+
+---
+
+## Writing to Azure Blob Storage
+
+`AzureBlobWriter` uploads text content to a blob container. An existing blob with
+the same name is overwritten.
+
+### Basic write
+
+```python
+--8<-- "docs/scripts/storage_writers.py:azure_write_basic"
+```
+
+### SAS token credential
+
+```python
+--8<-- "docs/scripts/storage_writers.py:azure_write_sas"
+```
+
+### Async write
+
+```python
+--8<-- "docs/scripts/storage_writers.py:azure_write_async"
+```
@@ -0,0 +1,110 @@
+# Google Cloud Storage
+
+`GCSLoader` fetches objects from a GCS bucket and returns them as
+[`Chunk`](../../rag/vector_stores/vector_store_info.md) objects containing
+UTF-8 decoded content plus source metadata (`source`, `bucket`, `name`).
+
+## Installation
+
+=== "pip"
+
+    ```bash
+    pip install railtracks[gcp]
+    ```
+
+=== "uv"
+
+    ```bash
+    uv add railtracks[gcp]
+    ```
+
+## Authentication
+
+Authentication uses **Application Default Credentials (ADC)** by default. ADC checks
+the following sources in order:
+
+1. `GOOGLE_APPLICATION_CREDENTIALS` environment variable (path to a service-account JSON)
+2. `gcloud auth application-default login` (developer workstation)
+3. Workload Identity / attached service account (GCE, GKE, Cloud Run, Cloud Functions …)
+
+Pass explicit `credentials` to override ADC.
+
+!!! tip "Prefer Workload Identity over service-account key files"
+    Service-account JSON key files are long-lived credentials that require
+    manual rotation.  On GCP-hosted compute, Workload Identity or attached
+    service accounts are more secure and require zero key management.
+
+## Basic usage
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:gcs_basic"
+```
+
+## Load by prefix
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:gcs_prefix"
+```
+
+## Load specific objects
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:gcs_load_keys"
+```
+
+## Async usage
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:gcs_async"
+```
+
+!!! note "Async is thread-backed"
+    `aload()` and `aload_keys()` run the synchronous `google-cloud-storage`
+    client on a worker thread via `asyncio.to_thread()`. This is sufficient
+    for most workloads.
+
+## Override credentials (service account key file)
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:gcs_service_account"
+```
+
+## Chunk metadata
+
+Each returned `Chunk` carries:
+
+| Key | Value |
+|---|---|
+| `source` | `gs://<bucket>/<name>` |
+| `bucket` | GCS bucket name |
+| `name` | Object name (path within the bucket) |
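+
+Because `source` follows the `gs://<bucket>/<name>` layout, it can be split back
+into the other two keys. The sketch below shows one way to do that; the function
+`parse_gs_uri` is a hypothetical helper, not a Railtracks API.
+
+```python
+def parse_gs_uri(uri: str) -> tuple[str, str]:
+    """Hypothetical helper: split 'gs://<bucket>/<name>' into (bucket, name)."""
+    scheme = "gs://"
+    if not uri.startswith(scheme):
+        raise ValueError(f"not a gs:// URI: {uri!r}")
+    # Everything up to the first '/' is the bucket; the rest is the object name.
+    bucket, _, name = uri[len(scheme):].partition("/")
+    return bucket, name
+
+
+bucket, name = parse_gs_uri("gs://my-bucket/reports/2024/q1.txt")
+```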
+
+## Full RAG pipeline example
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:pipeline_gcs_to_rag"
+```
+
+---
+
+## Writing to GCS
+
+`GCSWriter` uploads text content to a GCS bucket. An existing object with the
+same name is overwritten.
+
+### Basic write
+
+```python
+--8<-- "docs/scripts/storage_writers.py:gcs_write_basic"
+```
+
+### Service account credentials
+
+```python
+--8<-- "docs/scripts/storage_writers.py:gcs_write_service_account"
+```
+
+### Async write
+
+```python
+--8<-- "docs/scripts/storage_writers.py:gcs_write_async"
+```
@@ -0,0 +1,136 @@
+# Cloud Storage & Database Loaders / Writers
+
+Railtracks ships first-class **loaders** and **writers** for popular cloud
+storage providers and relational databases.
+
+- **Loaders** fetch documents and return them as
+  [`Chunk`](../../rag/vector_stores/vector_store_info.md) objects — pipe remote
+  data straight into a vector store or agent without any glue code.
+- **Writers** persist `Chunk` objects (or raw text) back to the same providers —
+  close the loop by saving AI-generated content to storage.
+
+## Supported providers
+
+| Provider | Loader | Writer | Install extra |
+|---|---|---|---|
+| AWS S3 | `S3Loader` | `S3Writer` | `railtracks[aws]` |
+| Azure Blob Storage | `AzureBlobLoader` | `AzureBlobWriter` | `railtracks[azure-blob]` |
+| Google Cloud Storage | `GCSLoader` | `GCSWriter` | `railtracks[gcp]` |
+| SQL (PostgreSQL, Supabase, MySQL, SQLite …) | `SQLLoader` | `SQLWriter` | `railtracks[sql]` |
+
+Install any combination:
+
+=== "pip"
+
+    ```bash
+    pip install "railtracks[aws,gcp,azure-blob,sql]"
+    ```
+
+=== "uv"
+
+    ```bash
+    uv add "railtracks[aws,gcp,azure-blob,sql]"
+    ```
+
+## Loading — quick examples
+
+=== "AWS S3"
+
+    ```python
+    --8<-- "docs/scripts/storage_loaders.py:s3_basic"
+    ```
+
+=== "Azure Blob"
+
+    ```python
+    --8<-- "docs/scripts/storage_loaders.py:azure_basic"
+    ```
+
+=== "Google Cloud Storage"
+
+    ```python
+    --8<-- "docs/scripts/storage_loaders.py:gcs_basic"
+    ```
+
+=== "SQL / Database"
+
+    ```python
+    --8<-- "docs/scripts/storage_loaders.py:sql_basic_postgres"
+    ```
+
+## Writing — quick examples
+
+=== "AWS S3"
+
+    ```python
+    --8<-- "docs/scripts/storage_writers.py:s3_write_basic"
+    ```
+
+=== "Azure Blob"
+
+    ```python
+    --8<-- "docs/scripts/storage_writers.py:azure_write_basic"
+    ```
+
+=== "Google Cloud Storage"
+
+    ```python
+    --8<-- "docs/scripts/storage_writers.py:gcs_write_basic"
+    ```
+
+=== "SQL / Database"
+
+    ```python
+    --8<-- "docs/scripts/storage_writers.py:sql_write_basic"
+    ```
+
+## Feeding chunks into a RAG pipeline
+
+All loaders return the same `Chunk` type that `ChromaVectorStore.upsert()` accepts,
+so a full load → index → retrieve → answer pipeline needs no conversion step:
+
+```python
+--8<-- "docs/scripts/storage_loaders.py:pipeline_s3_to_rag"
+```
+
+## Load → Generate → Write back
+
+Writers make it easy to persist AI-generated content alongside source data:
+
+```python
+--8<-- "docs/scripts/storage_writers.py:pipeline_generate_and_write"
+```
+
+## Async support
+
+Every loader and writer exposes async variants (`aload`, `aload_keys`, `awrite`,
+`awrite_key`) that are safe to use in `async` agent pipelines:
+
+```python
+# Assumes `loader` and `writer` already exist and this runs inside a coroutine.
+chunks = await loader.aload(prefix="reports/2024/")
+uris = await writer.awrite(chunks, prefix="summaries/")
+```
+
+The async methods delegate to `asyncio.to_thread()`, so they do not block the
+event loop: the underlying synchronous SDK call runs on a worker thread while
+other coroutines keep running.
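+
+The delegation pattern can be sketched with the standard library alone. In this
+minimal example, `blocking_load` is a hypothetical stand-in for a synchronous SDK
+call and `aload` is an illustrative wrapper; neither is the Railtracks
+implementation.
+
+```python
+import asyncio
+import time
+
+
+def blocking_load(prefix: str) -> list[str]:
+    """Hypothetical stand-in for a synchronous SDK call."""
+    time.sleep(0.05)  # simulate network latency
+    return [f"{prefix}a.txt", f"{prefix}b.txt"]
+
+
+async def aload(prefix: str) -> list[str]:
+    # Run the blocking function on a worker thread so the event loop stays free.
+    return await asyncio.to_thread(blocking_load, prefix)
+
+
+async def main() -> list[list[str]]:
+    # The two loads overlap because each runs on its own worker thread.
+    return await asyncio.gather(aload("reports/"), aload("summaries/"))
+
+
+results = asyncio.run(main())
+```
+
+`asyncio.gather()` returns results in the order the awaitables were passed, so
+`results[0]` holds the `reports/` listing regardless of which thread finishes first.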
+
+## Key derivation for writers
+
+When writing `Chunk` objects, the storage key (S3 key, GCS object name, blob
+name, SQL id) is the first of the following that is available:
+
+1. Return value of `key_fn(chunk)` — if `key_fn` is provided
+2. `chunk.id` — if set
+3. `chunk.document` — if set
+4. A freshly generated UUID4 — as a last resort
+
+Pass `key_fn` to take full control of the naming scheme:
+
+```python
+writer = S3Writer("my-bucket", key_fn=lambda c: f"docs/{c.id}.txt")
+```
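+
+The fallback order can be pictured as a short chain of checks. The sketch below
+is an assumption-laden illustration: the `Chunk` dataclass is a simplified
+stand-in with only the fields the rule mentions, `derive_key` is a hypothetical
+name, and "set" is interpreted as "not `None`".
+
+```python
+import uuid
+from dataclasses import dataclass
+from typing import Callable, Optional
+
+
+@dataclass
+class Chunk:
+    """Simplified stand-in for the real Chunk type (assumption)."""
+    content: str
+    id: Optional[str] = None
+    document: Optional[str] = None
+
+
+def derive_key(chunk: Chunk, key_fn: Optional[Callable[[Chunk], str]] = None) -> str:
+    """Hypothetical helper following the four-step order described above."""
+    if key_fn is not None:          # 1. caller-supplied naming scheme
+        return key_fn(chunk)
+    if chunk.id is not None:        # 2. explicit chunk id
+        return chunk.id
+    if chunk.document is not None:  # 3. source document name
+        return chunk.document
+    return str(uuid.uuid4())        # 4. random UUID4 as a last resort
+
+
+chunk = Chunk(content="hello", id="abc", document="notes.txt")
+custom_key = derive_key(chunk, key_fn=lambda c: f"docs/{c.id}.txt")
+default_key = derive_key(chunk)
+```
+
+With a `key_fn`, the chunk's own `id` and `document` are ignored; without one,
+`id` takes precedence over `document`.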
+
+!!! tip "Next steps"
+    - [AWS S3](s3.md) · [Azure Blob Storage](azure_blob.md) · [Google Cloud Storage](gcs.md) · [SQL](sql.md)
+    - [Cloud Storage Loaders Tutorial](../../tutorials/walkthroughs/storage_loaders_tutorial.md)