Merged
@@ -0,0 +1,71 @@
---
title: R2 SQL now supports JOINs, subqueries, and multi-table queries
description: Join multiple Iceberg tables, use subqueries, and write multi-table CTEs in R2 SQL.
products:
- r2-sql
date: 2026-05-14
---

[R2 SQL](/r2-sql/) is Cloudflare's serverless, distributed SQL engine for querying [Apache Iceberg](https://iceberg.apache.org/) tables stored in [R2 Data Catalog](/r2/data-catalog/). R2 SQL runs directly on Cloudflare's global network with no infrastructure to manage, so you can analyze data in R2 without exporting it to an external warehouse.

R2 SQL now supports joining multiple Iceberg tables in a single query. You can combine tables with JOINs, filter with subqueries, and define multi-table CTEs to build complex analytical queries.

## New capabilities

- **JOINs** — `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, `FULL OUTER JOIN`, `CROSS JOIN`, and implicit joins (comma-separated `FROM` with conditions in `WHERE`)
- **Subqueries** — `IN` / `NOT IN`, `EXISTS` / `NOT EXISTS`, scalar subqueries in `SELECT` / `WHERE` / `HAVING`, and derived tables (subqueries in `FROM`)
- **Multi-table CTEs** — `WITH` clauses can reference different tables and include JOINs
- **Self-joins** — join a table with itself using different aliases
- **Multi-way joins** — join three or more tables in a single query

## Examples

### Two-table JOIN with aggregation

```sql
SELECT z.domain, z.plan, COUNT(*) AS request_count
FROM my_namespace.zones z
INNER JOIN my_namespace.http_requests h ON z.zone_id = h.zone_id
WHERE z.plan = 'enterprise'
GROUP BY z.domain, z.plan
ORDER BY request_count DESC
LIMIT 20
```

### `EXISTS` subquery

```sql
SELECT z.domain, z.plan
FROM my_namespace.zones z
WHERE EXISTS (
    SELECT 1 FROM my_namespace.firewall_events f
    WHERE f.zone_id = z.zone_id AND f.action = 'block'
)
ORDER BY z.domain
LIMIT 20
```
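
The `EXISTS` query above is a semi-join: it keeps zones that have at least one matching firewall event. As a hedged sketch, the inverse anti-join can be written with `NOT EXISTS`, reusing the same tables and columns as the example above. It is meant to illustrate the pattern, not to show a tuned query.

```sql
-- Sketch: zones with no blocked firewall events (anti-join pattern).
-- Table and column names are taken from the EXISTS example above.
SELECT z.domain, z.plan
FROM my_namespace.zones z
WHERE NOT EXISTS (
    SELECT 1 FROM my_namespace.firewall_events f
    WHERE f.zone_id = z.zone_id AND f.action = 'block'
)
ORDER BY z.domain
LIMIT 20
```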

### Multi-table CTE with JOIN

```sql
WITH top_zones AS (
    SELECT zone_id, COUNT(*) AS req_count
    FROM my_namespace.http_requests
    GROUP BY zone_id
    ORDER BY req_count DESC
    LIMIT 50
),
zone_threats AS (
    SELECT zone_id, COUNT(*) AS threat_count
    FROM my_namespace.firewall_events
    WHERE risk_score > 0.5
    GROUP BY zone_id
)
SELECT tz.zone_id, tz.req_count, COALESCE(zt.threat_count, 0) AS threat_count
FROM top_zones tz
LEFT JOIN zone_threats zt ON tz.zone_id = zt.zone_id
ORDER BY tz.req_count DESC
LIMIT 20
```
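
### Derived table with `IN` subquery

The capabilities list also mentions derived tables and `IN` subqueries, which the examples above do not cover. The following is a minimal sketch that combines both, assuming the same `zones`, `http_requests`, and `firewall_events` tables and columns used in the earlier examples.

```sql
-- Sketch: a derived table (subquery in FROM) joined to zones,
-- filtered by an uncorrelated IN subquery.
SELECT z.domain, r.req_count
FROM my_namespace.zones z
INNER JOIN (
    -- Derived table: one row per zone with its request count.
    SELECT zone_id, COUNT(*) AS req_count
    FROM my_namespace.http_requests
    GROUP BY zone_id
) r ON z.zone_id = r.zone_id
WHERE z.zone_id IN (
    -- Only zones that have at least one blocked firewall event.
    SELECT zone_id
    FROM my_namespace.firewall_events
    WHERE action = 'block'
)
ORDER BY r.req_count DESC
LIMIT 20
```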

For the full syntax reference, refer to the [SQL reference](/r2-sql/sql-reference/). For performance guidance with joins, refer to [Limitations and best practices](/r2-sql/reference/limitations-best-practices/).
23 changes: 13 additions & 10 deletions src/content/docs/r2-sql/reference/limitations-best-practices.mdx
@@ -28,9 +28,14 @@ This page summarizes supported features, limitations, and best practices.
| 33 aggregate functions | Yes | Basic, approximate, statistical, bitwise, boolean, positional |
| Approximate aggregates | Yes | `approx_distinct`, `approx_median`, `approx_percentile_cont`, `approx_top_k` |
| Struct / Array / Map column types | Yes | Bracket notation, `get_field()`, array functions, map functions |
-| CTEs (`WITH ... AS`) | Yes | Single-table only. No JOINs or cross-table references within CTEs. |
-| JOINs | No | Single-table only |
-| Subqueries | No | |
+| CTEs (`WITH ... AS`) | Yes | Can reference different tables and include JOINs |
+| JOINs (INNER, LEFT, RIGHT, FULL OUTER, CROSS) | Yes | All standard join types |
+| Implicit joins (comma-separated `FROM`) | Yes | |
+| Subqueries (`IN`, `NOT IN`) | Yes | Correlated and uncorrelated |
+| Subqueries (`EXISTS`, `NOT EXISTS`) | Yes | Semi-join and anti-join patterns |
+| Scalar subqueries | Yes | In `SELECT`, `WHERE`, `HAVING` |
+| Derived tables (`FROM` subqueries) | Yes | Can be nested and joined |
+| Self-joins | Yes | Same table with different aliases |
| Window functions (`OVER`) | No | |
| `SELECT DISTINCT` | No | Use `approx_distinct` |
| `OFFSET` | No | |
@@ -46,9 +51,6 @@ For the full SQL syntax, refer to the [SQL reference](/r2-sql/sql-reference/).

| Feature | Error |
| :---------------------------------------------------------------------------- | :------------------------------------------------------- |
-| JOINs (any type) | `unsupported feature: JOIN operations are not supported` |
-| Multi-table CTEs (JOINs or cross-table references within `WITH`) | Single-table CTEs are supported |
-| Subqueries (FROM, WHERE, scalar) | `unsupported feature: subqueries` |
| `SELECT DISTINCT` | `unsupported feature: SELECT DISTINCT is not supported` |
| `OFFSET` | `unsupported feature: OFFSET clause is not supported` |
| `UNION` / `INTERSECT` / `EXCEPT` | Set operations not supported |
@@ -70,17 +72,14 @@ For the full SQL syntax, refer to the [SQL reference](/r2-sql/sql-reference/).
| `MEDIAN` | Use [`approx_median`](/r2-sql/sql-reference/aggregate-functions/#approx_median) |
| `ARRAY_AGG` | No alternative (unsupported for memory safety) |
| `STRING_AGG` | No alternative (unsupported for memory safety) |
-| Scalar subqueries (`SELECT ... WHERE x = (SELECT ...)`) | Not supported |
-| `EXISTS (SELECT ...)` | Not supported |
-| `IN (SELECT ...)` | Use `IN (value1, value2, ...)` with a literal list |

---

## Runtime constraints

| Constraint | Details |
| :----------------------------------- | :---------------------------------------------------------------------------------------------------- |
-| Single table per query | Queries must reference exactly one table. No JOINs, no subqueries. CTEs may reference a single table. |
+| Multi-table queries | JOINs, subqueries (`IN`, `EXISTS`, scalar, derived tables), and multi-table CTEs are supported. Performance depends on intermediate result size; use `WHERE` filters to manage join selectivity. |
| Partitioned and unpartitioned tables | Both partitioned and unpartitioned Iceberg tables are supported. |
| Parquet format only | No CSV, JSON, or other formats. |
| Read-only | R2 SQL is a query engine, not a database. No writes. |
@@ -106,3 +105,7 @@ For the full SQL syntax, refer to the [SQL reference](/r2-sql/sql-reference/).
4. Use approximate aggregation functions (`approx_distinct`, `approx_median`, `approx_percentile_cont`) instead of exact alternatives on large datasets.
5. Enable compaction in R2 Data Catalog to reduce the number of files scanned per query.
6. Use `EXPLAIN` to inspect the execution plan and verify predicate pushdown.
+7. Use `WHERE` filters with multi-way joins to reduce intermediate result sizes. Joining three or more large tables without filters can exceed resource limits.
+8. Join large fact tables through dimension tables rather than directly joining two large fact tables. For example, join `http_requests` to `firewall_events` through a shared `zones` dimension rather than joining the two fact tables to each other directly.
+9. Be cautious with `COUNT(DISTINCT)` across multi-way joins. This combination can produce very large intermediate results. Consider using `approx_distinct()` or breaking the query into smaller steps.
+10. Use explicit `JOIN` syntax instead of implicit joins (comma-separated `FROM`), both for readability and so the optimizer can choose a good join order.
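
One way to read items 7 and 8 above is sketched below: each fact table is reduced to one row per zone before it is joined through the `zones` dimension, and the dimension is filtered with `WHERE`. The table and column names come from the changelog examples; the CTE names `requests_per_zone` and `blocks_per_zone` are illustrative. Whether this shape is the fastest option for a given dataset is an assumption that should be checked with `EXPLAIN`.

```sql
-- Sketch: pre-aggregate each fact table, then join through the zones dimension.
WITH requests_per_zone AS (
    SELECT zone_id, COUNT(*) AS req_count
    FROM my_namespace.http_requests
    GROUP BY zone_id
),
blocks_per_zone AS (
    SELECT zone_id, COUNT(*) AS block_count
    FROM my_namespace.firewall_events
    WHERE action = 'block'
    GROUP BY zone_id
)
SELECT z.domain, r.req_count, COALESCE(b.block_count, 0) AS block_count
FROM my_namespace.zones z
INNER JOIN requests_per_zone r ON z.zone_id = r.zone_id
LEFT JOIN blocks_per_zone b ON z.zone_id = b.zone_id
WHERE z.plan = 'enterprise'  -- filter the dimension to keep the join small
ORDER BY r.req_count DESC
LIMIT 20
```

Because each CTE returns at most one row per `zone_id`, the final joins cannot multiply rows the way a direct join between the two fact tables would.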