
I/O: Adapter for Apache Iceberg #444

Merged
amotl merged 30 commits into main from iceberg on Mar 4, 2026

amotl commented Jun 4, 2025

About

Import from and export to Apache Iceberg tables, for interoperability purposes.

Documentation

https://cratedb-toolkit--444.org.readthedocs.build/io/iceberg/

References

Review

Please don't hesitate to review in retrospect where applicable.
Any feedback is much appreciated.

/cc @karynzv, @hammerhead, @zolbatar, @grbade, @WalBeh

coderabbitai bot commented Jun 4, 2025

Walkthrough

Adds Apache Iceberg I/O: new iceberg and util modules, CLI split into load/save groups, cluster-level save_table routing and iceberg-backed load/save implementations, tests, docs, and optional dependency updates for Iceberg support.
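The cluster-level routing can be pictured as a dispatch on the URL scheme: loads are routed by the source URL, saves by the target URL. The following is a minimal sketch under that assumption; the handler names (`load_from_iceberg`, `save_to_iceberg`), the registries, and the return values are hypothetical placeholders, not the toolkit's actual functions.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# A handler receives (source_url, target_url); the return value is illustrative only.
Handler = Callable[[str, str], str]


def load_from_iceberg(source: str, target: str) -> str:
    # Hypothetical placeholder for the Iceberg -> CrateDB path.
    return "iceberg-load"


def save_to_iceberg(source: str, target: str) -> str:
    # Hypothetical placeholder for the CrateDB -> Iceberg path.
    return "iceberg-save"


# Registries keyed by URL scheme; further adapters would add entries here.
LOAD_HANDLERS: Dict[str, Handler] = {"iceberg": load_from_iceberg}
SAVE_HANDLERS: Dict[str, Handler] = {"iceberg": save_to_iceberg}


def load_table(source: str, target: str) -> str:
    """Route a load request by the scheme of the source URL."""
    handler = LOAD_HANDLERS.get(urlparse(source).scheme)
    if handler is None:
        raise NotImplementedError(f"Unsupported source: {source}")
    return handler(source, target)


def save_table(source: str, target: str) -> str:
    """Route a save request by the scheme of the target URL."""
    handler = SAVE_HANDLERS.get(urlparse(target).scheme)
    if handler is None:
        raise NotImplementedError(f"Unsupported target: {target}")
    return handler(source, target)
```

Keying the registries by scheme keeps `load_table()` and `save_table()` unchanged when another table format is added later.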

Changes

| Cohort / File(s) | Summary |
|---|---|
| Top-level CLI: `cratedb_toolkit/cli.py` | Register two IO command groups: `load` wired to `io_cli_load` and the new `save` wired to `io_cli_save`. |
| IO CLI: `cratedb_toolkit/io/cli.py` | Split the CLI entry point: rename `cli` to `cli_load`; add a `cli_save` group with a `save table` command. `load_table` uses `cli_load`; the new `save_table` implements export options and normalizes transformation paths. |
| Iceberg IO module: `cratedb_toolkit/io/iceberg.py` | New module with `IcebergAddress`, URL/catalog parsing, `from_iceberg()` (Iceberg→CrateDB) and `to_iceberg()` (CrateDB→Iceberg), chunked transfers, and catalog/storage property handling. |
| IO utilities: `cratedb_toolkit/io/util.py` | New helpers `parse_uri`, `polars_to_cratedb`, `pandas_from_cratedb`, and `read_cratedb` for URI parsing and streaming data between CrateDB and Polars/Pandas. |
| Cluster core: `cratedb_toolkit/cluster/core.py` | Add `save_table()` to `ManagedCluster` and `StandaloneCluster`; extend `load_table()` to route `iceberg` schemes to Iceberg I/O. `StandaloneCluster` supports `to_iceberg`; saving on `ManagedCluster` remains not implemented. |
| Tests: `tests/io/test_iceberg.py` | Add end-to-end Iceberg tests and an `example_iceberg` fixture covering CLI load/save flows, metadata discovery, schema and row-count assertions, and error cases for a missing namespace or name. |
| Dependencies & tooling: `pyproject.toml` | Add optional `iceberg` extras (`pyiceberg[...]`), add an `io-opentable` extra referencing `cratedb-toolkit[iceberg]`, remove `pandas` from the `io` extras, and update the docs autobuild watch path. |
| Docs / Backlog: `doc/io/iceberg/index.md`, `doc/backlog/io.md`, `doc/backlog/index.md`, `doc/io/index.md` | Add user-facing Iceberg I/O documentation and backlog notes, and register the pages in the toctree. |
| Changelog: `CHANGES.md` | Add an unreleased note about the new Apache Iceberg I/O adapter. |
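To illustrate what parsing an Iceberg address involves, here is a minimal sketch of a dataclass that splits an `iceberg://` URL into a storage location, a namespace, a table name, and pass-through catalog properties. The class name `IcebergAddressSketch` and the query parameter names `namespace` and `table` are assumptions for illustration; the real `IcebergAddress` may use a different URL layout. A missing namespace or table name raises an error, echoing the error cases the tests cover.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import parse_qsl, urlparse


@dataclass
class IcebergAddressSketch:
    """Hypothetical stand-in for IcebergAddress; field names are assumptions."""

    location: str
    namespace: str
    table_name: str
    catalog_properties: Dict[str, str] = field(default_factory=dict)

    @property
    def identifier(self) -> str:
        # Catalogs commonly address tables as "<namespace>.<table>".
        return f"{self.namespace}.{self.table_name}"

    @classmethod
    def from_url(cls, url: str) -> "IcebergAddressSketch":
        parsed = urlparse(url)
        if parsed.scheme != "iceberg":
            raise ValueError(f"Not an iceberg:// URL: {url}")
        params = dict(parse_qsl(parsed.query))
        # Assumed parameter names; removed so they are not treated as catalog properties.
        namespace = params.pop("namespace", "")
        table_name = params.pop("table", "")
        if not namespace:
            raise ValueError("Iceberg address lacks a namespace")
        if not table_name:
            raise ValueError("Iceberg address lacks a table name")
        # netloc + path, e.g. "." + "/var/lake" -> "./var/lake".
        location = parsed.netloc + parsed.path
        # Remaining query parameters pass through as catalog properties.
        return cls(location, namespace, table_name, params)


address = IcebergAddressSketch.from_url(
    "iceberg://./var/lake?namespace=demo&table=taxi&catalog=default"
)
print(address.identifier)  # prints demo.taxi
```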

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant IO_CLI_Save as IO CLI (cli_save)
    participant Cluster as Cluster.save_table()
    participant CrateDB as CrateDB
    participant Iceberg as Iceberg (to_iceberg)

    User->>IO_CLI_Save: run `save table` (source=crate://schema.table, target=iceberg://...)
    IO_CLI_Save->>Cluster: instantiate cluster & call save_table(target)
    Cluster->>CrateDB: query & stream row chunks
    CrateDB-->>Cluster: return chunk
    Cluster->>Iceberg: to_iceberg(target_url, chunk)
    Iceberg->>Iceberg: ensure catalog/namespace and write chunk
    Iceberg-->>Cluster: ack
    Cluster-->>IO_CLI_Save: return result
    IO_CLI_Save-->>User: exit
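The save flow reads rows from CrateDB in chunks and appends each chunk to the Iceberg table after making sure the namespace exists. A minimal sketch of that loop, assuming a DB-API cursor with `fetchmany()`: `sqlite3` stands in for CrateDB so the example runs without a server, and `InMemoryIcebergSink` and `save_table_chunked` are hypothetical names, not PyIceberg or toolkit APIs.

```python
import sqlite3
from typing import List, Set, Tuple

Row = Tuple


class InMemoryIcebergSink:
    """Hypothetical stand-in for an Iceberg catalog/table pair."""

    def __init__(self) -> None:
        self.namespaces: Set[str] = set()
        self.chunks: List[List[Row]] = []

    def ensure_namespace(self, namespace: str) -> None:
        # Idempotent: "creating" an existing namespace is a no-op here.
        self.namespaces.add(namespace)

    def append(self, chunk: List[Row]) -> None:
        # With PyIceberg, each Table.append() call would commit a new snapshot.
        self.chunks.append(chunk)


def save_table_chunked(conn: sqlite3.Connection, query: str,
                       sink: InMemoryIcebergSink, namespace: str,
                       chunk_size: int = 1000) -> int:
    """Stream query results into the sink chunk by chunk; return the row count."""
    sink.ensure_namespace(namespace)
    cursor = conn.execute(query)
    total = 0
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        sink.append(rows)
        total += len(rows)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
sink = InMemoryIcebergSink()
print(save_table_chunked(conn, "SELECT id FROM t ORDER BY id", sink, "demo", chunk_size=2))  # prints 5
```

The chunk size trades memory per batch against the number of writes (and, on the Iceberg side, snapshots) per export.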
sequenceDiagram
    participant User as User/CLI
    participant IO_CLI_Load as IO CLI (cli_load)
    participant Cluster as Cluster.load_table()
    participant Iceberg as Iceberg (from_iceberg)
    participant CrateDB as CrateDB

    User->>IO_CLI_Load: run `load table` (source=iceberg://..., target=crate://schema.table)
    IO_CLI_Load->>Cluster: instantiate cluster & call load_table(source, target)
    Cluster->>Iceberg: from_iceberg(source_url, target_url)
    Iceberg->>Iceberg: parse URL, load catalog/table, stream Polars chunks
    Iceberg->>CrateDB: insert_bulk per chunk
    CrateDB-->>Iceberg: ack inserted
    Iceberg-->>Cluster: completion
    Cluster->>CrateDB: refresh target table
    Cluster-->>IO_CLI_Load: return result
    IO_CLI_Load-->>User: exit
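On the load side, each chunk read from Iceberg is written with one bulk insert, and the target table is refreshed once at the end; in CrateDB, `REFRESH TABLE` makes freshly written rows visible to queries right away. The sketch below assumes record batches arrive as lists of dicts. `sqlite3` stands in for CrateDB, the `refresh` callback marks where a CrateDB-backed version would issue `REFRESH TABLE`, and the function name `load_batches` is hypothetical.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List, Sequence

Batch = List[Dict[str, object]]


def load_batches(conn: sqlite3.Connection, table: str, columns: Sequence[str],
                 batches: Iterable[Batch],
                 refresh: Callable[[str], None]) -> int:
    """Insert each batch with one executemany() call; return the total row count."""
    # Identifiers are interpolated into SQL, so they must come from trusted input.
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    total = 0
    for batch in batches:
        # One bulk round trip per chunk, like insert_bulk in the diagram.
        conn.executemany(sql, [tuple(record[c] for c in columns) for record in batch])
        total += len(batch)
    conn.commit()
    # A CrateDB-backed version would run "REFRESH TABLE <table>" here.
    refresh(table)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxi (id INTEGER, fare REAL)")
batches = [[{"id": 1, "fare": 9.5}, {"id": 2, "fare": 12.0}], [{"id": 3, "fare": 7.25}]]
refreshed: List[str] = []
print(load_batches(conn, "taxi", ["id", "fare"], batches, refreshed.append))  # prints 3
```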

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • hammerhead

Poem

🐰 I hopped along an iceberg trail so bright,
Split load and save with a twitch of delight,
Catalogs hummed while Polars streamed the rows,
Tables found new homes where the cold wind blows,
A rabbit cheers — two commands and data flows!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding an I/O adapter for Apache Iceberg tables. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.32%, which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description clearly describes the implementation of Apache Iceberg import/export support, directly aligned with the changeset. |

amotl requested review from matriv and seut on February 19, 2026 13:09
amotl marked this pull request as ready for review on February 19, 2026 13:16
amotl merged commit f304428 into main on Mar 4, 2026
25 checks passed
amotl deleted the iceberg branch on March 4, 2026 12:01