
Import and export data into/from [Apache Iceberg] tables, for humans and machines.

- Iceberg works with the concept of a [FileIO] which is a pluggable module for
- reading, writing, and deleting files. It supports different backends like
- S3, HDFS, Azure Data Lake, Google Cloud Storage, Alibaba Cloud Object Storage,
- and Hugging Face.
-
## Synopsis

- - Load from Iceberg table: `ctk load table file+iceberg://...`,
-   `ctk load table s3+iceberg://...`
+ Load from Iceberg table:
+ ```shell
+ ctk load table {file,s3,abfs,gs,hdfs}+iceberg://...
+ ```

- - Export to Iceberg table: `ctk save table file+iceberg://...`
+ Export to Iceberg table:
+ ```shell
+ ctk save table {file,s3,abfs,gs,hdfs}+iceberg://...
+ ```
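
The braces in the synopsis presumably denote alternative storage schemes rather than text to type literally; bash would brace-expand them into several separate arguments. A small sketch that prints one command per scheme, keeping the elided `...` from the synopsis as a placeholder:

```shell
# Print one synopsis line per storage scheme.
# The "..." is the elided remainder of the URL, not literal syntax.
for scheme in file s3 abfs gs hdfs; do
  echo "ctk load table ${scheme}+iceberg://..."
done
```

Pick the single scheme that matches your storage backend when running the command for real.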

## Install

@@ -32,6 +32,14 @@ other operating systems.

## Usage

+ Iceberg uses the concept of a [FileIO], a pluggable module for
+ reading, writing, and deleting files. It supports storage backends such as
+ S3, HDFS, Azure Data Lake, Google Cloud Storage, Alibaba Cloud Object Storage,
+ and Hugging Face.
+
+ Look up the available configuration parameters in the reference documentation,
+ or derive your ETL commands from the examples below.
+
### Load

Load from a metadata file on the filesystem.
@@ -48,18 +56,55 @@ ctk load table \
  --cluster-url="crate://crate:crate@localhost:4200/demo/taxi-tiny"
```

- Load from REST catalog and AWS S3 storage.
+ Use a REST catalog with AWS S3 storage.
```shell
ctk load table \
  "s3+iceberg://bucket1/?catalog-uri=http://iceberg-catalog.example.org:5000&catalog-token=foo&catalog=default&namespace=demo&table=taxi-tiny&s3.access-key-id=<your_access_key_id>&s3.secret-access-key=<your_secret_access_key>&s3.endpoint=<endpoint_url>&s3.region=<s3-region>" \
  --cluster-url="crate://crate:crate@localhost:4200/demo/taxi-tiny"
```
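
A long URL like this can be easier to review when it is assembled from named variables. The following is a minimal sketch in POSIX shell, assuming the same placeholder values as the example; the variable names (`CATALOG_URI`, `PARAMS`, `ICEBERG_URL`, and so on) are hypothetical and not part of `ctk`. It does not percent-encode values, so values containing characters such as `&` or `#` would likely need encoding first.

```shell
# Hypothetical placeholder values; substitute your own.
CATALOG_URI="http://iceberg-catalog.example.org:5000"
CATALOG_TOKEN="foo"
S3_ACCESS_KEY_ID="<your_access_key_id>"
S3_SECRET_ACCESS_KEY="<your_secret_access_key>"
S3_ENDPOINT="<endpoint_url>"
S3_REGION="<s3-region>"

# Append the query parameters step by step, joined with "&".
PARAMS="catalog-uri=${CATALOG_URI}"
PARAMS="${PARAMS}&catalog-token=${CATALOG_TOKEN}"
PARAMS="${PARAMS}&catalog=default&namespace=demo&table=taxi-tiny"
PARAMS="${PARAMS}&s3.access-key-id=${S3_ACCESS_KEY_ID}"
PARAMS="${PARAMS}&s3.secret-access-key=${S3_SECRET_ACCESS_KEY}"
PARAMS="${PARAMS}&s3.endpoint=${S3_ENDPOINT}"
PARAMS="${PARAMS}&s3.region=${S3_REGION}"

# Final source URL, identical in shape to the one-line form.
ICEBERG_URL="s3+iceberg://bucket1/?${PARAMS}"
echo "${ICEBERG_URL}"
```

The result can then be passed as `ctk load table "${ICEBERG_URL}" --cluster-url="crate://crate:crate@localhost:4200/demo/taxi-tiny"`.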

- Query data in CrateDB.
+ Use a catalog in Apache Hive.
+ ```shell
+ ctk load table "s3+iceberg://bucket1/?catalog-uri=thrift://localhost:9083/&catalog-credential=t-1234:secret&..."
+ ```
+
+ Use a catalog in AWS Glue.
+ ```shell
+ ctk load table "s3+iceberg://bucket1/?catalog-type=glue&glue.id=foo&glue.profile-name=bar&glue.region=region&glue.access-key-id=key&glue.secret-access-key=secret&..."
+ ```
+
+ Use a catalog in Google BigQuery.
+ ```shell
+ ctk load table "s3+iceberg://bucket1/?catalog-type=bigquery&gcp.bigquery.project-id=foo&..."
+ ```
+
+ Use a catalog in DynamoDB.
+ ```shell
+ ctk load table "s3+iceberg://bucket1/?catalog-type=dynamodb&dynamodb.profile-name=foo&dynamodb.region=bar&dynamodb.access-key-id=key&dynamodb.secret-access-key=secret&..."
+ ```
+
+ Load data from Azure Data Lake Storage.
+ ```shell
+ ctk load table "abfs+iceberg://container/path/?adls.account-name=devstoreaccount1&adls.account-key=foo&..."
+ ```
+
+ Load data from Google Cloud Storage.
+ ```shell
+ ctk load table "gs+iceberg://bucket?gcs.project-id=..."
+ ```
+
+ Load data from HDFS storage.
+ ```shell
+ ctk load table "hdfs+iceberg://path?hdfs.host=https://10.0.19.25/&hdfs.port=9000&hdfs.user=&hdfs.kerberos_ticket="
+ ```
+
+ :::{tip}
+ After loading your data into CrateDB, query it.
```shell
ctk shell --command 'SELECT * FROM demo."taxi-tiny";'
ctk show table 'demo."taxi-tiny"'
```
+ :::

### Save

@@ -77,6 +122,8 @@ ctk save table \
  "s3+iceberg://bucket1/?catalog=default&namespace=demo&table=taxi-tiny&s3.access-key-id=<your_access_key_id>&s3.secret-access-key=<your_secret_access_key>&s3.endpoint=<endpoint_url>&s3.region=<s3-region>"
```

+ For other target URLs, see the "Source" section.
+
### Cloud

A canonical invocation for copying data from an Iceberg table on AWS S3 to CrateDB Cloud.
@@ -141,7 +188,16 @@ to a truthy value, save operations will append to an existing table.
ctk save table "file+iceberg://./var/lib/iceberg/?...&append=true"
```

+ #### PyIceberg
+
+ The PyIceberg I/O adapters accept many options, which can be passed through unchanged.
+ For a list of all available options, consult the [FileIO] documentation.
+ For I/O adapters not yet covered by the documentation, consult the source
+ code for [catalog options] and [storage options].
+

[Apache Iceberg]: https://iceberg.apache.org/
+ [catalog options]: https://github.com/apache/iceberg-python/tree/main/pyiceberg/catalog
[FileIO]: https://py.iceberg.apache.org/configuration/#fileio
+ [storage options]: https://github.com/apache/iceberg-python/tree/main/pyiceberg/io
[uv]: https://docs.astral.sh/uv/