Create an Iceberg Writer application

In this release, Iceberg Writer supports only Google Cloud Platform.

In brief, creating a pipeline application (see Pipelines) with an Iceberg Writer target involves the following steps:

Create Google Cloud Storage (GCS) buckets to host the Iceberg data lake and staging area.
Create a new Google Dataproc (Spark) cluster to provide Iceberg compute resources.
Select and, if necessary, configure an Iceberg catalog host, such as BigQuery Metastore or Nessie.
Create a Google storage account key with the necessary permissions for the above resources.
Create connection profiles for the data lake, compute resources, and (if necessary) external stage and catalog.
Choose which writing mode to use: Append Only (Iceberg tables contain a history of all source events, including inserts, updates, and deletes) or Merge (keeps the Iceberg tables in sync with the source tables).
Create an application using a wizard, the Flow Designer, or TQL.
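
The difference between the two writing modes can be illustrated with a small, self-contained Python sketch. It is only a conceptual model and not how Iceberg Writer is implemented: the event tuples, the function names `apply_append_only` and `apply_merge`, and the in-memory "tables" are all assumptions made for illustration.

```python
# Hypothetical change events from a source table: (operation, primary key, row values).
events = [
    ("INSERT", 1, {"name": "Ada"}),
    ("INSERT", 2, {"name": "Bob"}),
    ("UPDATE", 1, {"name": "Ada L."}),
    ("DELETE", 2, None),
]


def apply_append_only(events):
    """Append Only: store every event as its own row, so the target holds a full history."""
    return [{"op": op, "id": key, "data": data} for op, key, data in events]


def apply_merge(events):
    """Merge: apply each event to the current state, so the target mirrors the source."""
    table = {}
    for op, key, data in events:
        if op == "DELETE":
            table.pop(key, None)  # remove the row if it exists
        else:  # INSERT or UPDATE both leave the latest values in place
            table[key] = data
    return table


history = apply_append_only(events)
current = apply_merge(events)

print(len(history))  # number of rows kept in Append Only mode
print(current)       # rows remaining in Merge mode
```

In this model, Append Only keeps one row per source event (four rows here), while Merge ends with only the row for key 1, holding its updated values, because the row for key 2 was deleted at the source.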
