Prerequisites for creating an Iceberg Writer application
Before creating an Iceberg Writer target in a Striim application:
Perform the tasks described in Iceberg Writer initial setup.
Choose which writing mode to use: Append Only or Merge. We recommend Append Only for initial load. Note that if you choose Merge mode, the Parallel Threads property will be ignored (see Creating multiple writer instances (parallel threads)).
Create a connection profile for Google Dataproc. See Introducing connection profiles and Setting Google Dataproc connection profile properties.
Create a connection profile for Google Cloud Storage (GCS). See Introducing connection profiles and Setting Google Cloud Storage (GCS) connection profile properties. If the external stage will be in a different GCS instance than the data lake, create a second connection profile for that instance.
If you are using a catalog other than the Hadoop catalog in GCS (the Hadoop catalog is not recommended for production environments), create a connection profile for the Iceberg catalog. See Introducing connection profiles and Catalog connection profile properties.
Choose which writing mode to use
Append Only (default)
In Append Only mode, inserts, updates, and deletes from a Database Reader, Incremental Batch Reader, or SQL CDC source are all handled as inserts in the target. This allows you to query past data in Iceberg that no longer exists in the source database(s), for example, for month-over-month or year-over-year reports.
Primary key updates result in two records in the target, one with the previous value and one with the new value. If the Tables setting has a ColumnMap that includes @METADATA(OperationName), the operation name for the first event will be DELETE and for the second INSERT.
To use this mode, set Mode to APPENDONLY.
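The Append Only behavior described above can be illustrated with a small simulation. This is a conceptual sketch only, not Striim code: the event dictionaries, the `append_only` function, and the `op`, `data`, and `before` field names are hypothetical stand-ins for change events, and the `op` column stands in for a column mapped from @METADATA(OperationName).

```python
# Conceptual model of Append Only mode (hypothetical event shape, not Striim's API).
# Every source operation becomes a new row in the target; nothing is overwritten.

def append_only(events, key="id"):
    """Return the rows an append-only target would accumulate."""
    rows = []
    for ev in events:
        op, data, before = ev["op"], ev["data"], ev.get("before")
        if op == "UPDATE" and before is not None and before[key] != data[key]:
            # Primary key update: one record with the previous value (DELETE)
            # and one with the new value (INSERT).
            rows.append({**before, "op": "DELETE"})
            rows.append({**data, "op": "INSERT"})
        else:
            # Inserts, ordinary updates, and deletes are all appended as rows.
            rows.append({**data, "op": op})
    return rows


events = [
    {"op": "INSERT", "data": {"id": 1, "qty": 5}},
    {"op": "UPDATE", "before": {"id": 1, "qty": 5}, "data": {"id": 1, "qty": 7}},
    {"op": "UPDATE", "before": {"id": 1, "qty": 7}, "data": {"id": 2, "qty": 7}},
    {"op": "DELETE", "data": {"id": 2, "qty": 7}},
]

rows = append_only(events)
for row in rows:
    print(row)
```

Four source events produce five target rows, because the primary key change from 1 to 2 is written as two records. The full history remains queryable even though the source row has been deleted.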
Merge
In Merge mode, inserts, updates, and deletes from Database Reader, Incremental Batch Reader, and SQL CDC sources are handled as inserts, updates, and deletes in the target. The data in Iceberg thus duplicates the data in the source database(s).
To use this mode, set Mode to Merge. If Iceberg Writer's input stream is the output of an HP NonStop reader, MySQL Reader, or Oracle Reader source and the source events will include partial records, also set Optimized Merge to True.
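As a rough illustration of Merge semantics, the sketch below applies change events to a table keyed by primary key. It is a conceptual model under stated assumptions, not Striim code: the `merge` function and the `op`, `data`, and `before` field names are hypothetical, every event is assumed to carry a complete row, and partial records (the Optimized Merge case) are not modeled.

```python
# Conceptual model of Merge mode (hypothetical event shape, not Striim's API).
# The target holds one row per primary key and tracks the source's current state.

def merge(events, key="id"):
    """Apply inserts, updates, and deletes to a table keyed by primary key."""
    table = {}
    for ev in events:
        op, data = ev["op"], ev["data"]
        if op == "INSERT":
            table[data[key]] = dict(data)
        elif op == "UPDATE":
            # Remove the row under its previous key in case the key changed,
            # then store the new image.
            old_key = ev.get("before", data)[key]
            table.pop(old_key, None)
            table[data[key]] = dict(data)
        elif op == "DELETE":
            table.pop(data[key], None)
    return table


merge_events = [
    {"op": "INSERT", "data": {"id": 1, "qty": 5}},
    {"op": "INSERT", "data": {"id": 3, "qty": 1}},
    {"op": "UPDATE", "before": {"id": 1, "qty": 5}, "data": {"id": 1, "qty": 7}},
    {"op": "DELETE", "data": {"id": 3, "qty": 1}},
]

final_table = merge(merge_events)
print(final_table)
```

After the four events, only the updated row for id 1 remains, which matches the state of the source. The same events in Append Only mode would instead leave four rows in the target.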