Skip to main content

BigQuery continuous incremental replication

After initial load has completed, you can use Incremental Batch Reader to read, at regular intervals, the new data created in your BigQuery dataset, and then write this new source data to the target, allowing for continuous updates in near real time.

Before creating the continuous incremental replication application, see Switching from initial load to continuous replication of BigQuery sources.

Incremental Batch Reader properties for BigQuery sources

property

type

default value

notes

Check Column

String

Specify the name of the column containing the start position value using the syntax <database name>.<table name>=<column name> . The column must have an integer or timestamp data type (such as the creation timestamp or an employee ID number) and the values must be unique and continuously increasing. Names are case-sensitive.

If you specify multiple tables in the Tables property, you may specify different check columns for the tables separated by semicolons. In this case, you may specify the check column for the remaining tables using wildcards: for example, MYSCHEMA.TABLE1=UUID; MYSCHEMA.%=LAST_UPDATED would use UUID as the start column for TABLE1 and LAST_UPDATED as the start column for the other tables.

When rows have the same Check Column value, they will be buffered in memory until the a row with a different Check Column value is read. To avoid memory usage spikes, do not set Check Column so that many rows will have the same value. For example, selecting a single Check Column containing the day's date (rather than a timestamp) might result in thousands or millions of rows with the same value.

Connection Pool Size

Integer

1

Specify the maximum number of connections that Striim will use. This is the maximum number of tables Striim will be able to read simultaneously, provided that ThreadPool Size is set to the same or a higher number.

Connection URL

String

When connecting through an SSH tunnel (see Using an SSH tunnel to connect to a source or target), specify the IP address of the tunnel.

Specify as jdbc:googlebigquery:AuthScheme=OAuthJWT;OAuthJWTCert=<path to service account key>;InitiateOAuth=GETANDREFRESH;ProjectId=<BigQuery project ID> or use a connection profile (see Connection profiles).

If Striim Platform is running in a VM in Google Cloud Platform, you may use the following syntax to use the service account key associated with the VM: jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=<BigQuery project ID>.

Database Provider Type

String

Default

Set to BigQuery. Controls which icon appears in the Flow Designer.

Excluded Tables

String

Data for any tables specified here will not be returned. For example, if Tables uses a wildcard, data from any tables specified here will be omitted. Multiple table names (separated by semicolons) and wildcards may be used exactly as for Tables.

Fetch Size

Integer

100

Sets the maximum number of records to be fetched from the database in a single JDBC method execution (see the discussion of fetchsize in the documentation for your JDBC driver).

Password

encrypted password

The password for the specified user. See Encrypted passwords.

Parallel Threads

Integer

1

For Fast Snapshot Loading to Azure Synapse, BiqQuery, Databricks, Fabric Data Warehouse, Microsoft Dataverse, or Snowflake, specify the number of threads Database Reader will use. This value should not be higher than the number of tables to be read. For other targets, leave set to the default of 1.

When Database Reader is started, the tables will be distributed among the threads in round-robin fashion, then each thread will read one table at a time. When the Tables property is a list of tables, the tables are read in that order. If the Tables property uses a wildcard, the sequence is determined by the order in which the JDBC driver gives Database Reader the list of tables.

Known issue DEV-49013: the maximum number of tables supported by Parallel Threads is 255.

Polling Interval

String

120sec

This property controls how often the adapter reads from the source. By default, it checks the source for new data every two minutes (120 seconds). If there is new data, the adapter reads it and sends it to the adapter's output stream. The value may be specified in seconds (as in the default) or milliseconds (for example, 500ms).

Query

String

Optionally, specify a single SQL SELECT statement specifying the data to return. You may query tables, aliases, synonyms, and views.

Query is not supported when Create Schema is True, Parallel Threads is greater than 1, or Restart Behavior on IL Interruption is Truncate_target_table or Replace_target_table.

When Query is specified and Tables is not, the WAEvent TableName metadata field value will be QUERY. When both Query and Tables are specified, the data specified by Query will be returned, and the Tables setting will be used only to populate the TableName field.

If the query includes a synonym containing a period, it must be enclosed in escaped quotes. For example: select * from \"synonym.name\"

If using a query when the output of a DatabaseReader source is the input of a DatabaseWriter target, specify the target table name as the value of DatabaseReader's Tables field.

Return DateTime As

String

Joda

For BigQuery, this setting is ignored and timestamp values are always returned as Java DateTime.

SSL Config

String

Not applicable to BigQuery in this release.

Start Position

String

The value in the specified check column from which Striim will start reading. Striim will read rows in which the check column's value is the same as or greater or later than this value and skip the other rows. Since Check Column may specify multiple tables you must specify the corresponding table name or wildcard for each value. With the default value %=-1, Striim will read all data from all tables specified in the Tables property. With the Check Column example above, the Start Position value could be MYSCHEMA.TABLE1=1234; MYSCHEMA.%=2018-OCT-07 18:37:55. Note that this timestamp format is just an example: your source may require a different java.sql.timestamp format, such as yyyy-mm-dd hh:mm:ss, to match the check column's data type.

Tables

String

Specify the table(s) or view(s) to be read. Specify names as <database name>.<table name>. Names are case-sensitive unless BigQuery's is_case_insensitive option is set to TRUE.

You may specify multiple tables and views as a list separated by semicolons or with the % wildcard. For example, HR% would read all the tables whose names start with HR. You may use the % wildcard only for tables, not for schemas or databases. The wildcard is allowed only at the end of the string: for example, mydb.prefix% is valid, but mydb.%suffix is not.

If you are using the Query property, specify QUERY as the table name.

Modifying this property can interfere with recovery. If recovery is enabled for the application and it is necessary to modify the tables list, export the application (see Apps page), drop it, update the exported TQL with the new values, and import it (see Apps page).

Threadpool Size

Integer

5

Specify the maximum number of threads the adapter will use. This is the maximum number of tables Striim will be able to read simultaneously, provided that Connection Pool Size is set to the same or a higher number.

Username

String

Specify the user name the adapter will use to log in to the server specified in ConnectionURL. This user must have SELECT permission on the tables specified in the Tables property.

Incremental Batch Reader’s output type is WAEvent. Output is the same as for Database Reader. See Sample Database Reader WAEvent for an example.