Skip to main content

BigQuery continuous incremental replication

After initial load has completed, you can use Incremental Batch Reader to read, at regular intervals, the new data created in your BigQuery dataset, and then write this new source data to the target, allowing for continuous updates in near real time.

Before creating the continuous incremental replication application, see Switching from initial load to continuous replication of BigQuery sources.

Incremental Batch Reader properties for BigQuery sources

property	type	default value	notes
Check Column	String		Specify the name of the column containing the start position value using the syntax `<database name>.<table name>=<column name>` . The column must have an integer or timestamp data type (such as the creation timestamp or an employee ID number) and the values must be unique and continuously increasing. Names are case-sensitive. If you specify multiple tables in the Tables property, you may specify different check columns for the tables separated by semicolons. In this case, you may specify the check column for the remaining tables using wildcards: for example, `MYSCHEMA.TABLE1=UUID; MYSCHEMA.%=LAST_UPDATED` would use UUID as the start column for TABLE1 and LAST_UPDATED as the start column for the other tables. When rows have the same Check Column value, they will be buffered in memory until the a row with a different Check Column value is read. To avoid memory usage spikes, do not set Check Column so that many rows will have the same value. For example, selecting a single Check Column containing the day's date (rather than a timestamp) might result in thousands or millions of rows with the same value.
Connection Pool Size	Integer	1	Specify the maximum number of connections that Striim will use. This is the maximum number of tables Striim will be able to read simultaneously, provided that ThreadPool Size is set to the same or a higher number.
Connection URL	String		When connecting through an SSH tunnel (see Using an SSH tunnel to connect to a source or target), specify the IP address of the tunnel. Specify as `jdbc:googlebigquery:AuthScheme=OAuthJWT;OAuthJWTCert=<path to service account key>;InitiateOAuth=GETANDREFRESH;ProjectId=<BigQuery project ID>` or use a connection profile (see Connection profiles). If Striim Platform is running in a VM in Google Cloud Platform, you may use the following syntax to use the service account key associated with the VM: `jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=<BigQuery project ID>`.
Database Provider Type	String	Default	Set to BigQuery. Controls which icon appears in the Flow Designer.
Excluded Tables	String		Data for any tables specified here will not be returned. For example, if `Tables` uses a wildcard, data from any tables specified here will be omitted. Multiple table names (separated by semicolons) and wildcards may be used exactly as for `Tables`.
Fetch Size	Integer	100	Sets the maximum number of records to be fetched from the database in a single JDBC method execution (see the discussion of fetchsize in the documentation for your JDBC driver).
Password	encrypted password		The password for the specified user. See Encrypted passwords.
Parallel Threads	Integer	1	For Fast Snapshot Loading to Azure Synapse, BiqQuery, Databricks, Fabric Data Warehouse, Microsoft Dataverse, or Snowflake, specify the number of threads Database Reader will use. This value should not be higher than the number of tables to be read. For other targets, leave set to the default of 1. When Database Reader is started, the tables will be distributed among the threads in round-robin fashion, then each thread will read one table at a time. When the Tables property is a list of tables, the tables are read in that order. If the Tables property uses a wildcard, the sequence is determined by the order in which the JDBC driver gives Database Reader the list of tables. Known issue DEV-49013: the maximum number of tables supported by Parallel Threads is 255.
Polling Interval	String	120sec	This property controls how often the adapter reads from the source. By default, it checks the source for new data every two minutes (120 seconds). If there is new data, the adapter reads it and sends it to the adapter's output stream. The value may be specified in seconds (as in the default) or milliseconds (for example, 500ms).
Query	String		Optionally, specify a single SQL SELECT statement specifying the data to return. You may query tables, aliases, synonyms, and views. Query is not supported when Create Schema is True, Parallel Threads is greater than 1, or Restart Behavior on IL Interruption is Truncate_target_table or Replace_target_table. When `Query` is specified and `Tables` is not, the WAEvent TableName metadata field value will be QUERY. When both `Query` and `Tables` are specified, the data specified by `Query` will be returned, and the `Tables` setting will be used only to populate the TableName field. If the query includes a synonym containing a period, it must be enclosed in escaped quotes. For example: `select * from \"synonym.name\"` If using a query when the output of a DatabaseReader source is the input of a DatabaseWriter target, specify the target table name as the value of DatabaseReader's Tables field.
Return DateTime As	String	Joda	For BigQuery, this setting is ignored and timestamp values are always returned as Java DateTime.
SSL Config	String		Not applicable to BigQuery in this release.
Start Position	String		The value in the specified check column from which Striim will start reading. Striim will read rows in which the check column's value is the same as or greater or later than this value and skip the other rows. Since Check Column may specify multiple tables you must specify the corresponding table name or wildcard for each value. With the default value `%=-1`, Striim will read all data from all tables specified in the Tables property. With the Check Column example above, the Start Position value could be `MYSCHEMA.TABLE1=1234; MYSCHEMA.%=2018-OCT-07 18:37:55`. Note that this timestamp format is just an example: your source may require a different java.sql.timestamp format, such as `yyyy-mm-dd hh:mm:ss`, to match the check column's data type.
Tables	String		Specify the table(s) or view(s) to be read. Specify names as `<database name>.<table name>`. Names are case-sensitive unless BigQuery's is_case_insensitive option is set to TRUE. You may specify multiple tables and views as a list separated by semicolons or with the `%` wildcard. For example, `HR%` would read all the tables whose names start with HR. You may use the `%` wildcard only for tables, not for schemas or databases. The wildcard is allowed only at the end of the string: for example, `mydb.prefix%` is valid, but `mydb.%suffix` is not. If you are using the Query property, specify QUERY as the table name. Modifying this property can interfere with recovery. If recovery is enabled for the application and it is necessary to modify the tables list, export the application (see Apps page), drop it, update the exported TQL with the new values, and import it (see Apps page).
Threadpool Size	Integer	5	Specify the maximum number of threads the adapter will use. This is the maximum number of tables Striim will be able to read simultaneously, provided that Connection Pool Size is set to the same or a higher number.
Username	String		Specify the user name the adapter will use to log in to the server specified in ConnectionURL. This user must have SELECT permission on the tables specified in the Tables property.

Incremental Batch Reader’s output type is WAEvent. Output is the same as for Database Reader. See Sample Database Reader WAEvent for an example.

Would you like to provide feedback? Just click here to suggest edits.