Striim 3.10.1 documentation

BigQuery Writer properties

The adapter properties are:

property

type

default value

notes

Allow Quoted Newlines

Boolean

False

Set to True to allow quoted newlines in the delimited text files in which BigQueryWriter accumulates batched data.

Batch Policy

String

eventCount:1000000, Interval:90

The batch policy includes eventCount and interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent as a batch to the target every time either of the specified values is exceeded. When the app is stopped, any remaining data in the buffer is discarded. To disable batching, set to EventCount:1,Interval:0.

With the default setting, data will be written every 90 seconds or sooner if the buffer accumulates 1,000,000 events.

When Streaming Upload is False, use Interval:60 so as not to exceed the quota of 1,500 batches per table per day. When Streaming Upload is True, use EventCount:10000, since that is the quota for one batch. (Quotas are subject to change by Google.)

Do not exceed BigQuery's quotas or limits (see Load jobs in the "Quotas and limits" section of Google's BigQuery documentation). For example, if you exceed the quota of batches per table per day, BigQueryWriter will throw an exception such as error code 500, "An internal error occurred and the request could not be completed," and stop the application. To avoid this, reduce the number of batches by increasing the event count and/or interval.
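
As a rough check on the interval guidance above, the arithmetic can be sketched in Python. This assumes every batch is triggered by the interval timer alone; batches triggered sooner by the event count would add to the total. The function names are illustrative and are not part of Striim, and the 1,500 figure is the quota mentioned above, which Google may change.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_batches_per_day(interval_seconds: int) -> int:
    """Upper bound on interval-triggered batches per table per day."""
    return SECONDS_PER_DAY // interval_seconds


def min_interval_for_quota(daily_quota: int) -> int:
    """Smallest whole-second interval that stays within the daily quota."""
    return -(-SECONDS_PER_DAY // daily_quota)  # ceiling division


print(max_batches_per_day(60))       # 1440, under a quota of 1,500
print(max_batches_per_day(90))       # 960, the default Interval:90
print(min_interval_for_quota(1500))  # 58
```

Under these assumptions, Interval:60 leaves a margin of 60 batches per day below the quota.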

Column Delimiter

String

| (UTF-8 007C)

The character(s) used to delimit fields in the delimited text files in which the adapter accumulates batched data. If the data will contain the | character, change the default value to a sequence of characters that will not appear in the data.
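
The reason for changing the delimiter can be illustrated with a minimal Python sketch. It deliberately ignores quoting, and the helper names, the sample record, and the replacement delimiter `|~|` are all hypothetical; it does not reproduce BigQueryWriter's actual file handling.

```python
def join_fields(fields, delimiter):
    """Join field values into one line, with no quoting (a simplification)."""
    return delimiter.join(fields)


def split_fields(line, delimiter):
    """Split a line back into field values."""
    return line.split(delimiter)


record = ["42", "size: S|M|L", "in stock"]

# With the default delimiter, the | characters inside the data are
# indistinguishable from field boundaries.
default_round_trip = split_fields(join_fields(record, "|"), "|")

# With a sequence that never occurs in the data, the record survives intact.
custom_round_trip = split_fields(join_fields(record, "|~|"), "|~|")

print(len(default_round_trip))      # 5 fields instead of 3
print(custom_round_trip == record)  # True
```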

Connection Retry Policy

String

totalTimeout=600, initialRetryDelay=10, retryDelayMultiplier=2.0, maxRetryDelay=60, maxAttempts=5, jittered=True, initialRpcTimeout=10, rpcTimeoutMultiplier=2.0, maxRpcTimeout=30

Do not change unless instructed to by Striim support.
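
For orientation only, the following Python sketch shows how default values like these would play out under conventional exponential backoff. It rests on several assumptions not confirmed here: that the values are in seconds, that maxAttempts includes the first attempt, and that each delay is the previous one times retryDelayMultiplier, capped at maxRetryDelay. Jitter and the RPC timeout settings are left out, and the function name is illustrative.

```python
def retry_delays(initial_delay, multiplier, max_delay, max_attempts):
    """Return the wait before each retry, without jitter."""
    delays = []
    delay = float(initial_delay)
    # Assumption: max_attempts counts the first try, so there are
    # max_attempts - 1 retries.
    for _ in range(max_attempts - 1):
        delays.append(min(delay, float(max_delay)))
        delay *= multiplier
    return delays


schedule = retry_delays(initial_delay=10, multiplier=2.0,
                        max_delay=60, max_attempts=5)
print(schedule)       # [10.0, 20.0, 40.0, 60.0]
print(sum(schedule))  # 130.0
```

Under these assumptions the total wait of 130 is well inside totalTimeout=600, so the attempt limit, not the overall timeout, would end the retries.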

Data Location

String

Specify the dataset's Data location property value if necessary (see Dataset Locations).

Encoding

String

UTF-8

Encoding for the delimited text files in which BigQueryWriter accumulates batched data. Currently the only supported encoding is UTF-8 (see Loading encoded data).

Ignorable Exception Code

String

Set to TABLE_NOT_FOUND if you do not want the application to crash when Striim tries to write to a table that does not exist in the target database. See Handling "table not found" errors for more information.

Include Insert ID

Boolean

True

When Streaming Upload is False, this setting is ignored.

When Mode is APPENDONLY and Streaming Upload is True, with the default setting of True, BigQuery will add a unique ID to every row. Set to False if you prefer that BigQuery not add unique IDs. For more information, see Ensuring data consistency and Disabling best effort de-duplication.

When Mode is MERGE, you may set this to False as Striim will de-duplicate the events before writing them to the target.

Mode

String

APPENDONLY

With the default value APPENDONLY:

  • Updates and deletes from DatabaseReader, IncrementalBatchReader, and SQL CDC sources are handled as inserts in the target.

  • Primary key updates result in two records in the target, one with the previous value and one with the new value. If the Tables setting has a ColumnMap that includes @METADATA(OperationName), the operation name for the first event will be DELETE and for the second INSERT.

  • Data should be available for querying immediately after it has been written, but copying and modification may not be possible for up to 90 minutes (see Checking for data availability).

Set to MERGE to handle updates and deletes as updates and deletes instead. When using MERGE:

  • Data will not be written to any target tables that have streaming buffers.

  • Since BigQuery does not have primary keys, you may include the keycolumns option in the Tables property to specify a column in the target table that will contain a unique identifier for each row: for example, Tables:'SCOTT.EMP,mydataset.employee keycolumns(emp_num)'.

  • You may use wildcards for the source table provided all the tables have the key columns: for example, Tables:'DEMO.%,mydataset.% KeyColumns(...)'.

  • If you do not specify keycolumns, Striim will concatenate all column values and use that as a unique identifier.

Null Marker

String

NULL

When Streaming Upload is False, a string inserted into fields in the delimited text files in which BigQueryWriter accumulates batched data to indicate that a field has a null value. These are converted back to nulls in the target tables. If any field might contain the string NULL, change this to a sequence of characters that will not appear in the data.

When Streaming Upload is True, this setting has no effect.
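
The ambiguity that motivates changing the marker can be sketched in Python. The helper names and the replacement marker `\N` are hypothetical; this is a simplified model of the round trip described above, not BigQueryWriter's implementation.

```python
def encode_field(value, null_marker="NULL"):
    """Write a null as the marker, anything else as its string form."""
    return null_marker if value is None else str(value)


def decode_field(text, null_marker="NULL"):
    """Convert the marker back to a null; leave other text untouched."""
    return None if text == null_marker else text


values = [None, "NULL", "Smith"]

# With the default marker, the genuine string "NULL" comes back as a null.
default_result = [decode_field(encode_field(v)) for v in values]

# With a marker that never occurs in the data, the string is preserved.
custom_marker = r"\N"
custom_result = [
    decode_field(encode_field(v, custom_marker), custom_marker) for v in values
]

print(default_result)  # [None, None, 'Smith']
print(custom_result)   # [None, 'NULL', 'Smith']
```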

Optimized Merge

Boolean

false

Set to true only when the target's input stream is the output of an HP NonStop reader, MySQL Reader, or OracleReader source, and the source events will include partial records. For example, with Oracle Reader, when supplemental logging has not been enabled for all columns, partial records are sent for updates. When the source events will always include full records, leave this set to false.

Parallel Threads

Integer

See Creating multiple writer instances.

Project Id

String

Specify the project ID of the dataset's project.

Quote Character

String

" (UTF-8 0022)

The character(s) used to quote (escape) field values in the delimited text files in which the adapter accumulates batched data. If the data will contain ", change the default value to a sequence of characters that will not appear in the data.

Service Account Key

String

The path (from root or the Striim program directory) and file name of the .json credentials file downloaded from Google (see BigQuery setup).

Standard SQL

Boolean

True

With the default setting of True, BigQueryWriter constrains timestamp values to standard SQL. Set to False to use legacy SQL. See Migrating to Standard SQL for more information.

Streaming Upload

Boolean

False

With the default value of False, the writer uses the load method. Set to True to use the streaming method. See the discussion in BigQuery Writer.

Tables

String

The name(s) of the table(s) to write to, in the format <dataset>.<table>. The table(s) must exist when the application is started.

When the target's input stream is a user-defined event, specify a single table.

When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), the writer can write to multiple tables. In this case, specify the names of both the source and target tables. You may use wildcards for the table names, but not for the schema or database. For example:

source.emp,target.emp
source.db1,target.db1;source.db2,target.db2
source.%,target.%
source.mydatabase.emp%,target.mydb.%
source1.%,target1.%;source2.%,target2.%
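
One way to read the wildcard mappings above is sketched below in Python. It assumes that % in the source pattern matches any run of characters and that % in the target stands for the source table's unqualified name; both are inferences from the examples, and the function names are illustrative rather than anything Striim exposes.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern containing % wildcards into a compiled regex."""
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(parts))


def map_table(source_table, source_pattern, target_pattern):
    """Return the target table for source_table, or None if it does not match."""
    if not wildcard_to_regex(source_pattern).fullmatch(source_table):
        return None
    # Assumption: % in the target is replaced by the unqualified table name.
    table_name = source_table.rsplit(".", 1)[-1]
    return target_pattern.replace("%", table_name)


print(map_table("source.emp", "source.%", "target.%"))
print(map_table("source.mydatabase.employee",
                "source.mydatabase.emp%", "target.mydb.%"))
print(map_table("other.emp", "source.%", "target.%"))
```

Under these assumptions the three calls yield `target.emp`, `target.mydb.employee`, and `None` respectively.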

MySQL and Oracle names are case-sensitive, SQL Server names are not. Specify names as <schema name>.<table name> for MySQL and Oracle and as <database name>.<schema name>.<table name> for SQL Server.

Known issue DEV-21740: If the columns in the target are not in the same order as the source, writing will fail, even if the column names are the same. Workaround: Use ColumnMap to map at least one column.

See Mapping columns for additional options.

Transport Options

String

connectTimeout=300, readTimeout=120

Sets timeouts in BigQuery.