Striim 3.9.7 documentation

BigQuery Writer

Writes to a table in Google BigQuery.

BigQueryWriter has the following properties.

Allow Quoted Newlines (java.lang.Boolean, default: False)

Set to True to allow quoted newlines in the delimited text files in which BigQueryWriter accumulates batched data.

Batch Policy (java.lang.String, default: eventCount:1000000, Interval:90)

The batch policy includes eventCount and Interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent as a batch to the target every time either of the specified values is exceeded. When the app is stopped, any remaining data in the buffer is discarded. To disable batching, set to EventCount:1,Interval:0.

With the default setting, data will be written every 90 seconds or sooner if the buffer accumulates 1,000,000 events.

Do not exceed BigQuery's quota policy. For example, if you exceed 1000 batches per table in one day, BigQueryWriter will throw an exception and stop the application.
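Expressed in TQL, the batch policy is passed as a single string property. A minimal sketch, assuming BatchPolicy is the TQL property name (inferred from the display name above) and using illustrative values, not recommendations:

CREATE TARGET BatchedBQTarget USING BigQueryWriter (
  ServiceAccountKey: '/<path>/<file name>.json',
  projectId: 'myprojectid',
  Tables: 'mydataset.mytable',
  -- flush every 60 seconds or every 100,000 events, whichever comes first
  BatchPolicy: 'eventCount:100000, Interval:60'
)
INPUT FROM SomeInputStream; -- hypothetical input stream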

Column Delimiter (java.lang.String, default: | (UTF-8 007C))

The character(s) used to delimit fields in the delimited text files in which the adapter accumulates batched data. If the data will contain the | character, change the default value to a sequence of characters that will not appear in the data.

Connection Retry Policy (java.lang.String, default: retryInterval=30, maxRetries=3)

The connection retry policy includes retryInterval and maxRetries. With the default setting, if a connection attempt is unsuccessful, the adapter will try again in 30 seconds (retryInterval). If the second attempt is unsuccessful, it will wait another 30 seconds and try a third time (maxRetries). If that attempt also fails, the adapter will fail and log an exception. Negative values are not supported.
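As with the batch policy, this is a single string property. A hedged fragment for a BigQueryWriter target's property list, assuming ConnectionRetryPolicy is the TQL property name (inferred from the display name) and using illustrative values:

  -- wait 45 seconds between attempts, give up after 5 retries
  ConnectionRetryPolicy: 'retryInterval=45, maxRetries=5'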

Data Location (java.lang.String, no default)

Specify the dataset's Data location property value if necessary (see Dataset Locations).

Encoding (java.lang.String, default: UTF-8)

Encoding for the delimited text files in which BigQueryWriter accumulates batched data. Currently the only supported encoding is UTF-8 (see Loading encoded data).

Mode (java.lang.String, default: APPENDONLY)

With the default value APPENDONLY, updates and deletes in DatabaseReader, IncrementalBatchReader, and SQL CDC sources are handled as inserts in BigQuery. Set to MERGE to handle them as updates and deletes instead.

Since BigQuery does not have primary keys, when using MERGE mode you must use the keycolumns option in the Tables property to specify a column in the target table that will contain a unique identifier for each row: for example, Tables:'SCOTT.EMP,mydataset.employee keycolumns(emp_num)'. You may use wildcards for the source table provided all the tables have the key columns: for example, Tables:'DEMO.%,mydataset.% KeyColumns(...)'.
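A minimal TQL sketch of a MERGE-mode target, reusing the table mapping from the example above (the input stream name is hypothetical and stands in for the output of an Oracle CDC source):

CREATE TARGET MergedBQTarget USING BigQueryWriter (
  ServiceAccountKey: '/<path>/<file name>.json',
  projectId: 'myprojectid',
  Mode: 'MERGE',
  -- emp_num holds a unique identifier for each row in the target table
  Tables: 'SCOTT.EMP,mydataset.employee keycolumns(emp_num)'
)
INPUT FROM OracleCDCStream; -- hypothetical CDC source output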

Null Marker (java.lang.String, default: NULL)

A string inserted into fields in the delimited text files in which BigQueryWriter accumulates batched data to indicate that a field has a null value. These are converted back to nulls in the target tables. If any field might contain the string NULL, change this to a sequence of characters that will not appear in the data.

Optimized Merge (java.lang.Boolean, default: false)

When the input stream of the target is the output of an OracleReader source:

  • with the default value of false, Oracle supplemental logging must be enabled on all columns

  • when set to true, supplemental logging is required only for primary key columns

See Enabling supplemental log data.
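Supplemental logging is configured on the Oracle side, not in Striim. As a sketch, for a hypothetical source table SCOTT.EMP the corresponding Oracle DDL would be (see Enabling supplemental log data for the supported procedure):

-- required when Optimized Merge is false: log all columns
ALTER TABLE SCOTT.EMP ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- sufficient when Optimized Merge is true: log only primary key columns
ALTER TABLE SCOTT.EMP ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;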

Project Id (java.lang.String, no default)

Specify the project ID of the dataset's project.

Quote Character (java.lang.String, default: " (UTF-8 0022))

The character(s) used to quote (escape) field values in the delimited text files in which the adapter accumulates batched data. If the data will contain ", change the default value to a sequence of characters that will not appear in the data.

Service Account Key (java.lang.String, no default)

The path (from root or the Striim program directory) and file name of the .json credentials file downloaded from Google (see Service Accounts). This file must be copied to the same location on each Striim server that will run this adapter, or to a network location accessible by all servers.

The associated service account must have the BigQuery User, Job User, or a higher role for the target tables (see Access Control).

Standard SQL (java.lang.Boolean, default: True)

With the default setting of True, BigQueryWriter constrains timestamp values to standard SQL. Set to False to use legacy SQL. See Migrating to Standard SQL for more information.

Tables (java.lang.String, no default)

The name(s) of the table(s) to write to, in the format <dataset>.<table>. The table(s) must exist when the application is started.

When the target's input stream is a user-defined event, specify a single table.

When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), it can write to multiple tables. In this case, specify the names of both the source and target tables. You may use wildcards for the table names, but not for the schema or database. For example:

source.emp,target.emp
source.db1,target.db1;source.db2,target.db2
source.%,target.%
source.mydatabase.emp%,target.mydb.%
source1.%,target1.%;source2.%,target2.%

MySQL and Oracle names are case-sensitive; SQL Server names are not. Specify names as <schema name>.<table name> for MySQL and Oracle and as <database name>.<schema name>.<table name> for SQL Server.

See Mapping columns for additional options.

The following describes use of BigQueryWriter with an input stream of a user-defined type. For an example of input from a CDC or DatabaseReader source, see Replicating Oracle data to Google BigQuery.

The sample application below assumes that you have created a table in BigQuery whose columns match the output fields of the CQ that follows.
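As a sketch, such a table could be created with BigQuery standard SQL DDL like the following (FLOAT64 is the standard SQL name for BigQuery's FLOAT type; creating the table through the BigQuery web UI works equally well):

CREATE TABLE mydataset.mytable (
  MerchantId STRING,    -- TO_STRING(data[1])
  DateTime TIMESTAMP,   -- TO_DATE(data[4])
  AuthAmount FLOAT64,   -- TO_DOUBLE(data[7])
  Zip STRING            -- TO_STRING(data[9])
);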
CREATE SOURCE PosSource USING FileReader (
  wildcard: 'PosDataPreview.csv',
  directory: 'Samples/PosApp/appData',
  positionByEOF:false )
PARSE USING DSVParser (
  header:Yes,
  trimquote:false )
OUTPUT TO PosSource_Stream;
 
CREATE CQ PosSource_Stream_CQ
INSERT INTO PosSource_TransformedStream
SELECT TO_STRING(data[1]) AS MerchantId,
  TO_DATE(data[4]) AS DateTime,
  TO_DOUBLE(data[7]) AS AuthAmount,
  TO_STRING(data[9]) AS Zip
FROM PosSource_Stream;

CREATE TARGET BigQueryTarget USING BigQueryWriter(
  ServiceAccountKey:"/<path>/<file name>.json",
  projectId:"myprojectid",
  Tables: "mydataset.mytable"
)
INPUT FROM PosSource_TransformedStream;

After running this application, run select * from mydataset.mytable; in BigQuery and you will see the data from the file. Since the default batch interval is 90 seconds, it may take that long after the application completes before you see all 1160 records in BigQuery.
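To confirm that every batch has been flushed, a simple count query works (standard SQL; 1160 is the number of records in the sample file):

SELECT COUNT(*) AS record_count
FROM mydataset.mytable;
-- returns 1160 once all batches have been written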

BigQueryWriter data type support and correspondence

Use the following when the input stream is of a user-defined type. When the input is the output of a CDC or DatabaseReader source, see Replicating Oracle data to Google BigQuery.

Boolean maps to BOOLEAN:

  • CSV format: true or false (case-insensitive) or 1 or 0

  • JSON format: true or false (case-insensitive)

byte[] maps to BYTES:

  • Imported BYTES data must be base64-encoded, except for Avro BYTES data, which BigQuery can read and convert.

  • BYTES data read from a BigQuery table are base64-encoded, unless you export to Avro format, in which case the Avro bytes data type applies.

DateTime maps to TIMESTAMP, which is stored internally as a UNIX timestamp with microsecond precision.

Double and Float map to FLOAT, which is double-precision.

Long maps to INTEGER, which is 64-bit signed.

String maps to STRING.