Striim 3.10.3 documentation

BigQuery Writer

BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse offered by Google. Striim’s BigQueryWriter writes data from various supported sources into Google’s BigQuery data warehouse to support real-time data warehousing and reporting.

BigQuery Writer can be used to move data from transactional databases such as Oracle, Amazon RDS, Azure SQL DB, PostgreSQL, Microsoft SQL Server, MySQL, Google Spanner, and other supported databases into BigQuery with low latency and in real time. Striim supports complete table loads, change feeds (CDC), and incremental batch feeds into BigQuery. BigQuery Writer properties can be configured to support authentication, object mappings, batching, performance, and failure handling.

BigQuery Writer uses the google-cloud-bigquery client for Java API version 1.110.0.

You have a choice of two methods, using different parts of this API, for BigQuery Writer to use to write to its target tables. You cannot switch between these methods while an application is running.

  • Streaming (new in Striim 3.10.1): Incoming data is buffered locally as one memory buffer per target table. Once the upload condition is met, BigQuery Writer streams the content of each memory buffer into its target table using the client's insertAll API (which returns an InsertAllResponse). We recommend using this method when you need low latency.

  • Load: Incoming data is buffered locally as one CSV file per target table. Once the upload condition for a file is met, BigQuery Writer uses TableDataWriteChannel to upload the content of the file to BigQuery, which writes it to the target table. This was the only method supported by BigQuery Writer in Striim 3.9.x and earlier releases. This method may be a good fit if your uploads are infrequent (for example, once an hour).
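As an illustration of the client API the streaming method builds on, here is a minimal sketch using the google-cloud-bigquery Java client's insertAll call. This is not Striim's internal implementation; the dataset name, table name, and row fields are hypothetical, and error handling is reduced to printing per-row insert errors from the InsertAllResponse:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;

import java.util.HashMap;
import java.util.Map;

public class StreamingInsertSketch {
    public static void main(String[] args) {
        // Uses application default credentials for authentication.
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        TableId tableId = TableId.of("my_dataset", "my_table"); // hypothetical names

        // One row of the in-memory buffer, as column-name -> value pairs.
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        row.put("name", "example");

        InsertAllRequest request = InsertAllRequest.newBuilder(tableId)
                .addRow(row)
                .build();

        // Stream the buffered row(s) directly into the target table.
        InsertAllResponse response = bigquery.insertAll(request);
        if (response.hasErrors()) {
            // Per-row errors are keyed by the row's index in the request.
            response.getInsertErrors().forEach((index, errors) ->
                    System.err.println("Row " + index + " failed: " + errors));
        }
    }
}
```

Because insertAll sends rows directly to the table rather than starting a load job, it is subject to BigQuery's streaming quotas rather than its load-job quotas, which is why this path suits frequent, low-latency uploads.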

If BigQuery Writer applications you created with 3.9.8 or earlier are working well, you may keep using the load method. On the other hand, if you are spending a lot of time tuning those applications' batch policies or are running up against BigQuery's quotas, the streaming method may work better for you.
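The load method's upload path can be sketched with the same client library: a TableDataWriteChannel receives the buffered CSV bytes, and closing the channel submits a BigQuery load job that writes them to the target table. This is a simplified illustration, not Striim's implementation; the dataset, table, and CSV content are placeholders:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.TableDataWriteChannel;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.WriteChannelConfiguration;

import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;

public class LoadJobSketch {
    public static void main(String[] args) throws Exception {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        TableId tableId = TableId.of("my_dataset", "my_table"); // hypothetical names

        // Configure the write channel to interpret the upload as CSV.
        WriteChannelConfiguration config = WriteChannelConfiguration.newBuilder(tableId)
                .setFormatOptions(FormatOptions.csv())
                .build();

        // Open a resumable write channel and push the buffered CSV content.
        TableDataWriteChannel writer = bigquery.writer(config);
        try (OutputStream stream = Channels.newOutputStream(writer)) {
            String bufferedCsv = "1,example\n2,another\n"; // placeholder file content
            stream.write(bufferedCsv.getBytes(StandardCharsets.UTF_8));
        }

        // Closing the stream finalizes the upload; wait for the load job to finish.
        Job job = writer.getJob().waitFor();
        if (job.getStatus().getError() != null) {
            System.err.println("Load failed: " + job.getStatus().getError());
        }
    }
}
```

Each upload is one load job, so infrequent, large batches keep an application comfortably within BigQuery's daily load-job quotas, which matches the guidance above.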

The most typical workflow when using BigQuery Writer is for streaming integration. Briefly, it works like this:

  1. Create tables in the target system corresponding to those in the source. You may use Striim's schema conversion utility (see Using the schema conversion utility) or any other tool you prefer.

  2. Create a Striim initial load application using Database Reader and BigQuery Writer. This application will use the load method and, to avoid potential BigQuery quota limitations, will use large batches. Run it once to load existing data to BigQuery.

  3. Create a Striim streaming integration application using the appropriate CDC reader and BigQuery Writer. This application will use the streaming method to minimize the time it takes for new data to arrive at the target. Streaming also avoids the quota limitation issues of the load method. Typically you will want to enable recovery for this application (see Recovering applications) and perhaps deploy it on a multi-server cluster configured for failover to ensure high availability (see Continuing operation after server failover). Run this application continuously to stream new data to BigQuery (in other words, to synchronize the source and target).