July 30, 2021 · 3 minute read

Striim is a real-time data integration platform that integrates with over 100 data sources and targets, including Google BigQuery. In this post, we’ll share how Striim’s new partition pruning feature can help BigQuery users optimize their query costs and performance.

How Striim Writes Data to Data Warehouses

Striim Data warehouse writer architecture

Striim ingests data from a range of data sources including databases, files/logs, messaging systems, IoT devices, and data warehouses. BigQuery is a popular data target used by companies to efficiently analyze large datasets. Striim moves data into BigQuery in 3 steps when using the MERGE mode (which allows users to combine updates, inserts, deletes into a single statement):

  1. Upload from source table to stage-1 table 
  2. Compact by moving only the most recent snapshot of each record to the stage-2 table
  3. Merge from stage-2 table to the target table in BigQuery

In step 3 (Merge), the merge query has to scan the entire table to search for matching records and update (or delete) them, or insert a new record if one doesn’t already exist. However, scanning through an entire table is both costly and time-consuming.

What are Partitioned Tables?

Partitioned tables are tables that have been divided into segments. Table partitioning enables users to optimize their costs (and improve query performance).

The example below depicts an integer-range partitioned table where the target table is partitioned according to User ID values (integers). Partition 1 contains the records with User IDs 1-5, partition 2 contains User IDs 6-10, and so on.

Example of a partitioned table
Example of a target table that’s partitioned by User ID. Since the updates only apply to records in partitions 2 and 3, only those partitions need to be queried.

In this example, a user wants to update 3 customer records with their first names. Partitioned tables allow the user to indicate the specific partition(s) to search (also known as partition pruning). Without table partitioning, the user would need to search the full table to locate and update the corresponding records. With table partitioning, the records that need to be updated with user names are located in partitions 2 and 3, so only those partitions need to be searched.

How BigQuery Partitioned Tables Reduce Query Costs (and Improve Query Performance) in BigQuery

In BigQuery, tables can be partitioned by the following types of columns:

  • By TIMESTAMP, DATE, or DATETIME columns
  • Integer-type columns (as shown in the example above)
  • Ingestion time (virtual column) based on a timestamp from when BigQuery ingests data

BigQuery officially supports partition pruning when using a MERGE query with a filter in the merge_condition.

If you use a query to filter based on partition column value, BigQuery will only scan the partitions that match the filter values.

In the example below, the destination table is partitioned based on the “ID” column (the partitioning column). The Merge query on the right includes a filter based on a range of the “ID” column values. The Merge query on the left does not include a partition filter, which means that the full table will be scanned (resulting in 61.12 MB processed vs 4.01 MB processed).

 

How partitioning a table in BigQuery reduces query costs and time
Querying a table with and without using a filter in the MERGE query.

 

Even better, since partition pruning is done before running the query, users can calculate their query costs ahead of time. This is especially relevant for users who take advantage of BigQuery’s on-demand pricing.

New: Striim’s BigQuery Writer Now Supports BigQuery Partition Pruning

Striim recently added support for partition pruning in BigQuery. Want to try this new feature? Make sure that a) the partitioning column in your source database has supplemental logging enabled and b) your target table in BigQuery is partitioned.

With this newly added functionality, Striim’s BigQuery Writer will detect the partitioned column and automatically optimize the merge without any user input. The optimized merge query will scan only the required partitions, resulting in faster query execution and lower costs.

Ready to see how Striim can help you make the most of BigQuery? Request a personalized demo of the Striim platform or start your free trial today!