Stream, Transform, and Ensure Fresh Data Delivery with Striim and dbt

Use Striim to stream data from PostgreSQL to BigQuery then kick off transform jobs in dbt


Manage Scalable Applications
Integrate Striim with dbt to transform and monitor real time data SLAs
Capture Data Updates in real time
Use Striim’s postgrescdc reader for real time data updates
Build Real-Time Analytical Models
Use dbt to build Real-Time analytical and ML models 
Tools you need
Striim’s unified data integration and streaming platform connects clouds, data and applications.
Google BigQuery
BigQuery is a serverless, highly scalable multicloud data warehouse.
PostgreSQL is an open-source relational database management system.
dbt cloud
dbt cloud is a hosted service for dbt deployment in production. It is used to monitor and transform data in the warehouse.

Striim is a unified data streaming and integration product that offers change capture (CDC) enabling continuous replication from popular databases such as Oracle, SQLServer, PostgreSQL and many others to target data warehouses like BigQuery and Snowflake.

dbt Cloud is a hosted service that helps data analysts and engineers productionize dbt deployments. It is a popular technique among analysts and engineers to transform data into usable formats and also ensuring if source data freshness is meeting the SLAs defined for the project. Striim collaborates with dbt for effective monitoring and transformation of the in-flight data. For example, if the expectation is that data should be flowing every minute based on timestamps, then dbt will check that property and make sure the time between last check and latest check is only 1 minute apart.

In this tutorial, we have shown how Striim cloud service API is pinged to transform and monitor updated data from PostgreSQL before storing into BigQuery tables. In this example, Striim service API is hit via python connecting Striim app and dbt to make the two work together.

Step 1: Launch Striim Server and create the PostgreSQL to BigQuery app

Please follow the steps in the PostgreSQLCDC to BigQuery recipe to create an app that streams data from PostgreSQLDatabase to BigQuery with change data capture. For this recipe, we will host our app on Striim Cloud but there is always a free trial to see the power of Striim’s Change Data Capture. Here are the snapshots for fields in source, CQ and target.




Step 2: Set up dbt Cloud

For this use case, we created a deployment environment on a cloud managed repository in dbt cloud.

The dbt_project.yml, model yaml files and sql staging files were configured as follows:

Step 3: Get Striim Cloud Service API to interact with the data stream

To create and run data streaming applications from UI, click on the Endpoint of strim-node as shown below. This will redirect you to Striim User Interface.

  • Go to the API section under the Striim cloud service and copy the REST API URL that will be used in our python script to ping Striim Cloud Services.

  • In the python script, enter the REST API URL as connection url and source name in payload.

Step 4: Run the Python script

Once the dbt project is set up, the python script that hits the Striim Cloud Service url to get the metadata from striim server acts as a trigger to run dbt transformation and monitoring. The python script for our project can be found here.

To hit the dbt cloud API, the following commands are used.The account id and job id can be retrieved from dbt cloud url. The authorization token can be found under API access on the left navigation bar.

The following snapshots are from the dbt run that shows the inserts and source data freshness.

Enabling Source Freshness

To ensure you’re meeting data freshness SLAs for all your business stakeholders, you can monitor Source Freshness in dbt cloud.

Follow this document to enable source freshness of the real time data flowing from PostgreSQL through Striim to BigQuery. The source freshness snapshots can be checked under view data source.

Wrapping Up: Start Your Free Trial

Our tutorial showed you how a striim app can run with dbt, an open source data transformation and monitoring tool. With this feature you can monitor your data without interrupting the real-time streaming through Striim. dbt can be used with popular adapter plugins like PostgreSQL, Redshift, Snowflake and BigQuery, all of which are supported by Striim. With Striim’s integration with major databases and data warehouses and powerful CDC capabilities, data streaming and analytics becomes very fast and efficient.

As always, feel free to reach out to our integration experts to schedule a demo, or try Striim for free here.

Getting started is easy. Sign up for a free trial or talk to a cloud integration expert.