Striim is a unified data streaming and integration product that offers change capture (CDC) enabling continuous replication from popular databases such as Oracle, SQLServer, PostgreSQL and many others to target data warehouses like BigQuery and Snowflake.
dbt Cloud is a hosted service that helps data analysts and engineers productionize dbt deployments. It is a popular technique among analysts and engineers to transform data into usable formats and also ensuring if source data freshness is meeting the SLAs defined for the project. Striim collaborates with dbt for effective monitoring and transformation of the in-flight data. For example, if the expectation is that data should be flowing every minute based on timestamps, then dbt will check that property and make sure the time between last check and latest check is only 1 minute apart.
In this tutorial, we have shown how Striim cloud service API is pinged to transform and monitor updated data from PostgreSQL before storing into BigQuery tables. In this example, Striim service API is hit via python connecting Striim app and dbt to make the two work together.
Please follow the steps in the PostgreSQLCDC to BigQuery recipe to create an app that streams data from PostgreSQLDatabase to BigQuery with change data capture. For this recipe, we will host our app on Striim Cloud but there is always a free trial to see the power of Striim’s Change Data Capture. Here are the snapshots for fields in source, CQ and target.
To create and run data streaming applications from UI, click on the Endpoint of strim-node as shown below. This will redirect you to Striim User Interface.
Once the dbt project is set up, the python script that hits the Striim Cloud Service url to get the metadata from striim server acts as a trigger to run dbt transformation and monitoring. The python script for our project can be found here.
To hit the dbt cloud API, the following commands are used.The account id and job id can be retrieved from dbt cloud url. The authorization token can be found under API access on the left navigation bar.
The following snapshots are from the dbt run that shows the inserts and source data freshness.
To ensure you’re meeting data freshness SLAs for all your business stakeholders, you can monitor Source Freshness in dbt cloud.
Follow this document to enable source freshness of the real time data flowing from PostgreSQL through Striim to BigQuery. The source freshness snapshots can be checked under view data source.
Our tutorial showed you how a striim app can run with dbt, an open source data transformation and monitoring tool. With this feature you can monitor your data without interrupting the real-time streaming through Striim. dbt can be used with popular adapter plugins like PostgreSQL, Redshift, Snowflake and BigQuery, all of which are supported by Striim. With Striim’s integration with major databases and data warehouses and powerful CDC capabilities, data streaming and analytics becomes very fast and efficient.