Tutorial

Replicate data from PostgreSQL to Snowflake in real time with Change Data Capture

Stream data from PostgreSQL to Snowflake

Benefits

Operational Analytics
Visualize real time data with Refresh View on Snowflake

Capture Data Updates in real time
Use Striim’s PostgreSQL CDC reader for real-time data replication

Build Real-Time Analytical Models
Use dbt to build Real-Time analytical and ML models 
On this page

Overview

Striim is a unified data streaming and integration product that offers change capture (CDC) enabling continuous replication from popular databases such as Oracle, SQLServer, PostgreSQL and many others to target data warehouses like BigQuery and Snowflake.

Change Data Capture is a critical process desired by many companies to stay up to date with most recent data. This enables efficient real-time decision making which is important for stakeholders. Striim platform facilitates simple to use, real-time data
integration, replication, and analytics with cloud scale and security.

In this tutorial, we will walk you through a use case where data is replicated from PostgreSQL to Snowflake in real time. Change events are extracted from a PostgreSQL database as they are created and then streamed to Snowflake hosted on Microsoft Azure. Follow this recipe to learn how to secure your data pipeline by creating an SSH tunnel on Striim cloud through a jump host.

Core Striim Components

PostgreSQL CDC: PostgreSQL Reader uses the wal2json plugin to read PostgreSQL change data. 1.x releases of wal2jon can not read transactions larger than 1 GB.

Stream: A stream passes one component’s output to one or more other components. For example, a simple flow that only writes to a file might have this sequence

Snowflake Writer: Striim’s Snowflake Writer writes to one or more existing tables in Snowflake. Events are staged to local storage, Azure Storage, or AWS S3, then written to Snowflake as per the Upload Policy setting.

Step 1: Launch Striim Server and connect the Postgres instance with replication attribute

Please refer to postgres CDC to BigQuery recipe to learn how to create a replication user and replication slot for the postgres database and tables. Striim collaborates with Snowflake to provide Striim’s cloud service through Snowflake’s partner connect.

Follow the steps below to connect Striim server to postgres instance containing the source database:

Launch Striim in Snowflake Partner Connect by clicking on “Partner Connect” in the top right corner of the navigation bar.

In the next window, you can launch Striim and sign up for a free trial.


Create your first Striim Service to move data to Snowflake.

Launch the new service and use app wizard to stream data from PostgresCDC to Snowflake and Select Source and Target under create app from wizard:

Give a name to your app and establish the connection between striim server and postgres instance .

Step 1 :

  • Hostname: IP address of postgres instance
  • Port : For postgres, port is 5432
  • Username & Password: User with replication attribute that has access to source database
  • Database Name: Source Database

Step 2 :The wizard will check and validate the connection between source to striim server

Step 3 :Select the schema that will be replicated

Step 4 :The selected schema is validated

Step 5 :Select the tables to be streamed

Step 2: Configure the target (Snowflake on Azure Cloud)

Once the connection with the source database and tables is established, we will configure the target where the data is replicated to.

The connection url has the following format:
jdbc:snowflake://YOUR_HOST-2.azure.snowflakecomputing.com:***?warehouse=warehouse_name &db=RETAILCDC&schema=public

Step 3: Deploy and Run the Striim app for Fast Data Streaming

After the source and targets are configured and connection is established successfully, the app is ready to stream change data capture on the source table and replicate it onto the target snowflake table. When there is an
update on the source table, the updated data is streamed through striim app to the target table on snowflake.

Step 4: Refresh View on Snowflake

With Striim as the data streaming platform, real-time analytics can be done on target databases. In snowflake, you can write a query for refresh view of the incoming data in real-time. In this tutorial, a view is created that aggregates the total number of orders in each state at any given time.

Video Walkthrough

Here is the video showing all the steps in streaming Change Data from postgres to Snowflake and refresh view of updated data on snowflake.

Setting Up Postgres to Snowflake Streaming Application

Step 1: Download the data and Sample TQL file from our github repo

You can download the TQL files for streaming app our github repository. Deploy the Striim app on your Striim server.

Step 2: Configure your Postgres source and Snowflake target and add it to the source and target components of the app

Set up your source and target and add the details in striim app

Step 3: Run the app for fast Data Streaming

Deploy your streaming app and run it for real-time data replication

Step 4: Set up refresh view in Snowflake

Follow the recipe to write a query for refresh view of real-time data

Wrapping Up: Start Your Free Trial

Our tutorial showed you how easy it is to stream data from PostgreSQL CDC to Snowflake, a leading cloud data warehouse and do real-time analytics with a refresh view. By constantly moving your data into BigQuery, you could now start building analytics or machine learning models on top, all with minimal impact to your current systems. You could also start ingesting and normalizing more datasets with Striim to fully take advantage of your data.

As always, feel free to reach out to our integration experts to schedule a demo, or try Striim for free here.

Tools you need

Striim

Striim’s unified data integration and streaming platform connects clouds, data and applications.

PostgreSQL

PostgreSQL is an open-source relational database management system.

Snowflake

Snowflake is a cloud-native relational data warehouse that offers flexible and scalable architecture for storage, compute and cloud services.