Tutorial

Stream Data from PostgreSQL to Google BigQuery with Striim Cloud - Part 1

Use Striim Cloud to stream data securely from a PostgreSQL database into Google BigQuery

Benefits

Operational Analytics
Analyze real-time operational, transactional data from PostgreSQL in BigQuery

Secure Data Transfer
Secure data at rest and in flight with simple SSH tunnel configuration and automatic encryption

Build Real-Time Analytical Models
Use the power of real-time data streaming to build real-time analytical and ML models

Overview

Striim is a next-generation cloud data integration product that offers change data capture (CDC), enabling continuous replication from popular databases such as Oracle, SQL Server, PostgreSQL, and many others.

In addition to CDC connectors, Striim has hundreds of automated adapters for file-based data (logs, XML, CSV), IoT data (OPC UA, MQTT), and applications such as Salesforce and SAP. Our SQL-based stream processing engine makes it easy to enrich and normalize data before it’s written to targets like BigQuery and Snowflake.

Traditionally, data warehouses have been loaded using batch processing, but with Striim’s streaming platform data can be replicated efficiently in real time.

Securing data in flight is critical in any real-world application. A jump host provides a publicly reachable, encrypted entry point into an otherwise secure environment.

In this tutorial, we’ll walk you through how to create a secure SSH tunnel between Striim Cloud and your on-premises or cloud databases, with an example where data is streamed securely from a PostgreSQL database into Google BigQuery through SSH tunneling.

Core Striim Components

PostgreSQL Reader: PostgreSQL Reader uses the wal2json plugin to read PostgreSQL change data. Note that 1.x releases of wal2json cannot read transactions larger than 1 GB.
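Using the PostgreSQL Reader assumes the source database is set up for logical decoding with wal2json. As a rough, hedged sketch of the PostgreSQL side (the role and slot names are illustrative, and depending on your configuration Striim may create the replication slot for you):

```sql
-- Enable logical decoding (requires a server restart to take effect).
ALTER SYSTEM SET wal_level = 'logical';

-- Give the Striim connection user replication privileges
-- ('striim' is an illustrative role name).
ALTER ROLE striim WITH REPLICATION LOGIN;

-- Create a replication slot that uses the wal2json output plugin
-- ('striim_slot' is an illustrative slot name).
SELECT * FROM pg_create_logical_replication_slot('striim_slot', 'wal2json');
```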

Stream: A stream passes one component’s output to one or more other components. For example, a simple flow that only writes to a file might have this sequence: source → stream → target (FileWriter).

BigQueryWriter: Striim’s BigQueryWriter writes data from the supported sources into Google’s BigQuery data warehouse to support real-time data warehousing and reporting.
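To make these components concrete, here is a rough sketch of how a source, stream, and target fit together in Striim’s SQL-like TQL. This is illustrative only: the adapter property names and values are assumptions, and in this tutorial the equivalent app is built through the UI wizard rather than written by hand.

```sql
-- Illustrative TQL sketch; property names and values are assumptions,
-- not a verbatim copy of this tutorial's app.
CREATE APPLICATION PostgresToBigQuery;

CREATE SOURCE PostgresCDC USING PostgreSQLReader (
  ConnectionURL: 'jdbc:postgresql://localhost:5433/demo_db',  -- hypothetical tunnel endpoint
  Username: 'striim',
  Password: '********',
  Tables: 'public.orders'
)
OUTPUT TO OrdersStream;  -- the stream carrying change events

CREATE TARGET BQTarget USING BigQueryWriter (
  ServiceAccountKey: '/path/to/service-account.json',
  ProjectId: 'my-gcp-project',
  Tables: 'public.orders,demo_dataset.orders'  -- source table mapped to target table
)
INPUT FROM OrdersStream;

END APPLICATION PostgresToBigQuery;
```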

(Optional) Step 1: Secure connectivity to your database

Striim provides multiple methods for secure connectivity to your database. For Striim Developer, you can allowlist the IP address provided in your email invite.

In Striim Cloud, you can either allowlist your service’s IP address or follow the steps below to configure a jump server.

Follow the steps below to set up your jump server on Google Compute Engine:

Go to the Google Cloud console -> Compute Engine -> VM instances and create a new VM instance that will act as the jump server.

 

Add the jump server’s IP address to the authorized networks of the source database, in this case the PostgreSQL instance.

Step 2: Create SSH tunnel on Striim Cloud

Once the jump host is set up, an SSH tunnel is created from the Striim Cloud UI to establish a connection to the source database through the jump server.

Follow the steps below to configure an SSH tunnel between the source database and Striim Cloud:

Go to the Striim Cloud console and launch a service instance. Under Security, create a new SSH tunnel and configure it as shown below.

Go to the jump server (VM instance) and add the service key copied from the step above.

The SSH tunnel is now in place, and Striim Cloud can reach the source database securely through the jump server.

Step 3: Launch Striim Server and Connect the Postgres Instance

For this recipe, we will host our app in Striim Cloud, but a free trial is always available if you want to see the power of Striim’s change data capture for yourself.

Follow the steps below to connect the Striim server to the PostgreSQL instance containing the source database:

Click on Apps to display the app management screen:

Click on Create app:

 

Select your source and target under Create app from wizard:

Give your app a name and establish the connection between the Striim server and the PostgreSQL instance.
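If the connection fails, a common cause is that logical replication is not enabled on the source. A quick, hedged way to check from the PostgreSQL side (assuming you can run queries as a superuser or the Striim role):

```sql
-- Should return 'logical'; any other value means change data cannot be captured.
SHOW wal_level;

-- List existing replication slots and whether they are active.
SELECT slot_name, plugin, active FROM pg_replication_slots;
```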



Once the connection between PostgreSQL and the Striim server is established, we will link the target data warehouse, in this case Google BigQuery. Striim also offers a schema conversion feature that validates table schemas on both the source and the target, which helps migrate the source schema to the target database.


Step 4: Targeting Google BigQuery

You have to make sure your BigQuery instance mirrors the tables in the source database. This can be done from the Google Cloud console. Under your project in Google BigQuery, create the dataset and an empty table containing all the columns to be populated with data migrated from the PostgreSQL database.
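As a hedged sketch of what this mirroring might look like in the BigQuery SQL workspace, the dataset, table, and column names below are examples only; your columns should match the source table’s schema:

```sql
-- Illustrative BigQuery DDL; dataset, table, and column names are examples,
-- not taken from this tutorial.
CREATE SCHEMA IF NOT EXISTS demo_dataset;

CREATE TABLE IF NOT EXISTS demo_dataset.orders (
  order_id   INT64,
  customer   STRING,
  amount     NUMERIC,
  created_at TIMESTAMP
);
```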

Follow the steps below to create a new dataset in BigQuery and integrate it with the Striim app using a service account:

Create a dataset with tables mirroring the source schema.

Go back to the app wizard and enter the service account key of your BigQuery instance to connect the app with the target data warehouse.

Now the Striim server is integrated with both PostgreSQL and BigQuery, and we are ready to configure the Striim app for data migration.

Step 5: Configure the Striim app using the UI

With the source, target, and Striim server integrated for data migration, a few configuration steps remain in the easy-to-understand app UI before deploying and running the app.

  • Click on the source and add the connection URL (the tunnel address), username, and password in the proper format (for example, jdbc:postgresql://<tunnel host>:<tunnel port>/<database>).
  • Click on the target and add the input stream, the source and target tables, and upload the service account key for the BigQuery instance.

    Now the app is good to go for deployment and data migration.

Step 6: Deploy and Run the Striim app for Fast Data Migration

In this section, you will deploy and run the final app to see the power of change data capture in Striim’s next-generation technology.
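Once the app is deployed and running, one simple way to confirm that change data is landing in the target is to query the BigQuery table directly. The dataset and table names below are the illustrative ones used earlier, not fixed names from this tutorial:

```sql
-- Quick sanity check in BigQuery after the Striim app has been running.
SELECT COUNT(*) AS row_count
FROM demo_dataset.orders;
```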

Setting up the Postgres to BigQuery Streaming App

Step 1: Follow the recipe to configure your jump server and SSH tunnel to Striim Cloud.

Step 2: Set up your Postgres Source and BigQuery Target.

Step 3: Create the Postgres to BigQuery Striim app using the wizard, as shown in the recipe.

Step 4: Migrate your data from source to target by deploying your Striim app.

Wrapping Up: Start Your Free Trial

Our tutorial showed you how easy it is to migrate data from PostgreSQL to Google BigQuery, a leading cloud data warehouse. By continuously moving your data into BigQuery, you can now start building analytics or machine learning models on top, all with minimal impact on your current systems. You can also start ingesting and normalizing more datasets with Striim to take full advantage of your data combined with the power of BigQuery.

As always, feel free to reach out to our integration experts to schedule a demo, or try Striim for free here.

Tools you need

Striim

Striim’s unified data integration and streaming platform connects clouds, data and applications.

PostgreSQL

PostgreSQL is an open-source relational database management system.

Google BigQuery

BigQuery is a serverless, highly scalable multicloud data warehouse.

Google Compute Engine

Google Compute Engine offers scalable virtual machines. In this use case, a VM instance serves as the jump host for SSH tunneling.