Move Oracle to Google BigQuery in Real Time

You are going to see how you can move data, in real-time, from Oracle on-Premise to Google BigQuery running in the Google Cloud Platform using Striim and change data capture.

You have data in lots of Oracle tables, on-premise. You want to move this into Google BigQuery for scalable low cost analytics, in real-time. How do you do this without affecting your production databases?

Unedited Transcript:

Today, you’re going to see how you can move data in real time from Oracle on-premise to Google Bigquery running in the cloud using Striim and change data capture. You have lots of data in Oracle tables on premise and you want to move them into Google BigQuery for scalable low-cost analytics in real time. How do you do this without affecting your production databases? You can’t use SQL queries because typically these would be queries against the timestamp or table scans that you do over and over again, and that puts a load on the Oracle database. You might also skip Important transactions. You need to change data capture, which enables nonintrusive collection of streaming database change. Striim provides change data capture right out of the box. This enables real-time collection of change data from Oracle, SQL Server and MySQL.

CDC works because databases write all operations that occur into transaction logs. Change data capture listens to those transaction logs and instead of using triggers or timestamps, directly reads these logs to collect operations. This means that every DML operation, every insert, update and delete is written to the lugs captured by change data capture and turned into events by our platform. The whole process of on-premise to cloud involves the following steps:

  • First you start change data capture.
  • Second, you do a full table load into Google BigQuery
  • Finally you turn on application of change to perform updates that happened during the load and ensure ongoing change is made in real time.

In this demo, you’ll see how you can utilize Striim and our CDC wizards to do real-time collection of change data capture from Oracle database and deliver that in real time into Google BigQuery. We also built a custom monitoring solution of the whole end-to-end data flow.

First, we connect to Google BigQuery in the Google cloud platform. In this instance, we have a table, POS Authorizations, that is currently completely empty. Using Striim, we create a new application choosing a template from our wizard collection. In this case, we’re going to use Oracle CDC to Google BigQuery. The wizard helps you through creating CDC. You first enter the required parameters and our platform checks that CDC is configured correctly. You choose the tables that you want and then either accept or make changes to the table mapping that is automatically performed, and finally you configure the BigQuery target again by entering the required and optional properties. The wizard generates a data flow from CDC to BigQuery, which we can deploy to our servers and use to preview the data that will flow. We can then start the application and run some SQL to make changes to the Oracle table.

This causes the data to start flowing, which you can see here. The BigQuery table is populated with the Oracle data, as we can see using this query. In addition to the data movement, we’ve also built a monitoring application complete with dashboard that shows data flowing through the various tables. The types of operations occurring, and the entire end to end transaction lag. This is the difference between when a transaction was committed on the source system and when it was captured and applied to the target. You can also see some of the most recent transactions. This was built again, using a data flow within the Striim platform. This data flow uses the original streaming change data from the Oracle database, then apply some processing in the form of SQL queries to generate statistics. In addition to generating data for this dashboard, you can also use this as rules to generate alerts for thresholds. The dashboard itself is not hard-coded. It’s generated using a dashboard builder, which utilizes queries to connect to the backend. Each visualization is powered by a query against the backend data, and there are lots of visualizations to choose from. We hope you’ve enjoyed seeing it’s a move on premise data into BigQuery in the cloud using Striim. If you’d like more in depth, look at this application, please request a demo or download our platform today.

 

Google Cloud Big Query Arch