Oracle to Google BigQuery – Continuous Movement of On-Premises Data via CDC

3 Minute Read

 

In this demo, we will show how to set up continuous data movement from Oracle to Google BigQuery in real time.

BigQuery is Google’s fully managed, petabyte scale, low cost analytics data warehouse. More and more companies are moving to Google BigQuery so they can perform super-fast SQL queries. Having the capability to move real-time data via change data capture (CDC) to Google BigQuery is essential.

We’re going to look at how you can utilize the Striim platform to move data from Oracle to Google BigQuery using CDC, as well as how to build dashboards to visualize the data.

Let’s say you have data in Oracle databases on-premises, and you want to move this into Google BigQuery for scalable low-cost analytics, in real time. How do you do this without affecting your production databases?

You can’t use SQL queries because typically these would be queries against a timestamp – or table scans that you do over and over again – and that puts a load on the database. You might also skip important transactions. You need change data capture, which enables non-intrusive collection of streaming database changes. Striim provides change data capture out-of-the-box. This enables real-time collection of change data from Oracle, SQL Server, MySQL, HPE NonStop and MariaDB.

CDC works because databases write all the operations that occur into transaction logs. Change data capture listens to those transaction logs, and instead of using triggers or timestamps, directly reads the logs to collect operations. This means that every DML operation – every insert, update, and delete – is written to the logs captured by change data capture and turned into events by our platform.

The whole process of on-premises to cloud data movement involves the following steps. First, you start change data capture. Second, you do a full table load from Oracle to Google BigQuery. Finally, you apply changes to perform updates that happened during the load, and ensure ongoing change is made in real time.

In the below video, you’ll see how you can utilize Striim, and our CDC wizards, to do real-time collection of change data from Oracle database, and deliver that in real time to Google BigQuery. We also build a custom monitoring solution of the whole end-to-end data flow.

 

First, we connect to Google BigQuery in the Google Cloud Platform. In this instance, we have a table, POS Authorizations, that is currently completely empty. Using Striim, we create a new application, choosing a template from our wizard collection. In this case, we’re using the wizard for Oracle CDC to Google BigQuery.

The wizard steps you through creating CDC. You first enter the required parameters, and our platform checks that CDC is configured correctly. You choose the tables that you want, and then either accept, or make changes to, the table mapping that is automatically performed. Finally, you configure the BigQuery target by entering the required and optional properties.

The wizard generates a data flow via CDC from Oracle to Google BigQuery, which we can deploy to our servers, and use to preview the data that will flow. We can then start the application and run some SQL to make changes to the Oracle table.

This causes the data to start flowing. The BigQuery table is populated with the Oracle data.

In addition to the data movement, we’ve also built a monitoring application complete with dashboard that shows data flowing through the various tables, the types of operations occurring, and the entire end-to-end transaction lag. This is the difference between when a transaction is captured on the source system, and when it is collected and applied to the target. You can also see some of the most recent transactions.

This was built, again, using a data flow within the Striim platform. This data flow uses the original streaming change data from the Oracle database, and then applies some processing in the form of SQL queries to generate statistics. In addition to generating data for the dashboard, you can also use this as rules to generate alerts for thresholds.

The dashboard itself is not hard coded. It’s generated using a dashboard builder which utilizes queries to connect to the backend. Each visualization is powered by a query against the backend data. There are lots of visualizations to choose from.

To learn more about how you can set up CDC for Oracle to Google BigQuery within your architecture, please visit our Striim for Google Cloud Platform page, schedule a demo with a Striim expert, or download or provision the Striim platform to get started!