Oracle to Snowflake – Migrate data to Snowflake with Change Data Capture

Overview

In this article we will go over implementing a data pipeline that migrates data with Oracle change data capture to Snowflake.

1. Introduction
2. What is Striim?
3. What is Change Data Capture?
4. Tutorial: Migrating data from Oracle to Snowflake with Striim on Partner Connect
5. Conclusion: Get Started

With Striim now on Snowflake Partner Connect, customers can start loading their data in minutes with one-click access to a proven and intuitive cloud-based data integration service – Harsha Kapre, Director of Product Management at Snowflake

Migrate data from Oracle to Snowflake with Striim’s Free Trial on Partner Connect

At Striim, we value building real-time data integration solutions for cloud data warehouses. Snowflake has become a leading Cloud Data Platform by making it easy to address some of the key challenges in modern data management such as

  • Building a 360 view of the customer
  • Combining historical and real-time data
  • Handling large scale IoT device data
  • Aggregating data for machine learning purposes

It only took you minutes to get up and running with Snowflake. So, it should be just as easy to move your data into Snowflake with an intuitive cloud-based data integration service.

A Quick Tutorial

We’ll dive into a tutorial on how you can use Striim on Partner Connect to create schemas and move data into Snowflake in minutes. We’ll cover the following in the tutorial:

  • Launch Striim’s cloud service directly from the Snowflake UI
  • Migrate data and schemas from Oracle to Snowflake
  • Perform initial load: move millions of rows in minutes all during a free trial
  • Kick off a real-time replication pipeline using change data capture from Oracle to Snowflake
  • Monitoring your data integration pipelines with real-time dashboards and rule-based alerts

But first a little background!

What is Striim?

At a high level, Striim is a next generation Cloud Data Integration product that offers change data capture (CDC) enabling real-time data integration from popular databases such as Oracle, SQLServer, PostgreSQL and many others.

In addition to CDC connectors, Striim has hundreds of automated adapters for file-based data (logs, xml, csv), IoT data (OPCUA, MQTT), and applications such as Salesforce and SAP. Our SQL-based stream processing engine makes it easy to enrich and normalize data before it’s written to Snowflake.

Our focus on usability and scalability has driven adoption from customers like Attentia, Belgium-based HR and well-being company, and Inspyrus, a Silicon Valley-based invoice processing company, that chose Striim for data integration to Snowflake.

What is Change Data Capture?

While many products focus on batch data integration, Striim specializes in helping you build continuous, real-time database replication pipelines using change data capture (CDC).This keeps the target system in sync with the source database to address real-time requirements.

Before we dive into an example pipeline, we’ll briefly go over the concept of Change Data Capture (CDC). CDC is the process of tailing the database’s change logs, turning database events such as inserts, updates, deletes, and relevant DDL statements into a stream of immutable events, and applying those changes to a target database or data warehouse.

Change data capture is also a useful software abstraction for other software applications such as version control and event sourcing.

Striim brings decades of experience delivering change data capture products that work in mission-critical environments. The founding team at Striim was the executive (and technical) team at GoldenGate Software (now Oracle GoldenGate). Now Striim is offering CDC as an easy-to-use, cloud-based product for data integration.

Migrating data to Snowflake in minutes with Striim’s cloud service

Let’s dive into how you can start moving data into Snowflake in minutes using our platform. In a few simple steps, this example shows how you can move transactional data from Oracle to Snowflake.

Here are the simple high level steps to move data from Oracle to Snowflake:

  1. Connect to Oracle database
  2. Connect to your Snowflake environment – this step is done automatically in Striim for Snowflake Partner Connect
  3. Map data types from Oracle to Snowflake – this step is also done automatically in Striim’s wizards
  4. Start data migration with Oracle change data capture to Snowflake
  5. Monitor and validate your data pipeline from Oracle to Snowflake

Let’s get started:

1. Launch Striim in Snowflake Partner Connect

In your Snowflake UI, navigate to “Partner Connect” by clicking the link in the top right corner of the navigation bar. There you can find and launch Striim.

2. Sign Up For a Striim Free Trial

Striim’s free trial gives you seven calendar days of the full product offering to get started. But we’ll get you up and running with schema migration and database replication in a matter of minutes.

3. Create your first Striim Service.

A Striim Service is an encapsulated SaaS application that dedicates the software and fully managed compute resources you need to accomplish a specific workload; in this case we’re creating a service to help you move data to Snowflake! We’re also available to assist with you via chat in the bottom right corner of your screen.

 

4. Start moving data with Striim’s step-by-step wizards.

In this case, the Oracle to Snowflake wizard is selected. As you can see, Striim supports data integration for a wide-range of database sources – all available in the free trial.

 

5. Select your schemas and tables from your source database

6. Start migrating your schemas and data

After select your tables, simply click ‘Next’ and your data migration pipeline will begin!

snowflake pc

7. Monitor your data pipelines in the Flow Designer

As your data starts moving, you’ll have a full view into the amount of data being ingested and written into Snowflake including the distribution of inserts, updates, deletes, primary key changes and more.

 

For a deeper drill down, our application monitor gives even more insights into low-level compute metrics that impact your integration latency.

Real-Time Database Replication with Oracle Change Data Capture to Snowflake

Striim makes it easy to sync your schema migration and CDC applications.

While Striim makes it just as easy to build these pipelines, there are some prerequisites to configuring CDC from most databases that are outside the scope of Striim.

To perform change data capture an administrator with the necessary privileges must create a user for use by the adapter and assign it the necessary privileges:

If using Oracle 11g ,or 12c, 18c, or 19c without CDB, enter the following commands

create role striim_privs;
grant create session,
  execute_catalog_role,
  select any transaction,
  select any dictionary
  to striim_privs;
grant select on SYSTEM.LOGMNR_COL$ to striim_privs;
grant select on SYSTEM.LOGMNR_OBJ$ to striim_privs;
grant select on SYSTEM.LOGMNR_USER$ to striim_privs;
grant select on SYSTEM.LOGMNR_UID$ to striim_privs;
create user striim identified by ******** default tablespace users;
grant striim_privs to striim;
alter user striim quota unlimited on users;

If using Database Vault, omit execute_catalog_role, and also enter the following commands:

grant execute on SYS.DBMS_LOGMNR to striim_privs;
grant execute on SYS.DBMS_LOGMNR_D to striim_privs;
grant execute on SYS.DBMS_LOGMNR_LOGREP_DICT to striim_privs;
grant execute on SYS.DBMS_LOGMNR_SESSION to striim_privs;

For Oracle 12c only, also enter the following command.

grant LOGMINING to striim_privs;

If using Oracle 12c, 18c, or 19c with PDB, enter the following commands. Replace <PDB name> with the name of your PDB.

create role c##striim_privs;
grant create session,
execute_catalog_role,
select any transaction,
select any dictionary,
logmining
to c##striim_privs;
grant select on SYSTEM.LOGMNR_COL$ to c##striim_privs;
grant select on SYSTEM.LOGMNR_OBJ$ to c##striim_privs;
grant select on SYSTEM.LOGMNR_USER$ to c##striim_privs;
grant select on SYSTEM.LOGMNR_UID$ to c##striim_privs;
create user c##striim identified by ******* container=all;
grant c##striim_privs to c##striim container=all;
alter user c##striim set container_data = (cdb$root, <PDB name>) container=current;

If using Database Vault, omit execute_catalog_role, and also enter the following commands:

grant execute on SYS.DBMS_LOGMNR to c##striim_privs;
grant execute on SYS.DBMS_LOGMNR_D to c##striim_privs;
grant execute on SYS.DBMS_LOGMNR_LOGREP_DICT to c##striim_privs;
grant execute on SYS.DBMS_LOGMNR_SESSION to c##striim_privs;

Maximum Uptime with Guaranteed Delivery, Monitoring and Alerts

Striim gives your team full visibility into your data pipelines with the following monitoring capabilities:

  • Rule-based, real-time alerts where you can define your custom alert criteria
  • Real-time monitoring tailored to your metrics
  • Exactly-once processing (E1P) guarantees

or

alerts

Striim uses a built-in stream processing engine that allows high volume data ingest and processing for Snowflake ETL purposes.

Conclusion

To summarize, Striim on Snowflake Partner Connect provides an easy-to-use cloud data integration service for Snowflake. The service comes with a 7-day free trial, giving you ample time to begin your journey to bridge your operational data with your Snowflake Data Warehouse.

Visit our Striim for Snowflake webpage for a deeper dive into solutions such as

  • How to migrate data from Oracle to Snowflake using Change Data Capture
  • How to migrate data from SQL Server to Snowflake using Change Data Capture
  • Teradata to Snowflake migrations
  • AWS Redshift to Snowflake migrations
  • Moving IoT data to Snowflake (OPCUA, MQTT)
  • Moving data from AWS S3 to Snowflake

As always, feel free to reach out to our integration experts to schedule a demo.



Announcing Striim on Snowflake Partner Connect

 

 

Overview

1. Introduction: Announcing Striim on Partner Connect
2. What is Striim?
3. What is Change Data Capture?
4. Tutorial: Migrating data to Snowflake with Striim on Partner Connect
5. Conclusion: Start your Free Trial

Introduction: Announcing Striim on Partner Connect

It only takes minutes to get up and running with Snowflake. So, it should be just as easy to move your data into Snowflake with an intuitive cloud-based data integration service.

To help you build real-time ETL pipelines for Snowflake, we’re happy to announce that Striim is now available as a cloud service directly on Snowflake Partner Connect.

“Striim simplifies and accelerates the movement of real-time enterprise data to Snowflake with an easy and scalable pay-as-you-go model,” Director of Product Management at Snowflake, Harsha Kapre said. “With Striim now on Snowflake Partner Connect, customers can start loading their data in minutes with one-click access to a proven and intuitive cloud-based data integration service.

John Kutay, Director of Product Growth at Striim, highlights the simplicity of Striim’s cloud service on Partner Connect: “We focused on delivering an experience tailored towards Snowflake customers; making it easy to bridge the gap between operational databases and Snowflake via self-service schema migration, initial data sync, and change data capture.”

What is Striim?

Striim is a next generation Cloud Data Integration product that offers change data capture (CDC) enabling continuous replication from popular databases such as Oracle, SQLServer, PostgreSQL and many others.

In addition to CDC connectors, Striim has hundreds of automated adapters for file-based data (logs, xml, csv), IoT data (OPCUA, MQTT), and applications such as Salesforce and SAP. Our SQL-based stream processing engine makes it easy to enrich and normalize data before it’s written to Snowflake.

Our focus on usability and scalability has driven adoption from customers like Attentia, Belgium-based HR and well-being company, and Inspyrus, a Silicon Valley-based invoice automation company, who chose Striim to replicate data to Snowflake.

What is Change Data Capture?

While many data integration products focus on batch data integration, Striim specializes in helping you build continuous, real-time database replication pipelines using change data capture (CDC).This keeps the target system in sync with the source database to address real-time requirements.

Before we dive into an example pipeline, we’ll briefly go over the concept of Change Data Capture (CDC). CDC is the process of tailing the database’s change logs, turning database events such as inserts, updates, deletes, and relevant DDL statements into a stream of immutable events, and applying those changes to a target database or data warehouse.

Change data capture is also a useful software abstraction for other software applications such as version control and event sourcing.

Event as a Change to an Entry in a Database
Imagine Each Event as a Change to an Entry in a Database

Striim brings decades of experience delivering change data capture products that work in mission-critical environments. The founding team at Striim was the executive (and technical) team at GoldenGate Software – now Oracle GoldenGate. Today, Striim is offering CDC as an easy-to-use, cloud-based product for data integration.

Tutorial: Migrating data to Snowflake in minutes with Striim’s cloud service

snowflake etl wizards
Striim has published a tutorial to help users get up and running with Striim on Partner Connect including details for the following steps:

  • Launching Striim’s free trial from Partner Connect
  • Migrating schemas from your source databases to Snowflake with Striim’s intuitive user interface.
  • Historical sync: move millions of rows in minutes all during your free trial
  • Intro to setting up Streaming ETL pipelines using change data capture.
  • Monitor your workloads with dashboards and rule-based alerts

Check out the tutorial here (7 minute read).

Conclusion

To summarize, Striim on Snowflake Partner Connect provides an easy-to-use cloud data integration service for Snowflake. The service comes with a 7-day free trial, giving you ample time to begin your journey to bridge your operational data with your Snowflake Data Warehouse.

Visit our Striim for Snowflake webpage for a deeper dive into solutions such as

  • How to migrate data from Oracle to Snowflake using Change Data Capture
  • How to migrate data from SQL Server to Snowflake using Change Data Capture
  • Teradata to Snowflake migrations
  • AWS Redshift to Snowflake migrations
  • Moving IoT data to Snowflake (OPCUA, MQTT)
  • Moving data from AWS S3 to Snowflake

As always, feel free to reach out to our integration experts to schedule a demo.

 

Real-Time Data Integration for Snowflake with Striim on Partner Connect

 

 

Overview

1. Introduction
2. What is Striim?
3. What is Change Data Capture?
4. Tutorial: Migrating data to Snowflake with Striim on Partner Connect
5. Conclusion: Get Started

With Striim now on Snowflake Partner Connect, customers can start loading their data in minutes with one-click access to a proven and intuitive cloud-based data integration service – Harsha Kapre, Director of Product Management at Snowflake

Data Integration for Snowflake: Announcing Striim on Partner Connect

At Striim, we value building real-time data integration solutions for cloud data warehouses. Snowflake has become a leading Cloud Data Platform by making it easy to address some of the key challenges in modern data management such as

  • Building a 360 view of the customer
  • Combining historical and real-time data
  • Handling large scale IoT device data
  • Aggregating data for machine learning purposes

It only took you minutes to get up and running with Snowflake. So, it should be just as easy to move your data into Snowflake with an intuitive cloud-based data integration service.

To give you an equally seamless data integration experience, we’re happy to announce that Striim is now available as a cloud service directly on Snowflake Partner Connect.

“Striim simplifies and accelerates the movement of real-time enterprise data to Snowflake with an easy and scalable pay-as-you-go model,” Director of Product Management at Snowflake, Harsha Kapre said. “With Striim now on Snowflake Partner Connect, customers can start loading their data in minutes with one-click access to a proven and intuitive cloud-based data integration service.”

John Kutay, Director of Product Growth at Striim, highlights the simplicity of Striim’s cloud service on Partner Connect: “We focused on delivering an experience tailored towards Snowflake customers; making it easy to bridge the gap between operational databases and Snowflake via self-service schema migration, initial data sync, and change data capture.

A Quick Tutorial

We’ll dive into a tutorial on how you can use Striim on Partner Connect to create schemas and move data into Snowflake in minutes. We’ll cover the following in the tutorial:

  • Launch Striim’s cloud service directly from the Snowflake UI
  • Migrate schemas from your source database. We will be using MySQL for this example, but the steps are almost exactly the same other databases (Oracle, PostgreSQL, SQLServer, and more).
  • Perform initial load: move millions of rows in minutes all during your free trial of Striim
  • Kick off a real-time replication pipeline using change data capture.
  • Monitoring your data integration pipelines with real-time dashboards and rule-based alerts

But first a little background!

What is Striim?

At a high level, Striim is a next generation Cloud Data Integration product that offers change data capture (CDC) enabling real-time data integration from popular databases such as Oracle, SQLServer, PostgreSQL and many others.

In addition to CDC connectors, Striim has hundreds of automated adapters for file-based data (logs, xml, csv), IoT data (OPCUA, MQTT), and applications such as Salesforce and SAP. Our SQL-based stream processing engine makes it easy to enrich and normalize data before it’s written to Snowflake.

Our focus on usability and scalability has driven adoption from customers like Attentia, Belgium-based HR and well-being company, and Inspyrus, a Silicon Valley-based invoice processing company, that chose Striim for data integration to Snowflake.

What is Change Data Capture?

While many products focus on batch data integration, Striim specializes in helping you build continuous, real-time database replication pipelines using change data capture (CDC).This keeps the target system in sync with the source database to address real-time requirements.

Before we dive into an example pipeline, we’ll briefly go over the concept of Change Data Capture (CDC). CDC is the process of tailing the database’s change logs, turning database events such as inserts, updates, deletes, and relevant DDL statements into a stream of immutable events, and applying those changes to a target database or data warehouse.

Change data capture is also a useful software abstraction for other software applications such as version control and event sourcing.

Event as a Change to an Entry in a Database
Imagine Each Event as a Change to an Entry in a Database

Striim brings decades of experience delivering change data capture products that work in mission-critical environments. The founding team at Striim was the executive (and technical) team at GoldenGate Software (now Oracle GoldenGate). Now Striim is offering CDC as an easy-to-use, cloud-based product for data integration.

Migrating data to Snowflake in minutes with Striim’s cloud service

Let’s dive into how you can start moving data into Snowflake in minutes using our platform. In a few simple steps, this example shows how you can move transactional data from MySQL to Snowflake. Let’s get started:

1. Launch Striim in Snowflake Partner Connect

In your Snowflake UI, navigate to “Partner Connect” by clicking the link in the top right corner of the navigation bar. There you can find and launch Striim.

2. Sign Up For a Striim Free Trial

Striim’s free trial gives you seven calendar days of the full product offering to get started. But we’ll get you up and running with schema migration and database replication in a matter of minutes.

snowflakepc

snowflakepc signup

3. Create your first Striim Service.

A Striim Service is an encapsulated SaaS application that dedicates the software and fully managed compute resources you need to accomplish a specific workload; in this case we’re creating a service to help you move data to Snowflake! We’re also available to assist with you via chat in the bottom right corner of your screen.

snowflakepc

snowflake pc

4. Start moving data with Striim’s step-by-step wizards.

In this case, the MySQL to Snowflake is selected. As you can see, Striim supports data integration for a wide-range of database sources – all available in the free trial.

snowflake pc wizard

snowflake pc wizard

5. Select your schemas and tables from your source database

snowflake pc

6. Start migrating your schemas and data

After select your tables, simply click ‘Next’ and your data migration pipeline will begin!

snowflake pc

7. Monitor your data pipelines in the Flow Designer

As your data starts moving, you’ll have a full view into the amount of data being ingested and written into Snowflake including the distribution of inserts, updates, deletes, primary key changes and more.


For a deeper drill down, our application monitor gives even more insights into low-level compute metrics that impact your integration latency.

snowflake pc

Real-Time Database Replication to Snowflake with Change Data Capture

Striim makes it easy to sync your schema migration and CDC applications.

While Striim makes it just as easy to build these pipelines, there are some prerequisites to configuring CDC from most databases that are outside the scope of Striim.

To use MySQLReader, the adapter that performs CDC, an administrator with the necessary privileges must create a user for use by the adapter and assign it the necessary privileges:

CREATE USER 'striim' IDENTIFIED BY '******';
GRANT REPLICATION SLAVE ON *.* TO 'striim';
GRANT REPLICATION CLIENT ON *.* TO 'striim';
GRANT SELECT ON *.* TO 'striim';</var/www/striim-com>

The MySQL 8 caching_sha2_password authentication plugin is not supported in this release. The mysql_native_password plugin is required. The minimum supported version is 5.5 and higher.

The REPLICATION privileges must be granted on *.*. This is a limitation of MySQL.

You may use any other valid name in place of striim. Note that by default MySQL does not allow remote logins by root.

Replace ****** with a secure password.

You may narrow the SELECT statement to allow access only to those tables needed by your application. In that case, if other tables are specified in the MySQLReader properties, Striim will return an error that they do not exist.

MYSQL BINARY LOG SETUP
MySQLReader reads from the MySQL binary log. If your MySQL server is using replication, the binary log is enabled, otherwise it may be disabled.

For MySQL, the property name for enabling the binary log, its default setting, and how and where you change that setting vary depending on the operating system and your MySQL configuration, so see the documentation for the version of MySQL you are running for instructions.

If the binary log is not enabled, Striim’s attempts to read it will fail with errors such as the following:

2016-04-25 19:05:40,377 @ -WARN hz._hzInstance_1_striim351_0423.cached.thread-2
com.webaction.runtime.Server.startSources (Server.java:2477) Failure in Starting
Sources.
java.lang.Exception: Problem with the configuration of MySQL
Row logging must be specified.
Binary logging is not enabled.
The server ID must be specified.
Add --binlog-format=ROW to the mysqld command line or add binlog-format=ROW to your
my.cnf file
Add --bin-log to the mysqld command line or add bin-log to your my.cnf file
Add --server-id=n where n is a positive number to the mysqld command line or add
server-id=n to your my.cnf file
at com.webaction.proc.MySQLReader_1_0.checkMySQLConfig(MySQLReader_1_0.java:605) ...</var/www/striim-com>

Once those prerequisites are completed, you can run the MySQL CDC wizard and start replicating data from your database right where your schema migration and initial load left off.

Maximum Uptime with Guaranteed Delivery, Monitoring and Alerts

Striim gives your team full visibility into your data pipelines with the following monitoring capabilities:

  • Rule-based, real-time alerts where you can define your custom alert criteria
  • Real-time monitoring tailored to your metrics
  • Exactly-once processing (E1P) guarantees

Striim uses a built-in stream processing engine that allows high volume data ingest and processing for Snowflake ETL purposes.

Conclusion

To summarize, Striim on Snowflake Partner Connect provides an easy-to-use cloud data integration service for Snowflake. The service comes with a 7-day free trial, giving you ample time to begin your journey to bridge your operational data with your Snowflake Data Warehouse.

Visit our Striim for Snowflake webpage for a deeper dive into solutions such as

  • How to migrate data from Oracle to Snowflake using Change Data Capture
  • How to migrate data from SQL Server to Snowflake using Change Data Capture
  • Teradata to Snowflake migrations
  • AWS Redshift to Snowflake migrations
  • Moving IoT data to Snowflake (OPCUA, MQTT)
  • Moving data from AWS S3 to Snowflake

As always, feel free to reach out to our integration experts to schedule a demo.

Striim 3.10.1 Further Speeds Cloud Adoption

 

 

We are pleased to announce the general availability of Striim 3.10.1 that includes support for new and enhanced Cloud targets, extends manageability and diagnostics capabilities, and introduces new ease of use features to speed our customers’ cloud adoption. Key Features released in Striim 3.10.1 are directly available through Snowflake Partner Connect to enable rapid movement of enterprise data into Snowflake.

Striim 3.10.1 Focus Areas Including Cloud Adoption

This new release introduces many new features and capabilities, summarized here:

3.10.1 Features Summary

 

Let’s review the key themes and features of this new release, starting with the new and expanded cloud targets

Striim on Snowflake Partner Connect

From Snowflake Partner Connect, customers can launch a trial Striim Cloud instance directly as part of the Snowflake on-boarding process from the Snowflake UI and load data, optionally with change data capture, directly into Snowflake from any of our supported sources. You can read about this in a separate blog.

Expanded Support for Cloud Targets to Further Enhance Cloud Adoption

The Striim platform has been chosen as a standard for our customers’ cloud adoption use-cases partly because of the wide range of cloud targets it supports. Striim provides integration with databases, data warehouses, storage, messaging systems and other technologies across all three major cloud environments.

A major enhancement is the introduction of support for the Google BigQuery Streaming API. This not only enables real-time analytics on large scale data in BigQuery by ensuring that data is available within seconds of its creation, but it also helps with quota issues that can be faced by high volume customers. The integration through the BigQuery streaming API can support data transfer up to 1GB per second.

In addition to this, Striim 3.10.1 also has the following enhancements:

  • Optimized delivery to Snowflake and Azure Synapse that facilitates compacting multiple operations on the same data to a single operation on the target resulting in much lower change volume
  • Delivery to MongoDB cloud and MongoDB API for Azure Cosmos DB
  • Delivery to Apache Cassandra, DataStax Cassandra, and Cassandra API for Azure Cosmos DB

  • Support for delivery of data in Parquet format to Cloud Storage and Cloud Data Lakes to further support cloud analytics environments

Schema Conversion to Simplify Cloud Adoption Workflows

As part of many cloud migration or cloud integration use-cases, especially during the initial phases, developers often need to create target schemas to match those of source data. Striim adds the capability to use source schema information from popular databases such as Oracle, SQL Server, and PostgreSQL and create appropriate target schema in cloud targets such as Google BigQuery, Snowflake and others. Importantly, these conversions understand data type and structure differences between heterogeneous sources and targets and act intelligently to spot problems and inconsistencies before progressing to data movement, simplifying cloud adoption.

Enhanced Monitoring, Alerting and Diagnostics

On-going data movement between on-premise and cloud environments for migrations, or powering reporting and analytics solutions, are often part of an enterprise’s critical applications. As such they demand deep insights into the status of all active data flows.

Striim 3.10.1 adds the capability to inherently monitor data from its creation in the source to successful delivery in a target, generate detailed lag reports, and alert on situations where lag is outside of SLAs.

End to End Lag Visualization

In addition, this release provides detailed status on checkpointing information for recovery and high availability scenarios, with insight into checkpointing history and currency.

Real-time Checkpointing Information

Simplifies Working with Complex Data

As customers work with heterogeneous environments and adopt more complex integration scenarios, they often have to work with complex data types, or perform necessary data conversions. While always possible through user defined functions, this release adds multiple commonly requested data manipulation functions out of the box. This simplifies working with JSON data and document structures, while also facilitating data cleansing, and regular expression operations.

On-Going Support for Enterprise Sources

As customers upgrade their environments, or adopt new technologies, it is essential that their integration platform keeps pace. In Striim 3.10.1 we extend our support for the Oracle database to include Oracle 19c, including change data capture, add support for schema information and metadata for Oracle GoldenGate trails, and certify our support for Hive 3.1.0

These are a high level view of the new features of Striim 3.10.1. There is a lot more to discover to aid on your cloud adoption journey. If you would like to learn more about the new release, please reach out to schedule a demo with a Striim expert.

Why Choose Striim + Snowflake

Why Choose Striim + Snowflake

 

 

With greater speed, ease, and flexibility, Snowflake’s cloud data warehouse helps you gain meaningful insights by providing the performance and simplicity that traditional data warehouse offerings could not support.

Adopting a data warehouse in the cloud with Snowflake requires a modern approach to the movement of enterprise data. This data is often generated from diverse data sources deployed in various locations – including on-prem data centers, major public clouds, and devices. Snowflake users need real-time data movement capabilities to realize the full potential of data warehousing in the cloud, and benefit from more meaningful operational intelligence.

Why Choose Striim + SnowflakeWhile there are many vendors that provide data movement to Snowflake, Striim’s real-time, enterprise-grade streaming ETL solution offers advanced capabilities that other vendors can’t match, including the ability to:

  • Bring a wide range of data sets (including security log data, Kafka/messaging, IoT data, OLAP and OLTP) in a consumable format to Snowflake to achieve rich, timely, and reliable insights fast
  • Use robust, reliable, low-impact change data capture (CDC) from major enterprise database
  • Aggregate, filter, denormalize, enrich, mask real-time data in-flight using a SQL-based language before delivering to Snowflake to rapidly gain time-sensitive insights
  • Combine and correlate machine data, OLAP, OLTP and IoT data with other data sources in-flight for complete and rich operational intelligence
  • Perform online migration from existing on-prem data warehouses such as (Oracle Exadata or Teradata) to Snowflake with minimized interruption and risks
  • Offer an enterprise-grade solution designed for mission-critical, high data volume environments with built-in HA, scalability, exactly-once-processing (no data loss or duplicates) and security, all in a patented distributed platform

In fact, many current Striim + Snowflake customers previously deployed a solution from another vendor, only to find that the solution did not meet their needs for scalability, reliability, in-flight processing, or, simply, data access.

Let’s drill down on the ways Striim supports Snowflake’s advanced analytics applications with modern, enterprise-grade streaming ETL, and further allows customers to gain operational value from their Snowflake solutions.

Moving data in real time from diverse sources

Data ingestion providers that only collect data from a limited range of sources cannot support complete and rich operational intelligence in Snowflake. The Striim platform ingests real-time streaming data from a variety of sources out-of-the box, including data warehouses databases (including Oracle, SQL Server, HPE NonStop, MongoDB, Amazon RDS, and MySQL), log files from security devices and other systems, sensors, messaging systems, and Hadoop solutions.

Avoiding batch ETL-related inefficiencies

Data integration providers that use traditional batch ETL for the movement of data into Snowflake are unable to support real-time operational intelligence and time-sensitive analytics use cases in Snowflake. When users adopt an ELT architecture by using replication solutions with another tool for in-target transformations, this architecture creates complexities, especially during process recovery. Striim offers an end-to-end solution with a simplified solution architecture to bring a wide range of data in real time.

Minimizing source impact and interruption

While there are providers that offer very simplistic movement of file data, and do not support CDC from databases, these solutions cannot scale and may require the use of additional products to be integrated into the configuration. Using Striim, businesses can easily adopt cloud data warehouse with online migration from existing data warehouses with minimal disruption and risk.

Assuring security and reliability

Striim is an enterprise-grade solution with built-in HA, security, scalability, exactly once processing for no data loss or duplication for business-critical production systems.

Applying transformations and enrichments

Striim applies filtering, transformations, aggregations, masking, and enrichment using static or streaming reference datain real time – as the data is being delivered into Snowflake, in a consumable format to Snowflake – to accelerate the delivery of rich, timely, and reliable insights.

Striim + Snowflake Customer Use Cases

To illustrate the above points, we would like to share a couple of Snowflake customers that chose Striim to gain operational intelligence for critical operations.

A Leading International Cruise Line

  • Ingests real-time POS data from spa, casinos, and stores to enterprise data warehouse on Snowflake to generate near real-time offers for travelers.
  • Striim moves spending data from Oracle, SQL Server databases and GoldenGate Trail Files to Snowflake.
  • The solution provides real-time promotional offers and discounts to customers based on their spending behavior to improve both satisfaction and revenue.

A European HR Solutions Provider

  • Ingests data from Oracle and SQL Server database using real-time CDC from to Snowflake.
  • Chose Striim for low data latency with built-in scalability, security, and reliability.
  • Zero maintenance solution and pay-per-use models also were key considerations.

By streaming enterprise data to Snowflake with built-in scalability, security, and reliability, Striim simplifies the adoption of a modern, cloud data warehouse for time-sensitive, operational decision making.

We invite you to learn more about Striim’s Snowflake offering by visiting our Snowflake solutions page. Feel free to contact us if we can help you reliably move data in real time to Snowflake.

Striim Announces Strategic Partnership with Snowflake to Drive Cloud-Based Data-Driven Analytics

We are excited to announce that we’ve entered into a strategic partnership with Snowflake, the data warehouse built for the cloud, in which Striim will be used to move real-time data into Snowflake. Through this strategic partnership, Snowflake users will be empowered to gain fast insights from their cloud-based analytics.

Enterprise companies are quickly adopting Snowflake because its architecture is built from the ground up for the cloud. Snowflake offers speed, scalability, and cost-effectiveness, along with zero management. In order to attain fast analytics, you need access to real-time data, and that’s where Striim comes in. Striim is leveraging its vast real-time data integration capabilities to enable Snowflake users to collect and move data from a variety of sources into their environment to accelerate their data-driven analytics.

Striim uses low-impact change data capture (CDC) to move data from existing on-prem databases, including SQL Server, Oracle, MongoDB, HPE NonStop, PostgreSQL, MySQL and Amazon RDS. Striim can also help you migrate data warehouses such as Teradata, Netezza, Amazon Redshift, and Oracle Exadata. Additionally, Striim can collect from messaging systems, Hadoop, log files, sensors, and security devices and other systems. Striim also has analytical capabilities to monitor and measure transaction lag and alert when SLAs are not met.

Through CDC, Striim can handle large volumes of enterprise data securely and reliably. Along with its CDC capabilities, Striim adds further value through in-flight processing, transformations, and denormalization to further assist Snowflake users in providing quicker analysis by continuously delivering data to Snowflake in the right format, and with added context.

Striim has a number of use cases with customers using the solution for both online migrations and continuous integration to Snowflake.

For example, Attentia, a Belgian company that offers HR and Well-Being services, is a joint customer that was searching for a low-latency streaming integration solution that was scalable and also offered a secure data warehouse with analytical options. Attentia’s goal was to enable employees to instantly query their personal information, as well as allow employers to identify trends and patterns from the data.

With Striim + Snowflake, Attentia has been delivering real-time data and analytics using CDC from Oracle to Azure for streamlined operations. The partnership between Striim and Snowflake has dramatically enhanced Attentia’s business operations, enabling them to make faster, smarter decisions based on their real-time data.

To learn more about the Striim-Snowflake solution and Striim’s partnership with Snowflake, please read our press release, visit our Striim for Snowflake product page, or set up a quick demo with a Striim technologist.

CDC to Snowflake

CDC to Snowflake

 

Let’s take a moment to discuss why Change Data Capture or CDC to Snowflake is quickly becoming the preferred method of loading real-time data from transactional databases to Snowflake, without impacting source systems.

Snowflake is changing expectations for speed and flexibility of a data warehouse. Snowflake provides a cloud-based data warehouse that enables organizations to store and analyze data using public cloud-based hardware and software on AWS and Microsoft Azure.CDC to Snowflake

However these benefits of speed and flexibility can be quickly throttled by legacy approaches to moving data into Snowflake. For most companies, their most valuable data – transactional and operational data – is stored on-prem in traditional relational databases or legacy data warehouses. While old-school migrations or batch ETL uploads achieve the objective of moving the data to a target such as Snowflake, these out-of-date, high-latency approaches cannot support the continuous data pipelines and real-time operational decision-making that Snowflake is built for.

Enter CDC to Snowflake, made possible by Striim. The Striim platform enables Snowflake users to quickly and easily leverage low-impact, real-time change data capture, or CDC to Snowflake, moving and processing only the changed data from their existing databases. Moving change data continuously, as new database transactions or events occur, makes it possible for Snowflake users to maintain the real-time data pipelines necessary to feed Snowflake’s fast and flexible storage and analytics solutions.

For the initial load of data to Snowflake, Striim enables zero-downtime, zero-data-loss migration from databases and data warehouses to Snowflake. As an enterprise-grade solution, Striim also features built-in, real-time monitoring to validate that the database transactions have loaded successfully to Snowflake, minimizing risk by ensuring data consistency.

Striim can not only load data, and continuously feed data, to Snowflake. Striim is unique in its ability to provide in-flight processing such as filtering, transformations into the desired schema, and data masking. In-memory stream processing minimizes ETL workloads, improves performance, reduces complexity and facilitates compliance.

Striim offers low-impact, log-based CDC to Snowflake from the following data sources: Oracle Microsoft SQL Server, MySQL, PostgreSQL, MongoDB, HPE NonStop SQL/MX, HPE NonStop SQL/MP, HPE NonStop Enscribe, and MariaDB. New sources are being added on a regular basis. All of these sources can be accessed via Striim’s easy-to-use CDC Wizards and drag-and-drop UI, speeding delivery of CDC to Snowflake solutions.

For more information on Striim’s CDC to Snowflake offering, please visit our Snowflake solutions page at: striim2020.local.com/partners/real-time-data-to-snowflake/

If you’d like a brief demo of CDC to Snowflake, please schedule a demo.

Setting Up Streaming ETL to Snowflake

Snowflake, the data warehouse built for the cloud, is designed to bring power and simplicity to your cloud-based analytics solutions, especially when combined with a streaming ETL to Snowflake running in the cloud.Streaming ETL to Snowflake

Snowflake helps you make better and faster business decisions using your data on a massive scale, fueling data-driven organizations. Just take a look at Snowflake’s example use cases and you can see how companies are creating value from their data with Snowflake. There’s just one key caveat – how do you get your data into Snowflake in the first place?

Approaches – ETL/CDC/ELT
There are plenty of options when it comes to using data integration technologies, including ETL to Snowflake.

Let’s start with traditional ETL. Now a 50+ year old legacy technology, ETL was the genesis of data movement and enabled batch, disk-based transformations. While ETL is still used for advanced transformation capabilities, the high latencies and immense load on your source databases leave something to be desired.

Next, there was Change Data Capture (CDC). Pioneered by the founders of Striim at their previous company, GoldenGate Software (acquired by Oracle), CDC technology enabled use cases such as zero downtime database migration and heterogeneous data replication. However, CDC lacks transformational capabilities, forcing you into an ELT approach – first landing the data into a staging area such as storage, and then transforming to its final form. While this works, the multiple hops increase your end-to-end latency and architectural complexity.

Continuously Integrating Transactional Data into Snowflake

Enter, Striim. Striim is an evolution from GoldenGate and combines the real-time nature of CDC with many of the transformational capabilities of ETL into a next-generation streaming solution for ETL to Snowflake and other analytics platforms, on-premises or in the cloud. Enabling real-time data movement into Snowflake, Striim continuously ingests data from on-premises systems and other cloud environments to Snowflake. In this quick start guide, we will walk you through, step-by-step, how to use Striim for streaming ETL to Snowflake by loading data in real time, whether you run Snowflake on Azure or AWS.

Data Flow
We’ll get started with an on-premises Oracle to Snowflake application with in-line transformations and denormalization. This guide assumes you already have Striim installed either on-premises or in the cloud, along with your Oracle database and Snowflake account configured.

After installing Striim, there are a variety of ways to create applications, or data pipelines, from a source to a target. Here, I’ll focus on using our pre-built wizards and drag-and-drop UI, but you can also build applications with the drag-and-drop UI from scratch, or using a declarative language using the CLI.

We will show how you can set up the flow between the source and target, and then how you can enrich records using an in-memory cache that’s preloaded with reference data.

  1. In the Add App page, select Start with Template.

Striim for ETL to Snowflake

2. In the following App Wizard screen, search for Snowflake.

Striim for ETL to Snowflake

3. For this example, we’ll choose Oracle CDC to Snowflake.

Striim for ETL to Snowflake

4. Name the application whatever you’d like – we’ll choose oracleToSnowflake. Go ahead and use the default admin Namespace. Namespaces are used for both application organization and enable a microservices approach when you have multiple data pipelines. Click Save.

Striim for ETL to Snowflake

5. Follow the wizards, entering first your on-premises Oracle configuration properties, and then your Snowflake connection properties. In this case I’m migrating an Oracle orders table. Click Save, and you’ll be greeted by our drag and drop UI with the source and target pre-populated. If you want to just do a straight source-to-target migration, that’s it! However, we’ll continue this example with enrichment and denormalization, editing our application using the connectors located on the left-hand side menu bar.

Striim for ETL to Snowflake

6. In this use case, we’ll enrich the Orders table with another table of the same on-premises Oracle database. Locate the Enrichment tab on the left-hand menu bar, and drag and drop the DB Cache to your canvas.

Striim for ETL to Snowflake

7. First, name the cache whatever you’d like – I chose salesRepCache. Then, specify the Type of your cache. In this case, my enrichment table contains three fields: ID, Name, and Email. Specify a Key to map. This tells Striim’s in-memory cache how to position the data for the fastest possible joins. Finally, specify your Oracle Username, the JDBC Connection URL, your password, and the tables that you want to use as a cache. Click Save.

Striim for ETL to Snowflake

8. Now we’ll go ahead and join our streaming CDC source with the static Database Cache. Click the circular stream beneath your Oracle source, and click Connect next CQ component.

Striim for ETL to Snowflake

9. Application logic in Striim is expressed using Continuous Queries, or CQs. You do so using standard SQL syntax and optional Java functionality for custom scenarios. Unlike a query on a database where you run one query and receive one result, a CQ is constantly running, executing the query on an event-by-event basis as the data flows through Striim. Data can be easily pre-formatted or denormalized using CQs.

10. In this example, we are doing a few simple transformations of the fields of the streaming Oracle CDC source, as well as enriching the source with the database cache – adding in the SALES_REP_NAME and SALES_REP_EMAIL fields where the SALES_REP_ID of the streaming CDC source equals the SALES_REP_ID of the static database cache. Specify the name of the stream you want to output the result to, and click Save. Your logic here may vary depending on your use case.

Striim for ETL to Snowflake

11. Lastly, we have to configure our SnowflakeTarget to read from the enrichedStream, not the original CDC source. Click on your Snowflake target and change the Input Stream from the Oracle source stream to your enriched stream. Click Save.

Striim for ETL to Snowflake

12. Now you’re good to go! In the top menu bar, click on Created and press Deploy App.

Striim for ETL to Snowflake

13. The deployment page allows you to specify where you want specific parts of your data pipeline to run. In this case I have a very simple deployment topology – I’m just running Striim on my laptop, so I’ll choose the default option.

Striim for ETL to Snowflake

14. Click the eye next to your enrichedStream to preview your data as it’s flowing through, and press Start App in the top menu bar.

Striim for ETL to Snowflake

15. Now that the apps running, let’s generate some data. In this case I just have a sample data generator that is connecting to my source Oracle on-premises database.

Striim for ETL to Snowflake

16. Data is flowing through the Striim platform, and you can see the enriched Sales Rep Name and Emails. 

Striim for ETL to Snowflake

17. Lastly, let’s go to our Snowflake warehouse and just do a simple select * query. Data is now being continuously written to Snowflake.

Striim for ETL to Snowflake

That’s it! Without any coding, you now have set up streaming ETL to Snowflake to load data continuously, in real time.

Interested in learning more about streaming ETL to Snowflake? Check out our Striim for Snowflake solution page, schedule a demo with a Striim technologist, or download the Striim platform to get started!

Continuously Move Data to Snowflake

Enterprises must continuously move data to Snowflake to take full advantage of this data warehouse built for the cloud.

You chose Snowflake to provide rapid insights into your data on a massive scale, on AWS or Azure. However, most of your source data resides elsewhere – in a wide variety of on-premise or cloud sources. How do you continually move data to Snowflake in real-time, processing it along the way, so that your fast analytics and insights are reporting on timely data?

Snowflake was built for the cloud, and built for speed. By separating compute from storage you can easily scale up and down as needed. This gives you instant elasticity supporting any amount of data, and high speed queries for any number of users, coupled with the peace of mind provided by secure data sharing. The per-second pricing and support for multiple clouds allows you to choose your infrastructure and only pay when you are using the data warehouse.

However, residing in cloud means you have to determine how to most effectively move data to Snowflake. This could be migrating an existing Teradata or Exadata Data Warehouse, or continually populating Snowflake with newly generated on-premises data from operational databases, logs, or device information. In order for the warehouse to provide up-to-date information, there should be as little latency as possible between the original data creation and its delivery to Snowflake.

The Striim platform can help with all these requirements and more. Our database adapters support change data capture, or CDC, from enterprise or cloud databases. CDC directly intercepts database activity and collects all the inserts, updates, and deletes as they happen, ready to stream into Snowflake. Adapters for machine logs and other files read at the end of multiple files in parallel to stream out data as it is written, removing the inherent latency of batch. While data from devices and messaging systems can be collected easily, independent of their format, through a variety of high-speed adapters and parsers.

After being collected continuously, the streaming data can be delivered directly into Snowflake with very low latency, or pushed through a data pipeline where it can be pre-processed through filtering, transformation, enrichment, and correlation using SQL-based queries, before delivery into Snowflake. This enables such things as data denormalization, change detection, de-duplication, and quality checking before the data is ever stored.

In addition to this, because Striim is an enterprise grade platform, it can scale with Snowflake and reliably guarantee delivery of source data while also providing built-in dashboards and verification of data pipelines for operational monitoring purposes.

The Striim wizard-based UI enables users to rapidly create a new data flow to move data to Snowflake. In this example, real-time change data from Oracle is being continually delivered to Snowflake. The wizard walks you through all the configuration steps, checking that everything is set up properly, and results in a data flow application. This data flow can be enhanced to filter, transform and enrich the data through SQL-based queries. In the video, we add a name and email address from a cache, based on an ID present in the original data.

When the application is started, data flows in real-time from Oracle to Snowflake. Making changes in Oracle results in the transformed data being written continually to Snowflake, visible through the Snowflake UI.

Striim and Snowflake can change the way you do analytics, with Snowflake providing rapid insight to the real-time data provided by Striim. The data warehouse that is built for the cloud needs data delivered to the cloud, and Striim can continuously move data to Snowflake to support your business operations and decision-making.

To learn more about how Striim makes it easy to continuously move data to Snowflake, visit our Striim for Snowflake product page, schedule a demo with a Striim technologist, or download the platform and try it for yourself.