Oracle Change Data Capture (CDC)

Oracle Change Data Capture Tutorial – An Event-Driven Architecture for Cloud Adoption

All businesses rely on data. Historically, this data resided in monolithic databases, and batch ETL processes were used to move it to warehouses and other data stores for reporting and analytics. As businesses modernize, looking to the cloud for analytics and striving for real-time data insights, they often find that these databases are difficult to replace completely, yet the data and transactions happening within them are essential for analytics. With over 80% of businesses noting that the volume and velocity of their data is rapidly increasing, scalable cloud adoption and change data capture from databases like Oracle, SQL Server, MySQL, and others is more critical than ever before. Oracle change data capture in particular is an area where companies are seeing an influx of modern data integration use cases.

To resolve this, more and more companies are moving to event-driven architectures, because their dynamic, distributed scalability makes sharing large volumes of data across systems possible.

In this post we will look at an example that replaces batch ETL with event-driven, distributed stream processing: Oracle change data capture events are extracted as they are created; enriched with in-memory, SQL-based denormalization; then delivered to the Azure Cloud to provide scalable, real-time, low-cost analytics, without affecting the source database. We will also look at using the enriched events, optionally backed by Kafka, to incrementally add other event-driven applications or services.

Continuous Data Collection, Processing, Delivery, and Analytics with the Striim Platform

Event-Driven Architecture Patterns

Most business data is produced as a sequence of events, or an event stream: web or mobile app interactions, devices, sensors, and bank transactions, for example, all continuously generate events. Even the current state of a database is the outcome of a sequence of events. Treating state as the result of a sequence of events forms the core of several event-driven patterns.

Event Sourcing is an architectural pattern in which the state of the application is determined by a sequence of events. As an example, imagine that each “event” is an incremental update to an entry in a database. In this case, the state of a particular entry is simply the accumulation of events pertaining to that entry. In the example below the stream contains the queue of all deposit and withdrawal events, and the database table persists the current account balances.
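
To make this concrete, here is a plain SQL sketch of how the current balances could be reconstructed if the deposit and withdrawal events were persisted to a table; the table and column names are illustrative and not part of the example above:

SELECT account_id,
       SUM(CASE WHEN event_type = 'DEPOSIT' THEN amount ELSE -amount END) AS balance
FROM account_events
GROUP BY account_id;

Running this aggregation over the full event history yields exactly the account balances held in the database table.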

Imagine Each Event as a Change to an Entry in a Database

The events in the stream can be used to reconstruct the current account balances in the database, but not the other way around. Databases can be replicated with a technology called Change Data Capture (CDC), which monitors the change log of a source database, collects the changes as soon as they occur, turns them into a stream of events, and then applies those changes to a target database. Source code version control is another well-known example of this, where the current state of a file is some base version plus the accumulation of all changes that have been made to it.

The Change Log can be used to Replicate a Database

What if you need the same set of data in different databases, for different types of use? With a stream, the same message can be processed by different consumers for different purposes. As shown below, the stream can act as a distribution point where, following the polyglot persistence pattern, events can be delivered to a variety of data stores, each using the technology best suited to a particular use case or materialized view.

Streaming Events Delivered to a Variety of Data Stores

Event-Driven Streaming ETL Use Case Example

Below is a diagram of the Event-Driven Streaming ETL use case example:

Event-Driven Streaming ETL Use Case Diagram
  1. Striim’s low-impact, real-time Oracle change data capture (CDC) feature is used to stream database changes (inserts, updates, and deletes) from an operational Oracle database into Striim
  2. CDC events are enriched and denormalized with streaming SQL and cached data, in order to make relevant data available together
  3. Enriched, denormalized events are streamed to Cosmos DB for real-time analytics
  4. Enriched streaming events can be monitored in real time with the Striim web UI, and are available for further streaming SQL analysis, wizard-based dashboards, and other applications on-premises or in the cloud.

Replacing Batch Extract with Real Time Streaming of CDC Order Events

Striim’s easy-to-use CDC wizards automate the creation of applications that leverage change data capture to stream events, as they are created, from various source systems to various targets. In this example, shown below, we use Striim’s OracleReader (Oracle change data capture) to read the Order OLTP transactions in the Oracle redo logs and stream the insert, update, and delete operations into Striim as soon as the transactions commit, without impacting the performance of the source database.
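
For readers who prefer Striim’s scripting language (TQL) to the wizards, a minimal sketch of such a source is shown below. The schema, credentials, connection URL, and stream name are illustrative, and the exact OracleReader property list can vary by Striim version and Oracle configuration, so treat this as an outline rather than a copy-paste configuration:

CREATE SOURCE OrdersCDC USING OracleReader (
  Username: 'striim',
  Password: '******',
  ConnectionURL: 'dbhost:1521:ORCL',
  Tables: 'STORE.ORDERS'
)
OUTPUT TO OrdersStream;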

Configuring Database Properties for the Oracle CDC Data Source

Utilizing Caches For Enrichment

Relational databases typically have a normalized schema, which makes storage efficient but requires joins for queries and does not scale well horizontally. NoSQL databases typically have a denormalized schema, which scales across a cluster because data that is read together is stored together.

Normalized Schema with Joins for Queries Does Not Scale Horizontally

With a normalized schema, many of the data fields are in the form of IDs. This is very efficient for the database, but IDs carry little meaning or context on their own, which makes them not very useful for downstream queries or analytics. In this example we want to enrich the raw Orders data with reference data from the SalesRep table, correlated by the order’s Sales_Rep_ID, to produce a denormalized record that includes the sales rep name and email, making analysis easier by having this data available together.

Since the Striim platform is a high-speed, low latency, SQL-based stream processing platform, reference data also needs to be loaded into memory so that it can be joined with the streaming data without slowing things down. This is achieved through the use of the Cache component. Within the Striim platform, caches are backed by a distributed in-memory data grid that can contain millions of reference items distributed around a Striim cluster. Caches can be loaded from database queries, Hadoop, or files, and maintain data in-memory so that joining with them can be very fast. In this example, shown below, the cache is loaded with a query on the SalesRep table using the Striim DatabaseReader.
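
A TQL sketch of such a cache is shown below. The SalesRep column names and the type definition are assumptions based on this example, and the DatabaseReader properties follow the same pattern as the source above, so the details may differ in a real deployment:

CREATE TYPE SalesRepType (
  SALES_REP_ID Integer,
  NAME String,
  EMAIL String
);

CREATE CACHE SalesRepCache USING DatabaseReader (
  Username: 'striim',
  Password: '******',
  ConnectionURL: 'jdbc:oracle:thin:@dbhost:1521:ORCL',
  Query: 'SELECT SALES_REP_ID, NAME, EMAIL FROM SALESREP'
)
QUERY (keytomap: 'SALES_REP_ID')
OF SalesRepType;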

Configuring Database Properties for the Sales Rep Cache

Joining Streaming and Cache Data for Real-Time Transformation and Enrichment with SQL

We can process and enrich data-in-motion using continuous queries written in Striim’s SQL-based stream processing language. Using a SQL-based language is intuitive for data processing tasks, and most common SQL constructs can be utilized in a streaming environment. The main differences between using SQL for stream processing, and its more traditional use as a database query language, are that all processing is in-memory, and data is processed continuously, such that every event on an input data stream to a query can result in an output.

Dataflow Showing Joining and Enrichment of CDC data with Cache

This is the query we will use to process and enrich the incoming data stream:

Full Transformation and Enrichment Query Joining the CDC Stream with Cache Data

In this query we select the Order stream and SalesRep cache fields that we want, apply transformations to convert data types, put the Order stream and SalesRep cache in the FROM clause, and include a join on SALES_REP_ID as part of the WHERE clause. The result of this query is to continuously output enriched (denormalized) events, shown below, for every CDC event that occurs for the Orders table. So with this approach we can join streams from an Oracle Change Data Capture reader with cached data for enrichment.
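
As a rough TQL equivalent of the query in the screenshot, the continuous query might look like the sketch below. It assumes the CDC output has already been mapped to a typed stream with named fields (raw OracleReader output is a generic change record), reuses the illustrative names from the earlier sketches, and omits the data type conversions mentioned above:

CREATE CQ EnrichCQ
INSERT INTO EnrichedOrders
SELECT o.ORDER_ID,
       o.ORDER_DATE,
       o.ORDER_TOTAL,
       o.SALES_REP_ID,
       s.NAME AS SALES_REP_NAME,
       s.EMAIL AS SALES_REP_EMAIL
FROM OrdersStream o, SalesRepCache s
WHERE o.SALES_REP_ID = s.SALES_REP_ID;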

Events After Transformation and Enrichment

Loading the Enriched Data to the Cloud for Real Time Analytics

Now the Oracle CDC (Oracle change data capture) data, streamed and enriched through Striim, can be stored simultaneously in Azure Cloud blob storage and Azure Cosmos DB, for elastic storage with advanced big data analytics, using the Striim AzureBlobWriter and the CosmosDBWriter shown below.

The image below shows the Striim flow web UI for our streaming ETL application. Flows define what data an application receives, how it processes the data, and what it does with the results.

End-to-End Data Flow

Using Kafka for Streaming Replay and Application Decoupling

The enriched stream of order events can be backed by, or published to, Kafka for stream persistence, laying the foundation for streaming replay and application decoupling. Striim’s native integration with Apache Kafka makes it quick and easy to leverage Kafka to make every data source re-playable, enabling recovery even for streaming sources that cannot be rewound. This also decouples applications, enabling multiple applications to be powered by the same data source, and new applications, caches, or views to be added later.
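
As a sketch, publishing the enriched stream to a Kafka topic uses Striim’s KafkaWriter. The broker address, topic, and version shown here are illustrative, and the property names can differ slightly across releases, so check the KafkaWriter documentation before using this as-is:

CREATE TARGET EnrichedOrdersToKafka USING KafkaWriter VERSION '2.1.0' (
  brokerAddress: 'kafkahost:9092',
  Topic: 'enriched_orders'
)
FORMAT USING JSONFormatter ()
INPUT FROM EnrichedOrders;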

Streaming SQL for Aggregates

We can further use Striim’s streaming SQL on the denormalized data to make a real-time stream of summary metrics about the events being processed available to Striim’s real-time dashboards and other applications. For example, to create a running count and sum of orders per sales rep over the last hour, from the stream of enriched orders, you would use a window and the familiar GROUP BY clause.

CREATE WINDOW OrderWindow
OVER EnrichCQ
KEEP WITHIN 1 HOUR
PARTITION BY sales_rep_id;

SELECT sales_rep_id, sales_rep_Name,
       COUNT(*) AS orderCount,
       SUM(order_total) AS totalAmount
FROM OrderWindow
GROUP BY sales_rep_id;
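
In a deployed application this SELECT would typically run inside a continuous query that writes its results to an output stream that dashboards and other consumers subscribe to. A hedged sketch, with illustrative query and stream names:

CREATE CQ SalesRepRollup
INSERT INTO SalesRepTotals
SELECT sales_rep_id, sales_rep_Name,
       COUNT(*) AS orderCount,
       SUM(order_total) AS totalAmount
FROM OrderWindow
GROUP BY sales_rep_id;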

Monitoring

With the Striim Monitoring web UI we can now monitor our data pipeline with real-time information for the cluster, application components, servers, and agents. The main Monitor page lets you visualize summary statistics for Events Processed, App CPU%, Server Memory, or Server CPU%. Below, the Monitor App page displays our app resources, performance, and components.

Striim Monitoring Web UI Monitor App Page

Clicking an app component’s ‘more details’ button displays more detailed performance information, such as CPU and event rate, as shown below:

Striim Monitoring Web UI Monitor App Component Details Page

Summary

In this blog post, we discussed how we can use Striim to:

  1. Perform Oracle Change Data Capture to stream database changes in real time
  2. Use streaming SQL and caches to easily denormalize data in order to make relevant data available together
  3. Load streaming enriched data to the cloud for real-time analytics
  4. Use Kafka for persistent streams
  5. Create rolling aggregates with streaming SQL
  6. Continuously monitor data pipelines

Additional Resources:

To read more about real-time data ingestion, please visit our Real-Time Data Integration solutions page.

To learn more about the power of streaming SQL, visit the Striim Platform Overview product page, schedule a demo with a Striim technologist, or download a free trial of the platform and try it for yourself!

To learn more about Striim’s capabilities to support the data integration requirements of an Azure hybrid cloud architecture, check out all of Striim’s solutions for Azure.

Striim 3.10.1 Further Speeds Cloud Adoption

We are pleased to announce the general availability of Striim 3.10.1, which includes support for new and enhanced cloud targets, extends manageability and diagnostics capabilities, and introduces new ease-of-use features to speed our customers’ cloud adoption. Key features released in Striim 3.10.1 are directly available through Snowflake Partner Connect, enabling rapid movement of enterprise data into Snowflake.

Striim 3.10.1 Focus Areas Including Cloud Adoption

This new release introduces many new features and capabilities, summarized here:

3.10.1 Features Summary


Let’s review the key themes and features of this new release, starting with the new and expanded cloud targets.

Striim on Snowflake Partner Connect

From Snowflake Partner Connect, customers can launch a trial Striim Cloud instance directly as part of the Snowflake on-boarding process from the Snowflake UI and load data, optionally with change data capture, directly into Snowflake from any of our supported sources. You can read about this in a separate blog.

Expanded Support for Cloud Targets to Further Enhance Cloud Adoption

The Striim platform has been chosen as a standard for our customers’ cloud adoption use-cases partly because of the wide range of cloud targets it supports. Striim provides integration with databases, data warehouses, storage, messaging systems and other technologies across all three major cloud environments.

A major enhancement is the introduction of support for the Google BigQuery Streaming API. This not only enables real-time analytics on large-scale data in BigQuery by ensuring that data is available within seconds of its creation, but also helps with quota issues that high-volume customers can face. The integration through the BigQuery Streaming API can support data transfer rates of up to 1 GB per second.

In addition to this, Striim 3.10.1 also has the following enhancements:

  • Optimized delivery to Snowflake and Azure Synapse that facilitates compacting multiple operations on the same data to a single operation on the target, resulting in much lower change volume
  • Delivery to MongoDB cloud and the MongoDB API for Azure Cosmos DB
  • Delivery to Apache Cassandra, DataStax Cassandra, and the Cassandra API for Azure Cosmos DB
  • Support for delivery of data in Parquet format to cloud storage and cloud data lakes to further support cloud analytics environments

Schema Conversion to Simplify Cloud Adoption Workflows

As part of many cloud migration or cloud integration use cases, especially during the initial phases, developers often need to create target schemas to match those of the source data. Striim adds the capability to use source schema information from popular databases such as Oracle, SQL Server, and PostgreSQL to create appropriate target schemas in cloud targets such as Google BigQuery, Snowflake, and others. Importantly, these conversions understand data type and structure differences between heterogeneous sources and targets, and act intelligently to spot problems and inconsistencies before progressing to data movement, simplifying cloud adoption.

Enhanced Monitoring, Alerting and Diagnostics

Ongoing data movement between on-premises and cloud environments, whether for migrations or for powering reporting and analytics solutions, is often part of an enterprise’s critical applications. As such, it demands deep insight into the status of all active data flows.

Striim 3.10.1 adds the capability to inherently monitor data from its creation in the source to successful delivery in a target, generate detailed lag reports, and alert on situations where lag is outside of SLAs.

End to End Lag Visualization

In addition, this release provides detailed status on checkpointing information for recovery and high availability scenarios, with insight into checkpointing history and currency.

Real-time Checkpointing Information

Simplified Working with Complex Data

As customers work with heterogeneous environments and adopt more complex integration scenarios, they often have to work with complex data types or perform necessary data conversions. While this has always been possible through user-defined functions, this release adds multiple commonly requested data manipulation functions out of the box. This simplifies working with JSON data and document structures, while also facilitating data cleansing and regular expression operations.

On-Going Support for Enterprise Sources

As customers upgrade their environments or adopt new technologies, it is essential that their integration platform keeps pace. In Striim 3.10.1 we extend our support for the Oracle database to include Oracle 19c, including change data capture, add support for schema information and metadata in Oracle GoldenGate trails, and certify our support for Hive 3.1.0.

This is a high-level view of the new features in Striim 3.10.1; there is a lot more to discover to aid you on your cloud adoption journey. If you would like to learn more about the new release, please reach out to schedule a demo with a Striim expert.

Implementing Gartner’s Cloud Smart FEVER selection process using Striim

In their recent research note, “Move From Cloud First to Cloud Smart to Improve Cloud Journey Success” (February 2020), Gartner introduced the concept of using the FEVER selection process to prioritize workloads to move to cloud.

According to the research note, to ensure rapid results by building on the knowledge of earlier experiences with cloud, IT leaders “should prioritize the workloads to move to cloud by using a ‘full circle’ continuous loop selection process: faster, easier, valuable, efficient and repeat (FEVER; see Figure 2). This allows them to deliver results in waves of migrations according to the organization’s delivery capacities.”

While thinking about this concept I realized that following this approach is one of the reasons that Striim’s customers are so successful with their cloud migration and integration initiatives.  They are utilizing a cloud smart approach for real-world use-cases, including online database migrations enabled by change data capture, offloading reporting to cloud environments, and continuous data delivery for cloud analytics.

Faster

The speed of solutions is critical to many of our customers that have strict SLAs, and limited timeframes in which they want to complete their projects. Striim allows customers to build and test data flows supporting cloud adoption very quickly, while Striim’s optimized architecture enables rapid transfer of data from data sources to cloud for both initial load, and on-going real-time data delivery.

Easier

Customers don’t want to spend days or weeks learning a new solution. In order to implement quickly, the solution must be easy to learn and work with. Striim’s wizard-based approach and intuitive UI enables our customers to rapidly build out their data pipelines, and transfer knowledge for on-going operations.

Valuable

Many of our customers are already ‘Cloud Smart’ and approach cloud initiatives in a pragmatic way. They often start with highly critical but simple migrations that give them the highest value in the shortest time. Once the “lowest-hanging fruit” has been picked and successfully implemented, they move on to more complex scenarios or integrate additional sources.

Efficient

Cost-efficiency for our customers is more than just the ongoing cost reductions inherent in moving to a cloud solution. It also includes the time spent by their valuable employees to build and maintain the solution, and the data ingress costs inherent in moving their data to the cloud. By using Striim, they can reduce the time spent to achieve success and lower their data movement costs through one-time loads followed by ongoing change delivery.

Repeat

It is seldom that our customers have a single migration, or cloud adoption to perform. Repeatability, and reusability of the cloud migration or integration is essential to their long-term plans. Not only do they want to be able to repeat similar migrations, but they also want to be able to use the same platform for all of their cloud adoption initiatives. By standardizing on Striim, our customers can take advantage of the large numbers of sources and cloud targets we support and focus on the business imperatives without having to worry whether it’s possible.


If you would like to learn more about becoming cloud smart, you can access the full report “Move From Cloud First to Cloud Smart to Improve Cloud Journey Success” (February 2020), for a limited time using this link.


Move From Cloud First to Cloud Smart to Improve Cloud Journey Success, Henrique Cecci, 25 February 2020

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and is used herein with permission. All rights reserved.

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Striim.

Getting Started with Real-Time ETL to Azure SQL Database

Running production databases in the cloud has become the new norm. For us at Striim, real-time ETL to Azure SQL Database and other popular cloud databases has become a common use case. Striim customers run critical operational workloads in cloud databases and rely on our enterprise-grade streaming data pipelines to keep their cloud databases up-to-date with existing on-premises or cloud data sources.

Striim supports your cloud journey starting with the first step. In addition to powering fully-connected hybrid and multi-cloud architectures, the streaming data integration platform enables cloud adoption by minimizing risks and downtime during data migration. When you can migrate your data to the cloud without database downtime or data loss, it is easier to modernize your mission-critical systems. And when you liberate your data trapped in legacy databases and stream to Azure SQL DB in sub-seconds, you can run high-value, operational workloads in the cloud and drive business transformation faster.

Streaming Integration from Oracle to Azure SQL DB

Building continuous, streaming data pipelines from on-premises databases to production cloud databases for critical workloads requires a secure, scalable, and reliable integration solution. Especially if you have enterprise database sources that cannot tolerate performance degradation, traditional batch ETL will not suffice. Striim’s low-impact change data capture (CDC) feature minimizes overhead on the source systems while moving database operations (inserts, updates, and deletes) to Azure SQL DB in real time with security, reliability, and transactional integrity.

Striim is available as a PaaS offering in major cloud marketplaces such as Microsoft Azure Cloud, AWS, and Google Cloud. You can run Striim in the Azure Cloud to simplify real-time ETL to Azure SQL Database and other Azure targets, such as Azure Synapse Analytics, Azure Cosmos DB, Event Hubs, ADLS, and more. The service includes heterogeneous data ingestion, enrichment, and transformation in a single solution before delivering the data to Azure services with sub-second latency. What users love about Striim is that it offers a non-intrusive, quick-to-deploy, and easy-to-iterate solution for streaming data integration into Azure.

To illustrate the ease of use of Striim and to help you get started with your cloud database integration project, we have prepared a Tech Guide: Getting Started with Real-Time Data Integration to Microsoft Azure SQL Database. You will find step-by-step instructions on how to move data from an on-premises Oracle Database to Azure SQL Database using Striim’s PaaS offering available in the Azure Marketplace. In this tutorial you will see how Striim’s log-based CDC enables a solution that doesn’t impact your source Oracle Database’s performance.

If you have, or plan to have, Azure SQL Databases that run operational workloads, I highly recommend that you use a free trial of Striim along with this tutorial to find out how fast you can set up enterprise-grade, real-time ETL to Azure SQL Database. On our website you can find additional tutorials for different cloud databases. So be sure to check out our other resources as well. For any streaming integration questions, please feel free to reach out.


Cloud Adoption: How Streaming Integration Minimizes Risks

Last week, we hosted a live webinar, Cloud Adoption: How Streaming Integration Minimizes Risks. In just 35 minutes, we discussed how to eliminate database downtime and minimize other risks of cloud migration and ongoing integration for hybrid cloud architecture, including a live demo of Striim’s solution.

Our first speaker, Steve Wilkes, started the presentation discussing the importance of cloud adoption for today’s pandemic-impacted, fragile business environment. He continued with the common risks of cloud data migration and how streaming data integration with low-impact change data capture minimizes both downtime and risks. Our second presenter, Edward Bell, gave us a live demonstration of Striim for zero downtime data migration. In this blog post, you can find my short recap of the key areas of the presentation. This summary certainly cannot do justice to the comprehensive discussion we had at the webinar. That’s why I highly recommend you watch the full webinar on-demand to access details on the solution architecture, its comparison to batch ETL approach, customer examples, the live demonstration, and the interactive Q&A section.

Cloud adoption brings multiple challenges and risks that prevent many businesses from modernizing their business-critical systems.

Limited cloud adoption and modernization reduces the ability to optimize business operations. These challenges and risks include downtime and business disruption, and the loss of data during migration, which are simply not acceptable for critical business systems. The risk list, however, is longer than these two. Switching over to the cloud without adequate testing, which leads to failures, working with stale data in the cloud, and data security and privacy are also among the key concerns.

Steve emphasized the point that “rushing the testing of the new environment to reduce the downtime, if you cannot continually feed data, can also lead to failures down the line or problems with the application.” Later, he added that “Beyond the migration, how do you continually feed the system? Especially in integration use cases where you are maintaining the data where it was and also delivering somewhere else, you need to continuously refresh the data to prevent staleness.”

Each of the risks mentioned above is preventable with the right approach to data movement between the legacy and new cloud systems.


Streaming data integration plays a critical role in successful cloud adoption with minimized risks.

A reliable, secure, and scalable streaming data integration architecture with low-impact change data capture enables zero database downtime and zero data loss during data migration. Because the source system is not interrupted, you can test the new cloud system as long as you need before the switchover. You also have the option to failback to the legacy system after switchover by reversing the data flow and keeping the old system up-to-date with the cloud system until you are fully confident that it is stable.

CDC and Initial Load

Striim’s cloud data migration solution uses this modern approach. During the bulk load, Striim’s CDC component collects the source database changes in real time. As soon as the initial load is complete, Striim applies those changes to the target environment to keep the legacy and cloud databases consistent. With built-in exactly-once processing (E1P), Striim can avoid both data loss and duplicates. You can use Striim’s real-time dashboards to monitor the data flow and various detailed performance metrics.

Continuous, streaming data integration for hybrid cloud architecture liberates your data for modernization and business transformation.

Cloud adoption and streaming integration are not limited to the lifting and shifting of your systems to the cloud. Ongoing integration post-migration is a crucial part of planning your cloud adoption. You cannot restrict it to database sources and database targets in the cloud, either. Your data lives in various systems and needs to be shared with different endpoints, such as your storage, data lake, or messaging systems in the cloud environment. Without enabling comprehensive and timely data flow from your enterprise systems to the cloud, what you can achieve in the cloud will be very limited.

“It is all about liberating your data.” Steve added in this part of the presentation. “Making it useful for the purpose you need it for. Continuous delivery in the correct format from a variety of sources relies on being able to filter that data, transform it, and possibly aggregate, join and enrich before you deliver to where needed. All of these can be done in Striim with a SQL-based language.”

A key point both Edward and Steve made is that Striim is very flexible. You can source from multiple sources and send to multiple targets. True data liberation and modernizing your data infrastructure needs that flexibility.

Striim also provides deployment flexibility. In fact, this was a question in the Q&A part, asking about deployment options and pricing. Unfortunately we could not answer all the questions we received. The short answer is: Striim can be deployed in the cloud, on-premises, or both via a hybrid topology. It is priced based on the CPUs of the servers where the Striim platform is installed. So you don’t need to worry about the sizes of your source and target systems.

There is much more covered in this short webinar we hosted on cloud adoption. I invite you to watch it on-demand at your convenience. If you would like to get a customized demo for cloud adoption or other streaming data integration use cases, please feel free to reach out.

Simplify Your Azure Hybrid Cloud Architecture with Streaming Data Integration

While the typical conversation about Azure hybrid cloud architecture may be centered around scaling applications, VMs, and microservices, the bigger consideration is the data. Spinning up additional services on-demand in Azure is useless if the cloud services cannot access the data they need, when they need it.

“According to a March 2018 hybrid cloud report from 451 Research and NTT Communications, around 63% of firms have a formal strategy for hybrid infrastructure. In this case, hybrid cloud does not simply mean using a public cloud and a private cloud. It means having a seamless flow of data between all clouds, on and off-premises.” – Data Foundry

To help simplify providing a seamless flow of data to your Microsoft Azure hybrid cloud infrastructure, we’re happy to announce that the Striim platform is available in the Microsoft Azure Marketplace.

How Streaming Data Integration Simplifies Your Azure Hybrid Cloud Architecture

Enterprise-grade streaming data integration enables continuous real-time data movement and processing for hybrid cloud, connecting on-prem data sources and cloud environments, as well as bridging a wide variety of cloud services. With in-memory stream processing for hybrid cloud, companies can store only the data they need, in the format that they need. Additionally, streaming data integration enables delivery validation and data pipeline monitoring in real time.

Streaming data integration simplifies real-time streaming data pipelines for cloud environments. Through non-intrusive change data capture (CDC), organizations can collect real-time data without affecting source transactional databases. This enables cloud migration with zero database downtime and minimized risk, and feeds real-time data to targets with full context – ready for rich analytics on the cloud – by performing filtering, transformation, aggregation, and enrichment on data-in-motion.

Azure Hybrid Cloud Architecture

Key Traits of a Streaming Data Integration Solution for Your Azure Hybrid Cloud Architecture

There are three important objectives to consider when implementing a streaming data integration solution in an Azure hybrid cloud architecture:

  • Make it easy to build and maintain – The ability to use a graphical user interface (GUI) and a SQL-based language can significantly reduce the complexity of building streaming data pipelines, allowing more team members within the company to maintain the environment.
  • Make it reliable – Enterprise hybrid cloud environments require a data integration solution that is inherently reliable, with failover, recovery, and exactly-once processing guaranteed end-to-end, not just in one slice of the architecture.
  • Make it secure – Security needs to be treated holistically, with a single authentication and authorization model protecting everything from individual data streams to complete end-user dashboards. The security model should be role-based with fine-grained access, and provide encryption for sensitive resources.

Striim for Microsoft Azure

The Striim platform for Azure is an enterprise-grade data integration platform that simplifies an Azure-based hybrid cloud infrastructure. Striim provides real-time data collection and movement from a variety of sources such as enterprise databases (e.g., Oracle, HPE NonStop, SQL Server, PostgreSQL, Amazon RDS for Oracle, and Amazon RDS for MySQL via low-impact, log-based change data capture), as well as log files, sensors, messaging systems, NoSQL, and Hadoop solutions.

Once the data is collected in real time, it can be streamed to a wide variety of Azure services including Azure Cosmos DB, Azure SQL Database, Azure SQL Data Warehouse, Azure Event Hubs, Azure Data Lake Storage, and Azure Database for PostgreSQL.

While the data is streaming to Azure, Striim enables in-stream processing such as filtering, transformations, aggregations, masking, and enrichment, making the data more valuable when it lands. This is all done with sub-second latency, reliability, and security via an easy-to-use interface and SQL-based programming language.

To learn more about Striim’s capabilities to support the data integration requirements for an Azure hybrid cloud architecture, read today’s press release announcing the availability of the Striim platform in the Microsoft Azure Marketplace, and check out all of Striim’s solutions for Azure.

What’s New in Striim 3.9.5

What’s New in Striim 3.9.5: More Cloud Integrations; Greater On-Prem Extensibility; Enhanced Manageability

Striim’s development team has been busy, and launched a new release of the platform, Striim 3.9.5, last week. The goal of the release was to enhance the platform’s manageability while boosting its extensibility, both on-premises and in the cloud.

I’d like to give you a quick overview of the new features, starting with expanded cloud integration capabilities.

  • Striim 3.9.5 now offers direct writers for both Azure Data Lake Storage Gen 1 and Gen 2. This capability allows businesses to stream real-time, pre-processed data to their Azure data lake solutions from enterprise databases, log files, messaging systems such as Kafka, Hadoop, NoSQL, and sensors, deployed on-prem or in the cloud.
  • Striim’s support for Google Pub/Sub is now improved with a direct writer. Google Pub/Sub serves as a messaging service for GCP services and applications. Rapidly building real-time data pipelines into Google Pub/Sub from existing on-prem or cloud sources allows businesses to seamlessly adopt GCP for their critical business operations and achieve the maximum benefit from their cloud solutions.
  • Striim has been providing streaming data integration to Google BigQuery since 2016. With this release, Striim supports additional BigQuery functionalities such as SQL MERGE.
  • Similarly, the new release brings enhancements to Striim’s existing Azure Event Hubs Writer and Amazon Redshift Writer to simplify development and management.

In addition to cloud targets, Striim boosted its heterogeneous sources and destinations for on-premises environments too. The 3.9.5 release includes:

  • Writing to and reading from Apache Kafka version 2.1
  • Real-time data delivery to HPE NonStop SQL/MX
  • Support for compressed data when reading from GoldenGate Trail Files
  • Support for NCLOB columns in log-based change data capture from Oracle databases

Following on to the 3.9 release, Striim 3.9.5 also added a few new features to improve Striim’s ease of use and manageability:

  • Striim’s users can now organize their applications with user-defined groups and see deployment status with color-coded indicators on the UI. This feature increases productivity, especially when there are hundreds of Striim applications running or in the process of being deployed, as many of our customers do.

  • New recovery status indicators in Striim 3.9.5 allow users to track when the application is in the replay mode for recovery versus in the forward processing mode after the recovery is completed.
  • Striim’s application management API now allows resuming a crashed application.
  • Last but not least, Striim 3.9.5 offers easier and more detailed monitoring of open transactions in Oracle database sources.

For a deeper dive into the new features in Striim 3.9.5, please request a customized demo. If you would like to check out any of these features for yourself, we invite you to download a free trial.

Striim - 2019 CODiE Awards - Best iPaaS

Striim Is a 2019 CODiE Awards Finalist for Best iPaaS Solution

Striim is proud to announce that we’ve been recognized by SIIA as a 2019 CODiE Awards Finalist for Best iPaaS, or Integration Platform as a Service.

Why was Striim selected as a Best iPaaS solution? Striim is the only streaming (real-time) data integration platform running in the cloud that is built specifically to support cloud computing.

Real-time data integration is crucial for hybrid and multi-cloud architectures. Striim’s iPaaS solutions for real-time data integration in the cloud bring the agility and cost benefits of the cloud to integration use cases.

Striim enables companies to:

  • Quickly and easily provision streaming data pipelines to deliver real-time data to the cloud, or between cloud services
  • Easily adopt a multi-cloud architecture by seamlessly moving data across different cloud service providers: Azure, AWS, and Google Cloud
  • Offload operational workloads to cloud by moving data in real time and in the desired format
  • Filter, aggregate, transform, and enrich data-in-motion before delivering to the cloud in order to optimize cloud storage
  • Migrate data to the cloud without interrupting business operations
  • Minimize risk of cloud migrations with real-time, built-in cloud migration monitoring to avoid data divergence or data loss
  • Stream data in real time between cloud environments and back to on-premises systems

As one of the best iPaaS solutions, the Striim platform supports all aspects of Cloud integration as it relates to hybrid cloud and multi-cloud deployments.

Striim enables zero-downtime data migration to the cloud by performing an initial load and then delivering the changes that occurred in the legacy system during the load, without pausing the source system. To prevent data loss, it validates that all of the data from on-premises sources migrated to the cloud environment.

Striim’s iPaaS solution provides the real-time data pipelines to and from the cloud to enable operational workloads in the cloud with the availability of up-to-date data.

Striim supports multi-cloud architecture by streaming data between different cloud platforms, including Azure, Google and AWS, and other cloud technologies such as Salesforce and Snowflake. If necessary, Striim can also provide real-time data flows between services offered within each of the three cloud platforms.

About Striim for Data IPaaS

Running as a PaaS solution on Microsoft Azure, AWS and Google Cloud Platform, the Striim streaming data integration platform offers real-time data ingestion from on-premises and cloud-based databases (including Oracle, SQL Server, HPE NonStop, PostgreSQL and MySQL), data warehouses (such as Oracle Exadata and Teradata), cloud services (such as AWS RDS and Amazon S3), Salesforce, log files, messaging systems (including Kafka), sensors, and Hadoop solutions.

Striim delivers this data in real time to a wide variety of cloud services (for example, Azure SQL Data Warehouse, Cosmos DB and Event Hubs; Amazon Redshift, S3 and Kinesis; and Google BigQuery, Cloud SQL and Pub/Sub), with in-flight transformations and enrichments.

Users can rapidly provision and deploy integration applications via a click-through interface using Striim’s pre-built templates and pre-configured integrations that are optimized for their cloud endpoints.

To learn more about Striim’s capabilities as one of the best iPaaS solutions, check out our three-part blog series, “Striim for Data iPaaS.”

What is iPaaS for Data?


Organizations can leverage a wide variety of cloud-based services today, and one of the fastest growing offerings is integration platform as a service. But what is iPaaS?

There are two major categories of iPaaS solutions available, focusing on application integration and data integration. Application integration works at the API level, typically involves relatively low volumes of messages, and enables multiple SaaS applications to be woven together.

Integration platform as a service for data enables organizations to develop, execute, monitor, and govern integration across disparate data sources and targets, both on-premises and in the cloud, with processing and enrichment of the data as it streams.

Within the scope of iPaaS for data there are older batch offerings, and more modern real-time streaming solutions. The latter are better suited to the on-demand and continuous way organizations are utilizing cloud resources.

Streaming data iPaaS solutions facilitate integration through intuitive UIs, by providing pre-configured connectors, automated operators, wizards and visualization tools to facilitate creation of data pipelines for real-time integration. With the iPaaS model, companies can develop and deploy the integrations they need without having to install or manage additional hardware or middleware, or acquire specific skills related to data integration. This can result in significant cost savings and accelerated deployment.

This is particularly useful as enterprise-scale cloud adoption becomes more prevalent, and organizations are required to integrate on-premises data and cloud data in real time to serve the company’s analytics and operational needs.

Factors such as increasing awareness of the benefits of iPaaS among enterprises – including reduced cost of ownership and operational optimization – are fueling the growth of the market worldwide.

For example, a report by Markets and Markets notes that the Integration Platform as a Service market is estimated to grow from $528 million in 2016 to nearly $3 billion by 2021, at a compound annual growth rate (CAGR) of 42% during the forecast period.

“The iPaaS market is booming as enterprises [embrace] hybrid and multi-cloud strategies to reduce cost and optimize workload performance” across on-premises and cloud infrastructure, the report says. Organizations around the world are adopting iPaaS and considering the deployment model an important enabler for their future, the study says.

Research firm Gartner, Inc. notes that the enterprise iPaaS market is an increasingly attractive space due to the need for users to integrate multi-cloud data and applications, with various on-premises assets. The firm expects the market to continue to achieve high growth rates over the next several years.

By 2021, enterprise iPaaS will be the largest market segment in application middleware, Gartner says, potentially consuming the traditional software delivery model along the way.

“iPaaS is a key building block for creating platforms that disrupt traditional integration markets, due to a faster time-to-value proposition,” Gartner states.

The Striim platform can be deployed on-premises, but is also available as an iPaaS solution on Microsoft Azure, Google Cloud Platform, and Amazon Web Services. This solution can integrate with on-premises data through a secure agent installation. For more information, we invite you to schedule a demo with one of our lead technologists, or download the Striim platform.

2019 Technology Predictions

19 For 19: Technology Predictions For 2019 and Beyond

Striim’s 2019 Technology Predictions article was originally published on Forbes.

With 2018 out the door, it’s important to take a look at where we’ve been over these past twelve months before we embrace the possibilities of what’s ahead this year. It has been a fast-moving year in enterprise technology. Modern data management has been a primary objective for most enterprise companies in 2018, evidenced by the dramatic increase in cloud adoption, strategic mergers and acquisitions, and the rise of artificial intelligence (AI) and other emerging technologies.

Continuing on from my predictions for 2018, let’s take out the crystal ball and imagine what could be happening technology-wise in 2019.

2019 Technology Predictions for Cloud

• The center of gravity for enterprise data centers will shift faster towards cloud as enterprise companies continue to expand their reliance on the cloud for more critical, high-value workloads, especially for cloud-bursting and analytics applications.

• Technologies that enable real-time data distribution between different cloud and on-premises systems will become increasingly important for almost all cloud use-cases.

• With the acquisition of Red Hat, IBM may not directly challenge the top providers but will play an essential role through the use of Red Hat technologies across these clouds, private clouds and on-premise data centers in increasingly hybrid models.

• Portable applications and serverless computing will accelerate the move to multi-cloud and hybrid models utilizing containers, Kubernetes, cloud and multi-cloud management, with more and more automation provided by a growing number of startups and established players.

• As more open-source technologies mature in the big data and analytics space, they will be turned into scalable managed cloud services, cannibalizing the revenue of commercial companies built to support them.

2019 Technology Predictions for Big Data

• Despite consolidation in the big data space, as evidenced by the Cloudera/Hortonworks merger, enterprise investment in big data infrastructure will wane as more companies move to the cloud for storage and analytics. (Full disclosure: Cloudera is a partner of Striim.)

• As 5G begins to make its way to market, data will be generated at even faster speeds, requiring enterprise companies to seriously consider modernizing their architecture to work natively with streaming data and in-memory processing.

• Lambda and Kappa architectures combining streaming and batch processing and analytics will continue to grow in popularity driven by technologies that can work with both real-time and long-term storage sources and targets. Such mixed-use architectures will be essential in driving machine learning operationalization.

• Data processing components of streaming and batch big data analytics will widely adopt variants of the SQL language to enable self-service processing and analytics by users that best know the data, rather than developers that use APIs.

• As more organizations operate in real time, fast, scalable SQL-based architectures like Snowflake and Apache Kudu will become more popular than traditional big data environments, driven by the need for continual up-to-date information.

2019 Technology Predictions for Machine Learning/Artificial Intelligence

• AI and machine learning will no longer be considered a specialty and will permeate business on a deeper level. By adopting centralized cross-functional AI departments, organizations will be able to produce, share and reuse AI models and solutions to realize rapid return on investment (ROI).

• The biggest benefits of AI will be achieved through integration of machine learning models with other essential new technologies. The convergence of AI with internet of things (IoT), blockchain and cloud investments will provide the greatest synergies with ground-breaking results.

• Data scientists will become part of DevOps in order to achieve rapid machine learning operationalization. Instead of being handed raw data, data scientists will move upstream and work with IT specialists to determine how to source, process and model data. This will enable models to be quickly integrated with real-time data flows, as well as continually evaluating, testing and updating models to ensure efficacy.

2019 Technology Predictions for Security

• The nature of threats will shift from many small actors to larger stronger, possibly state-sponsored adversaries, with industrial rather than consumer data being the target. The sophistication of these attacks will require more comprehensive real-time threat detection integrated with AI to adapt to ever-changing approaches.

• As more organizations move to cloud analytics, security and regulatory requirements will drastically increase the need for in-flight masking, obfuscation and encryption technologies, especially around PII and other sensitive information.

2019 Technology Predictions for IoT

• IoT, especially sensors coupled with location data, will undergo extreme growth, but will not be purchased directly by major enterprises. Instead, device makers and supporting real-time processing technologies will be combined by integrators using edge processing and cloud-based systems to provide complete IoT-based solutions across multiple industries.

• The increased variety of IoT devices, gateways and supporting technologies will lead to standardization efforts around protocols, data collection, formatting, canonical models and security requirements.

2019 Technology Predictions for Blockchain

• The adoption of blockchain-based digital ledger technologies will become more widespread, driven by easy-to-operate and manage cloud offerings in Amazon Web Services (AWS) and Azure. This will provide enterprises a way to rapidly prototype supply chain and digital contract implementations. (Full disclosure: AWS and Azure are partners of Striim.)

• Innovative new secure algorithms, coupled with computing power advances, will speed up the processing time of digital ledger transactions from seconds to milliseconds or microseconds in the next few years, enabling high-velocity streaming applications to work with blockchain.

Whether or not any of these 2019 technology predictions come to pass, we can be sure this year will bring a mix of steady movement towards enterprise modernization, continued investment in cloud, streaming architecture and machine learning, and a smattering of unexpected twists and new innovations that will enable enterprises to think — and act — nimbly.

Any thoughts or feedback on my 2019 technology predictions? Please share on Steve’s LinkedIn page: https://www.linkedin.com/in/stevewilkes/  For more information on Striim’s solutions in the areas Cloud, Big Data, Security and IoT, please visit our Solutions page, or schedule a brief demo with one of our lead technologists.

Rapid Adoption of Azure Database for PostgreSQL

You want to move to Azure Database for PostgreSQL, but much of your data may currently be elsewhere – locked up on-premises in Oracle, MySQL, legacy systems, and other locations. You need a new hybrid cloud integration strategy for the continuous movement of enterprise data to, and from, Azure Cloud, with continuous collection, processing, and delivery of enterprise data in real time (not batch) to ensure Azure Database for PostgreSQL is always up to date.

Data from on-premises and cloud sources needs to be delivered to Azure Database for PostgreSQL, including a one-time load, and continuous change delivery with in-flight processing to ensure up-to-the-second information for your users.

Speed adoption of Azure Database for PostgreSQL using Striim’s streaming data integration from virtually any enterprise data source.

Striim is a next generation streaming data integration and intelligence platform that supports your hybrid cloud initiatives, and provides integration with multiple Azure Cloud technologies.

This video demonstrates how Striim can provide continuous data integration into Azure Database for PostgreSQL through a pipeline for the real-time collection, processing, and delivery of enterprise data, sourcing from Oracle on-premises.

In this case, we will be doing an initial load followed by continuous change delivery from Oracle to Azure Database for PostgreSQL. Striim’s UI makes it easy to continuously, and non-intrusively, ingest all your enterprise data from a variety of sources in real-time.

We’ll start by doing an initial load of data from Oracle on-premises to Azure PostgreSQL using a data flow. When the flow is started, the full contents of the on-premises CUSTOMER table are loaded into Azure PostgreSQL. After a short time, the 4.5M rows in the source table are present in the Azure PostgreSQL CUSTOMER_TGT table. This can be monitored using Striim and the Azure Cloud Monitor UI.

Once the initial load is complete, we can continuously deliver change data using CDC from Oracle into the Azure PostgreSQL instance. A separate flow is used so that the Initial Load and CDC can be coordinated. After many changes, you can see that Azure PostgreSQL is completely up to date with the on-premises Oracle instance.
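
As a rough TQL sketch of the CDC leg of this pipeline, the flow could look like the following. The server names, credentials, and table mappings are illustrative, and the OracleReader and DatabaseWriter property lists may vary by Striim version, so treat this as an outline of the approach shown in the video rather than a working configuration:

CREATE SOURCE CustomerCDC USING OracleReader (
  Username: 'striim',
  Password: '******',
  ConnectionURL: 'onprem-dbhost:1521:ORCL',
  Tables: 'SCOTT.CUSTOMER'
)
OUTPUT TO CustomerChanges;

CREATE TARGET CustomerToAzurePostgres USING DatabaseWriter (
  ConnectionURL: 'jdbc:postgresql://myserver.postgres.database.azure.com:5432/postgres',
  Username: 'striim@myserver',
  Password: '******',
  Tables: 'SCOTT.CUSTOMER,public.customer_tgt'
)
INPUT FROM CustomerChanges;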

The continuous updates can also be monitored through the Striim UI. You have seen how Striim can enable your hybrid cloud initiatives and accelerate the adoption of Azure Database for PostgreSQL.

Learn more about how Striim can move a variety of data to your Azure environment by visiting our Striim for Azure product page, download a free trial of the Striim platform, or provision Striim in the Azure Marketplace.

Streaming Integration to Azure Cosmos DB

Real-time integration to Azure Cosmos DB enables companies to make the most of the environment’s globally distributed, multi-model database service. With Striim’s streaming integration to Azure Cosmos DB, companies can continuously feed Cosmos DB with real-time operational data from a wide range of on-premises and cloud-based data sources.

What is Striim?

The Striim software platform offers continuous, real-time data movement from enterprise document and relational databases, sensors, messaging systems, and log files into Azure Cosmos DB with in-flight transformations and built-in delivery validation to support real-time reporting, IoT analytics, and transaction processing.

Streaming Integration to Azure Cosmos DB

Offload Operational Reporting

  • Move real-time unstructured and structured data to Cosmos DB to support operational workloads including real-time reporting
  • Continuously collect data from a diverse set of sources (such as Internet of Things (IoT) sensors) for timely and rich insight

Accelerate and Simplify Processing

  • Perform filtering, transformations, aggregation, and enrichments in-flight before delivery to Cosmos DB
  • Avoid adding latency via stream processing
  • Easily convert structured data to document form

Ease the Cosmos DB Adoption Process

  • Use phased and zero-downtime migration from MongoDB by running MongoDB and Cosmos DB in parallel
  • Continuously visualize and monitor data pipelines with real-time alerts
  • Prevent data loss with built-in validation

How Striim Delivers Streaming Integration to Azure Cosmos DB

Low-Impact Change Data Capture from Enterprise Databases

  • Continuous, non-intrusive data ingestion for high-volume data
  • Support for databases such as Oracle, SQL Server, HPE NonStop, MySQL, PostgreSQL, MongoDB, Amazon RDS for Oracle, and Amazon RDS for MySQL
  • Real-time data collection from logs, sensors, Hadoop and message queues to support rich and timely analytics

Continuous, In-Flight Data Processing

  • In-line transformation, filtering, aggregation, enrichment to store only the data you need, in the right format
  • Uses SQL-based continuous queries via a drag-and-drop UI

Real-Time Data Delivery with Built-In Monitoring

  • Continuous verification of source and target database consistency
  • Interactive, live dashboards for streaming data pipelines
  • Real-time alerts via web, text, email


To learn more about how to leverage Striim’s solution for streaming integration to Azure Cosmos DB, check out our Striim for Azure Cosmos DB solution page, schedule a brief demo with a Striim technologist, provision Striim for Cosmos DB on the Azure marketplace, or download a free trial of the Striim platform and get started today!

Streaming Integration to Azure

To adopt modern data warehousing, advanced big data analytics, and machine learning solutions in the Azure Cloud, businesses need streaming integration to Azure. They need to be able to continuously feed real-time operational data from existing on-premises and cloud-based data stores and data warehouses.

What is Striim?

The Striim software platform offers continuous, real-time data movement from heterogeneous, on-premises systems and AWS into Azure with in-flight transformations and built-in delivery validation to make data immediately available in Azure, in the desired format.


Implement Operational Data Warehouse on Azure Cloud

  • Rapidly set up real-time data pipelines from on-prem databases and AWS to enable a real-time operational data store
  • Perform transformations, including denormalization, in-flight
  • Use phased and zero-downtime migration from Oracle Exadata, Teradata, or AWS Redshift by running source and target in parallel
  • Prevent data loss with built-in validation

Run Operational Workloads in Azure Databases

  • Continuously stream on-prem and AWS data to Azure SQL DB, Cosmos DB, Azure Database for MySQL, and Azure Database for PostgreSQL
  • Use non-intrusive change data capture to avoid impacting sources
  • Offload operational reporting
  • Move data continuously from MongoDB, sensors and other sources to Cosmos DB

Use Pre-Processed, Real-Time Data for Advanced Big Data Analytics and ML

  • Feed real-time data to Azure Data Lake Storage, Azure Databricks, and Azure HDInsight from on-prem or AWS databases, log files, messaging systems, Hadoop, and sensors
  • Pre-process data-in-motion to reduce ETL efforts and accelerate insight
  • Continuously visualize and monitor data pipelines with real-time alerts

How Striim Works to Achieve Streaming Integration to Azure

Low Impact Change Data Capture from Enterprise Databases

  • Non-stop, non-intrusive data ingestion for high-volume data
  • Support for data warehouses such as Oracle Exadata, Teradata, Amazon Redshift; and databases such as Oracle, SQL Server, HPE NonStop, MySQL, PostgreSQL, MongoDB, Amazon RDS for Oracle, Amazon RDS for MySQL
  • Real-time data collection from logs, sensors, Hadoop and message queues to support operational decision making

Continuous Data Processing and Delivery

  • In-flight transformation (including denormalization), filtering, aggregation, and enrichment to store only the data you need, in the right format
  • Real-time data delivery to Azure SQL Data Warehouse, SQL Server on Azure, Azure SQL Database, Azure Data Lake Storage, Azure Databricks, Kafka, Azure HDInsight, and Cosmos DB

Built-In Monitoring and Validation

  • Interactive, live dashboards for streaming data pipelines
  • Continuous verification of source and target database consistency
  • Real-time alerts via web, text, email

Why Striim?

As an enterprise-grade platform with built-in high-availability, scalability, and reliability, Striim is designed to deliver tangible ROI with low TCO to meet the real-time requirements for streaming integration to Azure in mission-critical environments.

With a broad set of supported sources, Striim enables you to make virtually any data available on Azure in real time and in the desired format to support next-generation cloud analytics and operational decision making on a continuous basis.

To learn more about how to use Striim for streaming integration to Azure, check out our Striim for Azure product page, schedule a short demo with a Striim technologist, or download a free trial of the Striim platform and get started today.

Continuously Move Data to Snowflake

Enterprises must continuously move data to Snowflake to take full advantage of this data warehouse built for the cloud.

You chose Snowflake to provide rapid insights into your data on a massive scale, on AWS or Azure. However, most of your source data resides elsewhere – in a wide variety of on-premises or cloud sources. How do you continually move data to Snowflake in real time, processing it along the way, so that your fast analytics and insights are reporting on timely data?

Snowflake was built for the cloud, and built for speed. By separating compute from storage, you can easily scale up and down as needed. This gives you instant elasticity supporting any amount of data, and high-speed queries for any number of users, coupled with the peace of mind provided by secure data sharing. The per-second pricing and support for multiple clouds allow you to choose your infrastructure and only pay when you are using the data warehouse.

However, residing in the cloud means you have to determine how to most effectively move data to Snowflake. This could mean migrating an existing Teradata or Exadata data warehouse, or continually populating Snowflake with newly generated on-premises data from operational databases, logs, or device information. In order for the warehouse to provide up-to-date information, there should be as little latency as possible between the original data creation and its delivery to Snowflake.

The Striim platform can help with all these requirements and more. Our database adapters support change data capture, or CDC, from enterprise or cloud databases. CDC directly intercepts database activity and collects all the inserts, updates, and deletes as they happen, ready to stream into Snowflake. Adapters for machine logs and other files read from the end of multiple files in parallel to stream out data as it is written, removing the inherent latency of batch. Data from devices and messaging systems can be collected easily, independent of format, through a variety of high-speed adapters and parsers.

After being collected continuously, the streaming data can be delivered directly into Snowflake with very low latency, or pushed through a data pipeline where it can be pre-processed through filtering, transformation, enrichment, and correlation using SQL-based queries, before delivery into Snowflake. This enables such things as data denormalization, change detection, de-duplication, and quality checking before the data is ever stored.

In addition, because Striim is an enterprise-grade platform, it can scale with Snowflake and reliably guarantee delivery of source data, while also providing built-in dashboards and verification of data pipelines for operational monitoring purposes.

The Striim wizard-based UI enables users to rapidly create a new data flow to move data to Snowflake. In this example, real-time change data from Oracle is being continually delivered to Snowflake. The wizard walks you through all the configuration steps, checking that everything is set up properly, and results in a data flow application. This data flow can be enhanced to filter, transform and enrich the data through SQL-based queries. In the video, we add a name and email address from a cache, based on an ID present in the original data.

When the application is started, data flows in real time from Oracle to Snowflake. Making changes in Oracle results in the transformed data being written continually to Snowflake, visible through the Snowflake UI.

Striim and Snowflake can change the way you do analytics, with Snowflake providing rapid insight to the real-time data provided by Striim. The data warehouse that is built for the cloud needs data delivered to the cloud, and Striim can continuously move data to Snowflake to support your business operations and decision-making.

To learn more about how Striim makes it easy to continuously move data to Snowflake, visit our Striim for Snowflake product page, schedule a demo with a Striim technologist, or download the platform and try it for yourself. 

Why CDC to Azure is Essential for Cloud Adoption


Let’s discuss why CDC to Azure is a critical component in successfully utilizing the Azure cloud.

Adopting Azure cloud services is important to your business. But why are change data capture (CDC) and real-time data movement a necessary part of this process?

You’ve already decided that you want to adopt Azure cloud services. This could be Azure SQL DB, EventHubs, Azure SQL Data Warehouse, Cosmos DB, or a myriad of other technologies. There are a number of reasons why you may have made the decision to utilize Azure cloud services. You may want to migrate on-premises applications directly to the cloud, use the cloud to add scalability to applications as required, or move data to the cloud for analytics purposes.

Here are some use cases that highlight why CDC to Azure, and in Striim’s case specifically, log-based CDC to Azure, is essential for cloud adoption. Log-based CDC directly intercepts database activity and collects all the inserts, updates, and deletes as they happen.

If you are migrating applications to the cloud, you need them to work as-is. Your application can be changed to use Azure SQL Database, or Azure Database for PostgreSQL or MySQL, but the data will need to be migrated. And migrations don’t happen instantly. Unless you can stop people from using the application, the data will have changed by the time the migration is finished. You will also want to test the cloud application to ensure it is working correctly, before flipping the switch and finalizing the transition to the cloud. This validation process can take weeks or even months.

So there are two necessary tasks. First, you need to make an initial copy of the database; second, you need to apply any changes that happen on-prem to the cloud. The initial copy can be made in many ways, but it is important to start collecting changes while, and after, this is happening to ensure the on-prem and cloud databases are, and remain, identical. CDC is used to collect the changes from the on-premises database and apply them to the cloud.

In the case of scaling applications in part to the cloud, otherwise known as cloud bursting, you need read-only instances. The process is similar to cloud migrations, except you are targeting more than one Azure database with the same data. The read-only instances need to be continually kept up to date, and again CDC to Azure is the most suitable technology to achieve this.

Azure gives you a number of choices for cloud analytics. You can either deliver directly to Azure SQL Data Warehouse, or use an intermediary like Azure Data Lake Storage, or Azure Event Hubs to host the data before preparing for analytics. Azure Cosmos DB can also be the analytics source for document and NoSQL data. However, to perform accurate and up-to-date analytics, you need to ensure the data is current. Once more, this is best accomplished through CDC to Azure.

Striim’s streaming integration platform can continuously collect data from on-prem or other cloud databases, and deliver to all of your Azure endpoints. Striim can take care of initial loads, as well as CDC for the continuous application of change, and these data flows can be created rapidly through our wizard-driven UI. With Striim, your cloud migrations, scaling and analytics are just a few wizard-steps away.

Please visit our Striim for Microsoft Azure solutions page to learn more about the wide variety of Azure environments we support, including Azure SQL Data Warehouse, Azure databases, and other Azure analytics services.

Enabling Real-Time Data Warehousing with Azure SQL Data Warehouse

In this post, we will discuss how to enable real-time data warehousing for modern analytics through streaming integration with Striim to Azure SQL Data Warehouse.

Azure SQL Data Warehouse provides a fully managed, fast, flexible, and scalable cloud analytics platform. It enables massively parallel processing and elasticity, working with Azure Data Lake Store and other Azure services to load raw and processed data. However, much of your data may currently reside elsewhere, for example, locked up on-premises, in a variety of clouds, in Oracle Exadata, Teradata, Amazon Redshift, operational databases, and other locations.

A requirement for real-time data warehousing and modern analytics is to continuously integrate data into Azure cloud analytics so that you are always acting on current information. This new hybrid cloud integration strategy must enable the continuous movement of enterprise data – to, from, and between clouds – providing continuous ingestion, storage, preparation, and serving of enterprise data in real time, not batch. Data from on-prem and cloud sources need to be delivered into multiple Azure endpoints, including a one-time load and continuous change delivery, with in-flight processing to ensure up-to-the-second information for analytics.

Striim is a next-generation streaming integration and intelligence platform that supports your hybrid cloud initiatives, enabling integration with multiple Azure cloud technologies. Please watch the embedded video to see how Striim can provide continuous data integration into Azure SQL Data Warehouse via Azure Data Lake Store through a pipeline for the ingestion, storage, preparation, and serving of enterprise data.

  • Ingest. Striim makes it easy to continuously and non-intrusively ingest all your enterprise data from a variety of sources in real time. In the video example, Striim collects live transactions from an Oracle Exadata orders table.
  • Store. Striim can continuously deliver data to a variety of Azure targets including Azure Data Lake Store. Striim can be used to pre-process your data in real time as it is being delivered into the store to speed downstream activities.
  • Prep & Train. Azure Databricks uses the data that Striim writes to Azure Data Lake Store for machine learning and transformations. Results can be loaded into Azure SQL Data Warehouse, and the machine learning model could be used by Striim for live scoring.
  • Model & Serve. Striim orchestrates the process to ensure fast, reliable, and scalable PolyBase-based delivery to Azure SQL Data Warehouse from Azure Data Lake Store, enabling analytics applications to always be up-to-date.

See how Striim can enable your hybrid cloud initiatives and accelerate the adoption of Azure SQL Data Warehouse for flexible and scalable cloud analytics. Read more about Striim for Azure SQL Data Warehouse. Get started with Striim now with a trial download on our website, or via Striim’s integration offerings in the Azure Marketplace.

Striim’s Latest Releases Boost Cloud Integration Capabilities, Ease of Use, and Extensibility – Part 1

The Striim team has been busy! With a focus on cloud integration and extensibility of the Striim platform, we have delivered two new releases in the last two months. We are excited to share with you what’s new.

In late June 2018, we released version 3.8.4 which brought several features that improve the manageability and the extensibility of the platform, while making it easy for you to offload critical analytics workloads to the cloud. Earlier this month, we released Striim version 3.8.5 which includes a platform as a service (PaaS) offering for real-time data integration to Azure SQL Data Warehouse. In this blog post, you can find an overview of the new features of the latest Striim releases. Let’s start with cloud integration.

Cloud Integration with a Broader Set of Targets

Available as a cloud service, Striim offers continuous real-time data movement with scalability, enabling faster time to market so you can reap the agility and cost-savings benefits of cloud-based analytics. Striim can now deliver real-time data to additional cloud services, such as Azure SQL Data Warehouse, Azure Database for PostgreSQL, Azure Database for MySQL, and Google Cloud SQL. The solutions for Azure SQL DW, Azure SQL DB, Azure HDInsight and Azure Storage are also available as subscription-based services in the Azure Cloud. If you are an Azure user, you can get started with these solutions in minutes.

As you may have read in prior blog posts, Striim is known for its low-impact change data capture (CDC) feature to ingest real-time data from enterprise databases. With version 3.8.5, we’ve also introduced an Incremental Batch Reader that can collect low-latency data in mini-batch mode from databases that do not support CDC. The source databases for incremental batch loading include Teradata, Netezza, or any other JDBC-compliant database. One prevalent use case for this new feature is enabling a near real-time data pipeline from existing data warehouses to Azure SQL Data Warehouse to ease and accelerate the transition of analytics workloads to the cloud.

With a broad and continually growing set of cloud targets, Striim allows you to create enterprise-grade, real-time data pipelines to feed different layers of your cloud-based solutions such as:

  • Analytics services and data warehousing solutions, such as Azure SQL Data Warehouse and Google BigQuery, that directly support end users with timely intelligence
  • Data management and analytics frameworks, such as Azure HDInsight, which support interactive analysis or creating machine learning models
  • Storage solutions, such as Amazon S3 or Azure Data Lake Storage (ADLS), from on-premises and other cloud-based data sources in real time
  • Staging solutions, such as HDFS, S3, and Azure Data Lake Storage, which are used by other cloud services and components

In short: to get the most out of your cloud-based analytics, you need continuous data flows to different components of your architecture. Striim supports all key layers of your cloud-based analytics architecture with enterprise-grade solutions to enable continuous data flows where needed.

In Part 2 of this two-part blog post, I will discuss several new features that bolster the ease of use and extensibility of the Striim platform. In the meantime, I invite you to contact us to schedule a demo, or experience Striim v. 3.8.5 by downloading the Striim platform.

Continuous Data Movement to Azure: Getting Started with Striim


Striim in the Microsoft Azure Cloud enables companies to simplify real-time data movement to Azure, performing heterogeneous data ingestion, enrichment, and transformation in a single solution before delivering the data with sub-second latency. Brought to you by the core team behind GoldenGate Software, Striim offers a non-intrusive, quick-to-deploy, and easy-to-iterate solution for streaming data movement to Azure.

It’s easy to get started with Striim in the Azure Cloud. We offer Azure Marketplace solutions for Azure SQL Data Warehouse, Azure Cosmos DB, SQL Database, Azure Database for PostgreSQL, Azure Storage, and HDInsight. However, if you want continuous data movement to Azure Database for MySQL or other Azure services, you can quickly set up your own instance of Striim on a VM.

This quick-start guide assumes you already have an Azure account set up, and should take about 20 minutes to complete.


Create Ubuntu VM on Microsoft Azure

1. Open up your favorite browser and navigate to Azure’s dashboard at portal.azure.com

2. Click on the + Create a resource button at the top of the left-hand menu

3. Search for Ubuntu and select Ubuntu Server 16.04 LTS. This version is certified to work with Striim out of the box.

4. Select Create

5. Enter the information for your VM. For this demo, I’m using a password as authentication type. For a more secure connection for production workloads, select SSH public key and generate it through terminal if you’re on a Mac. Click OK when you’re done entering the connection information.

6. Choose a size for your VM. I’m using D2s_v3 for this demo. However, choose a larger size if need be for production workloads. Note that you can always scale the VM after creating it if you find your initial choice is not sufficient. Press Select to continue.

7. Configure the additional Settings of the VM. If you need high availability for your configuration, select it here and specify your Availability set. As I’m only using this VM for demo purposes and don’t need high availability, I’ll skip it.

8. The important piece here is to open up the specific ports that you need to access on the VM. Make sure to open HTTP and SSH, as we’ll need them for connecting to the VM and Striim. Select OK when you’re done.

9. Azure will validate your Ubuntu VM, and make sure everything is correct. Make sure everything looks good on your end as well, and select Create.

10. It may take a while to deploy the VM, so just be patient until it is deployed. When the VM is properly deployed, it will show up under All resources in your Azure Dashboard.
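If you prefer the command line over the portal, the same VM can be created with the Azure CLI. The sketch below mirrors the portal steps above; the resource group name, VM name, location, and password are placeholders you would replace with your own values:

  # Create a resource group to hold the VM (skip if you already have one)
  az group create --name striim-demo-rg --location westus2

  # Create the Ubuntu Server 16.04 LTS VM with the D2s_v3 size used in this demo
  az vm create \
    --resource-group striim-demo-rg \
    --name striim-vm \
    --image Canonical:UbuntuServer:16.04-LTS:latest \
    --size Standard_D2s_v3 \
    --admin-username azureuser \
    --admin-password '<your-strong-password>' \
    --authentication-type password

  # SSH (port 22) is opened by default for Linux VMs; open HTTP as in step 8
  az vm open-port --resource-group striim-demo-rg --name striim-vm --port 80 --priority 1001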


Download and Configure Striim on your Ubuntu VM


1. Navigate to your Azure Dashboard and select the Ubuntu VM you just created.

2. First, ensure the correct ports are open in the Networking pane on the Azure portal (Striim needs to have the following ports open: 1527, 5701, 9080, 9300, 49152-65535, and 54237). You can find more information about the required ports in the Installation and Configuration Guide of Striim’s documentation. If you prefer to script this, see the sketch below.
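If you created the VM from the CLI, or simply prefer scripting the network rules, the same ports can be opened with the Azure CLI. This is only a sketch; the resource group and VM names are the placeholders used earlier, and each rule needs its own priority:

  # Open the ports Striim requires: 1527, 5701, 9080, 9300, 54237, and 49152-65535
  prio=1010
  for p in 1527 5701 9080 9300 54237 49152-65535; do
    az vm open-port --resource-group striim-demo-rg --name striim-vm --port "$p" --priority "$prio"
    prio=$((prio + 10))
  done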

3. Select Connect in the top middle of the screen and copy the Login using VM local account address.

4. Note: these instructions are for a Mac. Open up a new Terminal window and paste the SSH command you copied earlier from the Azure portal. There is a slightly different process on Windows to SSH into a VM, and a quick Google search should get you started with PuTTY or other tools.

5. Type yes to continue, and enter the password you created through the Azure portal

6. Congratulations! Now you’re logged in to the VM you created on Azure. From here, first we’ll install Java, and then download and install Striim.

7. Following the instructions here: https://medium.com/…/installing-oracle-java-8-in-ubuntu-16-10-845507b13343, install Oracle’s JDK 8. First, add Oracle’s PPA to your list of sources, pressing ENTER when necessary:

– sudo add-apt-repository ppa:webupd8team/java

8. Update your package repository, typing Y or yes when necessary

– sudo apt-get update

9. Install Java. There will be two screens that pop up during the installation process that require you to accept Oracle’s license terms

– sudo apt-get install oracle-java8-installer

10. To make sure Java installed correctly, type java -version to ensure that Oracle JDK 1.8.0 is installed.

11. Now that Java is installed, we can install Striim on your Azure VM. First, download Derby using wget.

– sudo su
– wget https://s3-us-west-1.amazonaws.com/striim-downloads/Releases/3.8.3A/striim-dbms-3.8.3A-Linux.deb

12. Download Striim

– wget https://s3-us-west-1.amazonaws.com/striim-downloads/Releases/3.8.3A/striim-node-3.8.3A-Linux.deb

13. Now, install both the Striim and Derby packages

– dpkg -i striim-node-3.8.3A-Linux.deb
– dpkg -i striim-dbms-3.8.3A-Linux.deb

14. Edit the /opt/striim/conf/striim.conf file using your favorite text editor, and enter the following fields at the top of the file (a filled-in example follows the field list below):
– WA_CLUSTER_NAME: choose a unique name for the new cluster (unique in the sense that it is not already used by any existing Striim cluster on the same network)
– WA_CLUSTER_PASSWORD: will be used by other servers to connect to the cluster and for other cluster-level operations
– WA_ADMIN_PASSWORD: will be assigned to Striim’s default admin user account
– WA_IP_ADDRESS: the IP address of this server to be used by Striim
– WA_PRODUCT_KEY and WA_LICENSE_KEY: If you have keys, specify them; otherwise, leave them blank to run Striim on a 30-day trial license.
– NOTE: You cannot create a multi-server cluster using a trial license.
– WA_COMPANY_NAME: If you specified keys, this must exactly match the associated company name. Otherwise, enter your company name.
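For reference, a filled-in set of properties might look like the sketch below. The cluster name, passwords, IP address, and company name are placeholders, the keys are left blank to run on the 30-day trial license, the inline # comments are annotations only, and the exact quoting should follow the template already present in your striim.conf:

  WA_CLUSTER_NAME="striim-azure-demo"       # must not clash with another Striim cluster on the network
  WA_CLUSTER_PASSWORD="ChangeMe-Cluster1"   # used by servers joining the cluster
  WA_ADMIN_PASSWORD="ChangeMe-Admin1"       # password for Striim's default admin user
  WA_IP_ADDRESS="10.0.0.4"                  # this VM's IP address, as used by Striim
  WA_PRODUCT_KEY=""                         # leave blank for the 30-day trial
  WA_LICENSE_KEY=""                         # leave blank for the 30-day trial
  WA_COMPANY_NAME="Example Corp"            # must exactly match the licensed company name if keys are set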

15. If using Vi, execute the following commands:

– vi /opt/striim/conf/striim.conf (this opens the existing file for editing)
– Press i to enter insert mode
– Type or paste your properties
– Press the Esc key
– Type :x to save the file and exit

16. Now, we can go ahead and start up Striim as a process. Execute the following commands:
– systemctl enable striim-dbms
– systemctl start striim-dbms
– Wait ten seconds, then
– systemctl enable striim-node
– systemctl start striim-node

17. Enter the following command and wait for a message similar to: Please go to … to administer, or use console.
– tail -F /opt/striim/logs/striim-node.log
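As an optional sanity check beyond the log output, you can confirm that both services are active and that the web UI answers on port 9080. Run the first command on the VM and the second from your workstation, replacing the placeholder with your VM's public IP address:

  # On the VM: confirm the Striim services are running
  systemctl status striim-dbms striim-node

  # From your workstation: confirm the web UI responds
  curl -I http://<VM-public-IP>:9080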

18. Finally, navigate to Striim’s Web UI located at <VM Public IP Address>:9080 and log in with the information you provided during the setup process. If you’re new to Striim, we recommend you go through the quickstart located under the ? Help > Documentation button in the upper right-hand corner of the UI.

You are now ready to enable continuous data movement to Azure. For technical assistance, please feel free to contact support@striim.com. For a demo of Striim’s full capabilities around moving data to Azure, please schedule a demo with one of our lead technologists.