Getting Started with Real-Time ETL to Azure SQL Database

 

 

Running production databases in the cloud has become the new norm. For us at Striim, real-time ETL to Azure SQL Database and other popular cloud databases has become a common use case. Striim customers run critical operational workloads in cloud databases and rely on our enterprise-grade streaming data pipelines to keep their cloud databases up-to-date with existing on-premises or cloud data sources.

Striim supports your cloud journey starting with the first step. In addition to powering fully-connected hybrid and multi-cloud architectures, the streaming data integration platform enables cloud adoption by minimizing risks and downtime during data migration. When you can migrate your data to the cloud without database downtime or data loss, it is easier to modernize your mission-critical systems. And when you liberate data trapped in legacy databases and stream it to Azure SQL DB with sub-second latency, you can run high-value, operational workloads in the cloud and drive business transformation faster.

Streaming Integration from Oracle to Azure SQL DB

Building continuous, streaming data pipelines from on-premises databases to production cloud databases for critical workloads requires a secure, scalable, and reliable integration solution. Especially if you have enterprise database sources that cannot tolerate performance degradation, traditional batch ETL will not suffice. Striim’s low-impact change data capture (CDC) feature minimizes overhead on the source systems while moving database operations (inserts, updates, and deletes) to Azure SQL DB in real time with security, reliability, and transactional integrity.

Striim is available as a PaaS offering in major cloud marketplaces such as Microsoft Azure Cloud, AWS, and Google Cloud. You can run Striim in the Azure Cloud to simplify real-time ETL to Azure SQL Database and other Azure targets, such as Azure Synapse Analytics, Azure Cosmos DB, Event Hubs, ADLS, and more. The service includes heterogeneous data ingestion, enrichment, and transformation in a single solution before delivering the data to Azure services with sub-second latency. What users love about Striim is that it offers a non-intrusive, quick-to-deploy, and easy-to-iterate solution for streaming data integration into Azure.

To illustrate the ease of use of Striim and to help you get started with your cloud database integration project, we have prepared a Tech Guide: Getting Started with Real-Time Data Integration to Microsoft Azure SQL Database. You will find step-by-step instructions on how to move data from an on-premises Oracle Database to Azure SQL Database using Striim’s PaaS offering available in the Azure Marketplace. In this tutorial you will see how Striim’s log-based CDC enables a solution that doesn’t impact your source Oracle Database’s performance.

If you have, or plan to have, Azure SQL Databases that run operational workloads, I highly recommend that you use a free trial of Striim along with this tutorial to find out how fast you can set up enterprise-grade, real-time ETL to Azure SQL Database. On our website you can find additional tutorials for different cloud databases. So be sure to check out our other resources as well. For any streaming integration questions, please feel free to reach out.

 

Cloud Adoption: How Streaming Integration Minimizes Risks

 

 

Last week, we hosted a live webinar, Cloud Adoption: How Streaming Integration Minimizes Risks. In just 35 minutes, we discussed how to eliminate database downtime and minimize other risks of cloud migration and ongoing integration for hybrid cloud architecture, including a live demo of Striim’s solution.

Our first speaker, Steve Wilkes, started the presentation by discussing the importance of cloud adoption in today’s pandemic-impacted, fragile business environment. He continued with the common risks of cloud data migration and how streaming data integration with low-impact change data capture minimizes both downtime and risks. Our second presenter, Edward Bell, gave us a live demonstration of Striim for zero downtime data migration. In this blog post, you can find my short recap of the key areas of the presentation. This summary certainly cannot do justice to the comprehensive discussion we had at the webinar. That’s why I highly recommend you watch the full webinar on-demand to access details on the solution architecture, its comparison to the batch ETL approach, customer examples, the live demonstration, and the interactive Q&A session.

Cloud adoption brings multiple challenges and risks that prevent many businesses from modernizing their business-critical systems.

Limited cloud adoption and modernization reduce the ability to optimize business operations. These challenges and risks include causing downtime and business disruption and losing data during the migration, which are simply not acceptable for critical business systems. The risk list, however, is longer than these two. Switching over to the cloud without adequate testing, which can lead to failures, working with stale data in the cloud, and data security and privacy concerns are also among the key issues.

Steve emphasized the point that “rushing the testing of the new environment to reduce the downtime, if you cannot continually feed data, can also lead to failures down the line or problems with the application.” Later, he added that “Beyond the migration, how do you continually feed the system? Especially in integration use cases where you are maintaining the data where it was and also delivering somewhere else, you need to continuously refresh the data to prevent staleness.”

Each of these risks is preventable with the right approach to data movement between the legacy and new cloud systems.

 

Streaming data integration plays a critical role in successful cloud adoption with minimized risks.

A reliable, secure, and scalable streaming data integration architecture with low-impact change data capture enables zero database downtime and zero data loss during data migration. Because the source system is not interrupted, you can test the new cloud system for as long as you need before the switchover. You also have the option to fail back to the legacy system after switchover by reversing the data flow and keeping the old system up-to-date with the cloud system until you are fully confident that it is stable.

CDC and Initial Load

Striim’s cloud data migration solution uses this modern approach. During the bulk load, Striim’s CDC component collects the source database changes in real time. As soon as the initial load is complete, Striim applies the changes to the target environment to keep the legacy and cloud databases consistent. With built-in exactly-once processing (E1P), Striim avoids both data loss and duplicates. You can also use Striim’s real-time dashboards to monitor the data flow and various detailed performance metrics.
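To make the sequencing concrete, here is a minimal, hypothetical sketch of the pattern described above: bulk-load the existing rows, then replay captured changes from a checkpoint so nothing is lost or applied twice. The table, functions, and change-log format are illustrative placeholders, not Striim APIs.

# Hypothetical sketch of "initial load, then checkpointed change apply".
# src and tgt are DB-API connections (e.g., sqlite3); the schema and
# change-log format are invented for illustration, not Striim constructs.

def initial_load(src, tgt):
    rows = src.execute("SELECT id, name, amount FROM orders").fetchall()
    tgt.executemany("INSERT INTO orders (id, name, amount) VALUES (?, ?, ?)", rows)
    tgt.commit()

def apply_changes(tgt, change_log, checkpoint):
    # Replay only operations captured after the last committed checkpoint;
    # persisting the checkpoint together with the target commit is what
    # prevents loss or duplication on restart.
    for seq, op, row in change_log:
        if seq <= checkpoint:
            continue
        if op == "INSERT":
            tgt.execute("INSERT INTO orders (id, name, amount) VALUES (?, ?, ?)", row)
        elif op == "UPDATE":
            tgt.execute("UPDATE orders SET name = ?, amount = ? WHERE id = ?",
                        (row[1], row[2], row[0]))
        elif op == "DELETE":
            tgt.execute("DELETE FROM orders WHERE id = ?", (row[0],))
        checkpoint = seq
    tgt.commit()
    return checkpoint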

Continuous, streaming data integration for hybrid cloud architecture liberates your data for modernization and business transformation.

Cloud adoption and streaming integration are not limited to the lifting and shifting of your systems to the cloud. Ongoing integration post-migration is a crucial part of planning your cloud adoption. You cannot restrict it to database sources and database targets in the cloud, either. Your data lives in various systems and needs to be shared with different endpoints, such as your storage, data lake, or messaging systems in the cloud environment. Without enabling comprehensive and timely data flow from your enterprise systems to the cloud, what you can achieve in the cloud will be very limited.

“It is all about liberating your data,” Steve added in this part of the presentation. “Making it useful for the purpose you need it for. Continuous delivery in the correct format from a variety of sources relies on being able to filter that data, transform it, and possibly aggregate, join and enrich before you deliver to where needed. All of these can be done in Striim with a SQL-based language.”

A key point both Edward and Steve made is that Striim is very flexible. You can source from multiple sources and send to multiple targets. True data liberation and modernizing your data infrastructure needs that flexibility.

Striim also provides deployment flexibility. In fact, this came up in the Q&A, with a question about deployment options and pricing. Unfortunately, we could not answer all the questions we received. The short answer is: Striim can be deployed in the cloud, on-premises, or both via a hybrid topology. It is priced based on the CPUs of the servers where the Striim platform is installed, so you don’t need to worry about the sizes of your source and target systems.

There is much more covered in this short webinar we hosted on cloud adoption. I invite you to watch it on-demand at your convenience. If you would like to get a customized demo for cloud adoption or other streaming data integration use cases, please feel free to reach out.

Mitigating Data Migration and Integration Risks for Hybrid Cloud Architecture

 

Cloud computing has transformed how businesses use technology and drive innovation for improved outcomes. However, the journey to the cloud, which includes data migration from legacy systems and integration of cloud solutions with existing systems, is not a trivial task. There are multiple cloud adoption risks that businesses need to mitigate to achieve the cloud’s full potential.

 

Common Risks in Data Migration and Integration to Cloud Environments

In addition to data security and privacy, there are additional concerns and risks in cloud migration and integration. These include:

Downtime: The bulk data loading technique, which takes a snapshot of the source database, requires you to lock the legacy database to preserve the consistent state. This translates to downtime and business disruption for your end users. While this disruption can be acceptable for some of your business systems, the mission-critical ones that need modernization are typically the ones that cannot tolerate even planned downtime. And sometimes, planned downtime extends beyond the expected duration, turning into unplanned downtime with detrimental effects on your business.

Data loss: Some data migration tools might lose or corrupt data in transit because of a process failure or network outage. Or they may fail to apply the data to the target system in the right transactional order. As a result, your cloud database ends up diverging from the legacy system, also negatively impacting your business operations.

Inadequate Testing: Many migration projects operate under intense time pressure to minimize downtime, which can lead to a rushed testing phase. When the new environment is not tested thoroughly, the end result can be an unstable cloud environment. Certainly not the desired outcome when your goal is to take your business systems to the next level.

Stale Data: Many migration solutions focus on the “lift and shift” of existing systems to the cloud. While it is a critical part of cloud adoption, your journey does not end there. Having a reliable and secure data integration solution that keeps your cloud systems up-to-date with existing data sources is critical to maintaining your hybrid cloud or multi-cloud architecture. Working with outdated technologies can lead to stale data in the cloud and create delays, errors, and other inefficiencies for your operational workloads.

 

Upcoming Webinar on the Role of Streaming Data Integration for Data Migration and Integration to Cloud

Streaming data integration is a new approach to data integration that addresses the multifaceted challenges of cloud adoption. By combining bulk loading with real-time change data capture technologies, it minimizes the downtime and risks mentioned above and enables reliable and continuous data flow after the migration.

Striim - Data Migration to Cloud

In our next live, interactive webinar, Cloud Adoption: How Streaming Data Integration Minimizes Risks, we dive into this particular topic. Our Co-Founder and CTO, Steve Wilkes, will present the practical ways you can mitigate data migration risks and handle integration challenges for cloud environments. Striim’s Solution Architect, Edward Bell, will walk you through a live demo of zero downtime data migration and continuous streaming integration to major cloud platforms, such as AWS, Azure, and Google Cloud.

I hope you can join this live, practical presentation on Thursday, May 7th, at 10:00 AM PT / 1:00 PM ET to learn more about how to:

  • Reduce migration downtime and data loss risks, as well as allow unlimited testing time of the new cloud environment.
  • Set up streaming data pipelines in just minutes to reliably support operational workloads in the cloud.
  • Handle strict security, reliability, and scalability requirements of your mission-critical systems with an enterprise-grade streaming data integration platform.

Until we see you at the webinar, and afterward, please feel free to reach out to get a customized Striim demo for data migration and integration to the cloud that supports your specific IT environment.

 

Top 4 Highlights from Our Streaming Data and Analytics Webinar with GigaOm

 

 

On April 9, 2020, Striim’s co-founder and CTO Steve Wilkes joined GigaOm analyst Andrew Brust in an interview-style webinar on “Streaming Data: The Nexus of Cloud Modernized Analytics.”

GigaOm and Striim Webinar Speakers

Over the course of the hour, the two talked about the evolution of data integration needs, what defines streaming data integration, capturing transactional data through change data capture (CDC), comparative approaches for data integration, where companies typically start with streaming data, use case examples, how it supports cloud initiatives, providing a foundation for operational intelligence, and even its role in AI/ML advancements.

While we can’t cover it all in one blog post, here is a “top 4” list of our favorite things highlighted during the webinar — and we invite you to view the entire on-demand event by watching it online.

 

#1: “Today, People Expect to Have Up-to-the-Second Information” — Steve Wilkes

Andrew asked Steve to do a bit of “wayback machine” to trace how we arrived at the need for streaming, real-time data. “Twenty years ago, most data was created by humans working on applications with data stored in databases, and you’d use ETL to move and store the data in batches into a data warehouse. It was OK to see data hours or even days later, and everyone did that,” said Steve. But fast-forward to our daily lives today, where we get immediate updates on things like Twitter feeds, news alerts, and instant messages from friends, and expectations have changed.

“So the business world needs to work the same way, and this does drive competitive pressures,” he continued. “If you’re not having this view into your operations and what your customers need, someone else will and they can push you out of business.”

Related to this, Andrew said later in the webinar: “We have new modes of thinking. But using older modes of technology, we’re going to run into issues.”

GigaOm: Old vs New Approaches to Data Movement

#2: Cloud Adoption Driving the Need for Streaming Data

As Steve noted, there’s been a significant shift from all on-premises systems to cloud-based environments, but there is still the need to get data into the cloud in order to make use of it.

Steve shared with Andrew that what Striim sees across its global customer base, in terms of adoption, is that the majority first focus on building the ability to stream their data and then use it to power analytics.

“Initial use cases are often zero-downtime data migrations to cloud or feeding a cloud-based data warehouse…. Once they’ve stream-enabled a lot of their sources, they will start to think about what analytics they can promote to real time and where they can get value out of that,” said Steve.

 

#3: A Range of Business Use Cases

Throughout the webinar, Andrew mentioned a few possible use cases, particularly in the context of the global pandemic being faced. “There’s nothing more frustrating, especially in these times of lockdown, when it says something is in stock and then you go to confirm the purchase and it says it’s out of stock … or you find out later.”

From Steve: “That real immediacy into what customers are doing, need, and want is key to what streaming data can do.”

Another example Andrew used illustrated the need for operational intelligence using real-time data. He referenced his home state of New York as it faces the coronavirus pandemic, where the real-time sharing of data about medical supplies and personnel data across the state’s hospitals could improve decisions to best allocate and redistribute those assets.

Shifting to the analytics side, Steve described operational intelligence as being able to change what you know about your operations and the decisions you make, based on current information. He gave the example of being able to track down critical devices, such as wheelchairs, in settings such as airports and hospitals.

The two also discussed how streaming data fits with AI/ML, where Steve commented how streaming data can be used to get data ready and processed for AI models to improve efficiency and performance.

 

#4: Status of Streaming Data

Andrew polled attendees with the question of where they are today with having streaming data in their organization.

GigaOm Poll: Use of Streaming Data In Your Organization

At least half of the attendees said they are using streaming data at least occasionally, which suggests that streaming data integration will continue to grow in popularity and ubiquity. Another 25% are currently evaluating streaming data technology.

Andrew asked Steve for his thoughts on the 15% who felt they don’t have a need for streaming data. As Steve commented: “A lot of organizations have a perception of what a real-time application is and the categories of use cases they are good for. But if you are moving applications to the cloud and they are business-critical, if you can’t turn them off for a few days, how do you do that without turning it off when data is still changing. There’s a need for real-time streaming data there.”

As you can see, the two covered a lot of ground — and so much more during this interactive webinar event. It is available to watch on demand at your convenience, so please check it out. We thank GigaOm and Andrew Brust for hosting this engaging program.

Also, you can learn more about the topic of Streaming Integration in a new 100+ page book published by O’Reilly Media and co-authored by Steve Wilkes, the speaker of this webinar. Download your free PDF copy today.

 

A New Comprehensive Guide to Streaming ETL for Google Cloud

 

 

Not to brag, but since we literally wrote the book on data modernization with streaming data integration, it is our pleasure to provide you with a guide book on using streaming ETL for Google Cloud Platform. This eBook will help your company unleash innovative services and solutions by combining the power of streaming data integration with Google Cloud Platform services.

As part of your data modernization and cloud adoption efforts, you cannot ignore how you collect and move your data to your new data management platform. But, like adopting any new technology, there is complexity in the move and a number of things to consider, especially when dealing with mission-critical systems. We realize that the process of researching options, building requirements, getting consensus, and deciding on a streaming ETL solution for Google Cloud is never a trivial task.

A Buyer’s Guide to Streaming Data Integration to Google Cloud Platform

As a technology partner of Google Cloud, we at Striim are thrilled to invite you to easily tap into the power of streaming ETL by way of our new eBook: A Buyer’s Guide to Streaming Data Integration for Google Cloud. If you’ve been looking to move to the Google Cloud or get more operational value in your cloud adoption journey, this eBook is your go-to guide.

This eBook provides an in-depth analysis of the game-changing trends of digital transformation. It explains why a new approach to data integration is required, and how streaming data integration (SDI) fits into a modern data architecture. With many use case examples, the eBook shows you how streaming ETL for Google Cloud provides business value, and why this is a foundational step. You’ll discover how this technology is enabling the business innovations of today – from ride sharing and fintech, to same-day delivery and retail/e-retail.

Here’s a rundown of what we hope you’ll learn through this eBook:

  • A clear definition of what streaming integration is, and how it compares and contrasts to traditional extract/transform/load (ETL) tools
  • An understanding of how SDI fits into existing as well as emerging enterprise architectures
  • The role streaming data integration architecture plays with regard to cloud migration, hybrid cloud, multi-cloud, etc.
  • The true business value of adopting SDI
  • What companies and IT professionals should be looking for in a streaming data integration solution, focusing on the value of combining SDI and stream processing in one integrated platform
  • Modern SDI use cases, and how these are helping organizations to transform their business
  • Specifically, the benefits of using the Striim SDI platform in combination with the Google Cloud Platform

The digital business operates in real time, and the limitations of legacy integration approaches will hold you back from the limitless potential that cloud platforms bring to your business. To ease your journey into adopting streaming ETL to Google Cloud, please accept our tested and proven guidance with this new eBook: A Buyer’s Guide to Streaming Data Integration for Google Cloud. By following the practical steps provided for you, you can reap the full benefits of Google Cloud for your enterprise. For further information on streaming data integration or the Striim platform, please feel free to contact us.

5 Streaming Cloud Integration Use Cases: Whiteboard Wednesdays

 

 

Today we’re going to talk about five streaming cloud integration use cases. Streaming cloud integration moves data continuously in real time between heterogeneous databases, with in-flight data processing. Read on, or watch the 9-minute video:

Let’s focus on how to use streaming data integration in cloud initiatives, and the five common scenarios that we see.

Use Case #1 – Online Migration/Cloud Adoption

Let’s start with the first one. It is basically adopting the cloud or getting to the cloud. When you want to move your data to the cloud, streaming cloud integration helps you with online database migration. You have your legacy database and you want to move it to the cloud. If this is a critical database, you do not want to pause it during this migration; you want it to remain operational to support your business.

Streaming data integration offers Change Data Capture (CDC) technology. This captures all new transactions, that is, the changes, as soon as they happen and delivers them to the target. While you’re doing the initial load to the cloud database, you can start Change Data Capture, keep the system open to transactions, and capture all the new transactions happening with the CDC feature.

Once the initial load is done, you can apply the change data to the cloud database so that the two systems are in sync. Because the legacy database stays open to transactions, you basically have no database downtime. It remains available to users, and once the changes are applied you also have the ability to validate that the two databases are in sync and that no data was lost during the migration process. There are tools that provide this validation.
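As a rough illustration of the kind of validation just mentioned, the sketch below compares simple per-table fingerprints (a row count plus a checksum) between the legacy and cloud databases. The connection objects, table names, and checksum choice are placeholders, not the actual validation tooling.

# Illustrative consistency check between legacy and cloud databases.
# Connections are DB-API objects; tables are assumed to have a numeric id column.

def table_fingerprint(conn, table):
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    checksum = conn.execute(f"SELECT COALESCE(SUM(id), 0) FROM {table}").fetchone()[0]
    return count, checksum

def validate(source_conn, target_conn, tables):
    mismatches = [t for t in tables
                  if table_fingerprint(source_conn, t) != table_fingerprint(target_conn, t)]
    return mismatches  # an empty list means the databases agree on these checks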

Because this database has production data and is up-to-date and the other one is still functional, you have unlimited time to test the new database. You can control the tests and be comfortable before you point any users or any applications to this cloud database. This unlimited testing minimizes your risks. The time pressure is gone, and you can be comfortable with your move to the cloud.

You also have the ability to perform phased migration. Bi-directional data flow between the legacy system and the cloud system allows you to have users on both sides. You can move some of your users to the cloud database and some of them are still in the legacy.

The streaming cloud integration solution can apply the changes happening in the cloud to the legacy and the changes happening in the legacy back to the cloud, so they stay in-sync. You can gradually move your users to the cloud database when you feel comfortable. Phased migration is another way to minimize your risk of moving your mission-critical systems to the cloud.

Use Case #2 – Hybrid Cloud Architecture for Analytics

We have discussed how to ease your cloud adoption, but once you’re in the cloud and you have adopted a cloud solution, you also need to treat it as part of your data center and build continuous data movement between your existing data sources and the new cloud solution.

Hybrid cloud architecture

We see quite a bit of cloud analytics solutions and that’s our second use case. Many organizations these days offload their analytics to cloud solutions. Modern cloud solutions give them tons of new features to modernize their analytics environment and transform their business.

We help with moving all kinds of enterprise data. This can be your databases, your machine data, all kinds of log files (security files and system log files), your existing cloud sources, your messaging systems, and your sensor data. All of them can be moved in real time continuously to your cloud analytics solution.

I would like to add that some streaming cloud integration solutions give you the ability to do in-flight data processing. Transformations happen in-flight so that you deliver the data, without adding latency, to the target system in a consumable format that it needs. You end up having data flowing in the right format for your cloud analytics solution.

The main value from this is that you can now run operational workloads, that is, analytics applications that produce high operational value, in your analytics solution. You can influence the operational decisions happening in your business. That will help you achieve faster business transformation throughout your enterprise.

Use Case #3 – Building New Applications in the Cloud

Building applications in the cloud

We talked about the analytics use case; here is another similar one. As part of your hybrid cloud architecture, you might be building new applications in the cloud. You still need data coming from your enterprise data sources to your cloud environment. By moving this diverse set of data in real time to your cloud messaging systems, cloud databases, or storage solutions, you can easily build applications in the cloud.

These modern applications move your business forward because the data is available. You can make better use of these cloud applications if you have this real-time bridge between your existing data center and your new cloud environment. Streaming Integration helps you to move your data so you can quickly build new applications for your business, to help it move forward with more modern solutions.

Use Case #4 – Multi-Cloud Integration

Multi-cloud integration

We also see multi-cloud use cases. A lot of companies now have one cloud solution for one purpose, another cloud solution for another purpose, and are working with multiple vendors. You have the option to feed your data to multiple targets. After you capture it once you can feed it to all kinds of different targets, maybe one of them for analytics and one of them for supporting new applications. You have the ability to distribute your data to multiple cloud solutions.

Use Case #5 – Inter-Cloud Integration

Inter-cloud integration

Similarly, if you’re working with multiple cloud vendors, you will need to connect these solutions with each other. If you have an operational database in one cloud and you have an analytics solution in another cloud, you need to move the data from one cloud solution to the other in real time, so you can have operational reporting or operational analytics solutions in that cloud.

Streaming cloud integration gives you the agility and the ability to move your data wherever you want. The cloud can easily become part of your data center, a seamless part of your data infrastructure, by moving your data into that environment.

You can use streaming cloud integration to ease your migration to the cloud and adoption of cloud solutions by minimizing your risk and business disruption. You can also maintain your hybrid cloud architecture and multi-cloud architecture with a continuous data flow from your existing data sources.

To learn more about streaming data integration for your cloud solutions, please visit our Hybrid Cloud Integration solution page, schedule a demo with a Striim expert, or download the Striim platform to get started.

 


Evaluating Streaming Data Integration Platforms: Whiteboard Wednesdays

 

 

In today’s Whiteboard Wednesday video, Steve Wilkes, founder and CTO of Striim, looks at what you need to consider when evaluating streaming data integration platforms. Read on, or watch the 15-minute video:

We’ve already gone through what the components of a streaming integration platform are. Today we’re going to talk about how you go about evaluating streaming data integration platforms based on these components.

Just to reiterate, you need the platform to be able to:

  • Do real-time continuous data collection
  • Move that data continuously from where it’s collected to where it’s going
  • Support delivery to all the different targets that you care about
  • Process the data as it’s moving, so stream processing
  • Be enterprise grade, so that it is scalable and reliable, and meets all those other requirements you care about for mission-critical data
  • Get insights and alerts on that data movement

Let’s think about the things that you need to consider in order to actually achieve this when you’re evaluating such platforms.

Data Collection & Delivery

For data collection and delivery, you care about quite a few different things. Firstly, it needs to be low latency. If it’s a streaming data integration platform, then just doing bulk loads or micro batch may not be sufficient. You want to be able to collect the data the instant it’s created, within milliseconds typically. You need low-latency data collection.

Evaluating Streaming Integration Platforms - Data Collection

It needs to be able to support all the sources that you care about. If you’re looking for a streaming integration platform, then you’re thinking of more than just one use case. You’re thinking “what platform is going to support all of the streaming data integration needs within my organization?” Supporting just one data source or a couple of data sources isn’t enough.

You need to be able to support all the sources that you care about now and may care about in the future. That could be databases, files, or messaging systems. It could even be IoT. So think about that when you’re evaluating whether the platform has all the sources that you need. Think about how it can deal with those sources in a number of different ways.

For databases, you may need to be able to do bulk loads into a streaming infrastructure, as well as doing Change Data Capture. This is important for collecting real-time change as it’s happening in a database, the inserts, updates, and deletes. For files, you may need to do bulk files, files that exist already, but also files as they’re created, streaming out the data as it’s being written. Supporting both bulk and change data is equally important.
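For the file case, the "bulk plus change" idea can be sketched in a few lines: read what already exists, then keep streaming new lines as they are appended. This is a conceptual example with a made-up handler, not how any particular adapter is implemented.

import time

# Conceptual "bulk plus change" collection for a file source: read the lines
# that already exist, then keep streaming lines as they are appended.
def stream_file(path, handle_line, poll_interval=0.5):
    with open(path, "r") as f:
        for line in f:              # bulk: content already in the file
            handle_line(line.rstrip("\n"))
        while True:                 # change: content written after we caught up
            line = f.readline()
            if line:
                handle_line(line.rstrip("\n"))
            else:
                time.sleep(poll_interval)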

You also need to consider whether the adapters are actually part of the platform or are they third party. If they are part of the platform and the platform is built well, then it means that they will be able to handle all the different requirements of the platform – scalability, reliability, and recoverability. All of those things are integrated end to end because the adapters are part of the platform.

If they’re third party, then that may not be the case. If you have to plug in third party components into your infrastructure, then you can have areas of brittleness where things may not work properly or problematic interfaces when things change. Try to avoid third party adapters wherever you can.

Data collection and data delivery need to be able to support the end to end recovery and reliability that is part of being enterprise grade. That means that from a database perspective, for example, you may need to be able to support maintaining a database transaction context from one end to the other. You need to be able to pick up from where you left off and make sure that data that is collected is delivered to all of the appropriate targets, which can be varied and different.

You might be delivering some data on-premise and some data to the cloud, but you still need to be able to make sure that all the data has made it there. You need to be able to validate that the data is being written to all the different sources and targets that the platform is supporting.

If it’s part of a platform and they’re not third party, you would expect that to be there. If they are third party, then you have to investigate whether all of those things are supported. Data collection and data delivery is the first part of how you evaluate the platform.

Data Movement

The next part is how does it do data movement? This is crucial to maintaining the kind of high throughput and low latency that you’d expect. Data movement happens in a number of places: between processing steps, and between your source collection and your data delivery.

Between source collection and data delivery there may be some in-memory processing or some enrichment. Or it could be an even more complex pipeline with multiple steps in it. You’re moving data between each step.

Data movement also happens between nodes. If you have a clustered platform, that platform may be moving data between nodes for different processing steps, or between source and target because the target is closer to one node than the others. You need to be able to ensure that this data movement happens efficiently, with high throughput and low latency, between nodes.

You also need to be able to support collecting data on-premise and delivering it into cloud environments, or collecting it from cloud environments and delivering it to on-premise, or moving between clouds. Supporting all these different topologies is all part of data movement.

Ideally as much of the data movement as possible should be in memory only. Try to avoid having to write to disk or do any kind of IO in between processing steps. The reason for this is that each processing step needs to perform optimally in order to get high throughput.

If you are persisting data, that can add latency. Ideally when you’re doing multiple processing steps in a pipeline, you’re doing all of that data movement in memory only, between the steps or just between nodes. You’re not persisting to disk.

You should only use persistent data movement or persistent data streams where needed. There are a couple of really good use cases for this. One is if you have data sources that you can’t rewind into for recoverability, you may want to use a persistent data stream as the first step in the process, but everything downstream can be in memory only.

If you’re collecting data in real time, but you have multiple applications all running at their own speeds against that data, you may want to think about having persistent data streams between different steps. Typically, you want to minimize the amount of persistent data streams that you have and use in-memory only data streams wherever possible. That will really aid in reducing your latency and increasing your throughput.
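One way to picture this guidance is a pipeline that persists only its first hop, so a source that cannot be rewound is still recoverable, while everything downstream stays in memory. The class below is a simplified sketch under that assumption, not a description of any product’s internals.

from collections import deque

# Sketch: durable first hop for recoverability, in-memory stream downstream.
class PersistentFirstHop:
    def __init__(self, log_path):
        self.log = open(log_path, "a")   # append-only log survives a crash
        self.downstream = deque()        # later processing steps read from memory

    def publish(self, event: str):
        self.log.write(event + "\n")
        self.log.flush()                 # persist before handing off
        self.downstream.append(event)    # no disk I/O for downstream steps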

Stream Processing

The next thing that you need to be able to do is stream processing. Stream processing obviously has to be able to support all of the different types of processing that you want to do. For example, it needs to be able to support complex transformations. If it doesn’t support the transformations that you want, you should be able to add in your own components or your own user defined functions to do the transformations.

It needs to be able to combine and enrich data. This requires a lot of different constructs for stream processing. When you are combining data together from multiple data streams, they run at high speed and typically events aren’t going to happen at the same time.

You need a flexible windowing structure that can maintain a set of events from different data streams to combine together, in order to produce a combined output stream that pairs the current data from the current stream with the latest data from every other stream.

When you’re enriching data, you need to be able to join streaming data with reference data. You can’t go back to a database or go back to the original source of the reference data for every event on a data stream. It’s just too slow. You need to be able to load, cache, and remember the data you are using for enrichment in memory so you can join it really efficiently, in order to keep and maintain the throughput that you’re looking for from the overall system.
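The enrichment pattern described here can be reduced to a small sketch: load the reference data once into an in-memory cache, then join it to each streaming event without a per-event database round trip. The table and field names are illustrative assumptions.

# Illustrative in-memory enrichment: cache reference data once, join per event.

def load_customer_cache(conn):
    rows = conn.execute("SELECT customer_id, name, city FROM customers").fetchall()
    return {cid: {"customer_name": name, "customer_city": city}
            for cid, name, city in rows}

def enrich(event, customer_cache):
    reference = customer_cache.get(event["customer_id"], {})
    return {**event, **reference}   # streaming fields plus cached reference fields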

You want the stream processing to be optimized. It should really run as fast as if you’d written it yourself manually. It also needs to be easy to use. We recommend that you look for SQL-based stream processing because SQL is the language of data. There are very few people that work with data that don’t understand SQL. It allows you to do filtering, transformation, and data enrichment through natural SQL constructs.

Obviously if you want to do more complex things, you should also be allowed to import your own transformations and work with those. SQL-based transformations enable anyone who knows data to build and understand what the transformations are. You also want building pipelines to be as easily accessible as possible to all the people who want to work with the data.

You need to have a good UI for building the data pipelines and have as much of the process as possible automated through wizards and other UI-based assistance. You need to be able to build multi-step stream processing, not just a single source into a single target, or a single source into a single piece of processing into a single target. That means potentially supporting fan-in and fan-out: multiple data sources coming in, going into multiple processing steps in a staged environment, where the data moves step by step, to potentially multiple targets coming out at the other end.

This all needs to be coordinated, well-maintained, and deployable across a cluster in order to be scalable. Your stream processing should be very rich, very capable, and also very high throughput.

Enterprise Grade

You also need to think about the enterprise-grade qualities of the platform. I’ve mentioned before, for it to be enterprise grade it needs to be scalable. You need to be able to handle increasing the throughput, increasing the number of sources, increasing the number of targets, and increasing the volume of data being generated from each one of those.

When you’re evaluating platforms and evaluating for a production scenario, you should test the platform with a reasonable throughput that corresponds to what you’re expecting in order to see how it behaves and how it scales, and measure the throughput and the latency from end to end as you’re evaluating the platform.

You also need it to be reliable. You need to be able to ensure that you have guaranteed delivery from source all the way to target. Even if something fails, if a network fails, if the source or the target goes down, if any of the processing nodes in the cluster go down or the whole cluster goes down, you need to be able to ensure that it picks up from where it left off and doesn’t miss any messages.

It has to be able to recover from failures as well. You need guaranteed delivery in the normal, always-running case so you don’t miss any messages just because they disappeared into the ether somewhere. But also, if you have a failure, you should recover without losing any messages or any events that flow from the source into the target.

Of course, security is also paramount. You need to secure the data while it’s moving, so it’s encrypted as it goes across the network. But you also need to secure who has access to the data, who can work with individual data streams, who can see the data on individual data streams, who can build applications, and who can view the results of those applications.

You need security that works across the whole end to end and deals with every single component, so that you can secure them and lock them down and make sure that only the people that need to work with data, can.

Insights & Alerts

Finally, you need to make sure that the platform gives you visibility into your data, that you can monitor the data flows and see what’s going on in real time, that you get alerts when anything happens. This could be when CPU or memory usage on any of the nodes goes above certain criteria. It could be when applications crash, or data flows crash. It could be when volume goes above or below what you expect, and doing that in a granular fashion. For example, when an individual database table goes above or below what you expect.
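As a simple illustration of that kind of granular alerting, the snippet below checks per-table event rates against expected ranges; the table names, limits, and alert function are invented for the example.

# Illustrative threshold alert on per-table event rates (values are made up).

def check_table_rates(rates_per_minute, expected_ranges, alert):
    for table, rate in rates_per_minute.items():
        low, high = expected_ranges.get(table, (0, float("inf")))
        if rate < low:
            alert(f"{table}: {rate} events/min is below the expected minimum {low}")
        elif rate > high:
            alert(f"{table}: {rate} events/min is above the expected maximum {high}")

check_table_rates({"orders": 12}, {"orders": (100, 5000)}, alert=print)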

You need to be able to work with insights into the data flows that help you operationalize this and make sure that it’s working full time, 24/7, when you actually put it into production. You may even want to get insights on the data itself, drill down into the actual data that’s flowing, and do some analytics on that. If your streaming integration platform can also give you those valuable insights on the streaming data, then that’s the icing on the cake.

Just to summarize, when you’re evaluating streaming data integration platforms, you need to make sure that the platform can do everything that you need, to get your data from where it’s generated to where it needs to be, in order to get real value out of your data.

 

To learn more about streaming data integration, please visit our Real-time Data Integration solution page, schedule a demo with a Striim expert, or download the Striim platform to get started.


The Top 4 Use Cases for Streaming Data Integration: Whiteboard Wednesdays

 

 

Today we are talking about the top four use cases for streaming data integration. If you’re not familiar with streaming data integration, please check out our channel for a deeper dive into the technology. In this 7-minute video, let’s focus on the use cases.

 

Use Case #1 Cloud Adoption – Online Database Migration

The first one is cloud adoption – specifically online database migration. When you have your legacy database and you want to move it to the cloud and modernize your data infrastructure, if it’s a critical database, you don’t want to experience downtime. The streaming data integration solution helps with that. When you’re doing an initial load from the legacy system to the cloud, the Change Data Capture (CDC) feature captures all the new transactions happening in this database as it’s happening. Once this database is loaded and ready, all the changes that happened in the legacy database can be applied in the cloud. During the migration, your legacy system is open for transactions – you don’t have to pause it.

While the migration is happening, CDC helps you to keep these two databases continuously in-sync by moving the real-time data between the systems. Because the system is open to transactions, there is no business interruption. And if this technology is designed for both validating the delivery and checkpointing the systems, you will also not experience any data loss.

Because this cloud database has production data, is open to transactions, and is continuously updated, you can take your time to test it before you move your users. So you have basically unlimited testing time, which helps you minimize your risks during such a major transition. Once the system is completely in-sync and you have checked it and tested it, you can point your applications and run your cloud database.

This is a single switch-over scenario. But streaming data integration gives you the ability to move the data bi-directionally. You can have both systems open to transactions. Once you test this, you can run some of your users in the cloud and some of your users in the legacy database.

All the changes happening with these users can be moved between databases, synchronized so that they’re constantly in-sync. You can gradually move your users to the cloud database to further minimize your risk. Phased migration is a very popular use case, especially for mission-critical systems that cannot tolerate risk and downtime.

Cloud adoption

Use Case #2 Hybrid Cloud Architecture

Once you’re in the cloud and you have a hybrid cloud architecture, you need to maintain it. You need to connect it with the rest of your enterprise. It needs to be a natural extension of your data center. Continuous real-time data movement with streaming data integration allows you to have your cloud databases and services as part of your data center.

The important thing is that these workloads in the cloud can be operational workloads because there’s fresh information (i.e., continuously updated information) available. Data from your databases, your machine data, your log files, your other cloud sources, messaging systems, and sensors can move continuously to enable operational workloads.

What do we see in hybrid cloud architectures? Heavy use of cloud analytics solutions. If you want operational reporting or operational intelligence, you want comprehensive data delivered continuously so that you can trust that it’s up-to-date, and gain operational intelligence from your analytics solutions.

You can also connect your data sources with the messaging systems in the cloud to support event distribution for your new apps that you’re running in the cloud so that they are completely part of your data center. If you’re adopting multi-cloud solutions, you can again connect your new cloud systems with existing cloud systems, or send data to multiple cloud destinations.

Hybrid Cloud Architecture

Use Case #3 Real-Time Modern Applications

A third use case is real-time modern applications. Cloud is a big trend right now, but not everything is necessarily in the cloud. You can have modern applications on-premises. So, if you’re building any real-time app or modern system that needs timely information, you need continuous real-time data pipelines. Streaming data integration enables you to run real-time apps with real-time data.

Use Case #4 Hot Cache

Last, but not least, when you have an in-memory data grid to help with your data retrieval performance, you need to make sure it is continuously up-to-date so that you can rely on that data – it’s something that users can depend on. If the source system is updated, but your cache is not updated, it can create business problems. By continuously moving real-time data using CDC technology, streaming data integration helps you to keep your data grid up-to-date. It can serve as your hot cache to support your business with fresh data.
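Conceptually, keeping a hot cache fresh with CDC boils down to applying each change event to the in-memory store as it arrives; the event shape below is a hypothetical example, not a specific product format.

# Minimal sketch: apply CDC events to an in-memory cache as they arrive.
cache = {}

def apply_cdc_event(event):
    op, key, value = event["op"], event["key"], event.get("value")
    if op in ("INSERT", "UPDATE"):
        cache[key] = value          # cache always reflects the latest committed state
    elif op == "DELETE":
        cache.pop(key, None)

apply_cdc_event({"op": "INSERT", "key": "order:42", "value": {"status": "shipped"}})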

 

To learn more about streaming data integration use cases, please visit our Solutions section, schedule a demo with a Striim expert, or download the Striim platform to get started.

 

How to Migrate Oracle Database to Google Cloud SQL for PostgreSQL with Streaming Data Integration

 

 

For those who need to migrate an Oracle database to Google Cloud, the ability to move mission-critical data in real time between on-premises and cloud environments without database downtime or data loss is paramount. In this video, Alok Pareek, Founder and EVP of Products at Striim, demonstrates how the Striim platform enables Google Cloud users to build streaming data pipelines from their on-premises databases into their Cloud SQL environment with reliability, security, and scalability. The full 8-minute video is available to watch below:

Striim offers an easy-to-use platform that maximizes the value gained from cloud initiatives, including cloud adoption, hybrid cloud data integration, and in-memory stream processing. This demonstration illustrates how Striim feeds real-time data from mission-critical applications and a variety of on-prem and cloud-based sources to Google Cloud without interruption of critical business operations.

Oracle database to Google Cloud

Through different interactive views, Striim users can develop Apps to build data pipelines to Google Cloud, create custom Dashboards to visualize their data, and Preview the Source data as it streams to ensure they’re getting the data they need. For this demonstration, Apps is the starting point from which to build the data pipeline.

There are two critical phases in this zero-downtime data migration scenario. The first involves the initial load of data from the on-premise Oracle database into the Cloud SQL Postgres database. The second is the synchronization phase, achieved through specialized readers to keep the source and target consistent.

Oracle database to Google Cloud
Striim Flow Designer

The pipeline from the source to the target is built using a flow designer that easily creates and modifies streaming data pipelines. The data can also be transformed while in motion, to be realigned or delivered in a different format. Through the interface, the properties of the Oracle database can also be configured – allowing users extensive flexibility in how the data is moved.

Once the application is started, the data can be previewed, and progress monitored. While in-motion, data can be filtered, transformed, aggregated, enriched, and analyzed before delivery. With up-to-the-second visibility of the data pipeline, users can quickly and easily verify the ingestion, processing, and delivery of their streaming data.

Oracle database to Google Cloud

During the initial load, the source data in the database is continually changing. Striim keeps the Cloud SQL Postgres database up-to-date with the on-premises Oracle database using change data capture (CDC). By reading the database transactions in the Oracle redo logs, Striim collects the insert, update, and delete operations as soon as the transactions commit, and applies only those changes to the target. This is done without impacting the performance of source systems, while avoiding any outage to the production database.

By generating DML activity using a simulator, the demonstration shows how inserts, updates, and deletes are managed. Running DML operations against the orders table, the preview shows not only the data being captured, but also metadata including the transaction ID, the system commit number, the table name, and the operation type. When you look at the orders table on the target, the data is present in the table.
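To give a sense of what such a change record carries, here is a hypothetical event structure with the metadata fields mentioned above; the field names are examples, not Striim’s actual schema.

from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical shape of a captured change event (field names are illustrative).
@dataclass
class ChangeEvent:
    transaction_id: str
    system_commit_number: int     # position in the source's commit sequence
    table_name: str
    operation: str                # "INSERT", "UPDATE", or "DELETE"
    data: Dict[str, Any]          # column values after the change

event = ChangeEvent("TX-1001", 48213377, "ORDERS", "INSERT",
                    {"ORDER_ID": 42, "STATUS": "NEW"})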

The initial upload of data from the source to the target, followed by change data capture to ensure source and target remain in-sync, allows businesses to move data from on-premises databases into Google Cloud with the peace of mind that there will be no data loss and no interruption of mission-critical applications.

Additional Resources:

To learn more about Striim’s capabilities to support the data integration requirements for a Google hybrid cloud architecture, check out all of Striim’s solutions for Google Cloud Platform.

To read more about real-time data integration, please visit our Real-Time Data Integration solutions page.

To learn more about how Striim can help you migrate Oracle database to Google Cloud, we invite you to schedule a demo with a Striim technologist.

 

Real-Time Continuous Data Movement and Processing

Real-Time Data is for Much More Than Just Analytics

Striim’s article, Real-Time Data is for Much More Than Just Analytics, was originally published on Forbes.

The conversation around real-time data, fast data and streaming data is getting louder and more energetic. As the age of big data fades into the sunset — and many industry folks are even reluctant to use the term — there is much more focus on fast data and obtaining timely insights. The focus of many of these discussions is on real-time analytics (otherwise known as streaming analytics), but this only scratches the surface of what real-time data can be used for.

If you look at how real-time data pipelines are actually being utilized, you find that about 75% of the use cases are integration related. That is, continuous data collection creates real-time data streams, which are processed and enriched and then delivered to other systems. Often these other systems are not themselves streaming. The target could be a database, data warehouse or cloud storage, with a goal of ensuring that these systems are always up to date. This leaves only about 25% of companies doing immediate streaming analytics on real-time data. But these are the use cases that are getting much more attention.

There are many reasons why streaming data integration is more common, but the main reason is quite simple: This is a relatively new technology, and you cannot do streaming analytics without first sourcing real-time data. This is known as a “streaming first” data architecture, where the first problem to solve is obtaining real-time data feeds.

Organizations can be quite pragmatic about this and approach stream-enabling their sources on a need-to-have, use-case-specific basis. This could be because batch ETL systems no longer scale or batch windows have gone away in a 24/7 enterprise. Or, they want to move to more modern technologies, which are most suitable for the task at hand, and keep them continually up to date as part of a digital transformation initiative.

Cloud Is Driving Streaming Data Integration

The rise of cloud has made a streaming-first approach to data integration much more attractive. Simple use cases, like migrating an on-premise database that services an in-house business application to the cloud, are often not even viable without streaming data integration.

The naive approach would be to back up the database, load it into the cloud and point the cloud application at it. However, this assumes a few things:

1. You can afford application downtime.

2. Your application can be stopped while you are doing this.

3. You can spin up and use the cloud application without testing it.

For most business-critical applications, none of these things are true.

A better approach to minimizing or eliminating downtime is an online migration that keeps the application running. To perform this task, you source changes from the in-house database as real-time data streams, using a technology called change data capture (CDC), load the database to the cloud, and then apply any changes from the real-time stream that happened while you were doing the loading. The change delivery to the cloud can be kept running while you test the cloud application, and when you cut over, it will be already up to date.

Streaming data integration is a crucial element of this type of use case, and it can also be applied to cloud bursting, operational machine learning, large scale cloud analytics or any other scenario where having up-to-the-second data is essential.

Streaming Data Integration Is The Precursor To Streaming Analytics

Once organizations are doing real-time data collection, typically for integration purposes, it then opens the door to doing streaming analytics. But you can’t put the cart before the horse and do streaming analytics unless you already have streaming data.

Streaming analytics also requires prepared data. It’s a commonly known metric that 80% of the time spent in data science is in data preparation. This is true for machine learning and also true for streaming analytics. Obtaining the real-time data feed is just the beginning. You may also need to transform, join, cleanse, and enrich data streams to give the data more context before performing analytics.

As a simple example, imagine you are performing CDC on a source database and have a stream of orders being placed by customers. In any well-normalized relational database, these order rows are mostly just IDs referring to detail contained in other tables.

This might be perfect for a relational, transactional system, but it’s not very useful for analytics. However, if you can join the streaming data with reference data for customers and items, you have now added more context and more value. The analytics can now show real-time sales by customer location or item category and truly provide business insights.
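As a rough illustration of that enrichment step (the schema and lookup tables here are hypothetical, not drawn from any Striim example), joining each order event against customer and item reference data adds the context the analytics needs:

```python
# Hypothetical reference data, keyed by the IDs carried in the order stream.
customers = {101: {"name": "Acme Corp", "location": "Berlin"}}
items     = {7:   {"name": "Widget",    "category": "Hardware"}}

def enrich(order_event: dict) -> dict:
    """Add customer location and item category to a raw order event."""
    customer = customers.get(order_event["customer_id"], {})
    item     = items.get(order_event["item_id"], {})
    return {
        **order_event,
        "customer_location": customer.get("location"),
        "item_category": item.get("category"),
    }

raw = {"order_id": 5001, "customer_id": 101, "item_id": 7, "qty": 3}
print(enrich(raw))
# {'order_id': 5001, 'customer_id': 101, 'item_id': 7, 'qty': 3,
#  'customer_location': 'Berlin', 'item_category': 'Hardware'}
```

With location and category attached in-stream, real-time sales can be broken out by customer geography or product line the moment each event arrives.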

Without the processing steps of streaming data integration, the streaming analytics would lose value, again showing how important the real-time integration layer really is.

Busting The Myth That Real-Time Data Is Prohibitively Expensive

A final consideration is cost. Something that has been said repeatedly is that real-time systems are expensive and should only be used when absolutely necessary. The typically cited use cases are algorithmic trading and critical control systems.

While this may have been true in the past, the massive improvements in the price-performance equation for CPU and memory over the last few decades have made real-time systems, and in-memory processing in general, affordable for mass consumption. Coupled with cloud deployments and containerization, the capability to have real-time data streamed to any system is within reach of any enterprise.

While real-time analytics and instant operational insights may get the most publicity and represent the long-term goal of many organizations, the real workhorse behind the scenes is streaming data integration. 

Simplify Your Azure Hybrid Cloud Architecture with Streaming Data Integration

While the typical conversation about Azure hybrid cloud architecture may be centered around scaling applications, VMs, and microservices, the bigger consideration is the data. Spinning up additional services on-demand in Azure is useless if the cloud services cannot access the data they need, when they need it.

“According to a March 2018 hybrid cloud report from 451 Research and NTT Communications, around 63% of firms have a formal strategy for hybrid infrastructure. In this case, hybrid cloud does not simply mean using a public cloud and a private cloud. It means having a seamless flow of data between all clouds, on and off-premises.” – Data Foundry

To help simplify providing a seamless flow of data to your Microsoft Azure hybrid cloud infrastructure, we’re happy to announce that the Striim platform is available in the Microsoft Azure Marketplace.

How Streaming Data Integration Simplifies Your Azure Hybrid Cloud Architecture

Enterprise-grade streaming data integration enables continuous real-time data movement and processing for hybrid cloud, connecting on-prem data sources and cloud environments, as well as bridging a wide variety of cloud services. With in-memory stream processing for hybrid cloud, companies can store only the data they need, in the format that they need. Additionally, streaming data integration enables delivery validation and data pipeline monitoring in real time.

Streaming data integration simplifies real-time streaming data pipelines for cloud environments. Through non-intrusive change data capture (CDC), organizations can collect real-time data without affecting source transactional databases. This enables cloud migration with zero database downtime and minimized risk, and feeds real-time data to targets with full context – ready for rich analytics on the cloud – by performing filtering, transformation, aggregation, and enrichment on data-in-motion.


Key Traits of a Streaming Data Integration Solution for Your Azure Hybrid Cloud Architecture

There are three important objectives to consider when implementing a streaming data integration solution in an Azure hybrid cloud architecture:

  • Make it easy to build and maintain – The ability to use a graphical user interface (GUI) and a SQL-based language can significantly reduce the complexity of building streaming data pipelines, allowing more team members within the company to maintain the environment.
  • Make it reliable – Enterprise hybrid cloud environments require a data integration solution that is inherently reliable with failover, recovery and exactly-once processing guaranteed end-to-end, not just in one slice of the architecture.
  • Make it secure – Security needs to be treated holistically, with a single authentication and authorization model protecting everything from individual data streams to complete end-user dashboards. The security model should be role-based with fine-grained access, and provide encryption for sensitive resources.

Striim for Microsoft Azure

The Striim platform for Azure is an enterprise-grade data integration platform that simplifies an Azure-based hybrid cloud infrastructure. Striim provides real-time data collection and movement from a variety of sources such as enterprise databases (e.g., Oracle, HPE NonStop, SQL Server, PostgreSQL, Amazon RDS for Oracle, and Amazon RDS for MySQL via low-impact, log-based change data capture), as well as log files, sensors, messaging systems, NoSQL and Hadoop solutions.

Once the data is collected in real time, it can be streamed to a wide variety of Azure services including Azure Cosmos DB, Azure SQL Database, Azure SQL Data Warehouse, Azure Event Hubs, Azure Data Lake Storage, and Azure Database for PostgreSQL.

While the data is streaming to Azure, Striim enables in-stream processing such as filtering, transformations, aggregations, masking, and enrichment, making the data more valuable when it lands. This is all done with sub-second latency, reliability, and security via an easy-to-use interface and SQL-based programming language.

To learn more about Striim’s capabilities to support the data integration requirements for an Azure hybrid cloud architecture, read today’s press release announcing the availability of the Striim platform in the Microsoft Azure Marketplace, and check out all of Striim’s solutions for Azure.


Real-Time Data Ingestion – What Is It and Why Does It Matter?

 

 

The integration and analysis of data from both on-premises and cloud environments give an organization a deeper understanding of the state of their business. Real-time data ingestion for analytical or transactional processing enables businesses to make timely operational decisions that are critical to the success of the organization – while the data is still current.

Transactional and operational data contain valuable insights that drive informed and appropriate actions. Achieving visibility into business operations in real time allows organizations to identify and act on opportunities and address situations where improvements are needed. Real-time data ingestion to feed powerful analytics solutions demands the movement of high volumes of data from diverse sources without impacting source systems and with sub-second latency.

Using traditional batch methods to move the data introduces unwelcome delays. By the time the data is collected and delivered, it is already out of date and cannot support real-time operational decision making. Real-time data ingestion is a critical step in the collection and delivery of volumes of high-velocity data – in a wide range of formats – within the timeframe organizations need to realize its value.

The Striim platform enables the continuous movement of structured, semi-structured, and unstructured data – extracting it from a wide range of sources and delivering it to cloud and on-premises endpoints in real time, so it is immediately available to users and applications.

The Striim platform supports real-time data ingestion from sources including databases, log files, sensors, and message queues, and delivery to targets that include Big Data, Cloud, Transactional Databases, Files, and Messaging Systems. Using non-intrusive Change Data Capture (CDC), Striim reads new database transactions from the source databases’ transaction or redo logs and moves only the changed data, without impacting the database workload.
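To make “moves only the changed data” concrete, here is an illustrative sketch (the event shape is an assumption for illustration, not Striim’s wire format): a log-based UPDATE carries before and after images, and only the columns that actually changed need to travel downstream.

```python
def changed_columns(before: dict, after: dict) -> dict:
    """Return just the columns whose values differ between the before and
    after images of an UPDATE read from the transaction/redo log."""
    return {col: after[col] for col in after if before.get(col) != after[col]}

before = {"order_id": 42, "status": "PENDING", "amount": 99.0}
after  = {"order_id": 42, "status": "SHIPPED", "amount": 99.0}

print(changed_columns(before, after))   # {'status': 'SHIPPED'}
```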

Real-time data ingestion is critical to accessing data that delivers significant value to a business. With clear visibility into the organization, based on data that is current and comprehensive, organizations can make more informed operational decisions faster.

To read more about real-time data ingestion, please visit our Real-Time Data Integration solutions page.

To have one of our experts guide you through a brief demo of our real-time data ingestion offering, please schedule a demo.


Striim Is a 2019 CODiE Awards Finalist for Best iPaaS Solution

Striim is proud to announce that we’ve been recognized by SIIA as a 2019 CODiE Awards Finalist for Best iPaaS, or Integration Platform as a Service.

Why was Striim selected as a Best iPaaS solution? Striim is the only streaming (real-time) data integration platform running in the cloud that is built specifically to support cloud computing.

Real-time data integration is crucial for hybrid and multi-cloud architectures. Striim’s iPaaS solutions for real-time data integration in the cloud bring the agility and cost benefits of the cloud to integration use cases.

Striim enables companies to:

  • Quickly and easily provision streaming data pipelines to deliver real-time data to the cloud, or between cloud services
  • Easily adopt a multi-cloud architecture by seamlessly moving data across different cloud service providers: Azure, AWS, and Google Cloud
  • Offload operational workloads to cloud by moving data in real time and in the desired format
  • Filter, aggregate, transform, and enrich data-in-motion before delivering to the cloud in order to optimize cloud storage
  • Migrate data to the cloud without interrupting business operations
  • Minimize risk of cloud migrations with real-time, built-in cloud migration monitoring to avoid data divergence or data loss
  • Stream data in real time between cloud environments and back to on-premises systems

As one of the best iPaaS solutions, the Striim platform supports all aspects of Cloud integration as it relates to hybrid cloud and multi-cloud deployments.

Striim enables zero-downtime data migration to the cloud by performing an initial load and then delivering the changes that occurred on the legacy system during the loading, without pausing the source system. To prevent data loss, it validates that all of the data from on-premises sources migrated to the cloud environment.
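As a simplified sketch of that kind of validation (the fingerprinting approach and field names are assumptions for illustration, not Striim’s built-in mechanism), comparing per-table row counts and an order-independent content hash between source and target is one way to flag divergence or loss before cut-over:

```python
import hashlib

def table_fingerprint(rows: list) -> tuple:
    """Row count plus an order-independent hash of the row contents."""
    digest = hashlib.sha256()
    for encoded in sorted(repr(sorted(r.items())) for r in rows):
        digest.update(encoded.encode())
    return len(rows), digest.hexdigest()

def validate_migration(source_rows: list, target_rows: list) -> bool:
    """True if the target holds exactly the same rows as the source."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)

src = [{"id": 1, "status": "SHIPPED"}, {"id": 2, "status": "NEW"}]
tgt = [{"id": 2, "status": "NEW"}, {"id": 1, "status": "SHIPPED"}]
print(validate_migration(src, tgt))  # True -- same rows, order ignored
```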

Striim’s iPaaS solution provides the real-time data pipelines to and from the cloud to enable operational workloads in the cloud with the availability of up-to-date data.

Striim supports multi-cloud architecture by streaming data between different cloud platforms, including Azure, Google and AWS, and other cloud technologies such as Salesforce and Snowflake. If necessary, Striim can also provide real-time data flows between services offered within each of the three cloud platforms.

About Striim for Data iPaaS

Running as a PaaS solution on Microsoft Azure, AWS and Google Cloud Platform, the Striim streaming data integration platform offers real-time data ingestion from on-premises and cloud-based databases (including Oracle, SQL Server, HPE NonStop, PostgreSQL and MySQL), data warehouses (such as Oracle Exadata and Teradata), cloud services (such as AWS RDS and Amazon S3), Salesforce, log files, messaging systems (including Kafka), sensors, and Hadoop solutions.

Striim delivers this data in real time to a wide variety of cloud services (for example, Azure SQL Data Warehouse, Cosmos DB and Event Hubs; Amazon Redshift, S3 and Kinesis; and Google BigQuery, Cloud SQL and Pub/Sub), with in-flight transformations and enrichments.

Users can rapidly provision and deploy integration applications via a click-through interface using Striim’s pre-built templates and pre-configured integrations that are optimized for their cloud endpoints.

To learn more about Striim’s capabilities as one of the best iPaaS solutions, check out our three-part blog series, “Striim for Data iPaaS.”


What is iPaaS for Data?

Organizations can leverage a wide variety of cloud-based services today, and one of the fastest growing offerings is integration platform as a service. But what is iPaaS?

There are two major categories of iPaaS solutions available, focusing on application integration and data integration. Application integration works at the API level, typically involves relatively low volumes of messages, and enables multiple SaaS applications to be woven together.

Integration platform as a service for data enables organizations to develop, execute, monitor, and govern integration across disparate data sources and targets, both on-premises and in the cloud, with processing and enrichment of the data as it is streaming.

Within the scope of iPaaS for data there are older batch offerings and more modern real-time streaming solutions. The latter are better suited to the on-demand and continuous way organizations are utilizing cloud resources.

Streaming data iPaaS solutions simplify integration through intuitive UIs, providing pre-configured connectors, automated operators, wizards, and visualization tools that speed the creation of data pipelines for real-time integration. With the iPaaS model, companies can develop and deploy the integrations they need without having to install or manage additional hardware or middleware, or acquire specific skills related to data integration. This can result in significant cost savings and accelerated deployment.

This is particularly useful as enterprise-scale cloud adoption becomes more prevalent, and organizations are required to integrate on-premises data and cloud data in real time to serve the company’s analytics and operational needs.

Factors such as increasing awareness of the benefits of iPaaS among enterprises – including reduced cost of ownership and operational optimization – are fueling the growth of the market worldwide.

For example, a report by Markets and Markets notes that the Integration Platform as a Service market is estimated to grow from $528 million in 2016 to nearly $3 billion by 2021, at a compound annual growth rate (CAGR) of 42% during the forecast period.

“The iPaaS market is booming as enterprises [embrace] hybrid and multi-cloud strategies to reduce cost and optimize workload performance” across on-premises and cloud infrastructure, the report says. Organizations around the world are adopting iPaaS and considering the deployment model an important enabler for their future, the study says.

Research firm Gartner, Inc. notes that the enterprise iPaaS market is an increasingly attractive space due to the need for users to integrate multi-cloud data and applications with various on-premises assets. The firm expects the market to continue to achieve high growth rates over the next several years.

By 2021, enterprise iPaaS will be the largest market segment in application middleware, Gartner says, potentially consuming the traditional software delivery model along the way.

“iPaaS is a key building block for creating platforms that disrupt traditional integration markets, due to a faster time-to-value proposition,” Gartner states.

The Striim platform can be deployed on-premises, but is also available as an iPaaS solution on Microsoft Azure, Google Cloud Platform, and Amazon Web Services. This solution can integrate with on-premises data through a secure agent installation. For more information, we invite you to schedule a demo with one of our lead technologists, or download the Striim platform.


CDC to Snowflake

 

Let’s take a moment to discuss why Change Data Capture, or CDC, to Snowflake is quickly becoming the preferred method of loading real-time data from transactional databases to Snowflake without impacting source systems.

Snowflake is changing expectations for speed and flexibility of a data warehouse. Snowflake provides a cloud-based data warehouse that enables organizations to store and analyze data using public cloud-based hardware and software on AWS and Microsoft Azure.

However, these benefits of speed and flexibility can be quickly throttled by legacy approaches to moving data into Snowflake. For most companies, their most valuable data – transactional and operational data – is stored on-prem in traditional relational databases or legacy data warehouses. While old-school migrations or batch ETL uploads achieve the objective of moving the data to a target such as Snowflake, these out-of-date, high-latency approaches cannot support the continuous data pipelines and real-time operational decision-making that Snowflake is built for.

Enter CDC to Snowflake, made possible by Striim. The Striim platform enables Snowflake users to quickly and easily leverage low-impact, real-time change data capture, or CDC to Snowflake, moving and processing only the changed data from their existing databases. Moving change data continuously, as new database transactions or events occur, makes it possible for Snowflake users to maintain the real-time data pipelines necessary to feed Snowflake’s fast and flexible storage and analytics solutions.

For the initial load of data to Snowflake, Striim enables zero-downtime, zero-data-loss migration from databases and data warehouses to Snowflake. As an enterprise-grade solution, Striim also features built-in, real-time monitoring to validate that the database transactions have loaded successfully to Snowflake, minimizing risk by ensuring data consistency.

Striim does more than load data into Snowflake and continuously feed it. Striim is also unique in its ability to provide in-flight processing such as filtering, transformations into the desired schema, and data masking. In-memory stream processing minimizes ETL workloads, improves performance, reduces complexity, and facilitates compliance.
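As a hypothetical example of that in-flight masking (the field names and masking rules are illustrative only, not a Striim configuration), sensitive columns can be obfuscated while the event is in motion so that only masked values ever land in the warehouse:

```python
import re

def mask_event(event: dict) -> dict:
    """Mask PII fields in a change event before delivery to the target."""
    masked = dict(event)
    if "email" in masked:
        user, _, domain = masked["email"].partition("@")
        masked["email"] = user[:1] + "***@" + domain
    if "card_number" in masked:
        # Keep only the last four digits of the card number.
        masked["card_number"] = re.sub(r"\d(?=\d{4})", "*", masked["card_number"])
    return masked

print(mask_event({"order_id": 7, "email": "jane@example.com",
                  "card_number": "4111111111111111"}))
# {'order_id': 7, 'email': 'j***@example.com', 'card_number': '************1111'}
```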

Striim offers low-impact, log-based CDC to Snowflake from the following data sources: Oracle, Microsoft SQL Server, MySQL, PostgreSQL, MongoDB, HPE NonStop SQL/MX, HPE NonStop SQL/MP, HPE NonStop Enscribe, and MariaDB. New sources are being added on a regular basis. All of these sources can be accessed via Striim’s easy-to-use CDC Wizards and drag-and-drop UI, speeding delivery of CDC to Snowflake solutions.

For more information on Striim’s CDC to Snowflake offering, please visit our Snowflake solutions page at: striim.com/partners/real-time-data-to-snowflake/

If you’d like a brief demo of CDC to Snowflake, please schedule a demo.


19 For 19: Technology Predictions For 2019 and Beyond

Striim’s 2019 Technology Predictions article was originally published on Forbes.

With 2018 out the door, it’s important to take a look at where we’ve been over these past twelve months before we embrace the possibilities of what’s ahead this year. It has been a fast-moving year in enterprise technology. Modern data management has been a primary objective for most enterprise companies in 2018, evidenced by the dramatic increase in cloud adoption, strategic mergers and acquisitions, and the rise of artificial intelligence (AI) and other emerging technologies.

Continuing on from my predictions for 2018, let’s take out the crystal ball and imagine what could be happening technology-wise in 2019.

2019 Technology Predictions for Cloud

• The center of gravity for enterprise data centers will shift faster towards cloud as enterprise companies continue to expand their reliance on the cloud for more critical, high-value workloads, especially for cloud-bursting and analytics applications.

• Technologies that enable real-time data distribution between different cloud and on-premises systems will become increasingly important for almost all cloud use-cases.

• With the acquisition of Red Hat, IBM may not directly challenge the top providers but will play an essential role through the use of Red Hat technologies across these clouds, private clouds and on-premises data centers in increasingly hybrid models.

• Portable applications and serverless computing will accelerate the move to multi-cloud and hybrid models utilizing containers, Kubernetes, cloud and multi-cloud management, with more and more automation provided by a growing number of startups and established players.

• As more open-source technologies mature in the big data and analytics space, they will be turned into scalable managed cloud services, cannibalizing the revenue of commercial companies built to support them.

2019 Technology Predictions for Big Data

• Despite consolidation in the big data space, as evidenced by the Cloudera/Hortonworks merger, enterprise investment in big data infrastructure will wane as more companies move to the cloud for storage and analytics. (Full disclosure: Cloudera is a partner of Striim.)

• As 5G begins to make its way to market, data will be generated at even faster speeds, requiring enterprise companies to seriously consider modernizing their architecture to work natively with streaming data and in-memory processing.

• Lambda and Kappa architectures combining streaming and batch processing and analytics will continue to grow in popularity driven by technologies that can work with both real-time and long-term storage sources and targets. Such mixed-use architectures will be essential in driving machine learning operationalization.

• Data processing components of streaming and batch big data analytics will widely adopt variants of the SQL language to enable self-service processing and analytics by the users who best know the data, rather than by developers who use APIs.

• As more organizations operate in real time, fast, scalable SQL-based architectures like Snowflake and Apache Kudu will become more popular than traditional big data environments, driven by the need for continual up-to-date information.

2019 Technology Predictions for Machine Learning/Artificial Intelligence

• AI and machine learning will no longer be considered a specialty and will permeate business on a deeper level. By adopting centralized cross-functional AI departments, organizations will be able to produce, share and reuse AI models and solutions to realize rapid return on investment (ROI).

• The biggest benefits of AI will be achieved through integration of machine learning models with other essential new technologies. The convergence of AI with internet of things (IoT), blockchain and cloud investments will provide the greatest synergies with ground-breaking results.

• Data scientists will become part of DevOps in order to achieve rapid machine learning operationalization. Instead of being handed raw data, data scientists will move upstream and work with IT specialists to determine how to source, process and model data. This will enable models to be quickly integrated with real-time data flows, as well as continually evaluating, testing and updating models to ensure efficacy.

2019 Technology Predictions for Security

• The nature of threats will shift from many small actors to larger, stronger, possibly state-sponsored adversaries, with industrial rather than consumer data being the target. The sophistication of these attacks will require more comprehensive real-time threat detection integrated with AI to adapt to ever-changing approaches.

• As more organizations move to cloud analytics, security and regulatory requirements will drastically increase the need for in-flight masking, obfuscation and encryption technologies, especially around PII and other sensitive information.

2019 Technology Predictions for IoT

• IoT, especially sensors coupled with location data, will undergo extreme growth, but will not be purchased directly by major enterprises. Instead, device makers and supporting real-time processing technologies will be combined by integrators using edge processing and cloud-based systems to provide complete IoT-based solutions across multiple industries.

• The increased variety of IoT devices, gateways and supporting technologies will lead to standardization efforts around protocols, data collection, formatting, canonical models and security requirements.

2019 Technology Predictions for Blockchain

• The adoption of blockchain-based digital ledger technologies will become more widespread, driven by easy-to-operate and manage cloud offerings in Amazon Web Services (AWS) and Azure. This will provide enterprises a way to rapidly prototype supply chain and digital contract implementations. (Full disclosure: AWS and Azure are partners of Striim.)

• Innovative new secure algorithms, coupled with computing power advances, will speed up the processing time of digital ledger transactions from seconds to milliseconds or microseconds in the next few years, enabling high-velocity streaming applications to work with blockchain.

Whether or not any of these 2019 technology predictions come to pass, we can be sure this year will bring a mix of steady movement towards enterprise modernization, continued investment in cloud, streaming architecture and machine learning, and a smattering of unexpected twists and new innovations that will enable enterprises to think — and act — nimbly.

Any thoughts or feedback on my 2019 technology predictions? Please share them on Steve’s LinkedIn page: https://www.linkedin.com/in/stevewilkes/. For more information on Striim’s solutions in the areas of Cloud, Big Data, Security, and IoT, please visit our Solutions page, or schedule a brief demo with one of our lead technologists.


Continuously Move Data to Amazon Redshift via AWS Marketplace

Striim’s New Metered Cloud Solution for Streaming Data Pipelines to Move Data to Amazon Redshift Now Available in the AWS Marketplace

 

We are delighted to announce that Striim for Amazon Redshift is now available as a Platform-as-a-Service (PaaS) offering in the Amazon Web Services (AWS) Marketplace to enable companies to migrate and continuously move data to Amazon Redshift in real time. As an AWS Partner Network partner, we make it fast and easy to build streaming data pipelines to move data from a broad range of data sources to Amazon Redshift, speeding adoption of a hybrid-cloud architecture running on AWS.

Running on AWS as a PaaS solution, the Striim platform offers non-intrusive, real-time data collection and movement from databases (including Oracle, SQL Server, HPE NonStop, PostgreSQL, and MySQL), data warehouses (such as Oracle Exadata and Teradata), Salesforce, Amazon S3, log files, messaging systems, sensors, and Hadoop solutions.

While data is streaming, Striim provides in-flight transformations and optimized delivery to Amazon Redshift.

“With Striim, AWS users can move data to Amazon Redshift continuously, and in the right format. Now that Striim for Amazon Redshift is available in the AWS Marketplace, streaming data pipelines to Redshift can be built in minutes using Striim’s data movement wizards. More importantly, Striim supports mission-critical workloads in the most demanding data environments, handling extreme volumes of data with built-in security and reliability for enterprise-grade, operational decision making.”

Alok Pareek
Founder and EVP of Products, Striim

For anyone looking to move data to Amazon Redshift, Striim offers several features and benefits that can maximize the speed and reliability of data migration, continuous data movement, and in-stream processing. Striim:

  • minimizes impact on source databases with non-intrusive change data capture (CDC)
  • simplifies CDC configuration through wizards
  • enables in-flight transformations / visualizations before delivery to Amazon Redshift
  • reduces data latency and on-premises ETL workloads
  • offers optimized interfaces to enable fast data loading to Amazon Redshift
  • provides full context for downstream operations

Amazon Redshift is a fast, scalable data warehouse that makes it simple and cost-effective to analyze all your data across your data warehouse and data lake. Redshift uses machine learning, massively parallel query execution, and columnar storage on high-performance disk to deliver high performance for cloud analytics.

For more information about Striim’s platform-as-a-service offering to move data to Amazon Redshift, please visit https://www.striim.com/partners/striim-for-aws/, or provision Striim for Amazon Redshift in the AWS Marketplace.


Continuous Data Movement and Processing for Hybrid Cloud

Striim enables continuous real-time data movement and processing for hybrid cloud, connecting on-prem data sources and cloud environments, as well as bridging a wide variety of cloud services across Microsoft Azure, AWS, and Google Cloud platforms. With in-memory stream processing for hybrid cloud, Striim allows you to store only the data you need, in the format you need. And Striim’s built-in delivery validation and data pipeline monitoring ensure pipeline health and replication verification in real time.

Why Striim for Data Movement and Processing for Hybrid Cloud

Striim automates and simplifies real-time streaming data pipelines for cloud environments. With its non-intrusive change data capture feature, Striim extracts real-time data without slowing down source transactional databases. Striim enables cloud migration with zero database downtime and minimized risk, and feeds real-time data to targets with full context – ready for rich analytics on the cloud – by performing filtering, transformation, aggregation, and enrichment on data in-motion.

 

Striim can run either on-premises or in Azure, AWS, and Google Cloud environments as a Platform-as-a-Service (PaaS) offering, allowing a flexible data management architecture.

  • Offload operational workloads to cloud by easily moving data across different cloud service providers: Azure, AWS, and Google Cloud
  • Filter, aggregate, transform, and enrich your data in-motion before delivering to the cloud in order to optimize cloud storage
  • Migrate your data to the cloud without interrupting business operations
  • Minimize risk of cloud migrations with real-time, built-in data delivery validation and data pipeline monitoring

Use Case: A European Express Parcel Company

This leading courier company in Europe enabled real-time data movement and processing for hybrid cloud solutions with the help of Striim. The company is moving its data warehousing and analytics solutions to the cloud, and uses Striim to move real-time data from transactional systems running on Oracle databases to Google BigQuery to enable cloud-based analytics. Google BigQuery serves as the operational data store supporting real-time reporting and ad-hoc queries. The company plans to use real-time transactional data for fleet optimization and real-time shipment status notifications to customers.

  • Moved their operational data store (ODS) to the cloud by ensuring up-to-date transactional data is available in the cloud
  • Eliminated the performance impact of running ad-hoc queries on the production OLTP systems
  • Supports analytics users with timely data in a flexible and future-ready, cloud-based ODS and data warehouse solution

Use Case: Leading Canada-Based Global Bank

The leading Canadian retail bank has adopted a cloud-first strategy and wanted to move its financial reporting application to Azure. Striim, running on Azure, continuously captures existing data and newly arriving change data from Oracle database and HPE NonStop systems (without impacting their performance), processes the data while in-flight, and delivers to Azure Event Hubs in real time. Azure Event Hubs supports their operational reporting and other new applications in the cloud. Bank employees can now have 24/7 access to current financial information and make critical risk management and other operational decisions based on real-time data, versus the old method of using day-old data.

  • Moves transactions from HPE NonStop to Azure Event Hubs in real time to run financial reporting in Azure
  • Employees now make critical risk management and other operational decisions based on real-time data, versus the old method of using day-old data
  • Easily executes on their cloud-first strategy by feeding real-time data to Azure to support new applications

To learn more about data movement and processing for hybrid cloud, please visit our Hybrid Cloud Integration solutions page, schedule a brief demo with a Striim technologist, or download a free trial of the Striim platform and try it for yourself!