Data iPaaS for Simple and Complex Data Flows – Part 3

Now that we’ve covered how to get up and running with applications, let’s take a look at data iPaaS for simple and complex data flows.

What this application is going to do is read real-time changes from the MySQL database as they’re happening and deliver them as-is into Cosmos DB. Because we’re using Cosmos DB as a document database, we convert each change record to JSON on the way in. This is a very simple example, showing how easy it is to use a wizard to set up change data capture (CDC).
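To make the "deliver as-is, converted to JSON" step concrete, here is a minimal Python sketch. It assumes a hypothetical change-event shape (an operation type, a source table name, and the row's column values); Striim's actual event format and its Cosmos DB writer are not shown here. The sketch only illustrates flattening one change record into a JSON document.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ChangeEvent:
    """Hypothetical CDC record: the operation plus the row's column values."""
    operation: str                      # "INSERT", "UPDATE" or "DELETE"
    table: str                          # source table (illustrative name)
    data: Dict[str, Any] = field(default_factory=dict)


def to_json_document(event: ChangeEvent) -> str:
    """Serialize a change event as-is into a JSON document for a document store."""
    doc = {
        "operation": event.operation,
        "table": event.table,
        **event.data,                   # row columns become top-level document fields
    }
    return json.dumps(doc, sort_keys=True)


event = ChangeEvent("INSERT", "shop.orders", {"order_id": 1001, "amount": 25.5})
print(to_json_document(event))
# {"amount": 25.5, "operation": "INSERT", "order_id": 1001, "table": "shop.orders"}
```

Keeping the operation type in the document lets a downstream consumer tell inserts, updates, and deletes apart even though the row data is passed through unchanged.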

You can actually handle much more complex data flows. Taking MySQL again and writing into Google Pub/Sub is a very similar type of application to the one above. Something a little more complicated is MySQL CDC writing into MongoDB. This is more complicated because we’re not just moving the raw data; we’re transforming that data as it moves. You can build these transformations using a drag-and-drop UI.

You can also do transformations using SQL. A lot of our customers prefer to use SQL because it’s more powerful. In this case, we’re taking change data from MySQL as it’s happening: all of the inserts, updates, and deletes in this database are being pushed into this data stream. We’re also loading some reference information into memory: data from a product table in a MySQL database, and information from a locations CSV file, each go into an in-memory cache. If you run Striim in a cluster, these are distributed, replicated, scalable caches of data that can be refreshed. They can also be updated through change data capture so that they are continually up to date.
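The reference caches described above can be pictured as keyed lookup tables that are either fully reloaded or kept current from change events. The Python sketch below is single-process only; it does not model Striim's distribution or replication, and the class name, key columns, and sample rows are hypothetical. The CSV is read from an in-memory string so the example is self-contained.

```python
import csv
import io
from typing import Any, Dict, Iterable, Mapping, Optional


class KeyedCache:
    """Minimal in-memory lookup cache keyed on one column (hypothetical, single process)."""

    def __init__(self, key: str):
        self.key = key
        self._rows: Dict[Any, Dict[str, Any]] = {}

    def load(self, rows: Iterable[Mapping[str, Any]]) -> None:
        """Full (re)load, e.g. from a reference-table query or a CSV file."""
        self._rows = {row[self.key]: dict(row) for row in rows}

    def apply_change(self, operation: str, row: Mapping[str, Any]) -> None:
        """Keep the cache current from a change event instead of reloading it."""
        if operation == "DELETE":
            self._rows.pop(row[self.key], None)
        else:  # INSERT or UPDATE: upsert the latest version of the row
            self._rows[row[self.key]] = dict(row)

    def get(self, key_value: Any) -> Optional[Dict[str, Any]]:
        return self._rows.get(key_value)


# Reference data from a (hypothetical) locations CSV file; CSV values load as strings.
locations_csv = io.StringIO("location_id,city\n1,Palo Alto\n2,London\n")
locations = KeyedCache("location_id")
locations.load(csv.DictReader(locations_csv))

# Reference data that would come from a product table; inlined here for illustration.
products = KeyedCache("product_id")
products.load([{"product_id": 10, "name": "Widget", "price": 9.99}])

# A CDC update on the product table refreshes the cached row in place.
products.apply_change("UPDATE", {"product_id": 10, "name": "Widget", "price": 8.99})
print(locations.get("1"), products.get(10))
```

Applying individual changes avoids reloading the whole table each time the reference data changes, which is the point of feeding the caches from CDC.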

We join that information with the real-time data stream using a SQL query. This SQL joins the stream against the two caches and does some data conversion: it converts some fields into integers, uses a CASE statement to decide what to do based on whether the database sent an insert, update, or delete, performs a date conversion, and so on. So you can see it’s doing quite a lot of transformation of that data.
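The logic of that query can be sketched in Python to show what each part does per event: a lookup join against both caches, integer casts, a CASE-style mapping on the operation type, and a date conversion. This is not the SQL from the demo; the field names, the action labels, the timestamp format, and the inner-join behavior (dropping unmatched events) are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Any, Dict, Optional

# Hypothetical reference caches, keyed the way the stream looks them up.
PRODUCTS = {10: {"name": "Widget", "price": 9.99}}
LOCATIONS = {1: {"city": "Palo Alto"}}

# Equivalent of: CASE operation WHEN 'INSERT' THEN 'create' ... ELSE 'unknown' END
ACTIONS = {"INSERT": "create", "UPDATE": "modify", "DELETE": "remove"}


def enrich(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Join one change event against both caches, roughly like a stream-to-cache SQL join."""
    row = event["data"]
    product_id = int(row["product_id"])    # string -> integer conversion
    location_id = int(row["location_id"])
    product = PRODUCTS.get(product_id)
    location = LOCATIONS.get(location_id)
    if product is None or location is None:
        return None                         # inner-join semantics: drop unmatched events
    # Date conversion: parse the source timestamp string, emit ISO 8601.
    order_time = datetime.strptime(row["order_time"], "%Y-%m-%d %H:%M:%S")
    return {
        "action": ACTIONS.get(event["operation"], "unknown"),
        "product_id": product_id,
        "product_name": product["name"],
        "city": location["city"],
        "quantity": int(row["quantity"]),
        "email": row["email"],
        "order_time": order_time.isoformat(),
    }


sample = {
    "operation": "UPDATE",
    "data": {"product_id": "10", "location_id": "1", "quantity": "3",
             "email": "jane.doe@example.com", "order_time": "2019-05-01 12:30:00"},
}
print(enrich(sample))
```

The output record has a fixed set of named fields, which is what the next paragraph means by the enriched stream having its own structure.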

Moving on, this enriched data stream now has its own specific fields, so the query has transformed the structure of the data stream. We’re then running it through another step that masks email addresses. There’s an email address in each record that we want to obfuscate because of GDPR or other privacy regulations, and we don’t want to push it into analytics. So we anonymize the email address as it passes through, using this mask component. The nice thing about this is that not only is it GUI-driven (you choose what you want to do and which function to apply), but you can also see the SQL underneath, which helps you learn how to write SQL. We’re then converting the records into JSON and delivering them into MongoDB.
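As an illustration of the masking step, here is a small Python sketch that partially masks an email address before the record is serialized to JSON. The specific rule (keep the first character of the local part and the full domain) is an assumption chosen for readability; it is not necessarily the function the mask component applies. Depending on your privacy requirements, a fully redacted or hashed value may be more appropriate.

```python
import json
import re

EMAIL_RE = re.compile(r"^([^@]+)@(.+)$")


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain; hide the rest."""
    match = EMAIL_RE.match(email)
    if match is None:
        return "*" * len(email)     # not an email address: mask it completely
    local, domain = match.groups()
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def mask_and_serialize(record: dict) -> str:
    """Mask the (assumed) 'email' field, then convert the record to JSON."""
    masked = dict(record)           # leave the caller's record untouched
    if "email" in masked:
        masked["email"] = mask_email(masked["email"])
    return json.dumps(masked)


print(mask_and_serialize({"customer_id": 7, "email": "jane.doe@example.com"}))
# {"customer_id": 7, "email": "j*******@example.com"}
```

Masking before serialization means the unmasked address never reaches the target or any analytics built on it.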

So, as you can see, it’s very straightforward to build these arbitrarily complex data flows. You can also lock all of this down with security controls. As I mentioned, you can allow only a certain user to administer and modify an application, while everyone else only has access to the end data stream. If those users want to build applications from that stream (real-time analytics applications, for example), or write this data into other targets (such as MongoDB, Azure, Google Spanner, etc.), they can do that. However, they’re not allowed to access the raw data stream, because it contains personally identifiable information.
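The access model described above can be sketched as a simple per-resource permission table. This is a generic illustration, not Striim's security API: the user names, the resource names ("app", "raw_stream", "masked_stream"), and the action strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AccessPolicy:
    """Hypothetical permissions: resource name -> user -> set of allowed actions."""
    grants: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)

    def grant(self, user: str, resource: str, *actions: str) -> None:
        self.grants.setdefault(resource, {}).setdefault(user, set()).update(actions)

    def allowed(self, user: str, resource: str, action: str) -> bool:
        # Anything not explicitly granted is denied.
        return action in self.grants.get(resource, {}).get(user, set())


policy = AccessPolicy()
# Only the administrator may modify the application or read the raw CDC stream (contains PII).
policy.grant("admin", "app", "read", "modify")
policy.grant("admin", "raw_stream", "read")
# Other users may only consume the masked end stream.
for analyst in ("analyst1", "analyst2"):
    policy.grant(analyst, "masked_stream", "read")

print(policy.allowed("analyst1", "masked_stream", "read"))  # True
print(policy.allowed("analyst1", "raw_stream", "read"))     # False
```

Denying by default keeps the raw, PII-bearing stream closed unless someone is explicitly granted access.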

I hope you enjoyed this brief tutorial on working with simple and complex data flows. To learn more about the Striim platform, read our Striim Platform Overview data sheet, set up a quick demo with a Striim technologist, or provision the Striim platform as an iPaaS solution on Microsoft Azure, Google Cloud Platform, or Amazon Web Services.

If you missed it or would like to catch up, read parts 1 and 2 of our data iPaaS series, “The Striim Platform as a Data Integration Platform as a Service” and “Building Data iPaaS Applications with Wizards Using Striim.”