Striim Hazelcast Hot Cache Powered by Striim CDC Webinar

In this webinar, Striim and Hazelcast introduce the new Hot Cache functionality. This enables a Hazelcast cache to be continually updated by changes from a database, ensuring it is always up to date. This solves the cache staleness issue that occurs when databases change outside of the data path that includes the cache.

Try Striim for Hazelcast Hot Cache for free here.

To learn more about the Striim platform, visit our platform overview page.


Unedited Transcript: 

Hi everyone and thanks for joining today’s Webinar. I am Ben Sawyer from Hazelcast and I am administrating the Webinar today. Today we’re going to be talking about keeping your cache hot with real-time, push-based synchronization, and this is a solution from Hazelcast and Striim. Today our presenters are Steve Wilkes, who’s the founder and CTO of Striim, and Victor Gamma, who is a senior solutions architect at Hazelcast. And without further ado, I’m going to turn it over to Victor.

Good morning guys. I hope you are doing well today and you’re ready to learn something about hot cache. So being a solutions architect, I’ve worked on a bunch of projects with our customers, and in many use cases, I would say like 75% of the use cases that I encounter, people are trying to speed up certain aspects of their applications. In particular, they want to speed up the access to underlying databases. So in this webinar, in this session, we’re going to use this example of an online store, which is a pretty good example for explaining certain concepts of development and the typical three-layer architecture. We have the database where we store inventory and where we store our stuff, an application server that accesses this data, and Hazelcast might be used as the cache to speed up the access to certain data. And obviously the data can be accessed through different clients.

So in this particular situation, it works pretty well. However, in many use cases we’re dealing with some legacy applications, maybe it’s batch processing or maybe some old Excel app or some Delphi app or some God-knows-what system that also plays with that same data. And in this particular picture we might encounter a situation where data that was written directly to the database will not be reflected one-to-one in the cache. At this particular moment we have pull-based synchronization. We can enable different eviction policies, or we can have some jobs that would run and refresh the cache after a certain time. However, that creates a time when the data is not available in the cache. It will create a so-called window of inconsistency, when new data is present in the database but it’s not available in the cache and it’s not available for applications to consume.

So in this particular scenario, we need to introduce something, and we’re going to talk about that with you today, that allows us to do push-based synchronization. So once the data is written to the database, a change data capture solution is going to handle the transfer of this updated data in a push-based architecture, or in a push-based approach. So you will not have this window of inconsistency, and the data in your cache will always be hot and available. So from this point, I will pass my microphone to Steve, who’s going to talk about Striim and the Striim integration with Hazelcast that provides this solution for you. Steve.
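The difference between the two approaches can be made concrete with a small sketch. The Python below is only an illustration: plain dicts stand in for the database and the Hazelcast cache, and the function names (`legacy_write`, `pull_refresh`, `push_write`) are invented for this example and are not part of Hazelcast or Striim.

```python
# Hypothetical in-memory stand-ins for the database and the cache.
database = {"sku-1": 10}
cache = dict(database)  # the cache starts in sync with the database

def legacy_write(key, value):
    """A legacy app writes straight to the database, bypassing the cache."""
    database[key] = value

def pull_refresh():
    """Pull-based sync: a scheduled job reloads the cache from the database."""
    cache.clear()
    cache.update(database)

def push_write(key, value):
    """Push-based sync: the captured change is forwarded to the cache at once."""
    database[key] = value
    cache[key] = value  # stands in for the CDC pipeline delivering the change

legacy_write("sku-1", 7)
stale_during_window = cache["sku-1"]   # still the old value until the job runs
pull_refresh()
fresh_after_refresh = cache["sku-1"]   # correct only after the refresh

push_write("sku-1", 5)
fresh_after_push = cache["sku-1"]      # correct immediately, no window
```

The time between `legacy_write` and `pull_refresh` is the window of inconsistency; in the push-based path there is no such gap to wait out.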

Thank you, Victor. And good morning everyone. So we’ll start off with just a little bit about Striim so you can understand our architecture and how it works, and then we will delve into the actual solution. So Striim is a full end-to-end streaming integration and analytics platform that allows you to make use of your data as soon as it’s created, and as an end-to-end platform, it has all of the plumbing needed to build these applications with very little work, very little coding. We’ve built it to be an enterprise-grade platform, so it scales. It’s a distributed, clustered solution that scales, that has recovery and failover built in, that has built-in security from end to end, and that allows you to collect data from a variety of different sources, to process and transform that data, to run analytics on that data, to deliver it to a variety of different targets, and then to build real-time streaming dashboards and alerting and that kind of thing on it.

So we partnered with Hazelcast to build this solution that uses a portion of our platform in order to keep caches synchronized with underlying databases. And we do that by utilizing the data collectors, the processing and the data delivery. So on the collection side, we can collect from things like sensors, message queues, log files, and do all of that in real time in a streaming fashion. But we can also deal with databases. Now, databases are typically thought of as a historical record of what’s happened in the past. But by using a technology called change data capture, you can see all the changes as they’re happening in a database. And what you end up with is a change stream, a stream of the updates that are happening within the database. Now, within our platform, you can take that stream or any other stream and run it through a set of continuous queries to process that data.

All those queries are written in a SQL-like language that enables you to do things like filtering, transformation and aggregation. You can add reference data and enrich that data. Then you can deliver it to a target, so you can push data into databases, files and message queues to enable hybrid cloud integration and big data integration, and, in this instance, also Hazelcast. It doesn’t stop there with the platform, though. You can, in addition, build things like complex event processing to do pattern matching. You can do correlation, you can do anomaly detection and statistical analysis. You can store the results, you can build real-time streaming dashboards and you can generate alerts and triggers. And what we’ve done is we’ve enabled the platform to be used by a variety of different users. For the data flows that you use to collect and process and deliver that data, you can use the UI to drag and drop and build those data flows.

You can use the UI to create your analytics, to deploy, visualize and monitor all of those things. And we also have a command line that takes a scripting language that also enables you to do those things. So if you utilize our platform, you can use change data capture from the database and push those changes live into the cache, with the result that the cache is then synchronized with the underlying database within milliseconds. So in real time. And so in the inventory example, you can imagine that you’re viewing the website and it’s showing inventory. And if you have an underlying database change to that inventory, you could be showing too much or too little stock. And obviously that can have impacts on the business. If you show too much stock, people might think they’ve bought something when they haven’t really; if you show too little, then they might not buy it and the sales guys aren’t happy.

But using change data capture, you will always keep the inventory up to date. And that goes for anything that you may want to store in a cache in order to speed up your application. So how does change data capture work? Okay, well, virtually every database writes the transactions, the operations that happen, into transaction logs, and it does this for recovery and housekeeping and other purposes. Instead of doing SQL queries against the database, using timestamps to see what’s changed since you last looked, or using triggers against the database, which have a big impact on the actual database operation, change data capture reads the transaction logs. So it’s very non-intrusive and has very little impact on the underlying database. It enables you to collect the operations as they occur, which means that every insert, update and delete that happens against that database gets turned into an event in our platform. So they become events on a stream within that platform that can then be used to update the cache and synchronize it in real time.
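To make the idea of "every insert, update and delete becomes an event that updates the cache" concrete, here is a minimal sketch in Python. It is not Striim's event format or the Hazelcast API: the event dicts, their field names (`op`, `key`, `data`) and the `apply_change` helper are assumptions made up for this illustration, and a plain dict plays the role of the cache.

```python
def apply_change(cache, event):
    """Apply one change event (a plain dict here) to a dict acting as the cache."""
    op, key = event["op"], event["key"]
    if op in ("INSERT", "UPDATE"):
        cache[key] = event["data"]      # put the latest row image
    elif op == "DELETE":
        cache.pop(key, None)            # removing a missing key is harmless
    else:
        raise ValueError(f"unknown operation: {op}")

# A hypothetical change stream, in the order the operations were committed.
change_stream = [
    {"op": "INSERT", "key": 1, "data": {"sku": "A-1", "stock": 10}},
    {"op": "UPDATE", "key": 1, "data": {"sku": "A-1", "stock": 8}},
    {"op": "INSERT", "key": 2, "data": {"sku": "B-2", "stock": 3}},
    {"op": "DELETE", "key": 2, "data": None},
]

cache = {}
for event in change_stream:
    apply_change(cache, event)
```

Because the events are applied in commit order, the cache ends up holding exactly the rows that remain in the table.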

Okay. So in this example, we’re going to do something very simple. We have inventory information in a database table, and it’s represented in the cache as a product Java class. Okay. And we’ve defined a mapping between these two using an object-relational mapping XML file that basically maps the table, and the columns within the table, to the attributes within the Java class. So how do we go about building this using the Striim platform? Pretty recently in our platform, we added the notion of wizards to speed up this particular development process. And we did this because there are a few crucial steps that you need to do with change data capture that typically involve you reading a manual, and not everyone did that. So we’ve automated those steps and made it easy for people to understand what’s actually required of the database in order to achieve change data capture.
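The mapping step can be pictured with a short sketch. The webinar uses an object-relational mapping XML file and a Java class; the Python below only mirrors the idea, and the column names, attribute names and the `row_to_product` helper are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Product:
    """Stand-in for the cached product class; the fields are hypothetical."""
    sku: str
    name: str
    stock: int

# Column name -> attribute name. Both sides are invented for this example.
COLUMN_TO_ATTRIBUTE = {"SKU": "sku", "PRODUCT_NAME": "name", "STOCK_QTY": "stock"}

def row_to_product(row):
    """Build a Product from a row dict by renaming columns to attributes."""
    kwargs = {attr: row[col] for col, attr in COLUMN_TO_ATTRIBUTE.items()}
    return Product(**kwargs)

product = row_to_product({"SKU": "A-1", "PRODUCT_NAME": "Widget", "STOCK_QTY": 10})
```

A missing column raises a `KeyError` here, which is roughly the kind of mismatch an up-front validation of the mapping is meant to catch before any data flows.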

And, you know, for MySQL it’s pretty straightforward. For example, you need to have the binary log turned on, and you need to have users with the correct privileges set up that can actually read from that binary log. With, you know, Oracle and MS SQL, there are different requirements. So, for example, you need to have supplemental logging turned on in Oracle to do change data capture. So as you step through using the wizard, you enter the connectivity information for your database, and then we go through a set of pre-checks to make sure that you can connect to the database, that you have the required privileges, and that the change data capture configuration of the database is set up correctly, that all of the pieces that you have are configured properly, so you know that change data capture is going to work. Then we make sure that we can read table metadata in order to provide you information about the database tables.

So once you’ve connected to the database and those pre-checks have passed, you can then select the tables that you want, and that can be one or more tables. In this case, it’s just a product inventory table, and that will result in a change data stream. So all of the operations that occur on all of the tables that you’ve selected will be written into the same change data stream. There are additional configurations that you can do using this wizard, where you can also create specific streams for specific tables. This is kind of a replication scenario, where you are ensuring that the data in the database is reflected in the cache. But in other examples you may want to do analytics on a particular table or do some specific modifications. And what this is going to do internally is actually create specific data streams with specific data types for the tables that you have selected.

So once you’ve got your change data stream, you can now configure the Hazelcast connection and the object-relational mapping file that I showed you earlier, which will do the mapping from the database into the caches that you need to keep updated. So you specify the connection into Hazelcast, and there are multiple ways you can do this. Here we’re using a cluster name, password and the URL, and we will then validate that the object-relational mapping file works with the Java class specified, that that class is available, that the types match, et cetera. So we’ll do all that validation for you, and then we’ll allow you to generate the Hazelcast target, where you map the table to a particular cache name. You basically say, okay, this is the name of the cache that I want to generate and keep updated based on the tables coming in.

This results in a data flow, and this is probably the simplest type of data flow you’ll ever see in the Striim platform. It is a simple change data capture with a single target that is writing out to Hazelcast. And when this is deployed and running, all the changes that are happening will be pushed into Hazelcast in real time. You could do more things in between these. So, for example, if you wanted to load reference data into memory, join it with the change data and create bigger objects that you needed to push into the cache, you could do that here. There are a lot of options. This is kind of the simplest use case. Okay. In addition to doing that change data capture, you can also build a monitoring data flow, and you can do this using a template application that we have. And you can build on this. You can extend it, you can do whatever you want to analyze the data flow and have some real-time tracking. You could add alerting in here. So if you wanted to flag when transaction rates went over a particular volume, you could flag that. Or if the rate dropped to zero, you could flag that. So there are a lot of things you can do on the analytics side.

So let’s see what this all looks like when it’s actually running. Okay. And we’ll do that by utilizing application flows within the Striim platform. Now, the first thing we’re actually going to do here, because we have an empty Hazelcast cache, is an initial load. So we’re going to use a Striim product to actually load the data into the cache. So let’s take a look at what we have on this side. There’s a sample program that we can make available that is creating a table within a MySQL database. And you can see from this select statement that the table currently has zero rows in it. Over on this side we have a Hazelcast instance, and the instance also has a little command line put around it. So we can actually do things like list the data in the caches, and there’s just a test cache that has one row of data in it, just to make sure that everything’s working.

We don’t actually have the real cache yet, so we can use this database program here to add some data into the database table. So we’re going to insert data into this table, a hundred thousand records at a time. And obviously this takes a little bit of time to run. It’s just added 100,000 rows there. Let’s add another hundred thousand rows. And the result of this will obviously be a table with 200,000 inventory items in it. And we can check that that’s the case just by doing a quick count. So you can see the table now has 200,000 rows. Okay. And we’ll just verify again that we still have nothing over here. This is kind of like a magician showing that he has nothing up his sleeves. So there’s nothing in the caches yet. If we go here now, we’ll deploy this application and start it up.

This is going to do a query against that table and it’s going to push the data over into Hazelcast. And because it’s just 200,000 rows, it happens really quickly. So that should be complete. We will be able to go back to our command line, and you can see the connection happened there. Let’s just do a list. And so we can see here, we now have 200,000 rows in the product inventory map, which didn’t exist before. And, for interest, this is the first entry in that. So that’s the simple case. We’ve just done an initial load of the cache. What we’re going to do now is start the change data capture application.

Okay. Which is that data flow that you saw in the presentation. So you can see this is the CDC and this is the delivery into Hazelcast, and in between these two, out of interest, is a data stream. This is a general-purpose data stream that contains a general-purpose structure. That’s why it can handle any type of table, or multiple tables coming in, because it just has metadata that tells you what table, what operation type, what transaction ID, a whole bunch of things like that. And then it has the data for inserts, updates, deletes, etc. And for updates you can also include the before image. So if data changed, you can actually see what it used to be and what it is now as well. So this is the simple data flow doing change data capture. Let’s deploy this application and start this one up.
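The general-purpose structure described here (metadata, data, and an optional before image) can be sketched as follows. This is a hedged illustration in Python, not Striim's actual event type: the `ChangeEvent` class, its field names, the metadata keys and the `changed_columns` helper are all assumptions for this example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    """Hypothetical generic change event; not the platform's real type."""
    metadata: dict                   # e.g. table name, operation type, transaction id
    data: dict                       # row image carried by the event
    before: Optional[dict] = None    # row image before the change, for updates

def changed_columns(event):
    """Return {column: (old, new)} for columns whose value differs from the before image."""
    if event.before is None:
        return {}
    return {
        col: (event.before.get(col), new)
        for col, new in event.data.items()
        if event.before.get(col) != new
    }

update = ChangeEvent(
    metadata={"table": "PRODUCT_INV", "operation": "UPDATE", "txn_id": 42},
    data={"sku": "A-1", "stock": 8},
    before={"sku": "A-1", "stock": 10},
)
diff = changed_columns(update)
```

Because the table name and operation type travel in the metadata rather than in the type itself, one stream of such events can carry changes from any number of tables.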

So now that’s started, it’s listening to the database and any changes are going to be reflected instantly in Hazelcast. Let’s go back to our application over here and run some modifications. So these are modifications. It’s a random program, so it’s generating random inserts, updates and deletes against the table, and it’s tracking the average rate. We can see the rate here is matching, so it’s keeping up with the rate that’s happening over here. And I’m not gonna do too many, because it’s kind of boring to just look at these numbers scrolling past. If I do a stop, you can see that it’s now done 47,301 modifications. If I do a count.

Okay, this tells us that the table, because of the inserts, updates and deletes, now has 188,183 rows. If we go over to Hazelcast and we do a list, you can see that the cache also has 188,183 items in it. But, you know, obviously something may have got messed up. So let’s just do a check. There’s a feature in this program as well where you can actually just do a dump, and this is going to output all of those rows into a file so we can verify it. And I can do the same over here on the Hazelcast side and dump out the entries in the cache, also to a file.

Okay. And if I have a look in here, I can do a diff of the Hazelcast dump file against the SQL dump file. And you can see that they’re identical. Obviously I could have cheated there with empty files, so I’ll just show you: there’s the Hazelcast dump, there’s the SQL dump, so you can see that they are actually identical. So that’s kind of the basics of doing change data capture and delivery into Hazelcast and keeping Hazelcast up to date. The other thing that you can do, as I mentioned in the presentation, is build a data flow that will also monitor what’s going on, and you can do custom monitoring. You could even do custom analytics. You could even drill down into the data that is flowing through if you wanted to do very specific analytics on your application.
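The dump-and-diff check from the demo can be approximated in a few lines. The sketch below is an assumption-laden stand-in: dicts replace the database table and the cache, `dump` and `dumps_match` are invented helper names, and only Python's standard `difflib` module is used. It also guards against the "empty files" caveat by treating two empty dumps as inconclusive.

```python
import difflib

def dump(rows):
    """Render rows as sorted text lines so two dumps are directly comparable."""
    return [f"{key},{value}\n" for key, value in sorted(rows.items())]

def dumps_match(db_rows, cache_rows):
    """True only if both dumps are non-empty and a unified diff finds no differences."""
    db_lines, cache_lines = dump(db_rows), dump(cache_rows)
    if not db_lines and not cache_lines:
        return False  # two empty dumps prove nothing
    return not list(difflib.unified_diff(db_lines, cache_lines))

in_sync = dumps_match({1: "widget", 2: "gadget"}, {2: "gadget", 1: "widget"})
out_of_sync = dumps_match({1: "widget"}, {1: "wodget"})
```

Sorting before comparing matters because a cache listing and a SQL query have no reason to return entries in the same order.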

And what this is doing is running against that same change data stream that we had coming in from the MySQL server, and it’s doing some queries. You can see this is a simple SQL query, and it’s pulling out the metadata from that change stream. It’s then doing things like calculating the lag and doing aggregates over moving windows to give you a view of what’s going on in the change data capture stream in real time. So this is a data flow that’s generating some analytics. So if I deploy this and I start this and then redo those modifications, okay, I can visualize all of that on a dashboard. So if I take a look at the change data capture dashboard, this is a live view. The data’s being pushed into the dashboard in real time, showing what’s happening in those modifications.
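As a rough picture of what "calculating the lag and doing aggregates over moving windows" involves, here is a small Python sketch. It does not reproduce the platform's query language; the event fields (`committed_at`, `seen_at`, in seconds) and the two helper functions are assumptions made for this example.

```python
from collections import Counter

def average_lag(events):
    """Mean delay between a change being committed and being seen on the stream."""
    lags = [e["seen_at"] - e["committed_at"] for e in events]
    return sum(lags) / len(lags) if lags else 0.0

def window_counts(events, now, window_seconds):
    """Count operations by type among events seen in the last `window_seconds`."""
    recent = [e for e in events if now - window_seconds < e["seen_at"] <= now]
    return Counter(e["op"] for e in recent)

# Hypothetical events with timestamps in seconds.
events = [
    {"op": "INSERT", "committed_at": 100.0, "seen_at": 100.5},
    {"op": "UPDATE", "committed_at": 101.0, "seen_at": 101.5},
    {"op": "UPDATE", "committed_at": 108.0, "seen_at": 108.25},
    {"op": "DELETE", "committed_at": 109.0, "seen_at": 109.25},
]

lag = average_lag(events)
counts = window_counts(events, now=110.0, window_seconds=5.0)
```

Re-evaluating `window_counts` as `now` advances gives the moving-window effect: older events fall out of the count without being deleted anywhere.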

So you can see these are the inserts, updates and deletes that are going on. This is the transaction rate and the transaction lag over time. This is another view of that, and these are some of the most recent transactions that have actually occurred. So you can drill into that as well. Now this dashboard, like everything else in the platform, is not hard-coded. This is something that you can build. We have a dashboard builder over here where you can drag and drop visualizations into the dashboard, and each one of these visualizations is powered by a query that is pulling the information from that data flow at the back end and then visualizing it. And there’s a special syntax here, this push, that tells us that this is a live dashboard. So it’s basically going to have data delivered to it in real time from the back end.

And you basically map the data in the results of that query to, in this case, a bar chart, by dragging and dropping fields. And then you actually have your visualization in the dashboard. So that’s how you build not only the end-to-end data flow in our platform, but also the visualization. So let’s just stop this from running, and I can very quickly show you what the wizards look like as well. So if I go into apps, you can build new applications very easily just by doing add application. This is how you get to those wizards. I won’t go through the whole wizard because you’ve seen that already in the slides, but you can see here that if I filter the targets by Hazelcast, these are the various Hazelcast CDC targets that we have. And when you click on one of these, it asks you for a name and then takes you through all of those steps that I mentioned before.

Just enter all of these things. And then hopefully I didn’t make a typo. Forgive me if I did. Nope, I made a typo. You can see that if you get things wrong, then it actually shows you everything you need to do to get it right. I think it’s actually 3306.

So yeah, I did that deliberately just to prove that it checks everything as you’re entering it. So that’s an example of how you’d actually use the wizard, and then you’d go through and you’d select the tables, you would enter the Hazelcast information and you would end up with your full working data flow.

So that’s a very quick introduction to how you build the Hazelcast hot cache using the Striim platform and we will now open it up to questions and I’ll hand it over to Ben to guide you through this process. Great. Thanks to you.

All right. So to ask your questions, please use the Webex Q and A box. If you’re out of full screen mode, you’ll see it in the upper right hand corner of your Webex interface. And if you’re in full screen mode, you can just hover over the bar at the top and you can access it that way as well. So please just go ahead and type your questions in there, and Steve and Victor can answer those.

So there’s a question about, if there’s a network glitch between Striim and the MySQL server, how does it recover? So the Striim platform has recovery built into it. And the way that we do that is basically through checkpointing and remembering state during the checkpoint. And in this particular situation that is reasonably straightforward, because you have a single source and a single target. In more complex scenarios, where you have windows involved or pattern matching or content buffers or any of those other things, you need to recover those things to the point at which you left off. So our recovery process, the checkpointing, is actually reasonably intelligent in that it takes all of those things into account. But the simple answer to the question is yes. Within that checkpointing, we will remember where we were in the transaction log. And in the case of a failure, we will restart from where we left off. Now, that may not always be possible, for example if you think that your log archiving scheme is not sufficient, so that if you had a big outage you would not be able to rewind back into the archives.

Then we also have an additional option of making streams persistent. So that data stream that’s between the MySQL CDC and Hazelcast, you can optionally turn that into a persistent stream. And what we do behind the covers is we integrate and bundle and ship Kafka as part of our platform. If you have your own Kafka cluster, you can utilize that, but we can also utilize ours to store historically any transactions we’ve seen. And we’ll rewind into that instead. So there are two options there. Either you rewind into the archive logs, or you rewind into an intermediate Kafka stream, but we can absolutely handle any kind of network glitch, any kind of failure, automatically. Okay. Let’s see, Victor, are there any questions that you want to answer at this point? I’m just scrolling through.
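The recovery idea described in this answer, remembering a position and restarting from it, can be sketched in miniature. The Python below is an illustration under stated assumptions, not the platform's recovery mechanism: a list stands in for the transaction log (or a persistent stream), an integer index is the checkpoint, and the `run` function with its `fail_at` parameter is invented to simulate a glitch.

```python
def run(log, cache, checkpoint, fail_at=None):
    """Deliver log entries starting at the checkpointed position.

    Returns the new checkpoint. `fail_at` simulates a network glitch
    just before the entry at that position would be delivered.
    """
    position = checkpoint
    while position < len(log):
        if position == fail_at:
            return position            # connection lost; the checkpoint survives
        key, value = log[position]
        cache[key] = value             # an idempotent put, safe to repeat
        position += 1                  # advance only after a successful delivery
    return position

log = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
cache = {}

checkpoint = run(log, cache, checkpoint=0, fail_at=2)   # glitch after two entries
cache_after_glitch = dict(cache)

checkpoint = run(log, cache, checkpoint=checkpoint)     # restart where we left off
```

The sketch only works if the entries after the checkpoint are still available to re-read, which is the point of the two options mentioned: rewinding into the database's archived logs or into an intermediate persistent stream.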

Yeah, I’m just looking. So in terms of interceptors, yes, interceptors will still be invoked. If we’re talking about a Hazelcast interceptor, a special piece of code that will be invoked on particular operations, Striim uses Hazelcast as a client. So from the perspective of Hazelcast, it’s a traditional application. So all the APIs would also work, including interceptors.

And there’s a couple of questions around DB2. Unfortunately we do not support DB2 at this point. That said, our approach to dealing with that is this: if you have DB2 replication, then you can configure that to deliver to MQSeries or, you know, basically anything JMS compliant. We could read the changes from that and use those to deliver into Hazelcast, so we can parse those messages, the replication messages that are written into, say, MQSeries, and use those to update Hazelcast. The platform is pretty flexible like that. But yeah, unfortunately we don’t support DB2 at this time. So, another question: does CDC use redo logs from disk? So the CDC that we use does effectively go through and read from the redo logs, although we actually go through APIs to access those, which means that in a lot of cases, if you’re dealing with the most recent changes, those redo logs are actually mapped into memory in the physical implementation of it.

So the upshot of that is it’s very efficient, and if you’re keeping up, you’re able to keep up very well. And it will have very little latency. You know, most of the latency comes from the initial startup time, where, when you first connect in and you start doing change data capture, you will be a little bit behind, and that can remain, but typically the latency can be in the order of milliseconds. It’s pretty close to what’s actually going on in real time in the database.

Okay. So then there’s a lot of questions around what happens with change data capture where you have objects that involve multiple tables. So if your cache isn’t a simple cache, but holds more of a complex object that’s built from multiple tables, and you’re using something like Hibernate object-relational mapping, how does that actually work? So that isn’t something that is supported out of the box right now. It’s not something where you can just say, okay, I want to look at this structure and keep this up to date. You could build a data flow to actually achieve some of those things, where you could load, for example, reference data into memory and build things up within memory and deliver that into the cache. Although with one of the customers we’re working with, the approach that they’re using is to have Striim deliver live into an intermediate map.

So basically we’ll deliver all the changes that we see into an intermediate map. And then there’s a little bit of Java code that’s written there that looks at the change and, depending on what part of the object it is, can use reference data to find the top-level object. And then it simply evicts that object from the cache; it does a remove. And the upshot of that is that the next time that object is accessed, it will be recreated from the database. And you now have a fresh object. So instead of directly updating the cache to keep it synchronized, you effectively evict the object from the cache, which means that the next time you try to access it, it’s going to be up to date. And that can work just by understanding the relationships. So this is something that you have to construct manually, but it’s certainly possible using this scenario.
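The evict-instead-of-update approach can be sketched as follows. The customer's code is Java and is not shown in the webinar, so everything here is hypothetical: the order and order-line tables, the `LINE_TO_ORDER` reference data, and the `load_order` and `on_line_change` helpers are invented purely to illustrate the pattern in Python.

```python
# Hypothetical tables: a top-level order built from two tables.
database = {
    "orders": {100: {"customer": "Ann"}},
    "order_lines": {1: {"order_id": 100, "qty": 2}},
}
LINE_TO_ORDER = {1: 100}   # reference data: child row key -> top-level object key
cache = {}

def load_order(order_id):
    """Read-through: rebuild the composite object from the database on a miss."""
    if order_id not in cache:
        lines = [dict(line) for line in database["order_lines"].values()
                 if line["order_id"] == order_id]
        cache[order_id] = {**database["orders"][order_id], "lines": lines}
    return cache[order_id]

def on_line_change(line_id):
    """React to a change on a child row by evicting the parent object."""
    cache.pop(LINE_TO_ORDER[line_id], None)

load_order(100)                                   # object is now cached
database["order_lines"][1]["qty"] = 5             # change made behind the cache
stale_qty = load_order(100)["lines"][0]["qty"]    # cached copy still has the old value
on_line_change(1)                                 # the change event evicts the order
fresh_qty = load_order(100)["lines"][0]["qty"]    # next access rebuilds it
```

The trade-off of this design is that the first read after a change pays the cost of rebuilding the object, in exchange for not having to reassemble a multi-table object inside the change pipeline.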

There’s another question here. Is it possible to select the operations to be streamed? Can you choose just updates and deletes, because new rows will be inserted when the customer asks for them? So basically the cache contains the things that have been asked for, and inserts are ignored because, you know, they’re part of the process. Yes, absolutely you can do that. If we take a look at the Striim platform again, and I’ll just cancel this and go back to, you know, any application here, but it’s easiest to look at this monitor app here. So I showed you how you have a data stream here, and this query is extracting the metadata. You can see it’s pulling off the operation name. So we know, for any change that comes in, what table it’s from, what the timestamp is, and what the operation name is, whether it’s insert, update or delete. Within this query here, or any query that was running in the data flow, you could have a where clause which basically said, you know, where the operation name is not insert. And if that’s the case, then you’re only going to push through the updates and deletes. And so if you had a simple data flow that was your change data capture, then a query that was filtering out inserts, and then the delivery into Hazelcast, that would be exactly the scenario that you just asked for.
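The filtering step described in this answer amounts to a where clause on the operation name. As a minimal sketch in Python (the platform itself uses a SQL-like query language, which is not reproduced here), with the event shape and the `operation` field name assumed for illustration:

```python
def without_inserts(events):
    """Yield only the events whose operation is not an insert."""
    for event in events:
        if event["operation"] != "INSERT":
            yield event

# Hypothetical change events, reduced to the fields needed for the filter.
incoming = [
    {"operation": "INSERT", "key": 1},
    {"operation": "UPDATE", "key": 1},
    {"operation": "DELETE", "key": 2},
    {"operation": "INSERT", "key": 3},
]

forwarded = list(without_inserts(incoming))
```

Placed between the capture step and the delivery step, a filter like this means the cache only ever sees updates and deletes for entries it may already hold.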

Okay. Victor, do you have any ones you want to answer? I think I’ve got through all of mine.

I guess there’s a question about, if the cluster goes down, can we use a data stream? Like, can we use the Striim platform, I assume the question is, can we use a stream to recreate the cluster to its present state? So there’s a couple of things that you can do. If you use a persistent stream, you probably can replay, right, Steve? But you’d need to be able to sort of force it to rerun or something.

I’m guessing the cluster they’re talking about here is the Hazelcast cluster, right? Yes. So if you lost the cluster completely, then you’re going to need to rebuild the data in that cluster. Now, I think Hazelcast now has some new functionality to kind of fast-load caches, right? To persist them and restart faster? Yeah.

The two options that are available right now: one is to use the Hazelcast Hot Restart Store. So with the Hot Restart Store, essentially, you know, if you lost the whole cluster in its entirety, it will already have put the data into files on disk. So next time you don’t need to actually reseed the cluster. You simply restart the cluster and it will figure out how to restore this data. This functionality is called Hot Restart Store, and it works for situations like a whole-cluster failure. The second option that you can explore is to use WAN replication of the clusters. So say you have one cluster that has some data in it; you can replicate this data to another cluster. So in case of failure, you can restore the data from your backup cluster using the same configuration. You can initiate the sync and it will repopulate the cluster after it restarts. It’s an enterprise option that’s available in Hazelcast out of the box.

Yeah. One of the things that could have happened there is, you know, obviously, between the cluster going down and you restarting it, the database may have changed.

One option, and we should probably come up with some best practices around this, if you’re using Striim to keep the cache hot, would be to have Striim write out the transaction ID, or SCN, or whatever indicator you want to use, maybe even a timestamp, for the last replicated transaction that occurred, and store that in Hazelcast as well in some map. That way, if the cluster goes down and you restart it from your hot restart, we will have an indicator of what the last transaction that the Hazelcast cache saw was. And then you could rewind Striim to that position and replay all the changes since then and bring your cluster back to real time. So I think we should put some best practices around that, but I think the combination of those two things would enable you to get your cluster back really quickly and have it up to date with the database. As I showed in this example, you could, if you wished, alternatively use Striim for the initial loading of the Hazelcast cluster, or whatever technology you have already, but I think that combination of hot restart and having Striim bring it up to date is probably the best and fastest option there.
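The suggestion here, storing the last replicated transaction marker alongside the cached data and replaying from it after a restart, is described as a possible practice rather than a shipped feature, so the Python below is only a sketch of the idea. The tuple-based stream, the `markers` dict standing in for a separate map in the cache, and the `deliver` and `replay_since_marker` helpers are all assumptions for illustration.

```python
# Hypothetical persisted change stream: (transaction id, key, value).
stream = [(1, "a", 10), (2, "b", 20), (3, "a", 11), (4, "c", 30)]

def deliver(cache, markers, event):
    """Apply a change and record its transaction id next to the cached data."""
    txn_id, key, value = event
    cache[key] = value
    markers["last_txn_id"] = txn_id

def replay_since_marker(cache, markers, stream):
    """Replay only the changes newer than the marker restored with the cache."""
    last = markers.get("last_txn_id", 0)
    replayed = 0
    for event in stream:
        if event[0] > last:
            deliver(cache, markers, event)
            replayed += 1
    return replayed

# State as restored from a snapshot taken after transaction 2.
cache = {"a": 10, "b": 20}
markers = {"last_txn_id": 2}

replayed = replay_since_marker(cache, markers, stream)
```

The marker and the data need to be restored from the same snapshot for this to be safe; if the marker were newer than the data, some changes would be skipped on replay.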

So I think we’re out of questions. So I would like to thank you all for being on this call, and I’ll hand it back to you.

Great. Thanks Steve. And thank you so much Victor. Thanks everyone for attending. If you want to learn more about Striim hot cache solution visit either of the URLs that you see on your screen right now. Thanks for attending. We will be doing a recording of this Webinar, which we’ll put up and share with you probably in the next week, so be looking for that. You can definitely share that with anyone else who might be interested. And thanks again for attending and everyone have a great day.

Thank you very much.