Watch this video to see how to do SQL Server change data capture. This video specifically describes a Striim application that moves data from a relational data source (MS SQL) to HBase.

To learn more about the Striim platform, go here.


Unedited Transcript:

Hello, my name is Dimitri. I’m a solution architect at Striim. Today, I will show you SQL Server change data capture to Hadoop use case with a Striim application. What would be the motivation for doing something like this? Obviously, uh, people would like to replicate their upper operational sources of data row by row to Hadoop data hub. And the solution that Striim presents is applying inserts, updates and deletes through CDC to age base or hive or other data sources. Why Striim? Today, vendors cannot easily apply inserts, updates and deletes directly to Hadoop services without custom coding. Striim can solve this problem, but easily integrating CDC readers for various sources, uh, that apply data to targets. It’s repeatable. It’s very easy to create a new application and operated. Today. I will show you how Striim application can easily do it for HBase. Uh, I have pre-prepared, uh, pre-created, uh, a bunch of tables on a Microsoft SQL Server.

So, uh, through a remote desktop connection, I’m connecting to my SQL server management studio. And here I have two tables that I will be using for the demo. Uh, one has the Schema, my key as a string, as an integer. And then my value is a string. And the other one has two integers. My key in my int value. And then my value, again, my key is the primary table, a primary key for both of them. So now on the age-based side, if I list uh, tables that they have, it can see they have table one and two and then I’ll be working with table one through this cumulus approach where I can replicate both tables on sequel server to the same table on uh, HBase base. So if I scan the table right now, I will see that the table contains about 10 values, the different garbage value values stored in there.

So, um, next I would like to show you the user interface. There are two parts to Striim application, first the flow management, so you can easily drag and drop and create the sources and the targets in here and then define your trends with transformations. So, um, in this case we’re working with SQLserver source and we are looking at two tables listed here, separated by Semi-colon. And the database credentials are listed in here as well. And then we’re working with HBase targets through age base writer and this is the definition for the writer. And then we separately have the monitor flow that allows us to power the user interface. So we work with a dashboard that basically can list all metadata about the activity on the source tables and then look at activity by operation. I also list high transaction lags. Anything that fits the threshold profile. Let’s say something that’s longer than six seconds. So let me drive a couple of manual operations first. So right now on the surface table, a target table, we don’t have the key 52 so let me just insert key 52 to sequel server table. And then if I scan the high, the Hbase table, I will see it on the other side. So it just made it in here at the same time. I can update it. Let me re scan the table again.

So for 52 I’ve just applied the value that I upended in here and that may delete this role for this key, our Emory scan. And again, and you can see that this role value 52 is gone. And then just to show that we can also deal with data at higher volumes through my custom reader, I’m gonna apply a thousand rows to my target table, and it’s going to take a couple of seconds. So I’ve sent deletes and then thousand and certain and thousand updates. I’ve actually done 3000 operations right now, or actually, sorry, 2010 and 12 so if I rescanned a table, I will see that all thousand rows have been applied. So this concludes the demo for CDC to Hadoop with a focus on age base. Thank you very much.