Skip to main content

Striim Cloud 4.1.0 documentation

Running the CDC demo apps

The CDC demo applications demonstrate Striim's data migration capabilities using a PostgreSQL instance in a Docker container.

About the CDC demo apps

A grid of tiles representing the Change Data Capture demo applications.

There are three groups of applications:

  • SamplesDB demonstrates a SQL CDC source to database target pipeline, replicating data from one set of PostgreSQL tables to another set. The two applications that are similar to their real-world equivalents are:

    • PostgresToPostgresInitialLoad150KRows uses Database Reader and Database Writer to replicate 150,000 existing records from the customer, nation, and region tables to the customertarget, nationtarget, and regiontarget tables. In a real-world application, the source and target would typically be different databases. For example, the source might be Oracle and the target might be Amazon Redshift; Azure SQL Data Warehouse, PostgreSQL, or SQL DB; Google BigQuery, Cloud SQL, or Spanner; or Snowflake.DatabaseReaderDatabaseWriter

    • PostgresToPostgresCDC uses PostgreSQLReader (see PostgreSQL) and Database Writer to continuously update the target tables with changes to the source.DatabaseWriter

  • SamplesDB2Kafka demonstrates a typical SQL CDC source to database target pipeline, replicating data from a set of PostgreSQL tables to a Kafka topic. The two applications that are similar to their real-world equivalents are:

    • PostgresToKafkaInitialLoad150KRows uses Database Reader and Kafka Writer to replicate 150,000 existing records from the PostgreSQL customer, nation, and region tables to messages in a Kafka topic called kafkaPostgresTopic. In a real-world application, the target would be an external Kafka instance, either on-premise or in the cloud.DatabaseReader

    • PostgresToKafkaCDC uses PostgreSQLReader (see PostgreSQL) and Kafka Writer to continuously update the Kafka topic with changes to the PostgreSQL source tables. Note that updates and deletes in PostgreSQL create new messages in Kafka rather than updating or deleting previous messages relating to those rows.

  • SamplesDB2File demonstrates a typical SQL CDC source to file target pipeline, replicating data from a set of PostgreSQL tables to files. The two applications that are similar to their real-world equivalents are:

    • PostgresToFileInitialLoad150KRows uses Database Reader and File Writer to replicate 150,000 existing records from the PostgreSQL customer, nation, and region tables to files in striim/SampleOutput.. In a real-world application, the target directory would typically be on another host, perhaps in AWS S3, Azure Blob Storage or HD Insight Hadoop, or Google Cloud Storage.DatabaseReader

    • PostgresToFileCDC uses PostgreSQLReader (see PostgreSQL) and File Writer to continuously update the files with changes to the PostgreSQL source tables. Note that updates and deletes in PostgreSQL add new entries to the target files rather than updating or deleting previous entries relating to those rows.

Striim provides wizards to help you create similar applications for many source-target combinations (see Creating apps using templates).Creating apps using templates

The other applications use open processors (see Creating an open processor component) and other custom components to manage the PostgreSQL instance and generate inserts, updates, and deletes. In a real-world application, the source database would be updated by users and other applications.

  • ValidatePostgres, ValidateKafka, and ValidateFile verify that the sources and targets used by the other apps are available.

  • Execute250Inserts adds 250 rows to the source tables and stops automatically.

  • Execute250Updates changes 250 rows in the source tables and stops automatically.

  • Execute250Deletes removes 250 rows from the source tables and stops automatically.

  • ResetPostgresSample, ResetKafkaSample, and ResetFileSample clear all the data created by the other apps, leaving the apps, PostgreSQL tables, Kafka, and SampleOutput directory in their original states.

Running the applications

When Striim, the PostgreSQL instance in Docker, and Kafka are running, you can use the PostgreSQL demo applications. The process is the same for all three sets of applications.

  1. Deploy and start the ValidatePostgres, ValidateKafka, and ValidateFile applications and leave them running.

  2. In the SamplesDB group, deploy and start the SamplesDB.PostgresToPostgresInitialLoad150KRows application.

    A Postgres alert for an insert.
  3. When you see the alert above, that means initial load has completed. Stop and undeploy the InitialLoad application.

  4. Deploy and start the SamplesDB.PostgresToPostgresCDC application.

  5. Once the CDC application is running, deploy and start the SamplesDB.Execute250Inserts application. It will add 250 rows to the customer table, give you an alert, and stop automatically. The CDC app will replicate the rows to the target.

    An alert that 250 inserts in the source table PUBLIC.CUSTOMER are complete.
  6. Deploy and start SamplesDB.Execute250Updates. It will update a random range of 250 rows in the customer table, give you an alert, and stop automatically. The PostgreSQL CDC app will replicate the changes to the corresponding rows in the customertarget table. PostgresToKafkaCDC will add messages describing the updates to the target topic. PostgresToFileCDC will add entries describing the updates to the files in SampleOutput.

  7. Deploy and start SamplesDB.Execute250Deletes. It will delete the first 250 rows in the customer table, give you an alert, and stop automatically. The PostgreSQL CDC app will delete the corresponding rows in the customertarget table. PostgresToKafkaCDC will add messages describing the deletes to the target topic. PostgresToFileCDC will add entries describing the deletes to the files in SampleOutput.

Verifying PostgreSQL to PostgreSQL replication

To view the results of the load, insert, update, and delete commands in the PostgreSQL target, use any PostgreSQL client to log in to localhost:5432 with username striim and password striim.

Alternatively, you can access virtual machine's command line and run psql:

  1. In a Docker Quickstart, OS X, or Linux terminal, enter:

    docker exec -it striimpostgres /bin/bash
  2. When you see the bash prompt, enter:

    psql -U striim -d webaction

Before running PostgresToPostgresInitialLoad150KRows, the customer table has 150,000 rows and customertarget has none:

webaction=# select count(*) from customer;
 count  
--------
 150000
(1 row)

webaction=# select count(*) from customertarget;
 count 
-------
     0
(1 row)

After running PostgresToPostgresInitialLoad150KRows, customertarget has 150,000 rows:

webaction=# select count(*) from customertarget;
 count  
--------
 150000
(1 row)

After stopping PostgresToPostgresInitialLoad150KRows, starting PostgresToPostgresCDC, and running Execute250Inserts:

webaction=# select count(*) from customertarget;
 count  
--------
 150250
(1 row)

After running Execute250Updates:

webaction=# select * from customer where c_custkey=113981;
 c_custkey |       c_name       |             c_address              | c_nationkey ...    
-----------+--------------------+------------------------------------+------------ ...
    113981 | Customer#000113981 | kpxLWwaZh3DpOr Qudn1OKolRYyIlFshOG |           4 ...
(1 row)

webaction=# select * from customertarget where c_custkey=113981;
 c_custkey |       c_name       |             c_address              | c_nationkey ...    
-----------+--------------------+------------------------------------+------------ ...
    113981 | Customer#000113981 | kpxLWwaZh3DpOr Qudn1OKolRYyIlFshOG |           4 ...
(1 row)

After running Execute250Deletes:

webaction=# select * from customer where c_custkey=1;
 c_custkey | c_name | c_address | c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment 
-----------+--------+-----------+-------------+---------+-----------+--------------+-----------
(0 rows)

webaction=# select * from customertarget where c_custkey=1;
 c_custkey | c_name | c_address | c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment 
-----------+--------+-----------+-------------+---------+-----------+--------------+-----------
(0 rows)

Viewing Kafka target data

To see the output of PostgresToKafkaCDC, use Kafka Tool or a similar viewer. The Kafka cluster name is the same as your Striim cluster name. The Kafka version is 0.11.

Viewing file target data

The output of PostgresToFileCDC is in striim/SampleOutput.

Running the applications again at a later time

  1. docker start striimpostgres
  2. Warning

    On Windows, Zookeeper and Kafka do not shut down cleanly. (This is a well-known problem.) Before you restart Kafka, you must delete the files they leave in c:\tmp.

  3. Deploy and start the ValidatePostgres, ValidateKafka, and ValidateFile applications and leave them running.

  4. Deploy and start the ResetPostgresSample, ResetKafkaSample, and ResetFileSample apps, then when they have completed undeploy them.