Pipelines

As discussed in Common Striim use cases, Striim applications can do many different things. When the primary purpose of an application is to move or copy data from a source to a target, we call that a "pipeline" application. For an introduction to the subject, see What is a Data Pipeline.

Common source-target combinations

The following examples are just the most popular among Striim's customers. There are many other possibilities.

  • Database to database, for example, from MySQL, Oracle, or SQL Server to MariaDB, PostgreSQL, or Spanner in the cloud. See Change Data Capture (CDC) for a full list of supported sources and Targets for a full list of supported targets.

    The most common use for this kind of pipeline is to allow a gradual migration from on-premise to cloud. Applications built on top of the on-premise database can be gradually replaced with new applications built on the cloud database. Once all the legacy applications are replaced, the pipeline can be shut down and the on-premise database can be retired.

    In this model, insert, update, and delete operations on the source tables are replicated to the target with no duplicates or missing data (that is, "exactly once processing" or E1P). This consistency is ensured even when an event such as a server crash requires restarting the application (see Recovering applications). A minimal TQL sketch of such a pipeline appears after this list.

  • Database to data warehouse, for example, from Oracle, PostgreSQL, or SQL Server (on premise or in the cloud) to Google BigQuery, Amazon Redshift, Azure Synapse, or Snowflake. See Sources for a full list of supported sources and Targets for a full list of supported targets.

    The primary use for this kind of pipeline is to update data warehouses with new data in near real time rather than in periodic batches.

    Typically data warehouses retain all data so that business intelligence reports can be generated from historical data. Consequently, when rows are updated or deleted in the source tables, instead of overwriting the old data in the target Striim appends a record of the update or delete operation. Striim ensures that all data is replicated to the target, though when an event such as a server crash requires restarting the application, there may be duplicates in the target (that is, "at least once processing" or A1P).
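The following is a minimal TQL sketch of a database-to-database pipeline, pairing Oracle Reader (CDC) with Database Writer writing to PostgreSQL. The application, stream, and schema names, connection strings, and credentials are placeholders; see the adapter references for the full property lists.

CREATE SOURCE OrdersCDCSource USING OracleReader ( 
  Username: 'myname',
  Password: '********',
  ConnectionURL: '198.51.100.15:1521:orcl',
  ReaderType: 'LogMiner',
  Tables: 'MYSCHEMA.%'
) 
OUTPUT TO OrdersChangeStream;

CREATE TARGET OrdersPostgresTarget USING DatabaseWriter ( 
  ConnectionURL: 'jdbc:postgresql://203.0.113.10:5432/appdb',
  Username: 'striim',
  Password: '********',
  Tables: 'MYSCHEMA.%,public.%'
) 
INPUT FROM OrdersChangeStream;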

Supported sources and targets for pipeline apps

The following sources and targets may be directly connected by a WAEvent stream.

Supported WAEvent sources

  • Cosmos DB Reader

  • GCS Reader

  • HP NonStop SQL/MX using Database Reader or Incremental Batch Reader

  • HP NonStop Enscribe, SQL/MP, and SQL/MX readers (CDC)

  • MariaDB Reader (CDC)

  • MariaDB using Database Reader or Incremental Batch Reader

  • Mongo Cosmos DB Reader

  • MySQL Reader (CDC)

  • MySQL using Database Reader or Incremental Batch Reader

  • Oracle Reader (CDC)

  • OJet

  • Oracle Database using Database Reader or Incremental Batch Reader

  • PostgreSQL Reader (CDC)

  • PostgreSQL using Database Reader or Incremental Batch Reader

  • Salesforce Pardot Reader

  • ServiceNow Reader (in this release, supports insert and update operations only, not deletes)

  • SQL Server using MSJet (CDC)

  • SQL Server using MS SQL Reader (CDC)

  • SQL Server using Database Reader or Incremental Batch Reader

  • Sybase using Database Reader or Incremental Batch Reader

  • Teradata using Database Reader or Incremental Batch Reader

Supported targets

  • Azure Synapse using Azure SQL DWH Writer

  • BigQuery Writer

  • Cassandra Cosmos DB Writer

  • Cassandra Writer

  • Cloudera Hive Writer

  • Cosmos DB Writer

  • Databricks Writer

  • Hazelcast Writer

  • HBase Writer

  • HP NonStop SQL/MX using Database Writer

  • Hortonworks Hive Writer

  • Kafka Writer

  • Kudu Writer

  • MariaDB using Database Writer

  • Mongo Cosmos DB Writer

  • MongoDB Writer

  • MySQL using Database Writer

  • Oracle Database using Database Writer

  • PostgreSQL using Database Writer

  • Redshift Writer

  • Salesforce Writer (in MERGE mode)

  • SAP HANA using Database Writer

  • ServiceNow Writer

  • Singlestore (MemSQL) using Database Writer

  • Snowflake Writer

  • Spanner Writer

  • SQL Server using Database Writer

The following sources and targets may be directly connected by a JSONNodeEvent stream.

Supported JSONNodeEvent sources

  • Cosmos DB Reader

  • JMX Reader

  • MongoDB Reader

  • Mongo Cosmos DB Reader

Supported targets

  • ADLS Writer (Gen1 and Gen2)

  • Azure Blob Writer

  • Azure Event Hub Writer

  • Cosmos DB Writer

  • File Writer

  • GCS Writer

  • Google PubSub Writer

  • HDFS Writer

  • JMS Writer

  • Kafka Writer

  • Kinesis Writer

  • MapR FS Writer

  • MapR Stream Writer

  • Mongo Cosmos DB Writer

  • MongoDB Writer

  • S3 Writer

Schema migration

Some of Striim's writers require you to create tables corresponding to the source tables in the target. Some initial load templates will automate this task. See Creating apps using templates for details.

Striim Platform also provides a script that can automate some of that work. See Using the schema conversion utility for details.

Mapping and filtering

The simplest pipeline applications replicate data from the source tables to target tables with the same names, column names, and data types. If your requirements are more complex, you can map source tables and columns to differently named targets and filter the data in flight, as sketched below.
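For example, when source and target table or column names differ, many writers let you specify the mapping in the Tables property. The following is a sketch only: it assumes Database Writer's ColumnMap syntax, an existing WAEvent stream named SourceChangeStream, and placeholder table, column, and connection names.

CREATE TARGET RenamedTarget USING DatabaseWriter ( 
  ConnectionURL: 'jdbc:postgresql://203.0.113.10:5432/appdb',
  Username: 'striim',
  Password: '********',
  Tables: 'MYSCHEMA.EMP,public.employee ColumnMap(emp_id=ID, emp_name=NAME)'
) 
INPUT FROM SourceChangeStream;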

Schema evolution

For some CDC sources, Striim can capture DDL changes. Depending on the target, it can replicate those changes to the target tables or take other actions, such as quiescing or halting the application. For more information, see Handling schema evolution.
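As an illustration only, the sketch below shows how schema evolution is typically enabled: CDDL Capture on the reader and CDDL Action on the writer. Which readers and writers support these properties, and which action values they accept, varies by release; the adapter pairing, names, and connection details here are placeholders, so consult Handling schema evolution before using them.

CREATE SOURCE EvolvingSource USING OracleReader ( 
  Username: 'myname',
  Password: '********',
  ConnectionURL: '198.51.100.15:1521:orcl',
  Tables: 'MYSCHEMA.%',
  CDDLCapture: true
) 
OUTPUT TO EvolvingChangeStream;

CREATE TARGET EvolvingTarget USING DatabaseWriter ( 
  ConnectionURL: 'jdbc:postgresql://203.0.113.10:5432/appdb',
  Username: 'striim',
  Password: '********',
  Tables: 'MYSCHEMA.%,public.%',
  CDDLAction: 'Process'
) 
INPUT FROM EvolvingChangeStream;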

Initial load versus continuous replication

Typically, setting up a data pipeline occurs in two phases.

The first step is the initial load, copying all existing data from the source to the target. You may write a Striim application or use a third-party tool for this step. If the source and target are homogeneous (for example, MySQL to MariaDB, Oracle to Oracle Exadata, or SQL Server to Azure SQL Server managed instance), it may be fastest and easiest to use the native copy or backup-restore tools. If you use Striim, foreign key constraints must be disabled in the target before beginning initial load and re-enabled after initial load is complete (see, for example, Learn > SQL > SQL Server > Disable foreign key constraints with INSERT and UPDATE statements).
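If you use Striim for the initial load, a typical approach is a Database Reader source feeding a Database Writer target. The following is a sketch with placeholder connection details, credentials, and schema names, and assumes wildcard table mapping between source and target schemas.

CREATE SOURCE InitialLoadSource USING DatabaseReader ( 
  ConnectionURL: 'jdbc:oracle:thin:@198.51.100.15:1521:orcl',
  Username: 'myname',
  Password: '********',
  Tables: 'MYSCHEMA.%'
) 
OUTPUT TO InitialLoadStream;

CREATE TARGET InitialLoadTarget USING DatabaseWriter ( 
  ConnectionURL: 'jdbc:postgresql://203.0.113.10:5432/appdb',
  Username: 'striim',
  Password: '********',
  Tables: 'MYSCHEMA.%,public.%'
) 
INPUT FROM InitialLoadStream;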

Depending on the amount and complexity of data in the source tables, initial load may take minutes, hours, days, or weeks. You may monitor progress by Creating a data validation dashboard.

Once the initial load is complete, you will start the Striim pipeline application to pick up where the initial load left off. See Switching from initial load to continuous replication for technical details.
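For example, with an Oracle source you would note the SCN at which the initial load snapshot was taken and start the CDC reader from that point. This is a hedged sketch assuming Oracle Reader's StartSCN property; the SCN value, credentials, and connection string are placeholders.

CREATE SOURCE ContinuousSource USING OracleReader ( 
  Username: 'myname',
  Password: '********',
  ConnectionURL: '198.51.100.15:1521:orcl',
  ReaderType: 'LogMiner',
  Tables: 'MYSCHEMA.%',
  StartSCN: '482650123'
) 
OUTPUT TO ContinuousChangeStream;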

Monitoring your pipeline

You may monitor your pipeline by Creating a data validation dashboard.

You should also set up alerts to let you know if anything goes wrong. See Sending alerts about servers and applications.

Setting up alerts for your pipeline

System alerts for potential problems are automatically enabled. You may also create custom alerts. For more information, see Sending alerts about servers and applications.

Scaling up for better performance

When a single reader cannot keep up with the data being added to your source, create multiple readers. Use the Tables property to distribute tables among the readers:

  • Assign each table to only one reader.

  • When tables are related (by primary or foreign key) or to ensure transaction integrity among a set of tables, assign them all to the same reader.

  • When dividing tables among readers, distribute them according to how busy they are rather than simply by the number of tables. For example, if one table generates 50% of the entries in the CDC log, you might assign it and any related tables to one reader and all the other tables to another.

The following is a simple example of how you could use two Oracle Readers, with one reading a very busy table and the other reading the rest of the tables in the same schema:

CREATE SOURCE OracleSource1 USING OracleReader ( 
  FetchSize: 1,
  Compression: false,
  Username: 'myname',
  Password: '7ip2lhUSP0o=',
  ConnectionURL: '198.51.100.15:1521:orcl',
  ReaderType: 'LogMiner',
  Tables: 'MYSCHEMA.VERYBUSYTABLE'
) 
OUTPUT TO OracleSource_ChangeDataStream;

CREATE SOURCE OracleSource2 USING OracleReader ( 
  FetchSize: 1,
  CommittedTransactions: true,
  Compression: false,
  Username: 'myname',
  Password: '7ip2lhUSP0o=',
  ConnectionURL: '198.51.100.15:1521:orcl',
  ReaderType: 'LogMiner',
  Tables: 'MYSCHEMA.%',
  ExcludedTables: 'MYSCHEMA.VERYBUSYTABLE'
) 
OUTPUT TO OracleSource_ChangeDataStream;

When a single writer cannot keep up with the data it is receiving from the source (that is, when it is backpressured), create multiple writers. For many writers, you can simply use the Parallel Threads property to create additional instances and Striim will automatically distribute data among them (see Creating multiple writer instances). For other writers, use the same approach as for sources, described above.
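For example, a writer that supports Parallel Threads can be scaled with a single property. The sketch below assumes Database Writer accepts ParallelThreads in your release and reads from the change stream defined in the reader example above; the connection details and thread count are placeholders (see Creating multiple writer instances for the writers that support this property).

CREATE TARGET ScaledTarget USING DatabaseWriter ( 
  ConnectionURL: 'jdbc:postgresql://203.0.113.10:5432/appdb',
  Username: 'striim',
  Password: '********',
  Tables: 'MYSCHEMA.%,public.%',
  ParallelThreads: 4
) 
INPUT FROM OracleSource_ChangeDataStream;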