Pipelines

As discussed in Common Striim use cases, Striim applications can do many different things. When the primary purpose of an application is to move or copy data from a source to a target, we call that a "pipeline" application. For an introduction to the subject, see What is a Data Pipeline.

Common source-target combinations

The following examples are just the most popular among Striim's customers. There are many other possibilities.

  • Database to database, for example, from MySQL, Oracle, or SQL Server to MariaDB, PostgreSQL, or Spanner in the cloud. See Change Data Capture (CDC) for a full list of supported sources and Targets for a full list of supported targets.

    The most common use for this kind of pipeline is to allow a gradual migration from on-premise to cloud. Applications built on top of the on-premise database can be gradually replaced with new applications built on the cloud database. Once all the legacy applications are replaced, the pipeline can be shut down and the on-premise database can be retired.

    In this model, insert, update, and delete operations on the source tables are replicated to the target with no duplicates or missing data (that is, "exactly once processing" or E1P). This consistency is ensured even when an event such as a server crash requires restarting the application (see Recovering applications). A minimal TQL sketch of such a pipeline appears after this list.

  • Database to data warehouse, for example, from Oracle, PostgreSQL, or SQL Server (on premise or in the cloud) to Google BigQuery, Amazon Redshift, Azure Synapse, or Snowflake. See Sources for a full list of supported sources and Targets for a full list of supported targets.

    The primary use for this kind of pipeline is to update data warehouses with new data in near real time rather than in periodic batches.

    Typically data warehouses retain all data so that business intelligence reports can be generated from historical data. Consequently, when rows are updated or deleted in the source tables, instead of overwriting the old data in the target Striim appends a record of the update or delete operation. Striim ensures that all data is replicated to the target, though when an event such as a server crash requires restarting the application, there may be duplicates in the target (that is, "at least once processing" or A1P).
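The following is a minimal TQL sketch of a database-to-database pipeline, pairing Oracle Reader (CDC) with Database Writer writing to PostgreSQL. The application, stream, and schema names, connection strings, and credentials are placeholders; see the adapter references for the full property lists.

CREATE SOURCE OrdersCDCSource USING OracleReader ( 
  Username: 'myname',
  Password: '********',
  ConnectionURL: '198.51.100.15:1521:orcl',
  ReaderType: 'LogMiner',
  Tables: 'MYSCHEMA.%'
) 
OUTPUT TO OrdersChangeStream;

CREATE TARGET OrdersPostgresTarget USING DatabaseWriter ( 
  ConnectionURL: 'jdbc:postgresql://203.0.113.10:5432/appdb',
  Username: 'striim',
  Password: '********',
  Tables: 'MYSCHEMA.%,public.%'
) 
INPUT FROM OrdersChangeStream;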

Supported sources and targets for pipeline apps

The following sources and targets may be directly connected by a WAEvent stream.

Supported WAEvent sources

  • Cosmos DB Reader

  • GCS Reader

  • HP NonStop SQL/MX using Database Reader or Incremental Batch Reader

  • HP NonStop Enscribe, SQL/MP, and SQL/MX readers (CDC)

  • MariaDB Reader (CDC)

  • MariaDB using Database Reader or Incremental Batch Reader

  • Mongo Cosmos DB Reader

  • MySQL Reader (CDC)

  • MySQL using Database Reader or Incremental Batch Reader

  • Oracle Reader (CDC)

  • OJet

  • Oracle Database using Database Reader or Incremental Batch Reader

  • PostgreSQL Reader (CDC)

  • PostgreSQL using Database Reader or Incremental Batch Reader

  • Salesforce Pardot Reader

  • ServiceNow Reader (in this release, supports insert and update operations only, not deletes)

  • SQL Server using MSJet (CDC)

  • SQL Server using MS SQL Reader (CDC)

  • SQL Server using Database Reader or Incremental Batch Reader

  • Sybase using Database Reader or Incremental Batch Reader

  • Teradata using Database Reader or Incremental Batch Reader

Supported targets

  • Azure Synapse using Azure SQL DWH Writer

  • BigQuery Writer

  • Cassandra Cosmos DB Writer

  • Cassandra Writer

  • Cloudera Hive Writer

  • Cosmos DB Writer

  • Databricks Writer

  • Hazelcast Writer

  • HBase Writer

  • HP NonStop SQL/MX using Database Writer

  • Hortonworks Hive Writer

  • Kafka Writer

  • Kudu Writer

  • MariaDB using Database Writer

  • Mongo Cosmos DB Writer

  • MongoDB Writer

  • MySQL using Database Writer

  • Oracle Database using Database Writer

  • PostgreSQL using Database Writer

  • Redshift Writer

  • Salesforce Writer (in MERGE mode)

  • SAP HANA using Database Writer

  • ServiceNow Writer

  • Singlestore (MemSQL) using Database Writer

  • Snowflake Writer

  • Spanner Writer

  • SQL Server using Database Writer

The following sources and targets may be directly connected by a JSONNodeEvent stream.

Supported JSONNodeEvent sources

  • Cosmos DB Reader

  • JMX Reader

  • MongoDB Reader

  • Mongo Cosmos DB Reader

Supported targets

  • ADLS Writer (Gen1 and Gen2)

  • Azure Blob Writer

  • Azure Event Hub Writer

  • Cosmos DB Writer

  • File Writer

  • GCS Writer

  • Google PubSub Writer

  • HDFS Writer

  • JMS Writer

  • Kafka Writer

  • Kinesis Writer

  • MapR FS Writer

  • MapR Stream Writer

  • Mongo Cosmos DB Writer

  • MongoDB Writer

  • S3 Writer

Schema migration

Some of Striim's writers require you to create tables corresponding to the source tables in the target. Some initial load templates will automate this task. See Creating apps using templates for details.

Striim Platform also provides a script that can automate some of that work. See Using the schema conversion utility for details.

Mapping and filtering

The simplest pipeline applications replicate data from the source tables to target tables with the same names, column names, and data types. If your requirements are more complex, you can map source tables and columns to differently named targets and filter the data in flight, as sketched below.
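For example, when source and target table or column names differ, many writers let you specify the mapping in the Tables property. The following is a sketch only: it assumes Database Writer's ColumnMap syntax, an existing WAEvent stream named SourceChangeStream, and placeholder table, column, and connection names.

CREATE TARGET RenamedTarget USING DatabaseWriter ( 
  ConnectionURL: 'jdbc:postgresql://203.0.113.10:5432/appdb',
  Username: 'striim',
  Password: '********',
  Tables: 'MYSCHEMA.EMP,public.employee ColumnMap(emp_id=ID, emp_name=NAME)'
) 
INPUT FROM SourceChangeStream;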

Schema evolution

For some CDC sources, Striim can capture DDL changes. Depending on the target, it can replicate those changes to the target tables or take other actions, such as quiescing or halting the application. For more information, see Handling schema evolution.
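As an illustration only, the sketch below shows how schema evolution is typically enabled: CDDL Capture on the reader and CDDL Action on the writer. Which readers and writers support these properties, and which action values they accept, varies by release; the adapter pairing, names, and connection details here are placeholders, so consult Handling schema evolution before using them.

CREATE SOURCE EvolvingSource USING OracleReader ( 
  Username: 'myname',
  Password: '********',
  ConnectionURL: '198.51.100.15:1521:orcl',
  Tables: 'MYSCHEMA.%',
  CDDLCapture: true
) 
OUTPUT TO EvolvingChangeStream;

CREATE TARGET EvolvingTarget USING DatabaseWriter ( 
  ConnectionURL: 'jdbc:postgresql://203.0.113.10:5432/appdb',
  Username: 'striim',
  Password: '********',
  Tables: 'MYSCHEMA.%,public.%',
  CDDLAction: 'Process'
) 
INPUT FROM EvolvingChangeStream;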

Initial load versus continuous replication

Typically, setting up a data pipeline occurs in two phases.

The first step is the initial load, copying all existing data from the source to the target. You may write a Striim application or use a third-party tool for this step. If the source and target are homogeneous (for example, MySQL to MariaDB, Oracle to Oracle Exadata, or SQL Server to Azure SQL Server managed instance), it may be fastest and easiest to use the native copy or backup-restore tools. If you use Striim, foreign key constraints must be disabled in the target before beginning initial load and re-enabled after initial load is complete (see, for example, Learn > SQL > SQL Server > Disable foreign key constraints with INSERT and UPDATE statements).
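If you use Striim for the initial load, a typical approach is a Database Reader source feeding a Database Writer target. The following is a sketch with placeholder connection details, credentials, and schema names, and assumes wildcard table mapping between source and target schemas.

CREATE SOURCE InitialLoadSource USING DatabaseReader ( 
  ConnectionURL: 'jdbc:oracle:thin:@198.51.100.15:1521:orcl',
  Username: 'myname',
  Password: '********',
  Tables: 'MYSCHEMA.%'
) 
OUTPUT TO InitialLoadStream;

CREATE TARGET InitialLoadTarget USING DatabaseWriter ( 
  ConnectionURL: 'jdbc:postgresql://203.0.113.10:5432/appdb',
  Username: 'striim',
  Password: '********',
  Tables: 'MYSCHEMA.%,public.%'
) 
INPUT FROM InitialLoadStream;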

Depending on the amount and complexity of data in the source tables, initial load may take minutes, hours, days, or weeks. You may monitor progress by Creating a data validation dashboard.

Once the initial load is complete, you will start the Striim pipeline application to pick up where the initial load left off. See Switching from initial load to continuous replication for technical details.
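For example, with an Oracle source you would note the SCN at which the initial load snapshot was taken and start the CDC reader from that point. This is a hedged sketch assuming Oracle Reader's StartSCN property; the SCN value, credentials, and connection string are placeholders.

CREATE SOURCE ContinuousSource USING OracleReader ( 
  Username: 'myname',
  Password: '********',
  ConnectionURL: '198.51.100.15:1521:orcl',
  ReaderType: 'LogMiner',
  Tables: 'MYSCHEMA.%',
  StartSCN: '482650123'
) 
OUTPUT TO ContinuousChangeStream;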

Monitoring your pipeline

You may monitor your pipeline by Creating a data validation dashboard.

You should also set up alerts to let you know if anything goes wrong. See Sending alerts about servers and applications.

Setting up alerts for your pipeline

System alerts for potential problems are automatically enabled. You may also create custom alerts. For more information, see Sending alerts about servers and applications.

Scaling up for better performance

When a single reader cannot keep up with the data being added to your source, create multiple readers. Use the Tables property to distribute tables among the readers:

  • Assign each table to only one reader.

  • When tables are related (by primary or foreign key) or to ensure transaction integrity among a set of tables, assign them all to the same reader.

  • When dividing tables among readers, distribute them according to how busy they are rather than simply by the number of tables. For example, if one table generates 50% of the entries in the CDC log, you might assign it and any related tables to one reader and all the other tables to another.

The following is a simple example of how you could use two Oracle Readers, with one reading a very busy table and the other reading the rest of the tables in the same schema:

CREATE SOURCE OracleSource1 USING OracleReader ( 
  FetchSize: 1,
  Compression: false,
  Username: 'myname',
  Password: '7ip2lhUSP0o=',
  ConnectionURL: '198.51.100.15:1521:orcl',
  ReaderType: 'LogMiner',
  Tables: 'MYSCHEMA.VERYBUSYTABLE'
) 
OUTPUT TO OracleSource_ChangeDataStream;

CREATE SOURCE OracleSource2 USING OracleReader ( 
  FetchSize: 1,
  CommittedTransactions: true,
  Compression: false,
  Username: 'myname',
  Password: '7ip2lhUSP0o=',
  ConnectionURL: '198.51.100.15:1521:orcl',
  ReaderType: 'LogMiner',
  Tables: 'MYSCHEMA.%',
  ExcludedTables: 'MYSCHEMA.VERYBUSYTABLE'
) 
OUTPUT TO OracleSource_ChangeDataStream;

When a single writer cannot keep up with the data it is receiving from the source (that is, when it is backpressured), create multiple writers. For many writers, you can simply use the Parallel Threads property to create additional instances and Striim will automatically distribute data among them (see Creating multiple writer instances). For other writers, use the same approach as for sources, described above.
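For example, a writer that supports Parallel Threads can be scaled with a single property. The sketch below assumes Database Writer accepts ParallelThreads in your release and reads from the change stream defined in the reader example above; the connection details and thread count are placeholders (see Creating multiple writer instances for the writers that support this property).

CREATE TARGET ScaledTarget USING DatabaseWriter ( 
  ConnectionURL: 'jdbc:postgresql://203.0.113.10:5432/appdb',
  Username: 'striim',
  Password: '********',
  Tables: 'MYSCHEMA.%,public.%',
  ParallelThreads: 4
) 
INPUT FROM OracleSource_ChangeDataStream;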