Striim 3.9.8 documentation

Creating multiple writer instances

Some writers have a Parallel Threads property, which in some circumstances may allow you to create multiple instances for better performance. For example:

CREATE TARGET KWSample USING KafkaWriter VERSION '0.9.0' (
  brokeraddress:'localhost:9092',
  topic:'test',
  ParallelThreads:'4',
  PartitionKey:'merchantId'
)
FORMAT USING DSVFormatter ()
INPUT FROM TypedCSVStream;

This would create four instances of KafkaWriter with identical settings. Each instance would run in its own thread, increasing throughput. If KWSample were deployed ON ALL to multiple servers, each server would run four instances, so the total number of KafkaWriter instances would be the ParallelThreads value times the number of servers.
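
As a quick check of that arithmetic, here is a minimal Python sketch. The function name total_writer_instances and the three-server cluster are assumptions made for illustration; neither is part of Striim.

```python
def total_writer_instances(parallel_threads: int, servers: int) -> int:
    """Return the number of writer instances for a target deployed ON ALL.

    Each server runs `parallel_threads` instances, so the total is the product.
    """
    if parallel_threads < 1 or servers < 1:
        raise ValueError("parallel_threads and servers must both be at least 1")
    return parallel_threads * servers


# KWSample sets ParallelThreads to 4; three servers is an assumed cluster size.
print(total_writer_instances(4, 3))  # 12
```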

Warning

Use ParallelThreads only when the target is not able to keep up with incoming events (that is, when its input stream is backpressured). Otherwise, the overhead imposed by additional threads could reduce the application's performance.

Whether parallel threads can improve performance depends on the writer. For each group of writers, the following lists the event distribution, limitations, and notes.

Writers:

  • CassandraCosmosDBWriter
  • CosmosDBWriter
  • DatabaseWriter
  • HBaseWriter
  • KuduWriter
  • MaprDBWriter

Event distribution: Events are evenly distributed among the writer instances in round-robin fashion. All instances may write to all target tables.

Limitations: Enabling recovery for the application disables parallel threads. Use only for initial load, not for continuous replication.
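
The round-robin behavior described for this group can be modeled roughly as follows. This is an illustrative Python sketch, not Striim's implementation; the table names, the event tuples, and the function round_robin_assign are all hypothetical.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(events, instance_count):
    """Pair each event with a writer instance index, cycling 0..instance_count-1."""
    instances = cycle(range(instance_count))
    return [(event, next(instances)) for event in events]


# Twelve hypothetical events spread across three tables, written by four instances.
tables = ["orders", "customers", "payments"]
events = [(tables[i % 3], i) for i in range(12)]
assignments = round_robin_assign(events, 4)

# Events per instance: the load is even.
per_instance = Counter(instance for _, instance in assignments)

# Tables seen by each instance: any instance may receive events for any table.
tables_by_instance = {}
for (table, _), instance in assignments:
    tables_by_instance.setdefault(instance, set()).add(table)

print(sorted(per_instance.items()))  # [(0, 3), (1, 3), (2, 3), (3, 3)]
```

In this model every instance ends up with events for all three tables, which matches the statement that all instances may write to all target tables.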

Writers:

  • AzureDWHWriter
  • BigqueryWriter
  • HiveWriter
  • RedShiftWriter
  • SnowflakeWriter

Event distribution: Events are distributed among the writer instances based on the target table name. Each target table will be written to by only one of the instances.

Limitations: Enabling recovery for the application disables parallel threads. Use only for initial load, not for continuous replication.

Notes: Creating more instances than there are target tables will not improve performance. (Note that you may be able to improve performance by creating multiple Database Reader sources that read from different tables and all output to the same stream.)
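
To see why extra instances do not help here, consider a rough Python model in which each table name maps to exactly one instance. The CRC32-modulo mapping is an assumption chosen only because it is deterministic; the source does not say how Striim assigns tables to instances, and the table names are hypothetical.

```python
import zlib


def instance_for_table(table_name: str, instance_count: int) -> int:
    """Map a table name to one instance index using a stable hash (assumed scheme)."""
    return zlib.crc32(table_name.encode("utf-8")) % instance_count


tables = ["orders", "customers"]  # hypothetical target tables
instance_count = 4

# Every event for a given table resolves to the same instance.
busy = {instance_for_table(table, instance_count) for table in tables}

# With two tables, at most two of the four instances can ever receive events.
print(f"{len(busy)} of {instance_count} instances receive events")
```

However the assignment is actually made, the number of busy instances cannot exceed the number of target tables.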

Writers:

  • GCSWriter
  • S3Writer

Event distribution: Events are distributed among the writer instances based on the target bucket name, directory name, and file name. Each file will be written to by only one of the instances.

Limitations: You must use dynamic names for target buckets, directories, or files. Otherwise, parallel threads will not improve performance.

Notes: Creating more instances than there are target files will not improve performance.
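
The dynamic-name requirement follows from the distribution rule: if every event resolves to the same bucket, directory, and file, there is only one target to assign. The Python sketch below illustrates this with hypothetical events and names. It does not use Striim's dynamic-name syntax, which is not shown in this section.

```python
def distinct_targets(events, file_name_for):
    """Return the set of (bucket, directory, file) keys that the events resolve to."""
    return {("mybucket", "out", file_name_for(event)) for event in events}


# Hypothetical events, each tagged with the table it came from.
events = [
    {"table": "orders"},
    {"table": "customers"},
    {"table": "orders"},
    {"table": "payments"},
]

# Static file name: every event resolves to the same target.
static = distinct_targets(events, lambda event: "data.csv")

# Dynamic file name derived from an event field: one target per table.
dynamic = distinct_targets(events, lambda event: f"{event['table']}.csv")

print(len(static), len(dynamic))  # 1 3
```

Because each file is written by only one instance, the static case can keep one instance busy at most, while the dynamic case here can use up to three.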

Writer: KafkaWriter

Event distribution: Events are distributed among the writer instances by the PartitionKey field value. Each target partition will be written to by only one of the instances.

Notes: Creating more instances than there are target partitions will not improve performance.
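
A rough two-step model of this, in Python: a key value selects a partition, and each partition is owned by one instance. Both mappings are assumptions for illustration (CRC32 modulo for keys, simple modulo for ownership); they are not Kafka's partitioner or Striim's actual assignment, and the merchant ID value is hypothetical.

```python
import zlib


def partition_for_key(key: str, partition_count: int) -> int:
    """Pick a partition from the key value (assumed hash, for illustration only)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def instance_for_partition(partition: int, instance_count: int) -> int:
    """Assign each partition to exactly one writer instance (assumed scheme)."""
    return partition % instance_count


partition_count = 3  # hypothetical topic with three partitions
instance_count = 4   # ParallelThreads value from the KWSample example

# Each partition has a single owning instance.
owners = {p: instance_for_partition(p, instance_count) for p in range(partition_count)}
print(owners)  # {0: 0, 1: 1, 2: 2}

# With more instances than partitions, the surplus instance owns nothing.
idle = set(range(instance_count)) - set(owners.values())
print(sorted(idle))  # [3]

# Events with the same merchantId value always reach the same partition.
same = partition_for_key("M1001", partition_count) == partition_for_key("M1001", partition_count)
```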