Striim 3.9.6 documentation

Setting KafkaWriter's mode property: sync versus async

KafkaWriter performs differently depending on whether the mode property value is sync or async and whether recovery (see Recovering applications) is enabled for the application. The four possibilities are:

notes

sync with recovery 

Provides the most accurate output. Events are written in order with no duplicates ("exactly-once processing," also known as E1P), provided that you do not change the partitioner logic, number of partitions, or IP address used by Striim while the application is stopped.

To avoid duplicate events after recovery, the Kafka topic's retention period must be longer than the amount of time that elapses before recovery is initiated (see Recovering applications) and, if writing to multiple partitions, the brokers must be brought up in reverse of the order in which they went down.

With this configuration, two KafkaWriter targets (even if in the same application) cannot write to the same topic. Instead, use a single KafkaWriter, a topic with multiple partitions (see Writing to multiple Kafka partitions), and, if necessary, parallel threads (see Creating multiple writer instances).

async with recovery

Provides higher throughput with the tradeoff that events are written out of order (unless using KafkaWriter version 2.1 with enable.idempotence set to true in KafkaConfig) and recovery may result in some duplicates ("at-least-once processing," also known as A1P).

sync without recovery

Appropriate with non-recoverable sources (see Recovering applications) when you need events to be written in order. Otherwise, async will give better performance.

async without recovery

Appropriate with non-recoverable sources when you don't care whether events are written in order. Throughput will be slightly faster than async with recovery.

When using sync, multiple events are batched in a single Kafka message. The number of messages in a batch is controlled by the batch.size parameter, which by default is 1 million bytes. The maximum amount of time KafkaWriter will wait between messages is set by the linger.ms parameter, which by default is 1000 milliseconds. Thus, by default, KafkaWriter will write a message after it has received a million bytes or one second has elapsed since the last message was written, whichever occurs first.

 Batch.size must be larger than the largest event KafkaWriter will receive, but must not exceed the max.message.bytes size in the Kafka topic configuration.

The following setting would write a message every time KafkaWriter has received 500,000 bytes or two seconds has elapsed since the last message was written:

KafkaConfig:'batch.size=500000,linger.ms=2000'