Key considerations
Topic & partitions mapping
Data can be distributed to one or many topics, each with one or many partitions. You can configure Kafka Writer to distribute data according to these scenarios.
Use cases can be categorized into four groups based on the number of topics and partitions. The following table summarizes the writer behavior and corresponding use cases.
| Number of topics | One Partition | Multiple Partitions |
|---|---|---|
| Single Topic | All source data goes to one target topic, partition 0. Choose this option to preserve the order of events from an unpartitioned source. | All source data goes to one target topic with multiple partitions. Use this option when events from the source are of the same category but may belong to different subcategories. If the data comes from an unpartitioned source, ordering will vary because events are distributed across multiple partitions; order is preserved only within a single partition. |
| Multiple Topics | Wildcard or explicit mapping of one or more source entities to multiple target topics, each with a single partition (event order is preserved within each topic). | After source data is distributed to multiple topics, each topic can be further partitioned by subcategory. This option is most appropriate when the downstream application runs analysis based on data categories. |
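Striim handles this mapping through the writer's configuration; purely for intuition, the following sketch uses the plain Kafka Java producer client to show the difference between pinning everything to partition 0 of one topic and letting keyed records spread across partitions or topics. The broker address, topic names, keys, and payloads are illustrative assumptions, not Striim defaults.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TopicMappingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Single topic, one partition: pin every record to partition 0 to preserve source order.
            producer.send(new ProducerRecord<>("orders", 0, null, "{\"orderId\":1}"));

            // Single topic, multiple partitions: omit the partition and supply a key;
            // records with the same key land in the same partition, so order is kept per key only.
            producer.send(new ProducerRecord<>("orders", "customer-42", "{\"orderId\":2}"));

            // Multiple topics: route by category (hypothetical topic names).
            producer.send(new ProducerRecord<>("orders_eu", "customer-7", "{\"orderId\":3}"));
            producer.send(new ProducerRecord<>("orders_us", "customer-9", "{\"orderId\":4}"));
        }
    }
}
```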
Message semantics and retries
For message delivery, the writer supports exactly-once processing and at-least-once processing. Choose the appropriate behavior based on your use case.
E1P (exactly-once processing)
In this mode, the writer ensures exactly-once delivery semantics. It avoids duplicates even on retries and supports atomic writes to multiple partitions. This is safer but slower due to overhead. This is the default behavior for the writer.
To achieve E1P, the writer uses Kafka transactions and stores checkpointing information about each transaction in a checkpointing topic. The number of events added to a transaction is determined by the Commit Policy.
If the source is OLTP, the transaction is committed when either the commit policy is triggered or DDL is received, whichever occurs first. For a non-OLTP source, events are spread across multiple target transactions based on the Commit Policy configuration.
To support E1P, recovery must be enabled for the Kafka Writer application and the following Kafka producer configurations must be set:
```
enable.idempotence=true
acks=all
```
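Striim manages the transaction boundaries and the checkpointing topic internally; the sketch below only illustrates, with the plain Kafka Java client, how idempotent, transactional writes behave. The transactional.id, broker address, topic, and event values are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");      // assumed broker address
        props.put("enable.idempotence", "true");                // required for E1P
        props.put("acks", "all");                               // required for E1P
        props.put("transactional.id", "kafka-writer-tx-1");     // hypothetical id; must be stable across restarts
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                // Events accumulate until the commit policy (or a DDL event) triggers the commit.
                producer.send(new ProducerRecord<>("orders", "k1", "event-1"));
                producer.send(new ProducerRecord<>("orders", "k2", "event-2"));
                producer.commitTransaction();                   // atomic across partitions
            } catch (KafkaException e) {
                producer.abortTransaction();                    // allows a retry without producing duplicates
                throw e;
            }
        }
    }
}
```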
A1P (at-least-once processing)
At-least-once processing guarantees that every message is processed one or more times and never lost. Choose this mode if duplicates are acceptable or you want to avoid the overhead of maintaining the checkpointing topic.
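For comparison, an at-least-once producer typically relies only on acknowledgements and retries, without transactions or a checkpointing topic. A minimal sketch of such settings follows; the values shown are illustrative assumptions, not the writer's actual defaults.

```java
import java.util.Properties;

public class AtLeastOnceSketch {
    // Minimal at-least-once style producer settings (illustrative values only).
    static Properties atLeastOnceProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("acks", "all");                            // wait for all in-sync replicas before acknowledging
        props.put("retries", "2147483647");                  // retry transient failures rather than drop records
        props.put("delivery.timeout.ms", "120000");          // overall bound on send + retries
        // No transactional.id and no checkpoint topic: a retry after a failed ack may write duplicates.
        return props;
    }

    public static void main(String[] args) {
        System.out.println(atLeastOnceProps());
    }
}
```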
Retries
Within the writer, retries happen at two levels:
Internal retry managed by Kafka Producer Client
Writer-level retry to handle credentials rotation and for retriable exception codes
In either case, data ordering remains intact unless you change the producer configurations mentioned above. Messages remain in order regardless of whether Exactly Once Processing is set to True or False.
For more information, see the notes for Connection Retry in Kafka Writer properties.
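As an illustration of the second level, a producer send callback can distinguish retriable from fatal errors using the Kafka client's exception hierarchy. The resend logic below is a simplification for illustration, not the writer's actual retry or credential-rotation behavior; broker address, topic, and values are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;
import org.apache.kafka.common.serialization.StringSerializer;

public class RetrySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("retries", "5");                            // first level: client-internal retries

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("orders", "k1", "event-1");
            producer.send(record, (metadata, exception) -> {
                if (exception == null) {
                    return;                                   // delivered
                }
                if (exception instanceof RetriableException) {
                    producer.send(record);                    // second level: application-driven resend
                } else {
                    System.err.println("Non-retriable send failure: " + exception);
                }
            });
            producer.flush();
        }
    }
}
```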
Kafka message design
Each incoming event is converted into a Kafka message. Each Kafka message, also called a record, has an optional header, a key, and a value.
Message Header
A Kafka message has a header, a key, and a payload. The payload traditionally carries the business object, while headers are traditionally used for transport concerns such as routing and filtering. Headers are <key>,<value> pairs. Kafka Writer provides a way to customize header values, and this configuration is common across messages written to different topics and partitions. By default, messages are published without a Kafka header.
Users can define the message header to have dynamic values for each Kafka message by referencing fields from data (only if the incoming stream contains typed events), metadata, or user data.
Users can also add a static message header to each Kafka message.
If multiple message headers are required, add the values using the respective UI widget or provide them as a semicolon-separated value in TQL.
Events Supported: Applies to all incoming event types (JSONNodeEvent, WAEvent, AvroEvent, TypedEvent) and all formatters (DSV, XML, Avro, and JSON).
In custom mode, data is stored in Kafka headers as string-based key-value pairs.
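For reference, at the Kafka client level a header is a byte-array value attached to a string key on each record. A minimal sketch of adding one static and one dynamic header follows; the header names, field value, broker address, and topic are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HeaderSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "customer-42", "{\"orderId\":1}");
            // Static header: the same value on every message.
            record.headers().add("source", "striim".getBytes(StandardCharsets.UTF_8));
            // Dynamic header: value taken from the event (field name is a hypothetical example).
            record.headers().add("tableName", "SCHEMA.ORDERS".getBytes(StandardCharsets.UTF_8));
            producer.send(record);
        }
    }
}
```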
Kafka Message Key
A Kafka message includes a key, a value, and a timestamp. The key is used for multiple purposes, such as log compaction, partition distribution, and enabling downstream applications to query based on the key part of the message.
The multi-topic Kafka writer provides three configurations for defining how the Kafka message key is constructed. These configurations are common across the messages written to different topics and partitions.
None: Messages will be published without a Kafka message key. This is the default behavior of the writer.
Custom: Users can define the message key to have dynamic values for each Kafka Message by referencing fields from data (only if the incoming stream contains typed events), metadata, or user data.
Primary Key: The message key is automatically constructed using primary key column values from the source table. This is supported only if the source is OLTP. If no primary key columns are defined in the source, all columns will be treated as primary key columns and will be used to construct the message key.
Message Keys can be serialized using one of the following formatters: JSON, Avro, DSV, XML.
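As an illustration of the Primary Key option, a key can be assembled from the source row's primary key column values and serialized, here as a simple JSON-style string. The column names and the exact output format are assumptions, not the writer's actual key layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class MessageKeySketch {
    // Build a JSON-style key string from primary key column values (illustrative only).
    static String primaryKeyToJson(Map<String, Object> primaryKeyColumns) {
        return primaryKeyColumns.entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
            .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        Map<String, Object> pk = new LinkedHashMap<>();
        pk.put("ORDER_ID", 1001);          // hypothetical primary key columns
        pk.put("REGION", "EU");
        System.out.println(primaryKeyToJson(pk));   // {"ORDER_ID":"1001","REGION":"EU"}
    }
}
```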
Message Payload
Each incoming event's contents, such as data, metadata, user data, and a few more fields (depending on the “Members” configuration of the respective formatter), are formatted and added to the Kafka message value. By default, Striim uses the Striim serializer. For Avro formatters, this can be set to the Confluent serializer via the Serializer property.
Kafka Writer works with the Avro, DSV, JSON, and XML formatters. The message value's type and format depend on the formatter used. With the DSV, JSON, and XML formatters, Kafka messages are formatted very similarly to the legacy Kafka Writer in Async mode.
Schema tracking
With the CSV, JSON, and XML formatters, no schema is attached to the events produced as Kafka messages. With the Avro formatter, an Avro record is generated for each incoming event (based on the “formatAs” configuration) and its schema is registered in the schema registry. The Avro formatter supports WAEvent, typed events, JSONNodeEvent, and Avro events.
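At the client level, schema registration with the Confluent serializer works roughly as in the sketch below: Confluent's KafkaAvroSerializer registers the record's Avro schema with the registry on first use. The broker address, registry URL, topic, and the Avro schema itself are illustrative assumptions standing in for what Striim generates from the incoming event.

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroSchemaSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                       // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");              // assumed registry URL

        // Hypothetical Avro schema standing in for the one generated from the incoming event.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
            + "{\"name\":\"orderId\",\"type\":\"int\"},"
            + "{\"name\":\"amount\",\"type\":\"double\"}]}");

        GenericRecord order = new GenericData.Record(schema);
        order.put("orderId", 1001);
        order.put("amount", 25.5);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // On first send, the serializer registers the schema under the topic's value subject.
            producer.send(new ProducerRecord<>("orders", "1001", order));
        }
    }
}
```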