Striim 3.10.3 documentation

Enhancing your Oracle to Kafka application

Once you have your basic Oracle to Kafka application, you may modify it in many ways. The following are a few of the most common:

Including DDL in the output

DDL may be included by changing OracleReader's DictionaryMode property to OnlineCatalog. See Including DDL operations in Oracle Reader output and Handling DDL events when replicating Oracle data.

Using the Forwarding Agent

The Striim Forwarding Agent is a stripped-down version of the Striim server that can be used to run sources and queries locally on the Oracle host. See Striim Forwarding Agent installation and configuration for more information.

To make your application deployable to an agent, split it into two flows:

CREATE APPLICATION Oracle2Kafka RECOVERY 5 SECOND INTERVAL;

CREATE FLOW AgentFlow;
CREATE SOURCE OracleSource USING OracleReader (
FetchSize: 1,
CommittedTransactions: true,
Compression: false,
ConnectionURL: '198.51.100.15:1521:orcl',
DictionaryMode: 'OnlineCatalog',
Tables: 'MYSCHEMA.%'
)
OUTPUT TO OracleSourcre_ChangeDataStream;
END FLOW AgentFlow;

CREATE FLOW ServerFlow;
CREATE TARGET KafkaTarget USING KafkaWriter VERSION '0.8.0' (
Mode: 'Sync',
Topic: 'MyTopic',
)
FORMAT USING AvroFormatter ( schemaFileName: 'MySchema.avro' )
INPUT FROM OracleSourcre_ChangeDataStream;
END FLOW ServerFlow;

END APPLICATION Oracle2Kafka;

With the default deployment group names, you would deploy this using the command:

DEPLOY APPLICATION Oracle2Kafka with AgentFlow in agent, ServerFlow in default;


See Using the Striim Forwarding Agent and Managing deployment groups for more information.

If Oracle writes to its CDC log faster than your template-generated application can read it, you can increase throughput by using multiple instances of OracleReader. Even if deployed on the same server this can increase performance, since each can be run on a different core.

Use the Tables property to distribute tables among the OracleReaders:

• Assign each table to only one OracleReader.

• When tables are related (by primary or foreign key) or to ensure transaction integrity among a set of tables, assign them all to the same OracleReader.

• When dividing tables among OracleReaders, distribute them according to how busy they are rather than simply by the number of tables. For example, if one table generates 50% of the entries in the CDC log, you might assign it and any related tables to one OracleReader and all the other tables to another.

The following is a simple example of how you could modify the basic application example above to use two OracleReaders, using one to read a very busy table and the other to read the rest of the tables in the same schema:

CREATE SOURCE OracleSource1 USING OracleReader (
FetchSize: 1,
Compression: false,
ConnectionURL: '198.51.100.15:1521:orcl',
Tables: 'MYSCHEMA.VERYBUSYTABLE'
)
OUTPUT TO OracleSourcre_ChangeDataStream;

CREATE SOURCE OracleSource2 USING OracleReader (
FetchSize: 1,
CommittedTransactions: true,
Compression: false,
ConnectionURL: '198.51.100.15:1521:orcl',
Tables: 'MYSCHEMA.%',
ExcludedTables: 'MYSCHEMA.VERYBUSYTABLE'
)
OUTPUT TO OracleSourcre_ChangeDataStream;
Writing to multiple Kafka topic partitions

Use KafkaWriter's PartitionKey property to choose a field from the input stream, METADATA map (see OracleReader WAEvent fields, or USERDATA map (see Adding user-defined data to WAEvent streams) to be used to distribute events among multiple partitions. For example, assuming the sixth element (counting from zero) in the WAEvent data array is a city name:

CREATE AddUserData
INSERT INTO OracleSourceWithPartitionKey
SELECT putUserData(x, 'city', data[5])
FROM OracleSourcre_ChangeDataStream x;

CREATE TARGET KafkaTarget USING KafkaWriter VERSION '0.8.0' (
Mode: 'Sync',
Topic: 'MyTopic',
PartitionKey:'@userdata(city)'
) input from UserDataDemoStream;
Scaling up KafkaWriter

If OracleReader sends CDC events faster than KafkaWriter can write them, you can increase throughput by using the ParallelThreads property to create multiple instances of KafkaWriter. Even if deployed on the same server this can increase performance, since each can be run on a different core.

Warning

Use ParallelThreads only when the target is not able to keep up with incoming events (that is, when its input stream is backpressured). Otherwise, the overhead imposed by additional threads could reduce the application's performance.

The following is a simple example of how you could modify the basic application example above to use four KafkaWriters:

CREATE TARGET KafkaTarget USING KafkaWriter VERSION '0.8.0' (
Mode: 'Sync',
Topic: 'MyTopic',

All four KafkaWriters will use the same properties. If the application is deployed ON ALL to a deployment group with multiple servers, each server will have four KafkaWriters.