Striim 3.9.6 documentation

Kudu Writer

Writes to Apache Kudu 1.4 or later.

Each property is listed below with its type, its default value (if it has one), and notes.

Batch Policy

java.lang.String

EventCount:1000, Interval:30

The batch policy includes EventCount and Interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent as a batch to the target every time either of the specified values is exceeded. When the app is stopped, any remaining data in the buffer is discarded. To disable batching, set to EventCount:1,Interval:0.

With the default setting, events will be sent every 30 seconds or sooner if the buffer accumulates 1000 events.

Each batch may include events for only one table. When writing to multiple tables, the current batch will be sent and a new one started every time an event is received for a different table.
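
As a rough illustration, the two settings discussed above might be written in TQL as follows. The property name BatchPolicy is an assumption, based on the convention of dropping the spaces from the name shown in the table (as KuduClientConfig does in the sample TQL); confirm it against your Striim release.

```
-- Assumed property name: BatchPolicy
-- Default: send every 30 seconds, or sooner once 1000 events are buffered
BatchPolicy: 'EventCount:1000,Interval:30'

-- Batching disabled
BatchPolicy: 'EventCount:1,Interval:0'
```

Because the buffer is discarded when the app is stopped, a smaller EventCount or Interval reduces how much unsent data can be lost at that point.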

Checkpoint Table

java.lang.String

CHKPOINT

A table with the specified name will be created automatically in Kudu and used by Striim for internal purposes.

Connection Retry Policy

java.lang.String

retryInterval=30, maxRetries=3

The connection retry policy includes retryInterval and maxRetries. With the default setting, if a connection attempt is unsuccessful, the adapter will try again in 30 seconds (retryInterval). If the second attempt is unsuccessful, in 30 seconds it will try a third time (maxRetries). If that is unsuccessful, the adapter will fail and log an exception. Negative values are not supported.
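
For example, a shorter interval with a higher retry limit might be written as sketched below. The property name ConnectionRetryPolicy is an assumption (the table name with spaces removed), and the values are illustrative only.

```
-- Assumed property name: ConnectionRetryPolicy
-- Wait 10 seconds between attempts, with maxRetries raised to 5
ConnectionRetryPolicy: 'retryInterval=10, maxRetries=5'
```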

Ignorable Exception

java.lang.String

By default, if the target returns an error, KuduWriter crashes the application. Use this property to specify errors to ignore, separated by commas. Supported values are ABORTED, ALREADY_PRESENT, CONFIGURATION_ERROR, CORRUPTION, END_OF_FILE, ILLEGAL_STATE, INCOMPLETE, INVALID_ARGUMENT, IO_ERROR, NETWORK_ERROR, NOT_AUTHORIZED, NOT_FOUND, NOT_SUPPORTED, REMOTE_ERROR, RUNTIME_ERROR, SERVICE_UNAVAILABLE, TIMED_OUT, and UNINITIALIZED.

For example, to ignore ALREADY_PRESENT and NOT_FOUND errors, you would specify:

IgnorableExceptionCode: 'ALREADY_PRESENT,NOT_FOUND'

When an ignorable exception occurs, Striim will write an "Ignoring VendorExceptionCode" message to the log. Alternatively, to capture the ignored exceptions, see Writing exceptions to a WActionStore.

Kudu Client Config

java.lang.String

Specify the master address, socket read timeout, and operation timeout properties for Kudu. For example:

master.addresses->192.168.56.101:7051;
socketreadtimeout->10000;
operationtimeout->30000

In a high availability environment, specify multiple master addresses, separated by commas. For example:

master.addresses->192.168.56.101:7051, 
192.168.56.102:7051; ...

PK Update Handling Mode

java.lang.String

ERROR

This property controls how KuduWriter will handle events that update the primary key, which is not supported by Kudu.

  • With the default setting of ERROR, the application will crash.

  • Set to IGNORE to ignore such events and continue.

  • Set to DELETEANDINSERT to drop the existing row and insert the one with the updated primary key. When using this setting, the Compression property in the CDC reader must be set to False.

Tables

java.lang.String

The name(s) of the table(s) to write to. The table(s) must exist in Kudu and the user specified in Username must have access. Table names are case-sensitive. The columns must have only supported data types as described below.

When the input stream of a KuduWriter target is the output of a CDC reader or DatabaseReader source, KuduWriter can write to multiple tables. For example:

source.emp,target.emp
source.db1,target.db1;source.db2,target.db2
source.%,target.%
source.mydatabase.emp%,target.mydb.%
source1.%,target1.%;source2.%,target2.%

MySQL and Oracle names are case-sensitive; SQL Server names are not. Specify names as <schema name>.<table name> for MySQL and Oracle and as <database name>.<schema name>.<table name> for SQL Server.

Primary key columns must be first in Kudu (see Known Issues and Limitations), so you may need to map columns if the source table columns are not in the same order (see Mapping columns).

Update As Upsert

java.lang.Boolean

False

With the default value of False, if an update fails, KuduWriter will crash. When set to True, if an update fails, KuduWriter will insert the row instead. Do not set to True when a source table has no primary key.

The following TQL will replicate data for the specified tables from Oracle to Kudu:

CREATE SOURCE OracleCDCIn USING OracleReader (
  Username:'striim',
  Password:'passwd',
  ConnectionURL:'203.0.113.49:1521:orcl',
  Tables:'MYSCHEMA.NAME,MYSCHEMA.DEPT'
)
OUTPUT TO OracleCDCStream;

CREATE TARGET KuduOut USING KuduWriter(
  KuduClientConfig:"master.addresses->203.0.113.88:7051;
    socketreadtimeout->10000;operationtimeout->30000",
  Tables: "MYSCHEMA.NAME,name;MYSCHEMA.DEPT,dept"
)
INPUT FROM OracleCDCStream;
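
As a variation on the example above, the following sketch also sets several of the optional properties. It assumes that each TQL property name is the name shown in the property list with the spaces removed, as with KuduClientConfig; BatchPolicy, ConnectionRetryPolicy, and PKUpdateHandlingMode are written on that assumption, and all values are illustrative. Compression is set explicitly in the source because DELETEANDINSERT requires it to be False in the CDC reader.

```
-- Sketch only: the optional property names in the target are assumed
-- (see the text above). Compression is set to false in the reader
-- because PKUpdateHandlingMode DELETEANDINSERT requires it.
CREATE SOURCE OracleCDCIn USING OracleReader (
  Username:'striim',
  Password:'passwd',
  ConnectionURL:'203.0.113.49:1521:orcl',
  Tables:'MYSCHEMA.NAME,MYSCHEMA.DEPT',
  Compression:false
)
OUTPUT TO OracleCDCStream;

CREATE TARGET KuduOut USING KuduWriter (
  KuduClientConfig:"master.addresses->203.0.113.88:7051;
    socketreadtimeout->10000;operationtimeout->30000",
  Tables:"MYSCHEMA.NAME,name;MYSCHEMA.DEPT,dept",
  BatchPolicy:'EventCount:500,Interval:10',
  ConnectionRetryPolicy:'retryInterval=10, maxRetries=5',
  IgnorableExceptionCode:'ALREADY_PRESENT,NOT_FOUND',
  PKUpdateHandlingMode:'DELETEANDINSERT'
)
INPUT FROM OracleCDCStream;
```

With these settings, events that change a primary key are applied as a delete followed by an insert rather than crashing the application, and ALREADY_PRESENT and NOT_FOUND errors returned by Kudu are logged and ignored.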

KuduWriter data type support and correspondence

Columns in target tables must use only the following supported data types.

If using Cloudera's Kudu, see Apache Kudu Schema Design and Apache Kudu Usage Limitations.

| Striim data type | Kudu data type |
|---|---|
| java.lang.Byte[] | binary |
| java.lang.Double | double |
| java.lang.Float | float |
| java.lang.Integer | int32, int64 |
| java.lang.Long | int64 |
| java.lang.Short | int16 |
| java.lang.String | string |
| org.joda.time.DateTime | unixtime_micros |

When the input stream for a KuduWriter target is the output of an OracleReader source, the following combinations are supported:

| Oracle type | Kudu type |
|---|---|
| BINARY_DOUBLE | double |
| BINARY_FLOAT | float |
| BLOB | binary, string |
| CHAR | string |
| CHAR(1) | bool |
| CLOB | string |
| DATE | unixtime_micros |
| DEC | float |
| DECIMAL | float |
| FLOAT | float |
| INT | int32 |
| INTEGER | int32 |
| LONG | int64, string |
| NCHAR | string |
| NUMBER | int64 |
| NUMBER(1,0) | bool |
| NUMBER(10) | int64 |
| NUMBER(19,0) | int64 |
| NUMERIC | float |
| NVARCHAR2 | string |
| SMALLINT | int16 |
| TIMESTAMP | unixtime_micros |
| TIMESTAMP WITH LOCAL TIME ZONE | unixtime_micros |
| TIMESTAMP WITH TIME ZONE | unixtime_micros |
| VARCHAR2 | string |

When the input stream for a KuduWriter target is the output of an MSSQLReader source, the following combinations are supported:

| SQL Server type | Kudu type |
|---|---|
| bigint | int64 |
| bit | bool or int64 |
| char | string |
| date | unixtime_micros |
| datetime | unixtime_micros |
| datetime2 | unixtime_micros |
| decimal | float |
| float | double or float |
| image | binary |
| int | int64 |
| money | double |
| nchar | string |
| ntext | string |
| numeric | float |
| nvarchar | string |
| nvarchar(max) | string |
| real | float |
| smalldatetime | unixtime_micros |
| smallint | int64 |
| smallmoney | double |
| text | string |
| tinyint | int64 |
| varbinary | binary |
| varbinary(max) | binary |
| varchar | string |
| varchar(max) | string |
| xml | string |