
Kudu Writer

Writes to Apache Kudu 1.4 or later.

Kudu Writer properties

Each property is listed with its type and, where applicable, its default value, followed by notes.

Batch Policy (String, default: EventCount:1000, Interval:30)

The batch policy includes eventCount and interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent to the target as a batch every time either of the specified values is exceeded. When the app is stopped, any data remaining in the buffer is discarded. To disable batching, set it to EventCount:1,Interval:0.

With the default setting, events will be sent every 30 seconds or sooner if the buffer accumulates 1000 events.

Each batch may include events for only one table. When writing to multiple tables, the current batch will be sent and a new one started every time an event is received for a different table.
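The following minimal Python sketch illustrates the batching rules above. It is a hypothetical model, not Striim's implementation. It assumes that each event carries a timestamp in seconds and a table name. A batch is flushed when it reaches the event count, when the interval has elapsed since the batch started, or when an event for a different table arrives. To keep the sketch small, the interval is only checked when an event arrives; a real writer would presumably use a timer.

```python
from typing import Any, List, Optional, Tuple


class BatchSimulator:
    """Hypothetical model of BatchPolicy 'EventCount:N, Interval:S'."""

    def __init__(self, event_count: int = 1000, interval: float = 30.0):
        self.event_count = event_count
        self.interval = interval
        self.sent: List[Tuple[Optional[str], List[Any]]] = []  # batches delivered to the target
        self._buffer: List[Any] = []
        self._table: Optional[str] = None
        self._started = 0.0

    def _flush(self) -> None:
        if self._buffer:
            self.sent.append((self._table, self._buffer))
            self._buffer = []

    def add(self, ts: float, table: str, row: Any) -> None:
        # Send the current batch first if this event is for a different table
        # or if the interval has elapsed since the batch was started.
        if self._buffer and (table != self._table or ts - self._started >= self.interval):
            self._flush()
        if not self._buffer:
            self._table, self._started = table, ts
        self._buffer.append(row)
        # Send as soon as the event count is reached.
        if len(self._buffer) >= self.event_count:
            self._flush()

    def stop(self) -> None:
        # Per the documentation, data still buffered when the app stops is discarded.
        self._buffer = []
```

With EventCount:1,Interval:0, every call to add flushes immediately, which matches the "disable batching" setting.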

Checkpoint Table (String, default: CHKPOINT)

A table with the specified name will be created automatically in Kudu and used by Striim for internal purposes.

Connection Retry Policy (String, default: retryInterval=30, maxRetries=3)

With the default setting, if a connection attempt is unsuccessful, the adapter will try again in 30 seconds (retryInterval). If the second attempt is unsuccessful, it will try a third time 30 seconds later (maxRetries). If that attempt is also unsuccessful, the adapter will fail and log an exception. Negative values are not supported.
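The retry behaviour described above can be sketched in Python as follows. This is a hypothetical illustration. It follows the prose in counting the initial attempt toward maxRetries, so the default gives three attempts in total. That reading is an assumption. The connect callable and the injectable sleep function are illustrative stand-ins, not part of Striim.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def parse_retry_policy(policy: str) -> Dict[str, int]:
    """Parse 'retryInterval=30, maxRetries=3' into a dict of ints."""
    values: Dict[str, int] = {}
    for part in policy.split(","):
        key, _, raw = part.strip().partition("=")
        value = int(raw)
        if value < 0:
            # The documentation states that negative values are not supported.
            raise ValueError(f"{key} must not be negative")
        values[key] = value
    return values


def connect_with_retry(connect: Callable[[], T],
                       policy: str = "retryInterval=30, maxRetries=3",
                       sleep: Callable[[float], None] = time.sleep) -> T:
    p = parse_retry_policy(policy)
    # Assumption: the first attempt counts toward maxRetries, as in the prose above.
    attempts = max(p["maxRetries"], 1)
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # the adapter fails and logs the exception
            sleep(p["retryInterval"])  # wait retryInterval seconds before retrying
    raise AssertionError("unreachable")
```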

Ignorable Exception Code (String)

By default, if the target returns an error, KuduWriter terminates the application. Use this property to specify errors to ignore, separated by commas. Supported values are ALREADY_PRESENT and NOT_FOUND.

For example, to ignore ALREADY_PRESENT and NOT_FOUND errors, you would specify:

IgnorableExceptionCode: 'ALREADY_PRESENT,NOT_FOUND'

Ignored exceptions will be written to the application's exception store (see CREATE EXCEPTIONSTORE).
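This hypothetical Python sketch shows how an IgnorableExceptionCode value could be interpreted. Only the two supported codes are accepted. An ignored error is recorded in a list that stands in for the exception store. Any other error terminates the application, which is represented here by the illustrative WriterTerminated exception.

```python
from typing import List, Set

SUPPORTED_CODES = {"ALREADY_PRESENT", "NOT_FOUND"}


class WriterTerminated(Exception):
    """Hypothetical stand-in for KuduWriter terminating the application."""


def parse_ignorable(value: str) -> Set[str]:
    """Parse a comma-separated list such as 'ALREADY_PRESENT,NOT_FOUND'."""
    codes = {code.strip() for code in value.split(",") if code.strip()}
    unsupported = codes - SUPPORTED_CODES
    if unsupported:
        raise ValueError(f"unsupported codes: {sorted(unsupported)}")
    return codes


def handle_error(code: str, ignorable: Set[str], exception_store: List[str]) -> None:
    if code in ignorable:
        exception_store.append(code)  # ignored errors are still recorded
        return
    raise WriterTerminated(code)  # default behaviour: stop the application
```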

Kudu Client Config (String)

Specify the master address, socket read timeout, and operation timeout properties for Kudu. For example:

master.addresses->192.168.56.101:7051;
socketreadtimeout->10000;
operationtimeout->30000

In a high availability environment, specify multiple master addresses, separated by commas. For example:

master.addresses->192.168.56.101:7051, 192.168.56.102:7051; ...
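To make the format of the Kudu Client Config string concrete, here is a hypothetical Python parser for it. It assumes entries are separated by semicolons, that each key is joined to its value by ->, and that multiple master addresses are separated by commas, as in the examples above. Striim's actual parsing may differ.

```python
from typing import Dict, List, Union


def parse_kudu_client_config(config: str) -> Dict[str, Union[str, List[str]]]:
    settings: Dict[str, Union[str, List[str]]] = {}
    for entry in config.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon or blank line
        key, sep, value = entry.partition("->")
        if not sep:
            raise ValueError(f"expected key->value, got {entry!r}")
        settings[key.strip()] = value.strip()
    if "master.addresses" in settings:
        # High availability: several masters, separated by commas.
        settings["master.addresses"] = [
            addr.strip()
            for addr in str(settings["master.addresses"]).split(",")
            if addr.strip()
        ]
    return settings
```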

Parallel Threads (Integer)

See Creating multiple writer instances.

PK Update Handling Mode (String, default: ERROR)

This property controls how KuduWriter handles events that update the primary key, an operation Kudu does not support.

  • With the default setting of ERROR, the application will terminate.

  • Set to IGNORE to ignore such events and continue.

  • Set to DELETEANDINSERT to drop the existing row and insert the one with the updated primary key. When using this setting, the Compression property in the CDC reader must be set to False.
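The three settings can be sketched in Python as follows. This is a hypothetical model in which a dict keyed by primary key stands in for the Kudu table. The function only handles update events whose primary key changed, and WriterTerminated is an illustrative stand-in for the application terminating.

```python
from typing import Any, Dict, Hashable


class WriterTerminated(Exception):
    """Hypothetical stand-in for KuduWriter terminating the application."""


def handle_pk_update(table: Dict[Hashable, Any], old_key: Hashable, new_key: Hashable,
                     row: Any, mode: str = "ERROR") -> str:
    """Apply an update event whose primary key changed from old_key to new_key."""
    if mode == "ERROR":
        raise WriterTerminated(f"primary key update {old_key!r} -> {new_key!r}")
    if mode == "IGNORE":
        return "ignored"  # skip the event and continue
    if mode == "DELETEANDINSERT":
        table.pop(old_key, None)  # drop the existing row...
        table[new_key] = row      # ...and insert it under the updated key
        return "deleted_and_inserted"
    raise ValueError(f"unknown PK Update Handling Mode {mode!r}")
```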

Tables (String)

The name(s) of the table(s) to write to. The table(s) must exist in Kudu and the user specified in Username must have access. Table names are case-sensitive. The columns must have only supported data types as described below.

When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), KuduWriter can write to multiple tables. In this case, specify the names of both the source and target tables. You may use the % wildcard only for tables, not for schemas or databases. If the reader uses three-part names, you must use them here as well.

Note that Oracle CDB/PDB source table names must be specified in two parts when the source is Database Reader or Incremental Batch Reader (schema.%,schema.%) but in three parts when the source is Oracle Reader or OJet (database.schema.%,schema.%). Similarly, SQL Server source table names must be specified in three parts when the source is Database Reader or Incremental Batch Reader (database.schema.%,schema.%) but in two parts when the source is MS SQL Reader or MS Jet (schema.%,schema.%). Examples:

source.emp,target.emp
source.db1,target.db1;source.db2,target.db2
source.%,target.%
source.mydatabase.emp%,target.mydb.%
source1.%,target1.%;source2.%,target2.%

MySQL and Oracle names are case-sensitive; SQL Server names are not. Specify names as <schema name>.<table name> for MySQL and Oracle and as <database name>.<schema name>.<table name> for SQL Server.

Primary key columns must be first in Kudu (see Known Issues and Limitations), so you may need to map columns if the source table columns are not in the same order (see Mapping columns).
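The following hypothetical Python sketch shows how a source table name might be resolved against a Tables mapping like the examples above. It assumes that % may appear only in the final (table) part of the source name, that a bare % in the target is replaced with the source table name, and that matching is case-sensitive. It illustrates the mapping syntax and is not Striim's resolver.

```python
import re
from typing import Optional


def resolve_target(mapping: str, source_table: str) -> Optional[str]:
    """Return the target table for source_table, or None if no pair matches."""
    src_prefix, _, name = source_table.rpartition(".")
    for pair in mapping.split(";"):
        src, tgt = (part.strip() for part in pair.split(","))
        pat_prefix, _, pat_name = src.rpartition(".")
        if pat_prefix != src_prefix:
            continue  # schema/database parts must match exactly (no wildcard)
        # Translate % into a regex wildcard; everything else is literal.
        regex = ".*".join(re.escape(piece) for piece in pat_name.split("%"))
        if re.fullmatch(regex, name):
            tgt_prefix, _, tgt_name = tgt.rpartition(".")
            target_name = name if tgt_name == "%" else tgt_name
            return f"{tgt_prefix}.{target_name}" if tgt_prefix else target_name
    return None
```

For example, with source.mydatabase.emp%,target.mydb.% the table source.mydatabase.emp01 would resolve to target.mydb.emp01.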

Update As Upsert (Boolean, default: False)

With the default value of False, if an update fails, KuduWriter will terminate. When set to True, if an update fails, KuduWriter will insert the row instead. Do not set to True when a source table has no primary key.
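Here is a hypothetical Python sketch of the two settings. It assumes that an update "fails" because no row with the given primary key exists. A dict keyed by primary key stands in for the Kudu table, and WriterTerminated stands in for KuduWriter terminating.

```python
from typing import Any, Dict, Hashable


class WriterTerminated(Exception):
    """Hypothetical stand-in for KuduWriter terminating the application."""


def apply_update(table: Dict[Hashable, Any], key: Hashable, row: Any,
                 update_as_upsert: bool = False) -> str:
    if key in table:
        table[key] = row
        return "updated"
    # Assumption: the update failed because the row does not exist.
    if update_as_upsert:
        table[key] = row  # Update As Upsert = True: insert the row instead
        return "inserted"
    raise WriterTerminated(f"update failed for key {key!r}")
```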

Kudu Writer sample application

The following TQL will replicate data for the specified tables from Oracle to Kudu:

CREATE SOURCE OracleCDCIn USING OracleReader (
  Username:'striim',
  Password:'passwd',
  ConnectionURL:'203.0.113.49:1521:orcl',
  Tables:'MYSCHEMA.NAME,MYSCHEMA.DEPT'
)
OUTPUT TO OracleCDCStream;

CREATE TARGET KuduOut USING KuduWriter (
  KuduClientConfig:"master.addresses->203.0.113.88:7051;socketreadtimeout->10000;operationtimeout->30000",
  Tables:'MYSCHEMA.NAME,name;MYSCHEMA.DEPT,dept'
)
INPUT FROM OracleCDCStream;

Kudu Writer data type support and correspondence

Columns in target tables must use only the following supported data types.

If using Cloudera's Kudu, see Apache Kudu Schema Design and Apache Kudu Usage Limitations.

Striim data type | Kudu data type
--- | ---
java.lang.Byte[] | binary
java.lang.Double | double
java.lang.Float | float
java.lang.Integer | int32, int64
java.lang.Long | int64
java.lang.Short | int16
java.lang.String | string
org.joda.time.DateTime | unixtime_micros
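To check a planned Kudu schema against the table above before starting an application, you could encode the table as a lookup, as in this hypothetical Python sketch. The function name and the (column, Striim type, Kudu type) tuple format are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Supported Striim type -> Kudu type combinations, copied from the table above.
STRIIM_TO_KUDU: Dict[str, Set[str]] = {
    "java.lang.Byte[]": {"binary"},
    "java.lang.Double": {"double"},
    "java.lang.Float": {"float"},
    "java.lang.Integer": {"int32", "int64"},
    "java.lang.Long": {"int64"},
    "java.lang.Short": {"int16"},
    "java.lang.String": {"string"},
    "org.joda.time.DateTime": {"unixtime_micros"},
}


def unsupported_columns(columns: List[Tuple[str, str, str]]) -> List[str]:
    """Return names of columns whose (Striim type, Kudu type) pair is not listed."""
    return [name for name, striim_type, kudu_type in columns
            if kudu_type not in STRIIM_TO_KUDU.get(striim_type, set())]
```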

When the input stream for a KuduWriter target is the output of an OracleReader source, the following type combinations are supported:

Oracle type | Kudu type
--- | ---
BINARY_DOUBLE | double
BINARY_FLOAT | float
BLOB | binary, string
CHAR | string
CHAR(1) | bool
CLOB | string
DATE | unixtime_micros
DEC | float
DECIMAL | float
FLOAT | float
INT | int32
INTEGER | int32
LONG | int64, string
NCHAR | string
NUMBER | int64
NUMBER(1,0) | bool
NUMBER(10) | int64
NUMBER(19,0) | int64
NUMERIC | float
NVARCHAR2 | string
SMALLINT | int16
TIMESTAMP | unixtime_micros
TIMESTAMP WITH LOCAL TIME ZONE | unixtime_micros
TIMESTAMP WITH TIME ZONE | unixtime_micros
VARCHAR2 | string

When the input stream for a KuduWriter target is the output of an MSSQLReader source, the following type combinations are supported:

SQL Server type | Kudu type
--- | ---
bigint | int64
bit | bool or int64
char | string
date | unixtime_micros
datetime | unixtime_micros
datetime2 | unixtime_micros
decimal | float
float | double or float
image | binary
int | int64
money | double
nchar | string
ntext | string
numeric | float
nvarchar | string
nvarchar(max) | string
real | float
smalldatetime | unixtime_micros
smallint | int64
smallmoney | double
text | string
tinyint | int64
varbinary | binary
varbinary(max) | binary
varchar | string
varchar(max) | string
xml | string