Kudu Writer

Writes to Apache Kudu 1.4 or later.

Kudu Writer properties

property	type	default value	notes
Batch Policy	String	EventCount:1000, Interval:30	The batch policy includes eventCount and interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent as a batch to the target every time either of the specified values is exceeded. When the app is stopped, any remaining data in the buffer is discarded. To disable batching, set to `EventCount:1,Interval:0`. With the default setting, events will be sent every 30 seconds or sooner if the buffer accumulates 1000 events. Each batch may include events for only one table. When writing to multiple tables, the current batch will be sent and a new one started every time an event is received for a different table.
Checkpoint Table	String	CHKPOINT	A table with the specified name will be created automatically in Kudu and used by Striim for internal purposes.
Connection Retry Policy	String	retryInterval=30, maxRetries=3	With the default setting, if a connection attempt is unsuccessful, the adapter will try again in 30 seconds (`retryInterval`). If the second attempt is unsuccessful, in 30 seconds it will try a third time (`maxRetries`). If that is unsuccessful, the adapter will fail and log an exception. Negative values are not supported.
Ignorable Exception	String		By default, if the target returns an error, KuduWriter terminates the application. Use this property to specify errors to ignore, separated by commas. Supported values are ALREADY_PRESENT and NOT_FOUND. For example, to ignore both ALREADY_PRESENT and NOT_FOUND errors, specify `IgnorableExceptionCode: 'ALREADY_PRESENT,NOT_FOUND'`. Ignored exceptions will be written to the application's exception store (see CREATE EXCEPTIONSTORE).
Kudu Client Config	String		Specify the master address, socket read timeout, and operation timeout properties for Kudu. For example: `master.addresses->192.168.56.101:7051; socketreadtimeout->10000; operationtimeout->30000`. In a high availability environment, specify multiple master addresses, separated by commas. For example: `master.addresses->192.168.56.101:7051, 192.168.56.102:7051; ...`
Parallel Threads	Integer		See Creating multiple writer instances (parallel threads).
PK Update Handling Mode	String	ERROR	This property controls how KuduWriter will handle events that update the primary key, which is not supported by Kudu. With the default setting of ERROR, the application will terminate. Set to IGNORE to ignore such events and continue. Set to DELETEANDINSERT to drop the existing row and insert the one with the updated primary key. When using this setting, the Compression property in the CDC reader must be set to False.
Tables	String		The name(s) of the table(s) to write to. The table(s) must exist in Kudu and the user specified in Username must have access. Table names are case-sensitive. The columns must have only supported data types as described below. When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), it can write to multiple tables. In this case, specify the names of both the source and target tables. You may use the `%` wildcard only for tables, not for schemas or databases. If the reader uses three-part names, you must use them here as well. Note that Oracle CDB/PDB source table names must be specified in two parts when the source is Database Reader or Incremental Batch Reader (`schema.%,schema.%`) but in three parts when the source is Oracle Reader or OJet (`database.schema.%,schema.%`). Note that SQL Server source table names must be specified in three parts when the source is Database Reader or Incremental Batch Reader (`database.schema.%,schema.%`) but in two parts when the source is MS SQL Reader or MS Jet (`schema.%,schema.%`). Examples: `source.emp,target.emp`; `source.db1,target.db1;source.db2,target.db2`; `source.%,target.%`; `source.mydatabase.emp%,target.mydb.%`; `source1.%,target1.%;source2.%,target2.%`. MySQL and Oracle names are case-sensitive; SQL Server names are not. Specify names as `<schema name>.<table name>` for MySQL and Oracle and as `<database name>.<schema name>.<table name>` for SQL Server. Primary key columns must be first in Kudu (see Known Issues and Limitations), so you may need to map columns if the source table columns are not in the same order (see Mapping columns).
Update As Upsert	Boolean	False	With the default value of False, if an update fails, KuduWriter will terminate. When set to True, if an update fails, KuduWriter will insert the row instead. Do not set to True when a source table has no primary key.
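
The properties above could be combined in a target definition along the following lines. This is a minimal sketch, not a verified configuration: `KuduClientConfig`, `Tables`, and `IgnorableExceptionCode` appear elsewhere in this section, while `BatchPolicy`, `ConnectionRetryPolicy`, and `UpdateAsUpsert` are assumed TQL spellings of the display names in the table. The target name `KuduHAOut` is hypothetical, and the input stream is the one created in the sample application in this section.

```tql
-- Sketch only; property names marked "assumed" are inferred from the
-- display names in the properties table.
CREATE TARGET KuduHAOut USING KuduWriter (
  -- two master addresses for a high availability environment
  KuduClientConfig: 'master.addresses->192.168.56.101:7051, 192.168.56.102:7051;socketreadtimeout->10000;operationtimeout->30000',
  Tables: 'MYSCHEMA.NAME,name;MYSCHEMA.DEPT,dept',
  -- assumed name; this value disables batching
  BatchPolicy: 'EventCount:1,Interval:0',
  -- assumed name; this is the documented default value
  ConnectionRetryPolicy: 'retryInterval=30, maxRetries=3',
  -- ignore both supported error codes instead of terminating
  IgnorableExceptionCode: 'ALREADY_PRESENT,NOT_FOUND',
  -- assumed name; insert the row when an update fails
  UpdateAsUpsert: true
)
INPUT FROM OracleCDCStream;
```

Disabling batching sends every event individually, so it is likely to suit low-volume testing better than sustained replication. As noted in the table, Update As Upsert should stay at False when a source table has no primary key.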

Kudu Writer sample application

The following TQL will replicate data for the specified tables from Oracle to Kudu:

CREATE SOURCE OracleCDCIn USING OracleReader (
  Username:'striim',
  Password:'passwd',
  ConnectionURL:'203.0.113.49:1521:orcl',
  Tables:'MYSCHEMA.NAME,MYSCHEMA.DEPT'
)
OUTPUT TO OracleCDCStream;

CREATE TARGET KuduOut USING KuduWriter(
  KuduClientConfig:"master.addresses->203.0.113.88:7051;socketreadtimeout->10000;operationtimeout->30000",
  Tables: 'MYSCHEMA.NAME,name;MYSCHEMA.DEPT,dept'
)
INPUT FROM OracleCDCStream;
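
As a variation on the sample above, the following sketch handles primary key updates by deleting the existing row and inserting the updated one. It assumes that the TQL names for the PK Update Handling Mode property and the reader's Compression property are `PKUpdateHandlingMode` and `Compression`. The component names `OracleCDCUncompressedIn`, `OracleCDCUncompressedStream`, and `KuduPKOut` are hypothetical.

```tql
-- Sketch under stated assumptions, not a verified configuration.
CREATE SOURCE OracleCDCUncompressedIn USING OracleReader (
  Username:'striim',
  Password:'passwd',
  ConnectionURL:'203.0.113.49:1521:orcl',
  Tables:'MYSCHEMA.NAME,MYSCHEMA.DEPT',
  -- DELETEANDINSERT requires compression to be off in the CDC reader
  Compression: false
)
OUTPUT TO OracleCDCUncompressedStream;

CREATE TARGET KuduPKOut USING KuduWriter (
  KuduClientConfig:'master.addresses->203.0.113.88:7051;socketreadtimeout->10000;operationtimeout->30000',
  Tables:'MYSCHEMA.NAME,name;MYSCHEMA.DEPT,dept',
  -- assumed TQL name for the PK Update Handling Mode property
  PKUpdateHandlingMode:'DELETEANDINSERT'
)
INPUT FROM OracleCDCUncompressedStream;
```

With the default mode of ERROR, the same primary key update would terminate the application, and with IGNORE the event would be skipped.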

Kudu Writer data type support and correspondence

Columns in target tables must use only the following supported data types.

If using Cloudera's Kudu, see Apache Kudu Schema Design and Apache Kudu Usage Limitations.

Striim data type	Kudu data type
java.lang.Byte[]	binary
java.lang.Double	double
java.lang.Float	float
java.lang.Integer	int32, int64
java.lang.Long	int64
java.lang.Short	int16
java.lang.String	string
org.joda.time.DateTime	unixtime_micros
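
To illustrate the correspondence above, here is a sketch of a Striim type whose fields each map to a supported Kudu type. The type name `DeptEvent` and its field names are hypothetical, and the Kudu types in the comments are read from the table rather than from a tested deployment.

```tql
-- Hypothetical type; comments show the Kudu type each field would need
-- in the target table, per the table above.
CREATE TYPE DeptEvent (
  deptId Integer KEY,     -- int32 or int64
  deptName String,        -- string
  headCount Long,         -- int64
  budget Double,          -- double
  updatedAt DateTime      -- unixtime_micros
);
```

Because primary key columns must come first in Kudu, the key field is listed first here so that the field order matches the target table.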

When the input stream for a KuduWriter target is the output of an OracleReader source, the following combinations are supported:

Oracle type	Kudu type
BINARY_DOUBLE	double
BINARY_FLOAT	float
BLOB	binary, string
CHAR	string
CHAR(1)	bool
CLOB	string
DATE	unixtime_micros
DEC	float
DECIMAL	float
FLOAT	float
INT	int32
INTEGER	int32
LONG	int64, string
NCHAR	string
NUMBER	int64
NUMBER(1,0)	bool
NUMBER(10)	int64
NUMBER(19,0)	int64
NUMERIC	float
NVARCHAR2	string
SMALLINT	int16
TIMESTAMP	unixtime_micros
TIMESTAMP WITH LOCAL TIME ZONE	unixtime_micros
TIMESTAMP WITH TIME ZONE	unixtime_micros
VARCHAR2	string

When the input stream for a KuduWriter target is the output of an MSSQLReader source, the following combinations are supported:

SQL Server type	Kudu type
bigint	int64
bit	bool or int64
char	string
date	unixtime_micros
datetime	unixtime_micros
datetime2	unixtime_micros
decimal	float
float	double or float
image	binary
int	int64
money	double
nchar	string
ntext	string
numeric	float
nvarchar	string
nvarchar(max)	string
real	float
smalldatetime	unixtime_micros
smallint	int64
smallmoney	double
text	string
tinyint	int64
varbinary	binary
varbinary(max)	binary
varchar	string
varchar(max)	string
xml	string
