Kudu Writer
Writes to Apache Kudu 1.4 or later.
Kudu Writer properties
property | type | default value | notes |
---|---|---|---|
Batch Policy | String | EventCount:1000, Interval:30 | The batch policy includes eventCount and interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent to the target as a batch every time either of the specified values is exceeded. When the app is stopped, any remaining data in the buffer is discarded. To disable batching, set to …. With the default setting, events are sent every 30 seconds, or sooner if the buffer accumulates 1000 events. Each batch may include events for only one table. When writing to multiple tables, the current batch is sent and a new one started every time an event is received for a different table. |
Checkpoint Table | String | CHKPOINT | A table with the specified name will be created automatically in Kudu and used by Striim for internal purposes. |
Connection Retry Policy | String | retryInterval=30, maxRetries=3 | With the default setting, if a connection attempt is unsuccessful, the adapter will try again in 30 seconds (retryInterval). |
Ignorable Exception Code | String | | By default, if the target returns an error, KuduWriter terminates the application. Use this property to specify errors to ignore, separated by commas. Supported values are ALREADY_PRESENT and NOT_FOUND. For example, to ignore both ALREADY_PRESENT and NOT_FOUND errors, specify IgnorableExceptionCode: 'ALREADY_PRESENT,NOT_FOUND'. Ignored exceptions are written to the application's exception store (see CREATE EXCEPTIONSTORE). |
Kudu Client Config | String | | Specify the master address, socket read timeout, and operation timeout properties for Kudu. For example: master.addresses->192.168.56.101:7051; socketreadtimeout->10000; operationtimeout->30000. In a high availability environment, specify multiple master addresses, separated by commas. For example: master.addresses->192.168.56.101:7051, 192.168.56.102:7051; ... |
Parallel Threads | Integer | | |
PK Update Handling Mode | String | ERROR | This property controls how KuduWriter handles events that update the primary key, which Kudu does not support. |
Tables | String | | The name(s) of the table(s) to write to. The table(s) must exist in Kudu and the user specified in Username must have access. Table names are case-sensitive. The columns must use only the supported data types described below. When the input stream of a KuduWriter target is the output of a CDC reader or DatabaseReader source, KuduWriter can write to multiple tables. For example: 'source.emp,target.emp', 'source.db1,target.db1;source.db2,target.db2', 'source.%,target.%', 'source.mydatabase.emp%,target.mydb.%', or 'source1.%,target1.%;source2.%,target2.%'. MySQL and Oracle names are case-sensitive; SQL Server names are not. Specify names as …. Primary key columns must be first in Kudu (see Known Issues and Limitations), so you may need to map columns if the source table columns are not in the same order (see Mapping columns). |
Update As Upsert | Boolean | False | With the default value of False, if an update fails, KuduWriter will terminate. When set to True, if an update fails, KuduWriter will insert the row instead. Do not set to True when a source table has no primary key. |
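The following TQL is a minimal sketch of a KuduWriter target that overrides several of the defaults in the table above. It assumes that each TQL property name is the table's property name with the spaces removed (as with KuduClientConfig, Tables, and IgnorableExceptionCode elsewhere on this page), and that a stream named OracleCDCStream already exists, such as the one in the sample application below. The target name, addresses, and numeric values are illustrative placeholders, not recommendations.

```
CREATE TARGET KuduTunedOut USING KuduWriter (
  KuduClientConfig: "master.addresses->203.0.113.88:7051; socketreadtimeout->10000;operationtimeout->30000",
  Tables: 'MYSCHEMA.NAME,name',
  BatchPolicy: 'EventCount:500, Interval:10',
  ConnectionRetryPolicy: 'retryInterval=60, maxRetries=5',
  IgnorableExceptionCode: 'ALREADY_PRESENT,NOT_FOUND',
  UpdateAsUpsert: true
)
INPUT FROM OracleCDCStream;
```

With these values, a batch is sent every 10 seconds, or sooner once 500 events have been buffered, and ALREADY_PRESENT and NOT_FOUND errors are written to the application's exception store instead of terminating the application. Because UpdateAsUpsert is true, a failed update is inserted as a new row instead, so this setting assumes the source table has a primary key.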
Kudu Writer sample application
The following TQL will replicate data for the specified tables from Oracle to Kudu:
```
CREATE SOURCE OracleCDCIn USING OracleReader (
  Username: 'striim',
  Password: 'passwd',
  ConnectionURL: '203.0.113.49:1521:orcl',
  Tables: 'MYSCHEMA.NAME,MYSCHEMA.DEPT'
)
OUTPUT TO OracleCDCStream;

CREATE TARGET KuduOut USING KuduWriter (
  KuduClientConfig: "master.addresses->203.0.113.88:7051; socketreadtimeout->10000;operationtimeout->30000",
  Tables: 'MYSCHEMA.NAME,name;MYSCHEMA.DEPT,dept'
)
INPUT FROM OracleCDCStream;
```
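A variation on the sample application, sketched under assumptions: the Kudu cluster runs two masters (the high availability case described under Kudu Client Config), and a wildcard in Tables replicates every table in MYSCHEMA, following the 'source.%,target.%' pattern shown for the Tables property. The second master address and the component names are hypothetical placeholders. Because Kudu table names are case-sensitive, this assumes Kudu tables with names exactly matching the Oracle names (for example, MYSCHEMA.NAME) already exist, with primary key columns first.

```
CREATE SOURCE OracleCDCAllIn USING OracleReader (
  Username: 'striim',
  Password: 'passwd',
  ConnectionURL: '203.0.113.49:1521:orcl',
  Tables: 'MYSCHEMA.%'
)
OUTPUT TO OracleCDCAllStream;

CREATE TARGET KuduHAOut USING KuduWriter (
  KuduClientConfig: "master.addresses->203.0.113.88:7051, 203.0.113.89:7051; socketreadtimeout->10000;operationtimeout->30000",
  Tables: 'MYSCHEMA.%,MYSCHEMA.%'
)
INPUT FROM OracleCDCAllStream;
```

Since each batch may include events for only one table, a wildcard mapping like this can produce smaller, more frequent batches when changes to different tables are interleaved in the stream.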
Kudu Writer data type support and correspondence
Columns in target tables must use only the following supported data types.
If using Cloudera's Kudu, see Apache Kudu Schema Design and Apache Kudu Usage Limitations.
Striim data type | Kudu data type |
---|---|
java.lang.Byte[] | binary |
java.lang.Double | double |
java.lang.Float | float |
java.lang.Integer | int32, int64 |
java.lang.Long | int64 |
java.lang.Short | int16 |
java.lang.String | string |
org.joda.time.DateTime | unixtime_micros |
When the input stream for a KuduWriter target is the output of an OracleReader source, the following combinations are supported:
Oracle type | Kudu type |
---|---|
BINARY_DOUBLE | double |
BINARY_FLOAT | float |
BLOB | binary, string |
CHAR | string |
CHAR(1) | bool |
CLOB | string |
DATE | unixtime_micros |
DEC | float |
DECIMAL | float |
FLOAT | float |
INT | int32 |
INTEGER | int32 |
LONG | int64, string |
NCHAR | string |
NUMBER | int64 |
NUMBER(1,0) | bool |
NUMBER(10) | int64 |
NUMBER(19,0) | int64 |
NUMERIC | float |
NVARCHAR2 | string |
SMALLINT | int16 |
TIMESTAMP | unixtime_micros |
TIMESTAMP WITH LOCAL TIME ZONE | unixtime_micros |
TIMESTAMP WITH TIME ZONE | unixtime_micros |
VARCHAR2 | string |
When the input stream for a KuduWriter target is the output of an MSSQLReader source, the following combinations are supported:
SQL Server type | Kudu type |
---|---|
bigint | int64 |
bit | bool, int64 |
char | string |
date | unixtime_micros |
datetime | unixtime_micros |
datetime2 | unixtime_micros |
decimal | float |
float | double, float |
image | binary |
int | int64 |
money | double |
nchar | string |
ntext | string |
numeric | float |
nvarchar | string |
nvarchar(max) | string |
real | float |
smalldatetime | unixtime_micros |
smallint | int64 |
smallmoney | double |
text | string |
tinyint | int64 |
varbinary | binary |
varbinary(max) | binary |
varchar | string |
varchar(max) | string |
xml | string |