Striim 3.9.6 documentation

Redshift Writer

Writes to one or more tables in an Amazon Redshift store via an Amazon S3 staging area. Before writing to Redshift, its JDBC driver must be installed as described in Installing the Redshift JDBC driver.

Before you create a RedshiftWriter target, we suggest you first create an S3Writer for the staging area (see S3 Writer) and verify that Striim can write to it. We recommend that the Redshift cluster be in the same region as the S3 bucket (see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html). If it is not, you must set the S3 Region property.
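For instance, a disposable staging check might look like the following sketch. The bucket, credentials, and input stream are placeholders, and the S3Writer property names shown are assumptions based on the conventions used elsewhere on this page; see S3 Writer for the authoritative property list:

CREATE TARGET S3StagingCheck USING S3Writer (
/* same bucket and credentials the RedshiftWriter will use */
  bucketname: 'mys3bucket',
  accesskeyid: '********************',
  secretaccesskey: '****************************************',
  objectname: 'stagingcheck'
)
FORMAT USING DSVFormatter ()
INPUT FROM SomeExistingStream; /* any stream already defined in your application */

If this target deploys and objects appear in the bucket, the credentials and bucket permissions are good, and you can drop the test target.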

After the data has been written to Redshift, the files in the S3 bucket are moved to a subdirectory called archive. They are not deleted automatically, so you should periodically delete them. This may be automated (see https://aws.amazon.com/code/Amazon-S3/943).

Specify either the access key and secret access key or an IAM role.

Each property is listed below with its type and, where applicable, its default value.

Access Key ID (java.lang.String): An AWS access key ID (created on the AWS Security Credentials page) for a user with read permission on the S3 bucket. Leave blank if using an IAM role.

Bucket Name (java.lang.String): The S3 bucket name.

Column Delimiter (java.lang.String, default | (UTF-8 007C)): The character(s) used to delimit fields in the delimited text files in which the adapter accumulates batched data. If the data will contain the | character, change the default value to a sequence of characters that will not appear in the data.

Connection URL (java.lang.String): Copy this from the JDBC URL field on the AWS Dashboard cluster-details page for the target cluster.

Conversion Params (java.lang.String): Optionally, specify one or more of the following Redshift data conversion parameters, separated by commas:

  • ACCEPTINVCHARS="<replacement character>"

  • BLANKSASNULL

  • DATEFORMAT="<format string>"

  • EMPTYASNULL

  • ENCODING=<encoding type>

  • EXPLICIT_IDS

  • FILLRECORD

  • IGNOREBLANKLINES

  • IGNOREHEADER=<number of rows>

  • NULL AS="<string>"

  • ROUNDEC

  • TIMEFORMAT="<format string>"

  • TRIMBLANKS

  • TRUNCATECOLUMNS

For example: ConversionParams: 'IGNOREHEADER=2, NULL AS="NULL", ROUNDEC'. See also the first sketch after this property list.

Mode (java.lang.String, default incremental): With an input stream of a user-defined type, do not change the default. See Replicating Oracle data to Amazon Redshift for more information.

Password (com.webaction.security.Password): The password for the Redshift user.

Quote Character (java.lang.String, default " (UTF-8 0022)): The character(s) used to quote (escape) field values in the delimited text files in which the adapter accumulates batched data. If the data will contain ", change the default value to a sequence of characters that will not appear in the data, as shown in the first sketch after this property list.

S3 IAM Role (java.lang.String): An AWS IAM role with read and write permission on the bucket. Leave blank if using an access key.

S3 Region (java.lang.String): If the S3 staging area is in a different AWS region than the Redshift cluster (not recommended), specify it here (see AWS Regions and Endpoints). Otherwise, leave blank.

Secret Access Key (com.webaction.security.Password): The secret access key for the S3 staging area.

Tables (java.lang.String): The name(s) of the table(s) to write to. The table(s) must exist in Redshift, and the user specified in Username must have insert permission.

When the input stream of the RedshiftWriter target is the output of a CDC reader or DatabaseReader source, RedshiftWriter can write to multiple tables. In this case, specify the names of both the source and target tables. You may use wildcards for the table names, but not for the schema or database. For example:

source.emp,target.emp
source.db1,target.db1;source.db2,target.db2
source.%,target.%
source.mydatabase.emp%,target.mydb.%
source1.%,target1.%;source2.%,target2.%

See Replicating Oracle data to Amazon Redshift for a complete example, and the second sketch after this property list for an illustrative target definition.

Upload Policy (java.lang.String, default eventcount:10000, interval:5m): See S3 Writer.

Username (java.lang.String): A Redshift user.
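To make the optional formatting properties concrete, here is a minimal sketch of a target that overrides Column Delimiter and Quote Character and passes the conversion parameters shown above. It reuses the connection values and input stream from the full example later on this page; the compact, space-free property names are an assumption following the conventions of that example, and the multi-character delimiter and quote values are illustrative placeholders only:

CREATE TARGET RedshiftCustomFormat USING RedshiftWriter(
  ConnectionURL: 'jdbc:redshift://mys3bucket.c1ffd5l3urjx.us-west-2.redshift.amazonaws.com:5439/dev',
  Username:'mys3user',
  Password:'******',
  bucketname:'mys3bucket',
  accesskeyid:'********************',
  secretaccesskey:'****************************************',
  Tables:'mytable',
/* replacement sequences chosen because they cannot appear in this data */
  columndelimiter:'|~|',
  quotecharacter:'~"~',
  ConversionParams:'IGNOREHEADER=2, NULL AS="NULL", ROUNDEC'
)
INPUT FROM PosSource_TransformedStream;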
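And here is a sketch of a multi-table target using wildcards. The input stream name OracleCDCStream and the schema names are placeholders; see Replicating Oracle data to Amazon Redshift for a supported end-to-end configuration, including the source definition:

CREATE TARGET RedshiftMultiTable USING RedshiftWriter(
  ConnectionURL: 'jdbc:redshift://mys3bucket.c1ffd5l3urjx.us-west-2.redshift.amazonaws.com:5439/dev',
  Username:'mys3user',
  Password:'******',
  bucketname:'mys3bucket',
  accesskeyid:'********************',
  secretaccesskey:'****************************************',
/* write every table in source schema MYSCHEMA to the same-named table in Redshift schema public */
  Tables:'MYSCHEMA.%,public.%'
)
INPUT FROM OracleCDCStream;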

The staging area in S3 will be created at the path <bucketname> / <namespace> / <target input stream type> / <table name>.

The following describes use of RedshiftWriter with an input stream of a user-defined type. When the input is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source, see Replicating Oracle data to Amazon Redshift.

The following example writes to a table called mytable:

CREATE SOURCE PosSource USING FileReader (
  wildcard: 'PosDataPreview.csv',
  directory: 'Samples/PosApp/appData',
  positionByEOF: false
)
PARSE USING DSVParser (
  header: Yes,
  trimquote: false
)
OUTPUT TO PosSource_Stream;
 
CREATE CQ PosSource_Stream_CQ
INSERT INTO PosSource_TransformedStream
SELECT TO_STRING(data[1]) AS MerchantId,
  TO_DATE(data[4]) AS DateTime,
  TO_DOUBLE(data[7]) AS AuthAmount,
  TO_STRING(data[9]) AS Zip
FROM PosSource_Stream;
 
CREATE TARGET testRedshiftTarget USING RedshiftWriter(
  ConnectionURL: 'jdbc:redshift://mys3bucket.c1ffd5l3urjx.us-west-2.redshift.amazonaws.com:5439/dev',
  Username:'mys3user',
  Password:'******',
  bucketname:'mys3bucket',
/* for striimuser */
  accesskeyid:'********************',
  secretaccesskey:'****************************************',
  Tables:'mytable'
)
INPUT FROM PosSource_TransformedStream;

If this application were deployed to the namespace RS1, the staging area in S3 would be mys3bucket / RS1 / PosSource_TransformedStream_Type / mytable.

The target table must match PosSource_TransformedStream_Type:

create table mytable(
  MerchantId char(35),
  DateTime timestamp,
  AuthAmount float,
  Zip char(5));
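For reference, the stream type Striim infers from the CQ above is equivalent to the following explicit declaration. This is a sketch for illustration only (Striim creates PosSource_TransformedStream_Type automatically); the field types map to the Redshift columns per the data type correspondence table below:

CREATE TYPE PosSource_TransformedStream_Type (
  MerchantId String,   /* -> char(35): java.lang.String maps to CHAR or VARCHAR */
  DateTime DateTime,   /* -> timestamp: org.joda.time.DateTime maps to TIMESTAMP */
  AuthAmount Double,   /* -> float (DOUBLE PRECISION): java.lang.Double */
  Zip String           /* -> char(5) */
);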

After the data is written to Redshift, the intermediate files will be moved to mys3bucket / RS1 / PosSource_TransformedStream_Type / mytable / archive.

Redshift data type correspondence

Striim data type          Redshift data type
java.lang.Boolean         BOOLEAN
java.lang.Double          DOUBLE PRECISION
java.lang.Float           REAL
java.lang.Integer         INTEGER
java.lang.Long            BIGINT
java.lang.Short           SMALLINT
java.lang.String          CHAR or VARCHAR
org.joda.time.DateTime    TIMESTAMP