Redshift Writer

Writes to one or more tables in an Amazon Redshift store via an Amazon S3 staging area.

Before you create a RedshiftWriter target, we suggest you first create an S3Writer for the staging area (see S3 Writer) and verify that Striim can write to it. We recommend that the Redshift cluster be in the same region as the S3 bucket (see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html). If it is not, you must set the S3 Region property.

After the data has been written to Redshift, the files in the S3 bucket are moved to a subdirectory called archive. They are not deleted automatically, so you should periodically delete them. This may be automated (see https://aws.amazon.com/code/Amazon-S3/943).

Specify either the access key and secret access key or an IAM role.
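
For illustration, the following hedged sketch shows a RedshiftWriter target that authenticates to the S3 staging area with an IAM role rather than an access key pair; the sample application later on this page shows the access-key alternative. The target name, input stream, cluster endpoint, bucket, and role ARN are placeholders, and s3iamrole is assumed here to be the TQL spelling of the S3 IAM Role property described below.

/* Hypothetical IAM-role variant: the access key properties are omitted and
   s3iamrole (assumed spelling of the S3 IAM Role property) supplies the role ARN. */
CREATE TARGET RedshiftRoleTarget USING RedshiftWriter (
  ConnectionURL: 'jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev',
  Username: 'exampleuser',
  Password: '******',
  bucketname: 'examplebucket',
  s3iamrole: 'arn:aws:iam::123456789012:role/example-redshift-s3-role',
  Tables: 'mytable'
)
INPUT FROM ExistingStream;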

Redshift Writer properties

Access Key ID (String)
an AWS access key ID (created on the AWS Security Credentials page) for a user with read permission on the S3 bucket (leave blank if using an IAM role)

Bucket Name (String)
the S3 bucket name

Column Delimiter (String, default: | (UTF-8 007C))
The character(s) used to delimit fields in the delimited text files in which the adapter accumulates batched data. If the data will contain the | character, change the default value to a sequence of characters that will not appear in the data.

Connection URL (String)
copy this from the JDBC URL field on the AWS Dashboard cluster-details page for the target cluster

Conversion Params (String)
Optionally, specify one or more of the following Redshift Data Conversion Parameters, separated by commas:

  • ACCEPTINVCHARS="<replacement character>"

  • BLANKSASNULL

  • DATEFORMAT="<format string>"

  • EMPTYASNULL

  • ENCODING=<encoding type>

  • EXPLICIT_IDS

  • FILLRECORD

  • IGNOREBLANKLINES

  • IGNOREHEADER=<number of rows>

  • NULL AS="<string>"

  • ROUNDEC

  • TIMEFORMAT="<format string>"

  • TRIMBLANKS

  • TRUNCATECOLUMNS

For example, ConversionParams: 'IGNOREHEADER=2, NULL AS="NULL", ROUNDEC'

Mode (String, default: incremental)
With an input stream of a user-defined type, do not change the default. See Replicating Oracle data to Amazon Redshift for more information.

Parallel Threads (Integer)
See Creating multiple writer instances.

Password (encrypted password)
the password for the Redshift user

Quote Character (String, default: " (UTF-8 0022))
The character(s) used to quote (escape) field values in the delimited text files in which the adapter accumulates batched data. If the data will contain ", change the default value to a sequence of characters that will not appear in the data.

S3 IAM Role (String)
an AWS IAM role with read and write permission on the bucket (leave blank if using an access key)

S3 Region (String)
If the S3 staging area is in a different AWS region than the Redshift cluster (not recommended), specify it here (see AWS Regions and Endpoints). Otherwise, leave blank.

Secret Access Key (encrypted password)
the secret access key for the S3 staging area

Tables (String)
The name(s) of the table(s) to write to. The table(s) must exist in Redshift and the user specified in Username must have insert permission.

When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), the writer can write to multiple tables. In this case, specify the names of both the source and target tables. You may use the % wildcard only for table names, not for schemas or databases. If the reader uses three-part names, you must use them here as well. Note that Oracle CDB/PDB source table names must be specified in two parts when the source is Database Reader or Incremental Batch Reader (schema.%,schema.%) but in three parts when the source is Oracle Reader or OJet (database.schema.%,schema.%). Note that SQL Server source table names must be specified in three parts when the source is Database Reader or Incremental Batch Reader (database.schema.%,schema.%) but in two parts when the source is MS SQL Reader or MSJet (schema.%,schema.%). Examples:

source.emp,target.emp
source.db1,target.db1;source.db2,target.db2
source.%,target.%
source.mydatabase.emp%,target.mydb.%
source1.%,target1.%;source2.%,target2.%

See Replicating Oracle data to Amazon Redshift for an example.

Upload Policy (String, default: eventcount:10000, interval:5m)
see S3 Writer

Username (String)
a Redshift user

The staging area in S3 will be created at the path <bucketname> / <namespace> / <target input stream type> / <table name>.

Redshift Writer sample application

The following describes use of RedshiftWriter with an input stream of a user-defined type. When the input is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source, see Replicating Oracle data to Amazon Redshift.

The following example would write to a table called MyTable:

CREATE SOURCE PosSource USING FileReader (
  wildcard: 'PosDataPreview.csv',
  directory: 'Samples/PosApp/appData',
  positionByEOF:false )
PARSE USING DSVParser (
  header:Yes,
  trimquote:false )
OUTPUT TO PosSource_Stream;
 
CREATE CQ PosSource_Stream_CQ
INSERT INTO PosSource_TransformedStream
SELECT TO_STRING(data[1]) AS MerchantId,
  TO_DATE(data[4]) AS DateTime,
  TO_DOUBLE(data[7]) AS AuthAmount,
  TO_STRING(data[9]) AS Zip
FROM PosSource_Stream;
 
CREATE TARGET testRedshiftTarget USING RedshiftWriter(
  ConnectionURL: 'jdbc:redshift://mys3bucket.c1ffd5l3urjx.us-west-2.redshift.amazonaws.com:5439/dev',
  Username:'mys3user',
  Password:'******',
  bucketname:'mys3bucket',
/* for striimuser */
  accesskeyid:'********************',
  secretaccesskey:'****************************************',
  Tables:'mytable'
)
INPUT FROM PosSource_TransformedStream;

If this application were deployed to the namespace RS1, the staging area in S3 would be mys3bucket / RS1 / PosSource_TransformedStream_Type / mytable.

The target table must match PosSource_TransformedStream_Type:

create table mytable(
MerchantId char(35),
DateTime timestamp,
AuthAmount float,
Zip char(5));

After the data is written to Redshift, the intermediate files will be moved to mys3bucket / RS1 / PosSource_TransformedStream_Type / mytable / archive.
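
To show the Conversion Params and Upload Policy properties in context, here is a hedged variation of the sample target above. The parameter values are illustrative only, and uploadpolicy is assumed to be the TQL spelling of the Upload Policy property; everything else is copied from the sample application.

/* Variation of testRedshiftTarget with illustrative conversion parameters and a
   custom upload policy (uploadpolicy is the assumed TQL spelling of Upload Policy). */
CREATE TARGET testRedshiftTargetTuned USING RedshiftWriter(
  ConnectionURL: 'jdbc:redshift://mys3bucket.c1ffd5l3urjx.us-west-2.redshift.amazonaws.com:5439/dev',
  Username:'mys3user',
  Password:'******',
  bucketname:'mys3bucket',
  accesskeyid:'********************',
  secretaccesskey:'****************************************',
  ConversionParams: 'EMPTYASNULL, TRUNCATECOLUMNS',
  uploadpolicy: 'eventcount:20000, interval:2m',
  Tables:'mytable'
)
INPUT FROM PosSource_TransformedStream;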

Redshift data type correspondence

Striim data types correspond to Redshift data types as follows:

java.lang.Boolean: BOOLEAN
java.lang.Double: DOUBLE PRECISION
java.lang.Float: REAL
java.lang.Integer: INTEGER
java.lang.Long: BIGINT
java.lang.Short: SMALLINT
java.lang.String: CHAR or VARCHAR
org.joda.time.DateTime: TIMESTAMP
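
As a hedged illustration of this mapping, the sketch below pairs a hypothetical Striim type with a matching Redshift table; the type, field, and table names are invented for this example and are not part of the product documentation.

/* Hypothetical Striim type; each field's Redshift counterpart from the
   correspondence list above is noted in a comment. */
CREATE TYPE SampleOrderType (
  orderId Long,       /* java.lang.Long -> BIGINT */
  quantity Short,     /* java.lang.Short -> SMALLINT */
  total Double,       /* java.lang.Double -> DOUBLE PRECISION */
  shipped Boolean,    /* java.lang.Boolean -> BOOLEAN */
  orderTime DateTime  /* org.joda.time.DateTime -> TIMESTAMP */
);

A matching Redshift table would be:

create table sampleorders(
orderId bigint,
quantity smallint,
total double precision,
shipped boolean,
orderTime timestamp);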