Redshift Writer

Writes to one or more tables in an Amazon Redshift store via an Amazon S3 staging area.

Before you create a RedshiftWriter target, we suggest you first create an S3Writer for the staging area (see S3 Writer) and verify that Striim can write to it. We recommend that the Redshift cluster be in the same region as the S3 bucket (see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html). If it is not, you must set the S3 Region property.

After the data has been written to Redshift, the files in the S3 bucket are moved to a subdirectory called archive. They are not deleted automatically, so you should periodically delete them. This may be automated (see https://aws.amazon.com/code/Amazon-S3/943).

Specify either the access key and secret access key or an IAM role.
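
For illustration, the following hedged sketch shows a RedshiftWriter target that authenticates to the S3 staging area with an IAM role rather than an access key pair; the sample application later on this page shows the access-key alternative. The target name, input stream, cluster endpoint, bucket, and role ARN are placeholders, and s3iamrole is assumed here to be the TQL spelling of the S3 IAM Role property described below.

/* Hypothetical IAM-role variant: the access key properties are omitted and
   s3iamrole (assumed spelling of the S3 IAM Role property) supplies the role ARN. */
CREATE TARGET RedshiftRoleTarget USING RedshiftWriter (
  ConnectionURL: 'jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev',
  Username: 'exampleuser',
  Password: '******',
  bucketname: 'examplebucket',
  s3iamrole: 'arn:aws:iam::123456789012:role/example-redshift-s3-role',
  Tables: 'mytable'
)
INPUT FROM ExistingStream;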

Redshift Writer properties

Access Key ID (String)
an AWS access key ID (created on the AWS Security Credentials page) for a user with read permission on the S3 bucket (leave blank if using an IAM role)

Bucket Name (String)
the S3 bucket name

Column Delimiter (String, default: | (UTF-8 007C))
The character(s) used to delimit fields in the delimited text files in which the adapter accumulates batched data. If the data will contain the | character, change the default value to a sequence of characters that will not appear in the data.

Connection URL (String)
copy this from the JDBC URL field on the AWS Dashboard cluster-details page for the target cluster

Conversion Params (String)
Optionally, specify one or more of the following Redshift Data Conversion Parameters, separated by commas:

  • ACCEPTINVCHARS="<replacement character>"

  • BLANKSASNULL

  • DATEFORMAT="<format string>"

  • EMPTYASNULL

  • ENCODING=<encoding type>

  • EXPLICIT_IDS

  • FILLRECORD

  • IGNOREBLANKLINES

  • IGNOREHEADER=<number of rows>

  • NULL AS="<string>"

  • ROUNDEC

  • TIMEFORMAT="<format string>"

  • TRIMBLANKS

  • TRUNCATECOLUMNS

For example, ConversionParams: 'IGNOREHEADER=2, NULL AS="NULL", ROUNDEC'

Mode (String, default: incremental)
With an input stream of a user-defined type, do not change the default. See Replicating Oracle data to Amazon Redshift for more information.

Parallel Threads (Integer)
See Creating multiple writer instances.

Password (encrypted password)
the password for the Redshift user

Quote Character (String, default: " (UTF-8 0022))
The character(s) used to quote (escape) field values in the delimited text files in which the adapter accumulates batched data. If the data will contain ", change the default value to a sequence of characters that will not appear in the data.

S3 IAM Role (String)
an AWS IAM role with read and write permission on the bucket (leave blank if using an access key)

S3 Region (String)
If the S3 staging area is in a different AWS region than the Redshift cluster (not recommended), specify it here (see AWS Regions and Endpoints). Otherwise, leave blank.

Secret Access Key (encrypted password)
the secret access key for the S3 staging area

Tables (String)
The name(s) of the table(s) to write to. The table(s) must exist in Redshift and the user specified in Username must have insert permission.

When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), the writer can write to multiple tables. In this case, specify the names of both the source and target tables. You may use the % wildcard only for table names, not for schemas or databases. If the reader uses three-part names, you must use them here as well. Note that Oracle CDB/PDB source table names must be specified in two parts when the source is Database Reader or Incremental Batch Reader (schema.%,schema.%) but in three parts when the source is Oracle Reader or OJet (database.schema.%,schema.%). Note that SQL Server source table names must be specified in three parts when the source is Database Reader or Incremental Batch Reader (database.schema.%,schema.%) but in two parts when the source is MS SQL Reader or MSJet (schema.%,schema.%). Examples:

source.emp,target.emp
source.db1,target.db1;source.db2,target.db2
source.%,target.%
source.mydatabase.emp%,target.mydb.%
source1.%,target1.%;source2.%,target2.%

See Replicating Oracle data to Amazon Redshift for an example.

Upload Policy (String, default: eventcount:10000, interval:5m)
see S3 Writer

Username (String)
a Redshift user

The staging area in S3 will be created at the path <bucketname> / <namespace> / <target input stream type> / <table name>.

Redshift Writer sample application

The following describes use of RedshiftWriter with an input stream of a user-defined type. When the input is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source, see Replicating Oracle data to Amazon Redshift.

The following example would write to a table called MyTable:

CREATE SOURCE PosSource USING FileReader (
  wildcard: 'PosDataPreview.csv',
  directory: 'Samples/PosApp/appData',
  positionByEOF:false )
PARSE USING DSVParser (
  header:Yes,
  trimquote:false )
OUTPUT TO PosSource_Stream;
 
CREATE CQ PosSource_Stream_CQ
INSERT INTO PosSource_TransformedStream
SELECT TO_STRING(data[1]) AS MerchantId,
  TO_DATE(data[4]) AS DateTime,
  TO_DOUBLE(data[7]) AS AuthAmount,
  TO_STRING(data[9]) AS Zip
FROM PosSource_Stream;
 
CREATE TARGET testRedshiftTarget USING RedshiftWriter(
  ConnectionURL: 'jdbc:redshift://mys3bucket.c1ffd5l3urjx.us-west-2.redshift.amazonaws.com:5439/dev',
  Username:'mys3user',
  Password:'******',
  bucketname:'mys3bucket',
/* for striimuser */
  accesskeyid:'********************',
  secretaccesskey:'****************************************',
  Tables:'mytable'
)
INPUT FROM PosSource_TransformedStream;

If this application were deployed to the namespace RS1, the staging area in S3 would be mys3bucket / RS1 / PosSource_TransformedStream_Type / mytable.

The target table must match PosSource_TransformedStream_Type:

create table mytable(
MerchantId char(35),
DateTime timestamp,
AuthAmount float,
Zip char(5));

After the data is written to Redshift, the intermediate files will be moved to mys3bucket / RS1 / PosSource_TransformedStream_Type / mytable / archive.
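
To show the Conversion Params and Upload Policy properties in context, here is a hedged variation of the sample target above. The parameter values are illustrative only, and uploadpolicy is assumed to be the TQL spelling of the Upload Policy property; everything else is copied from the sample application.

/* Variation of testRedshiftTarget with illustrative conversion parameters and a
   custom upload policy (uploadpolicy is the assumed TQL spelling of Upload Policy). */
CREATE TARGET testRedshiftTargetTuned USING RedshiftWriter(
  ConnectionURL: 'jdbc:redshift://mys3bucket.c1ffd5l3urjx.us-west-2.redshift.amazonaws.com:5439/dev',
  Username:'mys3user',
  Password:'******',
  bucketname:'mys3bucket',
  accesskeyid:'********************',
  secretaccesskey:'****************************************',
  ConversionParams: 'EMPTYASNULL, TRUNCATECOLUMNS',
  uploadpolicy: 'eventcount:20000, interval:2m',
  Tables:'mytable'
)
INPUT FROM PosSource_TransformedStream;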

Redshift data type correspondence

Striim data types correspond to Redshift data types as follows:

java.lang.Boolean: BOOLEAN
java.lang.Double: DOUBLE PRECISION
java.lang.Float: REAL
java.lang.Integer: INTEGER
java.lang.Long: BIGINT
java.lang.Short: SMALLINT
java.lang.String: CHAR or VARCHAR
org.joda.time.DateTime: TIMESTAMP
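
As a hedged illustration of this mapping, the sketch below pairs a hypothetical Striim type with a matching Redshift table; the type, field, and table names are invented for this example and are not part of the product documentation.

/* Hypothetical Striim type; each field's Redshift counterpart from the
   correspondence list above is noted in a comment. */
CREATE TYPE SampleOrderType (
  orderId Long,       /* java.lang.Long -> BIGINT */
  quantity Short,     /* java.lang.Short -> SMALLINT */
  total Double,       /* java.lang.Double -> DOUBLE PRECISION */
  shipped Boolean,    /* java.lang.Boolean -> BOOLEAN */
  orderTime DateTime  /* org.joda.time.DateTime -> TIMESTAMP */
);

A matching Redshift table would be:

create table sampleorders(
orderId bigint,
quantity smallint,
total double precision,
shipped boolean,
orderTime timestamp);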