Striim 3.10.3 documentation

Parquet Formatter

Formats a writer's output for use by Apache Parquet and generates one or more schema files. See Supported writer-formatter combinations.

Notes:

• Encryption Policy cannot be set for the associated writer.

• Data written using Parquet Formatter cannot be consumed until the target file is closed (that is, until it rolls over).
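The second note follows from the Parquet file layout: a Parquet file begins and ends with the 4-byte magic number PAR1, and the footer that holds the file metadata is written only when the file is closed. The following Python sketch is not part of Striim. It shows one way a downstream consumer might check for both markers before trying to read a file. The function name looks_closed is hypothetical, and the sketch assumes an unencrypted Parquet footer.

```python
import os

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at both ends of a Parquet file


def looks_closed(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    Hypothetical helper: a file that is still being written has no footer
    yet, so the trailing marker is normally missing.
    """
    # Smallest possible file: leading magic + 4-byte footer length + trailing magic.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

This is a necessary check rather than a full validation: it does not parse the footer, so a reader should still handle errors when opening the file.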

property

type

default value

notes

Block Size

Long

128000000

Sets the parquet.block.size property in Parquet (the row group size, in bytes).

Compression Type

String

UNCOMPRESSED

Optionally, specify the target's compression format. Supported types are GZIP, LZO, and SNAPPY. With the default, UNCOMPRESSED, the output is not compressed.

Format As

String

Default

With the Default setting, a single schema file is created.

• With an input stream of a user-defined type, the output includes the name and value of each field.

• With an input stream of type WAEvent (from any source), the output includes all contents of the event excluding dataPresenceBitMap, beforePresenceBitMap, and typeUUID.

The other two settings, Native and Table, are supported only with an input stream of type WAEvent from a Database Reader, Incremental Batch Reader, or SQL CDC reader source. A dynamic directory, folder, or bucket name must be specified in the writer (see Setting output names and rollover / upload policies).

A schema file with a timestamp appended to its name is created in each directory, folder, or bucket. With S3 Writer, if both the bucket name and folder name are dynamic, each combination of bucket and folder will have its own schema file. If there is a DDL change in the source, a new schema file is created and the output files roll over.

• When Format As is set to Native, the output includes all contents of the WAEvent except for typeUUID.

• When Format As is set to Table, the output includes only the column names and values.

See Parquet Formatter examples for sample schema files and output for each setting.

Members

String

Optionally:

• With an input stream of type WAEvent, specify a comma-separated list of elements to include in the output.

• With an input stream of type WAEvent from a Database Reader, Incremental Batch Reader, or SQL CDC reader source, specify additional elements (for example, from the METADATA or USERDATA maps) to include in addition to the data array values. See Parquet Formatter examples for a sample schema file and output.

Schema File Name

String

The fully qualified name of the Parquet schema file Striim will create when the application runs. When a dynamic directory is specified in the writer, in some cases Striim writes the schema files to the target directories, appends a timestamp to the file names, or both. See the notes for Format As for more details.
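The Format As rules for WAEvent input in the table above can be sketched as a field filter. The Python below is an illustration only, not Striim code: the function name select_output is hypothetical, the event is a simplified stand-in for a WAEvent (a dict whose "data" entry maps column names to values), and all sample values are made up. The Default behavior for user-defined types is not modeled.

```python
# Fields the Format As notes say are left out of the output for WAEvent input.
EXCLUDED = {
    "Default": {"dataPresenceBitMap", "beforePresenceBitMap", "typeUUID"},
    "Native": {"typeUUID"},
}


def select_output(event, format_as="Default"):
    """Return the parts of a WAEvent-like dict kept for a Format As setting.

    `event` is a simplified, hypothetical stand-in for a WAEvent: a dict
    whose "data" entry maps column names to values.
    """
    if format_as == "Table":
        # Table keeps only the column names and values.
        return dict(event["data"])
    if format_as not in EXCLUDED:
        raise ValueError(f"unknown Format As setting: {format_as!r}")
    skip = EXCLUDED[format_as]
    return {key: value for key, value in event.items() if key not in skip}


# Made-up sample event, for illustration only.
event = {
    "metadata": {"TableName": "SCOTT.EMP", "OperationName": "INSERT"},
    "data": {"ID": 1, "NAME": "Ada"},
    "before": None,
    "userdata": None,
    "dataPresenceBitMap": "Aw==",
    "beforePresenceBitMap": "AA==",
    "typeUUID": "hypothetical-uuid",
}

print(sorted(select_output(event, "Default")))
print(sorted(select_output(event, "Native")))
print(select_output(event, "Table"))
```

Keeping the exclusions in one lookup table makes the difference between Default and Native easy to see: Native differs only in retaining the two presence bitmaps.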