
ADLS Gen2 Writer

Note

ADLS Reader and ADLS Gen2 Writer read from and write to the same Azure Data Lake Storage (ADLS) service. Microsoft dropped "Gen2" from the name of the service after it retired the ADLS Gen1 service in February 2024.

Writes to files in an Azure Data Lake Storage file system. A common use case is to write data from on-premises sources to an ADLS staging area from which it can be consumed by Azure-based analytics tools.

When you create the ADLS storage account, set Storage account kind to StorageV2 and enable Hierarchical namespace.
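For example, a suitable account can be provisioned with the Azure CLI (the account, resource group, and region names below are placeholders; substitute your own):

```shell
# Create a StorageV2 account with Hierarchical namespace enabled
# (placeholder names; requires an authenticated Azure CLI session).
az storage account create \
  --name mystorageaccount \
  --resource-group myresourcegroup \
  --location eastus \
  --kind StorageV2 \
  --hns true
```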

ADLS Gen2 Writer properties

  • Account Name (String): The storage account name.

  • Compression Type (String): Set to gzip when the input is in gzip format. Otherwise, leave blank.

  • Directory (String): The full path to the directory in which to write the files. See Setting output names and rollover / upload policies for advanced options.

  • File Name (String): The base name of the files to be written. See Setting output names and rollover / upload policies.

  • File System Name (String): The name of the ADLS Gen2 file system (container) where the files will be written.

  • Rollover on DDL (Boolean, default True): Has effect only when the input stream is the output stream of a MySQLReader or OracleReader source. With the default value of True, rolls over to a new file when a DDL event is received. Set to False to keep writing to the same file.

  • SAS Token (encrypted password): The SAS token for a shared access signature for the storage account. Allowed services must include Blob, allowed resource types must include Object, and allowed permissions must include Write and Create. Remove the ? from the beginning of the SAS token. Note that SAS tokens have an expiration date. See Best practices when using SAS.

  • Upload Policy (String, default eventcount:10000, interval:5m): See Setting output names and rollover / upload policies. Keep these settings low enough that individual uploads do not exceed the underlying Microsoft REST API's limit of 100 MB for a single operation.

    For best performance, Microsoft recommends uploads between 4 and 16 MB. Setting Upload Policy to filesize:16M will accomplish that. However, if there is a long gap between events, some events may not be written to ADLS for some time. For example, if Striim receives events only during working hours, the last events received at the end of the day on Friday would not be written until Monday morning.

When the app is stopped, any remaining data in the upload buffer is discarded.
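As a rough way to sanity-check an eventcount-based Upload Policy against these limits, the sketch below (a hypothetical helper, not part of Striim) estimates the upload size from an average event size and compares it with the 4–16 MB recommendation and the 100 MB REST limit:

```python
# Hypothetical sizing helper: estimate how large an upload can grow
# under an eventcount:<n> Upload Policy, given an average event size.
REST_LIMIT_MB = 100       # single-operation limit of the underlying REST API
RECOMMENDED_MB = (4, 16)  # Microsoft's recommended upload size range

def estimated_upload_mb(event_count: int, avg_event_bytes: int) -> float:
    """Approximate upload size in MB for an eventcount:<n> policy."""
    return event_count * avg_event_bytes / (1024 * 1024)

def check_policy(event_count: int, avg_event_bytes: int) -> str:
    mb = estimated_upload_mb(event_count, avg_event_bytes)
    if mb > REST_LIMIT_MB:
        return "too large: exceeds the 100 MB REST limit"
    if RECOMMENDED_MB[0] <= mb <= RECOMMENDED_MB[1]:
        return "within the recommended 4-16 MB range"
    return "allowed, but outside the recommended range"

# Default policy eventcount:10000 with ~1 KB events is roughly 10 MB:
print(check_policy(10_000, 1024))
```

Remember that an interval or filesize trigger can fire first, so this is an upper bound for the eventcount trigger only.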

This adapter has a choice of formatters. See Supported writer-formatter combinations for more information.

ADLS Gen2 Writer connection profile properties

Setting ADLS Access Key properties

In this release, an ADLS connection profile can be used only to set the external stage connection properties for Databricks Writer and Snowflake Writer.

  • Azure Account Access Key: Specify the account access key from Storage accounts > <account name> > Access keys. For more information see Azure / Learn / Storage / Manage storage account access keys.

  • Azure Account Name: Specify the name of the storage account.

  • Azure Container Name: Specify the name of the ADLS container (also called the "file system") to be used as the staging area. If it does not exist, it will be created automatically.

Setting ADLS Entra ID properties

Entra ID was formerly known as Azure Active Directory.

In this release, an ADLS connection profile can be used only to set the external stage connection properties for Databricks Writer and Snowflake Writer.

  • Azure Account Name: Specify the name of the storage account.

  • Azure Container Name: Specify the name of the ADLS container (also called the "file system") to be used as the staging area. If it does not exist, it will be created automatically.

After specifying the account and container names, click Sign in using Entra ID. Log in with an Entra ID organization (work) account that has the Storage Blob Data Contributor role on the storage account. This is the account Striim will use to access ADLS. Once you log in successfully, close the browser window, return to the connection profile page, and test the connection.

Setting ADLS SAS properties

In this release, an ADLS connection profile can be used only to set the external stage connection properties for Databricks Writer and Snowflake Writer.

  • Azure SAS: Specify the SAS token for a shared access signature for the storage account. If there is a ? at the beginning of the SAS token, remove it. For more information, see Learn / Azure / Storage / Grant limited access to Azure Storage resources using shared access signatures (SAS) and Best practices when using SAS.

    Allowed services must include Blob, allowed resource types must include Service, Container, and Object, allowed permissions must include Read, Write, Delete, List, Add, Create, Update, Process, and Permanent Delete, and "Immutable storage" must be deselected.

    Note that SAS tokens have an expiration date. When you update this property with a new token, applications using this connection profile will automatically switch to the new token when the old one expires.

  • Azure Account Name: Specify the name of the storage account.

  • Azure Container Name: Specify the name of the ADLS container (also called the "file system") to be used as the staging area. If it does not exist, it will be created automatically.
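The SAS requirements above can be checked mechanically before pasting the token into the profile. The sketch below (an illustrative helper, not part of Striim or the Azure SDK) strips the leading ? and inspects the standard account-SAS query parameters: ss (allowed services), srt (allowed resource types), and sp (allowed permissions); it covers a representative subset of the required permission flags:

```python
from urllib.parse import parse_qs

def normalize_sas(token: str) -> str:
    """Striim expects the SAS token without the leading '?'."""
    return token.removeprefix("?")

def check_sas(token: str) -> list[str]:
    """Return a list of problems found in an account SAS token."""
    params = parse_qs(normalize_sas(token))
    problems = []
    if "b" not in params.get("ss", [""])[0]:
        problems.append("allowed services must include Blob (ss contains 'b')")
    srt = params.get("srt", [""])[0]
    for flag, name in [("s", "Service"), ("c", "Container"), ("o", "Object")]:
        if flag not in srt:
            problems.append(f"allowed resource types must include {name}")
    # Representative subset of the required permission flags:
    sp = params.get("sp", [""])[0]
    for flag, name in [("r", "Read"), ("w", "Write"), ("d", "Delete"),
                       ("l", "List"), ("a", "Add"), ("c", "Create")]:
        if flag not in sp:
            problems.append(f"allowed permissions must include {name}")
    return problems

# A token granting Blob service, all resource types, and broad permissions:
token = "?sv=2022-11-02&ss=b&srt=sco&sp=rwdlacup&se=2026-01-01T00:00:00Z&sig=x"
print(check_sas(token))  # an empty list means no problems found
```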

ADLS Gen2 Writer sample application

CREATE APPLICATION ADLSGen2Test;

CREATE SOURCE PosSource USING FileReader (
  wildcard: 'PosDataPreview.csv',
  directory: 'Samples/PosApp/appData',
  positionByEOF:false )
PARSE USING DSVParser (
  header:Yes,
  trimquote:false )
OUTPUT TO PosSource_Stream;

CREATE CQ PosSource_Stream_CQ
INSERT INTO PosSource_TransformedStream
SELECT TO_STRING(data[1]) AS MerchantId,
  TO_DATE(data[4]) AS DateTime,
  TO_DOUBLE(data[7]) AS AuthAmount,
  TO_STRING(data[9]) AS Zip
FROM PosSource_Stream;

CREATE TARGET ADLSGen2Target USING ADLSGen2Writer (
  accountname:'mystorageaccount',
  sastoken:'********************************************',
  filesystemname:'myfilesystem',
  directory:'mydir',
  filename:'myfile.json',
  uploadpolicy: 'interval:15s'
)
FORMAT USING JSONFormatter ()
INPUT FROM PosSource_TransformedStream;

END APPLICATION ADLSGen2Test;
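To visualize what PosSource_Stream_CQ hands to JSONFormatter, here is a rough stand-in in plain Python. The row contents are made up, and Striim's TO_STRING / TO_DATE / TO_DOUBLE functions do more than the bare conversions shown here; this only illustrates the field selection and renaming:

```python
import json

# Illustrative stand-in for PosSource_Stream_CQ: pick positional fields
# out of a parsed DSV row and name them as in the SELECT clause.
def transform(data: list[str]) -> dict:
    return {
        "MerchantId": str(data[1]),
        "DateTime": data[4],          # TO_DATE parsing elided in this sketch
        "AuthAmount": float(data[7]),
        "Zip": str(data[9]),
    }

# A made-up row with at least ten fields, standing in for one line
# of PosDataPreview.csv:
row = ["BUSINESS NAME", "MERCHANT01", "CREDIT", "6705362103919221351",
       "20130312173210", "0614", "USD", "329.64", "5", "79769"]
print(json.dumps(transform(row)))
```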