Building a pipeline with ADLS Reader
You can read from Azure Data Lake Storage using the ADLS Reader and write to any target supported by Striim. Typically, you will set up pipelines in two phases—initial load, followed by continuous incremental replication—as explained in Pipelines.
For initial load, run ADLS Reader to process existing objects that match your Object Filter (for example, all objects under a folder) and emit parsed events to the target.
After the initial load completes, start continuous replication: the reader picks up objects created or updated after incremental processing begins, using the configured Object Detection Mode and Polling Interval.
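These behaviors map to a few source properties that also appear in the full TQL samples at the end of this section. As a minimal excerpt (values are placeholders taken from those samples):

-- Excerpt of ADLS Reader source properties that control object pickup and polling.
ObjectDetectionMode: 'ADLSDirectoryListing',  -- or 'LogAnalytics'
ObjectFilter: '*',                            -- which existing and new objects qualify
PollingInterval: 5000                         -- how often to check for new or updated objects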
Using an automated pipeline wizard: if you want to build pipelines from ADLS to a supported target, we recommend using an automated pipeline wizard with an ADLS Reader source. These wizards typically:
Create two applications: one for initial load and the other for continuous incremental replication (polling). ADLS Reader can also handle both within a single application.
Prompt you to select the parser (for example, Parquet, JSON, CSV/DSV) and set object filters.
Configure or create structures at the target (for example, tables, files, or streams), as appropriate for the selected writer.
Run the initial-load application to process existing objects in the container.
On completion of initial load, run or switch to the incremental application to detect and process new or updated objects at the configured polling interval.
Not using an automated pipeline wizard: if your use case or policies do not allow using an automated pipeline wizard, create separate applications for initial load and continuous replication:
Before performing initial load, decide on your Object Detection Mode (ADLS Directory Listing or Log Analytics) and Object Filter, and confirm parser settings for your file formats.
Create and configure the target (schema/tables, file sinks, or other targets) to receive parsed events. If your target requires merge/upsert semantics (to avoid duplicates under at-least-once processing, A1P), enable them.
Perform the initial load by running ADLS Reader to process existing objects. If you need a specific starting point, set Start Timestamp accordingly.
Switch to continuous replication by enabling periodic polling. For Log Analytics mode, verify the workspace is emitting the required logs.
Replicate new data using ADLS Reader with the chosen detection mode and Polling Interval. Configure the target for idempotent merge behavior where appropriate; a sketch of this two-application setup follows this list.
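As a rough illustration, the following minimal sketch shows what such a pair of applications might look like in TQL. It reuses the property names from the samples later in this section and uses SysOut as a stand-in target; StartTimestamp is an assumed TQL spelling of the Start Timestamp setting (verify it against your Striim version), and a production target would typically be a database or warehouse writer with merge/upsert enabled rather than SysOut.

-- Initial load: process objects that already exist in the container.
CREATE OR REPLACE APPLICATION ADLSInitialLoad;

CREATE OR REPLACE SOURCE ADLSInitial USING Global.ADLSReader (
  AccountName: 'exampleazuredatalakegen2',
  TenantId: 'TenantId',
  ClientId: 'ClientId',
  ClientSecret: 'ClientSecret',
  Container: 'new-container',
  ObjectFilter: '*',
  ObjectDetectionMode: 'ADLSDirectoryListing'
)
PARSE USING Global.JSONParser (
  handler: 'com.webaction.proc.JSONParser_1_0',
  parserName: 'JSONParser'
)
OUTPUT TO InitialStream;

CREATE TARGET InitialOut USING Global.SysOut ( name: 'initial' )
INPUT FROM InitialStream;

END APPLICATION ADLSInitialLoad;

-- Continuous replication: poll for objects created or updated after a given point.
CREATE OR REPLACE APPLICATION ADLSIncremental;

CREATE OR REPLACE SOURCE ADLSIncr USING Global.ADLSReader (
  AccountName: 'exampleazuredatalakegen2',
  TenantId: 'TenantId',
  ClientId: 'ClientId',
  ClientSecret: 'ClientSecret',
  Container: 'new-container',
  ObjectFilter: '*',
  ObjectDetectionMode: 'ADLSDirectoryListing',
  PollingInterval: 5000,
  -- Assumed property name and value format; align with the point where initial load ended.
  StartTimestamp: '2024-01-01 00:00:00'
)
PARSE USING Global.JSONParser (
  handler: 'com.webaction.proc.JSONParser_1_0',
  parserName: 'JSONParser'
)
OUTPUT TO IncrStream;

CREATE TARGET IncrOut USING Global.SysOut ( name: 'incremental' )
INPUT FROM IncrStream;

END APPLICATION ADLSIncremental;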
Alternatively, instead of using wizards, you can create applications using Flow Designer, TQL, or Striim’s REST API.
Prerequisite: the initial setup and configuration for ADLS Reader are described in the Initial setup section.
Create an ADLS Reader application using the Flow Designer
This procedure outlines how to use Striim’s Flow Designer to build and configure data pipelines. Flow Designer enables you to visually create applications with minimal or no coding.
Go to the Apps page in the Striim UI and click Start from scratch.
Provide the Name and Namespace for your app. The namespace helps organize related apps.
In the components panel, expand Sources, and search for ADLS Reader.
Drag the ADLS Reader source onto the canvas and open its properties.
Configure properties such as AccountName, Tenant ID, Client ID, Client Secret, Container, Object Detection Mode, Object Filter, Polling Interval, and Start Timestamp (optional). For Log Analytics, set the Log Analytics Workspace ID.
Attach a Parser (for example, JSON, DSV, Avro, Parquet). For Parquet, the reader downloads files before parsing.
Add and configure your Target (database, warehouse, file writer, etc.). Enable upsert/merge semantics on the target if duplicates are possible.
Click Save, then Deploy and Start the application to begin data flow.
Create an ADLS Reader application using TQL
The following are sample applications for ADLS Reader.
Sample application: Directory Listing mode
CREATE OR REPLACE APPLICATION ADLSTest;

CREATE OR REPLACE SOURCE ADLS USING Global.ADLSReader (
  ConnectionRetryPolicy: 'retryInterval=30, maxRetries=3',
  ClientId: 'ClientId',
  adapterName: 'ADLSReader',
  AccountName: 'exampleazuredatalakegen2',
  ClientSecret_encrypted: 'true',
  LogAnalyticsWorkspaceId: 'LogAnalyticsWorkspaceId',
  ClientSecret: 'ClientSecret',
  TenantId: 'TenantId',
  ObjectDetectionMode: 'ADLSDirectoryListing',
  PollingInterval: 5000,
  ObjectFilter: '*',
  Container: 'new-container',
  ProcessSubFolder: false,
  SubFolderPath: ''
)
PARSE USING Global.JSONParser (
  handler: 'com.webaction.proc.JSONParser_1_0',
  parserName: 'JSONParser'
)
OUTPUT TO ADLSOut;

CREATE TARGET Sysout USING Global.SysOut (
  name: 'sysout'
)
INPUT FROM ADLSOut;

END APPLICATION ADLSTest;
Sample application: Log Analytics mode
CREATE OR REPLACE APPLICATION ADLSTest;

CREATE OR REPLACE SOURCE ADLS USING Global.ADLSReader (
  ConnectionRetryPolicy: 'retryInterval=30, maxRetries=3',
  ClientId: 'ClientId',
  adapterName: 'ADLSReader',
  AccountName: 'exampleazuredatalakegen2',
  ClientSecret_encrypted: 'true',
  LogAnalyticsWorkspaceId: 'LogAnalyticsWorkspaceId',
  ClientSecret: 'ClientSecret',
  TenantId: 'TenantId',
  ObjectDetectionMode: 'LogAnalytics',
  PollingInterval: 5000,
  ObjectFilter: '*',
  Container: 'new-container',
  ProcessSubFolder: false,
  SubFolderPath: ''
)
PARSE USING Global.JSONParser (
  handler: 'com.webaction.proc.JSONParser_1_0',
  parserName: 'JSONParser'
)
OUTPUT TO ADLSOut;

CREATE TARGET Sysout USING Global.SysOut (
  name: 'sysout'
)
INPUT FROM ADLSOut;

END APPLICATION ADLSTest;
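After loading the TQL, you can deploy and start the application from the Striim console, for example (assuming the default deployment group):

DEPLOY APPLICATION ADLSTest;
START APPLICATION ADLSTest;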