Databricks Writer programmer's reference
Databricks Writer properties
property | type | default value | notes |
---|---|---|---|
Authentication Type | enum | PersonalAccessToken | Appears in Flow Designer only when Use Connection Profile is False. The simplest way to use Microsoft Entra ID (OAuth) is to create a connection profile (see Introducing connection profiles). Alternatively, select Manual OAuth and follow the instructions in Configuring Microsoft Entra ID (formerly Azure Active Directory) for Databricks Writer manually; in that case, specify Client ID, Client Secret, Refresh Token, and Tenant ID. With the default setting PersonalAccessToken, Striim's connection to Databricks is authenticated using the token specified in Personal Access Token. |
CDDL Action | enum | Process | See Handling schema evolution. If TRUNCATE commands may be entered in the source and you do not want to delete events in the target, precede the writer with a CQ whose select statement filters out TRUNCATE operations (see the CQ sketch after this table). |
Client ID | String | | Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth. Required when Authentication Type is Manual OAuth. |
Client Secret | encrypted password | | Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth. Required when Authentication Type is Manual OAuth. |
Connection Profile Name | enum | | Appears in Flow Designer only when Use Connection Profile is True. See Introducing connection profiles. |
Connection Retry Policy | String | initialRetryDelay=10s, retryDelayMultiplier=2, maxRetryDelay=1m, maxAttempts=5, totalTimeout=10m | Do not change unless instructed to by Striim support. |
Connection URL | String | | Appears in Flow Designer only when Use Connection Profile is False. Provide the JDBC URL from the JDBC/ODBC tab of the Databricks cluster's Advanced options (see Get connection details for a cluster). If the URL starts with jdbc:spark://, change that to jdbc:databricks://. |
External Stage Connection Profile Name | enum | | Appears in Flow Designer only when Use Connection Profile is True and External Stage Type is ADLSGen2 or S3. Select or specify the name of the connection profile for the external stage. (When Databricks Writer uses a connection profile, you must use a connection profile for ADLSGen2 or S3 as well.) |
External Stage Type | enum | | Set to ADLSGen2 or S3 to match the stage type you chose in Choose which staging area to use. Note: personal staging locations have been deprecated by AWS (see Create metastore-level storage) and Microsoft (see Create metastore-level storage). |
Ignorable Exception Code | String | | Set to TABLE_NOT_FOUND to prevent the application from terminating when Striim tries to write to a table that does not exist in the target. See Handling "table not found" errors for more information. Ignored exceptions will be written to the application's exception store (see CREATE EXCEPTIONSTORE). |
Mode | enum | AppendOnly | Set to Merge if that was your choice in Choose which writing mode to use. |
Optimized Merge | Boolean | False | Appears in Flow Designer only when Mode is Merge. Set to True only when Mode is Merge and the target's input stream is the output of an HP NonStop reader, MySQL Reader, or Oracle Reader source and the source events will include partial records. For example, with Oracle Reader, when supplemental logging has not been enabled for all columns, partial records are sent for updates. When the source events will always include full records, leave this set to False. |
Parallel Threads | Integer | | Not supported when Mode is Merge. |
Personal Access Token | encrypted password | | Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is PersonalAccessToken. Used to authenticate with the Databricks cluster (see Generate a personal access token). The user associated with the token must have read and write access to DBFS (see Important information about DBFS permissions). If table access control has been enabled, the user must also have MODIFY and READ_METADATA (see Data object privileges - Data governance model). |
Personal Staging User Name | String | | Personal staging locations have been deprecated by AWS (see Create metastore-level storage) and Microsoft (see Create metastore-level storage). |
Refresh Token | encrypted password | | Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth. Required when Authentication Type is Manual OAuth. The token expires in 90 days, after which the application will halt. To avoid that, use a connection profile (see Introducing connection profiles), which allows you to update the token without stopping the application; alternatively, stop the application and update the token before it expires. |
Stage Location | String | | |
Tables | String | | The name(s) of the table(s) to write to. The table(s) must exist in the database. Specify target table names as <target database>.<table name> or <target catalog>.<target database>.<table name>. When the target's input stream is a user-defined event, specify a single table. The only special character allowed in target table names is underscore (_). When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), it can write to multiple tables. In this case, specify the names of both the source and target tables, optionally using the % wildcard, for example: source.emp,target_database.emp or source_schema.%,target_catalog.target_database.% or source_database.source_schema.%,target_database.% or source_database.source_schema.%,target_catalog.target_database.% (see the TQL sketch after this table). MySQL and Oracle names are case-sensitive, SQL Server names are not. Specify source names as <database>.<table> for MySQL, <schema>.<table> for Oracle, and <database>.<schema>.<table> for SQL Server. See Mapping columns and Defining relations between source and target using ColumnMap and KeyColumns for additional options. |
Tenant ID | String | | Appears in Flow Designer only when Use Connection Profile is False and Authentication Type is Manual OAuth. Required when Authentication Type is Manual OAuth. |
Upload Policy | String | eventcount:100000, interval:60s | The upload policy may include eventcount and/or interval (see Setting output names and rollover / upload policies for syntax). Buffered data is written to the storage account every time any of the specified values is exceeded. With the default value, data will be written every 60 seconds or sooner if the buffer contains 100,000 events. When the app is quiesced, any data remaining in the buffer is written to the storage account; when the app is undeployed, any data remaining in the buffer is discarded. |
Use Connection Profile | Boolean | False | Set to True to use a connection profile instead of specifying the connection properties here. See Introducing connection profiles. |
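To show how these properties fit together, here is a minimal TQL sketch of a Databricks Writer target. The adapter name DatabricksWriter and the space-less property names are assumptions based on the table above, and the stream name, connection URL, token, and table mapping are hypothetical placeholders; adjust everything to match your environment:

```
CREATE TARGET DatabricksTarget USING DatabricksWriter (
  ConnectionURL: 'jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/0123-456789-example',
  PersonalAccessToken: '********',
  Mode: 'Merge',
  ExternalStageType: 'ADLSGen2',
  Tables: 'source_schema.%,target_catalog.target_database.%',
  UploadPolicy: 'eventcount:100000,interval:60s'
)
INPUT FROM OracleCDCStream;
```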
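And here is the CQ sketch referenced in the CDDL Action note: a minimal filter, assuming the source emits WAEvents whose metadata includes OperationName (stream names are placeholders):

```
CREATE CQ FilterTruncates
INSERT INTO FilteredStream
SELECT * FROM SourceStream x
WHERE META(x, OperationName).toString() != 'TRUNCATE';
```

Point the Databricks Writer's input at FilteredStream so TRUNCATE events never reach the target.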
Azure Data Lake Storage (ADLS) Gen2 properties for Databricks Writer
To use an ADLS Gen2 container as your staging area, your Databricks instance should be using Databricks Runtime 11.0 or later.
property | type | default value | notes |
---|---|---|---|
Azure Account Access Key | encrypted password | | When Authentication Type is set to ServiceAccountKey, specify the account access key from Storage accounts > <account name> > Access keys. When Authentication Type is set to AzureAD, this property is ignored in TQL and not displayed in the Flow Designer. |
Azure Account Name | String | | The name of the Azure storage account for the blob container. |
Azure Container Name | String | striim-deltalakewriter-container | The blob container name from Storage accounts > <account name> > Containers. If it does not exist, it will be created. |
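Under the same assumptions as the sketch above (space-less property names; the account name, key, and container are placeholders), the ADLS Gen2 staging properties would be added to the writer's property list like this:

```
  ExternalStageType: 'ADLSGen2',
  AzureAccountName: 'mystorageaccount',
  AzureAccountAccessKey: '********',
  AzureContainerName: 'striim-deltalakewriter-container',
```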
Amazon S3 properties for Databricks Writer
To use an Amazon S3 bucket as your staging area, your Databricks instance should be using Databricks Runtime 11.0 or later.
property | type | default value | notes |
---|---|---|---|
S3 Access Key | String | | An AWS access key ID (created on the AWS Security Credentials page) for a user with read and write permissions on the bucket. |
S3 Bucket Name | String | striim-deltalake-bucket | Specify the S3 bucket to be used for staging. If it does not exist, it will be created. |
S3 Region | String | us-west-1 | The AWS region of the bucket. |
S3 Secret Access Key | encrypted password | | The secret access key for the access key. |
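Likewise, a hypothetical S3 staging fragment for the writer's property list (the access key shown is AWS's documented example value; the bucket and region are placeholders):

```
  ExternalStageType: 'S3',
  S3AccessKey: 'AKIAIOSFODNN7EXAMPLE',
  S3SecretAccessKey: '********',
  S3BucketName: 'striim-deltalake-bucket',
  S3Region: 'us-west-1',
```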
Databricks Writer connection profile properties
Setting Databricks Entra ID properties
In the Create or Edit Connection Profile dialog, select Databricks as the endpoint type.
Click Sign in using Entra ID and log in with an Entra ID account that has the following permissions:
USE CATALOG and CREATE SCHEMA on the Databricks catalog that will contain the target schema (see Learn / Azure Databricks documentation / Create schemas)
CAN USE permission on the Databricks workspace (see Learn / Azure Databricks documentation / Monitor and manage access to personal access tokens)
CAN ATTACH TO on the workspace's Compute (see Learn / Azure Databricks documentation / Security / Access control lists)
Once you log in successfully, close the browser window (if you do not, sign-in will fail), and return to the connection profile page.
Connection URL: specify the JDBC URL from the JDBC/ODBC tab of the Databricks cluster's Advanced options (see Get connection details for a cluster). If the URL starts with jdbc:spark://, change that to jdbc:databricks://.
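For example, with a placeholder workspace hostname and httpPath, only the prefix changes:

```
Before: jdbc:spark://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/0123-456789-example
After:  jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/0123-456789-example
```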
Once you have signed in using Entra ID and specified the connection URL, test the connection.
Setting Databricks Personal Access Token properties
Personal Access Token: see Generate a personal access token. The user associated with the token must have read and write access to DBFS (see Important information about DBFS permissions). If table access control has been enabled, the user must also have MODIFY and READ_METADATA (see Data object privileges - Data governance model). When you update this property with a new token, applications using this connection profile will automatically switch to the new token when the old one expires.
Connection URL: specify the JDBC URL from the JDBC/ODBC tab of the Databricks cluster's Advanced options (see Get connection details for a cluster). If the URL starts with jdbc:spark://, change that to jdbc:databricks://.
Databricks Writer data type support and correspondence
TQL type | Delta Lake type |
---|---|
java.lang.Byte | binary |
java.lang.Double | double |
java.lang.Float | float |
java.lang.Integer | int |
java.lang.Long | bigint |
java.lang.Short | smallint |
java.lang.String | string |
org.joda.time.DateTime | timestamp |
For additional data type mappings, see Data type support & mapping for schema conversion & evolution.
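For example, a hypothetical Striim stream type using four of these TQL types (TQL type declarations typically use the short class names; EmpType and its fields are placeholders):

```
CREATE TYPE EmpType (
  id Integer,
  salary Double,
  name String,
  hired DateTime
);
```

Per the table above, events of this type would land in Delta Lake as int, double, string, and timestamp columns, respectively.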