
Databricks Writer

Databricks Writer writes to Delta Lake tables in Databricks on AWS or Azure. Delta Lake is an open-source tabular storage framework that includes a transaction log to support features such as ACID transactions and optimistic concurrency control typically associated with relational databases.


The required JDBC driver is bundled with Striim.

Limitations

Writing to Databricks requires a staging area. The native Databricks File System (DBFS) has a 2 GB cap on storage, which can cause file corruption. To work around that limitation, we strongly recommend using an external stage instead: Azure Data Lake Storage (ADLS) Gen2 for Azure Databricks or Amazon S3 for Databricks on AWS. To use an external stage, your Databricks instance must use Databricks Runtime 10.4 or later.

If you will use MERGE mode, we strongly recommend partitioning your target tables, as this will significantly improve performance (see Partitions | Databricks on AWS or Learn / Azure / Azure Databricks / Partitions).

Data is written in batch mode. Streaming mode is not supported in this release because it is not supported by Databricks Connect (see Databricks Connect - Limitations).

Creating a Databricks target using a template

Note

In this release, Auto Schema Creation is not supported when you are using Databricks' Unity Catalog.

When you create a Databricks Writer target using a wizard template (see Creating apps using templates), you must specify three properties: Connection URL, Hostname, and Personal Access Token. The Tables property value will be set based on your selections in the wizard.

Databricks does not have schemas. When the source database uses schemas, the tables will be mapped as <source_database>.<source_schema>.<table>,<target_database>.<table>, for example, mydb.myschema.%,mydb.%. Each schema in the source will be mapped to a database in the target. If the databases do not exist in the target, Striim will create them.
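
In TQL terms, the mapping appears in the target's Tables property. The following is an illustrative sketch only: the host, token, staging, and connection values are placeholders, and the wildcard mapping is the one from the example above.

CREATE TARGET DatabricksSchemaMap USING DeltaLakeWriter (
   personalAccessToken: '<personal-access-token>',
   hostname: 'adb-xxxx.xx.azuredatabricks.net',
   connectionUrl: 'jdbc:databricks://adb-xxxx.xx.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=xxx;AuthMech=3;UID=token;',
   stageLocation: '/StriimStage/',
   tables: 'mydb.myschema.%,mydb.%'
)
INPUT FROM ns1.sourceStream;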

Databricks authentication mechanisms

Databricks authentication requires a personal access token. Optionally you may use Azure Active Directory (Azure AD).

Create a personal access token in Databricks on AWS

Create a personal access token in Azure Databricks

  1. Create a personal access token for Striim to use to authenticate with the Databricks cluster as described in Learn / Azure / Azure Databricks / Authentication for Azure Databricks automation / Generate a personal access token.

  2. Grant the user associated with the token read and write access to DBFS (see Important information about DBFS permissions).

  3. If table access control has been enabled, also grant the user MODIFY and READ_METADATA privileges (see Data object privileges - Data governance model).

Authenticating to Databricks with Azure Active Directory (Azure AD)

In summary, configuring Azure AD requires the following steps:

  • Register the Striim app with the Azure AD identity provider (IdP).

  • Note the registered app's Client ID, Client Secret, and Tenant ID.

  • Make a request to the /authorize endpoint using the Postman app or the browser.

  • Authenticate to Azure AD.

  • Grant consent at the consent dialog box to obtain the authorization code.

  • Provide the authorization code and Client Secret to the /token endpoint to obtain the access and refresh tokens.

In detail:

  1. Log in to the Azure Portal.

  2. Register a new app.

  3. Note the Application ID (referred to as Client ID in this procedure), the OAuth v2 authorization endpoint, and the OAuth v2 token endpoint.

  4. Generate a new Client secret.

    Note the Client Secret for future use.

  5. Add the AzureDatabricks API permission.

  6. (When the external stage is ADLS Gen 2) Add the Azure Storage API permission.

The following procedure uses curl and the Web browser to fetch the refresh token.

  1. Open the following URL in a Web browser.

    https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/authorize?
       client_id=<client-id>&
       response_type=code&
       redirect_uri=http%3A%2F%2Flocalhost%3A7734%2Fstriim-callback&
       response_mode=query&
       scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default%20offline_access

    Replace <tenant-id> with the tenant ID of the registered app. Replace <client-id> with the client ID of the registered app. Provide valid authentication credentials if Azure Portal requests authentication.

    The web browser redirects to the specified redirect URI. The authorization code is the part of the URI after the code= string.

  2. Note the authorization code for future use.

  3. Execute the following curl command.

    curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
       https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token \
       -d 'client_id=<client-id>' \
       -d 'client_secret=<client_secret>' \
       -d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default%20offline_access' \
       -d 'code=<authorization_code>' \
       -d 'redirect_uri=http%3A%2F%2Flocalhost%3A7734%2Fstriim-callback' \
       -d 'grant_type=authorization_code'

    Replace <tenant-id> with the tenant ID of the registered app. Replace <client-id> with the client ID of the registered app. Replace <client_secret> with the client secret of the registered app. Replace <authorization_code> with the previously noted authorization code.

    The call returns an object that contains an access_token key and a refresh_token key.

  4. Note the value of the refresh_token key.

The following procedure uses the Postman app to generate an access token.

  1. Open the Postman app.

  2. In the Authorization tab, set the authorization type to OAuth 2.0.

  3. Configure values for the Client ID, Client secret, authorization URL and access token URL.

  4. Set the value of the Scope field to 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default offline_access.

  5. Set the value of the Callback URL field to the redirect URL determined in earlier procedures.

  6. Click Get New Access Token.

  7. Sign into Microsoft Azure and accept the app privilege requests at the consent dialog box.

    The response contains an access token and a refresh token. Note the value of the refresh token.

When the External Stage type is ADLS Gen 2 and the authentication type is Azure AD, you must grant the service principal account the Storage Blob Data Contributor privilege before generating the access and refresh tokens.

Example 2. TQL Example for Azure AD with ADLS Gen 2 as External Stage type
CREATE OR REPLACE TARGET db USING Global.DeltaLakeWriter ( 
   tenantID: '71bfeed5-1905-43da-a4a4-49d8490731da',
   connectionUrl: 'jdbc:spark://adb-8073469162361072.12.azuredatabricks.net:443/default;
                   transportMode=http;ssl=1;
                   httpPath=sql/protocolv1/o/8073469162361072/0301-101350-kprc8x3a;
                   AuthMech=3;UID=token;PWD=<personal-access-token>',
   stageLocation: '/',
   CDDLAction: 'Process',
   adapterName: 'DeltaLakeWriter',
   authenticationType: 'AzureAD',
   ConnectionRetryPolicy: 'initialRetryDelay=10s, retryDelayMultiplier=2, maxRetryDelay=1m, maxAttempts=5, totalTimeout=10m',
   ClientSecret: 'untNjHnQOzsY90BjrKs2napohIP8WebUUcXybRdKVURH0XeklB5+Xw8NZgZUylqn',
   ClientSecret_encrypted: 'true',
   ClientID: 'dcf190e8-a315-42bb-a0b1-86063ff1c340',
   RefreshToken_encrypted: 'true',
   Mode: 'APPENDONLY',
   externalStageType: 'ADLSGen2',
   Tables: 'public.sample_pk,default.testoauth',
   azureAccountName: 'samplestorage',
   RefreshToken: '<refresh-token-value>',
   azureContainerName: 'striim-deltalakewriter-container',
   uploadPolicy: 'eventcount:10000,interval:60s' )
 INPUT FROM sysout;


Example 3. TQL Example using Personal Access Token and ADLS Gen 2 as External Stage type
CREATE TARGET db USING Global.DeltaLakeWriter (
   connectionUrl: 'jdbc:spark://adb-8073469162361072.12.azuredatabricks.net:443/default;
                   transportMode=http;ssl=1;
                   httpPath=sql/protocolv1/o/8073469162361072/0301-101350-kprc8x3a;
                   AuthMech=3;UID=token;PWD=<personal-access-token>',
   azureAccountAccessKey: '2YoK5czZpmPjxSiSe7uFVXrb9jt9P4xrWp+NNKxWzjU=',
   stageLocation: '/',
   CDDLAction: 'Process',
   ConnectionRetryPolicy: 'initialRetryDelay=10s, retryDelayMultiplier=2, maxRetryDelay=1m, maxAttempts=5, totalTimeout=10m',
   authenticationType: 'PersonalAccessToken',
   Mode: 'APPENDONLY',
   externalStageType: 'ADLSGen2',
   Tables: 'public.sample_pk,default.testoauth',
   azureAccountName: 'samplestorage',
   azureAccountAccessKey_encrypted: 'true',
   personalAccessToken: 'GGR/zQHfh7wQa3vJhP6dcWtejN1UL+E8YEXc13g9+UZdTQmYN1h3E0d0jabboJsd',
   personalAccessToken_encrypted: 'true',
   uploadPolicy: 'eventcount:10000,interval:60s' )
 INPUT FROM sysout;


Databricks Writer properties

When creating a Databricks Writer target in TQL, you must specify values for the Connection URL, Hostname, Personal Access Token, and Tables properties. If not specified, the other properties will use their default values.

property

type

default value

notes

Authentication Type

enum

PersonalAccessToken

With the default setting PersonalAccessToken, Striim's connection to Databricks is authenticated using the token specified in Personal Access Token.

Set to AzureAD to authenticate using Azure Active Directory. In this case, specify Client ID, Client Secret, Refresh Token, and Tenant ID. See Databricks authentication mechanisms for details.

CDDL Action

enum

Process

See Handling schema evolution.

If TRUNCATE commands may be entered in the source and you do not want to delete events in the target, precede the writer with a CQ whose select statement is SELECT * FROM <input stream name> WHERE META(x, OperationName).toString() != 'Truncate'; (replacing <input stream name> with the name of the writer's input stream). Note that there will be no record in the target that the affected events were deleted.
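
For example, a hypothetical sketch of such a CQ (the stream and CQ names are placeholders, and the input is assumed to be a WAEvent stream from a CDC reader):

CREATE STREAM FilteredStream OF Global.WAEvent;

CREATE CQ FilterTruncates
INSERT INTO FilteredStream
SELECT * FROM SourceStream x
WHERE META(x, OperationName).toString() != 'Truncate';

The Databricks Writer target would then use INPUT FROM FilteredStream instead of the original source stream.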

Client ID

string

This property is required when AzureAD authentication is selected as the value of the Authentication Type property.

Client Secret

encrypted password

This property is required when AzureAD authentication is selected as the value of the Authentication Type property.

Connection Retry Policy

String

initialRetryDelay=10s, retryDelayMultiplier=2, maxRetryDelay=1m, maxAttempts=5, totalTimeout=10m

Do not change unless instructed to by Striim support.

Connection URL

String

Provide the JDBC URL from the JDBC/ODBC tab of the Databricks cluster's Advanced options (see Get connection details for a cluster). If the URL starts with jdbc:spark://, change that to jdbc:databricks:// (this is required by the upgraded driver bundled with Striim).

External Stage Type

enum

DBFSROOT

With the default value (not recommended), events are staged to DBFS storage at the path specified in Stage Location. To use an external stage, your Databricks instance should be using Databricks Runtime 11.0 or later. When using serverless compute, you must select PersonalStagingLocation as the stage type.

If running Databricks on AWS and you want to use S3, set to S3 and set the S3 properties as detailed below.

In Striim 4.2.0.1 or later only, if running Databricks on AWS and you want to use a personal staging location, see Using an AWS personal staging location.

In Striim 4.2.0.4 or later only, if running Azure Databricks and you want to use a personal staging location, see Using an Azure personal staging location.

If running Azure Databricks and you want to use Azure Data Lake Storage Gen2, set to ADLSGen2 and set the ADLS properties as detailed below.

Hostname

String

the Server Hostname from the JDBC/ODBC tab of the Databricks cluster's Advanced options (see Get connection details for a cluster)

Ignorable Exception Code

String

Set to TABLE_NOT_FOUND to prevent the application from terminating when Striim tries to write to a table that does not exist in the target. See Handling "table not found" errors for more information.

Ignored exceptions will be written to the application's exception store (see CREATE EXCEPTIONSTORE).

Mode

enum

AppendOnly

With the default value AppendOnly:

  • Updates and deletes from DatabaseReader, IncrementalBatchReader, and SQL CDC sources are handled as inserts in the target.

  • Primary key updates result in two records in the target, one with the previous value and one with the new value. If the Tables setting has a ColumnMap that includes @METADATA(OperationName), the operation name for the first event will be DELETE and for the second INSERT.

Set to Merge to handle updates and deletes as updates and deletes instead. In Merge mode:

  • Since Delta Lake tables do not have primary keys, you may include the keycolumns option in the Tables property to specify a column in the target table that will contain a unique identifier for each row: for example, Tables:'SCOTT.EMP,mydatabase.employee keycolumns(emp_num)'.

  • You may use wildcards for the source table provided key columns are specified for all the target tables. For example, Tables:'DEMO.%,mydatabase.% KeyColumns(...)'.

  • If you do not specify keycolumns, Striim will use the source table's keycolumns as a unique identifier. If the source table has no keycolumns, Striim will concatenate all column values and use that as a unique identifier.
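
The following Merge-mode sketch is for illustration only (the host, token, and connection values are placeholders); it uses the keycolumns example above:

CREATE TARGET DatabricksMergeSketch USING DeltaLakeWriter (
   personalAccessToken: '<personal-access-token>',
   hostname: 'adb-xxxx.xx.azuredatabricks.net',
   connectionUrl: 'jdbc:databricks://adb-xxxx.xx.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=xxx;AuthMech=3;UID=token;',
   stageLocation: '/StriimStage/',
   mode: 'MERGE',
   tables: 'SCOTT.EMP,mydatabase.employee keycolumns(emp_num)'
)
INPUT FROM ns1.sourceStream;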

Optimized Merge

Boolean

false

In Flow Designer, this property will be displayed only when Mode is Merge.

Set to True only when Mode is MERGE and the target's input stream is the output of an HP NonStop reader, MySQL Reader, or Oracle Reader source and the source events will include partial records. For example, with Oracle Reader, when supplemental logging has not been enabled for all columns, partial records are sent for updates. When the source events will always include full records, leave this set to false.

Parallel Threads

Integer

Not supported when Mode is Merge.

See Creating multiple writer instances.

Personal Access Token

encrypted password

Used to authenticate with the Databricks cluster (see Generate a personal access token). The user associated with the token must have read and write access to DBFS (see Important information about DBFS permissions). If table access control has been enabled, the user must also have MODIFY and READ_METADATA (see Data object privileges - Data governance model).

Personal Staging User Name

String

When External Stage Type is PersonalStagingLocation, set as described in Using an AWS personal staging location or Using an Azure personal staging location.

Refresh Token

encrypted password

This property is required when AzureAD authentication is selected as the value of the Authentication Type property.

Stage Location

String

/

When the External Stage Type is DBFSROOT, the path to the staging area in DBFS, for example, /StriimStage/.

Tables

String

The name(s) of the table(s) to write to. The table(s) must exist in the database.

Specify target table names in uppercase as <CATALOG>.<DATABASE>.<TABLE>. Not specifying the catalog (<DATABASE>.<TABLE>) may result in errors if a table in another catalog has the same name.

When the target's input stream is a user-defined event, specify a single table.

The only special character allowed in target table names is underscore (_).

When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), it can write to multiple tables. In this case, specify the names of both the source and target tables. You may use the % wildcard only for tables, not for schemas or databases. If the reader uses three-part names, you must use them here as well. Note that Oracle CDB/PDB source table names must be specified in two parts when the source is Database Reader or Incremental Batch Reader (schema.%,schema.%) but in three parts when the source is Oracle Reader or OJet (database.schema.%,schema.%). SQL Server source table names must be specified in three parts when the source is Database Reader or Incremental Batch Reader (database.schema.%,schema.%) but in two parts when the source is MS SQL Reader or MS Jet (schema.%,schema.%). Examples:

source.emp,target_database.emp
source_schema.%,target_catalog.target_database.%
source_database.source_schema.%,target_database.%
source_database.source_schema.%,target_catalog.target_database.%

MySQL and Oracle names are case-sensitive; SQL Server names are not. Specify names as <schema name>.<table name> for MySQL and Oracle and as <database name>.<schema name>.<table name> for SQL Server.

See Mapping columns for additional options.

Tenant ID

String

This property is required when AzureAD authentication is selected as the value of the Authentication Type property.

Upload Policy

String

eventcount:100000, interval:60s

The upload policy may include eventcount and/or interval (see Setting output names and rollover / upload policies for syntax). Buffered data is written to the storage account every time any of the specified values is exceeded. With the default value, data will be written every 60 seconds or sooner if the buffer contains 100,000 events. When the app is quiesced, any data remaining in the buffer is written to the storage account; when the app is undeployed, any data remaining in the buffer is discarded.

Azure Data Lake Storage (ADLS) Gen2 properties for Databricks Writer

To use an ADLS Gen2 bucket as your staging area, your Databricks instance should be using Databricks Runtime 11.0 or later.

property

type

default value

notes

Azure Account Access Key

encrypted password

When Authentication Type is set to ServiceAccountKey, specify the account access key from Storage accounts > <account name> > Access keys.

When Authentication Type is set to AzureAD, this property is ignored in TQL and not displayed in the Flow Designer.

Azure Account Name

String

the name of the Azure storage account for the blob container

Azure Container Name

String

striim-deltalakewriter-container

the blob container name from Storage accounts > <account name> > Containers

If it does not exist, it will be created.

Amazon S3 properties for Databricks Writer

To use an Amazon S3 bucket as your staging area, your Databricks instance should be using Databricks Runtime 11.0 or later.

property

type

default value

notes

S3 Access Key

String

an AWS access key ID (created on the AWS Security Credentials page) for a user with read and write permissions on the bucket

If the Striim host has default credentials stored in the .aws directory, you may leave this blank.

S3 Bucket Name

String

striim-deltalake-bucket

Specify the S3 bucket to be used for staging. If it does not exist, it will be created.

S3 Region

String

us-west-1

the AWS region of the bucket

S3 Secret Access Key

encrypted password

the secret access key for the access key

If the Striim host has default credentials stored in the .aws directory, you may leave this blank.
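
There is no S3-staged example elsewhere in this topic, so the following sketch is for illustration only: the camelCase property names s3AccessKey, s3SecretAccessKey, s3BucketName, and s3Region are inferred from the display names above, and the host, token, and table values are placeholders.

CREATE TARGET DatabricksS3Stage USING DeltaLakeWriter (
   personalAccessToken: '<personal-access-token>',
   hostname: 'dbc-xxxx.cloud.databricks.com',
   connectionUrl: 'jdbc:databricks://dbc-xxxx.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=xxx;AuthMech=3;UID=token;',
   externalStageType: 'S3',
   s3AccessKey: '<access-key-id>',
   s3SecretAccessKey: '<secret-access-key>',
   s3BucketName: 'striim-deltalake-bucket',
   s3Region: 'us-west-1',
   Mode: 'APPENDONLY',
   uploadPolicy: 'eventcount:10000,interval:60s',
   Tables: 'public.sample_pk,default.testoauth'
)
INPUT FROM ns1.sourceStream;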

Using an AWS personal staging location

Using a personal staging location requires Striim 4.2.0.1 or later and Unity Catalog (see Documentation > Data Governance Guide > What is Unity Catalog?). You may not use a personal staging location when the Authentication Type is AzureAD.

To use a personal staging location as your staging area:

  1. Create an S3 bucket as described in Documentation > Data Governance Guide > What is Unity Catalog? > Get started using Unity Catalog > Configure a storage bucket and IAM role in AWS.

  2. Configure that S3 bucket as described in Documentation > Data Governance Guide > What is Unity Catalog? > Get started using Unity Catalog > Configure Unity Catalog storage account for CORS > Configure CORS settings for S3.

  3. Set External Stage Type to PersonalStagingLocation.

  4. For Personal Staging User Name, specify a user name or application ID for a user or service principal with a personal access token (see Documentation > Security and compliance guide > Authentication and access control > Manage personal access tokens).
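
For illustration only, a sketch of the resulting writer properties (personalStagingUserName is an assumed camelCase spelling of the Personal Staging User Name property; the host, token, and table values are placeholders):

CREATE TARGET DatabricksPersonalStaging USING DeltaLakeWriter (
   personalAccessToken: '<personal-access-token>',
   hostname: 'dbc-xxxx.cloud.databricks.com',
   connectionUrl: 'jdbc:databricks://dbc-xxxx.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=xxx;AuthMech=3;UID=token;',
   externalStageType: 'PersonalStagingLocation',
   personalStagingUserName: 'striim.user@example.com',
   Tables: 'public.sample_pk,MYCATALOG.MYDB.EMPLOYEE',
   uploadPolicy: 'eventcount:10000,interval:60s'
)
INPUT FROM ns1.sourceStream;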

Using an Azure personal staging location

Using a personal staging location requires Striim 4.2.0.4 or later and Unity Catalog (see Learn / Azure Databricks documentation / Data Governance / What is Unity Catalog?). You may not use a personal staging location when the Authentication Type is AzureAD.

To use a Personal Staging Location as your staging area:

  1. Create a Unity Catalog metastore (see Learn / Azure Databricks documentation / Data Governance / Create a Unity Catalog metastore).

  2. Configure that metastore as described in Learn / Azure Databricks documentation / Data Governance / Create a Unity Catalog metastore / Enable Azure Databricks management for personal staging locations.

  3. Set External Stage Type to PersonalStagingLocation.

  4. For Personal Staging User Name, specify a user name or application ID for a user or service principal with a personal access token (see Learn > Azure Databricks documentation > Administration Guide > Manage personal access tokens).

Sample TQL application using Databricks Writer

Sample TQL in AppendOnly mode:

CREATE TARGET DatabricksAppendOnly USING DeltaLakeWriter ( 
  personalAccessToken: '*************************', 
  hostname:'adb-xxxx.xx.azuredatabricks.net',
  tables: 'mydb.employee,mydatabase.employee', 
  stageLocation: '/StriimStage/', 
  connectionUrl:'jdbc:xxx.xx;transportMode=http;ssl=1;httpPath=xxx;AuthMech=3;UID=token;'
)
INPUT FROM ns1.sourceStream;

Sample TQL in Merge mode with Optimized Merge set to True:

CREATE TARGET DatabricksMerge USING DeltaLakeWriter ( 
  personalAccessToken: '*************************', 
  hostname:'adb-xxxx.xx.azuredatabricks.net',
  tables: 'mydb.employee,mydatabase.employee', 
  stageLocation: '/StriimStage/', 
  connectionUrl:'jdbc:xxx.xx;transportMode=http;ssl=1;httpPath=xxx;AuthMech=3;UID=token;',
  mode: 'MERGE',
  optimizedMerge: 'true'
)
INPUT FROM ns1.sourceStream;

Databricks Writer data type support and mapping

See also Data type support & mapping for schema conversion & evolution.

TQL type                    Delta Lake type

java.lang.Byte              binary
java.lang.Double            double
java.lang.Float             float
java.lang.Integer           int
java.lang.Long              bigint
java.lang.Short             smallint
java.lang.String            string
org.joda.time.DateTime      timestamp

For additional data type mappings, see Data type support & mapping for schema conversion & evolution.