Striim Cloud 4.1.0 documentation

Delta Lake Writer for Databricks

Starting with Striim 4.0.1.3, Delta Lake Writer supports Databricks in both Amazon Web Services (AWS) and Azure.

Delta Lake is an open-source tabular storage format best known through Databricks' implementation. It includes a transaction log that supports features such as ACID transactions and optimistic concurrency control typically associated with relational databases. For more information, see delta.io and Diving Into Delta Lake: Unpacking The Transaction Log.

Delta Lake Writer writes to tables in Databricks on AWS or Azure. For more information, see the Databricks documentation for your platform.

In this release, Delta Lake Writer supports only Databricks on AWS and Azure and no other implementations of Delta Lake.

Writing to Databricks requires a staging area. The native Databricks File System (DBFS) has a 2 GB cap on storage, which can cause file corruption. To work around that limitation, we strongly recommend using an external stage instead: Azure Data Lake Storage (ADLS) Gen2 for Azure Databricks or Amazon S3 for Databricks on AWS.

If you use MERGE mode, we strongly recommend partitioning your target tables, as this will significantly improve performance (see Partitions | Databricks on AWS or Learn / Azure / Azure Databricks / Partitions).
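
As a rough illustration of the partitioning recommendation above, the following Python sketch builds the kind of CREATE TABLE statement that defines a partitioned Delta table in Databricks SQL. The helper name partitioned_delta_ddl, the table name sales.orders, and the column names are hypothetical and chosen only for this example; they are not part of Striim or Databricks. Choose partition columns to suit your own data.

```python
def partitioned_delta_ddl(table, columns, partition_cols):
    """Build a CREATE TABLE statement for a Delta table partitioned by partition_cols.

    columns is a dict mapping column name to SQL type, in the desired column order.
    """
    # Every partition column must also be declared in the table schema.
    missing = [c for c in partition_cols if c not in columns]
    if missing:
        raise ValueError(f"partition columns not in schema: {missing}")
    col_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    parts = ", ".join(partition_cols)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({col_defs}) "
        f"USING DELTA PARTITIONED BY ({parts})"
    )


# Hypothetical table and columns, for illustration only.
ddl = partitioned_delta_ddl(
    "sales.orders",
    {"order_id": "BIGINT", "amount": "DECIMAL(10,2)", "order_date": "DATE"},
    ["order_date"],
)
print(ddl)
```

The printed statement could then be run in a Databricks notebook or SQL editor before starting the Striim application. Partitioning on a column that MERGE operations can filter on may let Databricks skip partitions that contain no matching rows.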

Data is written in batch mode. Streaming mode is not supported in this release because it is not supported by Databricks Connect (see Databricks Connect - Limitations).

Known issue DEV-29579: in this release, Delta Lake Writer cannot be used when Striim is running on Microsoft Windows.