Striim for Databricks Documentation

What is Striim for Databricks?

Striim for Databricks is a fully managed software-as-a-service tool for building data pipelines (see What is a Data Pipeline) that copy data from MariaDB, MySQL, Oracle, PostgreSQL, and SQL Server to Databricks in real time using change data capture (CDC).

Striim first copies all existing source data to Databricks ("initial sync"), then transitions automatically to reading and writing new and updated source data ("live sync"). You can monitor the real-time health and progress of your pipelines and view performance statistics going back as far as 90 days.
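
To make the two phases concrete, here is a minimal Python sketch of the idea, not Striim's implementation. It models the source and target as in-memory dictionaries. The names `ChangeEvent`, `initial_sync`, and `live_sync` are hypothetical and exist only for this illustration. Only inserts and updates are modeled, because those are the changes the paragraph above describes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ChangeEvent:
    """Hypothetical CDC event: a new or updated source row."""
    op: str    # "insert" or "update"
    key: int   # primary key of the affected row
    row: dict  # full row contents after the change


def initial_sync(source_rows: Dict[int, dict]) -> Dict[int, dict]:
    """Phase 1: copy every existing source row into a fresh target."""
    return {key: dict(row) for key, row in source_rows.items()}


def live_sync(target: Dict[int, dict], events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Phase 2: apply change events to the target in the order they arrive."""
    for event in events:
        if event.op not in ("insert", "update"):
            raise ValueError(f"unsupported operation in this sketch: {event.op}")
        # Inserts and updates both leave the target holding the latest row.
        target[event.key] = dict(event.row)
    return target


# Existing data is copied first, then later changes are applied on top of it.
source = {1: {"name": "alice"}, 2: {"name": "bob"}}
target = initial_sync(source)
live_sync(target, [
    ChangeEvent("update", 1, {"name": "alicia"}),
    ChangeEvent("insert", 3, {"name": "carol"}),
])
```

In this toy model the hand-off between phases is a plain function call. In the product, the transition from initial sync to live sync happens automatically.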

Optionally, with some sources, Striim can also synchronize schema evolution. That is, when you add a table or column to the source database, or drop a table from it, Striim updates Databricks to match, and sync continues without interruption. (However, if a column is dropped from a source table, it is not dropped from the corresponding Databricks target table.) If your source supports this, the "How would you like to handle schema changes?" option will appear among the Connect to Source properties.
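
The schema-change rules above can be summarized in a short Python sketch. This is an illustration under stated assumptions, not Striim's code: the target schema is modeled as a dictionary of table names to column-name sets, and the function name `apply_schema_change` and the change-record fields (`kind`, `table`, `column`, `columns`) are hypothetical.

```python
from typing import Dict, Set

# Toy model of a target schema: table name -> set of column names.
Schema = Dict[str, Set[str]]


def apply_schema_change(target: Schema, change: dict) -> Schema:
    """Apply one source schema change to the target, following the rules above."""
    kind = change["kind"]
    table = change["table"]
    if kind == "add_table":
        # A table added to the source is created in the target.
        target[table] = set(change.get("columns", ()))
    elif kind == "drop_table":
        # A table dropped from the source is dropped from the target.
        target.pop(table, None)
    elif kind == "add_column":
        # A column added to a source table is added to the target table.
        target.setdefault(table, set()).add(change["column"])
    elif kind == "drop_column":
        # A column dropped from a source table is kept in the target table.
        pass
    else:
        raise ValueError(f"unknown schema change kind: {kind}")
    return target


target = {"orders": {"id", "total"}}
apply_schema_change(target, {"kind": "add_column", "table": "orders", "column": "currency"})
apply_schema_change(target, {"kind": "drop_column", "table": "orders", "column": "total"})
# The "total" column is still present in the target after the source drops it.
```

The one asymmetric case is `drop_column`: the target table keeps the column even though the source no longer has it.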

When you launch Striim for Databricks, we guide you through the configuration of your pipeline, including connecting to your Databricks project, configuring your source, selecting the schemas and tables you want to sync to Databricks, and choosing which settings to use for the pipeline.