Skip to main content

Handling schema evolution

For the CDC sources listed below, Striim can capture certain changes to the DDL in the source tables. When CDDL Capture is enabled in a source, you must choose which of the following actions you want to happen when it encounters a DDL change:

Note

Process in the source is supported only when all source table data types are supported. Process in the target is supported only when all source types are mapped to target types.

  • Halt in source: DDL changes are never expected in the source. Striim will halt the application so you can investigate the issue. Use this option when you want to prohibit changes to the source schema.

  • Ignore in source: target is schemaless and does not have the CDDL Action property (for example, FileWriter with JSON Formatter, or MongoDB).

  • Process in source, Halt in target: the application has one or more targets that support the Process action and one or more that have the CDDL Action property but do not support the Process action. In this case, set the CDDL Action property to Halt for the targets that do not support Process. Schema changes in the source will be replicated to the targets that support Process and the application will halt for you to deal with the others manually. If recovery is enabled for the application, after restart the DDL operation will be sent again.

    DDL changes to tables specified in Excluded Tables will not trigger Halt.

  • Process in source, Ignore in target: the application has multiple targets that have the CDDL Action property and you do not want to replicate changes to this one.

  • Process in source, Process in target: replicate changes to the target, keeping the target schema in sync with the source automatically without interrupting operation of the application. This is supported only for the targets listed below..

    When an unsupported DDL operation or unsupported data type is encountered in a DDL change, the application will halt for you to troubleshoot the problem.

  • Quiesce in source: target does not support schema evolution, so when a DDL change is detected, Striim will write all the events received prior to the DDL operation to the target, then quiesce the application. Then you can update the target schema manually. If recovery is enabled for the application, after restart the DDL operation will not be sent again.

Always select Process in the source when Using the Confluent or Hortonworks schema registry.

Supported sources

  • MariaDB using MariaDB Reader

  • MySQL using MySQL Reader

  • Oracle Database 18c and earlier using Oracle Reader

  • Oracle GoldenGate 12.1 or later for Oracle Database only using GG Trail Reader

  • PostgreSQL using PostgreSQL Reader (see PostgreSQL setup for schema evolution)

  • SQL Server using MSJet (not supported using MS SQL Reader)(

Targets with the CDDL Action property

  • Azure Synapse Writer

  • BigQuery Writer

  • Databricks Writer: supports only CREATE TABLE, ADD COLUMN, DROP TABLE, and TRUNCATE

  • MariaDB (via Database Writer)

  • MySQL (via Database Writer)

  • Oracle Database (via Database Writer)

  • PosgtgreSQL (via Database Writer)

  • SAP Hana (via Database Writer - does not support Process)

  • Snowflake Writer

  • Spanner Writer

  • SQL Server (via Database Writer)

  • Sybase (via Database Writer - does not support Process)

Supported DDL operations

  • CREATE TABLE (default column values are not supported)

  • ALTER TABLE ... ADD COLUMN (default column values are not supported)

    • with MySQL, AFTER and ALGORITHM=INSTANT are not supported (known issues DEV-35539 and DEV-35681)

    • with Oracle Database, adding NOT NULL constraints is not supported (known issues DEV-24666, DEV-25424)

  • ALTER TABLE ... MODIFY COLUMN: The modification must be compatible with existing data, for example, you could change short to long, or varchar(20) to varchar(30). Default column values are not supported.

    • not supported with BigQuery or Databricks targets

    • If a ColumnMap is specified (see Mapping columns), the mapped target column will be modified.

    • with Oracle Database, adding NOT NULL constraints is not supported (known issues DEV-24666, DEV-25424)

    • Snowflake Writer: see limitations described in ALTER TABLE … ALTER COLUMN

    • not supported with SQL Server sources (known issue DEV-26386)

  • ALTER TABLE ... ADD PRIMARY KEY

    • not supported with BigQuery or Databricks targets

    • not supported with GoldenGate, MySQL, or SQL Server sources

  • ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY

    • not supported with BigQuery or Databricks targets

    • With GoldenGate sources, use the syntax ALTER TABLE <name> ADD CONSTRAINT <name> PRIMARY KEY (id). (The syntax ALTER TABLE MODIFY ID NUMBER NOT NULL PRIMARY KEY will not work.)

    • not supported with SQL Server sources

  • ALTER TABLE ... ADD CONSTRAINT ... UNIQUE

    • not supported with BigQuery or Databricks targets

    • supported with GoldenGate sources only when table has a primary key column and the constraint is not added to that column (known issues DEV-26575)

    • not supported with SQL Server sources (known issue DEV-26386)

  • ALTER TABLE ... ADD UNIQUE

    • supported with MariaDB and MySQL only

  • ALTER TABLE ... DROP COLUMN:

    • not supported with BigQuery or Databricks targets

    • If a ColumnMap is specified for the column (see Mapping columns), the application will halt. ALTER and RECOMPILE the application to remove the ColumnMap for the column, drop the column from the target table, and restart the application.

    • with Oracle Database, adding NOT NULL constraints is not supported (known issues DEV-24666, DEV-25424)

  • DROP TABLE

  • TRUNCATE TABLE

    sources that support TRUNCATE TABLE

    targets that support TRUNCATE TABLE

    • GG Trail Reader

    • MariaDB Xpand Reader

    • MySQL Reader

    • OJet

    • Oracle Reader (with Oracle version 18c or earlier only)

    • PostgreSQL Reader (using wal2json version 2 only)

    • Azure Synapse Writer

    • BigQuery Writer

    • Database Writer writing to

      • MariaDB

      • MariaDB Xpand

      • MySQL

      • Oracle

      • PostgreSQL

      • SQL Server

    • Databricks Writer

    • Snowflake Writer

    • TRUNCATE TABLE ... CASCADE is not supported

    • TRUNCATE cannot be supported with Spanner or SQL Server sources as TRUNCATE operations is not included in their CDC

Data type support and mappings

See Data type support & mapping for schema conversion & evolution.

Monitoring schema evolution

The MON command includes the following metrics for schema evolution:

  • number of DDL operations, by table

  • last captured / applied DDL statement

  • time of last captured / applied DDL

  • ignored DDL count

Schema evolution notes and limitations

  • Schema evolution is not supported when using Bidirectional replication.

  • Striim can capture only those DDL changes made after schema evolution is enabled.

  • The first time you start an application with CDDL Capture enabled, the CDC reader will take a snapshot of the source database's table metadata from the. It is essential that there are no DDL changes made to the database until startup completes. Otherwise, the schema captured in the snapshot will be out of date, which will eventually cause the application to terminate.

  • When the Tables property in the reader uses a wildcard, the first time the application is started Striim must fetch the metadata for all tables in the schema. If there are many tables in the schema, this may take a significant amount of time.

  • After an application with both recovery and schema evolution enabled is restarted, Striim will automatically use the correct schema for the restart position

  • Any \r, \n, \t or other control character in a DDL statement must be followed by a white space (Unicode 0020 / ASCII 32) or the application will halt.

  • If the application halts due to an unsupported DDL change, an unsupported column data type, or a parser exception, you may add the table causing the halt to the Excluded Tables list and restart the application.

  • Renaming tables is not supported.

Schema evolution known issues

The following are known issues in this release related to schema evolution. Additional known issues are flagged by "DEV-#####"in Handling schema evolution.

All sources

Columns with a data type that has a scale set to a negative value (for example, number(1, -17) ) are not supported.

Azure Synapse Writer
  • If using Optimized Merge mode, CREATE TABLE will cause the application to halt (DEV-29689).

  • Adding a NOT NULL constraint on a column that already has a UNIQUE constraint is not supported. (DEV-26158)

BigQuery Writer

If you are using the legacy streaming API to write to template tables, using the default setting of Process may cause the application to halt due to a limitation in BigQuery that does not allow writing for up to 90 minutes after a DDL change (see BigQuery > Documentation > Guides > Use the legacy streaming API > Creating tables automatically using template tables > Changing the template table schema). In this case, supporting schema evolution is impossible, so set CDDL Action to Ignore. This is not an issue if you are using partitioned tables.

MSJet
  • DDL changes made using SQL Server Management Studio are not captured. (DEV-37099)

  • If a table is dropped and a table of the same name is created, the application may terminate. (DEV-26417)

  • The application will terminate if the database contains tables with names that vary only by case, for example, id and ID, even if those tables are not among those read by MSJet. (DEV-26872)

MySQL Reader
  • Adding a UNQUE constraint syntax with a system generated constraint name (for example, ADD CONSTRAINT cs_01ec19f5f75caa91a1160eca1 UNIQUE (created_att) is not supported. (DEV-26678)

OJet
  • Columns of type ROWID are not supported.

  • Invisible, virtual, and unused columns are not supported.

Oracle Reader
  • Columns of type INTERVAL DAY(x) TO SECOND(y) are not supported. (DEV-24624).

  • Invisible, virtual, and unused columns are not supported.

PostgreSQL Reader
  • To capture DDL changes when the command has more than 1024 characters (for example, a CREATE TABLE statement with many columns), you must raise PostgreSQL's track_activity_query_size parameter from its default value of 1024. (DEV-24650)

  • Creating a table with a column of type serial or adding a column of type serial is not supported.

Sample WAEvents for DDL operations when schema evolution is enabled

DDL command

example

resulting WAEvent

CREATE TABLE

CREATE TABLE PRODUCT.CUSTOMER
(
    c_custkey     BIGINT not null,
    c_name        VARCHAR(25) not null,
    c_address     VARCHAR(40) not null,
    c_nationkey   INTEGER not null,
    c_phone       CHAR(15) not null,
    c_acctbal     DOUBLE PRECISION,,
    c_mktsegment  CHAR(10) not null
);
WAEvent{
data: ["CREATE TABLE PRODUCT.CUSTOMER  …”]
metadata:{
"OperationName": "Create",
"TableName": "PRODUCT.CUSTOMER",
"SchemaName": "PRODUCT",
"OperationType": "DDL",
"CDDLMetadata": “<Info about DDL>”
}
};

ALTER TABLE ADD COLUMN

ALTER TABLE PRODUCT.CUSTOMER
  ADD  c_comment VARCHAR(117) not null;
WAEvent{
data: ["ALTER TABLE PRODUCT.CUSTOMER
  ADD  c_comment VARCHAR(117) not null;"]
metadata:{
"OperationName": "AlterColumns",
"OperationSubName": "AddColumn",
"TableName": "PRODUCT.CUSTOMER",
"SchemaName": "PRODUCT",
"OperationType": "DDL",
"CDDLMetadata": “<Info about DDL>”
}
};

ALTER TABLE MODIFY COLUMN

ALTER TABLE PRODUCT.CUSTOMER
ALTER COLUMN c_address TYPE VARCHAR(200);
WAEvent{
data: ["ALTER TABLE PRODUCT.CUSTOMER
ALTER COLUMN c_address TYPE VARCHAR(200);"]
metadata:{
"OperationName": "AlterColumns",
"OperationSubName": "AlterColumn",
"TableName": " PRODUCT.CUSTOMER",
"SchemaName": "PRODUCT",
"OperationType": "DDL",
"CDDLMetadata": “<Info about DDL>”
}
};

ALTER TABLE DROP COLUMN

ALTER TABLE PRODUCT.CUSTOMER 
DROP COLUMN c_acctbal;
WAEvent{
data: ["ALTER TABLE PRODUCT.CUSTOMER 
DROP COLUMN c_acctbal;"]
metadata:{
"OperationName": "AlterColumns",
"OperationSubName": "DropColumn",
"TableName": "PRODUCT.CUSTOMER",
"SchemaName": "PRODUCT",
"OperationType": "DDL",
"CDDLMetadata": “<Info about DDL>”
}
};

DROP TABLE

Drop Table PRODUCT.CUSTOMER;
WAEvent{
data: ["DROP TABLE PRODUCT.CUSTOMER"]
metadata:{
"OperationName": "Drop",
"TableName": "HR.EMP",     
"SchemaName": "HR",
"OperationType": "DDL",
"CDDLMetadata": “<Info about DDL>”
}
};