Skip to main content

Snowflake Reader operational considerations

Change data is available only for the duration of the Time Travel retention period (default is 1 day). If an application requests change data that is beyond the retention period, it will halt. See Guides > Business Continuity & Data Recovery > Time Travel > Understanding & using Time Travel.

If Change Tracking is already enabled for a table but the Start Timestamp is pointing to a time before change tracking was enabled, then the behavior will be undefined. The application might halt or might result in redundant records being sent by the source. The behavior is not guaranteed and will be considered as a wrong configuration since if data from the past is supposed to be read, it is always recommended to provide a valid start timestamp that is pointing to a time that is after the time at which change tracking is enabled for the table and within the data retention period.

Performance considerations

The SnowflakeConfiguration property group has several properties that directly influence the performance of the Snowflake Reader and the load it places on Snowflake.

ThreadPoolSize property

 ThreadPoolSize specifies the number of queries (threads) that will be executed in parallel by SnowflakeReader.

  • ThreadPoolSize < Table Count – the average read latency will be relatively higher, the overall load on Snowflake will be lower.

  • ThreadPoolSize = Table Count – Snowflake reader achieves the optimal latency. This configuration can overload source Snowflake when the number of tables I high leading to performance bottlenecks.

  • ThreadPoolSize > Table Count – not recommended, and may result in unused resources in Snowflake Reader.

EnableQuickCatchUp property

When the reader is starting from the past, it tries to catch up to the current time by making frequent calls to the Snowflake endpoint. This behavior can have a high cost impact on Snowflake. If this property is set to true, the Snowflake endpoint is less frequently called, thereby reducing the cost, with increased performance and lower catch up time.

YieldAfter property

This property is beneficial when the ThreadPoolSize is less than the number of tables. In situations where some tables experience heavy traffic (one example is when a table is in a catchup phase while polling data from the past), tables with less traffic may not get a chance to process the polled results, and may contribute to a higher read lag. You can address this issue by limiting each table's event processing to the count specified in YieldAfter. Once a table reaches its YieldAfter limit, the resources are yielded to the next table waiting in the queue.

Snowflake Reader monitoring metrics

Table 4. Reader information

Name

Submetric name

Measure

Scope

Updated

Description

Number of Tables being Monitored

Count of Tables being monitored (int)

Since the application started

-

Total number of tables that are being monitored

Last Successful Query Detail

Table Name

Table Name (String)

Most recent query executed by the Reader across tables

Whenever a query is successfully executed on Snowflake

This metric captures the table name of the most recent successful query to Snowflake

Query Time

Time at which the last successful query was executed by the Reader(Instant)

This metric represents the time at which the most recent successful query to Snowflake was made, in the same timezone as the Striim Server

Query Count

Polled Query Count

Count of polls successfully executed(long)

Since the application started

Whenever a query is successfully executed on Snowflake

The cumulative count of queries successfully executed by SnowflakeReader on Snowflake across all tables since the start of the process.

Repeated Polled Query Count

Count of polls that are reexecuted(long)

When a query associated with a particular time frame is re-executed due to restart / retry

The cumulative count of queries repeated by SnowflakeReader due to retry since the start of the process.

Empty Result Query Count

Count of empty result polls(long)

Whenever a poll result is found to be empty

The cumulative count of queries that yielded empty results across all tables since the inception of the process.

Repeated Empty Result Polled Query Count

Count of polls that are reexecuted and produced empty results(long)

When a query associated with a particular time frame is re-executed due to restart / retry and its result is found to be empty

The cumulative count of queries repeated by SnowflakeReader due to retry and yielded empty results since the start of the process.

Processed Query Count

Count of queries whose results are processed(long)

Upon completing the processing of a new query result

The cumulative count of query results processed by SnowflakeReader but not yet fully published to the Platform since its inception.

Published Query Count

Count of queries whose results are fully published (long)

When all records from a new query have been successfully published downstream.

The cumulative count of queries processed and published by SnowflakeReader against Snowflake across all tables since its inception.

Average Delay in ms

Query Delay

Average wait time for poll requests after surpassing the scheduled time in milliseconds (long)

Whenever a query is successfully executed on Snowflake

The average wait time for Poll Requests in the SnowflakeReader queue before execution after surpassing the scheduled time.

Processing Delay

Average wait time for poll results to start processing in milliseconds (long)

Upon completing the processing of a new query result

The average duration that query results wait in the Processor queue before processing.

Average Latency in ms

Query Latency

Average time taken to execute the queries in milliseconds (long)

Whenever a query is successfully executed on Snowflake

This metric computes the average duration for a query to execute across various tables in SnowflakeReader.

Processing Latency

Average time taken to process the queries in milliseconds (long)

Upon completing the processing of a new query result

The average duration for SnowflakeReader to handle query results.

Yield Information

Average Yield Interval in ms

Average wait time between the execution of different yielded batches in milliseconds (long)

Upon completing the processing of a new query result

The yield interval represents the wait time between the execution of different yielded batches from a single poll result. The reader-level average yield interval is the overall average of these average yield intervals calculated across multiple tables. This metric will be conditionally visible only if yield is configured.

Average Yield Count

Average number of times a single poll result was yielded (long)

Yield Count will be the number of times the single poll result has been yielded to process the polled events completely. The reader-level average yield count is the average number of times a single poll result was yielded across all tables. This metric will be conditionally visible only if yield is configured.

Average Read Lag in ms

Average time taken to prepare and publish events after retrieval in milliseconds (long)

Whenever an event is published downstream

This metric measures the average duration for SnowflakeReader to publish a change record to the output stream across tables, starting from the moment it was retrieved by the query.

Events Published

Count of events published (long)

Whenever an event is published downstream

The total number of events published downstream by SnowflakeReader since it started. This includes the number of insert, update and deletes per table.

Reconnect Information

Reconnect Attempt Count

Number of reconnect attempts (long)

Whenever a reconnect happens

Number of times the reconnect attempt was made.

Last Reconnect Attempt Time

Time at which last reconnect attempt was made (Instant)

Time at which the last reconnect attempt was made.



Table 5. Table information

Name

Submetric name

Submetric name

Measure

Scope

Updated

Description

Last Successful Query Time

Time at which last successfully query executed (Instant)

Most recent query executed against the specific table by the Reader

Whenever a query is successfully executed against the specific table in Snowflake

The timestamp of the most recent successful query to this table, similar to the corresponding reader-level metric.

Query Count

Polled Query Count

Count of polls successfully executed (long)

Since the application started

Whenever a query is successfully executed against the specific table in Snowflake

The number of queries executed by SnowflakeReader against a particular table since its inception.

Repeated Polled Query Count

Count of polls that are re-executed (long)

When a query associated with a particular time frame for the specific table is re-executed due to restart / retry

The cumulative count of queries repeated by SnowflakeReader due to retry since the start of the process.

Empty Result Query Count

Count of empty result polls (long)

Whenever a poll result for the specific table is found to be empty

The number of queries that have returned empty results for a particular table since its inception.

Repeated Empty Result Polled Query Count

Count of polls that are re-executed and produced empty results (long)

When a query associated with a particular time frame for the specific table is re-executed due to restart / retry and its result is found to be empty

The cumulative count of queries repeated by SnowflakeReader due to retry and yielded empty results since the start of the process.

Processed Query Count

Count of queries whose results are procesed (long)

Upon completing the processing of a new query result for the specific table

The total count of query results processed for this table by SnowflakeReader but not yet fully published to the downstream component.

Published Query Count

Count of queries whose results are fully published (long)

Whenever all the records associated with a new query for the specific table is published downstream

The overall number of queries executed by SnowflakeReader against this table since its initiation.

Average Delay in ms

Query Delay

Average wait time for poll requests after surpassing the scheduled time in milliseconds (long)

Whenever a query is successfully executed against the specific table in Snowflake

Average duration that the Poll Requests spend waiting in the SnowflakeReader queue before execution.

Processing Delay

Average wait time for poll results to start processing in milliseconds (long)

Upon completing the processing of a new query result for the specific table

Average duration that query results for this table spend waiting in the SnowflakeReader queue before they are processed

Average Latency in ms

Query Latency

Average time taken to execute the queries in milliseconds (long)

Whenever a query is successfully executed against the specific table in Snowflake

Average time it takes for a query to be executed against this table in SnowflakeReader.

Processing Latency

Average time taken to process the queries in milliseconds (long)

Upon completing the processing of a new query result for the specific table

Average amount of time it takes SnowflakeReader to process the results of a query to this table

Yield Information

Average Yield Interval in ms

Average wait time between the execution of different yielded batches in milliseconds (long)

Upon completing the processing of a new query result for the specific table

The yield interval represents the wait time between the execution of different yielded batches from a single poll result. The table-level average yield interval is the average yield intervals calculated across multiple poll results. This metric will be conditionally visible only if yield is configured.

Average Yield Count

Average number of times a single poll result was yielded (long)

Yield Count will be the number of times the single poll result has been yielded. The table-level average yield count is the average yield count calculated across multiple poll results. This metric will be conditionally visible only if yield is configured.

Average Read Lag in ms

Average time taken to prepare and publish events after retrieval in milliseconds (long)

Whenever an event for the specific table is published downstream

This metric measures the average duration for SnowflakeReader to publish a change record to the output stream for this table, starting from the moment it was retrieved by the query.

Events Published

Count of events published (long)

Whenever an event for the specific table is published downstream

The total number of events published to the output stream for this table since SnowflakeReader started. This includes the number of insert, update, deletes per table.

No of Inserts

Count of INSERT eventspublished (long)

Whenever an INSERT event for the specific table is published downstream

Total number of INSERTs published

No of Updates

Count of UPDATE events published (long)

Whenever a UPDATE event for the specific table is published downstream

Total number of UPDATEs published

No of Deletes

Count of DELETE events published (long)

Whenever a DELETE event for the specific table is published downstream

Total number of DELETEs published

No of DDLs

Count of DDL events published (long)

Whenever a DDL event for the specific table is published downstream

Total number of DDLs published. This metric will be conditionally visible only if CDDLCapture is enabled

DDL Information

Ignored DDL Count

Count of DDL events IGNORED (long)

Since the application started

Whenever a DDL event for the specific table is IGNORED

Number of DDLs Ignored when CDDLAction is set to Ignore. This metric will be conditionally visible only if CDDLCapture is enabled

Last Observed DDL Info

Last Observed DDL

Latest DDL Observed (String)

Most recent DDL change observed on the specific table

Whenever a DDL change is observed on the specific table

Last DDL change that is identified by the Snowflake Reader for this specific table. This metric will be conditionally visible only if CDDLCapture is enabled

Last Observed DDL Time

Time at which latest DDL change observed (Instant)

Time at which the last DDL change for the specific table is identified. This metric will be conditionally visible only if CDDLCapture is enabled

Last Published DDL Info

Last Published DDL

Latest DDL published (String)

Most recent DDL event published for the specific table

Whenever a DDL event for the specific table is published downstream

Last DDL change that is published by the Snowflake Reader for the specific table. This metric will be conditionally visible only if CDDLCapture is enabled

Last Published DDL Time

Time at which latest DDL event published (Instant)

Time at which the last DDL change for the specific table is published to the downstream component. This metric will be conditionally visible only if CDDLCapture is enabled

Latest Position Information

Read Position

Position representing the latest read (String)

Most recent query executed against the specific table

Whenever a query is successfully executed against the specific table in Snowflake

This metric indicates the event position successfully queried by the SnowflakeReader for change data. It enables monitoring the progress of the SnowflakeReader Poller Thread.

Processed Position

Position of latest processed event (String)

Most recent event processed for the specific table

Whenever a new event for the specific table is processed successfully

This metric denotes the last event position2 successfully processed by the SnowflakeReader. It provides a means to monitor the progress of the SnowflakeReader Processor Thread.

Published Position

Position of latest published event (String)

Most recent event published for the specific table

Whenever a new event for the specific table is published sucessfully

This metric represents the latest event position successfully published by the SnowflakeReader main thread downstream. It offers a way to monitor the overall progress of the SnowflakeReader.