Snowflake Reader operational considerations
Change data is available only for the duration of the Time Travel retention period (default is 1 day). If an application requests change data that is beyond the retention period, it will halt. See Guides > Business Continuity & Data Recovery > Time Travel > Understanding & using Time Travel.
If Change Tracking is already enabled for a table but the Start Timestamp is pointing to a time before change tracking was enabled, then the behavior will be undefined. The application might halt or might result in redundant records being sent by the source. The behavior is not guaranteed and will be considered as a wrong configuration since if data from the past is supposed to be read, it is always recommended to provide a valid start timestamp that is pointing to a time that is after the time at which change tracking is enabled for the table and within the data retention period.
Performance considerations
The SnowflakeConfiguration property group has several properties that directly influence the performance of the Snowflake Reader and the load it places on Snowflake.
ThreadPoolSize property
ThreadPoolSize specifies the number of queries (threads) that will be executed in parallel by SnowflakeReader.
ThreadPoolSize<Table Count– the average read latency will be relatively higher, the overall load on Snowflake will be lower.ThreadPoolSize=Table Count– Snowflake reader achieves the optimal latency. This configuration can overload source Snowflake when the number of tables I high leading to performance bottlenecks.ThreadPoolSize>Table Count– not recommended, and may result in unused resources in Snowflake Reader.
EnableQuickCatchUp property
When the reader is starting from the past, it tries to catch up to the current time by making frequent calls to the Snowflake endpoint. This behavior can have a high cost impact on Snowflake. If this property is set to true, the Snowflake endpoint is less frequently called, thereby reducing the cost, with increased performance and lower catch up time.
YieldAfter property
This property is beneficial when the ThreadPoolSize is less than the number of tables. In situations where some tables experience heavy traffic (one example is when a table is in a catchup phase while polling data from the past), tables with less traffic may not get a chance to process the polled results, and may contribute to a higher read lag. You can address this issue by limiting each table's event processing to the count specified in YieldAfter. Once a table reaches its YieldAfter limit, the resources are yielded to the next table waiting in the queue.
Snowflake Reader monitoring metrics
Name | Submetric name | Measure | Scope | Updated | Description |
Number of Tables being Monitored | Count of Tables being monitored (int) | Since the application started | - | Total number of tables that are being monitored | |
Last Successful Query Detail | Table Name | Table Name (String) | Most recent query executed by the Reader across tables | Whenever a query is successfully executed on Snowflake | This metric captures the table name of the most recent successful query to Snowflake |
Query Time | Time at which the last successful query was executed by the Reader(Instant) | This metric represents the time at which the most recent successful query to Snowflake was made, in the same timezone as the Striim Server | |||
Query Count | Polled Query Count | Count of polls successfully executed(long) | Since the application started | Whenever a query is successfully executed on Snowflake | The cumulative count of queries successfully executed by SnowflakeReader on Snowflake across all tables since the start of the process. |
Repeated Polled Query Count | Count of polls that are reexecuted(long) | When a query associated with a particular time frame is re-executed due to restart / retry | The cumulative count of queries repeated by SnowflakeReader due to retry since the start of the process. | ||
Empty Result Query Count | Count of empty result polls(long) | Whenever a poll result is found to be empty | The cumulative count of queries that yielded empty results across all tables since the inception of the process. | ||
Repeated Empty Result Polled Query Count | Count of polls that are reexecuted and produced empty results(long) | When a query associated with a particular time frame is re-executed due to restart / retry and its result is found to be empty | The cumulative count of queries repeated by SnowflakeReader due to retry and yielded empty results since the start of the process. | ||
Processed Query Count | Count of queries whose results are processed(long) | Upon completing the processing of a new query result | The cumulative count of query results processed by SnowflakeReader but not yet fully published to the Platform since its inception. | ||
Published Query Count | Count of queries whose results are fully published (long) | When all records from a new query have been successfully published downstream. | The cumulative count of queries processed and published by SnowflakeReader against Snowflake across all tables since its inception. | ||
Average Delay in ms | Query Delay | Average wait time for poll requests after surpassing the scheduled time in milliseconds (long) | Whenever a query is successfully executed on Snowflake | The average wait time for Poll Requests in the SnowflakeReader queue before execution after surpassing the scheduled time. | |
Processing Delay | Average wait time for poll results to start processing in milliseconds (long) | Upon completing the processing of a new query result | The average duration that query results wait in the Processor queue before processing. | ||
Average Latency in ms | Query Latency | Average time taken to execute the queries in milliseconds (long) | Whenever a query is successfully executed on Snowflake | This metric computes the average duration for a query to execute across various tables in SnowflakeReader. | |
Processing Latency | Average time taken to process the queries in milliseconds (long) | Upon completing the processing of a new query result | The average duration for SnowflakeReader to handle query results. | ||
Yield Information | Average Yield Interval in ms | Average wait time between the execution of different yielded batches in milliseconds (long) | Upon completing the processing of a new query result | The yield interval represents the wait time between the execution of different yielded batches from a single poll result. The reader-level average yield interval is the overall average of these average yield intervals calculated across multiple tables. This metric will be conditionally visible only if yield is configured. | |
Average Yield Count | Average number of times a single poll result was yielded (long) | Yield Count will be the number of times the single poll result has been yielded to process the polled events completely. The reader-level average yield count is the average number of times a single poll result was yielded across all tables. This metric will be conditionally visible only if yield is configured. | |||
Average Read Lag in ms | Average time taken to prepare and publish events after retrieval in milliseconds (long) | Whenever an event is published downstream | This metric measures the average duration for SnowflakeReader to publish a change record to the output stream across tables, starting from the moment it was retrieved by the query. | ||
Events Published | Count of events published (long) | Whenever an event is published downstream | The total number of events published downstream by SnowflakeReader since it started. This includes the number of insert, update and deletes per table. | ||
Reconnect Information | Reconnect Attempt Count | Number of reconnect attempts (long) | Whenever a reconnect happens | Number of times the reconnect attempt was made. | |
Last Reconnect Attempt Time | Time at which last reconnect attempt was made (Instant) | Time at which the last reconnect attempt was made. |
Name | Submetric name | Submetric name | Measure | Scope | Updated | Description |
Last Successful Query Time | Time at which last successfully query executed (Instant) | Most recent query executed against the specific table by the Reader | Whenever a query is successfully executed against the specific table in Snowflake | The timestamp of the most recent successful query to this table, similar to the corresponding reader-level metric. | ||
Query Count | Polled Query Count | Count of polls successfully executed (long) | Since the application started | Whenever a query is successfully executed against the specific table in Snowflake | The number of queries executed by SnowflakeReader against a particular table since its inception. | |
Repeated Polled Query Count | Count of polls that are re-executed (long) | When a query associated with a particular time frame for the specific table is re-executed due to restart / retry | The cumulative count of queries repeated by SnowflakeReader due to retry since the start of the process. | |||
Empty Result Query Count | Count of empty result polls (long) | Whenever a poll result for the specific table is found to be empty | The number of queries that have returned empty results for a particular table since its inception. | |||
Repeated Empty Result Polled Query Count | Count of polls that are re-executed and produced empty results (long) | When a query associated with a particular time frame for the specific table is re-executed due to restart / retry and its result is found to be empty | The cumulative count of queries repeated by SnowflakeReader due to retry and yielded empty results since the start of the process. | |||
Processed Query Count | Count of queries whose results are procesed (long) | Upon completing the processing of a new query result for the specific table | The total count of query results processed for this table by SnowflakeReader but not yet fully published to the downstream component. | |||
Published Query Count | Count of queries whose results are fully published (long) | Whenever all the records associated with a new query for the specific table is published downstream | The overall number of queries executed by SnowflakeReader against this table since its initiation. | |||
Average Delay in ms | Query Delay | Average wait time for poll requests after surpassing the scheduled time in milliseconds (long) | Whenever a query is successfully executed against the specific table in Snowflake | Average duration that the Poll Requests spend waiting in the SnowflakeReader queue before execution. | ||
Processing Delay | Average wait time for poll results to start processing in milliseconds (long) | Upon completing the processing of a new query result for the specific table | Average duration that query results for this table spend waiting in the SnowflakeReader queue before they are processed | |||
Average Latency in ms | Query Latency | Average time taken to execute the queries in milliseconds (long) | Whenever a query is successfully executed against the specific table in Snowflake | Average time it takes for a query to be executed against this table in SnowflakeReader. | ||
Processing Latency | Average time taken to process the queries in milliseconds (long) | Upon completing the processing of a new query result for the specific table | Average amount of time it takes SnowflakeReader to process the results of a query to this table | |||
Yield Information | Average Yield Interval in ms | Average wait time between the execution of different yielded batches in milliseconds (long) | Upon completing the processing of a new query result for the specific table | The yield interval represents the wait time between the execution of different yielded batches from a single poll result. The table-level average yield interval is the average yield intervals calculated across multiple poll results. This metric will be conditionally visible only if yield is configured. | ||
Average Yield Count | Average number of times a single poll result was yielded (long) | Yield Count will be the number of times the single poll result has been yielded. The table-level average yield count is the average yield count calculated across multiple poll results. This metric will be conditionally visible only if yield is configured. | ||||
Average Read Lag in ms | Average time taken to prepare and publish events after retrieval in milliseconds (long) | Whenever an event for the specific table is published downstream | This metric measures the average duration for SnowflakeReader to publish a change record to the output stream for this table, starting from the moment it was retrieved by the query. | |||
Events Published | Count of events published (long) | Whenever an event for the specific table is published downstream | The total number of events published to the output stream for this table since SnowflakeReader started. This includes the number of insert, update, deletes per table. | |||
No of Inserts | Count of INSERT eventspublished (long) | Whenever an INSERT event for the specific table is published downstream | Total number of INSERTs published | |||
No of Updates | Count of UPDATE events published (long) | Whenever a UPDATE event for the specific table is published downstream | Total number of UPDATEs published | |||
No of Deletes | Count of DELETE events published (long) | Whenever a DELETE event for the specific table is published downstream | Total number of DELETEs published | |||
No of DDLs | Count of DDL events published (long) | Whenever a DDL event for the specific table is published downstream | Total number of DDLs published. This metric will be conditionally visible only if CDDLCapture is enabled | |||
DDL Information | Ignored DDL Count | Count of DDL events IGNORED (long) | Since the application started | Whenever a DDL event for the specific table is IGNORED | Number of DDLs Ignored when CDDLAction is set to Ignore. This metric will be conditionally visible only if CDDLCapture is enabled | |
Last Observed DDL Info | Last Observed DDL | Latest DDL Observed (String) | Most recent DDL change observed on the specific table | Whenever a DDL change is observed on the specific table | Last DDL change that is identified by the Snowflake Reader for this specific table. This metric will be conditionally visible only if CDDLCapture is enabled | |
Last Observed DDL Time | Time at which latest DDL change observed (Instant) | Time at which the last DDL change for the specific table is identified. This metric will be conditionally visible only if CDDLCapture is enabled | ||||
Last Published DDL Info | Last Published DDL | Latest DDL published (String) | Most recent DDL event published for the specific table | Whenever a DDL event for the specific table is published downstream | Last DDL change that is published by the Snowflake Reader for the specific table. This metric will be conditionally visible only if CDDLCapture is enabled | |
Last Published DDL Time | Time at which latest DDL event published (Instant) | Time at which the last DDL change for the specific table is published to the downstream component. This metric will be conditionally visible only if CDDLCapture is enabled | ||||
Latest Position Information | Read Position | Position representing the latest read (String) | Most recent query executed against the specific table | Whenever a query is successfully executed against the specific table in Snowflake | This metric indicates the event position successfully queried by the SnowflakeReader for change data. It enables monitoring the progress of the SnowflakeReader Poller Thread. | |
Processed Position | Position of latest processed event (String) | Most recent event processed for the specific table | Whenever a new event for the specific table is processed successfully | This metric denotes the last event position2 successfully processed by the SnowflakeReader. It provides a means to monitor the progress of the SnowflakeReader Processor Thread. | ||
Published Position | Position of latest published event (String) | Most recent event published for the specific table | Whenever a new event for the specific table is published sucessfully | This metric represents the latest event position successfully published by the SnowflakeReader main thread downstream. It offers a way to monitor the overall progress of the SnowflakeReader. |