Skip to main content

Snowflake Reader runtime considerations

Change data is available only for the duration of the Time Travel retention period (default is 1 day). If an application requests change data that is beyond the retention period, it will halt. See Guides > Business Continuity & Data Recovery > Time Travel > Understanding & using Time Travel.

If Change Tracking is already enabled for a table but the Start Timestamp is pointing to a time before change tracking was enabled, then the behavior will be undefined. The application might halt or might result in redundant records being sent by the source. The behavior is not guaranteed and will be considered as a wrong configuration since if data from the past is supposed to be read, it is always recommended to provide a valid start timestamp that is pointing to a time that is after the time at which change tracking is enabled for the table and within the data retention period.

Performance considerations

The SnowflakeConfiguration property group has several properties that directly influence the performance of the Snowflake Reader and the load it places on Snowflake.

ThreadPoolSize property

 ThreadPoolSize specifies the number of queries (threads) that will be executed in parallel by SnowflakeReader.

  • ThreadPoolSize < Table Count – the average read latency will be relatively higher, the overall load on Snowflake will be lower.

  • ThreadPoolSize = Table Count – Snowflake reader achieves the optimal latency. This configuration can overload source Snowflake when the number of tables I high leading to performance bottlenecks.

  • ThreadPoolSize > Table Count – not recommended, and may result in unused resources in Snowflake Reader.

You can keep track of the Average Query Delay metric to determine whether modifications are necessary for this property. This metric reports the time for Poll Requests in the SnowflakeReader queue before execution after surpassing the scheduled time.

EnableQuickCatchUp property

When the reader is starting from the past, it tries to catch up to the current time by making frequent calls to the Snowflake endpoint. This behavior can have a high cost impact on Snowflake. If this property is set to true, the Snowflake endpoint is less frequently called, thereby reducing the cost, with increased performance and lower catch up time.

YieldAfter property

This property is beneficial when the ThreadPoolSize is less than the number of tables. In situations where some tables experience heavy traffic (one example is when a table is in a catchup phase while polling data from the past), tables with less traffic may not get a chance to process the polled results, and may contribute to a higher read lag. You can address this issue by limiting each table's event processing to the count specified in YieldAfter. Once a table reaches its YieldAfter limit, the resources are yielded to the next table waiting in the queue.

Metrics related to the YieldAfter property

When you configure YieldAfter, you can monitor some additional reader and table-level metrics.

You can see the following YieldAfter metrics for the reader:

  • Average Yield Interval in ms: represents the wait time between the execution of different yielded batches from a single poll result. This interval is calculated from the overall average of the average yield intervals across multiple tables.

    Note

    A high Average Yield Interval may suggest that the Snowflake Reader is unable to process the data efficiently with the current resources. An increase in the Average Yield Interval shows that even though the poll results for a table are processed according to the yield settings, the time taken to switch between tables remains high. To address this, consider increasing the ThreadPoolSize.

  • Average Yield Count: the number of times the single poll result has been yielded to process the polled events completely. The reader-level average yield count is the average number of times a single poll result was yielded across all tables.

    Note

    A high Average Yield Count can indicate a high inflow of data or a very low value for the YieldAfter property. Inspect the data inflow at the table-level and if it is found to be very high consider increasing the ThreadPoolSize. If the data inflow is low and the Average Yield Interval is also low, consider increasing the value of YieldAfter.

You can see the following YieldAfter metrics per table:

  • Average Yield Interval in ms: represents the wait time between the execution of different yielded batches from a single poll result. The per-table interval is calculated from the average yield intervals across multiple poll results.

  • Average Yield Count: the number of times the single poll result has been yielded. The per-table count is the average yield count calculated across multiple poll results.