Data warehouse monitoring metrics
The following monitoring metrics are returned for Azure Synapse Writer, BigQuery Writer, Databricks Writer, Fabric Data Warehouse Writer, Iceberg Writer, and Snowflake Writer.
Metric name | Measure | Scope | Updated | Description |
---|---|---|---|---|
Connection Retry Information | Total number of reconnects and the last known reconnect time | Since the application was started | | The total number of reconnects and the time of the last known reconnect. Visible only if the adapter has made a retry attempt. |
Discarded Event Count | Total number of events discarded | Since the application was started | After each event/batch execution | The total number of events discarded across all tables. |
Queued Batches Size In Bytes | Size of all queued batches | Since the application was started | | The size of all queued batches, in bytes. For BigQuery and Snowflake only: calculated and populated only when the application is configured in streaming mode. |
Table Write Info | JSONArray listing DWH-specific metrics | Since the application was started | After each batch execution | A list of sub-metrics showing granular table-level write information. See the sub-metrics table below. |
Target Freshness | Time since a batch was executed | Since the application was started | After each batch execution | The time elapsed since the last batch was executed, across all tables. |
Write Timestamp | Time when a batch was executed | Since the application was started | After each batch execution | The time the last batch was executed. |
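For orientation, the sketch below shows roughly what a returned metrics map might contain. The metric names come from the table above, but the payload shape and all values are hypothetical; the exact structure depends on how you retrieve metrics in your deployment.

```python
# Illustrative only: metric names match the table above, but the payload
# shape and every value here are hypothetical.
app_metrics = {
    "Connection Retry Information": "reconnects=2, last reconnect at 2024-01-15T10:42:07Z",
    "Discarded Event Count": 14,
    "Queued Batches Size In Bytes": 1_048_576,  # BigQuery/Snowflake streaming mode only
    "Table Write Info": [...],                  # per-table sub-metrics; see the next table
    "Target Freshness": "00:02:31",             # time since the last batch was executed
    "Write Timestamp": "2024-01-15T10:40:00Z",  # when the last batch was executed
}
```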
Table Write Info sub-metrics
Name | Measure | Scope | Updated | Description |
---|---|---|---|---|
Avg Batch Size in bytes | Average size of a batch | Since the application was started | After each batch execution | The average size of a batch, in bytes. |
Avg Compaction Time in ms | Average compaction time | Since the application was started | After each batch execution | The average compaction time across all batches. |
Avg Event Count Per Batch | Average number of events in a batch | Since the application was started | After each batch execution | The average number of events in a batch. |
Avg Integration Time in ms | Average overall time for a batch to reach the target table | Since the application was started | After each batch execution | The average time taken to move a processed batch to the target table, including all sub-processes such as upload, compaction, and merge. |
Avg Merge Time in ms | Average merge time | Since the application was started | After each batch execution | The average merge time across all batches. |
Avg Micro Batch Count Per Batch | Average number of micro-batches | Since the application was started | After each batch execution | The average number of micro-batches in a batch. Visible and populated only when the DWH adapter is used in streaming mode. |
Avg Stage Resources Management Time in ms | Average time taken to clear or create staging resources | Since the application was started | After each batch execution | The average time taken to clear or create staging resources such as the stage table and stage area. |
Avg Upload Time in ms | Average upload time | Since the application was started | After each batch execution | The average upload time across all batches. |
Avg Waiting Time in Queue in ms | Average time a batch waits in the queue | Since the application was started | After each batch execution | The average time batches spend waiting in the queue. |
Last Applied DDL Statement | Latest DDL statement executed on the target table | Most recent object | After each DDL batch | The last executed DDL statement for a specific table. Visible only after a DDL execution has occurred. |
Last Applied DDL Time | Time the latest DDL batch was executed | Most recent object | After each DDL batch | The time of the last executed DDL for a specific table. Visible only after a DDL execution has occurred. |
Last batch info | Number of updates, inserts, deletes, and other batch-specific metrics | Most recent object | After each batch execution | Granular metrics from the last executed batch, including the number of updates, inserts, deletes, and other batch-specific details. |
Last successful merge time | Time the latest batch was executed | Most recent object | After each batch execution | The time of the last executed task for a specific table. Despite the name, it is updated in both Append Only and Merge modes: the batch execution end time is recorded as the last successful merge time for each batch. |
Mapped Source Table | Source table name | Most recent object | After each batch execution | The source table to which the target table is mapped. |
Max Integration Time in ms | Maximum time taken for a batch to reach the target table | Since the application was started | After each batch execution | The maximum time taken for a processed batch to reach the target table. |
Min Integration Time in ms | Minimum time taken for a batch to reach the target table | Since the application was started | After each batch execution | The minimum time taken for a processed batch to reach the target table. |
Total Batches Created | Total batches created | Since the application was started | After each batch execution | The total number of tasks created for a specific table. |
Total Batches Queued | Total batches queued | Since the application was started | After each batch execution | The total number of tasks currently in the queue for a specific table. |
Total Batches Uploaded | Total batches uploaded | Since the application was started | After each batch execution | The total number of tasks successfully executed for a specific table. |
Total event info | Number of DDLs, inserts, deletes, PK updates, and total events merged | Since the application was started | After each batch execution | The overall event counts for a specific table. |
Total Micro Batches Created | Total micro-batches created | Since the application was started | After each micro-batch execution | The total number of micro-batches created for a specific table. |
Total Micro Batches Uploaded | Total micro-batches uploaded | Since the application was started | After each micro-batch execution | The total number of micro-batches successfully executed for a specific table. |
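Putting the sub-metrics together, a single Table Write Info entry might look like the following sketch. The key names follow the sub-metric names above, but the nesting and all values are hypothetical.

```python
# Hypothetical example of one Table Write Info entry. Key names follow the
# sub-metric table above; the structure and values are illustrative only.
table_write_info_entry = {
    "Mapped Source Table": "mydb.public.orders",
    "Avg Batch Size in bytes": 524_288,
    "Avg Event Count Per Batch": 1_200,
    "Avg Integration Time in ms": 950,    # upload + compaction + merge, end to end
    "Max Integration Time in ms": 8_400,
    "Min Integration Time in ms": 310,
    "Avg Waiting Time in Queue in ms": 120,
    "Total Batches Created": 400,
    "Total Batches Queued": 3,
    "Total Batches Uploaded": 397,
    "Last successful merge time": "2024-01-15T10:40:00Z",
    "Last batch info": {"inserts": 1100, "updates": 80, "deletes": 20},
}
```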
Troubleshooting using data warehouse monitoring metrics
Poor Target Freshness:
Issue: If the target freshness is high (e.g., several hours or more), it indicates that the target adapter has not processed any batches for a significant amount of time.
Cause: This may happen if the adapter is facing connectivity issues, the system is overloaded, or there are configuration problems that prevent it from processing batches.
Suggestion:
Check the adapter logs for any connection errors or retry attempts.
Ensure that the adapter has the correct permissions and access to the target system.
Check system resources such as memory and CPU usage to make sure the adapter is not being starved of resources.
Review the configuration settings and ensure the adapter is set to process batches at the required frequency. A minimal automated freshness check is sketched below.
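If you export these metrics to your own tooling, a simple freshness alarm can catch a stalled adapter early. The Python sketch below is a minimal illustration; read_target_freshness_seconds() is a hypothetical stand-in for however your monitoring pipeline exposes the Target Freshness value, and the four-hour threshold is arbitrary.

```python
import time

FRESHNESS_LIMIT_SECONDS = 4 * 3600  # arbitrary: alert after 4 hours without a batch

def read_target_freshness_seconds() -> float:
    # Hypothetical hook: replace with however your monitoring pipeline
    # exposes the Target Freshness metric. Stubbed here for illustration.
    return 0.0

while True:
    if read_target_freshness_seconds() > FRESHNESS_LIMIT_SECONDS:
        # A stale target usually points to connectivity, permission, or
        # resource problems; start with the adapter logs.
        print("ALERT: target freshness limit exceeded; check the adapter")
    time.sleep(300)  # poll every 5 minutes
```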
High Number of Total Batches Queued / High Average Waiting Time in Queue:
Issue: A high number of total batches queued or a high average waiting time in queue means that events for the target table are arriving faster than the target adapter can process them, causing batches to accumulate in the queue and consume memory unnecessarily.
Cause: The incoming event rate is higher than the batch processing rate of the target adapter, leading to excessive queuing and potential delays.
Suggestion:
For optimal performance, adjust the batch policy so that the rate at which events are received aligns more closely with the processing rate of the target adapter.
Review the batch size, batch timeout, and other relevant policies to ensure they are configured for efficient throughput.
Monitor the adapter’s performance regularly to ensure that batch queuing remains under control and memory is not held unnecessarily. One way to track queue growth is sketched below.
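To see whether the queue is actually growing, you can sample Total Batches Created and Total Batches Uploaded from the sub-metrics table at two points in time and compare the deltas. The sketch below is illustrative; the sample numbers are invented.

```python
def queue_growth(created_then, uploaded_then, created_now, uploaded_now):
    # Compare arrival rate to completion rate between two samples of the
    # Total Batches Created / Total Batches Uploaded counters.
    arrived = created_now - created_then
    completed = uploaded_now - uploaded_then
    return arrived, completed, arrived - completed

# Hypothetical counter samples taken 60 seconds apart:
arrived, completed, backlog_delta = queue_growth(400, 397, 460, 430)
if backlog_delta > 0:
    # Events are arriving faster than the adapter drains them; revisit the
    # batch policy (size, timeout) before memory pressure builds up.
    print(f"Queue grew by {backlog_delta} batches ({arrived} in, {completed} out)")
```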
High Difference between Max Integration Time and Average Integration Time:
Issue: A significant difference between the maximum integration time and the average integration time indicates that one or more batches took much longer to process than usual, typically because of connectivity issues or other delays.
Cause: Connectivity issues, system overload, or other external factors can cause one batch to take longer, resulting in a queue buildup that slows down the processing of subsequent batches.
Suggestion:
Investigate the batch that took longer to process by reviewing the logs and checking for connectivity issues or system resource bottlenecks.
Optimize resource allocation to prevent such delays in the future.
Consider adjusting the retry and timeout settings for the adapter to handle delayed batches more efficiently. A simple outlier check is sketched below.
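A quick heuristic for spotting stragglers is to compare Max Integration Time against Avg Integration Time for each table. In the sketch below, the dictionary keys follow the sub-metric names above, and the 5x ratio threshold is an arbitrary illustration, not a documented recommendation.

```python
def flag_stragglers(table_write_info, ratio_threshold=5.0):
    # Yield tables whose worst batch took far longer than the average batch.
    # Key names follow the sub-metric table; the threshold is arbitrary.
    for entry in table_write_info:
        avg_ms = entry.get("Avg Integration Time in ms", 0)
        max_ms = entry.get("Max Integration Time in ms", 0)
        if avg_ms and max_ms / avg_ms > ratio_threshold:
            yield entry.get("Mapped Source Table", "<unknown>"), avg_ms, max_ms

# Hypothetical data: one table had an 8.4 s outlier against a 0.95 s average.
sample = [{"Mapped Source Table": "orders",
           "Avg Integration Time in ms": 950,
           "Max Integration Time in ms": 8400}]
for table, avg_ms, max_ms in flag_stragglers(sample):
    print(f"{table}: max {max_ms} ms vs avg {avg_ms} ms; check logs for that batch")
```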
High Discarded Event Count:
Issue: A high number of discarded events indicates that many events are not being processed due to missing table mappings or other issues.
Cause: This can happen if the events don’t have suitable mappings to a target table, or if a batch is ignored due to configuration or exception handling.
Suggestion:
Review the mapping configurations to ensure that all events have corresponding target tables.
Investigate whether the discarded events contain any critical data that should be reprocessed or mapped.
Slow Average Integration Time:
Issue: A high average integration time means that batches are generally taking longer than expected to be processed by the adapter.
Cause: This could be due to inefficient resource usage, connectivity issues, or an overly aggressive batch policy that leads to larger batch sizes or high batch timeout values.
Suggestion:
Monitor resource utilization (memory, CPU) to identify potential bottlenecks.
Check network performance and connectivity between the source and target systems.
Tune batch size, batch timeout, and other processing parameters to reduce integration time; a rough throughput estimate is sketched after this list.
Partition the target tables.
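When tuning the batch policy, a back-of-the-envelope estimate can show whether the adapter can keep up with the incoming event rate. The sketch below combines Avg Event Count Per Batch and Avg Integration Time from the sub-metrics table; it assumes batches are processed one at a time, and all figures are hypothetical.

```python
# Rough sustained-throughput estimate; substitute your own measurements.
# Assumes batches are integrated serially, which overstates queuing risk
# if your deployment pipelines uploads and merges.
avg_events_per_batch = 1_200
avg_integration_time_ms = 950

events_per_second = avg_events_per_batch / (avg_integration_time_ms / 1000)
print(f"Sustained capacity: ~{events_per_second:,.0f} events/s")

incoming_rate = 1_500  # hypothetical observed incoming event rate
if incoming_rate > events_per_second:
    # Larger batches, a shorter batch timeout, more resources, or partitioned
    # target tables can all raise effective throughput.
    print("Incoming rate exceeds estimated capacity; tune the batch policy")
```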