Data warehouse monitoring metrics
The following monitoring metrics are returned for Azure Synapse Writer, BigQuery Writer, Databricks Writer, Fabric Data Warehouse Writer, Iceberg Writer, and Snowflake Writer.
Metric name | Measure | Scope | Updated | Description |
---|---|---|---|---|
Connection Retry Information | Total number of reconnects and the last known reconnect time | Since the application was started | | The total number of reconnects and the time of the last known reconnect. Visible only if the adapter has made a retry attempt. |
Discarded Event Count | Total number of events discarded | Since the application was started | After each event/batch execution | The total number of events discarded across all tables. |
Queued Batches Size In Bytes | Size of all queued batches | Since the application was started | | The size of all queued batches, in bytes. For BigQuery and Snowflake only: calculated and populated only when the application is configured in streaming mode. |
Table Write Info | JSONArray listing DWH-specific metrics | Since the application was started | After each batch execution | A list of sub-metrics showing granular table-level write information. See the sub-metrics table below. |
Target Freshness | Time since a batch was executed | Since the application was started | After each batch execution | The time elapsed since the last batch was executed, across all tables. |
Write Timestamp | Time when a batch was executed | Since the application was started | After each batch execution | The time the last batch was executed. |
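For orientation, the sketch below shows roughly what a returned metrics map might contain. The metric names come from the table above, but the payload shape and all values are hypothetical; the exact structure depends on how you retrieve metrics in your deployment.

```python
# Illustrative only: metric names match the table above, but the payload
# shape and every value here are hypothetical.
app_metrics = {
    "Connection Retry Information": "reconnects=2, last reconnect at 2024-01-15T10:42:07Z",
    "Discarded Event Count": 14,
    "Queued Batches Size In Bytes": 1_048_576,  # BigQuery/Snowflake streaming mode only
    "Table Write Info": [...],                  # per-table sub-metrics; see the next table
    "Target Freshness": "00:02:31",             # time since the last batch was executed
    "Write Timestamp": "2024-01-15T10:40:00Z",  # when the last batch was executed
}
```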
Table Write Info sub-metrics
Name | Measure | Scope | Updated | Description |
---|---|---|---|---|
Avg Batch Size in bytes | Average size of a batch | Since the application was started | After each batch execution | The average size of a batch, in bytes. |
Avg Compaction Time in ms | Average compaction time | Since the application was started | After each batch execution | The average compaction time across all batches. |
Avg Event Count Per Batch | Average number of events in a batch | Since the application was started | After each batch execution | The average number of events in a batch. |
Avg Integration Time in ms | Average overall time for a batch to reach the target table | Since the application was started | After each batch execution | The average time taken to move a processed batch to the target table, including all sub-processes such as upload, compaction, and merge. |
Avg Merge Time in ms | Average merge time | Since the application was started | After each batch execution | The average merge time across all batches. |
Avg Micro Batch Count Per Batch | Average number of micro-batches | Since the application was started | After each batch execution | The average number of micro-batches in a batch. Visible and populated only when the DWH adapter is used in streaming mode. |
Avg Stage Resources Management Time in ms | Average time taken to clear or create staging resources | Since the application was started | After each batch execution | The average time taken to clear or create staging resources such as the stage table and stage area. |
Avg Upload Time in ms | Average upload time | Since the application was started | After each batch execution | The average upload time across all batches. |
Avg Waiting Time in Queue in ms | Average time a batch waits in the queue | Since the application was started | After each batch execution | The average time batches spend waiting in the queue. |
Last Applied DDL Statement | Latest DDL statement executed on the target table | Most recent object | After each DDL batch | The last executed DDL statement for a specific table. Visible only after a DDL execution has occurred. |
Last Applied DDL Time | Time the latest DDL batch was executed | Most recent object | After each DDL batch | The time of the last executed DDL for a specific table. Visible only after a DDL execution has occurred. |
Last batch info | Number of updates, inserts, deletes, and other batch-specific metrics | Most recent object | After each batch execution | Granular metrics from the last executed batch, including the number of updates, inserts, deletes, and other batch-specific details. |
Last successful merge time | Time the latest batch was executed | Most recent object | After each batch execution | The time of the last executed task for a specific table. Despite the name, it is updated in both Append Only and Merge modes: the batch execution end time is recorded as the last successful merge time for each batch. |
Mapped Source Table | Source table name | Most recent object | After each batch execution | The source table to which the target table is mapped. |
Max Integration Time in ms | Maximum time taken for a batch to reach the target table | Since the application was started | After each batch execution | The maximum time taken for a processed batch to reach the target table. |
Min Integration Time in ms | Minimum time taken for a batch to reach the target table | Since the application was started | After each batch execution | The minimum time taken for a processed batch to reach the target table. |
Total Batches Created | Total batches created | Since the application was started | After each batch execution | The total number of tasks created for a specific table. |
Total Batches Queued | Total batches queued | Since the application was started | After each batch execution | The total number of tasks currently in the queue for a specific table. |
Total Batches Uploaded | Total batches uploaded | Since the application was started | After each batch execution | The total number of tasks successfully executed for a specific table. |
Total event info | Number of DDLs, inserts, deletes, PK updates, and total events merged | Since the application was started | After each batch execution | The overall event counts for a specific table. |
Total Micro Batches Created | Total micro-batches created | Since the application was started | After each micro-batch execution | The total number of micro-batches created for a specific table. |
Total Micro Batches Uploaded | Total micro-batches uploaded | Since the application was started | After each micro-batch execution | The total number of micro-batches successfully executed for a specific table. |
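Putting the sub-metrics together, a single Table Write Info entry might look like the following sketch. The key names follow the sub-metric names above, but the nesting and all values are hypothetical.

```python
# Hypothetical example of one Table Write Info entry. Key names follow the
# sub-metric table above; the structure and values are illustrative only.
table_write_info_entry = {
    "Mapped Source Table": "mydb.public.orders",
    "Avg Batch Size in bytes": 524_288,
    "Avg Event Count Per Batch": 1_200,
    "Avg Integration Time in ms": 950,    # upload + compaction + merge, end to end
    "Max Integration Time in ms": 8_400,
    "Min Integration Time in ms": 310,
    "Avg Waiting Time in Queue in ms": 120,
    "Total Batches Created": 400,
    "Total Batches Queued": 3,
    "Total Batches Uploaded": 397,
    "Last successful merge time": "2024-01-15T10:40:00Z",
    "Last batch info": {"inserts": 1100, "updates": 80, "deletes": 20},
}
```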
Troubleshooting using data warehouse monitoring metrics
Poor Target Freshness:
Issue: If the target freshness is high (e.g., several hours or more), it indicates that the target adapter has not processed any batches for a significant amount of time.
Cause: This may happen if the adapter is facing connectivity issues, the system is overloaded, or there are configuration problems that prevent it from processing batches.
Suggestion:
Check the adapter logs for any connection errors or retry attempts.
Ensure that the adapter has the correct permissions and access to the target system.
Check system resources such as memory and CPU usage to make sure the adapter is not being starved of resources.
Review the configuration settings and ensure the adapter is set to process batches at the required frequency. A minimal automated freshness check is sketched below.
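If you export these metrics to your own tooling, a simple freshness alarm can catch a stalled adapter early. The Python sketch below is a minimal illustration; read_target_freshness_seconds() is a hypothetical stand-in for however your monitoring pipeline exposes the Target Freshness value, and the four-hour threshold is arbitrary.

```python
import time

FRESHNESS_LIMIT_SECONDS = 4 * 3600  # arbitrary: alert after 4 hours without a batch

def read_target_freshness_seconds() -> float:
    # Hypothetical hook: replace with however your monitoring pipeline
    # exposes the Target Freshness metric. Stubbed here for illustration.
    return 0.0

while True:
    if read_target_freshness_seconds() > FRESHNESS_LIMIT_SECONDS:
        # A stale target usually points to connectivity, permission, or
        # resource problems; start with the adapter logs.
        print("ALERT: target freshness limit exceeded; check the adapter")
    time.sleep(300)  # poll every 5 minutes
```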
High Number of Total Batches Queued / High Average Waiting Time in Queue:
Issue: A high number of total batches queued or a high average waiting time in queue means that events for the target table are arriving faster than the target adapter can process them, causing batches to accumulate in the queue and consume memory unnecessarily.
Cause: The incoming event rate is higher than the batch processing rate of the target adapter, leading to excessive queuing and potential delays.
Suggestion:
For optimal performance, adjust the batch policy so that the rate at which events are received aligns more closely with the processing rate of the target adapter.
Review the batch size, batch timeout, and other relevant policies to ensure they are configured for efficient throughput.
Monitor the adapter’s performance regularly to ensure that batch queuing remains under control and memory is not held unnecessarily. One way to track queue growth is sketched below.
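To see whether the queue is actually growing, you can sample Total Batches Created and Total Batches Uploaded from the sub-metrics table at two points in time and compare the deltas. The sketch below is illustrative; the sample numbers are invented.

```python
def queue_growth(created_then, uploaded_then, created_now, uploaded_now):
    # Compare arrival rate to completion rate between two samples of the
    # Total Batches Created / Total Batches Uploaded counters.
    arrived = created_now - created_then
    completed = uploaded_now - uploaded_then
    return arrived, completed, arrived - completed

# Hypothetical counter samples taken 60 seconds apart:
arrived, completed, backlog_delta = queue_growth(400, 397, 460, 430)
if backlog_delta > 0:
    # Events are arriving faster than the adapter drains them; revisit the
    # batch policy (size, timeout) before memory pressure builds up.
    print(f"Queue grew by {backlog_delta} batches ({arrived} in, {completed} out)")
```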
High Difference between Max Integration Time and Average Integration Time:
Issue: A significant difference between the maximum integration time and the average integration time indicates that one or more batches took much longer to process than usual, typically because of connectivity issues or other delays.
Cause: Connectivity issues, system overload, or other external factors can cause one batch to take longer, resulting in a queue buildup that slows down the processing of subsequent batches.
Suggestion:
Investigate the batch that took longer to process by reviewing the logs and checking for connectivity issues or system resource bottlenecks.
Optimize resource allocation to prevent such delays in the future.
Consider adjusting the retry and timeout settings for the adapter to handle delayed batches more efficiently. A simple outlier check is sketched below.
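A quick heuristic for spotting stragglers is to compare Max Integration Time against Avg Integration Time for each table. In the sketch below, the dictionary keys follow the sub-metric names above, and the 5x ratio threshold is an arbitrary illustration, not a documented recommendation.

```python
def flag_stragglers(table_write_info, ratio_threshold=5.0):
    # Yield tables whose worst batch took far longer than the average batch.
    # Key names follow the sub-metric table; the threshold is arbitrary.
    for entry in table_write_info:
        avg_ms = entry.get("Avg Integration Time in ms", 0)
        max_ms = entry.get("Max Integration Time in ms", 0)
        if avg_ms and max_ms / avg_ms > ratio_threshold:
            yield entry.get("Mapped Source Table", "<unknown>"), avg_ms, max_ms

# Hypothetical data: one table had an 8.4 s outlier against a 0.95 s average.
sample = [{"Mapped Source Table": "orders",
           "Avg Integration Time in ms": 950,
           "Max Integration Time in ms": 8400}]
for table, avg_ms, max_ms in flag_stragglers(sample):
    print(f"{table}: max {max_ms} ms vs avg {avg_ms} ms; check logs for that batch")
```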
High Discarded Event Count:
Issue: A high number of discarded events indicates that many events are not being processed due to missing table mappings or other issues.
Cause: This can happen if the events don’t have suitable mappings to a target table, or if a batch is ignored due to configuration or exception handling.
Suggestion:
Review the mapping configurations to ensure that all events have corresponding target tables.
Investigate whether the discarded events contain any critical data that should be reprocessed or mapped.
Slow Average Integration Time:
Issue: A high average integration time means that batches are generally taking longer than expected to be processed by the adapter.
Cause: This could be due to inefficient resource usage, connectivity issues, or an overly aggressive batch policy that leads to larger batch sizes or high batch timeout values.
Suggestion:
Monitor resource utilization (memory, CPU) to identify potential bottlenecks.
Check network performance and connectivity between the source and target systems.
Tune batch size, batch timeout, and other processing parameters to reduce integration time; a rough throughput estimate is sketched after this list.
Partition the target tables.
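When tuning the batch policy, a back-of-the-envelope estimate can show whether the adapter can keep up with the incoming event rate. The sketch below combines Avg Event Count Per Batch and Avg Integration Time from the sub-metrics table; it assumes batches are processed one at a time, and all figures are hypothetical.

```python
# Rough sustained-throughput estimate; substitute your own measurements.
# Assumes batches are integrated serially, which overstates queuing risk
# if your deployment pipelines uploads and merges.
avg_events_per_batch = 1_200
avg_integration_time_ms = 950

events_per_second = avg_events_per_batch / (avg_integration_time_ms / 1000)
print(f"Sustained capacity: ~{events_per_second:,.0f} events/s")

incoming_rate = 1_500  # hypothetical observed incoming event rate
if incoming_rate > events_per_second:
    # Larger batches, a shorter batch timeout, more resources, or partitioned
    # target tables can all raise effective throughput.
    print("Incoming rate exceeds estimated capacity; tune the batch policy")
```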