Iceberg Writer monitoring metrics
See Data warehouse monitoring metrics.
When parallel threads are enabled (in append-only mode), multiple instances of IcebergWriter may run concurrently, each triggering its own compute job. The Current Batch Information metric will reflect the execution status of each adapter instance as a separate element in the JSON array, with each entry tagged by its corresponding adapter instance ID.
Name | Measure | Scope | Updated | Description |
---|---|---|---|---|
Current Batch Information | JSON array containing the current batch/task execution details at the adapter instance level | active until the task completion. | After each batch execution | Provides detailed information about the task currently being executed or awaited by the adapter instance. Useful for monitoring task status and identifying long-running or stuck operations on the compute engine |
Current Batch Information sub-metrics:
Each element in the JSON Array represents details of execution specific to one adapter instance.
Name | Measure | Scope | Updated | Description |
---|---|---|---|---|
Adapter Instance Id | Integer uniquely representing an adapter instance. | active until the task completion. | After each batch execution | In single-threaded mode, the default value is 0. When parallel execution is enabled (such as in append-only mode), each thread is assigned a unique ID starting from 0 up to N–1, where N is the total number of configured parallel threads. |
Batch Tasks | JSON Array containing the current batch/task execution details at the adapter instance level | active until the task completion. | After each batch execution | A JSON array containing the current Batch Task execution details at the adapter instance level |
Metafetch Task | JSON Node containing the current metaFetch execution details at the adapter instance level | active until the task completion. | After each batch execution | A JSON node containing the current metaFetch execution details at the adapter instance level |
Metafetch Task sub-metrics
Name | Measure | Scope | Updated | Description |
---|---|---|---|---|
Table Details | JSON Node representing Table Details such as Table Name | active until the task completion. | After each batch execution | A JSON node representing table details, including the table name |
Task Sequence Number | Integer uniquely representing the current task’s number. | active until the task completion. | After each batch execution | A monotonically increasing integer that identifies the execution order of tasks processed by the adapter instance; it increments with each new task submitted and helps track the sequence and lifecycle of tasks at the instance level |
Time Elapsed | Human-readable duration: HH:MM:SS (e.g., 01H:23M:45S) for durations that include hours; mm:ss:SSS (e.g., 03m:45s:123ms) for durations under an hour; ss:SSS (e.g., 12s:456ms) for durations under a minute | active until the task completion. | After each batch execution | Indicates how long the current task has been running, in a human-readable time format |
Batch Tasks sub-metrics
Name | Measure | Scope | Updated | Description |
---|---|---|---|---|
Table Details | JSON Node representing Table Details such as Table Name and Batch Sequence Number | active until the task completion. | After each batch execution | A JSON node representing table details, including the table name and the batch sequence number being processed |
Task Sequence Number | Integer uniquely representing the current task’s number. | active until the task completion. | After each batch execution | A monotonically increasing integer that identifies the execution order of tasks processed by the adapter instance; it increments with each new task submitted and helps track the sequence and lifecycle of tasks at the instance level |
Task Type | String denoting the type of the current task | active until the task completion. | After each batch execution | String denoting the type of the current task: MERGE, APPEND, PKUPDATE, or DDL |
Thread Name | String identifying the internal worker thread | active until the task completion. | After each batch execution | Identifies the internal worker thread within the adapter instance that handles the current batch. |
Time Elapsed | Human-readable duration: HH:MM:SS (e.g., 01H:23M:45S) for durations that include hours; mm:ss:SSS (e.g., 03m:45s:123ms) for durations under an hour; ss:SSS (e.g., 12s:456ms) for durations under a minute | active until the task completion. | After each batch execution | Indicates how long the current task has been running, in a human-readable time format |
Granular Breakdown of the Current Batch Information metric in IcebergWriter
This metric offers a real-time snapshot of the tasks currently being executed by each IcebergWriter adapter instance. It is structured as a JSON array, where each element corresponds to a specific adapter instance and its active task context. Below is a detailed explanation of each sub-metric:
Adapter Instance ID
Description: A numeric identifier that uniquely represents an adapter instance.
Behavior:
Defaults to 0 when parallel execution is not enabled (i.e., single-threaded mode).
When parallel threads are enabled (e.g., in append-only mode), each instance is assigned a unique ID from 0 to N–1, where N is the number of parallel threads configured.
Task Sequence Number
Description: A monotonically increasing integer that identifies the execution order of tasks handled by the adapter instance.
Behavior:
Incremented each time a new task is submitted by the adapter.
Helps in tracking the sequence and lifecycle of tasks per instance.
Table Details
Description: A JSON node containing metadata about the target table involved in the current task.
Contents:
For MetaFetch tasks: includes table name.
For Batch tasks: includes table name and the batch sequence number being processed.
Task Type
Description: A string indicating the type of task currently being executed by the compute engine.
Examples:
APPEND, DDL, MERGE, PKUPDATE, etc.
Utility:
Useful for categorizing the nature of operations and applying filters or alerts based on task type.
Thread Name
Description: Identifies the internal worker thread within the adapter instance that is handling the current task.
Behavior:
Relevant in parallel execution mode.
Helps in debugging concurrency and workload distribution across worker threads.
Time Elapsed
Description: Indicates how long the current task has been running.
Format: Human-readable formats based on duration length:
HH:MM:SS (e.g., 01H:23M:45S) for durations over an hour
mm:ss:SSS (e.g., 03m:45s:123ms) for durations under an hour
ss:SSS (e.g., 12s:456ms) for durations under a minute
Behavior:
Continuously updated while the task is in progress.
Resets once the task completes.
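To compare Time Elapsed against a numeric threshold, the value first has to be converted to a number. The following is a minimal sketch, assuming the value always follows one of the three documented formats, with colon-separated components each carrying a unit suffix (H, M, S, m, s, or ms). The function name parse_time_elapsed is hypothetical and is not part of IcebergWriter.

```python
import re

# Matches one component such as "01H", "23M", "45S", "03m", "45s", or "123ms".
_COMPONENT = re.compile(r"^(\d+)(ms|[HhMmSs])$")

# Seconds per unit. The hour form uses upper-case suffixes, so single-letter
# units are lower-cased before lookup; "ms" is matched first by the regex.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_time_elapsed(value: str) -> float:
    """Convert a Time Elapsed string (hypothetical helper) to seconds."""
    total = 0.0
    for part in value.strip().split(":"):
        match = _COMPONENT.match(part)
        if match is None:
            raise ValueError(f"unrecognized Time Elapsed component: {part!r}")
        number, unit = match.groups()
        total += int(number) * _UNIT_SECONDS[unit.lower()]
    return total


# Usage with the example values from the format list above.
for text in ("01H:23M:45S", "03m:45s:123ms", "12s:456ms"):
    print(text, "->", round(parse_time_elapsed(text), 3), "seconds")
```

Raising ValueError on an unknown component is a deliberate choice here: a silent zero could hide a format change and make a stuck task look healthy.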
Troubleshooting using Iceberg Writer Current Batch Information
The Current Batch Information metric provides visibility into the active task execution state for each IcebergWriter adapter instance. It helps detect issues such as long-running tasks, resource contention, and compute engine failures. Below are common problem scenarios and recommended actions.
1. Hung or Long-Running Tasks
Symptom: A task remains active in Current Batch Information for an unusually long duration. The Time Elapsed value keeps increasing and the task never completes.
Diagnosis:
Check if Time Elapsed exceeds expected thresholds (e.g., >15 minutes).
Review Task Type to see if it’s a heavier operation like MERGE.
Use Thread Name to identify the specific thread executing the task.
Resolution:
Inspect compute engine logs (Dataproc, EMR) for memory or resource issues.
Consider increasing cluster capacity or optimizing the operation.
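The diagnosis steps for this scenario could be automated along the following lines. This is only a sketch: it assumes the metric value is available as a JSON string and that the JSON keys match the sub-metric display names ("Adapter Instance Id", "Batch Tasks", "Task Type", "Thread Name", "Time Elapsed"). The actual key names, the thread name "worker-1", and the table names in the sample are hypothetical, and the 15-minute threshold is just the example value mentioned above. Only Batch Tasks are inspected, not the Metafetch Task.

```python
import json
import re

THRESHOLD_SECONDS = 15 * 60  # example threshold; tune for your workload

_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def elapsed_seconds(value):
    """Convert a Time Elapsed string such as '03m:45s:123ms' to seconds."""
    total = 0.0
    for number, unit in re.findall(r"(\d+)(ms|[HhMmSs])", value):
        total += int(number) * _UNIT_SECONDS[unit.lower()]
    return total


def long_running_tasks(metric_json, threshold=THRESHOLD_SECONDS):
    """Return (instance id, thread name, task type, seconds) for slow tasks."""
    flagged = []
    for instance in json.loads(metric_json):
        for task in instance.get("Batch Tasks", []):
            seconds = elapsed_seconds(task.get("Time Elapsed", ""))
            if seconds > threshold:
                flagged.append((
                    instance.get("Adapter Instance Id"),
                    task.get("Thread Name"),
                    task.get("Task Type"),
                    seconds,
                ))
    return flagged


# Hypothetical sample: key names and values are assumptions for illustration.
sample_metric = json.dumps([
    {"Adapter Instance Id": 0, "Batch Tasks": [
        {"Task Type": "MERGE", "Thread Name": "worker-1",
         "Time Elapsed": "20m:05s:000ms"}]},
    {"Adapter Instance Id": 1, "Batch Tasks": [
        {"Task Type": "APPEND", "Thread Name": "worker-2",
         "Time Elapsed": "12s:456ms"}]},
])

for entry in long_running_tasks(sample_metric):
    print(entry)
```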
2. Adapter Appears Idle While a Task is Running
Symptom: No new output or progress is seen from the adapter, but Current Batch Information shows an active task.
Diagnosis:
Check if Task Sequence Number is progressing. If not, the task might be stuck.
Verify compute job status in the cluster dashboard.
Look into Table Details to identify which table or batch is involved.
Resolution:
Restart the adapter job if the compute task has silently failed, or configure auto-resume to avoid future occurrences.
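One way to check whether Task Sequence Number is progressing is to compare two snapshots of the metric taken some time apart. The sketch below assumes each snapshot has already been parsed into a list of dictionaries whose keys match the sub-metric display names; that layout and the function names are assumptions, not a documented IcebergWriter format.

```python
def max_sequence_by_instance(snapshot):
    """Map each adapter instance ID to its highest Task Sequence Number."""
    result = {}
    for instance in snapshot:
        numbers = [
            task["Task Sequence Number"]
            for task in instance.get("Batch Tasks", [])
        ]
        if numbers:
            result[instance["Adapter Instance Id"]] = max(numbers)
    return result


def stalled_instances(previous, current):
    """Return IDs of instances whose sequence number did not advance."""
    before = max_sequence_by_instance(previous)
    after = max_sequence_by_instance(current)
    return sorted(
        instance_id
        for instance_id, sequence in after.items()
        if before.get(instance_id) == sequence
    )


# Hypothetical snapshots taken a few minutes apart.
earlier = [
    {"Adapter Instance Id": 0, "Batch Tasks": [{"Task Sequence Number": 41}]},
    {"Adapter Instance Id": 1, "Batch Tasks": [{"Task Sequence Number": 17}]},
]
later = [
    {"Adapter Instance Id": 0, "Batch Tasks": [{"Task Sequence Number": 41}]},
    {"Adapter Instance Id": 1, "Batch Tasks": [{"Task Sequence Number": 18}]},
]

print(stalled_instances(earlier, later))
```

An unchanged sequence number only shows that the same task is still active; whether that counts as stuck depends on how long the interval between snapshots is relative to the expected task duration.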
3. Multiple Stuck Threads in Parallel Mode
Symptom: In parallel execution mode, several threads show high Time Elapsed values, with no visible pipeline progress.
Diagnosis:
Review all entries in Current Batch Information for stalled tasks.
Look at Task Type and Thread Name to isolate potential contention or deadlocks.
Check compute cluster metrics for CPU, memory, and thread saturation.
Resolution:
Investigate thread starvation or executor bottlenecks, and increase the compute resources.
Tune down the number of parallel threads if the environment can't handle the load.
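To review all entries at once in parallel mode, stalled tasks can be grouped by Task Type so that contention on one kind of operation stands out. The following is a rough sketch under stated assumptions: the metric has been parsed into a list of dictionaries keyed by the sub-metric display names, the thread names are made up, and the threshold is chosen by the operator. None of these names come from the IcebergWriter API.

```python
import re
from collections import defaultdict

_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def elapsed_seconds(value):
    """Convert a Time Elapsed string such as '01H:23M:45S' to seconds."""
    total = 0.0
    for number, unit in re.findall(r"(\d+)(ms|[HhMmSs])", value):
        total += int(number) * _UNIT_SECONDS[unit.lower()]
    return total


def stalled_by_task_type(entries, threshold_seconds):
    """Group thread names of tasks over the threshold by their Task Type."""
    groups = defaultdict(list)
    for instance in entries:
        for task in instance.get("Batch Tasks", []):
            if elapsed_seconds(task["Time Elapsed"]) > threshold_seconds:
                groups[task["Task Type"]].append(task["Thread Name"])
    return dict(groups)


# Hypothetical parallel-mode snapshot with three adapter instances.
entries = [
    {"Adapter Instance Id": 0, "Batch Tasks": [
        {"Task Type": "APPEND", "Thread Name": "worker-0",
         "Time Elapsed": "01H:02M:10S"}]},
    {"Adapter Instance Id": 1, "Batch Tasks": [
        {"Task Type": "APPEND", "Thread Name": "worker-1",
         "Time Elapsed": "40s:120ms"}]},
    {"Adapter Instance Id": 2, "Batch Tasks": [
        {"Task Type": "APPEND", "Thread Name": "worker-2",
         "Time Elapsed": "48m:30s:000ms"}]},
]

print(stalled_by_task_type(entries, threshold_seconds=15 * 60))
```

If most threads show up in the result at the same time, that points toward cluster-wide saturation rather than a single problematic table, which is when reducing the number of parallel threads is worth trying.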