Iceberg Writer monitoring metrics

See Data warehouse monitoring metrics.

When parallel threads are enabled (in append-only mode), multiple instances of IcebergWriter may run concurrently, each triggering its own compute job. The Current Batch Information metric will reflect the execution status of each adapter instance as a separate element in the JSON array, with each entry tagged by its corresponding adapter instance ID.

| Name | Measure | Scope | Updated | Description |
| --- | --- | --- | --- | --- |
| Current Batch Information | JSON array containing the current batch/task execution details at the adapter instance level | Active until task completion | After each batch execution | Provides detailed information about the task currently being executed or awaited by the adapter instance. Useful for monitoring task status and identifying long-running or stuck operations on the compute engine. |

Current Batch Information sub-metrics:

Each element in the JSON array holds the execution details for one adapter instance.

| Name | Measure | Scope | Updated | Description |
| --- | --- | --- | --- | --- |
| Adapter Instance Id | Integer uniquely identifying an adapter instance | Active until task completion | After each batch execution | In single-threaded mode, the default value is 0. When parallel execution is enabled (such as in append-only mode), each thread is assigned a unique ID from 0 up to N–1, where N is the total number of configured parallel threads. |
| Batch Tasks | JSON array | Active until task completion | After each batch execution | A JSON array containing the current batch task execution details at the adapter instance level |
| Metafetch Task | JSON node | Active until task completion | After each batch execution | A JSON node containing the current metaFetch execution details at the adapter instance level |

Metafetch Task sub-metrics

| Name | Measure | Scope | Updated | Description |
| --- | --- | --- | --- | --- |
| Table Details | JSON node | Active until task completion | After each batch execution | A JSON node representing table details, including the table name |
| Task Sequence Number | Integer uniquely identifying the current task | Active until task completion | After each batch execution | A monotonically increasing integer that identifies the execution order of tasks processed by the adapter instance. It increments with each new task submitted and helps track the sequence and lifecycle of tasks at the instance level. |
| Time Elapsed | Duration in a human-readable time format | Active until task completion | After each batch execution | Formatted by duration length: HH:MM:SS (e.g., 01H:23M:45S) for durations including hours; mm:ss:SSS (e.g., 03m:45s:123ms) for durations under an hour; ss:SSS (e.g., 12s:456ms) for durations under a minute |

Batch Tasks sub-metrics

| Name | Measure | Scope | Updated | Description |
| --- | --- | --- | --- | --- |
| Table Details | JSON node | Active until task completion | After each batch execution | A JSON node representing table details, including the table name and the batch sequence number |
| Task Sequence Number | Integer uniquely identifying the current task | Active until task completion | After each batch execution | A monotonically increasing integer that identifies the execution order of tasks processed by the adapter instance. It increments with each new task submitted and helps track the sequence and lifecycle of tasks at the instance level. |
| Task Type | String denoting the type of the current task | Active until task completion | After each batch execution | One of MERGE, APPEND, PKUPDATE, or DDL |
| Thread Name | String identifying the worker thread | Active until task completion | After each batch execution | Identifies the internal worker thread within the adapter instance that handles the current batch |
| Time Elapsed | Duration in a human-readable time format | Active until task completion | After each batch execution | Formatted by duration length: HH:MM:SS (e.g., 01H:23M:45S) for durations including hours; mm:ss:SSS (e.g., 03m:45s:123ms) for durations under an hour; ss:SSS (e.g., 12s:456ms) for durations under a minute |
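As a rough illustration of the nesting described above, the following Python sketch parses a hypothetical Current Batch Information payload and prints one line per active task. The JSON key spellings, table names, and thread names are assumptions made for illustration: they mirror the sub-metric names on this page, but the exact keys that IcebergWriter emits are not specified here and may differ.

```python
import json

# Hypothetical payload. Key names mirror the sub-metric names on this page;
# the exact spelling emitted by IcebergWriter is an assumption.
SAMPLE = """
[
  {
    "Adapter Instance Id": 0,
    "Batch Tasks": [
      {
        "Table Details": {"Table Name": "db.orders", "Batch Sequence Number": 42},
        "Task Sequence Number": 7,
        "Task Type": "APPEND",
        "Thread Name": "worker-0",
        "Time Elapsed": "12s:456ms"
      }
    ],
    "Metafetch Task": null
  },
  {
    "Adapter Instance Id": 1,
    "Batch Tasks": [],
    "Metafetch Task": {
      "Table Details": {"Table Name": "db.customers"},
      "Task Sequence Number": 3,
      "Time Elapsed": "03m:45s:123ms"
    }
  }
]
"""


def summarize(payload: str) -> list[str]:
    """Return one summary line per active batch or metaFetch task."""
    lines = []
    for instance in json.loads(payload):
        instance_id = instance["Adapter Instance Id"]
        # Batch Tasks is an array; treat a missing or null value as empty.
        for task in instance.get("Batch Tasks") or []:
            table = task["Table Details"]["Table Name"]
            lines.append(
                f"instance {instance_id}: {task['Task Type']} on {table} "
                f"({task['Time Elapsed']})"
            )
        # Metafetch Task is a single node and has no Task Type sub-metric.
        metafetch = instance.get("Metafetch Task")
        if metafetch:
            table = metafetch["Table Details"]["Table Name"]
            lines.append(
                f"instance {instance_id}: METAFETCH on {table} "
                f"({metafetch['Time Elapsed']})"
            )
    return lines


for line in summarize(SAMPLE):
    print(line)
```

The "METAFETCH" label is only a display string chosen for this sketch, not a documented Task Type value.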

Granular Breakdown of the Current Batch Information metric in IcebergWriter

This metric offers a real-time snapshot of the tasks currently being executed by each IcebergWriter adapter instance. It is structured as a JSON array, where each element corresponds to a specific adapter instance and its active task context. Below is a detailed explanation of each sub-metric:

Adapter Instance ID

  • Description: A numeric identifier that uniquely represents an adapter instance.

  • Behavior:

    • Defaults to 0 when parallel execution is not enabled (i.e., single-threaded mode).

    • When parallel threads are enabled (e.g., in append-only mode), each instance is assigned a unique ID from 0 to N–1, where N is the number of parallel threads configured.
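The ID range can be checked mechanically. The sketch below, with a hypothetical helper name, reports any instance IDs that fall outside the range expected for a given number of parallel threads; it assumes the IDs have already been extracted from the metric.

```python
def unexpected_instance_ids(seen_ids, parallel_threads=1):
    """Return, sorted, the IDs outside the expected range.

    With one thread (single-threaded mode) only ID 0 is expected;
    with N parallel threads the expected IDs are 0 through N-1.
    """
    expected = set(range(parallel_threads))
    return sorted(set(seen_ids) - expected)


# Four parallel threads: IDs 0..3 are expected, so 4 is reported.
print(unexpected_instance_ids([0, 1, 4], parallel_threads=4))
```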

Task Sequence Number

  • Description: A monotonically increasing integer that identifies the execution order of tasks handled by the adapter instance.

  • Behavior:

    • Incremented each time a new task is submitted by the adapter.

    • Helps in tracking the sequence and lifecycle of tasks per instance.

Table Details

  • Description: A JSON node containing metadata about the target table involved in the current task.

  • Contents:

    • For MetaFetch tasks: includes table name.

    • For Batch tasks: includes table name and the batch sequence number being processed.

Task Type

  • Description: A string indicating the type of task currently being executed by the compute engine.

  • Examples:

    • APPEND, DDL, MERGE, PKUPDATE, etc.

  • Utility:

    • Useful for categorizing the nature of operations and applying filters or alerts based on task type.

Thread Name

  • Description: Identifies the internal worker thread within the adapter instance that is handling the current task.

  • Behavior:

    • Relevant in parallel execution mode.

    • Helps in debugging concurrency and workload distribution across worker threads.

Time Elapsed

  • Description: Indicates how long the current task has been running.

  • Format: Human-readable formats based on duration length:

    • HH:MM:SS (e.g., 01H:23M:45S) for durations over an hour

    • mm:ss:SSS (e.g., 03m:45s:123ms) for durations under an hour

    • ss:SSS (e.g., 12s:456ms) for durations under a minute

  • Behavior:

    • Continuously updated while the task is in progress.

    • Resets once the task completes.
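Because Time Elapsed is a formatted string rather than a number, comparing it against a threshold requires parsing it first. The following Python sketch converts the three example formats into a timedelta. It is based only on the examples shown on this page (01H:23M:45S, 03m:45s:123ms, 12s:456ms) and assumes that each colon-separated component is a number followed by a unit suffix; the function name is hypothetical.

```python
import re
from datetime import timedelta

# Milliseconds per unit suffix. Matching is case-insensitive because the
# hour format uses upper-case suffixes (H, M, S) in the examples.
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}
_COMPONENT = re.compile(r"(\d+)(ms|h|m|s)", re.IGNORECASE)


def parse_time_elapsed(text: str) -> timedelta:
    """Convert a Time Elapsed string such as '03m:45s:123ms' to a timedelta."""
    total_ms = 0
    for component in text.strip().split(":"):
        match = _COMPONENT.fullmatch(component)
        if match is None:
            raise ValueError(f"unrecognized Time Elapsed component: {component!r}")
        value, unit = match.groups()
        total_ms += int(value) * _UNIT_MS[unit.lower()]
    return timedelta(milliseconds=total_ms)


print(parse_time_elapsed("01H:23M:45S"))
print(parse_time_elapsed("12s:456ms").total_seconds())
```

Accumulating whole milliseconds as integers avoids floating-point rounding before the value is handed to timedelta.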

Troubleshooting using Iceberg Writer Current Batch Information

The Current Batch Information metric provides visibility into the active task execution state for each IcebergWriter adapter instance. It helps detect issues such as long-running tasks, resource contention, and compute engine failures. Below are common problem scenarios and recommended actions.

1. Hung or Long-Running Tasks

Symptom: A task remains active in Current Batch Information for an unusually long time, and Time Elapsed keeps increasing without the task completing.

Diagnosis:

  • Check if Time Elapsed exceeds expected thresholds (e.g., >15 minutes).

  • Review Task Type to see if it’s a heavier operation such as MERGE.

  • Use Thread Name to identify the specific thread executing the task.

Resolution:

  • Inspect compute engine logs (Dataproc, EMR) for memory or resource issues.

  • Consider increasing cluster capacity or optimizing the operation.
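The diagnosis steps for this scenario could be automated along the following lines. This is a minimal sketch, not a supported tool: it assumes the metric has already been loaded into Python lists and dictionaries, that the keys match the sub-metric names on this page, and that the 15-minute threshold from the example above suits your workload. The helper names are hypothetical.

```python
import re

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def elapsed_seconds(text):
    """Convert a Time Elapsed string (e.g. '01H:23M:45S') to seconds."""
    total = 0.0
    for value, unit in re.findall(r"(\d+)(ms|h|m|s)", text.lower()):
        if unit == "ms":
            total += int(value) / 1000
        else:
            total += int(value) * _UNIT_SECONDS[unit]
    return total


def long_running(entries, threshold_seconds=15 * 60):
    """Return (instance id, thread, task type, elapsed) for slow batch tasks."""
    flagged = []
    for entry in entries:
        for task in entry.get("Batch Tasks") or []:
            if elapsed_seconds(task["Time Elapsed"]) > threshold_seconds:
                flagged.append((
                    entry["Adapter Instance Id"],
                    task["Thread Name"],
                    task["Task Type"],
                    task["Time Elapsed"],
                ))
    return flagged


# Hypothetical snapshot with one slow MERGE and one quick APPEND.
snapshot = [
    {"Adapter Instance Id": 0, "Batch Tasks": [
        {"Thread Name": "worker-0", "Task Type": "MERGE",
         "Time Elapsed": "01H:02M:03S"}]},
    {"Adapter Instance Id": 1, "Batch Tasks": [
        {"Thread Name": "worker-1", "Task Type": "APPEND",
         "Time Elapsed": "12s:456ms"}]},
]
print(long_running(snapshot))
```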

2. Adapter Appears Idle While a Task is Running

Symptom: The adapter shows no new output or progress, but Current Batch Information shows an active task.

Diagnosis:

  • Check if Task Sequence Number is progressing. If not, the task might be stuck.

  • Verify compute job status in the cluster dashboard.

  • Look into Table Details to identify which table or batch is involved.

Resolution:

  • Restart the adapter job if the compute task has silently failed, or configure auto-resume to avoid future occurrences.
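One way to check whether Task Sequence Number is progressing is to compare two snapshots of the metric taken some time apart. The sketch below assumes each snapshot is already parsed into Python structures with keys matching the sub-metric names on this page; the function name and the sampling interval are up to you and are not part of the product.

```python
def stalled_instances(previous, current):
    """Return adapter instance IDs whose Task Sequence Number did not advance."""

    def latest_sequence(snapshot):
        # Map each adapter instance to its highest active batch task number.
        result = {}
        for entry in snapshot:
            numbers = [
                task["Task Sequence Number"]
                for task in entry.get("Batch Tasks") or []
            ]
            if numbers:
                result[entry["Adapter Instance Id"]] = max(numbers)
        return result

    before = latest_sequence(previous)
    after = latest_sequence(current)
    # An instance is suspect when the same task number is still active.
    return sorted(i for i, seq in after.items() if before.get(i) == seq)


# Hypothetical snapshots: instance 0 is still on task 7, instance 1 moved on.
earlier = [
    {"Adapter Instance Id": 0, "Batch Tasks": [{"Task Sequence Number": 7}]},
    {"Adapter Instance Id": 1, "Batch Tasks": [{"Task Sequence Number": 3}]},
]
later = [
    {"Adapter Instance Id": 0, "Batch Tasks": [{"Task Sequence Number": 7}]},
    {"Adapter Instance Id": 1, "Batch Tasks": [{"Task Sequence Number": 4}]},
]
print(stalled_instances(earlier, later))
```

An unchanged number only suggests a stuck task. A legitimately long MERGE looks the same, so confirm against the compute job status before restarting anything.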

3. Multiple Stuck Threads in Parallel Mode

Symptom: In parallel execution mode, several threads show high Time Elapsed values, with no visible pipeline progress.

Diagnosis:

  • Review all entries in Current Batch Information for stalled tasks.

  • Look at Task Type and Thread Name to isolate potential contention or deadlocks.

  • Check compute cluster metrics for CPU, memory, and thread saturation.

Resolution:

  • Investigate thread starvation or executor bottlenecks, and increase the compute resources if needed.

  • Reduce the number of parallel threads if the environment can't handle the load.
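To see whether stalled threads cluster around one kind of operation, the entries can be grouped by Task Type. The sketch below is illustrative only: the key names follow the sub-metric names on this page, the thread names are invented, and the 15-minute threshold is an assumption borrowed from the earlier example.

```python
import re
from collections import defaultdict

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def elapsed_seconds(text):
    """Convert a Time Elapsed string (e.g. '03m:45s:123ms') to seconds."""
    total = 0.0
    for value, unit in re.findall(r"(\d+)(ms|h|m|s)", text.lower()):
        if unit == "ms":
            total += int(value) / 1000
        else:
            total += int(value) * _UNIT_SECONDS[unit]
    return total


def stalled_by_task_type(entries, threshold_seconds=15 * 60):
    """Group the thread names of slow batch tasks by their Task Type."""
    groups = defaultdict(list)
    for entry in entries:
        for task in entry.get("Batch Tasks") or []:
            if elapsed_seconds(task["Time Elapsed"]) > threshold_seconds:
                groups[task["Task Type"]].append(task["Thread Name"])
    return dict(groups)


# Hypothetical parallel-mode snapshot with three adapter instances.
snapshot = [
    {"Adapter Instance Id": 0, "Batch Tasks": [
        {"Task Type": "MERGE", "Thread Name": "worker-0",
         "Time Elapsed": "01H:10M:00S"}]},
    {"Adapter Instance Id": 1, "Batch Tasks": [
        {"Task Type": "MERGE", "Thread Name": "worker-1",
         "Time Elapsed": "45m:00s:000ms"}]},
    {"Adapter Instance Id": 2, "Batch Tasks": [
        {"Task Type": "APPEND", "Thread Name": "worker-2",
         "Time Elapsed": "12s:456ms"}]},
]
print(stalled_by_task_type(snapshot))
```

If most stalled threads share a task type, contention on that operation is a more likely cause than general resource exhaustion, which would tend to stall every type alike.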