GCS Reader runtime considerations
GCS Reader monitoring metrics
The following monitoring metrics are published by the GCS Reader:
Metric | Description |
---|---|
| The name of the cloud object whose metadata was recently fetched from the cloud. Frequency: every cloud object in a batch. For example:
|
| The name of the actual object whose metadata was recently fetched from the cloud, along with its path in the GCS container. Frequency: every cloud object in a batch. For example:
|
CLOUD_OBJECT_LAST_BATCH_COUNT | The number of cloud objects whose metadata was captured in the latest fetch cycle. Frequency: every batch fetched. Units: count (Long). For example:
|
EXTERNAL_IO_LATENCY | The latency involved in capturing the cloud metadata in the latest fetch cycle. Frequency: every batch fetched. Units: milliseconds (Long). For example:
|
CLOUD_OBJECT_STATS | The following metrics related to the cloud objects are captured by GCS Reader under Cloud objects statistics.
For example: { "Count of Objects metadata fetched": 1, "Downloaded count": 1, "Processed count": 0, "Missing count": 0, "Total objects size in MB": 0.002, "Total downloaded size in MB": 0.002, "Current Disk Utilization in MB": 0.002 } |
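If you consume the CLOUD_OBJECT_STATS value programmatically, it can be parsed like any JSON object. The following is a minimal sketch, assuming the value is available to you as a JSON string with the keys shown in the example above, written with straight quotes so that it parses. The function name `summarize_stats` and the derived `unprocessed_objects` figure are illustrative assumptions, not part of the GCS Reader; how you retrieve the metric value is not covered here.

```python
import json

# Example value using the keys and numbers from the table above.
SAMPLE_STATS = """
{
  "Count of Objects metadata fetched": 1,
  "Downloaded count": 1,
  "Processed count": 0,
  "Missing count": 0,
  "Total objects size in MB": 0.002,
  "Total downloaded size in MB": 0.002,
  "Current Disk Utilization in MB": 0.002
}
"""


def summarize_stats(raw: str) -> dict:
    """Parse a CLOUD_OBJECT_STATS value and pick out a few figures."""
    stats = json.loads(raw)
    return {
        # Hypothetical derived value: downloaded but not yet processed.
        "unprocessed_objects": stats["Downloaded count"] - stats["Processed count"],
        "missing_objects": stats["Missing count"],
        "disk_utilization_mb": stats["Current Disk Utilization in MB"],
    }


summary = summarize_stats(SAMPLE_STATS)
print(summary)
```

Reading `Downloaded count` minus `Processed count` as a backlog is an assumption about how the counters relate. Confirm it against your own monitoring output before alerting on it.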
Performance optimizations
Object fetching mode: The streaming approach (Use Streaming property) is expected to be faster because the bytes are streamed directly, with no additional download step. In local testing on sample data of 411 DSV files of varying sizes containing 1M events in total, the download approach took 162 seconds and the streaming approach took 55 seconds.
Object detection mode: The GCSAuditLogNotification object detection mode provides better performance during app recovery after a crash or stop when a bucket contains a huge number (on the order of millions) of objects. This is because the reader does not need to fetch the full metadata to locate the checkpointed object.
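The figures from the local test of the object fetching modes can be converted into throughput and speedup numbers. The sketch below is plain arithmetic on the reported values (1M events, 162 seconds, 55 seconds). The function name is illustrative, and the results describe that single test, not expected performance in other environments.

```python
EVENTS = 1_000_000        # total events in the sample data
DOWNLOAD_SECONDS = 162    # reported time for the download approach
STREAMING_SECONDS = 55    # reported time for the streaming approach


def throughput(events: int, seconds: float) -> float:
    """Return events processed per second."""
    return events / seconds


download_rate = throughput(EVENTS, DOWNLOAD_SECONDS)
streaming_rate = throughput(EVENTS, STREAMING_SECONDS)
speedup = DOWNLOAD_SECONDS / STREAMING_SECONDS

print(f"download:  {download_rate:,.0f} events/s")
print(f"streaming: {streaming_rate:,.0f} events/s")
print(f"speedup:   {speedup:.2f}x")
```

For this test the download approach works out to about 6,173 events per second and the streaming approach to about 18,182 events per second, a speedup of roughly 2.95 times.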
Limitations
The following limitations apply to the GCS Reader:
The GCS Reader can read Avro files with an embedded schema, but not with a separate Avro schema file.
The GCS Reader adapter's download mode is not supported on Windows OS.
If an object name is longer than the maximum filename length supported by the operating system, enable the Use Streaming option to avoid exceptions caused by downloading to a filename that is too long.
If a bucket contains a huge number of objects, the reader may consume a high level of memory and CPU to fetch and process the metadata. This applies to both the GCSDirectoryListing and GCSAuditLogNotification modes.
In GCSDirectoryListing mode, a full metadata fetch happens when the adapter starts and on every subsequent polling fetch.
In GCSAuditLogNotification mode, a full metadata fetch happens when the adapter starts, and subsequent polling calls fetch only the incremental changes from the audit log.
In GCSDirectoryListing mode, if the bucket contains a huge number (on the order of millions) of objects, app recovery after a crash or stop takes considerable time because the full metadata has to be fetched to locate the checkpointed object. The GCSAuditLogNotification mode is recommended for better performance.
In GCSAuditLogNotification mode, Google Cloud applies a default limit of 60 requests per minute to audit log reads. If you are running multiple apps, set the polling interval based on the number of apps you are running and the audit log read limit.
A time offset of 5 minutes is applied to queries to avoid conflicts during high-volume data loading. To modify the 5-minute default, contact Striim support.
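The relationship between the number of apps, the polling interval, and the audit log read limit can be sketched as a small calculation. This is a rough estimate only. It assumes that all apps share the same audit log read quota and that each polling call issues a fixed number of read requests (one by default). Neither assumption is stated in this document, and the function and parameter names are hypothetical.

```python
import math

DEFAULT_READS_PER_MINUTE = 60  # default audit log read limit cited above


def min_polling_interval_seconds(
    app_count: int,
    requests_per_poll: int = 1,  # assumption: one audit log read per poll
    limit_per_minute: int = DEFAULT_READS_PER_MINUTE,
) -> int:
    """Smallest polling interval (in seconds) that keeps the combined
    request rate of all apps at or below the per-minute limit."""
    if app_count < 1 or requests_per_poll < 1 or limit_per_minute < 1:
        raise ValueError("all arguments must be positive integers")
    requests_per_round = app_count * requests_per_poll
    # Each round of polls must be spread over enough seconds that
    # requests_per_round * (60 / interval) <= limit_per_minute.
    return math.ceil(60 * requests_per_round / limit_per_minute)


for apps in (1, 5, 90):
    print(apps, "apps ->", min_polling_interval_seconds(apps), "seconds")
```

Under these assumptions, five apps sharing the default limit would each need a polling interval of at least 5 seconds. In practice, leave headroom above the computed minimum, since other clients may read the same audit log.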
Troubleshooting
This topic describes errors you may see when using the GCS Reader, and possible resolutions.
Exception | Resolutions |
---|---|
GoogleCloudBucketNotFoundException | Check if the specified bucket is present. When using Private Service Connect, verify:
|
GoogleCloudCredentialsException |
|
GoogleCloudLocalFileSystemException |
|
CloudStorageConnectionException |
|
GoogleCloudPermissionException | Ensure that the user has the following permissions on the bucket/audit log:
|