Striim 3.10.1 documentation

Monitoring end-to-end lag (LEE)

End-to-end lag ("LEE") is the time it takes from the origin of an event in an external source to its final delivery by Striim to an external target.

A low and stable LEE is desirable. A high but stable LEE indicates either that Striim is reading stale data from the source or that a window or some other component in the Striim application is holding data for a significant time. A continuously increasing LEE indicates that events are being received from the source faster than the target can handle them. In that case, consider creating multiple writer instances (see Creating multiple writer instances) or reducing network bottlenecks between Striim and the external target.
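The three patterns described above can be told apart from a series of LEE samples. The following is a minimal sketch, not part of Striim: the function name `classify_lee` and both thresholds are hypothetical and would need tuning for a real workload. It assumes the samples are in seconds and evenly spaced in time.

```python
from statistics import mean


def classify_lee(samples, high_threshold=5.0, slope_threshold=0.1):
    """Label a series of LEE samples (seconds, evenly spaced in time).

    Both thresholds are illustrative assumptions, not Striim defaults.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to detect a trend")

    xs = list(range(len(samples)))
    x_bar = mean(xs)
    y_bar = mean(samples)

    # Least-squares slope: change in LEE (seconds) per sample interval.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples)) / sum(
        (x - x_bar) ** 2 for x in xs
    )

    if slope > slope_threshold:
        return "increasing"       # target cannot keep up with the source
    if y_bar > high_threshold:
        return "high but stable"  # stale source data, or a component holding events
    return "low and stable"       # the desirable case


if __name__ == "__main__":
    print(classify_lee([0.05, 0.06, 0.05, 0.06]))
    print(classify_lee([189.7, 189.6, 189.7, 189.6]))
    print(classify_lee([1, 2, 4, 8, 16]))
```

A fitted slope is used instead of comparing only the first and last samples so that a single noisy reading does not change the label.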

To see which sources and targets support LEE in this release, see Readers overview and Writers overview.

Note: For accurate LEE calculation with SQL Server sources, the Fetch Transaction Metadata property must be set to True (see MS SQL Reader properties).

Viewing LEE

• In the Web UI: Go to the Monitor page. Under App Overview, click the application containing the target, then click Targets > End to End Lag to display a line chart of LEE in milliseconds over time. If the target has multiple sources, you will see a chart for each.

• In the console: REPORT LEE; returns the LEE for each source-target combination in the cluster, the time each was measured, and the start time type, shown in the Source Time column (see discussion of "Start time types" below).

+----------------------------------------------------------+----------------------------------------------------+----------------------------------+----------------------------------+----------------+
|                          Source                          |                       Target                       |  Lag End-to-End (LEE) (seconds)  |           Measured At            |  Source Time   |
+----------------------------------------------------------+----------------------------------------------------+----------------------------------+----------------------------------+----------------+
|              ns2.PGCDC1 (PostgreSQLReader)               |              ns2.testOut (FileWriter)              |              0.056               |  2020-06-27 02:17:34.844 PM PDT  |      Idle      |
|   SamplesDB2Kafka.ReadPostgresTables (DatabaseReader)    |  SamplesDB2Kafka.WriteToKafkaTopic (KafkaWriter)   |              0.089               |  2020-06-27 11:51:27.740 AM PDT  |      Idle      |
|  SamplesDB2File.ReadPostgresTablesFile (DatabaseReader)  |      SamplesDB2File.WriteToFile (FileWriter)       |             189.689              |  2020-06-27 11:51:37.547 AM PDT  |    Observed    |
|     SamplesDB.ReadPostgresTablesDB (DatabaseReader)      |  SamplesDB.WriteToPostgresTable (DatabaseWriter)   |              1.656               |  2020-06-27 02:26:47.845 PM PDT  |    Observed    |
|              ns1.CsvDataSource (FileReader)              |           ns1.PosAppFileOut (FileWriter)           |              0.319               |  2020-06-27 11:18:06.294 AM PDT  |    Observed    |
|              ns2.PGCDC2 (PostgreSQLReader)               |              ns2.testOut (FileWriter)              |              0.032               |  2020-06-27 02:17:44.866 PM PDT  |      Idle      |
+----------------------------------------------------------+----------------------------------------------------+----------------------------------+----------------------------------+----------------+

The following commands return various subsets of that information (you may omit the namespace if the specified objects are in the current namespace):

• REPORT LEE <namespace.source name> <namespace.target name>; - LEE for the specified source-target combination

• REPORT LEE <namespace.source name> *; - LEEs for all targets of the specified source

• REPORT LEE * <namespace.target name>; - LEEs for all sources of the specified target

• REPORT LEE APPLICATION <namespace.application name>; - LEEs for all targets in the specified application (sources may be in other applications)

• Using the system health REST API (Monitoring using the system health REST API): Each target element of the health map includes the latest LEE for each source in the following format:

"lagEnd2End": {
  "data": [
    {
      "source": "ns1.source1",
      "target": "ns1.target1",
      "lee": 0.014,
      "at": 1589415119444,
      "type": "Observed"
    }
  ]
}
• Using JMX (see Monitoring using JMX): The MBeans will include the information from the system health object.
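As an illustration of consuming the `lagEnd2End` element shown above, the sketch below parses it and prints one line per source-target pair. It makes two assumptions the text does not state: that `at` is a Unix epoch timestamp in milliseconds (its magnitude suggests this) and that `lee` is in seconds, as in the console report. The function name `summarize_lee` is hypothetical, and retrieving the health map over HTTP is left out.

```python
import json
from datetime import datetime, timedelta, timezone

# A target element containing the fragment from the documentation above.
TARGET_ELEMENT = json.loads("""
{
  "lagEnd2End": {
    "data": [
      {
        "source": "ns1.source1",
        "target": "ns1.target1",
        "lee": 0.014,
        "at": 1589415119444,
        "type": "Observed"
      }
    ]
  }
}
""")

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_lee(target_element):
    """Return (source, target, lee, measured_at_utc, start_time_type) tuples."""
    rows = []
    for entry in target_element["lagEnd2End"]["data"]:
        # Assumption: "at" is milliseconds since the Unix epoch.
        measured_at = EPOCH + timedelta(milliseconds=entry["at"])
        rows.append(
            (entry["source"], entry["target"], entry["lee"], measured_at, entry["type"])
        )
    return rows


if __name__ == "__main__":
    for source, target, lee, measured_at, kind in summarize_lee(TARGET_ELEMENT):
        print(f"{source} -> {target}: {lee} s at {measured_at.isoformat()} ({kind})")
```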

How LEE is calculated

Striim calculates end-to-end lag by subtracting an event's start time from its end time. This figure includes:

• the time it takes Striim to acquire the event data from the external source

• within Striim, any time the event spends in buffers and queues; any time it spends being enriched, joined, aggregated, filtered, or otherwise processed; and any time required for communication between nodes in a cluster

• the time it takes Striim to deliver the event to the target

LEE is calculated separately for each source-target combination.

If the clocks of the source server, Striim server, and target server are not synchronized, the lag will not be calculated correctly.
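The arithmetic, and the effect of unsynchronized clocks, can be sketched as follows. This is an illustration only: the function name `end_to_end_lag` and the timestamps are made up, and the two-second skew is an arbitrary example value.

```python
from datetime import datetime, timedelta, timezone


def end_to_end_lag(start_time, end_time):
    """LEE in seconds: the event's end time minus its start time."""
    return (end_time - start_time).total_seconds()


# Hypothetical event: it originates at `start` and the target
# acknowledges it 56 milliseconds later.
start = datetime(2020, 6, 27, 21, 17, 34, 788000, tzinfo=timezone.utc)
end = start + timedelta(milliseconds=56)

true_lag = end_to_end_lag(start, end)

# If the source server's clock runs 2 seconds ahead, the start time it
# reports is 2 seconds too late, and the computed lag becomes negative.
skewed_start = start + timedelta(seconds=2)
skewed_lag = end_to_end_lag(skewed_start, end)

if __name__ == "__main__":
    print(f"synchronized clocks: {true_lag:.3f} s")
    print(f"source clock 2 s ahead: {skewed_lag:.3f} s")
```

A skew in the other direction inflates the lag by the same amount, which is harder to notice than a negative value.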

When there are multiple paths between a source and a target, or when events between the source and target are distributed among multiple Striim servers in a multi-node cluster, the most recent LEE will be displayed. In this situation, the LEE graph in the web UI may be jagged or noisy.

Start time types: Striim uses one of the following as the start time (these types appear in the MON and system health object reports):

• Attribute: a timestamp from an attribute of the event (for instance, a JSON timestamp field)

• Commit: a database commit timestamp

• Idle: the time Striim inserted a mock event into the pipeline. Striim uses these to track LEE when no events are being received from the external source because it is inactive. Idle events are used only to measure LEE and are discarded as they arrive at the writer.

• Ingestion: a Kafka broker ingestion timestamp

• Observed: the time the event was received by the reader (used when no timestamp is available from the external source)

• Operation: the timestamp of an individual operation in a database transaction

End time: Striim uses the time it receives an acknowledgement from the external target as the end time.

Configuring LEE

Two properties may be configured in startUp.properties / agent.conf:

LeeRate / striim.node.LeeRate: By default, LEE is computed for every source event. Set this to an integer n to compute LEE once every n events. For example, LeeRate=10 will calculate LEE once every ten source events.

LeeTimeout / striim.node.LeeTimeout: The length of time an external source can be inactive before Striim inserts Idle events (see discussion of "Start time types" above). The default is 10 seconds.
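To make the effect of LeeRate concrete, the sketch below lists which events in a stream would have LEE computed for a given rate. It is a simplified model, not Striim's implementation: the function name `sampled_event_numbers` is hypothetical, and the choice to measure the last event of each group of n (rather than the first) is an assumption, since the text only says LEE is computed every n events.

```python
def sampled_event_numbers(total_events, lee_rate=1):
    """Return the 1-based event numbers for which LEE would be computed.

    Assumption: with LeeRate=n, every n-th event is measured
    (events n, 2n, 3n, ...). The default of 1 measures every event.
    """
    if lee_rate < 1:
        raise ValueError("lee_rate must be a positive integer")
    return [i for i in range(1, total_events + 1) if i % lee_rate == 0]


if __name__ == "__main__":
    print(sampled_event_numbers(5))                # default: every event
    print(sampled_event_numbers(25, lee_rate=10))  # LeeRate=10
```

Under this model, a higher LeeRate lowers measurement overhead at the cost of fewer LEE data points per source-target pair.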