Adapting TQL applications for multi-server deployment
Multi-server deployment is available only in Striim Cloud Mission Critical and Striim Platform. It is not available in Striim Cloud Enterprise.
First, implement your application for a single server and verify that it is working as expected.
Multi-server deployment for fault tolerance
If you are deploying to multiple servers solely for fault tolerance, no changes to the application are required. See Continuing operation after server failover.
Deploying an application on multiple servers
To run an application on more than one server, you must deploy it ON ALL. See DEPLOY APPLICATION for more information.
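For example, assuming an application named PosApp and a deployment group named MultiServerGroup (both names are illustrative; see DEPLOY APPLICATION for the exact syntax), the statement might be sketched as:

```sql
-- Deploy every flow of the application to all servers in the group.
-- The application and group names here are placeholders.
DEPLOY APPLICATION PosApp ON ALL IN MultiServerGroup;
```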
Using PARTITION BY
The key to distributing an application across multiple servers in a Striim cluster is the PARTITION BY option when creating a stream. For example:
CREATE CQ CsvToPosData
INSERT INTO PosDataStream PARTITION BY merchantId
SELECT TO_STRING(data[1]) as merchantId,
  TO_DATEF(data[4],'yyyyMMddHHmmss') as dateTime,
  DHOURS(TO_DATEF(data[4],'yyyyMMddHHmmss')) as hourValue,
  TO_DOUBLE(data[7]) as amount,
  TO_STRING(data[9]) as zip
FROM CsvStream;
With this PARTITION BY clause, when the output from PosDataStream is distributed among multiple servers, all of the events for each merchantId value go to the same server. This is essential for aggregating and performing calculations on the data. (Note that the PARTITION BY clause in CREATE WINDOW statements has no effect on how events are distributed among servers.)
You can also partition by expression. For example, you could partition the output stream of an OracleReader source using PARTITION BY meta(OraData, 'TableName') in order to distribute events among servers based on the source table names.
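Continuing that example, a CQ that repartitions the OracleReader output by table name might be sketched as follows (the stream names are illustrative):

```sql
-- Redistribute OracleReader output so that all events from the same
-- source table land on the same server. Stream names are placeholders.
CREATE CQ PartitionOraData
INSERT INTO OraDataPartitioned PARTITION BY meta(OraData, 'TableName')
SELECT * FROM OraData;
```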
Sources in a multi-server cluster
General rules for sources in a multi-server environment:
Put the source and the CQ that parses its data (see Parsing the data field of WAEvent) in a flow by themselves.
Partition the CQ's output stream as discussed above.
If the source ingests data from a remote host, deploy it to a single-server deployment group, or to one server of a two-server deployment group (to support failover as discussed in Continuing operation after server failover). The source cannot be distributed across multiple servers since Striim has no way of distributing the incoming data among servers until it is put into a partitioned stream. Note that a server can belong to more than one deployment group, so if the source is not resource-intensive, it might share a server with other flows in another deployment group.
If the source ingests local data on a remote host, it may be deployed to a deployment group containing multiple Forwarding Agents (see Using the Striim Forwarding Agent).
Which field the output stream should be partitioned by depends on which events need to end up on the same server for aggregation or calculations. If there is no such requirement, pick a field whose values will distribute events relatively evenly, such as a timestamp.
Windows in a multi-server cluster
When a window is running on multiple servers, events are distributed among the servers based on the PARTITION BY field of its OVER stream. Put a window in the same flow as the CQ that consumes its data.
Caches in a multi-server cluster
Put a cache in the same flow as the CQ that consumes its data.
A cache's data is distributed among the servers of its deployment group based on the value of the keytomap field. For best performance, this should match the PARTITION BY field used to distribute the streams containing the data with which the cache data will be joined. In other words, partition the CQ's input stream on the cache key, and the distribution of the cache will match that of the CQ. This is much faster than doing a cache lookup on another server.
For example, if you modified the PosApp sample application for use in a multi-server environment:
CREATE CQ CsvToPosData
INSERT INTO PosDataStream PARTITION BY merchantId ...

CREATE JUMPING WINDOW PosData5Minutes
OVER PosDataStream ...

CREATE CACHE HourlyAveLookup ... (keytomap:'merchantId') ...

CREATE STREAM MerchantTxRateOnlyStream ... PARTITION BY merchantId;
...
CREATE CQ GenerateMerchantTxRateOnly ...
FROM PosData5Minutes p, HourlyAveLookup l
WHERE p.merchantId = l.merchantId ...
Since merchantId is the PARTITION BY field for PosDataStream and the cache key for HourlyAveLookup, the events for the join performed by GenerateMerchantTxRateOnly will be on the same servers.
When a server is added to a cluster, cache data is redistributed automatically.
The replicas property controls how many duplicate copies of each cache entry are maintained in the deployment group:

The default value of replicas is 1. This means the deployment group contains only a single copy of each entry, on the server to which it is distributed based on its keytomap field value.

Setting replicas to 2 enables fault tolerance. If one server goes down, all cache data is still available from the other servers.

Setting replicas to all creates a full copy of the cache on each server. If joins are not always on the keytomap field, this may improve performance. Note that with large caches this may significantly increase memory requirements, potentially degrading performance.
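As a sketch of a fault-tolerant cache definition (the reader, parser, file, and type names are placeholders, and property placement may vary; see CREATE CACHE for the exact syntax):

```sql
-- Keep two copies of each cache entry so that a single server failure
-- does not lose any cache data. Names and paths are illustrative.
CREATE CACHE HourlyAveLookup USING FileReader (
  directory: '/data', wildcard: 'hourly_ave.csv'
)
PARSE USING DSVParser ()
QUERY (keytomap: 'merchantId', replicas: '2')
OF HourlyAveType;
```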
CQs in a multi-server cluster
When a CQ is running on multiple servers, events are distributed among the servers based on the PARTITION BY field of the stream referenced by its FROM clause. General rules for CQs:
Put any cache(s) or window(s) referenced by the FROM clause in the same flow as the CQ.
If the CQ's FROM clause references a cache, partition the CQ's input stream by the same key used for cache lookups.
Put the CQ's INSERT INTO (output) stream, if any, in the same flow as the CQ.
Partition the INSERT INTO stream if it is consumed by a CQ or target that is deployed to more than one server.
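The grouping rules above might be sketched as follows, using names from the PosApp example (the flow name is illustrative and the component bodies are elided):

```sql
-- Keep the window, its cache, the CQ, and the CQ's output stream in one
-- flow so they are always deployed to the same servers.
CREATE FLOW AnalyticsFlow;

CREATE JUMPING WINDOW PosData5Minutes OVER PosDataStream ...;
CREATE CACHE HourlyAveLookup ... (keytomap:'merchantId') ...;
CREATE STREAM MerchantTxRateOnlyStream ... PARTITION BY merchantId;
CREATE CQ GenerateMerchantTxRateOnly
INSERT INTO MerchantTxRateOnlyStream
SELECT ... FROM PosData5Minutes p, HourlyAveLookup l
WHERE p.merchantId = l.merchantId;

END FLOW AnalyticsFlow;
```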
WActionStores in a multi-server cluster
There are no special programming issues when a WActionStore is running on multiple servers. It may be in any flow or its own flow.
Note that if a WActionStore runs out of memory it will automatically drop the oldest WActions. Putting a WActionStore in its own flow and deploying that flow to servers not used by other flows will give you maximum control over how much memory is reserved for the WActionStore.
Targets in a multi-server cluster
When a target is running on multiple servers, events are distributed among the servers based on the PARTITION BY field of its INPUT FROM stream.
Using multiple flows
You may create multiple flows within the application to control which components are deployed to which (and how many) servers in the cluster. See Managing deployment groups for more information.
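For example, assuming a single-server deployment group for a remote source and a multi-server group for the rest of the application (all names are illustrative, and the clause order is a sketch; see DEPLOY APPLICATION for the exact syntax):

```sql
-- Deploy the source flow to one server and the remaining flows to all
-- servers of a multi-server group. Names are placeholders.
DEPLOY APPLICATION PosApp ON ALL IN MultiServerGroup
WITH SourceFlow ON ONE IN SourceGroup;
```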