Automatic load rebalancing

This feature is available only in Striim Cloud Mission Critical and Striim Platform. It is not available in Striim Cloud Enterprise.

As discussed in Continuing operation after server failover, when one server in a multi-server cluster goes offline, Striim automatically redistributes applications deployed ON ANY among the remaining servers.

Automatic load rebalancing, which is off by default, will redistribute those applications again when the offline server comes back up. It will also redistribute applications when you add a new server to the cluster.

Note

Only applications deployed ON ANY are rebalanced. Those deployed ON ALL are already running on all servers.

Only applications in the RUNNING state are rebalanced.

By default:

  • After rebalancing has moved an application to a different server, it will not be moved again in the next four hours.

  • An application will not be rebalanced if its recovery checkpoint is more than 30 minutes old.

These defaults may be changed using the SET CLUSTER REBALANCE CONFIG ...; command.
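
The default rules in this note can be read as a simple eligibility check. The following Python sketch is illustrative only and is not Striim's implementation: the function names parse_interval and is_eligible are hypothetical, and it assumes that these settings accept the same s, m, h, and d interval suffixes described for cpuAverageTime.

```python
from datetime import datetime, timedelta

# Interval suffixes: s (seconds), m (minutes), h (hours), d (days).
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_interval(text):
    """Convert a string such as '30m' or '4h' into a timedelta."""
    return timedelta(**{_UNITS[text[-1]]: int(text[:-1])})

def is_eligible(state, last_moved, last_checkpoint, now,
                checkpoint_age="30m", bounce_protection="4h"):
    """Hypothetical check mirroring the documented default rules."""
    # Only applications in the RUNNING state are rebalanced.
    if state != "RUNNING":
        return False
    # Bounce protection: skip applications that rebalancing moved too recently.
    if last_moved is not None and now - last_moved < parse_interval(bounce_protection):
        return False
    # Checkpoint age: skip applications whose recovery checkpoint is too old.
    return now - last_checkpoint <= parse_interval(checkpoint_age)

now = datetime(2024, 1, 1, 12, 0)
fresh_checkpoint = now - timedelta(minutes=10)
print(is_eligible("RUNNING", None, fresh_checkpoint, now))
print(is_eligible("RUNNING", now - timedelta(hours=1), fresh_checkpoint, now))
```

The first call reports an eligible application. The second reports an ineligible one, because it was moved one hour ago and the four-hour protection interval has not yet passed.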

Use the following commands to configure rebalancing:

SHOW CLUSTER REBALANCE STATUS;

Returns the current rebalancing settings. For example:

Cluster rebalance status: On
Universal configurations:
	checkpointAge: 30m
	bounceProtectionInterval: 4h
Rebalance policy: applicationCount

SET CLUSTER REBALANCE ON POLICY applicationCount;

Striim will attempt to run the same number of applications on each server. This is appropriate if all your applications require roughly similar resources.

SET CLUSTER REBALANCE ON POLICY cpuUsage
  [ policyConfig (
    overloadThreshold: '<integer>',
    eligibilityThreshold: '<integer>',
    cpuAverageTime: '<interval>',
    cpuTolerance: '<integer>') ];

Striim will reallocate the applications based on their CPU usage history. For example, in a two-server cluster with applications A, B, and C, if application A has had the heaviest CPU usage history, then it will run on one server and B and C on the other.

Optionally, you may set one or more policyConfig options to fine-tune this behavior.

  • overloadThreshold <integer>: a maximum average CPU usage percentage above which no applications will be moved to a server. By default this is 90.

  • eligibilityThreshold <integer>: a minimum average CPU usage percentage below which no applications will be moved from the server. By default this is 30.

  • cpuAverageTime <interval>: the period over which CPU usage is averaged for the overloadThreshold and eligibilityThreshold options. By default this is 5m (five minutes). Instead of m you may use s (seconds), h (hours), or d (days).

  • cpuTolerance <integer>: a number to be added to or subtracted from CPU usage when determining whether to move an application. For example, with the default value of 10, assuming the average per-core CPU usage for all servers in the cluster is 60%, an application may be moved from a server only if (1) the server's CPU usage before moving the application is above 70% and (2) after moving the application, the server's usage will remain above 50%. This tolerance is intended to prevent automatic load rebalancing from repeatedly moving an application back and forth between servers.

You may specify as many of these as you wish. The rest will have their default values.

For example, SET CLUSTER REBALANCE ON POLICY cpuUsage policyConfig (overloadThreshold: '80', eligibilityThreshold: '40', cpuAverageTime: '1h') will prevent Striim from moving applications to servers that have averaged over 80% CPU usage in the past hour, and from moving applications away from servers that have averaged under 40% CPU usage in the past hour.
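
The cpuTolerance arithmetic described above can be worked through in a short Python sketch. This is an illustration of the two documented conditions, not Striim's implementation. The function name may_move_from is hypothetical, and the sketch assumes that moving an application lowers the source server's CPU usage by exactly that application's share.

```python
def may_move_from(server_cpu, app_cpu, cluster_avg, cpu_tolerance=10):
    """Return True if moving the application passes both tolerance checks."""
    # (1) Before the move, the server must be above average + tolerance.
    above_before = server_cpu > cluster_avg + cpu_tolerance
    # (2) After the move, the server must remain above average - tolerance.
    # Assumption: the application's CPU share is simply subtracted.
    above_after = server_cpu - app_cpu > cluster_avg - cpu_tolerance
    return above_before and above_after

# Cluster average of 60% with the default tolerance of 10:
# the two thresholds are 70% (before) and 50% (after).
print(may_move_from(server_cpu=75, app_cpu=20, cluster_avg=60))  # 75 > 70 and 55 > 50
print(may_move_from(server_cpu=75, app_cpu=30, cluster_avg=60))  # 45 is not above 50
print(may_move_from(server_cpu=68, app_cpu=10, cluster_avg=60))  # 68 is not above 70
```

Only the first case satisfies both conditions. In the second, the move would drop the server too far below the cluster average, which is the back-and-forth situation the tolerance is meant to avoid.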

SET CLUSTER REBALANCE SCHEDULE ON START "<time>" END "<time>";
SET CLUSTER REBALANCE SCHEDULE OFF;

By default, rebalancing may be performed at any time. Use SCHEDULE ON to limit it to a certain window. You may specify times with a 24-hour clock or with am and pm. For example, SET CLUSTER REBALANCE SCHEDULE ON START "02:00" END "05:00" will allow rebalancing to start any time between 2:00 am and 5:00 am, and SET CLUSTER REBALANCE SCHEDULE ON START "11:30pm" END "4am" will allow rebalancing to start any time between 11:30 pm and 4:00 am. The window limits only when rebalancing may start: a rebalance that starts inside the window may not complete until after the end time.

When SCHEDULE is on, if you add a server outside of this window it will not trigger rebalancing. Applications will not be rebalanced to the new server until the next time rebalancing is triggered by the policy.
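
A window such as START "11:30pm" END "4am" wraps past midnight, which is easy to get wrong when reasoning about whether a given time falls inside it. The Python sketch below shows one way to test membership. It is an illustration only: parse_clock and in_window are hypothetical names, the accepted time formats are limited to the three forms shown in the examples above, and Striim's own parsing may differ.

```python
from datetime import datetime, time

def parse_clock(text):
    """Parse '02:00', '11:30pm', or '4am' into a datetime.time."""
    text = text.strip().lower()
    # Try 24-hour first, then the two 12-hour forms.
    for fmt in ("%H:%M", "%I:%M%p", "%I%p"):
        try:
            return datetime.strptime(text, fmt).time()
        except ValueError:
            continue
    raise ValueError(f"unrecognized time: {text!r}")

def in_window(now, start, end):
    """True if 'now' is inside the window, which may wrap past midnight."""
    if start <= end:
        return start <= now <= end
    # Wrapped window: late evening through early morning.
    return now >= start or now <= end

start, end = parse_clock("11:30pm"), parse_clock("4am")
print(in_window(time(1, 0), start, end))   # 1:00 am, inside the window
print(in_window(time(12, 0), start, end))  # noon, outside the window
```

Under these assumptions, a rebalance could start at 1:00 am but not at noon.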

SET CLUSTER REBALANCE CONFIG
  (checkpointAge: '<interval>',
  bounceProtectionInterval: '<interval>');

  • checkpointAge: Striim will rebalance only those applications whose latest recovery checkpoints are no older than this. By default this is 30m (30 minutes).

  • bounceProtectionInterval: the minimum amount of time that must pass after rebalancing moves an application to a different server before it can be moved again. By default this is 4h (four hours).

SET CLUSTER REBALANCE APPLICATIONS EXCLUDE <namespace.application>,...;
SET CLUSTER REBALANCE APPLICATIONS INCLUDE <namespace.application>,...;

By default, all applications deployed ON ANY may be rebalanced.

Use the EXCLUDE command to prevent selected applications from being rebalanced (for example, applications whose constant long-running transactions prevent them from being stopped, moved, and restarted).

Use the INCLUDE command to enable rebalancing for previously excluded applications.

SET CLUSTER REBALANCE OFF;

Turns rebalancing off.