AggreGate Server Failover


An AggreGate Server failover cluster includes:

Single Master Server
One or more Failover Servers
Clustered or replicated database shared between all servers

The failover servers are activated when the master server fails, e.g. due to:

Network outage
Hardware failure
Operating System crash
AggreGate Server failure
Shortage of disk space
Any other reason

Failover Server Modes

Failover servers may work in Normal Mode or Read-Only Mode. The difference between these modes is explained below. The mode of a failover server is controlled by the Failover Mode global cluster setting.

Normal Failover Mode

In Normal Failover Mode, the failover server takes full control of the AggreGate cluster upon Master Server failure. It controls and monitors the devices, services operator connections, etc. All configuration changes and events are stored in the database and will be available to the Master Server once it becomes operational again.

Read-Only Failover Mode

In Read-Only Failover Mode, the failover server does not make any changes to the underlying database. At first glance it behaves like a Normal Failover server: devices are controlled, operators may change their configuration settings, actions can be executed, and all system functions are available. However, no configuration changes or events are stored in the database. This causes several limitations:

Historical events received by the Read-Only Failover Server will not be available later, e.g. when browsing event history or building charts
All configuration changes will be lost if the Master Server is re-activated or the Failover Server is restarted

Tip:

Read-only Failover Nodes are very useful for quickly restoring cluster reliability in case of permanent Master node failure:

Make the Normal Failover node the new Master node
Make a Read-only Failover node the new Normal Failover node by editing its Failover Mode global cluster setting

These operations take only minutes, yet the cluster remains protected against a failure of the new Master node. A new Read-only Failover node can then be set up without any rush.

Warning:

Only one Normal Failover node is allowed within a high availability cluster. Other failover nodes must work in Read-Only Mode.
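
The composition rules above (a single Master, one or more Failover nodes, at most one of them in Normal Mode) can be checked before a cluster layout is deployed. The following is a minimal sketch only: the role labels and the validate_cluster function are hypothetical names chosen for illustration and are not AggreGate setting values or API calls.

```python
from collections import Counter

# Hypothetical labels used only in this sketch; not AggreGate setting values.
MASTER = "master"
NORMAL_FAILOVER = "normal_failover"
READ_ONLY_FAILOVER = "read_only_failover"


def validate_cluster(roles):
    """Return a list of rule violations for a planned list of node roles."""
    counts = Counter(roles)
    problems = []
    if counts[MASTER] != 1:
        problems.append("cluster must have exactly one Master node")
    if counts[NORMAL_FAILOVER] + counts[READ_ONLY_FAILOVER] < 1:
        problems.append("cluster must have at least one Failover node")
    if counts[NORMAL_FAILOVER] > 1:
        problems.append("only one Normal Failover node is allowed")
    return problems


if __name__ == "__main__":
    planned = [MASTER, NORMAL_FAILOVER, READ_ONLY_FAILOVER]
    print(validate_cluster(planned) or "layout is valid")
```

An empty result means the planned layout satisfies the rules stated in this section; it says nothing about the actual server configuration.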

Failover Scenarios

This section describes several common Failover Cluster configurations. Note that the database cluster is shown as a "cloud" in the images below. In practice, the databases participating in the database cluster will run on the same physical servers as the AggreGate Server installations.

Two Nodes

The most common failover cluster configuration includes two servers: a Master Server and a Normal Failover Server. Once the Master fails, the Failover switches to Failover Master mode, taking over the Master's operations.

[Figure: two-node failover cluster]

Three Nodes

A three-node failover cluster helps to maintain system reliability even when the Master server has failed.

[Figure: three-node failover cluster]

If the Master server fails, the three-node cluster works like a two-node cluster. This protects against a failure of the Failover Master and gives system administrators time to restore three-node operation.

[Figure: three-node failover cluster after Master failure]

Failover Mode Operation

If a Master node fails, it stops performing the regular database updates called "heartbeats". The absence of these updates is noticed by the Failover nodes. If no Master heartbeats occur for longer than the Node Failure Detection Time, the failover nodes are activated and start servicing normal system activities, such as device control operations and operator actions.

The service interruption interval equals the sum of the Node Failure Detection Time and the failover node activation time. This gap is typically less than a minute.
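
The timing rule described above can be sketched as follows. This is an illustration of the logic only, not AggreGate code: the function names are hypothetical, timestamps are plain seconds, and the 30-second and 15-second figures in the example are assumed values, not product defaults.

```python
def master_has_failed(last_heartbeat, now, detection_time):
    """True when no Master heartbeat was seen for longer than detection_time.

    All arguments are in seconds; last_heartbeat and now are timestamps.
    """
    return (now - last_heartbeat) > detection_time


def service_interruption(detection_time, activation_time):
    """Interruption interval: detection time plus failover activation time."""
    return detection_time + activation_time


if __name__ == "__main__":
    # Assumed example values: 30 s detection time, 15 s activation time.
    print(master_has_failed(last_heartbeat=100.0, now=131.0, detection_time=30.0))
    print(service_interruption(detection_time=30.0, activation_time=15.0))
```

A shorter detection time reduces the interruption interval, but the document does not state how low it can be set safely, so any concrete value should be taken from the actual cluster settings.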

Disconnection of Failover Nodes

If a failover node is disconnected from the cluster, e.g. for an update, cluster operation continues without any change. However, the Master server constantly monitors the heartbeats of the Failover nodes. If no failover nodes seem to be alive for longer than the Node Failure Detection Time, the Master server will fire a warning event in the Administration context.
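
The Master-side condition differs from the failover-side one: a warning is due only when none of the failover nodes has produced a recent heartbeat. A minimal sketch of that condition, assuming a hypothetical mapping from node name to the timestamp (in seconds) of its last heartbeat:

```python
def no_failover_alive(last_heartbeats, now, detection_time):
    """True when every failover node has been silent longer than detection_time.

    last_heartbeats maps a node name to its last heartbeat timestamp.
    An empty mapping also yields True, since no failover node is alive.
    """
    return all((now - ts) > detection_time for ts in last_heartbeats.values())


if __name__ == "__main__":
    seen = {"failover-1": 95.0, "failover-2": 120.0}  # hypothetical nodes
    print(no_failover_alive(seen, now=131.0, detection_time=30.0))
```

In this example one node is still within the detection window, so the condition is false and no warning would be due.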

Failover Alert

Once the Master Server of an AggreGate failover cluster fails, the Failover node raises a Failover Alert. This helps to quickly notify system administrators of the situation. By default, an e-mail message is sent to the administrators. It is, however, recommended to configure an SMS message to be sent in case of a Failover Alert. See Alert SMS Notifications for details.

Making Failover Server a Master

In some rare cases the Master server may be completely lost in a severe accident, e.g. due to a major hardware failure. In this case it's necessary to make one of the Failover nodes the Master node.

To switch a Failover node to the Master mode:

Set up a new Failover node to preserve the total number of nodes in the cluster
Change the Cluster Role global configuration setting of the switched node to Master
Restart the new Master node

Configuring Client for Failover

To prepare AggreGate Client to work in the clustered environment:

Create two or more server connections in your workspace: the first one for the Master Server and the others for the Failover servers. Specify the addresses of the Master and Failover servers in the connection settings.
Disable connections to the Failover servers to suppress startup connection errors. This is necessary since Failover nodes won't accept AggreGate Client connections while working in standby mode.

[Figure: Master and Failover server connections in AggreGate Client]

Once the Master node fails (and you get connection errors), enable one of the failover connections.
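
Because standby Failover nodes do not accept client connections, the server that currently accepts connections is either the Master or an activated Failover node. An operator or a helper script can use this to decide which connection to enable. The sketch below is not an AggreGate Client feature: the function names are hypothetical, and the host names and port in the example are placeholders that must be replaced with the values from your own connection settings.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_server(servers, reachable=tcp_reachable):
    """Return the first (host, port) pair that accepts connections, or None.

    servers is an ordered list, Master first, then the Failover servers.
    """
    for host, port in servers:
        if reachable(host, port):
            return (host, port)
    return None


if __name__ == "__main__":
    # Placeholder addresses and port; substitute your real connection settings.
    candidates = [("master.example.com", 12345), ("failover.example.com", 12345)]
    print(pick_server(candidates))
```

The reachability check is passed in as a parameter so that the selection logic can be tested without a network, as in the usage above with the default TCP probe replaced by any function of host and port.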

Configuring Web Desktop for Failover

Once the Master node of the high availability cluster fails, the system operators will no longer be able to log in to the Web Desktop, since the IP address and host name of the Failover node differ from those of the failed Master node. There are two resolutions for this issue:

All operators may manually navigate to the URL of the Failover node. In practice, this URL may be bookmarked in their browsers for emergency cases.
It is possible to set up automatic DNS redirection. Search the Internet for "DNS failover" to find available solutions. Here is just one useful link: http://www.simplefailover.com/scenario3.aspx