OS-Based Failover Cluster


Since version 4.43, AggreGate Server has integrated failover clustering support for providing high availability services. In older versions of the server, clustering relied on operating system facilities.


This appendix is retained for backward compatibility. If you are running AggreGate 4.43 or later, use the integrated failover cluster support to build a high availability cluster.

This appendix describes how to build a failover cluster using:

- Linux-based servers
- DRBD service for mirroring the database and server data
- Linux Heartbeat service for managing node states

Introduction

This article explains how to set up an AggreGate failover cluster that keeps the service available when one of the servers fails.

The failover configuration consists of:

- Two physical servers running the Linux operating system

- A reliable high-speed network connection between the cluster nodes and to the outside world

- AggreGate Server installed on both nodes

- MySQL Server installed on both nodes

- Shared disk space containing the AggreGate Server configuration and database data

- Linux Heartbeat software used to manage the cluster nodes (Heartbeat is part of the High-Availability Linux project)

Benefits of the described cluster configuration:

- Replication of disk data between the nodes protects against hard disk failures

- Automatic transfer of the alias IP address transparently switches Clients, Agents and third-party applications to the secondary node if the primary node fails

- Service downtime does not exceed a few minutes (less than one minute in most cases)

- Separate custom rules and timeouts can be applied to AggreGate Server and database server failure detection


The techniques described in this article can also help you set up an AggreGate failover cluster on other operating systems and with other database servers.

See Windows Server 2008 High Availability (http://www.microsoft.com/windowsserver2008/en/us/high-availability.aspx) for details on clustering Microsoft Windows.

Failover Cluster Configuration Outline

The image below shows the proposed cluster configuration. It consists of two nodes, Node 1 (primary in the image) and Node 2 (secondary), connected to each other via a Gigabit Ethernet link.

[Image: failover cluster configuration]

Disk Sharing

Both nodes hold a copy of a common disk that is replicated between them using Distributed Replicated Block Device (DRBD) technology. The primary node has full read-write access to this disk, while the secondary node only receives the replicated writes and does not access the disk directly. This disk contains the following data:

- AggreGate Server installation folder, including the server configuration file
- MySQL installation folder and database data

When the primary role is transferred from one node to another, the DRBD device is switched to the Primary (read-write) role on the new primary node and to the Secondary role on the new secondary node (if it has not failed completely).

Database Failover

The described configuration uses MySQL as the database server for AggreGate, but other SQL databases can be configured in the same way. MySQL Server is installed in the shared disk space. It runs on the primary node and is stopped on the secondary node. During role transfer, MySQL is launched on the new primary node and stopped on the new secondary node (if it has not failed completely).

MySQL server stores all data in the shared disk space.

AggreGate Server Failover

The AggreGate Server failover mechanism is very similar to database failover: AggreGate Server is installed in the shared disk space, runs on the primary node and is stopped on the secondary node. During a role change, AggreGate Server is stopped on the old primary node (if it has not failed completely) and launched on the new primary node.

Heartbeat

The Heartbeat daemon runs on both the primary and the secondary node and manages the cluster nodes in real time. The two Heartbeat daemons constantly exchange heartbeat messages with each other.

On the primary node, the Heartbeat daemon runs a Monitor daemon that continuously polls the state of MySQL and AggreGate Server by checking the availability of their TCP ports.
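The port check behind this polling can be approximated in a few lines. The following Python sketch assumes that a successful TCP connect is a sufficient health test; mon's own tcp.monitor script is a separate tool with its own options, and `port_is_open` is a hypothetical helper name. The ports are the ones listed in step 5d.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: treat the service as down
        return False


if __name__ == "__main__":
    # MySQL (3306) and the AggreGate Server listeners watched in this appendix
    for port in (3306, 6450, 6460, 6480, 8443):
        state = "up" if port_is_open("localhost", port) else "DOWN"
        print(f"localhost:{port} {state}")
```

A plain connect only proves that something is listening. This is why the MySQL check in step 5d logs in to the database instead of just opening the port.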

IP Address Management

Every node in the cluster has a dedicated fixed IP address. The primary node also has an alias IP address, which Clients, Agents and third-party applications use to access AggreGate Server. During role transfer, the Heartbeat daemons remove this alias IP address from the old primary node's network interface and add it to the new primary node's network interface. As a result, all Clients and Agents are transparently switched to the new primary node.

Role Transfer

Role transfer is the process of passing the primary cluster node role to the other node. It occurs in the following cases:

- The primary node loses network connectivity or suffers a hardware failure. In this case, the Heartbeat daemon on the primary node stops responding, and the Heartbeat daemon on the secondary node switches its own node to the primary state as soon as a timeout period (deadtime) elapses.
- MySQL or AggreGate Server on the primary node fails. In this case, the Monitor daemon on the primary node detects that a port is unavailable and reports the failure to the Heartbeat daemon. The Heartbeat daemon then switches the local node to the secondary state and instructs the Heartbeat daemon on the other node to switch its node to primary.
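The two triggers above can be condensed into a small decision function. This is an illustrative model only, not Heartbeat's actual implementation. The names `next_action` and `peer_is_dead` and the returned action strings are hypothetical. The default deadtime of 3 seconds is taken from the ha.cf file in step 4d.

```python
def peer_is_dead(seconds_since_last_heartbeat: float, deadtime: float = 3.0) -> bool:
    """Model: the peer is declared dead once no heartbeat arrived for `deadtime` seconds."""
    return seconds_since_last_heartbeat >= deadtime


def next_action(is_primary: bool, local_services_ok: bool,
                seconds_since_peer_heartbeat: float, deadtime: float = 3.0) -> str:
    """Decide what the local node should do in the two role transfer cases."""
    if is_primary and not local_services_ok:
        # MySQL or AggreGate Server failed locally: hand the resources over
        return "standby"
    if not is_primary and peer_is_dead(seconds_since_peer_heartbeat, deadtime):
        # The primary stopped answering heartbeats: take over its resources
        return "takeover"
    return "no change"


if __name__ == "__main__":
    print(next_action(is_primary=False, local_services_ok=True,
                      seconds_since_peer_heartbeat=4.0))
```

Run as a script, the example prints `takeover`: a secondary node that has not heard from the primary for 4 seconds exceeds the 3-second deadtime.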

Node That Loses Primary Status

If the node that loses primary status has not failed completely, its Heartbeat daemon tries to switch it to the secondary state by performing the following steps in order:

- Monitor daemon is stopped
- AggreGate Server is stopped
- MySQL is stopped
- Shared file system is unmounted
- Shared DRBD device is switched to the Secondary role
- Alias IP address is removed from the node's network interface

Node That Gets Primary Status

During role transfer, the Heartbeat daemon on the node that becomes primary performs the following steps in order:

- Alias IP address is added to the node's network interface
- Shared DRBD device is switched to the Primary (read-write) role
- Shared file system is mounted
- MySQL is launched
- AggreGate Server is launched
- Monitor daemon is launched
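The order of these steps is determined by the haresources file from step 4e. Resources listed after the node name are acquired from left to right when a node becomes primary and released from right to left when it gives up that role. The Python sketch below only illustrates that ordering rule on the haresources line from this example. It does not talk to Heartbeat, and the function names are hypothetical.

```python
from __future__ import annotations


def parse_haresources(line: str) -> tuple[str, list[str]]:
    """Split one haresources line into (preferred node, resources in listed order)."""
    node, *resources = line.split()
    return node, resources


def start_order(resources: list[str]) -> list[str]:
    """Resources are acquired left to right when the node becomes primary."""
    return list(resources)


def stop_order(resources: list[str]) -> list[str]:
    """Resources are released right to left when the node gives up the primary role."""
    return list(reversed(resources))


if __name__ == "__main__":
    line = ("l0 IPaddr2::192.168.1.100/24/eth1:0 drbddisk::r0 "
            "Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysqld-max linkserv mon")
    node, resources = parse_haresources(line)
    print("preferred node:", node)
    for step, res in enumerate(start_order(resources), 1):
        print(f"start {step}: {res}")
    for step, res in enumerate(stop_order(resources), 1):
        print(f"stop  {step}: {res}")
```

On the example line, the start sequence begins with IPaddr2 (the alias IP address) and ends with mon, and the stop sequence is its exact reverse. As a result, MySQL is never running on a node whose DRBD device is not primary.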

Failover Cluster Setup

This section illustrates how to set up AggreGate for failover on two machines running Linux Mandrake. Although it covers the configuration of all necessary services, the exact commands and scripts may differ in your environment.

1. Installing DRBD

Install and configure DRBD on both machines.

1a. Install DRBD using URPMI:

urpmi drbd-utils

1b. Create a new partition for the DRBD shared disk in free disk space:

cfdisk /dev/sda

Partition type: Linux (0x83).

Partition size must match on both machines.

In our example, the name of the newly created partition is /dev/sda7.

1c. If you have a firewall, allow incoming connections to TCP port 7791, which DRBD uses for replication traffic.

1d. Reboot the machine.

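1e. Create a DRBD configuration in /etc/drbd.conf. Here we assume that the first machine has host name l0 and IP address 192.168.1.2, while the second machine has host name l1 and IP address 192.168.1.3. The host names in the on sections must match the output of uname -n. The configuration file is the same on both machines: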

DRBD Configuration File (/etc/drbd.conf)

global {
    # You might disable one of drbdadm's sanity checks.
    # disable-ip-verification;
    usage-count no;
}

common {
    syncer { rate 10M; }
}

resource r0 {
    protocol C;

    startup { wfc-timeout 0; degr-wfc-timeout 120; }
    disk { on-io-error detach; }
    meta-disk internal;
    syncer {
    }

    on l0 {
        device  /dev/drbd0;
        disk    /dev/sda7;
        address 192.168.1.2:7791;
    }

    on l1 {
        device  /dev/drbd0;
        disk    /dev/sda7;
        address 192.168.1.3:7791;
    }
}

1f. Initialize the DRBD metadata for the resource and start the DRBD driver:

drbdadm create-md r0

service drbd restart

1g. Check DRBD status on both machines:

cat /proc/drbd

At this point, the status on both machines should be Secondary/Secondary and Inconsistent/Inconsistent.

1h. Now make the first server primary:

Node 1: drbdadm -- --overwrite-data-of-peer primary r0

Node 2: drbdadm secondary r0

Now the DRBD status on Node 1 should be Primary/Secondary, UpToDate/Inconsistent.

1i. Start the synchronization process:

Node 1: drbdadm adjust all

1j. Give DRBD some time to synchronize data and check the status again:

Node 1: cat /proc/drbd

The status should now be Primary/Secondary, UpToDate/UpToDate, which means that the DRBD disk is ready for use.
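When this check has to be scripted, the relevant fields can be extracted from /proc/drbd. The Python sketch below assumes the DRBD 8.x line format: roles appear after `ro:` (older 8.x releases print `st:`) and disk states after `ds:`. Other DRBD versions format this file differently. The function names and the sample line are illustrative.

```python
from __future__ import annotations

import re

# DRBD 8.x device line, e.g.
# " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----"
DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+(?:ro|st):(?P<roles>\S+)\s+ds:(?P<disks>\S+)"
)


def parse_drbd_status(text: str) -> dict[int, dict[str, str]]:
    """Return connection state, roles and disk states for every DRBD minor in the text."""
    devices: dict[int, dict[str, str]] = {}
    for line in text.splitlines():
        m = DEVICE_LINE.match(line)
        if m is None:
            continue  # version banner, statistics and other lines
        local_role, peer_role = m.group("roles").split("/")
        local_disk, peer_disk = m.group("disks").split("/")
        devices[int(m.group("minor"))] = {
            "cs": m.group("cs"),
            "role": local_role,
            "peer_role": peer_role,
            "disk": local_disk,
            "peer_disk": peer_disk,
        }
    return devices


def is_operable(dev: dict[str, str]) -> bool:
    """True for the healthy state from step 1j: Primary/Secondary, UpToDate/UpToDate."""
    return (
        dev["role"] == "Primary"
        and dev["peer_role"] == "Secondary"
        and dev["disk"] == "UpToDate"
        and dev["peer_disk"] == "UpToDate"
    )


if __name__ == "__main__":
    # On a cluster node, pass open("/proc/drbd").read() instead of this sample
    sample = " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----\n"
    for minor, dev in parse_drbd_status(sample).items():
        print(minor, dev, "operable" if is_operable(dev) else "not ready")
```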

2. Creating shared disk

2a. Create a file system on the DRBD device:

Node 1: mkfs.ext3 /dev/drbd0

2b. Create the mount point. Do not add it to the list of partitions mounted on boot (/etc/fstab), because Heartbeat mounts the shared file system on the primary node:

Both nodes: mkdir /mnt/drbd

2c. Mount the shared file system on the first node:

Node 1: mount -t ext3 /dev/drbd0 /mnt/drbd

3. Installing MySQL and AggreGate Server

3a. Install the MySQL packages using the urpmi utility:

Both nodes: urpmi mysql-max

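3b. Move the MySQL data directory to the DRBD disk and replace it with a symlink. The mv and ln commands are also needed on Node 2 so that MySQL finds its data on the shared disk after failover. cp -a preserves the file ownership and permissions that MySQL requires:

Node 1: cp -a /var/lib/mysql /mnt/drbd/

Both nodes: mv /var/lib/mysql /var/lib/mysql.bak

Both nodes: ln -s /mnt/drbd/mysql /var/lib/mysql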

3c. Start MySQL:

service mysqld-max start

3d. Add a system user for running AggreGate Server:

Both nodes: useradd linkserv

3e. Install AggreGate Server by running the installer and following its instructions. In our example, the installation directory is /mnt/drbd/linkserver.

3f. Create a new MySQL user account for AggreGate Server and configure AggreGate Server to use the MySQL database.

3g. Copy the linkserv script from the AggreGate Server installation directory to /etc/rc.d/init.d on both nodes. Heartbeat starts the server through this script on whichever node is primary.

3h. Test the installation by running service linkserv start, waiting until the server starts, and then stopping it with service linkserv stop. Check the server log file for errors.

3i. Create a symlink for AggreGate Server:

ln -s /mnt/drbd/linkserver /home/linkserv/linkserver

4. Installing Heartbeat and heartbeat-drbd on both nodes

4a. Install heartbeat:

Both nodes: urpmi heartbeat

Both nodes: urpmi heartbeat-drbd

4b. Switch off boot-time autostart for MySQL and AggreGate Server:

Both nodes: chkconfig mysqld-max off

Both nodes: chkconfig linkserv off

4c. Switch on autostart for heartbeat:

Both nodes: chkconfig heartbeat on

4d. Create the Heartbeat configuration in /etc/ha.d/ha.cf (the file is identical on both nodes):

Heartbeat configuration (/etc/ha.d/ha.cf)

#       keepalive: how many seconds between heartbeats
keepalive 1
#
#       deadtime: seconds-to-declare-host-dead
deadtime 3
#
#       hopfudge: maximum hop count minus number of nodes in config
hopfudge 1
#
#       What interfaces to heartbeat over?
bcast   eth1
auto_failback on
#
#       File to write debug messages to
debugfile /var/log/ha-debug
#
#       File to write other messages to
#
logfile /var/log/ha-log
#
#       Facility to use for syslog()/logger (alternative to log/debugfile)
#
logfacility     local0
#
#       Tell what machines are in the cluster
#       node    nodename ...    -- must match uname -n
node    l0
node    l1
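Both nodes must list the same node names, and each name must match uname -n on its machine. The following Python sketch is a hypothetical helper, not part of Heartbeat. It parses ha.cf directives (ignoring comments) and reports two simple problems: a deadtime that is not larger than keepalive, and a local host name that is missing from the node list. The host name comes from `platform.node()`, which reports the same name as uname -n on Linux. The sketch assumes plain numeric second values, as in this example.

```python
from __future__ import annotations

import platform


def parse_ha_cf(text: str) -> dict[str, list[str]]:
    """Map every ha.cf directive to the list of its values (directives such as node may repeat)."""
    directives: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        parts = line.split(None, 1)  # split on the first run of spaces or tabs
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(parts[0], []).append(value)
    return directives


def cluster_nodes(directives: dict[str, list[str]]) -> list[str]:
    """Collect node names; a single node line may also list several names."""
    return [name for value in directives.get("node", []) for name in value.split()]


def check(directives: dict[str, list[str]], local_name: str) -> list[str]:
    """Return human-readable problems found in the parsed configuration."""
    problems = []
    if "keepalive" in directives and "deadtime" in directives:
        # Assumes plain seconds; deadtime should exceed keepalive,
        # otherwise a single late heartbeat would declare the peer dead
        keepalive = float(directives["keepalive"][0])
        deadtime = float(directives["deadtime"][0])
        if deadtime <= keepalive:
            problems.append(f"deadtime {deadtime:g}s is not larger than keepalive {keepalive:g}s")
    if local_name not in cluster_nodes(directives):
        problems.append(f"local host {local_name!r} is not listed in any node directive")
    return problems


if __name__ == "__main__":
    sample = "keepalive 1\ndeadtime 3\nbcast eth1\nnode l0\nnode l1\n"
    found = check(parse_ha_cf(sample), platform.node())
    print("\n".join(found) or "ha.cf looks consistent")
```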

 

4e. Configure Heartbeat resources. The haresources file must be identical on both nodes. Its first field (l0) names the node that normally holds the resources:

Heartbeat resources (/etc/ha.d/haresources) on both nodes

l0 IPaddr2::192.168.1.100/24/eth1:0 drbddisk::r0 Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysqld-max linkserv mon

Note that 192.168.1.100 is the alias IP address that migrates between cluster nodes upon failover. It is always assigned to the current primary node.
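Before starting Heartbeat, it is worth confirming that the alias lies in the same subnet as the node addresses and is not one of them. The sketch below does this with Python's standard ipaddress module, using the node addresses from step 1e. It assumes those addresses live on the same network as the alias, and it takes only the address/prefix part (192.168.1.100/24) of the IPaddr2 argument. `check_alias` is a hypothetical helper name.

```python
from __future__ import annotations

import ipaddress


def check_alias(alias_cidr: str, node_ips: list[str]) -> list[str]:
    """Return problems with the alias IP; an empty list means it looks usable."""
    alias = ipaddress.ip_interface(alias_cidr)  # e.g. 192.168.1.100/24
    problems = []
    for ip in map(ipaddress.ip_address, node_ips):
        if ip not in alias.network:
            problems.append(f"{ip} is outside {alias.network}")
        if ip == alias.ip:
            problems.append(f"{ip} is a node's fixed address, not a free alias")
    if alias.ip in (alias.network.network_address, alias.network.broadcast_address):
        problems.append(f"{alias.ip} is the network or broadcast address")
    return problems


if __name__ == "__main__":
    print(check_alias("192.168.1.100/24", ["192.168.1.2", "192.168.1.3"]) or "alias OK")
```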

5. Monitoring service setup

5a. Install the monitoring service (mon):

Both nodes: urpmi mon

5b. Switch off boot-time autostart for mon (Heartbeat starts it on the primary node):

Both nodes: chkconfig mon off

5c. Create the "service down" alert script, which hands all resources over to the other node:

Both nodes: echo -ne '#!/bin/bash\n/usr/lib/heartbeat/hb_standby all\n' > /etc/mon/ha-down.alert

Then make it executable:

Both nodes: chmod 0755 /etc/mon/ha-down.alert

5d. Configure monitored services in /etc/mon/mon.cf on both nodes.

The following ports need to be monitored:

- TCP port 3306 for MySQL
- TCP ports 6450 (Device Servers), 6460 (Clients), 6480 (Agents) and 8443 (Embedded Web Server) for AggreGate Server

Monitored services configuration (/etc/mon/mon.cf)

#
# Basic mon.cf file
#
# Global options
#
cfbasedir   = /etc/mon
pidfile     = /var/run/mon.pid
statedir    = /var/lib/mon/state.d
logdir      = /var/lib/mon/log.d
dtlogfile   = /var/lib/mon/log.d/downtime.log
#alertdir    = /usr/lib/mon/alert.d
alertdir    = /etc/mon/
mondir      = /usr/lib/mon/mon.d
maxprocs    = 20
histlength  = 100
randstart   = 60s
authtype    = pam
userfile    = /etc/mon/userfile

#
# Group definitions (hostnames or IP addresses)
#
hostgroup servers localhost

watch servers
    service agg0
        interval 30s
        monitor tcp.monitor -p 6450 localhost
        period wd {Sun-Sat}
            alert ha-down.alert
            alertafter 2
    service agg1
        interval 30s
        monitor tcp.monitor -p 6460 localhost
        period wd {Sun-Sat}
            alert ha-down.alert
            alertafter 2
    service agg2
        interval 30s
        monitor tcp.monitor -p 6480 localhost
        period wd {Sun-Sat}
            alert ha-down.alert
            alertafter 2
    service agg3
        interval 30s
        monitor tcp.monitor -p 8443 localhost
        period wd {Sun-Sat}
            alert ha-down.alert
            alertafter 2
    service mysql
        interval 10s
        monitor msql-mysql.monitor --mode mysql --username=linkserv --password=pass --database=linkserv localhost
        period wd {Sun-Sat}
            alert ha-down.alert
            alertafter 2
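The four AggreGate Server stanzas differ only in service name and port, so they can be generated instead of typed by hand. This Python sketch is a hypothetical convenience script, not part of mon. It reproduces the tcp.monitor stanzas of the configuration above, with interval, period, alert and alertafter values copied from this example, and prints text for the `watch servers` block. Keep in mind that mon ends a watch definition at a blank line, so the generated stanzas must not be separated by blank lines.

```python
AGGREGATE_PORTS = {
    "agg0": 6450,  # Device Servers
    "agg1": 6460,  # Clients
    "agg2": 6480,  # Agents
    "agg3": 8443,  # Embedded Web Server
}


def tcp_service_stanza(name: str, port: int, interval: str = "30s",
                       alert: str = "ha-down.alert", alertafter: int = 2) -> str:
    """Render one mon 'service' block that watches a TCP port on localhost."""
    return "\n".join([
        f"    service {name}",
        f"        interval {interval}",
        f"        monitor tcp.monitor -p {port} localhost",
        "        period wd {Sun-Sat}",
        f"            alert {alert}",
        f"            alertafter {alertafter}",
    ])


if __name__ == "__main__":
    print("\n".join(tcp_service_stanza(n, p) for n, p in AGGREGATE_PORTS.items()))
```

Assuming mon's alertafter counts consecutive failures, these values (a 30-second interval and alertafter 2) make an unreachable AggreGate Server port trigger ha-down.alert roughly a minute after it stops answering.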

 


The Monitor daemon can also be configured to send e-mail or SMS notifications upon node failure. See the mon documentation for details.

5e. Start Heartbeat:

Both nodes: service heartbeat start

If your setup is correct, the AggreGate cluster should become operational within a few minutes.

Resolving a Split-Brain

Split-brain is a situation in which the cluster nodes lose contact with each other for a significant time, for example because of a network problem.

In this case, each node may assume the primary role and write to its own copy of the DRBD device, so the copies go out of sync.

The following commands can be used to resynchronize DRBD manually after a split-brain.

This example assumes that the data on Node 1 is kept and the data on Node 2 is discarded:

Node 2: drbdadm secondary all

Node 2: drbdadm disconnect all

Node 2: drbdadm -- --discard-my-data connect all

Node 1: drbdadm connect all
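Running these commands on the wrong node discards the good data. It can therefore help to generate the exact per-node sequence from a single decision about which node survives. The Python sketch below only builds and prints the commands shown above and never runs drbdadm. `split_brain_recovery` is a hypothetical helper, and `all` is the resource argument used in this example.

```python
from __future__ import annotations


def split_brain_recovery(survivor: str, victim: str,
                         resource: str = "all") -> list[tuple[str, str]]:
    """Return (node, command) pairs that keep the survivor's data and discard the victim's."""
    return [
        # The victim gives up the primary role (its DRBD file system must be unmounted first)
        (victim, f"drbdadm secondary {resource}"),
        (victim, f"drbdadm disconnect {resource}"),
        # Reconnect the victim while discarding its diverged changes
        (victim, f"drbdadm -- --discard-my-data connect {resource}"),
        # Reconnect the survivor; resynchronization then runs from it to the victim
        (survivor, f"drbdadm connect {resource}"),
    ]


if __name__ == "__main__":
    for node, command in split_brain_recovery("Node 1", "Node 2"):
        print(f"{node}: {command}")
```

Called with survivor "Node 1" and victim "Node 2", the script prints the four commands above in the same order.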