OS-Based Failover Cluster
Since version 4.43, AggreGate Server features integrated failover clustering support for ensuring high availability of services. In older versions of the server, however, clustering support depended on operating system facilities.
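This appendix describes how to build a failover cluster using Linux operating system facilities: DRBD for disk replication and Linux Heartbeat for cluster node management.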
Introduction
This article explains how to set up an AggreGate failover cluster in order to achieve uninterrupted service.
Failover configuration features:
- Two physical servers running the Linux operating system
- Reliable high-speed network connections between the cluster nodes and to the outside world
- AggreGate Server installed on both nodes
- MySQL Server installed on both nodes
- Shared disk space containing AggreGate Server configuration and database data
- Linux Heartbeat software used to manage cluster nodes (Linux Heartbeat is part of the High Availability Linux project)
Benefits of the described cluster configuration:
- Replication of disk data between nodes ensures resistance to HDD failures
- Automatic transfer of the IP address transparently switches Clients, Agents and third-party applications to the secondary node if the primary node fails
- Maximum service downtime does not exceed several minutes (less than one minute in most cases)
- Different custom rules and timeouts may be applied to AggreGate Server and database server failure detection
Failover Cluster Configuration Outline
The image below shows the proposed cluster configuration. There are two nodes, Node 1 (the primary node in the image) and Node 2 (the secondary node), connected to each other via a Gigabit Ethernet link.
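Disk Sharing
Both nodes have access to common disk space that is shared using Distributed Replicated Block Device (DRBD) technology. The primary node has full read-write access to this disk, while the secondary node only receives the data replicated from the primary node and does not use the disk directly (DRBD does not allow applications to access a device in the secondary role). This disk contains the following data:
- MySQL database data
- AggreGate Server installation, including its configuration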
During primary role transfer from one node to another, DRBD is switched to the primary (read-write) role on the new primary node, and to the secondary role on the new secondary node (if it has not completely failed).
Database Failover
The described configuration uses MySQL as the database server for AggreGate, but other SQL databases may be configured in the same way. MySQL Server is installed on both nodes, but it stores all of its data in the shared disk space. MySQL runs on the primary node and is stopped on the secondary node. During role transfer, MySQL is launched on the new primary node and stopped on the new secondary node (if it has not completely failed).
AggreGate Server Failover
The AggreGate Server failover mechanism is very similar to database failover: AggreGate Server is installed in the shared disk space, runs on the primary node and is stopped on the secondary one. During role transfer, AggreGate Server is stopped on the old primary node (if it has not completely failed) and launched on the new primary node.
Heartbeat
The heartbeat daemon runs on both the primary and secondary nodes and manages the cluster nodes in real time. The two heartbeat daemons maintain a constant link with one another. On the primary node, the heartbeat daemon runs a Monitor daemon (mon) that constantly polls the state of MySQL and AggreGate Server by checking the availability of certain TCP ports.
IP Address Management
Every node in the cluster has a dedicated fixed IP address. The primary node also has an alias IP address, which Clients, Agents and third-party applications use to access AggreGate Server. During role transfer, the heartbeat daemon removes this alias IP address from the old primary node's network interface and adds it to the new primary node's network interface. This transparently switches all Clients and Agents to the new primary node.
Role Transfer
Role transfer is the process of passing the primary cluster node role to the other node.
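It occurs in the following cases:
- The primary node fails, and the heartbeat daemon on the secondary node stops receiving its heartbeats
- The Monitor daemon on the primary node detects that MySQL or AggreGate Server is no longer available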
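Node That Loses Primary Status
If the node that lost primary status did not fail completely, its heartbeat daemon does its best to switch it to secondary:
- Stops AggreGate Server
- Stops MySQL Server
- Removes the alias IP address from its network interface
- Unmounts the shared disk and switches DRBD to the secondary role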
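Node That Gets Primary Status
The heartbeat daemon on the node that becomes primary does the following during role transfer:
- Switches DRBD to the primary role and mounts the shared disk
- Adds the alias IP address to its network interface
- Starts MySQL Server
- Starts AggreGate Server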
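Role transfer after a service failure depends on the Monitor daemon's TCP port checks. The bash function below is a minimal sketch of such a check, not mon's actual implementation; it assumes bash was built with /dev/tcp support and that coreutils `timeout` is available, and the function name `port_is_open` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mon or Heartbeat): succeeds if a TCP
# connection to HOST:PORT can be opened within TIMEOUT seconds (default 3).
# Relies on bash's /dev/tcp redirection and coreutils `timeout`.
port_is_open() {
  local host=$1 port=$2 timeout_s=${3:-3}
  # Open a connection on file descriptor 3; the child shell then exits,
  # closing it. Errors such as "Connection refused" are discarded.
  timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example: report the state of the local MySQL port (3306 is MySQL's default).
if port_is_open 127.0.0.1 3306; then
  echo "MySQL port is accepting connections"
else
  echo "MySQL port is not reachable"
fi
```

Such a check only proves that something accepts connections on the port, not that the service behind it is healthy, which is why the timeouts and rules for each service can be tuned separately.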
Failover Cluster Setup
This section illustrates how to set up AggreGate for failover on two machines running Linux Mandrake. While it includes tips on how to configure all necessary services, the exact commands and scripts will differ in a real environment.
1. Installing DRBD
Install and configure DRBD on both machines.
1a. Install DRBD using URPMI:
urpmi drbd-utils
1b. Create a new partition for the DRBD shared disk in the free HDD space:
cfdisk /dev/sda
Partition type: Linux (0x83). The partition size must match on both machines. In our example, the newly created partition is /dev/sda7.
1c. If you have a firewall, allow incoming connections to TCP port 7791.
1d. Reboot the machine.
1e. Create a DRBD configuration in /etc/drbd.conf. Here we assume that the IP address of the first machine is 192.168.1.2, and the IP address of the second machine is 192.168.1.3. The configuration file is the same on both machines:
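As an illustration only, a minimal DRBD 8.x resource definition matching the values used in this section (resource r0, partition /dev/sda7, device /dev/drbd0, port 7791 and the two node IP addresses) could be written as follows. The host names node1 and node2 are hypothetical and must match the output of `uname -n` on each machine; synchronous protocol C and internal metadata are assumptions.

```shell
# Both nodes. Hypothetical /etc/drbd.conf sketch for DRBD 8.x.
cat > /etc/drbd.conf <<'EOF'
resource r0 {
  # Protocol C: a write completes only after both nodes have stored it
  protocol C;

  on node1 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   192.168.1.2:7791;
    meta-disk internal;
  }

  on node2 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   192.168.1.3:7791;
    meta-disk internal;
  }
}
EOF
```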
1f. Start the DRBD driver and create the virtual disk:
service drbd restart
drbdadm create-md r0
1g. Check DRBD status on both machines:
cat /proc/drbd
At this point, the status on both machines should be Secondary/Secondary and Inconsistent/Inconsistent.
1h. Now make the first server primary:
Node 1: drbdadm -- --overwrite-data-of-peer primary r0
Node 2: drbdadm secondary r0
The DRBD status on node 1 should now be Primary/Secondary, UpToDate/Inconsistent.
1i. Start the synchronization process:
Node 1: drbdadm adjust all
1j. Give DRBD some time to synchronize data and check the status again:
Node 1: cat /proc/drbd
The status should be Primary/Secondary, UpToDate/UpToDate, which means that the DRBD disk is operable.
2. Creating the shared disk
2a. Create a file system on the DRBD device:
Node 1: mkfs.ext3 /dev/drbd0
2b. Create a mount point and add it to the list of partitions mounted on boot:
Both nodes: mkdir /mnt/drbd
2c. Mount the shared file system on the first node:
Node 1: mount -t ext3 /dev/drbd0 /mnt/drbd
3. Installing MySQL and AggreGate Server
3a. Install the MySQL packages using the URPMI utility:
urpmi mysql-max
3b. Move the MySQL tables to the DRBD disk and create a symlink (cp -a preserves the ownership and permissions MySQL needs):
cp -a /var/lib/mysql /mnt/drbd/
mv /var/lib/mysql /var/lib/mysql.bak
ln -s /mnt/drbd/mysql /var/lib/mysql
3c. Start MySQL:
service mysqld-max start
3d. Add a system user for running AggreGate Server:
useradd linkserv
3e. Install AggreGate Server by running the installer and following its instructions. In our case, the installation directory is /mnt/drbd/linkserver.
3f. Create a new MySQL user account for AggreGate Server and configure AggreGate Server to use the MySQL database.
3g. Copy the linkserv script from the AggreGate Server installation directory to /etc/rc.d/init.d.
3h. Test the installation by executing service linkserv start, waiting until the server starts, and stopping it using service linkserv stop. Check the server log file for errors.
3i. Create a symlink for AggreGate Server:
ln -s /mnt/drbd/linkserver /home/linkserv/linkserver
4. Installing Heartbeat and Heartbeat-drbd on both nodes
4a. Install Heartbeat:
Both nodes: urpmi heartbeat
Both nodes: urpmi heartbeat-drbd
4b. Switch off boot-time autostart for MySQL and AggreGate Server:
Both nodes: chkconfig mysqld-max off
Both nodes: chkconfig linkserv off
4c. Switch on autostart for Heartbeat:
Both nodes: chkconfig heartbeat on
4d. Create the Heartbeat configuration in /etc/ha.d/ha.cf (file contents match on both nodes):
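For illustration, a minimal Heartbeat (v1-style) ha.cf for two nodes is sketched below together with the authkeys file that Heartbeat also requires. The node names node1 and node2, the interface eth0, the timing values and the shared secret are all assumptions to adapt; deadtime is where the failure-detection timeout for a whole node is set.

```shell
# Both nodes. Hypothetical /etc/ha.d/ha.cf sketch.
cat > /etc/ha.d/ha.cf <<'EOF'
logfacility     local0
# Seconds between heartbeats
keepalive       2
# Declare the peer dead after 30 seconds without heartbeats
deadtime        30
warntime        10
# Extra grace period after boot before declaring the peer dead
initdead        120
udpport         694
bcast           eth0
# Do not move resources back automatically when the old primary returns
auto_failback   off
node            node1
node            node2
EOF

# Heartbeat refuses to start unless authkeys is readable by root only.
# The secret below is a placeholder.
cat > /etc/ha.d/authkeys <<'EOF'
auth 1
1 sha1 ChangeThisSecret
EOF
chmod 600 /etc/ha.d/authkeys
```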
4e. Configure heartbeat resources:
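A possible /etc/ha.d/haresources line for this layout is sketched below. It assumes that node1 is the preferred primary, that the drbddisk resource agent shipped for DRBD/Heartbeat integration is installed, and that the alias address uses a /24 netmask on eth0; the mysqld-max and linkserv init scripts come from the steps above, and mon is started by Heartbeat because its own autorun is switched off in step 5b.

```shell
# Both nodes (identical file). Hypothetical haresources sketch.
# Heartbeat starts these resources left to right on the primary node
# and stops them right to left when the node gives up the primary role.
cat > /etc/ha.d/haresources <<'EOF'
node1 drbddisk::r0 Filesystem::/dev/drbd0::/mnt/drbd::ext3 IPaddr::192.168.1.100/24/eth0 mysqld-max linkserv mon
EOF
```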
Note that 192.168.1.100 is the alias IP address that migrates between cluster nodes upon failover. This address will be the alias IP address of the primary node.
5. Monitoring service setup
5a. Install the monitoring service:
Both nodes: urpmi mon
5b. Switch off autorun for mon:
Both nodes: chkconfig mon off
5c. Create the "service down" alert script (single quotes prevent bash from treating "!" as a history expansion):
Both nodes: echo -ne '#!/bin/bash\n/usr/lib/heartbeat/hb_standby all\n' > /etc/mon/ha-down.alert
Make it executable:
Both nodes: chmod 0755 /etc/mon/ha-down.alert
5d. Configure monitored services in /etc/mon/mon.cf on both nodes. We need to watch the TCP ports used by MySQL and AggreGate Server.
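As an illustration only, a mon.cf that checks the MySQL port and an AggreGate Server port on the local node and calls the ha-down.alert script on failure might look like the following. 3306 is MySQL's default port; the mondir path varies between distributions, and 6460 is a placeholder to be replaced with the port your AggreGate Server actually listens on.

```shell
# Both nodes. Hypothetical /etc/mon/mon.cf sketch.
cat > /etc/mon/mon.cf <<'EOF'
alertdir  = /etc/mon
mondir    = /usr/lib/mon/mon.d
hostgroup local localhost

watch local
    service mysql
        interval 30s
        monitor tcp.monitor -p 3306
        period wd {Sun-Sat}
            alert ha-down.alert
    # Placeholder port: use the port your AggreGate Server listens on
    service aggregate
        interval 30s
        monitor tcp.monitor -p 6460
        period wd {Sun-Sat}
            alert ha-down.alert
EOF
```

When either check fails, ha-down.alert runs hb_standby all, which makes Heartbeat hand the primary role to the other node.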
5e. Start Heartbeat:
Both nodes: service heartbeat start
If your setup is correct, the AggreGate cluster should become operational within several minutes.
Resolving a Split-Brain
Split-brain is a situation in which cluster nodes are disconnected from each other for a significant time, e.g. due to a network problem. In this case, the DRBD devices may lose sync. Here is an example of commands that can help resynchronize DRBD manually after a split-brain situation. We assume that the data from node 1 should be kept and the data from node 2 discarded:
Node 2: drbdadm secondary all
Node 2: drbdadm disconnect all
Node 2: drbdadm -- --discard-my-data connect all
Node 1: drbdadm connect all
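After starting Heartbeat or resolving a split-brain, it helps to confirm from /proc/drbd that the nodes are connected and both disks are UpToDate. The bash functions below are a minimal sketch with hypothetical names; they assume the DRBD 8.x /proc/drbd layout, where the device line contains cs:, ro: (st: in older releases) and ds: fields. A cs:StandAlone state is a typical symptom of split-brain.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading DRBD state from /proc/drbd-style text.

# Print the value of the first "KEY:value" field found on stdin,
# e.g. `drbd_field cs` prints "Connected" for a connected resource.
drbd_field() {
  awk -v key="$1" '{
    for (i = 1; i <= NF; i++)
      if (index($i, key ":") == 1) { print substr($i, length(key) + 2); exit }
  }'
}

# Succeed only if the resource is connected and both disks are UpToDate.
drbd_in_sync() {
  local text
  text=$(cat)
  [ "$(printf '%s\n' "$text" | drbd_field cs)" = "Connected" ] &&
    [ "$(printf '%s\n' "$text" | drbd_field ds)" = "UpToDate/UpToDate" ]
}

# Usage on a cluster node:
#   drbd_in_sync < /proc/drbd && echo "DRBD is in sync"
#   drbd_field cs < /proc/drbd   # "StandAlone" suggests a split-brain
```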