RHEL Cluster

Once RH Cluster is installed, you'll find the tools in /usr/sbin. Here's the breakdown of the available tools:

* /usr/sbin/clustat: Display the current status of the cluster. (Sun Cluster equiv: scstat)
* /usr/sbin/clusvcadm: Change the state of the services, such as enabling, disabling, relocating, etc. (Sun Cluster equiv: scswitch)
* /usr/sbin/system-config-cluster: A Cluster Configuration GUI. It simplifies the creation of a cluster.conf as well as acting as a GUI management interface.

There are two things to note if you're new to Red Hat Cluster. First, you need a Fence Device (you can go without one, but it's highly frowned upon and unsupported by many vendors). Second, you do not require shared storage. The device typically used as a Fence Device is an APC MasterSwitch: in the event that a node is unresponsive, a surviving node can (don't laugh) power cycle its partner. This method is also apparently used in some failover situations to ensure that the node wasn't doing anything it shouldn't have been doing prior to failover. Other clusters typically require a quorum device, but RH Cluster does not (new in version 4, apparently), which means you don't require shared storage for cluster operation at all. That can be a benefit if you don't actually need to store anything on shared storage.

The cluster configuration is stored in a single XML file: /etc/cluster/cluster.conf. You can configure a new cluster by creating cluster.conf by hand, reusing a pre-existing one, or using the /usr/sbin/system-config-cluster GUI tool. Using the GUI is, of course, the supported method.

Cluster configuration consists of the following components:

* Cluster Nodes: Nodes that are members of the cluster. Also specified here are the number of votes each node has and which fence device port controls that node.
* Fence Devices: One or more fence devices, such as an APC MasterSwitch, including the IP address, username, and password used to log in to and control the device.
* Failover Domains: Logical groupings of nodes that can fail over to each other.
* Shared Resources: Resources used by a cluster service, such as a GFS or other shared filesystem, an IP address, an NFS mount, a script, or a Samba service.
* Services: An HA service provided by the cluster, which combines shared resources within a failover domain, utilizing one or more nodes and their associated fence device.
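Tying those components together, a minimal cluster.conf might look roughly like this. The node names, address, and credentials are all made up for illustration, and the exact schema varies between releases, so treat this as a sketch, not a reference:

```xml
<?xml version="1.0"?>
<cluster name="webcluster" config_version="1">
  <clusternodes>
    <!-- Each node carries its vote count and the fence port that controls it -->
    <clusternode name="node1.example.com" votes="1">
      <fence>
        <method name="1">
          <device name="apc1" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.example.com" votes="1">
      <fence>
        <method name="1">
          <device name="apc1" port="2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <!-- The APC MasterSwitch used to power cycle unresponsive nodes -->
  <fencedevices>
    <fencedevice name="apc1" agent="fence_apc"
                 ipaddr="10.0.0.5" login="apc" passwd="secret"/>
  </fencedevices>
</cluster>
```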

Perhaps the most important of these is the "Script" shared resource. This script is a standard RC script (such as those in /etc/init.d) that accepts at least three arguments: start, stop, and status (or monitor). When a cluster service is started, the appropriate node is selected and the shared resources are given to it, such as mounting a shared filesystem and assuming a shared IP address. It then runs the script to start the service. Then, every 30 seconds, it runs the script with the "status" argument to monitor whether or not the service is indeed still online. In the event of a graceful failover, the stop argument is given to the script to close the service before all the resources are moved to the new node and it is started there.
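A minimal sketch of such a script, assuming a made-up service called "mysvc" (the echo messages and pidfile path are illustrative; a real script would actually launch and kill a daemon):

```shell
#!/bin/sh
# Hypothetical RC-style script for an imaginary service, "mysvc".
PIDFILE=/var/run/mysvc.pid

start() { echo "starting mysvc"; }   # would launch the daemon here
stop()  { echo "stopping mysvc"; }   # would shut the daemon down here

status() {
    # rgmanager calls this every 30 seconds; a non-zero exit status
    # is what triggers recovery or failover of the whole service.
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "mysvc is running"
        return 0
    fi
    echo "mysvc is stopped"
    return 1
}

case "${1:-}" in
    start)          start ;;
    stop)           stop ;;
    status|monitor) status ;;
esac
```

The status/monitor exit code is the only health signal the cluster sees, which matters later in this post.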

The whole setup is pretty flimsy in comparison to other HA suites such as IBM's HACMP and Sun's Sun Cluster. It's akin to tying dental floss between two nodes. Using a network PDU is like holding a gun to the head of each node: answer me or else. You'll also notice that there are no explicit interconnects.

[root@zimbra4 cluster]# clustat
Member Status: Quorate

  Member Name                  Status
  ------ ----                  ------
  zimbra4.XX                   Online, Local, rgmanager
  zimbra5.XX                   Online, rgmanager
  zimbra6.XX                   Online, rgmanager

  Service Name                 Owner (Last)                 State
  ------- ----                 ----- ------                 -----
  webmail1.XX                  zimbra4.XX                   started
  webmail2.XX                  zimbra5.XX                   started

Although it might be flimsy, it does work well in some situations. Because you don't need explicit interconnects or a shared quorum device, very little pre-planning is needed for a simple cluster setup, so long as you've got a MasterSwitch handy. If, for instance, you wanted to set up an HA Apache service, you'd just use the /etc/init.d/httpd script, add a shared IP, share your htdocs/ on, say, an NFS mount point set up as a shared resource, edit your httpd.conf to point at the right htdocs/ directory, and you're basically done. Of course, when doing this, make sure you don't allow Apache to start up on boot by itself (chkconfig httpd off).
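In cluster.conf terms, the resource-manager section for that hypothetical Apache service might look something like this (the domain, service, and node names and the shared address are invented; again, a sketch rather than a schema reference):

```xml
<rm>
  <failoverdomains>
    <failoverdomain name="webdomain" restricted="1">
      <failoverdomainnode name="node1.example.com" priority="1"/>
      <failoverdomainnode name="node2.example.com" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <resources>
    <ip address="10.0.0.50" monitor_link="1"/>
    <script name="httpd" file="/etc/init.d/httpd"/>
  </resources>
  <!-- The service ties the shared IP and the RC script to the failover domain -->
  <service name="webservice" domain="webdomain" autostart="1">
    <ip ref="10.0.0.50"/>
    <script ref="httpd"/>
  </service>
</rm>
```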

So for small services it might work well, but would I run Oracle or DB2 on it? Not a chance in hell. Here are my gripes:

1. Shared IPs don't show up as IP aliases in ifconfig. This has got to be a bug. If a shared IP is present, I should see its address in ifconfig as eth0:1 or something, but you don't. This makes checking the current location of the address difficult (i.e., telnet/ssh to it and see where you end up). This seems to be due to the fact that RH Cluster doesn't tie shared IPs to specific interfaces, which, in and of itself, is problematic imho. Either way, it still would be nice if it showed up as "clu1" or something.
2. Shared IP address "drift". I have run into numerous problems with the shared IP just drifting to its failover partner. The shared storage doesn't move and the service itself doesn't move, just the IP, which means the service is effectively down, although the cluster is totally unaware of the problem (as checked with clustat). To resolve the issue I've got to disable the service completely and then restart it on the appropriate node (i.e., clusvcadm -d mysvc, then clusvcadm -e mysvc -m node1).
3. Unexpected shutdown of a service. Things are humming along fine and then I get a call from QA: the service is down. If it wasn't IP drift, it would be an unexpected failover or shutdown of the service. clustat may or may not know what's going on in these cases, and in the case of a failover it often reported that the service was still running on the previous (pre-failover) node when in fact it was not.

I just can't find anything to like about Red Hat Cluster Suite. If I wanted a light cluster solution I'd opt for something that's tried, true, and enterprise grade, such as Veritas Cluster Server. If you want a totally integrated and comprehensive clustering solution, Sun Cluster is the way to go, hands down, but that requires Solaris and thus doesn't really apply here.

I'm aware that some of the issues listed above may be unresolved bugs, some may be monitoring issues, etc. But this is supposed to be an enterprise-ready suite that I paid a lot of money for, and it just doesn't act like one. Some of these issues are possibly due to Zimbra's monitoring scripts, but regardless, I'm bothered that RH Cluster doesn't have a way to deal with these situations like a true solution (say, Sun Cluster or HACMP) does. Couple this with the fact that the documentation is some of the worst I've ever seen. Flip through the docs here.

UPDATE: I've been digging around the source for RH Cluster this afternoon. Apparently, although ifconfig won't show you the shared IP, ip (yes, that's a command) will. Example:

[root@zimbra4 cluster]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:50:8B:D3:8D:51
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::250:8bff:fed3:8d51/64 Scope:Link
          RX packets:25040869 errors:0 dropped:0 overruns:0 frame:0
          TX packets:18583752 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1373798465 (1.2 GiB)  TX bytes:893112790 (851.7 MiB)

eth1      Link encap:Ethernet  HWaddr 00:50:8B:D3:8D:5B
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

lo        Link encap:Local Loopback
          inet addr:  Mask:
          inet6 addr: ::1/128 Scope:Host
          RX packets:2757771 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2757771 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:296459762 (282.7 MiB)  TX bytes:296459762 (282.7 MiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP MTU:1480 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

[root@zimbra4 cluster]# ip addr list
1: lo: mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:8b:d3:8d:51 brd ff:ff:ff:ff:ff:ff
    inet brd scope global eth0
    inet scope global eth0    <--- That's the shared IP
    inet6 fe80::250:8bff:fed3:8d51/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc noop qlen 1000
    link/ether 00:50:8b:d3:8d:5b brd ff:ff:ff:ff:ff:ff
4: sit0: mtu 1480 qdisc noop
    link/sit brd

As for the drifting IP address problem... I started to wonder if it might be because of the way Red Hat Cluster monitors the interface. If it was doing a ping test, it would explain what I've been seeing, because the address would in fact be online, it just isn't on the right system. Looking at rgmanager/src/resources/ip.sh, it appears that this is exactly the problem. Why it's drifting in the first place, I can't say, but clearly Red Hat Cluster's method of monitoring the links is open to some serious issues.
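To illustrate the difference (the address below is made up, and this is only a sketch of the two styles of check, not the actual ip.sh logic): a ping-based check passes as long as *anything* answers for the address, including the failover partner, while checking the local address list would catch the drift.

```shell
#!/bin/sh
SHARED_IP=10.0.0.50   # made-up shared address for illustration

# The ping-style check: succeeds even if the *partner* node is the one
# holding the IP, which is why drift goes unnoticed.
if ping -c 1 -W 1 "$SHARED_IP" >/dev/null 2>&1; then
    echo "ping check: address answers (but from which node?)"
fi

# A stricter check: is the address actually configured on this node?
if ip -o addr show 2>/dev/null | grep -qw "$SHARED_IP"; then
    echo "local check: address is on this node"
else
    echo "local check: address is NOT on this node"
fi
```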