Let’s talk failover. Most failover tools (keepalived, heartbeat, wackamole/spread) rely on multicast, a way of sending one message to a whole group of machines at once. Multicast acts as a sort of “bulletin board” between computers: anybody on the network can read the board, and anybody on the network can post to it. Failover tools use it to pass messages between computers. For instance, you could have three computers on a network, all posting to and listening on the same multicast group: “Hey, I’m alive!” If one of the machines stops sending this repeated message, the others know that something is wrong: it has been disconnected, has gone down, etc. They can then act on that information. Was that computer hosting a shared IP? Give the IP to one of the computers that is still responding. This is the general idea behind IP-based failover.
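To make the “Hey, I’m alive!” idea concrete, here is a minimal sketch of the bookkeeping in Python. It is not how keepalived or heartbeat actually implement it; the class name, the 3-second timeout, the node names, and the “lowest name wins” takeover rule are all assumptions made up for illustration. No networking is involved, only the decision logic that runs once the messages arrive (or stop arriving).

```python
from dataclasses import dataclass, field


@dataclass
class FailureDetector:
    """Tracks when each node last said "I'm alive" (hypothetical helper)."""
    timeout: float = 3.0                      # assumed: seconds of silence before a node counts as down
    last_seen: dict = field(default_factory=dict)

    def heard_from(self, node: str, now: float) -> None:
        # Record the time of the latest heartbeat from this node.
        self.last_seen[node] = now

    def alive(self, now: float) -> list:
        # A node is alive if its last heartbeat is recent enough.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout)


def pick_owner(current_owner, alive_nodes):
    """Keep the shared IP where it is if its owner is alive; otherwise
    hand it to the first surviving node (an assumed tie-break rule)."""
    if current_owner in alive_nodes:
        return current_owner
    return alive_nodes[0] if alive_nodes else None


detector = FailureDetector(timeout=3.0)
for node in ("web1", "web2", "web3"):
    detector.heard_from(node, now=0.0)

# web1 goes quiet; the other two keep posting.
detector.heard_from("web2", now=5.0)
detector.heard_from("web3", now=5.0)

survivors = detector.alive(now=5.0)
print(survivors)                      # ['web2', 'web3']
print(pick_owner("web1", survivors))  # web2
```

Real tools add priorities, authentication and protection against two machines both claiming the IP, but the core loop is the same: listen, notice silence, move the address.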

Now, there’s no inherent problem with multicast. It’s generally known for being unreliable, but when all you’re sending is “Hi!” over the wire, data integrity isn’t a high priority. The problem in practice is that most “cloud” (VPS) providers (AWS, Linode, Slicehost, Rackspace, etc.) don’t support it on their networks. You can send a multicast message to a group, but your other machines listening on that group won’t hear it. The other problem is that the failover tools mentioned above ONLY support multicast. There is no way to tell them to talk to another machine directly over unicast, which cloud providers do support.

One way you can solve this is by using GRE tunnels, which let you create a tunnel to another computer with everything inside wrapped in ordinary unicast packets. (GRE only encapsulates traffic; it does not encrypt it.) This allows multicast communications to pass between two computers, even if the network normally blocks them.

I recently tried to get this set up on my current host, Linode. I was not successful, even with the help of another member who had the same problem (but solved it with GRE). I just could not get two machines to talk to each other over a GRE tunnel with keepalived.

The solution

I posted my question to serverfault.com as a last resort (video). I’d asked more or less the same question there before, but didn’t get the answer I wanted. This time, though, I hit the jackpot.

Willy Tarreau, creator of HAProxy, responded with a patch to keepalived that allows it to communicate over unicast. I applied it, recompiled, set up the new options the patch gives (“vrrp_unicast_bind” and “vrrp_unicast_peer”), and spun it up on both machines.

Yesss!! It works! Stopping HAProxy on the first server made the second machine take the shared IP.

Now, ideally there would be a bunch of machines, namely all my web servers, standing by ready to take the shared IP. This patch only allows me two machines. Failover is failover, though: when one instance goes down, I get an email and can go in and investigate. I’d still like to know if there is a way to do failover on a cluster of servers without multicast, but for now this works great.

Recently I read about compute clusters and how they’re used. I just had to try it. I successfully installed (from source) OpenAIS and Pacemaker (guide here) on Slackware 12. The experience was, overall, extremely smooth. I had a few hiccups I can only attribute to my not being able to follow directions, but with a few Makefile tweaks and some very small code tweaks, I got everything to compile and run.

Keep in mind, this is all using VirtualBox VMs, so once the cluster stuff is installed on one machine, I can more or less copy and paste and have a 3-computer Linux cluster running from the comfort of Windows 7 (it’s just for games, I swear!!). Aside from having to mess a bit with the networking in VirtualBox, everything was almost completely automatic.

The next step is to figure out DRBD and how it fits in with all this HA stuff. I’ve been trying to find a guide on using Pacemaker with MySQL, but no guide is written JUST for MySQL…it’s all MySQL with DRBD. I’d rather not complicate things too much until I can figure out how the hell this is all working.

Anyway, I’ll report back with my findings sometime soon.

Please note – at the time of installing (about 6 or 7 days ago), there was a bug in the Debian packaged distribution of OpenAIS/Pacemaker that makes the “expected votes” value of the quorum formula > 3 billion. Unless you have 3 billion machines lying around, you will NOT be able to start any resources unless you configure the cluster to ignore quorum. I’ve confirmed this bug on the IRC channel, and to my knowledge, it still exists.
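To see why a bogus “expected votes” value stops everything, here is a small Python sketch of the arithmetic. It assumes the usual simple-majority rule (a cluster has quorum when more than half of the expected votes are present); I have not checked the exact formula in the OpenAIS/Pacemaker source, and the function names and the round 3,000,000,000 figure are only for illustration.

```python
def votes_needed(expected_votes: int) -> int:
    # Smallest number of votes that is strictly more than half (assumed majority rule).
    return expected_votes // 2 + 1


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    # Quorum means a strict majority of the expected votes is online.
    return votes_present >= votes_needed(expected_votes)


# A healthy 3-node cluster: two nodes up is enough.
print(votes_needed(3))               # 2
print(has_quorum(2, 3))              # True

# The same 3 nodes when "expected votes" is reported as roughly 3 billion.
print(votes_needed(3_000_000_000))   # 1500000001
print(has_quorum(3, 3_000_000_000))  # False
```

With the inflated value, even all three nodes together fall about 1.5 billion votes short, which is why nothing starts until the cluster is told to ignore quorum.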

My advice is to compile from source (but that’s always my advice anyway :) )