In my latest frenzy, which was focused on HA more than performance, I installed some new servers, new services on those servers, and the general complexity of the entire setup for beeets.com doubled. I was trying to remember a utility that I saw a while back that would restart services if they failed. I checked my delicious account, praying that I had thought of my future self when I originally saw it. Luckily, I had saved it under my “linux” tag. Thanks, Andrew from the past.

The tool is called monit, and I’m surprised I ever lived without it. Not only does it monitor your services and keep them running, it can restart them if they fail, use too much memory/cpu, stop responding on a certain port, etc. Not only that, but it will email you every time something happens.

While perusing monit’s site, I saw M/Monit which allows you to monitor monit over web, essentially. The only thing I scratched my head about was that M/Monit uses port 8080 (which is fine) but NginX already uses port 8080, and I wasn’t about to change that, so I opened conf/server.xml and looked for 8080, replaced with 8082 (monit runs on 8081 =)). Then I reconfigured monit to communicate with M/Monit and vice versa, and now I have a kickass process monitor that alerts me when things go wrong, and also sends updates to a service that allows me to monitor the monitor.

I can’t look at things like queries/sec as I can with Cacti (which is awesome but a little clunky) but I can see which important services are running on each of my servers, and even restart them if I need to straight from M/Monit. The free download license allows to use M/Monit on one server, which is all I need anyway.

Great job monit team, you have gone above and beyond.

I decided this weekend I wanted to go down the road of trying out MySQL Cluster for beeets.com. The reason isn’t speed, it’s availability. After countless hours of research, I decided I’d rather have a plate of turds for breakfast than have to worry about Master-Master replication (or DRBD) w/heartbeat, not to mention what to do when things get out of sync. Not my cup of tea. MySQL Cluster may be a bit slower than a replicated setup (in almost all cases except for primary key lookup, I suspect), but to me it’s worth it to have a more set-it and forget-it approach. There are many benefits of cluster over replication:

  • Any server can go down. Assuming you have more than one replica of your data, you can lose any server in your setup and still be up and running. This can be achieved with replication, but it’s not as easy. You have to have some form of Master-Master replication, perhaps with DRDB, and some form of failover (usually heartbeat).
  • Your data  set scales. If you start running out of disk space with a cluster, just add a few more data nodes and your data will be spread out over them. With replication, each replicated server has to have enough storage to fit the entire database. That means if your dataset grows too large, you have to either partition (a hack, essentially) or upgrade your servers.
  • Your bandwidth scales. With a cluster, if you are running out of bandwidth, you can add more mysqld processes on your www servers or add more data nodes and your bandwidth scales almost linearly. With replication, you can only add so many slaves before your writes are the bottleneck. Then, once again, you have to look into things like circular replication (dangerous) or partitioning your data set (large updates to your app unless you have an insanely good ORM, big infrastructure change).

These are the main points that helped me decide. Historically, with a clustered approach, the entire dataset would have to fit in the memory of all the data nodes, which is somewhat restrictive if the dataset gets too large. Nowadays, the cluster only needs to store indexes in memory, and can store all non-indexed data on disk. There is talk of having completely disk-based store as well.

All that being said, I set up cluster, which was surprisingly easy. I’m not going to go over how to set it up or anything, just read the manual. After some benchmarking with the web API for beeets.com, the cluster setup appeared to be running about the same speed as the InnoDB setup when testing various commands…a pleasant surprise. It also appeared to handle concurrency a bit better.

Obviously once the dataset grows past a few megs and the traffic bumps up, we’ll revisit the benchmarking, but my hope is that what cluster loses in speed from your everyday general query, it gains in speed by having ability for higher concurrency.

This weekend I wen’t on a frenzy. I turned beeets.com from a single VPS enterprise to 4 VPSs: 2 web (haproxy, nginx, php-fpm, sphinx, memcached, ndb_mgmd) and 2 database servers (ndmtd). There’s still some work to do, but the entire setup seems to be functioning well.

I had a few problems though. In PHP (just PHP, and nothing else) hosts were not resolving. The linux OS was resolving hosts just fine, but PHP couldn’t. It was frustrating. Also, I was unable to sudo. I kept checking permissions on all my files in /etc, rebooting, checking again, etc.

The fix

Then I looked again. /etc itself was owned by andrew:users. Huh? I changed permissions back root:root, chmod 755. Everything works. Now some background.

A while back, I wrote some software (bash + php) that makes it insanely easy to install software to several servers at once, and sync configurations for different sets of servers. It’s called “ssync.” It’s not ready for release yet, but I can say without it, I’d have about 10% of the work done that I’d finished already. Ssync is a command-line utility that lets you set up servers (host, internal ip, external ip) and create groups. Each group has a set of install scripts and configuration files that can be synced to /etc. The configuration files are PHP scriptable, so instead of, say, adding all my hosts by hand to the /etc/hosts file, I can just loop over all servers in the group and add them automatically. Same with my www group, I can add a server to the “www” group in ssync, and all of a sudden the HAproxy config knows about the server.

Here’s the problem. When ssync was sending configuration files to /etc on remote servers, it was also setting permissions on those files (and folders) by default. This was because I was using -vaz, which attempts to preserve ownership, groupship, and permissions from the source (not good). I added some new params (so now it’s “-vaz –no-p –no-g –no-o”). Completely fixed it.