I decided this weekend to try out MySQL Cluster for beeets.com. The reason isn’t speed, it’s availability. After countless hours of research, I decided I’d rather have a plate of turds for breakfast than have to worry about Master-Master replication (or DRBD) with heartbeat, not to mention what to do when things get out of sync. Not my cup of tea. MySQL Cluster may be a bit slower than a replicated setup (in almost all cases except primary key lookups, I suspect), but to me it’s worth it for a more set-it-and-forget-it approach. Cluster has several benefits over replication:
- Any server can go down. Assuming you have more than one replica of your data, you can lose any server in your setup and still be up and running. This can be achieved with replication, but it’s not as easy. You have to have some form of Master-Master replication, perhaps with DRBD, and some form of failover (usually heartbeat).
- Your data set scales. If you start running out of disk space with a cluster, just add a few more data nodes and your data will be spread out over them. With replication, each replicated server has to have enough storage to fit the entire database. That means if your dataset grows too large, you have to either partition (a hack, essentially) or upgrade your servers.
- Your bandwidth scales. With a cluster, if you are running out of bandwidth, you can add more mysqld processes on your www servers or add more data nodes and your bandwidth scales almost linearly. With replication, you can only add so many slaves before your writes are the bottleneck. Then, once again, you have to look into things like circular replication (dangerous) or partitioning your data set (large updates to your app unless you have an insanely good ORM, big infrastructure change).
These are the main points that helped me decide. Historically, with a clustered approach, the entire dataset had to fit in the combined memory of the data nodes, which is somewhat restrictive once the dataset gets large. Nowadays, the cluster only needs to keep indexes in memory and can store all non-indexed data on disk. There is talk of a completely disk-based store as well.
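To put rough numbers on the storage point from the list, here's a back-of-the-envelope sketch in Python. The function names are mine, not anything from MySQL, and it assumes the data spreads evenly across data nodes and ignores index and per-row overhead, so treat the output as a ballpark rather than a sizing guide.

```python
def storage_per_server_replication(dataset_gb: float) -> float:
    """With replication, every server keeps a full copy of the data."""
    return dataset_gb


def storage_per_node_cluster(dataset_gb: float, data_nodes: int, replicas: int = 2) -> float:
    """Rough per-node share in a cluster.

    Assumes rows are spread evenly and each row is stored `replicas` times.
    The data node count has to be a multiple of the replica count.
    """
    if data_nodes <= 0 or replicas <= 0:
        raise ValueError("data_nodes and replicas must be positive")
    if data_nodes % replicas != 0:
        raise ValueError("data_nodes must be a multiple of replicas")
    return dataset_gb * replicas / data_nodes


if __name__ == "__main__":
    dataset_gb = 100.0  # made-up dataset size, for illustration only
    print("replication, per server:", storage_per_server_replication(dataset_gb), "GB")
    for nodes in (2, 4, 8):
        share = storage_per_node_cluster(dataset_gb, nodes, replicas=2)
        print(f"cluster, {nodes} data nodes:", share, "GB per node")
```

The takeaway is just that the replicated number never shrinks as you add servers, while the clustered number does.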
All that being said, I set up cluster, which was surprisingly easy. I’m not going to go over how to set it up or anything; just read the manual. After some benchmarking with the web API for beeets.com, the cluster setup appeared to run at about the same speed as the InnoDB setup across the various commands I tested, which was a pleasant surprise. It also appeared to handle concurrency a bit better.
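For anyone wanting to run a similar comparison, here is a minimal sketch of a concurrency benchmark in Python. It is not the script I used; `benchmark`, `timed_call`, and `fake_request` are hypothetical names, and `fake_request` just sleeps as a stand-in for a real call against your API or database. Swap in your own callable and run it against each setup.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(request):
    """Run one request and return how long it took, in seconds."""
    start = time.perf_counter()
    request()
    return time.perf_counter() - start


def benchmark(request, total_requests=100, concurrency=10):
    """Fire total_requests calls using `concurrency` worker threads."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed_call(request), range(total_requests)))
    wall = time.perf_counter() - wall_start
    latencies.sort()
    return {
        "requests": total_requests,
        "concurrency": concurrency,
        "requests_per_sec": total_requests / wall,
        "median_latency": latencies[len(latencies) // 2],
        "max_latency": latencies[-1],
    }


def fake_request():
    """Placeholder for a real API or database call (assumption: ~10 ms each)."""
    time.sleep(0.01)


if __name__ == "__main__":
    # Same workload at increasing concurrency, to see how throughput holds up.
    for level in (1, 5, 20):
        print(benchmark(fake_request, total_requests=40, concurrency=level))
```

Comparing `requests_per_sec` and `median_latency` at each concurrency level for both backends gives a rough picture of where one starts to pull ahead of the other.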
Obviously, once the dataset grows past a few megs and the traffic bumps up, we’ll revisit the benchmarking, but my hope is that what cluster loses in speed on your everyday general query, it makes up for by handling higher concurrency.