Recently I’ve been working on speeding up the homepage of beeets.com. Most speed tests say it takes between 4 and 6 seconds to load. Obviously, all of them are somehow fatally flawed. I digress, though.

Everyone (who’s anyone) knows that gzipping your content is a great way to reduce download time for your users. It can cut the size of HTML, CSS, and JavaScript by about 60-90%. Everyone also knows that gzipping can be very CPU intensive. Not anymore.

I just installed nginx’s Gzip Static Module (compile nginx with --with-http_gzip_static_module) on beeets.com. It allows you to pre-cache your gzip files. What?

Let’s say you have the file /css/beeets.css. When a request for beeets.css comes through, the static gzip module will look for /css/beeets.css.gz. If it finds it, it will serve that file as gzipped content. This allows you to gzip your static files using the highest compression ratio (gzip -9) when deploying your site. Nginx then has absolutely no work to do besides serving the static gzip file (it’s very good at serving static content).

Wherever you have a gzip section in your nginx config, you can do:

gzip_static on;

That’s it. Note that you will have to create the .gz versions of the files yourself, and it’s mentioned in the docs that it’s better if the original and the .gz files have the same timestamp; so it may be a good idea to “touch” the files after both are created. It’s also a good idea to turn the gzip compression down (gzip_comp_level 1..3). This will minimally compress dynamic content without putting too much strain on the server.
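
The deploy step could look something like this. It’s a minimal sketch, not what beeets actually runs: the function names (precompress_file, precompress_dir) and the extension list are made up for the example, and it assumes PHP’s zlib extension is available. It writes a level-9 .gz next to each static file and copies the original’s timestamp onto it.

```php
<?php
// Hypothetical deploy helper (names are illustrative, not from nginx or beeets).
// Writes "<file>.gz" next to a static file using the highest gzip level.
function precompress_file($path)
{
	$data	=	file_get_contents($path);
	if($data === false)
	{
		return false;
	}

	// level 9 == best compression, the same idea as `gzip -9`
	$gz	=	gzencode($data, 9);
	if($gz === false || file_put_contents($path . '.gz', $gz) === false)
	{
		return false;
	}

	// give the .gz the same modification time as the original
	return touch($path . '.gz', filemtime($path));
}

// Walks $dir recursively and pre-compresses files with the given extensions.
// Returns the list of original files that were compressed.
function precompress_dir($dir, $extensions = array('css', 'js', 'html'))
{
	$done	=	array();
	$items	=	new RecursiveIteratorIterator(
		new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
	);

	foreach($items as $item)
	{
		$path	=	$item->getPathname();
		$ext	=	strtolower(pathinfo($path, PATHINFO_EXTENSION));
		if($item->isFile() && in_array($ext, $extensions) && precompress_file($path))
		{
			$done[]	=	$path;
		}
	}

	return $done;
}
```

Calling precompress_dir() on your document root as the last step of a deploy should leave a fresh .gz beside every css, js, and html file.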

This is a great way to get the best of both worlds: gzipping (faster downloads) without the extra load on the server. Once again, nginx pulls through as the best thing since multi-cellular life. Keep in mind that this only works on static content (css, javascript, etc etc). Dynamic pages can and should be gzipped, but with a lower compression ratio to keep load off the server.
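
To get a feel for the trade-off between levels, a quick sketch like this compares output sizes. compress_sizes is a made-up helper and the sample markup is arbitrary; real numbers depend entirely on your content.

```php
<?php
// Illustrative helper (hypothetical name): compressed size in bytes for each gzip level.
function compress_sizes($data, $levels = array(1, 3, 9))
{
	$sizes	=	array();
	foreach($levels as $level)
	{
		$sizes[$level]	=	strlen(gzencode($data, $level));
	}
	return $sizes;
}

// arbitrary sample markup, repeated to mimic a typical page
$html	=	str_repeat('<li class="event"><a href="/events/1">Some event</a></li>' . "\n", 200);

foreach(compress_sizes($html) as $level => $bytes)
{
	echo 'level ' . $level . ': ' . $bytes . ' of ' . strlen($html) . " bytes\n";
}
```

Lower levels usually give up a little size in exchange for less CPU time per request, which is the trade that makes sense for dynamic pages.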

So I got to thinking. There are some good caching reverse proxies out there, maybe it’s time to check one out for beeets. Not that we get a ton of traffic or we really need one, but hey what if we get digged or something? Anyway, the setup now is not really what I call simple. HAproxy sits in front of NginX, which serves static content and sends PHP requests back to PHP-FPM. That’s three steps to load a fucking page. Most sites use apache + mod_php (one step)! But I like to tinker, and I like to see requests/second double when I’m running ab on beeets.

So, I’d like to try something like Varnish (sorry, Squid) but that’s adding one more step in between my requests and my content. Sure it would add a great speed boost, but it’s another layer of complexity. Plus it’s a whole nother service to ramp up on, which is fun but these days my time is limited. I did some research and found what I was looking for.

NginX has made me cream my pants every time I log onto the server since the day I installed it. It’s fast, stable, fast, and amazing. Wow, I love it. Now I read that NginX can cache FastCGI requests based on response caching headers. So I set it up, modified the beeets api to send back some Cache-Control junk, and voilà…a 2800% speed boost on some of the more complicated functions in the API.
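
On the PHP side, the Cache-Control part can be as small as one header per response. This is a rough sketch rather than the actual beeets code: cache_control_header is a hypothetical helper and the 60-second lifetime is just an example value.

```php
<?php
// Hypothetical helper: builds the Cache-Control header line for a response.
// $seconds > 0 lets a cache keep the response that long; anything else asks caches not to store it.
function cache_control_header($seconds)
{
	$seconds	=	(int)$seconds;
	if($seconds > 0)
	{
		return 'Cache-Control: max-age=' . $seconds;
	}
	return 'Cache-Control: no-cache, no-store';
}

// Example: allow an expensive API response to be cached for 60 seconds.
// headers_sent() guards against output having already started.
if(!headers_sent())
{
	header(cache_control_header(60));
}
```

Responses that should never be shared (anything user-specific, for example) would get the no-cache variant instead.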

Here’s the config I used:

# in http {}
fastcgi_cache_path /srv/tmp/cache/fastcgi_cache levels=1:2
                           keys_zone=php:16m
                           inactive=5m max_size=500m;
# after our normal fastcgi_* stuff in server {}
fastcgi_cache php;
fastcgi_cache_key $request_uri$request_body;
fastcgi_cache_valid any 1s;
fastcgi_pass_header Set-Cookie;
fastcgi_buffers 64 4k;

So we’re giving it a 500 MB cache. It says that any response is cached for 1 second, but this gets overridden by the Cache-Control headers sent by PHP. I’m using $request_body in the cache key because in our API, the actual request is sent through like:

GET /events/tags/1 HTTP/1.1
Host: ...

{"page":1,"per_page":10}

The params are sent through the HTTP body even in a GET. Why? I spent a good amount of time trying to get the API to accept the params through the query string, but decided that adding $request_body to one line in an NginX config was easier than re-working the structure of the API. So far so good.
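
For illustration, here’s roughly how a PHP endpoint can read params from the body, and why the body has to be part of the cache key. Both function names are made up for this sketch. md5 is used because nginx names its cache files after an MD5 of the cache key, but this is only a model of that, not nginx’s code.

```php
<?php
// Hypothetical helper: decode JSON params from a raw request body, falling back to defaults.
function body_params($raw_body, $defaults = array())
{
	$params	=	json_decode($raw_body, true);
	if(!is_array($params))
	{
		$params	=	array();
	}
	return array_merge($defaults, $params);
}

// Rough model of "fastcgi_cache_key $request_uri$request_body": one hash per URI + body pair.
function model_cache_key($request_uri, $request_body)
{
	return md5($request_uri . $request_body);
}

// In a real request the raw body would come from file_get_contents('php://input').
$params	=	body_params('{"page":2}', array('page' => 1, 'per_page' => 10));
echo 'page ' . $params['page'] . ', per_page ' . $params['per_page'] . "\n";

// Same URI, different body: the keys differ, so page 1 is never served for page 2.
$key_a	=	model_cache_key('/events/tags/1', '{"page":1,"per_page":10}');
$key_b	=	model_cache_key('/events/tags/1', '{"page":2,"per_page":10}');
echo ($key_a === $key_b ? 'same key' : 'different keys') . "\n";
```

With only $request_uri in the key, both of those requests would have landed on the same cache entry.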

That’s NginX’s FastCGI cache acting as a reverse proxy cache. Ideally in our setup, HAproxy would be replaced by a reverse proxy cache like Varnish, and NginX would just stupidly forward requests to PHP like it was earlier today…but I like HAproxy. Having a health-checking load-balancer on every web server affords some interesting failover opportunities.

Anyway, hope this helps someone. NginX can be a caching reverse proxy. Maybe not the best, but sometimes, just sometimes, simple > faster.

With Amazon S3 being as good and cheap as it is, it’s almost essential for what I need it for…storing images and large static files. The problem is there is no interface besides SOAP, and if you don’t know how I feel about SOAP, let me tell you: it makes my head want to fucking explode. It’s insanely complicated for what it does. It tries to standardize so many things that it’s completely bloated…sending the message “hello” from one computer to another in SOAP takes oh about 10 years…5 years for a team of 50 supercomputers working in tandem to build the header and message body (which will total about 400 mb when finally complete), .02ms to send, and 5 years in decoding. Use JSON, you crackheads. Sure you’ll actually have to document it, but nothing is worse than SOAP, not even documenting.

That’s beside the point though. S3 chose to use SOAP, so I refuse to write my own client for it. This means that, as of late, the world is without a good free S3 uploading client. S3Fox, the Firefox extension, is ok…it can’t handle SSL connections though, so expect to lose your private key to a sniffer about 10 seconds after your first request. JungleDisk is now a completely paid service (I already pay for S3, I’m not paying those guys to fucking USE it). Linux has some great command line tools for S3 (yay…), but that leaves Windows with either S3Fox, JD ($$$), or a handful of shitty S3 clients.

Right now, I have to use a PHP script I built around the S3 PHP Class to do any uploading that doesn’t make me vomit. The S3 class works REALLY well…it lets you assign ACL while uploading, change headers for images (for browser-side caching, mmm) and best of all, doesn’t completely suck. Let’s all thank Donovan Schönknecht for writing something that actually works well and communicates with S3.

Here’s a piece of code I wrote that wraps around the S3 uploader. It uploads images, sets Cache-Control headers, and removes the local copies once they’ve uploaded successfully. What you want to do is copy your images into a folder, run this file one directory up (set $start_folder to the name of the folder your images are in), and sit back. It will upload all your images, directory structure preserved, publicly viewable and with Cache-Control headers.

<?php
	// quick config
	$bucket			=	'your.bucket.com';
	$start_folder	=	'images';

	// settings
	error_reporting(E_ALL);
	ini_set('display_errors', 1);
	ini_set('max_execution_time', 3600);

	// include S3 class
	include 'S3.php';
	$s3	=	new S3('[your key]', '[your secret]', false);

	// get list of files. if you don't want a subdirectory, just change this line to not need one. hopefully you know PHP...
	$files	=	recurse(array(), $start_folder);

	// loop over files and upload
	for($i = 0, $n = count($files); $i < $n; $i++)
	{
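		// pick a Content-Type from the lowercased file extension; anything unknown falls back to JPEG
		$ext	=	strtolower(preg_replace('/.*\./', '', $files[$i]));
		$type	=	'image/jpeg';
		if($ext == 'jpg' || $ext == 'jpeg')
		{
			$type	=	'image/jpeg';
		}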
		else if($ext == 'gif')
		{
			$type	=	'image/gif';
		}
		else if($ext == 'png')
		{
			$type	=	'image/png';
		}

		if(
			!$s3->putObject(
				$s3->inputFile($files[$i]),
				$bucket,
				$files[$i],
				S3::ACL_PUBLIC_READ,
				array(),
				array('Cache-Control' => 'max-age=31536000', 'Content-Type' => $type)
			)
		)
		{
			echo '<span style="color:red;">Failed: upload of '. $files[$i] . '</span><br />';
		}
		else
		{
			echo '<span style="color:green;">Succeeded: upload of '. $files[$i] .'</span><br />';
			unlink($files[$i]);
		}
	}

	function recurse($files, $dir)
	{
		$d	=	scandir($dir);

		for($i = 0, $n = count($d); $i < $n; $i++)
		{
			if(!preg_match('/^\./', $d[$i]))
			{
				if(is_dir($dir . '/' . $d[$i]))
				{
					$files	=	recurse($files, $dir . '/' . $d[$i]);
				}
				else
				{
					$files[]	=	$dir . '/' . $d[$i];
				}
			}
		}

		return $files;
	}
?>

Feel free to modify, copy, blah blah…but give credit where it’s due. Let it be a light to you when all other lights go out. Hopefully it helps someone, because it sure helps me out.

Amazon S3

Very cool service. I updated beeets to pull all images from images.beeets.com, an S3 bucket. Also, all css files now go through

/css/css.php/file.css …which rewrites

url(/images/…) to

url(http://images.beeets.com/images/…)
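
The rewrite itself boils down to one regex. This is a sketch of the idea rather than the real css.php: rewrite_css_urls is a hypothetical name, and it only handles url() values that start with /images/.

```php
<?php
// Hypothetical stand-in for the rewrite step in css.php:
// points root-relative /images/ URLs at the image host.
function rewrite_css_urls($css, $host = 'http://images.beeets.com')
{
	// matches url(/images/..., url('/images/... and url("/images/...
	return preg_replace('#url\(\s*([\'"]?)(/images/)#i', 'url($1' . $host . '$2', $css);
}

// prints: .logo { background: url(http://images.beeets.com/images/logo.png); }
echo rewrite_css_urls('.logo { background: url(/images/logo.png); }') . "\n";
```

URLs that are already absolute, or that point somewhere other than /images/, pass through untouched.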

And guess what, it all works. I had some bad experiences with the S3Fox firefox plugin in the past, but it’s since been updated and I’ve been using it regularly.

Also, using S3.php, all profile images now go directly onto images.beeets.com. Wicked.

So what does this mean? A few things:

1. Less bandwidth & work – beeets will spend more time serving HTML, CSS, and JS than images.

2. Safer – We were already backing up profile images to S3 indirectly, but now they live there directly, and the chances of S3 going down are slim compared to our hosting.

3. Worse image caching – Before, I had .htaccess controlling all the caching for static files. I liked it that way. S3 doesn’t do this very well at all. Apparently it’s configurable, but I don’t know how…any ideas?

All in all, it should be better for beeets. Maybe we’ll actually let users have images bigger than 10×10 now ;)

Thumbs up to S3 (and probably all other Amazon web services).