This weekend I went on a frenzy. I turned beeets.com from a single-VPS enterprise into 4 VPSs: 2 web servers (haproxy, nginx, php-fpm, sphinx, memcached, ndb_mgmd) and 2 database servers (ndbmtd). There’s still some work to do, but the entire setup seems to be functioning well.

I had a few problems, though. In PHP (just PHP, and nothing else), hostnames were not resolving. The Linux OS was resolving hosts just fine, but PHP couldn’t. It was frustrating. I was also unable to sudo. I kept checking permissions on all my files in /etc, rebooting, checking again, etc.

The fix

Then I looked again. /etc itself was owned by andrew:users. Huh? I changed its ownership back to root:root and its mode back to 755. Everything works. Now some background.

A while back, I wrote some software (bash + PHP) that makes it insanely easy to install software on several servers at once and to sync configurations for different sets of servers. It’s called “ssync.” It’s not ready for release yet, but I can say that without it, I’d have finished only about 10% of the work I’ve already done. Ssync is a command-line utility that lets you set up servers (host, internal IP, external IP) and create groups. Each group has a set of install scripts and configuration files that can be synced to /etc. The configuration files are PHP-scriptable, so instead of, say, adding all my hosts to the /etc/hosts file by hand, I can just loop over all servers in the group and add them automatically. Same with my www group: I can add a server to the “www” group in ssync, and all of a sudden the HAProxy config knows about the server.
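ssync isn’t released, so the following is only a hypothetical sketch of what a PHP-scriptable config might look like. It assumes the tool hands each template a $servers array carrying the host, internal IP, and external IP fields mentioned above; the array shape, the function names, and the exact HAProxy server-line format are illustrative assumptions, not ssync’s actual interface.

```php
<?php
// Hypothetical data a sync tool might pass to a config template:
// one entry per server in the group (host, internal IP, external IP).
$servers = array(
    array('host' => 'web1', 'internal_ip' => '10.0.0.1', 'external_ip' => '203.0.113.10'),
    array('host' => 'web2', 'internal_ip' => '10.0.0.2', 'external_ip' => '203.0.113.11'),
);

// Render /etc/hosts entries using each server's internal address.
function render_hosts(array $servers)
{
    $lines = array("127.0.0.1\tlocalhost");
    foreach ($servers as $server) {
        $lines[] = $server['internal_ip'] . "\t" . $server['host'];
    }
    return implode("\n", $lines) . "\n";
}

// Render HAProxy "server" lines for a backend, so adding a box to the
// group is enough to put it into rotation.
function render_haproxy_servers(array $servers, $port)
{
    $lines = array();
    foreach ($servers as $server) {
        $lines[] = sprintf("    server %s %s:%d check", $server['host'], $server['internal_ip'], $port);
    }
    return implode("\n", $lines) . "\n";
}

echo render_hosts($servers);
echo render_haproxy_servers($servers, 80);
```

The point of the design is that the server list lives in one place and every config file is derived from it, so /etc/hosts and the load balancer can’t drift apart.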

Here’s the problem. When ssync was sending configuration files to /etc on remote servers, it was also setting ownership and permissions on those files (and folders) by default. This was because I was using rsync’s -vaz, and -a attempts to preserve ownership, group, and permissions from the source (not good). I added some new params (so now it’s “-vaz --no-p --no-g --no-o”). That completely fixed it.
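For reference, here’s a hypothetical PHP helper showing where those switches go when a sync tool shells out to rsync. The function name and paths are made up; the flags are the ones quoted above. Since rsync’s -a is shorthand for -rlptgoD, the --no-p, --no-g and --no-o switches cancel just the permission, group and owner parts of it.

```php
<?php
// Hypothetical helper: build the rsync command a config-sync tool might run.
// -a implies -p/-g/-o; the --no-* switches turn those three back off, so the
// remote /etc keeps its own owner, group and mode.
function build_rsync_command($localDir, $host, $remoteDir)
{
    // Trailing slash on the source = copy the directory's contents, not the directory.
    $source = rtrim($localDir, '/') . '/';
    return 'rsync -vaz --no-p --no-g --no-o '
        . escapeshellarg($source) . ' '
        . escapeshellarg($host . ':' . $remoteDir);
}

$cmd = build_rsync_command('groups/www/etc', 'web1', '/etc/');
echo $cmd, "\n";
// On Unix: rsync -vaz --no-p --no-g --no-o 'groups/www/etc/' 'web1:/etc/'
// passthru($cmd); // uncomment to actually run it
```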

This will be a short post, but pretty cool.

You can add arrays together:

	$test1	=	array('name' => 'andrew');
	$test2	=	array('status' => 'totally gnar, dude');

	print_r($test1 + $test2);
	---------------------------
	Array
	(
	    [name] => andrew
	    [status] => totally gnar, dude
	)
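One thing worth knowing before leaning on +: when both arrays share a key, the left-hand value wins and the right-hand one is silently dropped. That is the opposite of array_merge() for string keys, and array_merge() also renumbers numeric keys instead of skipping them. A quick comparison:

```php
<?php
// On a key collision, + keeps the LEFT value; array_merge() lets the RIGHT
// value win for string keys.
$a = array('name' => 'andrew', 'status' => 'left');
$b = array('status' => 'right', 'site' => 'beeets');

$union  = $a + $b;             // name => andrew, status => left, site => beeets
$merged = array_merge($a, $b); // name => andrew, status => right, site => beeets
print_r($union);
print_r($merged);

// Numeric keys: + skips indexes that already exist, array_merge() appends and renumbers.
print_r(array(1, 2) + array(3, 4, 5));            // values 1, 2, 5
print_r(array_merge(array(1, 2), array(3, 4, 5))); // values 1, 2, 3, 4, 5
```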

Wow…who would have thought. And my most recent favorite: converting objects to arrays. It’s a simple foreach($object as $key => $val) that copies each element into a new array, right? WRONG:

	$array	=	(array)$object;

No fucking way. Casting actually works in this case. Why does nobody tell me anything?! This is great for parsing XML, because parsers usually return objects, and quite honestly, I hate dealing with objects. Database results usually come back as arrays, and it’s a pain having some data sources return objects while others return arrays. Now it doesn’t matter…if you like objects, cast an associative array with (object); if you like arrays, cast with (array). I love PHP…
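Two caveats about the (array) cast: it only converts the top level, so nested objects stay objects, and private or protected properties come out with mangled keys (prefixed with a null-byte-wrapped class name or *). If you need nested data as arrays too, a small recursive helper does the job. The function name below is made up for illustration, and the sketch assumes plain public-property objects such as stdClass.

```php
<?php
// Hypothetical helper: convert an object, and every object nested inside it,
// into arrays. A plain (array) cast only converts the top level.
function to_array_recursive($value)
{
    if (is_object($value)) {
        $value = (array) $value;
    }
    if (is_array($value)) {
        foreach ($value as $key => $item) {
            $value[$key] = to_array_recursive($item);
        }
    }
    return $value;
}

$obj = new stdClass();
$obj->name = 'andrew';
$obj->meta = new stdClass();
$obj->meta->status = 'totally gnar, dude';

$shallow = (array) $obj;
var_dump(is_object($shallow['meta'])); // bool(true): the nested object survives the cast

$deep = to_array_recursive($obj);
var_dump(is_array($deep['meta']));     // bool(true)
```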

I recently (read: today) had an obnoxious problem: I’m writing some code for creating an Atom feed, and kept getting errors about entity-escaped values, namely things like &rsquo;, &bull;, etc. Even written as entities, they were not recognized by Opera and IE7. I read somewhere that it was necessary to convert the named entities to numbered entities. Great.

Well, PHP doesn’t have a native function for this. Why, I do not know…there seem to be functions for many other things, and adding an argument to htmlentities() that returns numbered entities would seem easy enough. Either way, I wrote a quick function that takes the htmlentities translation table, adds the values that are missing from it, and runs the conversion to numbered entities. Check it:

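function htmlentities_numbered($string)
{
	// Named entity => numbered entity, e.g. '&eacute;' => '&#233;'.
	// Asking for the ISO-8859-1 table keeps every key a single byte, so ord()
	// returns the real code point (Latin-1 and Unicode agree for 0-255).
	// PHP 5.4+ defaults to UTF-8, where ord() would only see the first byte.
	$table	=	get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'ISO-8859-1');
	$trans	=	array();
	foreach($table as $char => $ent)
	{
		$trans[$ent]	=	'&#'. ord($char) .';';
	}

	// Entities the Latin-1 table lacks (the Windows-1252 0x80-0x9F characters),
	// mapped to their Unicode code points rather than &#128;-&#159;.
	$trans['&euro;']	=	'&#8364;';
	$trans['&sbquo;']	=	'&#8218;';
	$trans['&fnof;']	=	'&#402;';
	$trans['&bdquo;']	=	'&#8222;';
	$trans['&hellip;']	=	'&#8230;';
	$trans['&dagger;']	=	'&#8224;';
	$trans['&Dagger;']	=	'&#8225;';
	$trans['&circ;']	=	'&#710;';
	$trans['&permil;']	=	'&#8240;';
	$trans['&Scaron;']	=	'&#352;';
	$trans['&lsaquo;']	=	'&#8249;';
	$trans['&OElig;']	=	'&#338;';
	$trans['&lsquo;']	=	'&#8216;';
	$trans['&rsquo;']	=	'&#8217;';
	$trans['&ldquo;']	=	'&#8220;';
	$trans['&rdquo;']	=	'&#8221;';
	$trans['&bull;']	=	'&#8226;';
	$trans['&ndash;']	=	'&#8211;';
	$trans['&mdash;']	=	'&#8212;';
	$trans['&tilde;']	=	'&#732;';
	$trans['&trade;']	=	'&#8482;';
	$trans['&scaron;']	=	'&#353;';
	$trans['&rsaquo;']	=	'&#8250;';
	$trans['&oelig;']	=	'&#339;';
	$trans['&Yuml;']	=	'&#376;';

	// strtr() tries the longest matching key first and never re-scans replaced text
	$string	=	strtr($string, $trans);
	return $string;
}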

Hope it’s helpful.

UPDATE – apparently, even the numbered entities are not valid XML. Fair enough: I’ve converted the characters in the 0x80-0x9F range to their proper Unicode code points. All my Atom feeds now validate (through feedvalidator.org).
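On newer PHP (5.4 or later, with the mbstring extension), a different route reaches the same Unicode numbered entities without a hand-maintained table: decode the named entities to UTF-8, re-escape XML’s special characters, then number every non-ASCII character. This is a swapped-in technique rather than the function above, and the name xml_numeric_entities is made up.

```php
<?php
// Alternative sketch: named entities -> UTF-8 -> numeric references.
function xml_numeric_entities($string)
{
    // 1. Turn &rsquo;, &eacute;, &amp; ... into real UTF-8 characters.
    $utf8 = html_entity_decode($string, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // 2. Re-escape the characters XML itself treats specially (& < > " ').
    $escaped = htmlspecialchars($utf8, ENT_QUOTES | ENT_XML1, 'UTF-8');
    // 3. Replace every code point from U+0080 upward with &#NNNN;.
    return mb_encode_numericentity($escaped, array(0x80, 0x10FFFF, 0, 0x1FFFFF), 'UTF-8');
}

echo xml_numeric_entities('It&rsquo;s &bull; caf&eacute;'), "\n";
// It&#8217;s &#8226; caf&#233;
```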

With my programming roots in QBASIC (shut up), C, C++, and a very healthy self-administered dose of x86 assembly, I can say that for the most part I have a good sense of what programming is. Everything I’ve learned up until now has helped me develop a sense for good code, and helped me write programs and applications that I can sit back and be proud of. I’ve been working with PHP for over 4 years now, and I have to say it’s the ugliest language I’ve ever used.

Let me explain. PHP itself is wonderfully loosely typed, C-like syntactically, and all around easy to write code for. The syntax is familiar because of my background. Its integration with the web goes right down to its core, and it’s a hell of a lot easier to write than assembly. When perusing a project filled to the brim with C source code, I’m usually left thinking about how it works, why the developer did what they did, and why that makes sense for that particular application. I’m usually able to answer these questions, and there’s one main reason: the code isn’t shit. With PHP, I’m usually left wondering what the developer was thinking, the 100s of ways I could have done it more efficiently, and why this person is actually making money doing this.

With roughly 90% of open-source PHP projects, everything works great. I love it, clients love it, everyone kisses each other’s ass. But then one day you get that inevitable change request…I want it to do THIS. A quick look at the source code reveals that, omg, it’s been written by a team of highly trained ape-like creatures! It surprises me that WordPress plugins that get 100s of downloads a day throw errors (unless you disable error output, which I never do on my dev machines). Whole architectures are written with random indentation, or indentation with spaces (sorry Rubyers, but space-indentation is an evil scourge on humanity). No effort is put into separating pieces of code that could so easily be modularized if only they were given a second thought.

Do I hate PHP? No, I love PHP. I think it’s a well-written, high-level web development language. It’s fast, portable, and scalable. It allows me to focus on the problems I face, not the syntax of what I’m trying to do. Paired with an excellent editor like Eclipse (w/ PHPeclipse) I’m unstoppable. But why can’t any other PHP developers share my love of well-written code? It’s the #1 critique of PHP, and rightly so. I’m pretty sure that all programming languages, save Python, allow you to write awful, unreadable code…but PHP’s culture seems to be built around shitty code, amateurish hacks, and lack of elegance. PHP isn’t the problem, it’s the people writing it who suck!

So I do love the language, but hate most of the implementations. I have to say, though, nothing is worse than ColdFusion.

Smarty is everyone’s favorite templating language for PHP. It’s great in many ways, one of the main features being that it can display things on a website. It also promotes separation of display code and logic, which many PHP programmers seem to have trouble with: osCommerce, PHPList, etc.

So why do I hate it?

<rant>
There’s no fucking point! All bad programmers write bad code. Why create a language within a language just to force bad programmers to do one thing right? I realize that Smarty does enforce separation of logic from display very well. I’ve used it in several projects. But if its capabilities are so strikingly similar to PHP’s that for most things there is a 1:1 equivalent, why bother? Why not just use PHP code?

Also, the plugins (and {php} tag) allow you to make logical decisions, run MySQL queries, send rockets to the moon…there’s nothing you can do in PHP that you cannot do in Smarty…which makes Smarty completely worthless for what it’s trying to do.

If you want to promote good programming, you don’t need Smarty. You can rewrite Smarty as a PHP object that sets variables and includes a template. I’ve written this a dozen times over…and it does the exact same thing, except templates are in PHP so everyone can understand them, there is no caching trickery going on, and best of all you don’t need to reference some stupid guide on how to display something in a strange language which you already know how to do in PHP.
</rant>
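Here is a minimal sketch of that “PHP object that sets variables and includes a template” idea. It is an illustration, not any particular version of it: the class name Template and its methods are hypothetical (assign() and display() just echo Smarty’s naming), and templates are ordinary PHP files.

```php
<?php
// Hypothetical minimal template class: assign variables, include a plain-PHP
// template, and capture what it prints.
class Template
{
    private $vars = array();

    public function assign($name, $value)
    {
        $this->vars[$name] = $value;
    }

    public function fetch($file)
    {
        extract($this->vars, EXTR_SKIP); // expose assigned vars as $name etc.
        ob_start();
        include $file;                   // the template is just PHP
        return ob_get_clean();
    }

    public function display($file)
    {
        echo $this->fetch($file);
    }
}

// Usage: a throwaway template file standing in for e.g. templates/hello.php
$tpl = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($tpl, 'Hello, <?php echo htmlspecialchars($name); ?>!');

$t = new Template();
$t->assign('name', 'andrew');
echo $t->fetch($tpl), "\n"; // Hello, andrew!
unlink($tpl);
```

No compiler, no cache directory, no second syntax: the separation comes from the convention that logic assigns and templates only echo.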

So, in summation, please don’t stop using Smarty. It’s a good piece of code for people who don’t understand the basics of separation of logic from display…but realize that Smarty is a hack, a patch, a band-aid. The REAL problem is bad programming, not something inherently wrong with PHP that needs to be rewritten.

A while ago, I wrote a snippet about MySQL replication. Well, I finally got around to playing with it, and it went very well. Apparently it works well, even for someone who hasn’t set it up before. Nice. I haven’t actually set this up on beeets because we don’t need it yet, and I haven’t played with it enough to feel confident using it in production.

That said, our Lyon Bros aframe framework now supports replication. It sends selects to the slave, and writes, last_id, transactions, etc. to the master. Basically the only thing slaves get are dumb selects (which are most of the queries anyway). I had the opportunity to test this out with the play MySQL replication setup, and it works perfectly. It was nice to see something that complicated actually working.

Right now, it only supports connecting to two servers: master and slave (it holds off on connecting to those servers until one of them actually gets a request to save time on request startup). Basically, aframe doesn’t support load balancing. This means that if you have more than one master or slave, to use replication effectively, you’ll have to use a MySQL replication load balancer (either software or hardware). This will give you a single IP address to send requests to, but distribute the requests automatically to improve load times.
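aframe’s code isn’t public yet, so this is only a hypothetical sketch of the routing rule described above: plain SELECTs go to the slave; everything else (writes, last-insert-id lookups, anything inside a transaction) goes to the master; and neither connection is opened until a query needs it. The class name, the role names, and the factory callback are assumptions; in real code the factory would return something like a mysqli connection.

```php
<?php
// Hypothetical read/write splitter with lazy connections.
class ReplicationRouter
{
    private $factory;               // callable: function ($role) returns a connection
    private $conns = array();       // connections opened so far, keyed by role
    private $inTransaction = false;

    public function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    // Decide which server a statement belongs on.
    public function roleFor($sql)
    {
        if ($this->inTransaction) {
            return 'master';        // keep every statement of a transaction together
        }
        $isRead = preg_match('/^\s*SELECT\b/i', $sql)
            && !preg_match('/\bLAST_INSERT_ID\s*\(/i', $sql) // must ask the server that did the insert
            && !preg_match('/\bFOR\s+UPDATE\b/i', $sql);     // locking reads count as writes
        return $isRead ? 'slave' : 'master';
    }

    // Open a connection only the first time a role is actually needed.
    public function connection($role)
    {
        if (!isset($this->conns[$role])) {
            $this->conns[$role] = call_user_func($this->factory, $role);
        }
        return $this->conns[$role];
    }

    public function connectionFor($sql)
    {
        return $this->connection($this->roleFor($sql));
    }

    public function begin()  { $this->inTransaction = true; }
    public function commit() { $this->inTransaction = false; }
}

// Stand-in factory; real code might return new mysqli($host, $user, $pass, $db).
$opened = array();
$router = new ReplicationRouter(function ($role) use (&$opened) {
    $opened[] = $role;
    return $role . '-connection';
});

echo $router->connectionFor('SELECT * FROM users'), "\n";                 // slave-connection
echo $router->connectionFor('INSERT INTO users (name) VALUES (1)'), "\n"; // master-connection
```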

Hopefully we’ll need to set this up soon, but for now, just the one MySQL server will do =). Sometime within the next few months, we plan on releasing aframe as open source. It’s licensed and ready to go, but we have no documentation…and just saying “check out CakePHP and hope it works the same” won’t fly…especially since aframe is 23,148,855,308,184,500x better than Cake.

One thing I’d like to check out on round two of replication tinkering is Maatkit. From what I hear, it automates a lot of stuff I would be writing scripts for and checking every day. I looked at it a while ago and it seemed overly complicated, but that was even before I tried to get a server replicated. Maybe nowadays it would be easier.

With Amazon S3 being as good and cheap as it is, it’s almost essential for what I need it for…storing images and large static files. The problem is there is no interface besides SOAP, and if you don’t know how I feel about SOAP, let me tell you: it makes my head want to fucking explode. It’s insanely complicated for what it does. It tries to standardize so many things that it’s completely bloated…sending the message “hello” from one computer to another in SOAP takes, oh, about 10 years…5 years for a team of 50 supercomputers working in tandem to build the header and message body (which will total about 400 MB when finally complete), .02ms to send, and 5 years in decoding. Use JSON, you crackheads. Sure, you’ll actually have to document it, but nothing is worse than SOAP, not even documenting.

That’s beside the point, though. S3 chose to use SOAP, so I refuse to write my own client for it. This means that, as of late, the world is without a good free S3 uploading client. S3Fox, the Firefox extension, is OK…it can’t handle SSL connections though, so expect to lose your private key to a sniffer about 10 seconds after your first request. JungleDisk is now a completely paid service (I already pay for S3, I’m not paying those guys to fucking USE it). Linux has some great command-line tools for S3 (yay…), but that leaves Windows with either S3Fox, JD ($$$), or a handful of shitty S3 clients.

Right now, I have to use a PHP script I built around the S3 PHP Class to do any uploading that doesn’t make me vomit. The S3 class works REALLY well…it lets you assign ACL while uploading, change headers for images (for browser-side caching, mmm) and best of all, doesn’t completely suck. Let’s all thank Donovan Schönknecht for writing something that actually works well and communicates with S3.

Here’s a piece of code I wrote that wraps the S3 class. It uploads images, sets Cache-Control headers, and deletes the local copies once they’re uploaded. What you want to do is copy your images into a folder, run this file from one directory up (set $start_folder to the name of the folder your images are in), and sit back. It will upload all your images with the directory structure preserved, publicly viewable and with Cache-Control headers.

<?php
	// quick config
	$bucket			=	'your.bucket.com';
	$start_folder	=	'images';

	// settings
	error_reporting(E_ALL);
	ini_set('display_errors', 1);
	ini_set('max_execution_time', 3600);

	// include S3 class (the third constructor argument is "use SSL"; set it to true for HTTPS)
	include 'S3.php';
	$s3	=	new S3('[your key]', '[your secret]', false);

	// Content-Type by file extension (anything unknown is sent as JPEG)
	$types	=	array(
		'jpg'	=>	'image/jpeg',
		'jpeg'	=>	'image/jpeg',
		'gif'	=>	'image/gif',
		'png'	=>	'image/png',
	);

	// get list of files. if you don't want a subdirectory, just change this line to not need one. hopefully you know PHP...
	$files	=	recurse(array(), $start_folder);

	// loop over files and upload
	for($i = 0, $n = count($files); $i < $n; $i++)
	{
		$ext	=	strtolower(pathinfo($files[$i], PATHINFO_EXTENSION));
		$type	=	isset($types[$ext]) ? $types[$ext] : 'image/jpeg';

		if(
			!$s3->putObject(
				S3::inputFile($files[$i]),
				$bucket,
				$files[$i],		// S3 key = local path, so the directory structure is preserved
				S3::ACL_PUBLIC_READ,
				array(),		// no x-amz-meta-* headers
				array('Cache-Control' => 'max-age=31536000', 'Content-Type' => $type)	// cache for one year
			)
		)
		{
			echo '<span style="color:red;">Failed: upload of '. $files[$i] .'</span><br />'. "\n";
		}
		else
		{
			echo '<span style="color:green;">Succeeded: upload of '. $files[$i] .'</span><br />'. "\n";
			unlink($files[$i]);	// remove the local copy only after a successful upload
		}
	}

	// recursively collect every non-hidden file under $dir
	function recurse($files, $dir)
	{
		$d	=	scandir($dir);

		for($i = 0, $n = count($d); $i < $n; $i++)
		{
			// skip ".", ".." and dotfiles
			if(!preg_match('/^\./', $d[$i]))
			{
				if(is_dir($dir . '/' . $d[$i]))
				{
					$files	=	recurse($files, $dir . '/' . $d[$i]);
				}
				else
				{
					$files[]	=	$dir . '/' . $d[$i];
				}
			}
		}

		return $files;
	}
?>

Feel free to modify, copy, blah blah…but give credit where it’s due. Let it be a light to you when all other lights go out. Hopefully it helps someone, because it sure helps me out.

So after hours of searching and tweaking, I finally got the answer I never wanted: it’s not possible to have Apache serve mod_fastcgi requests through its own reverse proxy (i.e. load-balance mod_fastcgi). I know that answer is wrong. But it had taken me so long and wasted so much time getting to the point where I was almost as clueless as when I started, that I took drastic action.

I installed lighttpd. I already had the FastCGI setup running, not to mention I got a new Linode for testing remote PHP. The only problem was that I couldn’t load balance between my slack box (web server) and my new linode (debian app server). BTW I chose Debian because the image was smaller and from what I know, it’s essentially the same as Slack. I really haven’t had ANY problems moving to it yet, and let’s face it, Deb is a lot more standard. Installing PHP was a bitch, but that’s what apt-get is for (no, I compiled PHP…but I’ll be damned if I’m going to hand compile all the stupid dependencies).

Anyway, within 20 minutes, lighttpd had PHP running through FastCGI, load balanced between two servers. Needless to say, I fell in love. Not to mention, all the information I was inundated with along the way about how small and lean lighttpd is swayed the choice a little.

So as far as I know, beeets is running great on both of its “new” servers and loving it.

There was a bit of trouble getting used to the new URL rewriting scheme, but generally, instead of doing Apache’s

RewriteCond blahblah !-f

you can just do url.rewrite-once = ( "^/(images|css|js)/.*" => "$0" )

(this is a horrible oversimplification, but you get the idea)…you map the URLs you DON’T want rewritten to $0 (the unchanged match), ahead of your catch-all rule. Works wonders.
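If the $0 trick seems odd, here’s a rough PHP model of the first-match-wins behavior of url.rewrite-once. It is not lighttpd’s implementation, just an illustration, and the catch-all rule sending everything else to /index.php is a hypothetical front-controller setup.

```php
<?php
// Rough model of url.rewrite-once: rules are tried in order, the first
// matching regex wins, and "$0" maps a URL to itself (i.e. no rewrite).
function rewrite_once(array $rules, $url)
{
    foreach ($rules as $pattern => $target) {
        if (preg_match('#' . $pattern . '#', $url, $m)) {
            // substitute $0..$9 with the matched text / captured groups
            return preg_replace_callback('/\$(\d)/', function ($ref) use ($m) {
                return isset($m[$ref[1]]) ? $m[$ref[1]] : '';
            }, $target);
        }
    }
    return $url; // no rule matched: leave the URL alone
}

$rules = array(
    '^/(images|css|js)/.*' => '$0',            // static files: pass through unchanged
    '^/(.*)$'              => '/index.php/$1', // everything else goes to PHP (hypothetical)
);

echo rewrite_once($rules, '/css/site.css'), "\n"; // /css/site.css
echo rewrite_once($rules, '/user/andrew'), "\n";  // /index.php/user/andrew
```

Order is the whole trick: the static-file rule has to come before the catch-all, or the catch-all would swallow /css/site.css too.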

All that’s left is some cache-control headers (I’m crazy about them, if you can’t tell yet), and some speed testing. I’m excited to see if lighttpd is actually faster than apache under ab.

Next up, Capistrano.

Amazon S3

Very cool service. I updated beeets to pull all images from images.beeets.com, an S3 bucket. Also, all CSS files now go through /css/css.php/file.css, which rewrites url(/images/…) to url(http://images.beeets.com/images/…).
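Here’s a minimal sketch of the rewriting step such a css.php might perform. The function name is hypothetical, and a real script would also need to read the requested file name from the path (e.g. $_SERVER['PATH_INFO']), guard against ../ tricks, and send a text/css Content-Type header, none of which is shown here.

```php
<?php
// Hypothetical css.php core: point every url(/images/...) reference,
// quoted or not, at the S3-hosted images domain.
function rewrite_css_urls($css, $host)
{
    // Matches url(/images/...), url('/images/...') and url("/images/...");
    // ${1} puts back whatever quote character (if any) was used.
    return preg_replace(
        '#url\(\s*([\'"]?)/images/#',
        'url(${1}' . $host . '/images/',
        $css
    );
}

$css = "body { background: url(/images/bg.png); }\n"
     . ".logo { background: url('/images/logo.gif'); }\n";
echo rewrite_css_urls($css, 'http://images.beeets.com');
```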

And guess what, it all works. I had some bad experiences with the S3Fox Firefox plugin in the past, but it’s since been updated and I’ve been using it regularly.

Also, using S3.php, all profile images now go directly onto images.beeets.com. Wicked.

So what does this mean? A few things:

1. Less bandwidth & work – beeets will spend more time serving HTML, CSS, and JS than images.

2. Safer – We were already backing up profile images to S3 indirectly, but now they live there, and S3 is much less likely to go down than our hosting.

3. Worse image caching – Before, I had .htaccess controlling all the caching for static files. I liked it that way. S3 doesn’t do this very well at all. Apparently it’s configurable, but I don’t know how…any ideas?

All in all, it should be better for beeets. Maybe we’ll actually let users have images bigger than 10×10 now ;)

Thumbs up to S3 (and probably all other Amazon web services).

Wow. You’d think it would be easy. In fact, it should have been. Compile a module, load it from Apache. Recompile PHP with --enable-fastcgi…oh wait, I already had it in there (always thinking ahead!!). Change some Apache settings.

Right? Yeah, right. It took two days. I can’t even really remember why. The biggest problem was that running make && make install in the mod_fastcgi source was NOT yielding a ‘mod_fastcgi.so’ as the documentation PROMISED! In fact, it installed mod_fastcgi.la instead, a highly useless file.

So how did the master get out of this bind? Beats me, try asking him. As for me, I had to run ‘ld -Bshareable *.o -o mod_fastcgi.so’ which is mentioned in some document from a long time ago in a galaxy far, far away.

Let me interject and say that the information on the FastCGI website is “not very well documented.”

Day 2. I figured, what’s the point of FastCGI if it’s not set up to connect to a remote app server? Maybe I don’t HAVE an external server set up, but we can pretend. Well, that’s another nightmare. There’s a good guide to external FastCGI servers, and guess what: it worked. Not really a nightmare at all, come to think of it. Quite pleasant.

All in all, shouldn’t have taken 2 days =P (I’m a tinkerer)…but fuck it, I have FastCGI now, ready to connect to all those App servers I have churning away in the background (one day).

In all the excitement, I also compiled and installed the Apache worker MPM. A few tests with ab didn’t really show any noticeable difference. But threads are cool, right?

Next up: figure out how to configure Apache to pass all requests ending in .php (whether the file exists on the web server or not) to our “app” server. Is this possible?