I’ve been seeing a lot of posts on the webz lately about how we can fix email. I have to say, I think it’s a bit short-sighted.

People are saying it has outgrown its original usage, or it contains bad error messages, or it’s not smart about the messages received.

These are very smart people, with real observations. The problem is, their observations are misplaced.

What email is

Email is a distributed, asynchronous messaging protocol. It does this well. It does this very well. So well, I’m getting a boner thinking about it. You send a message and it either goes where it’s supposed to, or you get an error message back. That’s it, that’s email. It’s simple. It works.

There’s no company controlling all messages and imposing their will on the ecosystem as a whole. There’s no single point of failure. It’s beautifully distributed and functions near-perfectly.

The problem

So why does it suck so much? It doesn’t. It’s awesome. The problem is the way people view it. Most of the perceived suckiness comes from its simplicity. It doesn’t manage your TODOs. It doesn’t have built-in calendaring. It doesn’t give you oral pleasure (personally I think this should be built into the spec though). So why don’t we build all these great things into it if they don’t exist? We could add TODOs and calendaring and dick-sucking to email!!

Because that’s a terrible idea. People are viewing email as an application; one that has limited features and needs to be extended so it supports more than just stupid messages.

This is wrong.

We need to view email as a framework, not an application. It is used for sending messages. That’s it. It does this reliably and predictably.

Replacing email with “smarter” features will inevitably leave people out. I understand the desire to have email just be one huge TODO list. But sometimes I just want to send a fucking message, not “make a TODO.” Boom, I just “broke” the new email.

Email works because it does nothing but messaging.

How do we fix it then?

We fix it by building smart clients. Let’s take a look at some of our email-smart friends.

Outlook has built-in calendaring. BUT WAIT!!!!! Calendaring isn’t part of email!!1 No, it’s not.

Gmail has labels. You can categorize your messages by using tags essentially. Also, based on usage patterns, Gmail can give weight to certain messages. That’s not part of email either!! No, my friend, it’s not.

Xobni also has built incredible contact-management and intelligence features on top of email. How do they know it’s time to take your daily shit before you do? Defecation scheduling is NOT part of the email spec!!

How have these companies made so much fucking money off of adding features to email that are not part of email?

It’s all in the client

They do it by building smart clients! As I said, you can send any message your heart desires using email. You can send JSON messages with a TODO payload and attach a plaintext fallback. If both clients understand it, then BAM! Instant TODO list protocol. There, you just fixed email. Easy, no? Why, with the right client, you could fly a fucking space shuttle with email. That’s right, dude, a fucking space shuttle.

If your client can create a message and send it, and the receiving client can decode it, you can build any protocol you want on top of email.
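To make that less hand-wavy, here’s a minimal sketch of the TODO idea: a message with a plaintext fallback and a JSON payload, built by hand as multipart/alternative. The function names, the boundary string, and the shape of the TODO object are all made up for illustration, and a real client would use a proper MIME library instead of string-joining and a regex.

```javascript
// Hypothetical helper: builds a raw multipart/alternative message.
// Parts go from least to most preferred, so the plaintext fallback comes first.
function buildTodoEmail({ from, to, subject, todo }) {
    const boundary = "todo-boundary-42"; // arbitrary; must not appear inside the parts
    const fallback = "TODO: " + todo.title + " (due " + todo.due + ")";
    return [
        "From: " + from,
        "To: " + to,
        "Subject: " + subject,
        "MIME-Version: 1.0",
        'Content-Type: multipart/alternative; boundary="' + boundary + '"',
        "",
        "--" + boundary,
        'Content-Type: text/plain; charset="utf-8"',
        "",
        fallback,
        "--" + boundary,
        "Content-Type: application/json",
        "",
        JSON.stringify(todo), // single line, since JSON.stringify escapes newlines
        "--" + boundary + "--",
        ""
    ].join("\r\n");
}

// Hypothetical receiving side: a TODO-aware client pulls out the JSON part.
// A dumb client never gets here and just shows the plaintext part.
function readTodo(raw) {
    const match = raw.match(/Content-Type: application\/json[^\r\n]*\r\n\r\n([^\r\n]*)/);
    return match ? JSON.parse(match[1]) : null;
}

const raw = buildTodoEmail({
    from: "andrew@example.com",
    to: "jeff@example.com",
    subject: "Buy milk",
    todo: { title: "Buy milk", due: "2011-06-01" }
});
console.log(readTodo(raw).title);
```

Both clients understand the JSON part, so they get a TODO. Everyone else gets a normal email. Nothing about email itself had to change.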

That’s it. Use your imaginations. I’ll say it one more time:

There’s nothing to fix

Repeat after me: “There’s nothing to fix!” If you have a problem with email, fork a client or build your own! Nobody’s stopping you from “fixing” email. Many people have made a lot of cash by “fixing” email.

We don’t have to sit in fluorescent-lit university buildings deliberating for hours on end about how to change the spec to fit everyone’s new needs. We don’t need 100 stupid startups “disrupting” the “broken” email system with their new protocols that will inevitably end up being a proprietary, non-distributed, “ad hoc, informally-specified, bug-ridden, slow implementation of half of” the current email system.

Please don’t try to fix email, you’re just going to fuck it up!! Trust me, you can’t do any better. Instead, let’s build all of our awesome new features on top of an already beautifully-working system by making smarter clients.

So I started getting this annoying error on our staging server today. I searched all over and people’s only answer was “use a 64-bit server.” Ok, but my data is less than 200MB so don’t tell me my server can’t handle it.

To make a long story short, some of the Mongo data files in the datadir were owned by root, not the “mongodb” user. I chowned them back to “mongodb” and everything went back to normal. Why this happened, I don’t know, but at least there’s a fix ;-).

chown -R mongodb:mongodb /srv/mongo-datadir

A while ago I created a vim highlighting script called void.vim. I’ve been using it for over a year now and just updated some things that have been bothering me recently, so feel free to check it out. This is the main color scheme I use for everything, and I created it to be easy on the eyes but to actually look nice too. A lot of the color schemes I’ve used seem to have been really loud or have a bad choice of colors. Void is my favorite.

Here’s a sample (created with vim’s :TOhtml command):

/**
 * Trigger an event for this object, which in turn runs all callbacks for that
 * event WITH all parameters passed in to this function.
 *
 * For instance, you could do:
 * mymodel.bind("destroy", this.removeFromView.bind(this));
 * mymodel.trigger("destroy", "omg", "lol", "wtf");
 *
 * this.removeFromView will be called with the arguments "omg", "lol", "wtf".
 *
 * Note that any trigger event will also trigger the "all" event. the idea
 * being that you can subscribe to anything happening on an object.
 */
trigger: function(ev)
{
    var args   =   shallow_array_clone(Array.from(arguments));
    [ev, 'all'].each(function(type) {
        if(!this._events[type]) return;
        Array.clone(this._events[type]).each(function(callback) {
            callback.apply(this, (type == 'all') ? args : args.slice(1));
        }, this);
    }, this);

    return this;
}

This is javascript, but I also use it for HTML, CSS, php, and lisp. Note that all code highlighting on this blog is done via vim with this color scheme.

I’m currently doing some server management. My current favorite tool is TMUX, which, among many other things, allows you to save your session even if you are disconnected, split your screen into panes, etc etc. If it sounds great, that’s because it is. Every sysadmin would benefit from using TMUX (or its cousin, GNU screen).

There’s a security flaw though. Let’s say I log in as user “andrew” and attach to my previous TMUX session: tmux attach. Now I have to run a number of commands as root. Well, prefixing every command with sudo and manually typing in all the /sbin/ paths to each executable is a pain in the ass. I know this is a bad idea, but I’ll often spawn a root shell. Let’s say I spawn a root shell in a TMUX session, then go do something else, fully intending to log out later, but I forget. My computer disconnects, and I forget there’s a root shell sitting there.

If someone manages to compromise the machine, and gain access to my user account, getting a root shell is as easy as doing tmux attach. Oops.

Well, I just found out you can timeout a shell after X seconds of inactivity, which is perfect for this case. As root:

echo -e "\n# logout after 5 minutes of inactivity\nexport TMOUT=300\n" >> /root/.bash_profile

Now I can open root shells until my ass bleeds, and after 5 minutes of inactivity, it will log out back into my normal user account.

A good sysadmin won’t make mistakes. A great sysadmin will make mistakes self-correct ;-].

I just spent ~1 hour installing (and uninstalling) various cache plugins for WP, each of them sucking in its own special way. The main problem is that they don’t work with PHP in safe_mode, which Nearly Free Speech turns on. This is limiting because it doesn’t let PHP make directory trees on its own; the directories have to be created by hand and then have special permissions applied to them. Most caching plugins create a ton of directory trees in the wp-content/cache/ folder, rendering them useless for my purposes.

I just installed QuickCache which uses flat files installed into the wp-content/cache/ folder. Here was my installation:

  • Upload the plugin to wp-content/plugins/
  • In SSH:
    chgrp web wp-content/
    mkdir wp-content/cache
    chgrp web wp-content/cache
  • Enable plugin in admin.
  • Done!

Stupid easy, works really well. It seems commonplace for people to develop wordpress plugins with their error reporting set to the absolute lowest and display_errors=0. I don’t appreciate this as it usually produces broken code. I’m thankful that the QuickCache authors put in the effort to, like, actually make it work. Thanks!

So my brother Jeff and I are building two Javascript-heavy applications at the moment (heavy as in all-js front-end). We needed a framework that provides loose coupling between the pieces, event/message-based invoking, and maps well to our data structures. A few choices came up, most notably Backbone.js and Spine. These are excellent frameworks. It took a while to wrap my head around the paradigms because I was so used to writing five layers deep of embedded events. Now that I have the hang of it, I can’t think of how I ever lived without it. There’s just one large problem…these libraries are for jQuery.

jQuery isn’t bad. We’ve always gravitated towards Mootools though. Mootools is a framework to make javascript more usable, while jQuery is nearly a completely new language in itself written on top of javascript (and mainly for DOM manipulation). Both have their benefits, but we were always good at javascript before the frameworks came along, so something that made that knowledge more useful was an obvious choice for us.

I’ll also say that after spending some time with these frameworks and being sold (I especially liked Backbone.js) I gave jQuery another shot. I ported all of our common libraries to jQuery and I spent a few days getting used to it and learning how to do certain things. I couldn’t stand it. The thing that got me most was that there is no distinction between a DOM node and a collection of DOM nodes. Maybe I’m just too used to Moo (4+ years).

Composer.js »

So we decided to roll our own. Composer.js was born. It merges aspects of Spine and Backbone.js into a Mootools-based MVC framework. It’s still in progress, but we’re solidifying a lot of the API so developers won’t have to worry about switching their code when v1 comes around.

Read the docs, give it a shot, and let us know if you have any problems or questions.

Also, yes, we blatantly ripped off Backbone.js in a lot of places. We’re pretty open about it, and also pretty open about attributing everything we took. They did some really awesome things. We didn’t want to do things differently so much as we wanted a supported Mootools MVC framework that works like Backbone.

I was looking around for Riak information when I stumbled (not via stumble, but actually doing my own blundering) across a blog post that mentioned a Riak GUI. I checked it out. Install is simple and oddly enough, the tool uses only javascript and Riak (no web server needed). I have to say I’m thoroughly impressed by it. Currently the tool doesn’t do a ton besides listing buckets, keys, and stats, but you can edit your data inline and delete objects. It also supports Luwak, which I have no first-hand experience with and was unable to try out.

One thing I thought was missing was a way to run a map-reduce on the cluster via text input boxes for the functions. It would make writing and testing them a bit simpler I think, but then again it would be easy enough to write this myself in PHP or even JS, so maybe I’ll add it in. Search integration would be nice too, although going to 127.0.0.1:8098/solr/[bucket]search?… is pretty stupid easy.
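For the curious, here’s roughly what I mean by writing it myself: a sketch that takes the map and reduce functions as strings (which is exactly what a text box would give you) and builds the job to send to the cluster. The job shape and the /mapred path are my reading of Riak’s HTTP interface, so check them against the docs for your version. The bucket name and the helper name are made up.

```javascript
// Hypothetical helper: wraps map/reduce source strings in a job description.
// Assumed shape: { inputs, query: [ { map: {...} }, { reduce: {...} } ] }.
function buildMapReduceJob(bucket, mapSource, reduceSource) {
    const query = [{ map: { language: "javascript", source: mapSource } }];
    if (reduceSource) {
        query.push({ reduce: { language: "javascript", source: reduceSource } });
    }
    return { inputs: bucket, query: query };
}

// Count the objects in a bucket: map emits 1 per object, reduce sums them up.
const job = buildMapReduceJob(
    "users", // hypothetical bucket name
    "function(v) { return [1]; }",
    "function(values) { return [values.reduce(function(a, b) { return a + b; }, 0)]; }"
);

// This string is what you would POST (Content-Type: application/json)
// to 127.0.0.1:8098/mapred, assuming the default HTTP port from above.
const body = JSON.stringify(job);
console.log(body);
```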

All in all, a great tool.

John Resig created some incredibly wonderful Javascript code for templating. It’s so terse that it almost shouldn’t work…but it does. I’ve been using it on a lot of front-end JS apps lately, and realized I could make a few changes and improvements.

First off, I don’t like <% asp style tags %>. It reminds me of programming ASP. It reminds me of a trip through hell I’ve taken too many times. I changed it to use PHP-style tags instead:

<ul class="<?=myclass?>">
    <? for(var i = 0; i < items.length; i++) { ?>
        <li><?=items[i].name?></li>
    <? } ?>
</ul>

This makes it easier for me to type. I also made one further modification. Adding $ in front of a variable checks whether it is defined before using it, and if it isn’t, the expression evaluates to null instead of throwing an error:

Hello, <?=$name?>.
<? if($user.friends) { ?>
    You have <?=$user.friends.length?> friends.
<? } ?>

The if statement above will compile to

if((typeof(user.friends) == "undefined" ? null : user.friends)) { ...

This allows some simple usages of undefined variables, such as “if(undefined_var) { … } else { … }” which actually pops up a lot. You still can’t access non-existent properties of variables that aren’t defined, but this should catch a lot of errors that would otherwise turn your code into a bunch of if(typeof …)’s.
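If you want to poke at the $ rewriting by itself, here’s a small sketch that pulls just that step out. The regex and the replacement string are the same ones used in the full code in this post; the safeVars name and the test strings are made up for the example.

```javascript
// Rewrites $name into a typeof-guarded expression.
function safeVars(code) {
    return code.replace(
        /\$([a-z_][a-z0-9_\.]+)/gi,
        '(typeof($1) == "undefined" ? null : $1)'
    );
}

console.log(safeVars("if($user.friends) {"));
// prints: if((typeof(user.friends) == "undefined" ? null : user.friends)) {

// An undeclared variable now evaluates to null instead of throwing.
const check = new Function("return " + safeVars("$missing_var") + ";");
console.log(check()); // prints: null
```

One thing that falls out of the regex as written: it wants at least two characters after the $, so a one-letter name like $i passes through untouched.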

Here’s the code (for brevity, I left out all the caching stuff that makes this fast):

var template    =   '<h1><?=$title?></h1> ...';
template        =   template.replace(/(\r\n|\n\r)/g, "\n");
var fnstr       =
    "var ___p=[],print=function(){___p.push.apply(___p,arguments);};" +
    "with(obj) {___p.push('" +
    template
        // fix single quotes in html (escape them)
        .replace(/(^|\?>)([\s\S]*?)($|<\?)/g, function(match) {
            return match.replace(/'/g, '\\\'');
        })
        // implement safe usage of $varname
        .replace(/<\?([\s\S]*?)\?>/g, function(match) {
            return match.replace(/\$([a-z_][a-z0-9_\.]+)/gi, '(typeof($1) == "undefined" ? null : $1)').replace(/[\r\n]+/g, ' ');
        })
        .replace(/\r?\n/g, '___::NEWLINE::___')
        .split("<?").join('___::TABBBBB::___')
        .replace(/((^|\?>)(?!=___::TABBBBB::___))'/g, "$1___::SLASHR::___")
        .replace(/___::TABBBBB::___=\s*\$?(.*?)\?>/g, "',$1,'")
        .split('___::TABBBBB::___').join("');")
        .split('?>').join("___p.push('")
        .split("___::SLASHR::___").join("\\'")
        .replace(/___::NEWLINE::___/g, '\'+ "\\n" + \'') +
        "');}" + "return ___p.join('');";
var tpl_fn  =   new Function("obj", fnstr);

This has been working for me for a bit now, and has saved me countless annoying declarations at the top of my templates. If you run into any problems, please let me know.

I just did a writeup about MongoDB’s performance in the last big app we did. Now it’s time to rip Mono a new one.

Mono has been great. It’s .NET for linux. We originally implemented it because it’s noted for being a fast, robust compiled language. I didn’t know C# before starting the project, but afterwards I feel I have a fairly good grasp on it (10 months of using it constantly will do that). I have to say I like it. Coming from a background in C++, C# is very similar except the biggest draw is you don’t separate out your definitions from your code. Your code is your definition. No header files. I understand this is a requirement if you’re going to link code in C/C++ to other C/C++ code, but I hate doing it.

Back to the point, mono is great in many ways. It is fast, compiles from source fairly easily (although libgdiplus is another story, if you want to do image processing), and easy to program in.

We built out a large queuing system with C#. You enter jobs into a queue table in MongoDB, and they get processed based on priority/time entered (more or less) by C#. Jobs can be anything from gathering information from third-parties to generating images and layering them all together (I actually learned first-hand how some of these Photoshop filters work). The P/Invoke system allowed us to integrate with third party libraries where the language failed (such as simple web requests with timeouts or loading custom fonts, for instance).

As with any project, it started off great. Small is good. Once we started processing large numbers of items in parallel, we’d get horrible crashes with native stacktraces. At first glance, it looked like problems with the Boehm garbage collector. We recompiled Mono with --enable-big-arrays and --with-large-heap. No luck. We contacted the Mono guys and, probably in light of all the political shenanigans happening with Mono at the moment, they didn’t really have a good response for us. Any time the memory footprint got greater than 3.5G, it would crash. It didn’t happen immediately though; it seemed random. Keep in mind Mono and the machines running it were 64-bit…4G is not the limit!

Our solution was twofold:

  • Put crash-prone code into separate binaries and call them via shell. If the process crashes, oh well, try again. The entire queue doesn’t crash though. This is especially handy with the image libraries, which seem to have really nasty crashes every once in a while (not related to the garbage collection).
  • Make sure Monit is watching at all times.
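The first bullet is simple enough to sketch. Ours was C# shelling out to other Mono binaries; the version below is Node instead, purely to show the shape of it, and the function name and return format are invented for the example. The spawn argument is only there so the retry logic can be exercised without a real crashy binary.

```javascript
const { spawnSync } = require("child_process");

// Runs a command in its own process and retries if it dies.
// A crash only takes down the child; the caller (the queue) keeps going.
function runWithRetry(command, args, maxAttempts, spawn = spawnSync) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const result = spawn(command, args, { encoding: "utf8" });
        // status is the exit code; it is null when a signal killed the process
        if (!result.error && result.status === 0) {
            return { ok: true, attempts: attempt, output: result.stdout };
        }
    }
    return { ok: false, attempts: maxAttempts, output: null };
}

// Usage sketch: run a tiny child process that exits cleanly.
const outcome = runWithRetry(process.execPath, ["-e", "console.log('done')"], 3);
console.log(outcome.ok, outcome.attempts);
```

If the job still fails after the last attempt, the queue can mark it failed and move on instead of going down with it.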

We also gave the new sgen GC a try, but it was much too slow to even compare to the Boehm. It’s supposed to be faster, but pitting the two against each other in a highly concurrent setting crowned Boehm the clear winner.

All in all, I like C# the language and Mono seemed very well put together at a small to medium scale. The garbage collector shits out at a high memory/concurrency level. I wouldn’t put Mono in a server again until the GC stuff gets fixed, which seems low priority from my dealings with the devs. Still better than Java though.

Let me set the background by saying that I currently (until the end of the week anyway) work for a large tech company. We recently launched a reader app for iPad. On the backend we have a thin layer of PHP, and behind that a lot of processing via C# with Mono. I, along with my brother Jeff, wrote most of the backend (PHP and C#). The C# side is mainly a queuing system driven off of MongoDB.

Our queuing system is different from others in that it supports dependencies. For instance, before one job completes, its four children have to complete first. This allows us to create jobs that are actually trees of items all processing in parallel.
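Here’s a toy version of the dependency rule, in plain JavaScript and entirely in memory. The real thing was C# on top of MongoDB, so treat every name here as made up; the only part that matters is the rule that a job becomes runnable once all of its children are done.

```javascript
// Hypothetical in-memory stand-in for the queue.
function makeQueue() {
    const jobs = new Map(); // insertion-ordered: id -> job
    return {
        // Parents must be added before their children.
        add(id, parentId = null) {
            jobs.set(id, { id, parentId, done: false, children: [] });
            if (parentId !== null) jobs.get(parentId).children.push(id);
        },
        // Runnable = not finished yet, and every child has finished.
        // Leaves have no children, so they are runnable right away.
        runnable() {
            return [...jobs.values()]
                .filter(j => !j.done && j.children.every(c => jobs.get(c).done))
                .map(j => j.id);
        },
        complete(id) {
            jobs.get(id).done = true;
        }
    };
}

const queue = makeQueue();
queue.add("render");
["a", "b", "c", "d"].forEach(id => queue.add(id, "render"));

console.log(queue.runnable()); // the four children, which can all run in parallel
["a", "b", "c", "d"].forEach(id => queue.complete(id));
console.log(queue.runnable()); // now only the parent is left
```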

On a small scale, things went fairly well. We built the entire system out, and tested and built onto it over the period of a few months. Then came time for production testing. The nice thing about this app was that most of it could be tested via fake users and batch processing. We loaded up a few hundred thousand fake users and went to town. What did we find?

Without a doubt, MongoDB was the biggest bottleneck. What we really needed was a ton of write throughput. What did we do? Shard, of course. Problem was that we needed even distribution on insert…which would give us near-perfect balance for insert/update throughput. From what we found, there’s only one way to do this: give each queue item a randomly assigned “bucket” and shard based on that bucket value. In other words, do your own sharding manually, for the most part.
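The bucket trick itself is about three lines. This is a sketch, not our production code: the field name, the bucket count, and the helper are assumptions for illustration, and the random source is a parameter only so the result can be pinned down in a test.

```javascript
// Assumption: a fixed number of buckets, comfortably larger than the shard count.
const BUCKET_COUNT = 16;

// Returns a copy of the queue item with a random bucket in [0, BUCKET_COUNT).
function withBucket(item, random = Math.random) {
    return Object.assign({}, item, { bucket: Math.floor(random() * BUCKET_COUNT) });
}

const queued = withBucket({ type: "generate_image", priority: 5 });
console.log(queued.bucket >= 0 && queued.bucket < BUCKET_COUNT); // prints: true
```

Shard the collection on the bucket field and inserts spread evenly across shards, because the value has nothing to do with insert order or anything else about the item.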

This was pretty disappointing. One of the whole reasons for going with Mongo is that it’s fast and scales easily. It really wasn’t as painless as everyone led us to believe. If I could do it all over again, I’d say screw dependencies, and put everything into Redis, but the dependencies required more advanced queries than any key-value system could do. I’m also convinced a single MySQL instance could have easily handled what four MongoDB shards could barely keep up with…but at this point, that’s just speculation.

So there’s my advice: don’t use MongoDB for evenly-distributed high-write applications. One of the biggest problems is that there is a global write lock on the database. Yes, the database…not the record, not the collection. You cannot write to MongoDB while another write is happening anywhere. Bad news bears.

On a more positive note, for everything BUT the queuing system (which we did get working GREAT after throwing enough servers at it, by the way) MongoDB has worked flawlessly. The schemaless design has cut development time in half AT LEAST, and replica sets really do work insanely well. After all’s said and done, I would use MongoDB again, but for read-mostly data. Anything that’s high-write, I’d go Redis (w/client key-hash sharding, like most memcached clients) or Riak (which I have zero experience in but sounds very promising).
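By client key-hash sharding I mean something like the sketch below: the client hashes the key and picks a server, and the servers know nothing about each other. The hash here is a simple djb2-style one chosen for brevity, and the server names are placeholders. Real clients use their own hash functions, and often consistent hashing so that adding a server doesn’t reshuffle every key.

```javascript
// Simple 32-bit string hash (djb2-style, xor variant). An assumption for the
// example, not what any particular memcached or Redis client actually uses.
function hashKey(key) {
    let h = 5381;
    for (let i = 0; i < key.length; i++) {
        h = ((h * 33) ^ key.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit value
    }
    return h;
}

// The same key always lands on the same server, with no coordination needed.
function pickServer(key, servers) {
    return servers[hashKey(key) % servers.length];
}

const servers = ["redis-1", "redis-2", "redis-3", "redis-4"]; // hypothetical hosts
console.log(pickServer("queue:item:1234", servers) === pickServer("queue:item:1234", servers)); // prints: true
```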

TL;DR: MongoDB is awesome. I recommend it for most usages. We happened to pick one of the few things it’s not good at and ended up wasting a lot of time trying to patch it together. This could have been avoided if we picked something that was built for high write throughput, or dropped our application’s “queue dependency” requirements early on. I would like it if MongoDB advertised the global write lock a bit more prominently, because I felt cheated when one of their devs mentioned it in passing months after we’d started. I do have a few other projects in the pipeline and plan on using MongoDB for them.