So I got to thinking. There are some good caching reverse proxies out there, maybe it’s time to check one out for beeets. Not that we get a ton of traffic or we really need one, but hey what if we get digged or something? Anyway, the setup now is not really what I call simple. HAproxy sits in front of NginX, which serves static content and sends PHP requests back to PHP-FPM. That’s three steps to load a fucking page. Most sites use apache + mod_php (one step)! But I like to tinker, and I like to see requests/second double when I’m running ab on beeets.

So, I’d like to try something like Varnish (sorry, Squid) but that’s adding one more step in between my requests and my content. Sure it would add a great speed boost, but it’s another layer of complexity. Plus it’s a whole nother service to ramp up on, which is fun but these days my time is limited. I did some research and found what I was looking for.

NginX has made me cream my pants every time I log onto the server since the day I installed it. It’s fast, stable, fast, and amazing. Wow, I love it. Now I read that NginX can cache FastCGI requests based on response caching headers. So I set it up, modified the beeets api to send back some Cache-Control junk, and voilà…a 2800% speed boost on some of the more complicated functions in the API.

Here’s the config I used:

# in http {}
fastcgi_cache_path /srv/tmp/cache/fastcgi_cache levels=1:2
                           keys_zone=php:16m
                           inactive=5m max_size=500m;
# after our normal fastcgi_* stuff in server {}
fastcgi_cache php;
fastcgi_cache_key $request_uri$request_body;
fastcgi_cache_valid any 1s;
fastcgi_pass_header Set-Cookie;
fastcgi_buffers 64 4k;

So we’re giving it a 500 MB cache. The fastcgi_cache_valid line says that a response with any status code is cached for 1 second, but this gets overridden by the Cache-Control headers sent by PHP. I’m using $request_body in the cache key because in our API, the actual request is sent through like:

GET /events/tags/1 HTTP/1.1
Host: ...

{"page":1,"per_page":10}

The params are sent through the HTTP body even in a GET. Why? I spent a good amount of time trying to get the API to accept the params through the query string, but decided that adding $request_body to one line in an NginX config was easier than re-working the structure of the API. So far so good.
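To make the cache key and the Cache-Control override a bit more concrete, here is a rough Python model of the idea. It is only a sketch, not how NginX implements it: TinyCache, cache_key and ttl_seconds are names I made up, and the only Cache-Control directive it understands is max-age.

```python
import re
import time


def cache_key(request_uri, request_body):
    """Same idea as `fastcgi_cache_key $request_uri$request_body`."""
    return request_uri + request_body


def ttl_seconds(headers, default=1.0):
    """Use Cache-Control max-age when the backend sends it, else the 1s default."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    return float(match.group(1)) if match else default


class TinyCache:
    """Hypothetical in-memory stand-in for the on-disk FastCGI cache."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (expires_at, response_body)

    def fetch(self, uri, body, backend):
        """Return (response_body, was_hit); backend(uri, body) runs on a miss."""
        key = cache_key(uri, body)
        entry = self._store.get(key)
        if entry and entry[0] > self._clock():
            return entry[1], True
        headers, response = backend(uri, body)
        ttl = ttl_seconds(headers)
        if ttl > 0:
            self._store[key] = (self._clock() + ttl, response)
        return response, False
```

Two requests for the same URI with different JSON bodies get different keys, which is the whole reason $request_body is in there.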

That’s NginX’s FastCGI cache acting as a reverse proxy cache. Ideally in our setup, HAproxy would be replaced by a reverse proxy cache like Varnish, and NginX would just stupidly forward requests to PHP like it was earlier today…but I like HAproxy. Having a health-checking load-balancer on every web server affords some interesting failover opportunities.

Anyway, hope this helps someone. NginX can be a caching reverse proxy. Maybe not the best, but sometimes, just sometimes, simple > faster.

After all my research on what it means for a service to be RESTful, I think I’ve finally got a very good understanding. Once you understand a critical mass of information on the subject, something clicks and the first thing that comes into your head is “Oh yeah! That makes sense!”

It’s important to think of a REST web service as a web site. How does a website work?

  • A website works using HTTP. If you need to fetch something on a website, you use the HTTP verb “GET.” If you need to change something, you use “POST.” A RESTful web service uses other HTTP verbs as well, namely PUT and DELETE, and can also implement OPTIONS to show which methods are appropriate for a resource.
  • A website has resources. A resource can be information, images, flash, etc. These resources can have different representations: HTML, a jpeg, an embedded video. REST is the same way. It is resource-centric. Want a list of users? GET /users. Want an event? GET /events/5. Want to edit that event? PUT /events/5. Every resource has a unique URL to identify it!
  • Resources are not dealt with directly. Instead, representations of resources are used. This can be a bit hard to grasp. What is a user? It’s a nebulous object somewhere that I cannot interact with. It is an idea, an entity. A representation is a form of the user resource I can interact with. A representation can be a comma delimited list, JSON, XML…anything the client and server both understand. How do we know what we’re interacting with? Media types:
  • As a website will tell you what kind of image you’re requesting, a REST service tells you what kind of resource representation you are receiving. This is done using media types. For instance, if I do a GET /events/7, the Content-Type may be “application/vnd.beeets.event+json” which tells us this is a vendor specific media (the “vnd”) and it’s an event in JSON format. You can pass these media types in your Accept headers to specify what type of representation you would like. These media types are documented somewhere so that client will know exactly what to expect when consuming them.
  • If you request a page that doesn’t exist or you aren’t authorized to view, a website will tell you. This is done using the HTTP status code in the response. A good REST service will utilize HTTP status codes to do the same. 200 OK, 404 Not Found, 500 Internal Server Error, etc. These have already been defined and refined over many, many years by people who have been doing this a lot longer than you (probably)…use them.
  • A website will have links from one page to another. This is one of the main points of a REST service, and is also widely forgotten or misunderstood (it took me a while to figure it out even doing intense research). Resources in a REST service link to each other, letting a client know what resources can be found where, and how they relate to each other. An HTML page has links in it. So does a REST resource. Links can be structured however you like, but some good things to include are the URI of the linked resource, the relationship it has with the current resource, and the media type. This creates what’s known as a “loose coupling” between client and server. A client can crawl the server and figure out, only knowing a pre-defined set of media types, what resources are where and how to find them. This principle is known as HATEOAS (or “Hypermedia as the Engine of Application State”).
  • REST is stateless. This means that the server does not track any sort of client state. There are no session tokens the client uses to identify itself. There are no cookies set. Every request to the REST service must contain all information needed to make that request. Need to access a restricted resource? Send your authentication info for each request. It’s that simple. Isn’t it easier to track session? Not really. Maybe it’s easier on a small level, but once you start needing to scale, you will wish you’d gone stateless. Using a combination of HTTP basic authentication and API/Secret request signing, you don’t have to send over plain text passwords at all. Hell, even throw in a timestamp with each request to minimize replay attacks. You can get as crazy as you’d like with security. Or for those who prefer security over performance, use SSL.
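As a rough illustration of the request signing with a timestamp mentioned in the last point, here is a minimal Python sketch. The message layout (method, path, body and timestamp joined by newlines), the function names and the 5 minute window are all my own assumptions, not a standard scheme and not what beeets actually does.

```python
import hashlib
import hmac
import time


def sign_request(secret, method, path, body, timestamp):
    """HMAC-SHA256 over the parts of the request we want to protect."""
    message = "\n".join([method.upper(), path, body, str(timestamp)])
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_request(secret, method, path, body, timestamp, signature,
                   now=None, max_skew=300):
    """Server side: reject stale timestamps, then compare signatures."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_skew:
        return False  # too old (or too far in the future): possible replay
    expected = sign_request(secret, method, path, body, timestamp)
    # Constant-time comparison so the check does not leak timing information.
    return hmac.compare_digest(expected, signature)
```

The client sends the timestamp and signature along with every request, so the server never has to remember anything about the client between requests.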

Now for some examples. Because I’m currently working on an event application, we’ll use that for most of the examples.

Let’s get a list of events from our server:

GET /events HTTP/1.1
Host: api.beeets.com
Accept: application/vnd.beeets.events+json

{"page":1,"per_page":10}
-----------------------------------------
HTTP/1.1 200 OK
Date: Tue, 01 Dec 2009 04:12:48 GMT
Content-Length: 1430
Content-Type: application/vnd.beeets.events+json

{
	"total":81,
	"events":
	[
		{
			"links":
			[
				{
					"uri":"/events/6",
					"rel":"/rel/event self edit",
					"type":"application/vnd.beeets.event"
				},
				{
					"uri":"/locations/121",
					"rel":"/rel/location",
					"type":"application/vnd.beeets.location"
				}
			],
			"id":6,
			"title":"Paris Hilton naked onstage",
			...
		},
		...
	]
}

What do we have? A list of events, with links to the resource representations of those events. Notice we also have links to another resource: the location. We can leave that for now, but let’s pull up an event:

GET /events/6 HTTP/1.1
Host: api.beeets.com
Accept: application/vnd.beeets.event+json

-----------------------------------------
HTTP/1.1 200 OK
Date: Tue, 01 Dec 2009 04:12:48 GMT
Content-Length: 666
Content-Type: application/vnd.beeets.event+json

{
	"links":
	[
		{
			"uri":"/events/6",
			"rel":"/rel/event self edit",
			"type":"application/vnd.beeets.event"
		},
		{
			"uri":"/locations/121",
			"rel":"/rel/location",
			"type":"application/vnd.beeets.location"
		}
	],
	"id":6,
	"title":"Paris Hilton naked onstage",
	"date":"2009-12-05T04:00:00Z"
}

Using the link provided in the event listing, we managed to pull up an individual event, which we know how to parse because we know the media type…but wait, what’s this? OMG, someone is trying to smear Paris!! She’s on at 8:30!!! NOT 8!!! Let’s edit…if we do a PUT with new information, we’ll be able to save Paris’ good name:

PUT /events/6 HTTP/1.1
Host: api.beeets.com
Accept: application/vnd.beeets.event+json
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

{"title":"Paris Hilton naked onstage (yuck)","date":"2009-12-05T04:30:00Z"}
-----------------------------------------
HTTP/1.1 200 OK
Date: Tue, 01 Dec 2009 04:12:48 GMT
Content-Length: 666
Content-Type: application/vnd.beeets.event+json

{
	"links":
	[
		{
			"uri":"/events/6",
			"rel":"/rel/event self edit",
			"type":"application/vnd.beeets.event"
		},
		{
			"uri":"/locations/121",
			"rel":"/rel/location",
			"type":"application/vnd.beeets.location"
		}
	],
	"id":6,
	"title":"Paris Hilton naked onstage (yuck)",
	"date":"2009-12-05T04:30:00Z"
}

Holy shit, that was close. Notice we had to pass in authorization info, but now Paris won’t sue us for spreading misinformation. Phew.

What have we learned? Given one URL (/events), we have discovered two more (/locations/[id] and /events/[id]). We’ve also seen the media types in the responses that allow the client to know what kind of resource it’s dealing with and how to consume it.
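Here is roughly what that discovery could look like on the client side, as a small Python sketch. It assumes the "links" arrays shaped like the ones in the responses above (uri, rel, type); the function names are mine.

```python
def collect_links(representation):
    """Yield every link object found anywhere in a decoded JSON representation."""
    if isinstance(representation, dict):
        for key, value in representation.items():
            if key == "links" and isinstance(value, list):
                for link in value:
                    yield link
            else:
                yield from collect_links(value)
    elif isinstance(representation, list):
        for item in representation:
            yield from collect_links(item)


def discovered_uris(representation):
    """Map each linked URI to the set of rel tokens seen for it."""
    found = {}
    for link in collect_links(representation):
        found.setdefault(link["uri"], set()).update(link["rel"].split())
    return found
```

Feed it the decoded /events listing and you get back /events/6 and /locations/121 along with their relationships, without the client ever hard-coding those URLs.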

Hopefully this pounds two really important points in: media types and HATEOAS. Without them, it’s not REST. You can’t just pass application/xml or application/json for every response. Sure, maybe the client can decode it, but they don’t know what it is, and without linking to other resources, they don’t know how to find anything…unless you want to document everything and never change your service.
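To show why a specific media type tells the client more than plain application/json, here is a small Python sketch that pulls one apart. The vendor.name+syntax layout is just the convention used in the examples above, and the KNOWN set and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class MediaType(NamedTuple):
    vendor: Optional[str]   # "beeets" in application/vnd.beeets.event+json
    name: str               # "event"
    syntax: Optional[str]   # "json"


def parse_media_type(value):
    """Split a Content-Type/Accept value into vendor, resource name and syntax."""
    essence = value.split(";", 1)[0].strip().lower()  # drop parameters like charset
    _, _, subtype = essence.partition("/")
    subtype, _, syntax = subtype.partition("+")
    vendor = None
    if subtype.startswith("vnd."):
        vendor, _, subtype = subtype[len("vnd."):].partition(".")
    return MediaType(vendor, subtype, syntax or None)


# Media types this (hypothetical) client has been written to understand.
KNOWN = {("beeets", "event"), ("beeets", "events"), ("beeets", "location")}


def understands(content_type):
    media = parse_media_type(content_type)
    return (media.vendor, media.name) in KNOWN
```

With application/vnd.beeets.event+json the client knows it is holding an event; with application/json all it knows is that the bytes will decode.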

Some other tips/points:

  • Give yourself a few initial entry points to your REST service. You should be able to discover all of the resources in it just by crawling. If you can’t, you haven’t done HATEOAS correctly. This is a lot harder than it sounds, but it’s more than useful later on. Think of your REST service like a website with good navigation.
  • Remember to implement the OPTIONS verb for your resources. It will tell the client what verbs can be used on what resources. With some decent routing built into your application, this should be a cakewalk.
  • As mentioned, you can use HTTP basic authentication for your requests. If the client is anything but a web browser, you won’t have to serve up an ugly popup login box, you can just do all that shit transparently. If you don’t want to send a cleartext password (please don’t!) you can salt the password on the client side and send it over. Hash the password again with the client’s secret for added security. Crackers will be amazed at your 1337 computer hacking skillz. You can then verify the hashed salted value on the server side. Add client-secret request signing with a timestamp for uber security.
  • Read a lot more info on REST. It seems that SO many “RESTful” services out there are half-baked and made by people who researched the topic for half a day. Some good ones to take points from are the Sun Cloud API and the Netflix API. Notice the documentation of media types and LACK of documentation on every single URL you can request. This is that loose-coupling stuff I was talking about.
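On the OPTIONS tip: if your routes already know which verbs they accept, the response more or less writes itself. A minimal Python sketch, assuming a made-up route table (the patterns and verb sets below are for illustration only, not the real beeets routes):

```python
import re

# Hypothetical route table: URL pattern -> verbs the resource supports.
ROUTES = [
    (re.compile(r"^/events$"), {"GET", "POST"}),
    (re.compile(r"^/events/(\d+)$"), {"GET", "PUT", "DELETE"}),
    (re.compile(r"^/locations/(\d+)$"), {"GET"}),
]


def allowed_methods(path):
    """Return the sorted verbs for a path, or None if nothing matches."""
    for pattern, methods in ROUTES:
        if pattern.match(path):
            return sorted(methods | {"OPTIONS"})
    return None


def handle_options(path):
    """Answer an OPTIONS request with (status, headers)."""
    methods = allowed_methods(path)
    if methods is None:
        return 404, {}
    return 200, {"Allow": ", ".join(methods)}
```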

That’s it for now! I wrote this as a culmination of knowledge for the last week or so of research I’ve done…please let me know if any information is missing or incorrect and I can make updates. Hope it was helpful!

This is the third post (in a row) I’ve made about REST. I’ve now read multiple explanations of REST, many of which say the same thing. Then I see implementations of these explanations, and the people who coined REST turn up their noses. I know someone has implemented REST correctly. I mean, someone out there HAS to have done it right…right? Every criticism of why a service is not RESTful uses terabytes of nit-picking jargon but somehow lacks examples on how that service could be made RESTful.

“This service is NOT RESTful because in REST, payloads are self-describing and contain references to other resources, and servers should be able to change their namespace, and two points always make a line unless one is in hyperspace.” And xyz.com’s REST API wasn’t doing this? What could they have done to change this? What IS REST about their API?

Maybe I’m dumb. Actually, no I’m not. I just can’t look at a spec and say, “Oh, I get it.” I have to see real, live examples. Why can nobody do this? It seems every HTTP “REST” service is a false implementation yet nobody can give a concise explanation of why.

I get that using only GET and POST is not RESTful. I get that every resource is a noun, and has a URL associated with it. I get that verbs are defined by HTTP and should not be used in URLs. I get that HTTP status headers should be used in conjunction with responses. I sort of get what a media type is (application/vnd.ieatmcdonaldstoomuch.cheeseburger) but do not get where it is defined. Or is this arbitrary?

Some points I have taken from all the back-and-forth involved in trying to break into an extremely elitist (and rightfully so, REST is REST) architectural style:

  • REST is stateless. All information required to complete the request is sent with the request. Every time.
  • REST re-uses existing specs. HTTP has defined some very useful verbs (GET/POST/PUT/DELETE) for us to use. It makes sense to use those instead of creating custom verbiage. HTTP also has status headers for letting a client know WTF is happening. 200 OK, 506 Suck My Balls Client, etc.
  • REST maps resources to locations. I can find events by doing GET /events. I can post an event by doing POST /events. I can delete an event by doing DELETE /events/123. These URLs (/events, /events/123) are resources…an event collection and a single event. They are abstract things and a description of that thing is what is interacted with (aka “representation”).
  • REST uses media types to define resources. I’m assuming this is the “application/vnd.event+xml” content-type thing. This is very confusing for me. Where is this MIME type defined? How is it defined? How does a client consume it? WTF is going on here? Any ideas, anyone?
  • A REST service is self-descriptive. Reading one of Roy Fielding’s annoyingly-complex criticisms of bad REST implementations brings up the point that given an initial URL and a “set of standardized media types that are appropriate for the intended audience,” I should be able to completely discover all information on that service. So when I hit the homepage, I should get an XML/JSON/HTML response telling me that there are events under /events? How do I structure this response? What “media type” would this resource be? I get the point behind this, but would love to see a full implementation.
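For what it’s worth, one possible shape for that “homepage” response is just a list of links to the top-level collections. This Python sketch is a guess at a structure, not an answer from any spec: the media type name, the rel values and the link fields (uri, rel, type) are all assumptions.

```python
import json

# Hypothetical media type for the entry point document.
ENTRY_MEDIA_TYPE = "application/vnd.beeets.index+json"


def entry_point(collections):
    """Build (status, headers, body) for the service's root URL.

    `collections` is a list of (uri, rel, media_type) tuples.
    """
    links = [
        {"uri": uri, "rel": rel, "type": media_type}
        for uri, rel, media_type in collections
    ]
    body = json.dumps({"links": links})
    headers = {
        "Content-Type": ENTRY_MEDIA_TYPE,
        "Content-Length": str(len(body.encode("utf-8"))),
    }
    return 200, headers, body
```

A client that knows the entry media type can start at / and follow the links to /events without being told that URL in advance.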

So while I’m having a lot of trouble grasping the last two points, everything else seems to make good sense. So far, anyway. I’m sure in a few minutes I’ll read another guide telling me that everything I know about REST is WRONG and I’m going to hell for even thinking the word REST without knowing exactly what it means.

Perhaps a REST web service that describes how to implement REST over HTTP would be a fun and amusing project…

UPDATE – I suppose a website comprised of HTML that describes REST would be RESTful…no need to make a “service.” Maybe this should be undertaken, because there seems to be a vague dissertation, a few authoritative “resources” who love to use useless jargon, and a collection of blog posts that individually are all wrong, but pieced together create a somewhat workable view of what REST over HTTP should be like. So, someone should definitely get on that. In fact, I just might.

This is a response to a previous post I made about how HTTP authentication is the reason why I’m not building a REST API. After doing a bit more research, I’ve decided that REST is many times better than RPC. Reviewing a lot more information on REST revealed that HTTP authentication is not at all required…but instead a suggested method of authentication only because it’s already built. In fact, any authentication scheme could be cooked up and used. I’ve currently got my eye on OAuth.

That being said, our internal framework is now being updated to support routing different requests to different controllers/actions based on the request method. For instance, I can now route a “POST /events” to /events/add_event, or “GET /events” to /events/get_events. That just about seals the deal on RESTifying the framework, except that I’ll also have to come up with a scheme to implement status headers in the response, either automatically or semi-manually.
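The routing idea, plus a semi-automatic way of picking the status, could look something like this Python sketch. It is not the actual framework code (which isn’t shown here); the table only contains the two routes mentioned above and the helper names are invented.

```python
# (method, path) -> controller/action, using the two routes mentioned above.
ROUTE_TABLE = {
    ("GET", "/events"): "/events/get_events",
    ("POST", "/events"): "/events/add_event",
}

REASONS = {
    200: "OK",
    404: "Not Found",
    405: "Method Not Allowed",
}


def resolve(method, path):
    """Return (status, action) for a request."""
    action = ROUTE_TABLE.get((method.upper(), path))
    if action is not None:
        return 200, action
    # The path exists under another verb: that is a 405, not a 404.
    if any(known_path == path for _, known_path in ROUTE_TABLE):
        return 405, None
    return 404, None


def status_line(status):
    """Format the first line of the HTTP response."""
    return "HTTP/1.1 %d %s" % (status, REASONS[status])
```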

There will be no need to do a sort-of REST implementation on the www site. All information needed can be gathered through the API. Gnar, duude.

In conclusion, I’ve decided to go with REST after all. It’s a great architecture, and the best part is that it’s already built for me. Wicked.

UPDATE: This article is completely flawed and was written when I had absolutely no clue what I was talking about. Please see my most up-to-date post on RESTful services.

So I’ve been in the process of converting beeets.com, which is currently a “website” with integrated back-end code, to a front-end with a back-end API. It’s been an interesting journey, and has spawned a lot of thought recently about how to implement the API.

Obviously, whenever someone says web API, REST is the first thing that comes to mind. “Let’s make it a REST API!” From what I’ve been reading about REST (I tend not to use terminology without knowing what the hell it means) many “REST” services aren’t very RESTful (“How I Explained REST to My Wife” – great intro to REST). In fact, the way I’ve implemented the beeets API happens to be closer to RPC. After doing my fair share of research, I’m ok with this.

REST seems like an excellent architecture. In fact, one of the world’s largest global networks is built on that architecture. It works well for what it does…but what the hell does it do? From what I’ve been reading, it gives “resources” (pieces of information) a single location to be accessed/updated/created/deleted. At least this is how it works on the web.

RPC is basically a function or method that runs on a remote machine that can be called by a local machine, completely transparent to the local code. Most RPC implementations work on their own conventions, separately from the beautiful architecture which is REST…what is otherwise known as reinventing the wheel.

So why am I okay with using RPC in the beeets API? The main reason is that RESTful authentication is total junk. It’s great for server-server communication, but NOT so great for user interactions. Users are used to seeing pretty login boxes with “forgot your password? reset it, moron!” or “create an account!” links. Popping up an HTTP login screen is not cool. It’s not cool. This is my main problem with REST.

That being said, if you cannot authenticate REST, then using the built-in methods POST, PUT, DELETE are more or less worthless, unless you have a completely public set of information that anybody can add to, change, or remove…in which case, REST would kick ass.

In light of REST lacking in this area, I think the following makes sense. Create an RPC API that authenticated servers can call to get/create/update/delete information. iPhones, Facebook apps, etc can use this RPC for interacting with the actual dataset. Then on the www front-end, create a REST system that only implements GET.

So I could go to beeets.com/events/123-native-american-slaughter-celebration and get an HTML page with links I can click, an awesome layout, etc. I could go to beeets.com/events/123-native-american-slaughter-celebration.json and get a JSON representation of that event, with URLs to other pages which could be JSON-encoded if need be.

In other words, create a read-only RESTful service on top of the www site, and use an RPC for create/update/delete interactions with the dataset. The already-public information is freely accessible, without any sort of authentication, so any person or system that cares to consume it can do so. This way you’re still mapping a resource to a location, and providing multiple consumption formats for said resource.
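A quick Python sketch of that read-only layer, picking the representation from the URL’s extension. The handler name, the in-memory events dict and the bare-bones HTML are placeholders I made up for illustration.

```python
import html
import json
import posixpath


def handle(method, path, events):
    """Return (status, headers, body) for the GET-only www front-end.

    `events` maps a resource path (without extension) to an event dict.
    """
    if method.upper() != "GET":
        return 405, {"Allow": "GET"}, ""
    root, extension = posixpath.splitext(path)
    resource = root if extension == ".json" else path
    event = events.get(resource)
    if event is None:
        return 404, {}, ""
    if extension == ".json":
        body = json.dumps(event, sort_keys=True)
        return 200, {"Content-Type": "application/json"}, body
    # Default representation: an HTML page (escaped, heavily simplified).
    body = "<h1>%s</h1>" % html.escape(event["title"])
    return 200, {"Content-Type": "text/html"}, body
```

Same resource, same location, two representations, and anything that isn’t a GET gets turned away and pointed at the RPC side.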

If there is any other way to authenticate users against a RESTful system without an HTTP login, and while being able to control the actual login/registration flow and layout, PLEASE let me know!