web services | kill the radio

After all my research on what it means for a service to be RESTful, I think I've finally got a very good understanding. Once you understand a critical mass of information on the subject, something clicks and the first thing that comes in to your head is "Oh yeah! That makes sense!"

It's important to think of a REST web service as a web site. How does a website work?

A website works using HTTP. If you need to fetch something on a website, you use the HTTP verb "GET." If you need to change something, you use "POST." A RESTful web service uses other HTTP verbs as well, namely PUT and DELETE, and can also implement OPTIONS to show which methods are appropriate for a resource.
A website has resources. A resource can be information, images, flash, etc. These resources can have different representations: HTML, a jpeg, an embedded video. REST is the same way. It is resource-centric. Want a list of users? GET /users. Want an event? GET /events/5. Want to edit that event? PUT /events/5. Every resource has a unique URL to identify it!
Resources are not dealt with directly. Instead, representations of resources are used. This can be a bit hard to grasp. What is a user? It's a nebulous object somewhere that I cannot interact with. It is an idea, an entity. A representation is a form of the user resource I can interact with. A representation can be a comma delimited list, JSON, XML...anything the client and server both understand. How do we know what we're interacting with? Media types:
As a website will tell you what kind of image you're requesting, a REST service tells you what kind of resource representation you are receiving. This is done using media types. For instance, if I do a GET /events/7, the Content-Type may be "application/vnd.beeets.event+json" which tells us this is a vendor specific media (the "vnd") and it's an event in JSON format. You can pass these media types in your Accept headers to specify what type of representation you would like. These media types are documented somewhere so that client will know exactly what to expect when consuming them.
If you request a page that doesn't exist or you aren't authorized to view, a website will tell you. This is done using headers. A good REST service will utilize HTTP status headers to do the same. 200 Ok, 404 Not Found, 500 Internal Server Error, etc. These have already been defined and refined over many, many years by people who have been doing this a lot longer than you (probably)...use them.
A website will have links from one page to another. This is one of the main points of a REST service, and is also widely forgotten or misunderstood (it took me a while to figure it out even doing intense research). Resources in a REST service link to eachother, letting a client know what resources can be found where, and how they relate to eachother. An HTML page has links to it. So does a REST resource. Links can be structured however you like, but some good things to include are the URI of the linked resource, the relationship it has with the current resource, and the media type. This creates what's known as a "loose coupling" between client and server. A client can crawl the server and figure out, only knowing a pre-defined set of media types, what resources are where and how to find them. This principal is known as HATEOAS (or "Hypermedia as the Engine of Application State").
REST is stateless. This means that the server does not track any sort of client state. There are no session tokens the client uses to identify itself. There are no cookies set. Every request to the REST service must contain all information needed to make that request. Need to access a restricted resource? Send your authentication info for each request. It's that simple. Isn't it easier to track session? Not really. Maybe it's easier on a small level, but once you start needing to scale, you will wish you'd gone stateless. Using a combination of HTTP basic authentication and API/Secret request signing, you don't have to send over plain text passwords at all. Hell, even throw in a timestamp with each request to minimize replay attacks. You can get as crazy as you'd like with security. Or for those who prefer security over performance, use SSL.

Now for some examples. Because I'm currently working on an event application, we'll use that for most of the examples.

Let's get a list of events from our server:

GET /events
Host: api.beeets.com
Accept: application/vnd.beeets.events+json
{"page":1,"per_page":10}
-----------------------------------------
HTTP/1.1 200 OK
Date: Tue, 01 Dec 2009 04:12:48 GMT
Content-Length: 1430
Content-Type: application/vnd.beeets.events+json
{
	"total":81,
	"events":
	[
		{
			"links":
			[
				{
					"uri":"/events/6",
					"rel":"/rel/event self edit",
					"type":"application/vnd.beeets.event"
				},
				{
					"uri":"/locations/121",
					"rel":"/rel/location",
					"type":"application/vnd.beeets.location"
				}
			],
			"id":6,
			"title":"Paris Hilton naked onstage",
			...
		},
		...
	]
}

What do we have? A list of events, with links to the resource representations of those events. Notice we also have links to another resource: the location. We can leave that for now, but let's pull up an event:

GET /events/6
Host: api.beeets.com
Accept: application/vnd.beeets.event+json
-----------------------------------------
HTTP/1.1 200 OK
Date: Tue, 01 Dec 2009 04:12:48 GMT
Content-Length: 666
Content-Type: application/vnd.beeets.event+json
{
	"links":
	[
		{
			"uri":"/events/6",
			"rel":"/rel/event self edit",
			"type":"application/vnd.beeets.event"
		},
		{
			"uri":"/locations/121",
			"rel":"/rel/location",
			"type":"application/vnd.beeets.location"
		}
	],
	"id":6,
	"title":"Paris Hilton naked onstage",
	"date":"2009-12-05T04:00:00Z"
}

Using the link provided in the event listing, we managed to pull up an individual event, which we know how to parse because we know the media type...but wait, what's this? OMG, someone is trying to smear Paris!! She's on at 8:30!!! NOT 8!!! Let's edit...if we do a PUT with new information, we'll be able to save Paris' good name:

PUT /events/6
Host: api.beeets.com
Accept: application/vnd.beeets.event+json
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
{"title":"Paris Hilton naked onstage (yuck)","date":"2009-12-05T04:30:00Z"}
-----------------------------------------
HTTP/1.1 200 OK
Date: Tue, 01 Dec 2009 04:12:48 GMT
Content-Length: 666
Content-Type: application/vnd.beeets.event+json
{
	"links":
	[
		{
			"uri":"/events/6",
			"rel":"/rel/event self edit",
			"type":"application/vnd.beeets.event"
		},
		{
			"uri":"/locations/121",
			"rel":"/rel/location",
			"type":"application/vnd.beeets.location"
		}
	],
	"id":6,
	"title":"Paris Hilton naked onstage (yuck)",
	"date":"2009-12-05T04:30:00Z"
}

What have we learned? Given one URL (/events), we have discovered two more (/locations/[id] and /events/[id]). We've also seen the media types in the responses that allow the client to know what kind of resource it's dealing with and how to consume it.

Hopefully this pounds two really important points in: media types and HATEOAS. Without them, it's not REST. You can't just pass application/xml or application/json for every response. Sure, maybe the client can decode it, but they don't know what it is, and without linking to other resources, they don't know how to find anything...unless you want to document everything and never change your service.

Some other tips/points:

Give yourself a few initial entry points to your REST service. You should be able to discover all of the resources in it just by crawling. If you can't, you haven't done HATEOAS correctly. This is a lot harder than it sounds, but it's more than useful later on. Think of your REST service like a website with good navigation.
Remember to implement the OPTIONS verb for your resources. It will tell the client what verbs can be used on what resources. With some decent routing built into your application, this should be a cakewalk.
As mentioned, you can use HTTP basic authentication for your requests. If the client is anything but a web browser, you won't have to serve up an ugly popup login box, you can just do all that shit transparently. If you don't want to send a cleartext password (please don't!) you can salt the password on the client side and send it over. Hash the password again with the client's secret for added security. Crackers will be amazed at your 1337 computer hacking skillz. You can then verify the hashed salted value on the server side. Add client-secret request signing with a timestamp for uber security.
Read a lot more info on REST. It seems that SO many "RESTful" services out there are half-baked and made by people who researched the topic for half a day. Some good ones to take points from are the Sun Cloud API and the Netflix API. Notice the documentation of media types and LACK of documentation on every single URL you can request. This is that loose-coupling stuff I was talking about.

That's it for now! I wrote this as a culmination of knowledge for the last week or so of research I've done...please let me know if any information is missing or incorrect and I can make updates. Hope it was helpful!

Kill the radio (or die trying)

Recent posts

NginX as a caching reverse proxy for PHP

A simple (but long-winded) guide to REST web services