Saturday, June 28, 2008

The Oldest API

If you ask anyone about the Unix API, you'll get a lot of different answers. Some people will talk about the original Unix syscalls. Some will talk about which commands you expect to find on the system. Others might talk of POSIX. Still others might talk about which libc calls are most portable. However, there's a basic interface so fundamental to making Unix what it is that most people forget that it even exists.

I'm talking about the "process" API. When you run a Unix command, it generally has three magic "files" open. Represented by file descriptors 0, 1, and 2, the "files" are called stdin, stdout, and stderr respectively; and they're really data streams that may refer to other processes or devices like the terminal or a serial port. Not surprisingly, the first one is read for input, the second one is written for output, and the third one is written for "error information". Finally, when the process exits, you get a "return code". An amazing amount can be done with just these simple tools: fork() and exec() to spawn the process, then read(), write(), and exit().  Add in setenv() if you care to pass information in the environment, too.
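To make that concrete, here is a minimal sketch in Python of one process driving another through nothing but stdin, stdout, stderr, and the return code. The helper name run_child and the little child program are made up for illustration; the child is just another Python interpreter so the example is self-contained.

```python
import subprocess
import sys


def run_child(code, input_text):
    """Spawn a child process, feed its stdin, and collect stdout, stderr,
    and the return code. The child here is a Python interpreter running
    the given source text (an assumption made to keep this self-contained)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=input_text,      # becomes the child's stdin (fd 0)
        capture_output=True,   # capture the child's fd 1 and fd 2
        text=True,
    )
    return result.stdout, result.stderr, result.returncode


# Hypothetical child: upper-cases its input, reports on stderr, exits with 3.
CHILD = (
    "import sys\n"
    "data = sys.stdin.read()\n"
    "sys.stdout.write(data.upper())\n"
    "sys.stderr.write('done\\n')\n"
    "sys.exit(3)\n"
)

if __name__ == "__main__":
    out, err, code = run_child(CHILD, "hello\n")
    print("stdout:", repr(out))
    print("stderr:", repr(err))
    print("return code:", code)
```

The parent never needs to know what language the child is written in; the three streams and the return code are the whole contract.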

Why am I posting this nonsense on an Engine Yard-themed blog? In dealing with some internal engineering issues, I was struck by the elegance of components that use this API to great effect. It also occurred to me that very few people realize how simple it is to extend those components using this relatively ancient API.

First, take a look at Nagios.  It's a slightly obtuse but fairly commonly used monitoring system.  For most small installations, you can quickly generate custom monitoring of your infrastructure.  Part of why it is so powerful is that it comes with a suite of fairly flexible plug-ins that do the heavy lifting of the monitoring.  What is slightly less well known is that these plug-ins use the process API.  With the simple application of a Ruby script (or shell, Python, Java, C, etc.), you can write a plug-in to monitor whatever you want.

How do you use this API, you ask?  Simply do whatever you need to do for the check, then print out a line and exit with the appropriate return code.  The return codes are:
  • exit with return code zero (OK)
  • exit with return code one (WARN)
  • exit with return code two (CRIT)
  • exit with return code three (UNKNOWN)
The line of text has a format that encodes enough data that most graphing utilities can create some impressive graphs.  You simply output something like "OK - nuclear reactor is fine | temp=500F;800;1000;0;1500 pressure=6000kPa;10000;10800;0;12000".  Everything after the pipe is performance data: a space-separated list of measurements, each with its unit attached directly to the value.  This little bit of text gives two measurements, their names, their units of measure, the warning and critical thresholds for each, and the minimum and maximum of each.
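As a rough sketch of what such a plug-in could look like in Python: the read_temperature function, the thresholds, and the labels below are placeholders invented for illustration, and the performance data is written as name=value plus unit, followed by warn, crit, min, and max separated by semicolons, which is how I understand the plug-in guidelines. Nagios decides the state from the return code; the text is for humans and graphing tools.

```python
import sys

# Return codes, as listed above.
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARN: "WARN", CRIT: "CRIT", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement onto a return code using simple upper thresholds."""
    if value >= crit:
        return CRIT
    if value >= warn:
        return WARN
    return OK


def format_line(status, message, name, value, unit, warn, crit, low, high):
    """Build the one-line output: human-readable text, a pipe, then perfdata."""
    perf = f"{name}={value}{unit};{warn};{crit};{low};{high}"
    return f"{LABELS[status]} - {message} | {perf}"


def read_temperature():
    """Hypothetical sensor read; replace with whatever you actually monitor."""
    return 500


def main():
    try:
        temp = read_temperature()
    except Exception as exc:
        # If the check itself cannot run, say so rather than guess.
        print(f"UNKNOWN - could not read temperature: {exc}")
        return UNKNOWN
    status = evaluate(temp, 800, 1000)
    print(format_line(status, "reactor temperature checked",
                      "temp", temp, "F", 800, 1000, 0, 1500))
    return status


if __name__ == "__main__":
    sys.exit(main())
```

Run by hand, the script prints one line and sets the return code, which you can inspect with "echo $?" in the shell.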

Another use of this simple API is found in Red Hat's Cluster Suite (RHCS).  RHCS keeps track of which nodes are running.  These nodes lock appropriate resources to do their work.  In CLVM, they lock the clustered volume metadata.  In GFS, they lock blocks of the filesystem.  In csnap and cmirror, they lock blocks of a block device.  In all cases, these locks are critical to keep data from being trashed on your SAN.

When one of these nodes fails, the system must free the locks that the dead node held.  When those locks are freed, the old node no longer has permission to work with the resource it locked.  If that node were to wake up and keep going about its business (since it thinks it has the lock), then it might trash whatever data is represented by that resource.

To prevent this, the cluster "fences" the node.  The idea is that the cluster puts the failed node into a sort of virtual "penalty box" (i.e. behind a fence) that keeps it from touching the shared resources.  As you might imagine, this is critical to the safety of data in a cluster.  Just as important, every clustered infrastructure will have to do this differently.  Thus it is critical that it be as easy as possible to plug in your own fencing agents (at least critical for the adoption of RHCS).

To write a fencing agent, all you have to do is write a program that reads from stdin, writes to stdout, and returns a sane exit code.  Sound familiar?  Again, the most fundamental API in Unix rears its venerable head.  The details are all in their wiki.  Using this interface, RHCS comes with agents that will allow manual fencing (for testing), fencing various SANs at the SAN itself, fencing machines by resetting them at programmable power switches, fencing virtual machines by talking to their control infrastructure, or anything else you can implement.
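A bare-bones agent might look something like the Python sketch below. It assumes the cluster hands the agent its options as name=value lines on stdin, which is my understanding of the convention; the option names "port" and "action" and the power_off function are illustrative stand-ins, so check the wiki for the real list before relying on any of them.

```python
import sys


def parse_options(stream):
    """Read name=value lines, skipping blank lines and '#' comments."""
    options = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def power_off(node):
    """Hypothetical placeholder: talk to your power switch, SAN, or
    hypervisor here and return True only if the node is really cut off."""
    return node != ""


def main(stream=sys.stdin):
    options = parse_options(stream)
    node = options.get("port", "")          # assumed option name
    action = options.get("action", "off")   # assumed option name
    if action != "off":
        print(f"unsupported action: {action}", file=sys.stderr)
        return 1
    if power_off(node):
        print(f"fenced {node}")
        return 0   # zero tells the caller the fence succeeded
    print("fencing failed", file=sys.stderr)
    return 1       # non-zero means the node may still be alive


if __name__ == "__main__":
    sys.exit(main())
```

The important design point is the failure path: an agent should return non-zero whenever it cannot confirm the node is fenced, since the cluster frees the dead node's locks on the strength of that answer.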

This was exactly what was necessary here at Engine Yard, so this simple API came in handy.  It's as simple as it is powerful, and often, it's all that you need.

Sunday, June 1, 2008

May Progress and RailsConf 2008 Report

Despite the blog being dead for a while, we've had some pretty exciting developments.

The two new Erlang guys are onboard and by all appearances are coding up a storm.  In particular, Kevin has taken a pretty active role in getting a good RBAC implementation going in Erlang.

Ezra debuted at least the lower portions of his first run at my Vertebra architecture today at the final day of RailsConf.  Looks like it could be a hit.  Probably need to get a mailing list set up.

Edward and Jamie have spent a ton of time rocking on the Express Image, which also was mentioned.  Looks like Jamie is actually going to be spending some time here in the US to get some stuff seriously knocked out.  Tons of progress there.

Last month in Sacramento, Edward and I did some killer work on the networking for the new clustering.  It's not completely hammered out yet, but should be functional soon.  It's amazing how difficult it is to get multicast routing to work correctly given dynamically added/removed interfaces, link aggregation, and the NAT/IPVS stuff going on in front of everything.  Tons of moving parts.  The good news is that it looks like it might be possible to get multicast, link aggregation, and a notable lack of Proxy-ARP (which has been a blessing and a curse).

Finally, looks like Nanite might be beginning to roll, so that's good to see.  I think Jamie and Edward are targeting that for Engineering, post Express.  Need to coordinate all of it.

All in all, a productive month with tons of goodness coming down the pipe.

Note:  Some of the above projects may not be known to various people.  Some are still kind of under wraps.  Just drop me an e-mail and I can help fill in any blanks.