Saturday, April 18, 2009

Closing Down The Body Shop

Hey everybody,

This blog doesn't get used much.  Since Engine Yard set up the blog aggregator on the main site, it doesn't make much sense to post here; I just have them aggregate posts from my own site.

There are some good bits here (at least one or two), so I'm going to leave it around, but if you want to stay tuned to Engine Yard's blogginess, keep track at Union Station.

If you're actually interested in my blogging, well, I'm flattered.  I'll continue to publish over on my blog.

Tuesday, October 14, 2008

News and Photos From The Vertebra Sprint in Omaha

Things are going strong here in Omaha.  We've been focusing on getting Vertebra shored up for the open source release, Real Soon Now(TM).

I've been working on a solid, cross-platform installer script for all of the components.  Kevin Smith and Kirk Haines have both been tearing into the integration testing.  John Hornbeck has been visiting to get more information on the project.  He (and his Vertebra-pristine laptop) has been sussing out the system dependencies that we developers take for granted.  Sam has been directing the work and putting together some good demo material for the upcoming screencast.

All in all, great work.  As an added bonus, I brought my camera.  I've posted a set on Flickr that contains the latest photos.  I may add a few more batches before the end of the sprint.

Wednesday, August 27, 2008

Busy Busy Busy

Engine Yard is going gangbusters right now.

We've been pushing newer versions of our clustering software.  We have certain giant customers with insane load, and it's making for interesting work trying to scale them while they re-architect their applications.

Vertebra has also made some major progress.  I think we may be really close to a public release.  Sam and his team are getting the Erlang components in shape, and the Ruby guys have done an amazing job with the Ruby client library.  I expect us to have quite a decent number of system actors in the coming days.

Things are really coming together.

Wednesday, July 23, 2008

Random Update

Lots of progress on Vertebra lately.

Since there's been a bit of interest, I thought some people might want to see the slides for my Vertebra presentation last month at the Velocity Conference.

Hope to see an open source release soon.

Saturday, June 28, 2008

The Oldest API

If you ask anyone about the Unix API, you'll get a lot of different answers. Some people will talk about the original Unix syscalls. Some will talk about which commands you expect to find on the system. Others might talk of POSIX. Still others might talk about which libc calls are most portable. However, there's a basic interface so fundamental to making Unix what it is that most people forget that it even exists.

I'm talking about the "process" API. When you run a Unix command, it generally has three magic "files" open. Represented by file descriptors 0, 1, and 2, these "files" are called stdin, stdout, and stderr respectively; they're really data streams that may refer to other processes or to devices like the terminal or a serial port. Not surprisingly, the first one is read for input, the second one is written for output, and the third one is written for "error information". Finally, when the process exits, you get a "return code". An amazing amount can be done with just these simple tools: spawn a process, read(), write(), and exit().  Add in setenv() if you care to pass information through the environment, too.
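To make that concrete, here's a minimal Ruby sketch (any language with process primitives would do) that exercises the whole API at once: spawn a child process, write to its stdin, read back its stdout and stderr, and check its return code.

```ruby
require 'open3'

# The whole process API in one shot: spawn a child, feed its stdin,
# and collect its stdout, stderr, and return code.
stdout, stderr, status = Open3.capture3('tr', 'a-z', 'A-Z',
                                        stdin_data: "hello, unix\n")

puts stdout            # prints "HELLO, UNIX"
puts status.exitstatus # prints 0 -- the child's return code
```

That's it: four file-descriptor-and-exit primitives are enough to compose `tr` (or anything else) into a larger program.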

Why am I posting this nonsense on an Engine Yard-themed blog?  In dealing with some internal engineering issues, I was struck by the elegance of components that use this API to great effect.  In recognizing them, I realized that very few people probably realize how simple it is to extend them using this relatively ancient API.

First, take a look at Nagios.  It's a slightly obtuse but fairly commonly used monitoring system.  For most small installations, you can quickly generate custom monitoring of your infrastructure.  Part of why it is so powerful is that it comes with a suite of fairly flexible plug-ins that do the heavy lifting of the monitoring.  What is slightly less well known is that these plug-ins use the process API.  With the simple application of a Ruby script (or shell, Python, Java, C, etc.), you can write a plug-in to monitor whatever you want.

How do you use this API, you ask?  Simply do whatever you need to do for the check, then print out a line and exit with the appropriate return code.  The return codes are:
  • 0: OK
  • 1: WARNING
  • 2: CRITICAL
  • 3: UNKNOWN
The line of text has a format that encodes enough data that most graphing utilities can create some impressive graphs.  You simply output something like "OK - nuclear reactor is fine | temp=500F;800;1000;0;1500 pressure=6000kPa;10000;10800;0;12000".  This little bit of text gives two measurements, their names, their units of measure, the warning/critical thresholds for each, and the min/max range of each.
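For illustration, here's a hypothetical Ruby plugin along those lines; the reactor temperature, thresholds, and ranges are all made up, and a real plugin would pull the measurement from an actual sensor or API.

```ruby
# Classify a (hypothetical) reactor temperature into a Nagios status
# line plus return code.  warn/crit/min/max are made-up thresholds.
def check_temp(temp, warn = 800, crit = 1000, min = 0, max = 1500)
  perfdata = "temp=#{temp};#{warn};#{crit};#{min};#{max}"
  if temp >= crit
    ["CRIT - reactor melting down | #{perfdata}", 2]
  elsif temp >= warn
    ["WARN - reactor running hot | #{perfdata}", 1]
  else
    ["OK - nuclear reactor is fine | #{perfdata}", 0]
  end
end

line, code = check_temp(500)
puts line   # "OK - nuclear reactor is fine | temp=500;800;1000;0;1500"
# An installed plugin would finish with: exit code
```

One print and one exit code, and Nagios handles the scheduling, alerting, and graphing for you.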

Another use of this simple API is found in Red Hat's Cluster Suite (RHCS).  RHCS keeps track of which nodes are running.  These nodes lock appropriate resources to do their work.  In CLVM, they lock the clustered volume metadata.  In GFS, they lock blocks of the filesystem.  In csnap and cmirror, they lock blocks of a block device.  In all cases, these locks are critical to keep data from being trashed on your SAN.

When one of these nodes fails, the system must free the locks that the dead node held.  When those locks are freed, the old node no longer has permission to work with the resource it locked.  If that node were to wake up and keep going about its business (since it thinks it has the lock), then it might trash whatever data is represented by that resource.

To prevent this, the cluster "fences" the node.  The idea is that the node goes into a sort of virtual "penalty box" (i.e. behind a fence) that prevents it from doing any damage.  As you might imagine, this is critical to the safety of data in a cluster.  Just as important, every clustered infrastructure will have to do this differently.  Thus it is critical (at least for the adoption of RHCS) that it be as easy as possible to plug in your own fencing agents.

To write a fencing agent, all you have to do is write a program that reads from stdin, writes to stdout, and returns a sane exit code.  Sound familiar?  Again, the most fundamental API in Unix rears its venerable head.  The details are all in their wiki.  Using this interface, RHCS comes with agents that will allow manual fencing (for testing), fencing various SANs at the SAN itself, fencing machines by resetting them at programmable power switches, fencing virtual machines by talking to their control infrastructure, or anything else you can implement.
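As a sketch of the shape such an agent takes, here's a minimal Ruby skeleton.  The key=value-on-stdin parsing loosely follows the fence-agent convention (see their wiki for the exact details), and power_off is a hypothetical stand-in for whatever actually fences the node: a programmable power switch, a SAN port disable, a hypervisor call.

```ruby
#!/usr/bin/env ruby
# Skeleton of an RHCS-style fencing agent.  The cluster daemon feeds
# key=value pairs on stdin; the agent fences the node and reports the
# result through its return code.

# Turn the stream of "key=value" lines into a hash.
def parse_args(io)
  io.each_line.each_with_object({}) do |line, args|
    key, value = line.strip.split('=', 2)
    args[key] = value if key && value
  end
end

# Hypothetical: replace with a call to your actual fencing hardware.
def power_off(node)
  warn "powering off #{node}"
  true
end

# Decide what to do and compute the agent's return code.
def run(io)
  args = parse_args(io)
  case args['action']
  when 'off', 'reboot'
    power_off(args['nodename']) ? 0 : 1  # 0 = fenced, nonzero = failed
  else
    warn "unsupported action: #{args['action'].inspect}"
    1
  end
end

# Installed as a real agent, the script would end with: exit run($stdin)
```

Swap the body of power_off for your infrastructure's mechanism and the cluster can use it like any stock agent.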

This was exactly what was necessary here at Engine Yard, so this simple API came in handy.  It's as simple as it is powerful, and often, it's all that you need.

Sunday, June 1, 2008

May Progress and RailsConf 2008 Report

Despite the blog being dead for a while, we've had some pretty exciting developments.

The two new Erlang guys are onboard and by all appearances are coding up a storm.  In particular, Kevin has taken a pretty active role in getting a good RBAC implementation going in Erlang.

Ezra debuted at least the lower portions of his first run at my Vertebra architecture today, on the final day of RailsConf.  Looks like it could be a hit.  Probably need to get a mailing list set up.

Edward and Jamie have spent a ton of time rocking on the Express Image, which also was mentioned.  Looks like Jamie is actually going to be spending some time here in the US to get some stuff seriously knocked out.  Tons of progress there.

Last month in Sacramento, Edward and I did some killer work on the networking for the new clustering.  It's not completely hammered out yet, but should be functional soon.  It's amazing how difficult it is to get multicast routing to work correctly given dynamically added/removed interfaces, link aggregation, and the NAT/IPVS stuff going on in front of everything.  Tons of moving parts.  The good news is that it looks like it might be possible to get multicast, link aggregation, and a notable lack of Proxy-ARP (which has been a blessing and a curse).

Finally, looks like Nanite might be beginning to roll, so that's good to see.  I think Jamie and Edward are targeting that for Engineering, post Express.  Need to coordinate all of it.

All in all, a productive month with tons of goodness coming down the pipe.

Note:  Some of the above projects may not be known to various people.  Some are still kind of under wraps.  Just drop me an e-mail and I can help fill in any blanks.

Saturday, May 3, 2008

Brief Update

Well, for those of you who aren't on the inside of all of this, we've been pretty busy.

Ezra and I have had a few meetings about the architecture and the specs are getting really solid in the wiki.

We also hired two new Erlang guys to work on this.  I expect great work from both of them.

More as it becomes available.