Engine Yard Body Shop: 2008

Tuesday, October 14, 2008

News and Photos From The Vertebra Sprint in Omaha

Things are going strong here in Omaha. We've been focusing on getting Vertebra shored up for the open source release, Real Soon Now(TM).

I've been working on a solid, cross-platform installer script for all of the components. Kevin Smith and Kirk Haines have both been tearing into the integration testing. John Hornbeck has been visiting to get more information on the project. He (and his Vertebra-pristine laptop) has been sussing out the system dependencies that we developers take for granted. Sam has been directing the work and working towards some good demo material for the upcoming screencast.

All in all, great work. As an added bonus, I brought my camera. I've posted a set on Flickr that contains the latest photos. I may add a few more batches before the end of the sprint.

Wednesday, August 27, 2008

Busy Busy Busy

Engine Yard is going gangbusters right now.

We've been pushing newer versions of our clustering software. We have certain giant customers with insane load, and it's making for interesting work trying to scale them while they re-architect their applications.

Vertebra has also made some major progress. I think we may be really close to a public release. Sam and his team are getting the Erlang components in shape, and the Ruby guys have done an amazing job with the Ruby client library. I expect us to have quite a decent number of system actors in the coming days.

Things are really coming together.

Wednesday, July 23, 2008

Random Update

Lots of progress on Vertebra lately.

Since there's been a bit of interest, I thought some people might want to see the slides for my Vertebra presentation last month at the Velocity Conference.

Hope to see an open source release soon.

Saturday, June 28, 2008

The Oldest API

If you ask anyone about the Unix API, you'll get a lot of different answers. Some people will talk about the original Unix syscalls. Some will talk about which commands you expect to find on the system. Others might talk of POSIX. Still others might talk about which libc calls are most portable. However, there's a basic interface so fundamental to making Unix what it is that most people forget that it even exists.

I'm talking about the "process" API. When you run a Unix command, it generally has three magic "files" open. Represented by file-descriptors 0, 1, and 2, the "files" are called stdin, stdout, and stderr respectively; and they're really data streams that may refer to other processes or devices like the terminal or a serial port. Not surprisingly, the first one is read for input, the second one is written for output, and the third one is written for "error information". Finally, when the process exits, you get a "return code". An amazing amount can be done with just these simple tools. Just the spawn(), read(), write(), and exit(). Add in setenv() if you care to pass information in the environment, too.
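To make that concrete, here's a minimal sketch in Python of a parent driving a child through nothing but those streams, the environment, and the return code. Python's subprocess.run stands in for spawn() here, and the child program, the GREETING variable, and the run_child helper are all made up for illustration.

```python
import os
import subprocess
import sys

# Hypothetical child program: reads stdin, writes stdout and stderr,
# looks at one environment variable, and exits with a return code.
CHILD = """
import os, sys
text = sys.stdin.read()
sys.stdout.write(text.upper())
print("greeting=" + os.environ.get("GREETING", ""), file=sys.stderr)
sys.exit(3)
"""


def run_child(source, stdin_text, extra_env=None):
    """Spawn a Python child and return (return_code, stdout, stderr)."""
    # Pass information in the environment by extending a copy of our own.
    env = dict(os.environ)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, "-c", source],
        input=stdin_text,       # becomes the child's stdin (fd 0)
        capture_output=True,    # collect the child's stdout (fd 1) and stderr (fd 2)
        text=True,
        env=env,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    code, out, err = run_child(CHILD, "hello\n", {"GREETING": "hi"})
    print(code, repr(out), repr(err))
```

The parent never needs to know what language the child is written in. Swap the command line for a shell script or a C binary and nothing else changes.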

Why am I posting this nonsense on an Engine Yard-themed blog? In dealing with some internal engineering issues, I was struck by the elegance of components that use this API to great effect. In recognizing them, I realized that very few people probably know how simple it is to extend them using this relatively ancient API.

First, take a look at Nagios. It's a slightly obtuse but fairly commonly used monitoring system. For most small installations, you can quickly generate custom monitoring of your infrastructure. Part of why it is so powerful is that it comes with a suite of fairly flexible plug-ins that do the heavy lifting of the monitoring. What is slightly less well known is that these plug-ins use the process API. With the simple application of a Ruby script (or shell, Python, Java, C, etc.), you can write a plug-in to monitor whatever you want.

How do you use this API, you ask? Simply do whatever you need to do for the check, then print out a line and exit with the appropriate return code. The return codes are:

exit with return code zero (OK)
exit with return code one (WARN)
exit with return code two (CRIT)
exit with return code three (UNKNOWN)

The line of text has a format that encodes enough data that most graphing utilities can create some impressive graphs. You simply output something like "OK - nuclear reactor is fine | temp=500F;800;1000;0;1500 pressure=6000kPa;10000;10800;0;12000". This little bit of text gives two measurements, their names, their units of measure, the warning/critical thresholds for each, and the range of each.
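Here's a rough sketch of such a plug-in in Python. It assumes the space-separated label=value[unit];warn;crit;min;max layout for the data after the pipe, and it treats a reading above a threshold as a breach, which is a simplification. The reactor thresholds come from the sample line; check_reactor, main, and the read_sensors callback are hypothetical names.

```python
# Return codes from the list above.
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARN: "WARN", CRIT: "CRIT", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map one reading onto a return code (simplified: higher is worse)."""
    if value > crit:
        return CRIT
    if value > warn:
        return WARN
    return OK


def perfdata(name, value, unit, warn, crit, low, high):
    """Format one measurement: name=value+unit;warn;crit;min;max."""
    return f"{name}={value}{unit};{warn};{crit};{low};{high}"


def check_reactor(temp_f, pressure_kpa):
    """Return (return_code, output_line) for the made-up reactor check."""
    # The overall status is the worst status of any single measurement.
    code = max(classify(temp_f, 800, 1000),
               classify(pressure_kpa, 10000, 10800))
    message = ("nuclear reactor is fine" if code == OK
               else "nuclear reactor needs attention")
    perf = " ".join([
        perfdata("temp", temp_f, "F", 800, 1000, 0, 1500),
        perfdata("pressure", pressure_kpa, "kPa", 10000, 10800, 0, 12000),
    ])
    return code, f"{LABELS[code]} - {message} | {perf}"


def main(read_sensors):
    """Run the check, print one line, and return the code to exit with."""
    try:
        temp_f, pressure_kpa = read_sensors()
    except Exception as exc:
        # If we cannot even take a reading, report UNKNOWN.
        print(f"UNKNOWN - could not read sensors: {exc}")
        return UNKNOWN
    code, line = check_reactor(temp_f, pressure_kpa)
    print(line)
    return code
```

A real script would finish with sys.exit(main(read_sensors)), where read_sensors is whatever function actually takes your measurements.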

Another use of this simple API is found in Red Hat's Cluster Suite (RHCS). RHCS keeps track of which nodes are running. These nodes lock appropriate resources to do their work. In CLVM, they lock the clustered volume metadata. In GFS, they lock blocks of the filesystem. In csnap and cmirror, they lock blocks of a block device. In all cases, these locks are critical to keep data from being trashed on your SAN.

When one of these nodes fails, the system must free the locks that the dead node held. When those locks are freed, the old node no longer has permission to work with the resource it locked. If that node were to wake up and keep going about its business (since it thinks it has the lock), then it might trash whatever data is represented by that resource.

To prevent this, the cluster "fences" the node. The idea is that it puts it into a sort of virtual "penalty box" (i.e. behind a fence) that prevents this from happening. As you might imagine, this is critical to the safety of data in a cluster. Just as important, every clustered infrastructure will have to do this differently. Thus it is critical that it be as easy as possible to plug in your own fencing agents (at least critical for the adoption of RHCS).

To write a fencing agent, all you have to do is write a program that reads from stdin, writes to stdout, and returns a sane exit code. Sound familiar? Again, the most fundamental API in Unix rears its venerable head. The details are all in their wiki. Using this interface, RHCS comes with agents that will allow manual fencing (for testing), fencing various SANs at the SAN itself, fencing machines by resetting them at programmable power switches, fencing virtual machines by talking to their control infrastructure, or anything else you can implement.
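As a rough illustration of the shape of such an agent (not the actual RHCS contract, which lives in their wiki), here's a sketch in Python. It assumes the cluster hands the agent its options as name=value lines on stdin. The nodename option, the success/failed messages, and the power_off callback are all hypothetical stand-ins.

```python
def parse_options(stream):
    """Parse name=value lines; blank lines and '#' comments are skipped."""
    options = {}
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed option line: {line!r}")
        options[name.strip()] = value.strip()
    return options


def fence(options, power_off):
    """Fence the node named in the options; return a process exit code."""
    node = options.get("nodename")  # hypothetical option name
    if not node:
        return 1
    # power_off is a stand-in for whatever really cuts the node off:
    # a power switch, a SAN port, a hypervisor call, and so on.
    return 0 if power_off(node) else 1


def main(stdin, stdout, power_off):
    """Read options from stdin, report on stdout, return the exit code."""
    try:
        options = parse_options(stdin)
    except ValueError as exc:
        stdout.write(f"failed: {exc}\n")
        return 1
    code = fence(options, power_off)
    stdout.write(("success" if code == 0 else "failed") + "\n")
    return code
```

A real agent would end with sys.exit(main(sys.stdin, sys.stdout, power_off)). Passing the streams in as arguments is just a convenience so the same function can be exercised in a test without spawning a process.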

This was exactly what was necessary here at Engine Yard, so this simple API came in handy. It's as simple as it is powerful, and often, it's all that you need.

Sunday, June 1, 2008

May Progress and RailsConf 2008 Report

Despite the blog being dead for a while, we've had some pretty exciting developments.

The two new Erlang guys are onboard and by all appearances are coding up a storm. In particular, Kevin has taken a pretty active role in getting a good RBAC implementation going in Erlang.

Ezra debuted at least the lower portions of his first run at my Vertebra architecture today at the final day of RailsConf. Looks like it could be a hit. Probably need to get a mailing list set up.

Edward and Jamie have spent a ton of time rocking on the Express Image, which also was mentioned. Looks like Jamie is actually going to be spending some time here in the US to get some stuff seriously knocked out. Tons of progress there.

Last month in Sacramento, Edward and I did some killer work on the networking for the new clustering. It's not completely hammered out yet, but should be functional soon. It's amazing how difficult it is to get multicast routing to work correctly given dynamically added/removed interfaces, link aggregation, and the NAT/IPVS stuff going on in front of everything. Tons of moving parts. The good news is that it looks like it might be possible to get multicast, link aggregation, and a notable lack of Proxy-ARP (which has been a blessing and a curse).

Finally, looks like Nanite might be beginning to roll, so that's good to see. I think Jamie and Edward are targeting that for Engineering, post Express. Need to coordinate all of it.

All in all, a productive month with tons of goodness coming down the pipe.

Note: Some of the above projects may not be known to various people. Some are still kind of under wraps. Just drop me an e-mail and I can help fill in any blanks.

Saturday, May 3, 2008

Brief Update

Well, for those of you who aren't on the inside of all of this, we've been pretty busy.

Ezra and I have had a few meetings about the architecture and the specs are getting really solid in the wiki.

We also hired two new Erlang guys to work on this. I expect great work from both of them.

More as it becomes available.

Wednesday, April 9, 2008

Roles and Workflows FTW!

Some of the most useful things to come out of Computer Science are solutions to the complex problems that you encounter when you combine seemingly simple things. In the design for Vertebra, we've been dealing with a number of simple technologies, trying to seam them into a whole.

One of the more vexing problems has been how to do security. Inevitably a distributed system has the chance to be a distributed security disaster. We've taken a look at using an RBAC (role-based access control) system. This has interesting implications when integrated with our distributed workflow system.

Enter the ACM Digital Library. Got to love it. It looks like there are people researching this. There I found an article titled "A Flexible Model Supporting the Specification and Enforcement of Role-based Authorizations in Workflow Management Systems". I don't think we'll come close to implementing the whole system they describe, but it sure is nice to have an understanding of where it all could go.

It also confirms that we must be doing something right, as we're implementing stuff that was leading edge Computer Science only a decade ago!

Tuesday, April 1, 2008

Busy Week At The Yard

I wish this was an April Fools post, but it's not. Sorry to disappoint.

I've been so tied up with little things that there hasn't been much progress on development.

Not that it's boring. We're busy rolling out our "Special Projects" cluster. Just what those projects are is still kind of a secret, but it's important to get it rolling. We're also doing some exciting stuff with Open Source Telephony.

Also, I'm excited about my meeting with Ezra tomorrow over Nanite, an internal management project. Not entirely sure how it's supposed to work yet, and I already have some suggestions, but it's nice to see the enthusiasm. Any time someone mentions Puppet and Git in the same sentence, it certainly raises my eyebrows.

Friday, March 21, 2008

RamDisk Backed By Real Disk?

Ramback looks like an interesting project.

Essentially, it's a device-mapper layer that sits in between a backing store (i.e. regular block device) and a virtual DM device.

In most cases, it acts as a virtual RAM disk of the same size as the device. Writes automagically happen to the RAM disk, and it flushes stuff out to disk as massive (albeit unsafe) linear writes. The idea is that your system is embedded and battery backed up. When line power goes, you go into an emergency mode where everything gets flushed to disk.

I've been scheming about something like:

AoE <--> DRBD <--> RamBack <--> AoE SAN (staggered across physical shelves)

This would be a cool way to get ultra-fast but still reliable storage for things like MySQL transaction logs, BDB transaction logs, JFS/Ext3/ReiserFS external journals, and the like. Rather than being backed up by batteries (to provide RamBack with assurance of runtime), DRBD would provide the redundancy against single failure (although separate power feeds would be nice).

When it's SMP-ready I might take a stab at shoving it in production.

UPDATE: I meant DRBD, not GNBD. Also, I recently discovered Write-Intent Bitmaps for MD devices, which also would be great for this kind of backing store.

Documentation and Testing

Big changes to the Agent Architecture as of late. In particular the security infrastructure is getting fleshed out a bit. It doesn't feel like an afterthought, which is nice.

Unfortunately I'm not getting as much code done as I'd like. I've spent a lot of time revising the documentation and working out examples. Writing examples is slow going, as there's a lot to document. However, I'm committed to it, since the examples are going to become my unit tests. It's all in the wiki. I can't seem to access the wiki without logging in, so I probably need to ask somebody when we are opening that part up.

Speaking of unit tests, with a messaging protocol it's absolutely amazing how much infrastructure has to be in place to test some of this stuff. Some of the original design was really poor for testing against. Getting a good test environment still eludes me a bit, although I'm getting there.

Erlang packaging is still a little fuzzy for me. Ezra sent me a link to this exciting Erlang P2P project called Monsoon that's on GitHub. I've actually learned a bit about packaging an Erlang App from them, so that makes me feel good at least.

Well, back to the grindstone.

Monday, March 17, 2008

Entrepot, Docs, Testing, and Collaboration!

So I'm currently doing some work on Entrepôt. I won't go too into detail, but it's part of our management infrastructure. It's been my focus for a while.

I took a break tonight to move some of the design docs over to the new wiki. I'm unsure when it will be available to the public, but I hope it's soon. I'll be sure to mention it here.

Before I tear back into things too hard, I think I'm going to work on the protocol examples a little more. If you're familiar with the project, you'll know the protocol is just a set of extensions to XMPP. At any rate, that shouldn't take too long, but I expect it will be useful for anyone who wants to collaborate. It'll also be helpful for me, since I really need some more tests once I get a good framework together for testing our XMPP infrastructure.

I'm meeting with some people later this week that may like to help, and that's always nice.

Body Shop Open For Business

Well, my old blog is currently under wraps, and I wanted somewhere to document the work we're doing, so I opened this sucker up. In a perfect world, I may even update it regularly! We'll just have to see.

We've been doing some exciting design work. I must admit I was kind of shocked when I found out how much of it we were going to open source. I mean, every high-minded code-jockey thinks "I'm going to get THIS employer to open source my code," but almost never gets anywhere.

These guys are different. Of course, it helps that I helped to found it with Tom and Ezra, but I'm continually surprised at how committed our founders, employees, and even our investors are to opening this stuff up. Talk about a group of people that really "get it".

That's enough of my sentimental ramblings. Back into the hole for me...