Monday, February 23, 2009

When an elephant sits on your blade center

Trade shows should be considered a full-contact sport. Especially if you are a blade center heading to a Cassatt booth.

To show off our software at the Gartner Data Center Conference or other industry data center gatherings, we not only ship a couple of smiling, knowledgeable employees, but we also send a mobile computing cluster. Thing is, we've had a few problems with it arriving in one piece.

And you thought trade show cocktail parties were rough.

Our mobile computing cluster is *supposed* to be an easy way to show off some of the cool tricks you can pull off with our software (or actually, to watch our software show itself off -- policy-based automation can do stuff like that). This mobile cluster is essentially a data-center-in-a-box. Unlike the big Sun, Rackable, and other "18-wheeler" or "mobile-home"-style containerized data centers, this is more like an oversized suitcase, stuffed with servers.

As you can see from the picture here, something happened on the way to the Forum. More on that in a moment.

How it's supposed to work

The idea for this mobile computing cluster is simple: put 4 servers in a 2' x 3' shipping case on wheels. Include a network (and switch) and voilà: you have a simple data center that you can wheel into a trade show booth for product demonstrations. One of the cluster's servers runs the Cassatt Active Response controller software. The others run various combinations of Windows, Linux, and/or VMware ESX. All the servers in this case were Dell blade servers.

To begin the "showing off" part, then, all we usually have to do is set up Cassatt Active Response to show two scenarios. First is a simulation of a dev/test environment moving back and forth between used and unused states (say, from during the workday to after hours and back again). In response to the policies you set, our software gracefully shuts down the applications and servers on a time schedule, then brings them back up. Second is a simulation that shows how the software dynamically handles changing demand on our sample applications. Applications are given priorities, and as load on those apps increases (one of our smiling, knowledgeable employees helps this along), Cassatt automatically provisions new physical or virtual servers to handle the load, turning on cold, bare metal and laying down the operating system and application image (plus virtual machine where appropriate) so the app can scale out. As the load on the app drops, Cassatt de-provisions unnecessary servers, turns them off, and returns them to the spare pool of resources. The fun you can have with demand-based policies, eh?
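For the technically curious, here is a minimal sketch of what those two kinds of policy boil down to. It is a toy model, not Cassatt Active Response's actual interface: the names `Pool`, `in_business_hours`, `rebalance`, and `capacity_per_server` are all hypothetical, and it assumes a simple "one server per fixed slice of load" rule, with server names just shuffled between a spare list and an active list.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from datetime import time

# Hypothetical toy model of policy-based automation; not Cassatt's real API.


@dataclass
class Pool:
    spare: list[str]                                   # powered-off servers
    active: list[str] = field(default_factory=list)    # servers running the app


def in_business_hours(now: time, start: time = time(8, 0), end: time = time(18, 0)) -> bool:
    """Time-schedule policy: the dev/test environment is 'in use' only during the workday."""
    return start <= now < end


def rebalance(pool: Pool, load: float, capacity_per_server: float, min_servers: int = 1) -> None:
    """Demand-based policy: scale out from the spare pool as load rises,
    and return servers to the spare pool as load falls."""
    needed = max(min_servers, math.ceil(load / capacity_per_server))
    while len(pool.active) < needed and pool.spare:
        # In the real product this step would power on bare metal and lay down
        # the OS/application image; here we only move the server's name.
        pool.active.append(pool.spare.pop(0))
    while len(pool.active) > needed:
        # De-provision: turn the server off and put it back in the spare pool.
        pool.spare.append(pool.active.pop())
```

With three spare blades and 100 units of capacity each, a load of 250 pulls all three into service; dropping the load to 50 shrinks the active set back to the one-server minimum.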

What happened in Vegas nearly stayed in Vegas

As I mentioned at the outset, at the recent Gartner Data Center Conference in Vegas, things didn't quite go as planned. When we got to the booth, our sturdy data-center-in-a-box looked like it had been sat upon by a large elephant. Or had been on the receiving end of a very angry forklift. OK, so it wasn't anything like what the GoGrid guys show at nohardware.com, but something very heavy had definitely come into contact with the crate, warping the frame so much that the servers were no longer sitting squarely in their tracks. In fact, there wasn't much "squareness" left at all.

Of course, since we were in Vegas, we took bets on how many servers would actually even turn on.

I figured all was lost from a demo standpoint. I started thinking about what other flashing gizmos we could include in the booth to attract attention if the software was out of commission. But hang on a minute, our techie gurus said. This seems like exactly the kind of thing our software should be able to do: it should be able to help the apps in our mini-data center recover from minor setbacks like having all available servers dislodged by unidentified blunt trauma. You know, the kind of thing that happens in your data center every day.

So, the apps already had service levels assigned to them. The Cassatt control node had a pool of hardware to work with, uncertain though the quality was. We booted up the control node, crossed our fingers, and let it do its thing.

Truth be told, we also used the next few seconds to glance over to see where the fire extinguishers were, and how close our nearest usable exit actually was.

Software that finds and uses whatever available resources you have for your apps

The good news: the controller turned on. And, after our techies fiddled with a network cable or two, our software was talking to the power controllers for each of the other servers. It booted each server in turn as it looked for working compute hardware to support the demo application at the service level we had set. It got a couple of the servers working. The one remaining server, not so much. When that server, the most damaged of the bunch, didn’t respond appropriately, Cassatt Active Response "quarantined" it, putting it into the maintenance pool for a smiling, knowledgeable human to investigate. Of course, we already knew that there was, um, a hardware problem.
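The recovery behavior described above (boot each candidate in turn, keep the ones that pass a health check, quarantine the ones that don't) can be sketched roughly like this. Again, this is a hypothetical illustration rather than how Cassatt Active Response is actually implemented: `recover` and the `power_on_and_check` callback are invented names, and the callback stands in for whatever power-on and health-check logic the real controller uses.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical sketch; names and the health-check callback are assumptions,
# not Cassatt Active Response's actual interfaces.


def recover(candidates: list[str], needed: int,
            power_on_and_check: Callable[[str], bool]) -> tuple[list[str], list[str]]:
    """Boot candidate servers one at a time until `needed` servers are healthy.

    Servers that fail the check are quarantined into a maintenance pool
    for a human to investigate. Returns (healthy, maintenance).
    """
    healthy: list[str] = []
    maintenance: list[str] = []
    for server in candidates:
        if len(healthy) >= needed:
            break  # service level met; leave the remaining servers off
        if power_on_and_check(server):
            healthy.append(server)
        else:
            maintenance.append(server)  # quarantine the misbehaving server
    return healthy, maintenance


# Usage: pretend blade-4 took the brunt of the elephant.
broken = {"blade-4"}
healthy, maintenance = recover(["blade-2", "blade-3", "blade-4"], needed=3,
                               power_on_and_check=lambda s: s not in broken)
```

Here `healthy` ends up as blade-2 and blade-3 while blade-4 lands in `maintenance`, which is roughly the shape of what happened in the booth.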

So, a happy ending. We were able to show off our software and got a great little true-life application resiliency story out of the deal.

Even better, it turned out that our booth was right next to the bar. And we had glowing green swizzle sticks to hand out. But that was just the Vegas trade show gods trying to make it all up to us somehow, I think.

The really happy ending is that after we (carefully) shipped the damaged mobile cluster back to our offices and pulled each of the servers out, it turned out that each blade survived the ordeal. We should probably let our friends at Dell know this. Their blades are officially Cassatt trade-show proof.

The shipping crate, however, has been retired to a corner of our headquarters offices that we call The Dark Side, where it awaits its fate as a vaguely modern coffee table or some other creative use in which having its sides at 90-degree angles to each other is not a requirement.

5 comments:

Anonymous said...

I'll have to tell you my horror stories of hardware and trade shows, including a one-of-a-kind SGI Crimson, and a Digital server with forklift holes run through it.

davemc

Jay Fry said...

This I gotta hear, Dave. But I think I can guess how it turned out. :^)

Anonymous said...

Awesome story, I love the picture!

Did you ever find out just what happened on the way there?

Jay Fry said...

Rob--
No, we didn't. It had to be something *seriously* out of the ordinary. The mind reels at the possibilities.

Of course, the shipping people were rather, um, vague. Needless to say, they're not my favorite shipping people anymore.

Mike Foley said...

Dave, I was there when the forklift went thru the DEC server! Bent the frame really nice. And the union guy just shrugged and said "I didn't see anything" (of course).

Oh the horror stories from tradeshows past I could tell.

mike foley