The Tool Shed: ClusterIt
The Story
Lean back for a bit and imagine that you have rack upon rack of FreeBSD web servers, all humming along like little electronic bee hives. They glow blue in the dim light because you've got the fancy new 1U rack servers with the blue LEDs instead of the boring old green lights. All the network cables are tidy and the web services run with so much stability that your pager hasn't gone off in weeks.
Your favourite CD is playing in your workstation and with your headphones on you can tune out the rest of the corporation, basking in the knowledge that your corner of the world is just beautiful.
Oh yeahhh.
That's paradise for some of us.
And then some generic manager comes in, taps you on the shoulder (interrupting John Lee Hooker in Chill Out), and tells you that she needs to know which web servers have PHP4 installed. Apparently there's some strange non-Perl script that a customer wants to run. Oh, and she needs to know in the next two minutes.
Your idyllic world starts to crumble around you.
Naturally, she doesn't apologize for the short notice — after all, you're the computer guru and you can wave the magic wand to make it happen. It's just, you know, expected.
"Glarge", and several other unpronounceable noises, would normally be appropriate here. I'd even sympathize with you if you banged your head on your desk a few times. I know, I've been there.
However, I've been playing with ClusterIt lately, and if you had too, you'd have that magic wand. Logging on to your workstation, you'd do this:
$ dsh 'portversion -v | grep -i php'
coyote: mod_php4-4.3.1 = up-to-date with port
athena: mod_php4-4.3.0 < needs updating (port has 4.3.1)
You say that only the servers named `coyote' and `athena' have it installed, and that athena isn't running the most current version.
``Oh,'' she says, ``can we get it installed on all the other servers by this afternoon?''
``No problem,'' you can reply. A quick dsh 'portinstall -P php' later and you're back to listening to Mr. John Lee performing with Van Morrison and Carlos Santana. Bliss.
The Details
As the home page for ClusterIt says, it's modeled after IBM's PSSP. That means something to the odd AIX admin out there (Hi Lonny!), but not much to folks coming from a Linux or BSD background. I'd explain the concept thusly: ClusterIt allows you to treat a bunch of computers as a single computer for most kinds of batch jobs, and some kinds of interactive jobs.
Installing ClusterIt is fairly trivial if you don't care about job scheduling or synchronizing jobs between nodes. There are only a few pieces that ClusterIt doesn't provide and that you'll have to set up yourself. You're going to need your own kind of passwordless authentication system: old-fashioned rhosts works fine, but I'd recommend using Kerberized rsh or SSH with keys if you don't want to be taught a lesson in security by the first bored teenager to wander along. You're going to need a fairly firm grasp of how shells and quoting work. A homogeneous environment also doesn't hurt, though it's not necessary.
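On the quoting point, it helps to watch what your local shell hands to a command before that command ever runs. The sketch below is only an illustration: show_args is a hypothetical stand-in for dsh that prints its arguments instead of contacting any machine, and the scratch directory and file names are invented for the demonstration.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for dsh: it only prints the
# arguments the local shell hands over, one per line.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Work in a scratch directory so the glob has known files to match.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch alpha beta

# Unquoted: the local shell expands * against the local directory
# before the command starts, so the command sees local file names.
unquoted=$(show_args ls *)

# Quoted: the pattern travels as one literal argument, left for the
# shell on the far end to expand.
quoted=$(show_args 'ls *')

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"

# Tidy up the scratch directory.
cd - >/dev/null && rm -rf "$scratch"
```

The unquoted call receives three arguments (ls, alpha, beta), while the quoted call receives the single argument ls *. The same reasoning applies to pipes and redirections: leave them unquoted and they are handled by your workstation's shell rather than on the nodes.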
To get started, simply install the port, package or RPM on one of your machines. Create a simple config file consisting of one host name per line (basically listing the hosts that are part of the ``cluster'') and set the CLUSTER environment variable to point to that file. With that minimal configuration, you should be ready to use the base tools:
- dsh – run a command on a cluster of machines (in parallel)
- dshbak – a dsh front-end that tidies up the line-wrapping problem that prepending every line with the host name creates
- run – run a command on a random node
- seq – run a command on a cluster of machines (in sequence)
- pcp – copy a file to a cluster of machines
- pdf – display free disk space across a cluster of machines (portable, converts the df output from a variety of Unix variants)
- prm – delete a file on a cluster of machines
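The minimal configuration described above might look something like the following sketch. The host name hermes, the scratch location of the cluster file, and the final sanity check are all assumptions made for illustration; a real setup would use a fixed path and put the exports in a shell startup file. The RCMD_CMD variable is, as far as I know, how ClusterIt is told to use something other than rsh, but treat that as an assumption and confirm it against dsh(1) on your system.

```shell
#!/bin/sh
# Hypothetical location for the cluster file; any readable path works.
cluster_dir=$(mktemp -d)
cluster_file="$cluster_dir/cluster"

# One host name per line. These names are examples only.
cat > "$cluster_file" <<'EOF'
coyote
athena
hermes
EOF

# Point ClusterIt at the file.
CLUSTER="$cluster_file"
export CLUSTER

# Assumption: ClusterIt reads RCMD_CMD to choose the remote shell
# program and falls back to rsh when it is unset. Verify in dsh(1).
RCMD_CMD=ssh
export RCMD_CMD

# Sanity check: count the nodes listed in the file.
node_count=$(wc -l < "$cluster_file" | tr -d ' ')
echo "cluster file $CLUSTER lists $node_count nodes"

# With the tools installed and passwordless logins working, a first
# test would be something harmless (commented out here so that the
# sketch touches no remote machine):
# dsh 'uptime'
```

Starting with a read-only command such as uptime is a cheap way to confirm that every node answers before trying anything that changes state.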
With a little more configuration (i.e., installing some daemons on each of the nodes) you can run some of the more advanced tools:
- barrier – synchronize a process on a number of machines (requires barrierd to be installed on every node)
- barrierd – the daemon portion of barrier
- jsd – a simple command scheduling daemon for remote execution (must be installed on every node)
- jsh – run scheduled commands on remote machines
If you don't want to, though, the base tools will give you 80% of the benefit for 20% of the installation work.
The Wrap-up
Excited by your new clustering tools, you introduce the junior administrators to ClusterIt and let 'em loose.
Not even a day passes before one of them, logged in as root and with their shell using /usr as their current working directory (naturally), types dsh rm -rf /home/junior/tmp/ * ... and that extra space before the star makes all the difference.
As you thump your forehead against the desk, you think ``If only there was a nice article on shell quoting ... and restricting rootly powers ... and on change control ...''

