Tiger Lab Migration Part 11: Panasas Crashes and Caches

The last time we visited this topic, I thought we were done. Well, turns out I was wrong.

Things are, for the most part, working well now. Finally. We're running Tiger and we've managed to iron out the bulk of the problems. There is one issue which has persisted, however: the home account RAID.

To refresh, our network RAID, which is responsible for housing all our users' home account data, is made by a company called Panasas. Near as I can figure, we've got some experimental model, 'cause boy does it crash a lot. Which is not what you want in a home account server, by any means. After upgrading the Panasas OS awhile back, the crashing had stopped. But it was only temporary. Lately the crashing is back with a vengeance. Like every couple of days it goes down. And when it goes down, it goes down hard. Like physical-reset hard. Like pull-the-director-blace-and-wait hard. Like sit-and-wait-for-the-RAID-to-rebuild hard.

Again: Not what you want in a home account server.

So we've built a new one. Actually, we've swapped our backup and home account servers. See, awhile back we decided it would be prudent to have a backup of the home account server. Just something quick 'n' dirty. Just in case. This was built on a reasonably fast custom box with a RAID controller and a bunch of drives. It's a cheap solution, but it does what we need it to, and it does it well. And now we're using it as the new home account server. So far it's been completely stable. No crashes in a week-and-a-half. Keep in mind, this is a $3000 dollar machine, not a $10,000 network RAID. It's not that fast, but it's fast enough. And it's stable. By god it's stable.

And that's what you want in a home account server.

Moving to the new server — which, by the way, is a simple Linux box running Fedora Core 4 — has afforded us the opportunity to change — or, actually, revert — the way login happens on the Macs. In the latter half of las semester, we were using a login hook that called mount_nfs because of problems with how Mac OS X 10.4.2 handled our Panasas setup, which creates a separate volume (read: filesystem) for each user account. Since we're now just using a standard Linux box to share our home accounts, which are now just folders, we have the luxury of reverting to the original method of mounting user accounts that we used last year under Mac OS X 10.3. That is, the home account server is mounted at boot time in the appropriate directory using automount, rather than with mount_nfs at login. Switching back was pretty simple: Disable the login hook (by deleting root's com.apple.loginwindow.plist file), place a Startup Item that calls the new server in /Library/StartupItems, reboot and you're done, right? Well, not quite. There's one last thing you need to do before you can proceed. Seems that, even after doing all of the above, the login hook was still running. I could delete the scripts it called, but it would still run. Know why? This will blow your mind. Cache.

Yup. It turns out — and who would have ever suspected this, as it's so incredibly stupid — login hooks get cached somewhere in /Library/Caches and will continue to run until these caches are deleted. I'm sorry, but I just have to take a minute and say, that is fucked up. Why would such a thing need to be cached? I mean maybe there's a minimal speed boost from doing this. The problem is that now you have a system level behavior that's in cache, and these caches are fairly persistent. They don't seem to reset. And they don't seem to update. This is like if your browser only used cached pages to show you websites, and never compared the cache to files on the server. You'd never be able to see anything but stale data without going and clearing the browser cache. At least in a browser — 'cause let's face it, this does happen from time to time (but not very often) — there is always some mechanism for clearing caches — a button, a menu item, a preference. In Mac OS X there is no such beast. In fact, the only way to delete caches in Mac OSX is to go to one or all of the various Cache folders and delete them by hand. Which is what I did, and which is what finally stopped the login scripts from running.

If this isn't clear evidence that Mac OS X needs some much better cache management, I don't know what is.

In any case, we're now not only happily running Tiger in the lab, but we've effectively switched over to a new home account server as well. So far, so good. Knock wood and all that. Between the home account problems, the Tiger migration, and getting ready for our server migration, this has been one of the busiest semesters ever. Though I keep this site anonymous because I write about work, I just want to give a nod to all the people who've helped me with all of the above. I certainly have not been doing all of this alone (thank god) and they've been doing kick-ass work. And, though I can't mention them by name, I really appreciate them for it. At the dawn of a new semester, we've finally worked out all of our long-standing problems and can get down to more forward-looking projects.

So ends (I hope) the Tiger Lab Migration.

Tiger Lab Migration Part 10: Up and Running

Take it as a symbol of my incompetence. Take it as a sign of the inherent difficulties in unifying any highly cross-platform environment. Take it as an indication that major OS updates in said environments are often far more difficult than you might have ever guessed. Take it for what you will. I'm just happy and relieved to report that, after many months of difficulties, we seem to be out of the woods: Our Tiger Lab Migration is complete.

(Note the sound of me frantically knocking wood.)

There are several things that have transpired -- things mostly out of my control -- that have made this migration, at last, complete. If you recall, there were two initial major problems from which we were suffering at the outset of the migration: 1) Mac OS X 10.4.2 was unable to write files across filesystems, which was impeding our use of certain software, particularly Final Cut Pro, which is used heavily in our lab; and 2) our Panasas brand home account RAID (on which we host our networked Mac home accounts via NFS) was crashing, almost daily at the end there. Problem number one was mitigated by certain workarounds I was able to implement using mount_nfs and loginhooks rather than the preferred automount method for mounting home accounts on our client workstations. Problem number two had both myself and the Panasas engineers scratching our heads.

Ironically, both problems are solved by system updates.

The inability of Mac OS X 10.4 to successfully write files across filesystems is cured, thankfully, with the latest update to the OS: 10.4.3. I have not yet implemented 10.4.3 lab-wide as yet, and so we're still running my mount_nfs workaround on the client systems, but on my admin machine I am testing the new update, and so far so good. Things behave properly again. I will also be running the update and the automount method on a test system on the floor for good measure, but I'm reasonably confident that this issue can, at last, be put to rest.

The home account server crashes also seem to have been cured by updating the Panasas RAID OS software. We upgraded on Friday, November 4th, 2005, and have not had a single crash since. This could be mere coincidence, but I'd be pretty surprised if that were the case. The crashing seems to have stopped, and I'd bet a fair chunk of cash that the update is the reason why this is so.

The best part of all this is that, for the past week, the lab has been quiet. People come in. They log on to the systems. They do their work. And everything functions properly, without incident, for the first time in months. The knocks on my door, the panicked moments, the hair-pulling-filled long nights and weekends have all subsided. My stress levels are slowly returning to what they once were, which is not to say normal, but rather normal for me. It is truly a thing of beauty.

There are two things a SysAdmin likes best in the world: a good challenge, and successfully overcoming said challenge. The Tiger Lab Migration has afforded me a hefty dose of both. And despite all the trauma, it's been an enlightening and illuminating experience from which I've learned a very great deal.

Tiger Lab Migration Part 9: More Problems and Solutions

Well, when last we visited this issue, we were having problems saving, among other things, Final Cut Pro files. After working long and hard with the Panasas folks, I was confident I'd found a solution. And I had, but the solution has created new problems. I want to kill Apple for Tiger.

To refresh, the original problem was that Final Cut was trying to write files across filesystems, and the OS was choking on this operation. This was happening because our RAID exports its /home volume -- and this is what we mount -- but each user account is also an individual volume. FCP writes to the /home level, and then has to cross filesystems to save the file to the user's home account, itself a seperate volume. Our solution was, rather than mounting all of /home, to mount each user's home account individually. This required a whole bunch of scripting and launchd voodoo to really work the way we wanted, particularly with regards to adding new users. But we got it working, and it fixed the Final Cut problem. I implemented it, tested it, and emailed the community about the fix.

Unfortunately, I recently discovered that using this method causes some new, less ugly, but certainly annoying problems: the sidebar home account link no longer works, causing untold confusion among users who suddenly think their home account no longer exists (as per the alert message that appears when clicking this link); but most alarming, users can no longer open documents from their home accounts. Not only can users not open docs, but root via cron fails to open documents. This is a big problem, as our "Scratch" partition gets deleted every Friday, and the users rely on an alert message to warn them of the oncoming doom. This message no longer launches, and that's a real bummer.

So I was in the middle of drafting an email to the community informing them of the latest bugs and workarounds, when I was struck with extreme embarrassment and shame. How can I continue to expect users to work around these bugs when each week brings a new set? Sure, it's not my fault that Tiger is broken. It's not my fault that the Panasas is set up the way it is. But it is my responsibility to create a user environment that works seamlessly and properly for the user. So I resolved to do everything in my power to implement a solution. I stayed at work on Friday night until 2:30 AM, learning much, but left with no solution in hand.

What I learned on Friday, essentially, is that automount in Tiger is totally fucked. I already knew that NFS in Tiger was fucked in that it can't cross filesystems. But it turns out there have been some major changes to the way automount is handled in the GUI, and thus, for all intents and purposes -- or at least for our intents and purposes -- it is indeed fucked. Hard. The inability to follow sidebar references is a direct result of automount's new Finder behavior: automount mounts in the Finder are now invisible! Unless the path is explicitly called, the user accounts, mounted as they are, are quite invisible, both to the user, and apparently to the Finder's sidebar. I also discovered that the Finder's inability to launch files from the user's home account has something to do with automount, or at least how automount is implemented in the GUI. This behavior only exhibits itself from these invisible NFS mounts; it goes away if we mount the user's home account in /home, as we used to do. And it goes away if we use a different command to mount user homes.

Enter mount_nfs.

The mount_nfs command is used to directly and statically mount NFS exports, and it has its own set of peculiarities. First and foremost, mount_nfs grafts mounts to existing folders, which means that mount_nfs requires a folder named for the user who wants to log in. Secondly, and equally important in our scenario, mount_nfs magically mounts the export in the Finder. Or, rather, shows the mount in the Finder (and this has actually, probably more to do with how the Finder translates mounts requested by nfs_mount). Lastly, since the mounts are static, they remain mounted and appear in the Finder until they are explicitly unmounted. Mounting via mount_nfs is equivalent to mounting nfs exports in the Finder with command-k. It also seems to work, you may notice, completely opposite to automount.

To do what we've been doing -- which is to automount home accounts at boot -- using mount_nfs instead would require us to mount every home account at boot on every machine, and those 200+ mounts would be present in the Finder at all times. This simply would not work. So, mount_nfs at boot: not good. You thinking what I'm thinking?

Enter loginhooks.

To use mount_nfs at boot would be disastrous. What we need in this case is something that will dynamically mount and unmount users' home accounts at login and logout respectively. And that's just what loginhooks and logouthooks do, respectively. Getting loginhooks to work in Tiger was again an exercise in frustration, but this time it was due to my own poor understanding of the technology. There is a great reference on login hooks at Mike Bombich's site, and a decent one at Apple's Knowledge Base as well. Between these two resources, I was finally able to cobble together a solution over the weekend which I think will work.

Briefly, there are three things you need to do to implement loginhooks, and some info you need to know.

The info first:
1) Scripts called by loginhooks run as root (good)
2) They run before login to the GUI takes place, and after authentication (also good)
3) The user requesting login can be represented in your script by the variable $1 (excellent!)

What to do:
1) Write a script to be executed at login, make it executable, put it somewhere accessible
2) Run this command:
sudo defaults write com.apple.loginwindow LoginHook /path/to/loginScript
3) Test the script! This is imperative! You can test it as any user you want by running it thusly:
sudo /path/to/script User

For "User," specify a user who would actually log in, and who is available to the system. "User" here will be interpreted in your script with the $1 variable, just as it will at login. If it works in your test, it should work in practice as a loginhook.

Setting up a logouthook works the same way, except you run the command:
sudo defaults write com.apple.loginwindow LogoutHook /path/to/logoutScript

And, BTW, to ch eck that the loginhook (or logouthook, or both) has been successfully added, you should see your script(s) listed, when running this command:
sudo defaults read com.apple.loginwindow

Finally, to disable, login(logout)hooks run:
sudo defaults delete com.apple.loginwindow LoginHook
and:
sudo defaults write com.apple.loginwindow LogoutHook

So, what I have now are two scripts. The login one creates a folder in /home named for the user who's logging in, and then uses mount_nfs to requset the user's home account from the server and mount it in this folder. The logout one forcibly unmounts the user's home account. And that is all.

I've only tested this at home, but it seems to work brilliantly here, and is fast even over a wireless connection. I will be trying it at work on Monday morning and will post back with my results, success or failure.

Wish me luck.

UPDATE 1:
So far, so good. There were some snags, though. I came in a little early this morning (in hopes of starting before the lab became overrun with students) to implement and test my loginhook plan, only to find the home account server had crashed. So, I had to spend the first hour restarting and troubleshooting the Panasas. By the time I'd finished, the lab was filling up. So I had to do everything piecemeal, and it took a bit longer then I'd hoped. That was snag one.

The next snag was that, after one user logged in, the next user would be locked out of his/her home account. This turned out to be a simple matter of setting permissions (on the /home directory) via the loginhook in a manner in which this did not happen.

There are some advantages and disadvantages to this new mounting method. The greatest benefit is that any changes to the mount method are done via very simple shell scripts, and changes don't require a reboot to take effect. Just change the script and you're done. Another big plus it that, if the home account server goes down, the machines aren't really affected to the extent they once were. If no one's logged in, there's no server access happening, so the machines are fine. Only if someone's logged in is it a problem, but then, that's always a problem. It occurs to me, though, that I may want to add a -hard option to my mount script sometime soon. Another advantage is the fact that, since only individual user accounts are mounted, and not the entire home account server, a command like sudo rm -rf /home can't wipe out the entire server as it could before. The final advantage is that reboots are now much faster, since nothing happens at boot time.

The disadvantages are twofold, and solvable but minor: First, since the entire home account server is never mounted now, users no longer have easy access to each other's home accounts. To access another user's home account, they now must connect to the home account server via the Finder's "Connect to Server..." command (or command-k). Secondly, any user who wants to ssh to one of the other Macs, say, from a laptop, will not have access to his home account since the mount only occurs when logging in to the GUI. These issues are minor, and can be solved by simply mounting the home account server somewhere other than /home, but I think we'll try and live without it for now, as I kind of like not having the whole RAID mounted.

Anyway, no complaints thus far. I'll be convinced this is a go after a week or so without problems. But I'm cautiously optimistic at this point.

UPDATE 2:
Day two, and all's quiet on the Western front, as it were. I'm obsessively monitoring the situation, but it's looking very good at this point. Fingers crossed.

Tiger Lab Migration Part 8: Home Account Woes

Boy have we got troubles. Sure I tested everything. Sure I did everything I could to make sure this upgrade went smoothly. But does that ever matter? No. It doesn't.

Everything tested out fine. Tiger was able, from day one, to mount our NFS home account server -- a network RAID known as the Panasas. It was able to read files, to write files. It seemed fine. But we'd had our fair share of troubles with the Panasas, even in the beginning, even in Panther. Certain Macromedia products -- actually, Flash MX 2004, to be specific -- had certain functional disabilities. In particular, when any user was logged onto the Mac as a network user whose home account was located on the Panasas, he would find himself unable to publish HTML previews from within the Flash MX 2004 application. The error message was something along the lines that Flash was unable to read the HTML templates, which live in the user's home account. Oddly, on first launch, Flash could write the templates, but for some reason, it could not read them for the preview. If you have any Flash knowledge whatsoever (which I don't -- I was informed by student experts) this renders Flash essentially useless. The problem did not, however, exhibit itself with network home accounts mounted on Apple shares, neither via AFP nor NFS. So the workaround was to create a special account for Flash developers, which they could log into when doing Flash work, and which was mounted directly on the MacServer and shared via AFP. Hence was born our FlashDev account, which has worked fine all year.

Jump ahead to the present: we've upgraded to Tiger. It seems to have gone fine. Suddenly, certain apps begin acting strangely. Illustrator can't save files. Word also has trouble saving files, and behaves erratically, complaining of "permissions problems" and the inability to write temp files to a "network disk." Final Cut, too, begins acting flakey. Shit. It's like one of these sci-fi-horror flicks where the brain transfer experiment seem to have gone perfectly, but then, gradually, the patient begins acting... Different somehow...

And then all Hell breaks loose.

Tracking down all the problems, I finally came to the conclusion that a number of applications are unable to read their preference files in Tiger when those files reside on the Panasas. This perhaps causes a chain reaction that makes these apps unable to properly write or save files: FCP writes 0 byte files; Illustrator CS just can't save your document. Oddly, Illustrator CS2 doesn't seem to suffer from this problem, and all these apps (except, of course, Flash) seem to work properly from within the Panther environment.

Seems to me like a nasty reaction between Tiger and the Panasas home account server.

I've done some testing, and I plan to do more. So far, I've tried booting into a Panther client and found that the problem is greatly reduced in Panther (though it always existed to some extent, particularly, again, with Flash) and exacerbated in Tiger. I've also tried resharing the Panasas NFS export via AFP from the MacServer. I was fairly sure this would provide a reasonable, if temporary, workaround, but it didn't. I want to try it again, just to be sure, but if resharing via AFP does not provide a cure, it would indicate that the fault lies somewhere with the Panasas. The other possible culprit is Tiger's implementation of NFS. To test this, I am building a more generic (non-Apple, non-Panasas, non-proprietary) NFS server to test from, on a Linux install on my new Dell laptop.

So, best laid plans, and all that. What a frickin' mess.

I'll keep you posted as I learn more.

UPDATE 1:
I built a Linux install for the express purpose of testing networked Mac home accounts on a non-proprietary (non-Mac, non-Panasas) NFS mount. When running a Mac home account from this mount, the Final Cut problem disappeared, and the application acted normally. The same was true for Microsoft Word. Illustrator CS still behaved badly, though, as did Flash MX 2004. It's clear now that these problems result both from Tiger's implementation of NFS as well as Panasas's. So, the breakdown goes something like this:

  • The Flash problem is probably Flash/NFS-specific (though possibly Mac-specific as well)
  • The Illustrator problem is Tiger/NFS-specific
  • The Final Cut and Word problems are Tiger/Panasas-specific

None of this makes my job any easier. It kind of sucks to realize that there is no magic bullet, because that means that there probably is no single fix. Still, it's useful to know.

UPDATE 2:
The issue with Final Cut is not a matter of the application being unable to read files; it's a matter of it being unable to write them. Essentially, FCP cannot write a preference file. Launching with no preference file, FCP creates a 0 byte preference file in the user's home account. Also, if you try to save the FCP project file to the NFS share, it will be a 0 byte file and unopenable. Saving the FCP project file to a local volume results in a perfectly usable FCP project file. The complete inability to write files seems to be unique to Final Cut. Other applications (Illustrator, Flash) seem to have trouble reading files, but writing them seems to work fine (which is why I assumed this was the case in FCP). Word (now updated to 11.2) also has trouble writing files, but only when saving an open, modified document. That is, if you create a new document in Word, and hit "Save," it saves just fine. If you keep the document open, make changes, then hit "Save" again, it complains that it can't write to the shared disk. Hit "Save" again, and Word will create a new document and use the first string of characters in the document as the name. "Save As..." however, works as expected.

UPDATE 3:
It occurs to me (through the suggestion of other, smarter folks) that maybe this is not completely an NFS problem, particularly in the case of the apps that have problems writing files (Word and FCP). Those apps work just fine when we mount our homes on a generic NFS export. So it's possible that the write problems we're having are filesystem-related, not NFS-related. This theory holds a lot of water considering the fact that the Panasas uses its own, proprietary file system.

UPDATE 4:
I've been in contact with the Panasas folks, and I'm working with them in trying to determine the problem. So far, I've sent them tcpdump results from both the Panasas and the Mac while FCP tries to save a file. I'm told that there are a "massive number of 'file not found' type errors" from the dump. That can't be good. I've also sent them a few file listings from the home account of the user in question. Meanwhile, I'm wondering if the next Tiger update will do anything to correct these issues, or if upgrading the Panasas shelf would help, while at the same time, trying to come up with a reasonable workaround, in case Panasas is unable to reslove the issue. The only thing I can think of is to move the Mac home accounts to another server. Which would suck hard. Two steps backward.

UPDATE 5:
Still in constant contact with Panasas. I'm sending them log files now, and planning to do a ktrace, which will be a new experience for me. Also, I'm seeing this issue occur with other applications now. In particular, Macromedia Director MX 2004 is unable to write files to the Panasas mount. This is particularly odd because under Panther that application had a completely different problem: it just wouldn't launch. So I'm currently dealing with three seperate incident reports: one for the problem noted here, one for a blade crash that happened a couple of weeks ago, and one for the original Flash problem from our Panther days. It's a pain. Fortunately, the Panasas people are very nice, professio nal, and helpful.

UPDATE 6:
So, this does seem to be a filesystem issue at its core. Sort of. After much work with the Panasas guy, we've narrowed down the problem considerably.

Mac applications write temporary files -- files that are in use but that haven't yet been saved -- to various locations depending on the app. Final Cut writes its temporary files to a folder called .TemporaryItems at the top of whatever volume the user is working from. So in our model, the top volume is /home, and the sboy home account is inside /home. The whole thing looks like this: /home/sboy. Final Cut wants to write its temp files in /home/.TemporaryItems. And it is able to do this. If I look in /home/.TemporaryItems, I find all the files I've been unable to save to my user account, perfectly intact. So, temporary files go in /home/.TemporaryItems, and saved files in /home/sboy. Now, here's the grind: The Panasas setup requires that each home account be a seperate volume. So, /home is one big volume, and /sboy is another volume inside /home. This means that when Final Cut tries to move the temporary files from the .TemporaryItems folder, which is located on the volume /home, to the sboy user account, which is another volume at /home/sboy, it is moving the files across filesystems. And we get an error.

Interestingly, moving files across filesystems when Panasas is mounted on a Linux box produces the same error, but Linux does something to compensate for this problem, and is then able to save the file. But the error still gets produced. It's just that Mac OS X 10.4.2 (this did not happen in Panther) does not have the mechanism to correct for this error. It gives up and leaves the file in its original location: the /home/.TemporaryItems directory.

Also, this explains why everything works fine when I mount the user's home account on my Linux NFS export. In that case, everything is on the same volume, and there is no moving of files across filesystems. It's got nothing to do with NFS at all.

I'm curious to know why this doesn't happen in Panther. Is it possible that Panther has the same error correction function we see on Linux? What about earlier versions of 10.4? Do they have this mechanism? Will 10.4.3 have it? I surely hope so. I'll be looking into who I should contact at Apple about all of this.

Incidentally, the Panasas guys have figured all this out by examining ktrace and tcpdump files made while the program ran and the error occured. I did not figure any of this out on my own. All I did was run the commands and upload the files. Still, it's a nifty bit of sleuthing. I'm learing a lot.

UPDATE 7:
So, after mulling this over for a while, a rather obvious workaround occured to me. Currently, we mount the entire Panasas volume on our systems. This volume includes all home accounts within it, as well as the .TemporaryItems folder at its top level. Each user's home account on Panasas is a separate volume, and so is the top-level volume that contains .TemporaryItems, so saving files must cross filesystems (from the .TemporaryItems folder on /home to the user's folder). The workaround is to mount each user's home account individually. Doing so causes FCP to treat the user's home account as the active volume, and to use the .TemporaryItems folder in said home account. In this scenario, no filesystem boundaries are crossed, and no error occurs. I tried it, and it works. FCP (and Word, incidentally) can now both save files to the user's home account normally, and application preference stick. Final Cut is again usable.

Hallelujah!

I'm testing this on a couple machines before I go and announce it to the community and put it on all the Macs, just to make sure it works properly. But it's looking good. I'm glad to know that the past two weeks I've spent on this haven't been in vain.

Tiger Lab Migration Part 7: Wiggly-Niggly

Well, we're getting close. We're down to last, wiggly-niggly little tidbits. Here's where we've been in the last few weeks:

The Master System
We have completed a very nice build of Tiger with most of the software we need on all machines, though we're lacking all the most recent Adobe stuff due to not having received the most recent Adobe stuff. 'S'okay. We'll make do with CS1 for now. Our system is all loaded up with our custom scripts, and a new, beefier home account mount script. And it's got a nice, big, 75GB system partition. Why, it's just swell.

We've used this system to clone to all our other lab machines. We boot these other machines in firewire target disk mode, and then we use Disk Utility to restore the Master System to the new machine's system partition. Or, we wanted to use Disk Utility. Turned out that our first Disk Utility-cloned system had some problems due to -- yep, you betcha -- Tiger bugs. The problem was that Disk Utility wasn't blessing the new system partition, so when we'd boot the new machine, we'd get the flashing question mark of doom. So we'd have to then boot off the Tiger install disk, and set the new system partition on the newly built machine to be the startup disk. I decided this was a pain, and that it would be much easier and more instructional to do our clones with a tiny, tiny shell script that calls ASR (Apple Software Restore, via the asr command).

#!/bin/bash

## Script to Clone Local Volumes #### systemsboy - Aug 18, 2005 ##

clear

## Define Variables ##

echo "Please enter the path to the SOURCE volume or ASR-ready disk image...(Example: /Volumes/MyVolume)"read SOURCE

echo "Please enter the path to the TARGET volume...(Example: /Volumes/MyVolume)"read TARGET

## ASR Portion of Script ##echo "Building Clone... Please wait...________________________________"sudo asr -source "$SOURCE" -target "$TARGET" -erase

exit 0

This did the trick indeed, and was as easy to use, if not as purdy, as the Disk Utility application. This method only yielded one problem, and that was that all the invisible folders -- /var /etc /private and the like -- were not made invisible on our cloned systems. This is another Tiger bug, but one that was fixed in 10.4.2's version of Disk Utility. Apparently not fixed in ASR from the command-line, however. This problem yielded both an intersting solution, and some interesting information. Back in the day, these files were made invisible to the Finder by the magic of the .hidden file. The .hidden file was simply a list of files or folders at the top level of the root drive that should be treated as invisible by the Finder. Want to make something invisible on /? Just add it to the .hidden file and restart the Finder. (Ironically, the .hidden file was also at the root level of the system, but was rendered invisible by the fact that its name began with a period.) Nowadays, however -- in Tiger, that is -- this is no longer that case. Files on root are now made invisible, according to this Apple KBase article, by setting certain file attributes. I'll tell you, I have yet to figure out what those attributes are, or how to modify them, but when I do I'll shout. In the interim, I was forced to locate the "SetHidden" program on my original Tiger install DVD. After insering the DVD, it can be found at:

/Volumes/Mac\ OS\ X\ Install\ DVD/System/Installation/Packages/OSInstall.mpkg/Contents/Resources/

I've now copied this file to a folder called SetFile, which I've placed in the Applications folder for future use. Also, I needed the hidden_MacOS9, which can be found, and should be placed, in the same folder as the SetHidden command. So I now have a folder in my Applications folder, called SetHidden, in which the SetHidden command and the hidden_MacOS9 files reside, installed on my master, and I've added the commands:

cd /Applications/SetHiddensudo SetHidden /Path/To/My/Clone/System hidden_MacOS9

to my shell script, and voila! Clone script complete!

#!/bin/bash

## Script to Clone Local Volumes #### systemsboy - Aug 18, 2005 ##

clear

## Define Variables ##

echo "Please enter the path to the SOURCE volume or ASR-ready disk image...(Example: /Volumes/MyVolume)"read SOURCE

echo "Please enter the path to the TARGET volume...(Example: /Volumes/MyVolume)"read TARGET

## ASR Portion of Script ##echo "Building Clone... Please wait...________________________________"sudo asr -source "$SOURCE" -target "$TARGET" -erase

## Set Hidden Files (Tiger Bug Workaround) ##echo "Mac OS X 10.4 does not properly set hidden files.We will now use the SetHidden command to rectify the problem.For this step, the SetHidden folder must be present in /Applications.---------------------------------"cd /Applications/SetHiddensudo ./SetHidden "$TARGET" hidden_MacOS9

exit 0

Cloning all 14 systems on the floor took my Lab Assistants a few short hours.

Spotlight
I mentioned some concerns regarding Spotlight and the particular way in which users use our lab. After some mucking around, and testing of various methods of disabling Spotlight, both forced and optional, I've decided to do nothing. I do this kind of thing all the time. Worry and worry about something, and then, finally, throw caution to the wind and do nothing about it. It's my way. It's part of what makes me so gosh-darn lovable. But I do have two things behind that decision: a rationale, and a backup plan.

The first problem I'd anticipated was that Spotlight would begin indexing user home accounts over the network all at once as soon as everyone logged in, and that this would cause problems with network performance and incomplete indexes. Upon further reflection (read: obsessing), I decided that that Spotlight's network resource usgae would probably be low, and that it would probably avoid incomplete indexes by either continuing in the background after logout, or by resuming at next login. Also, since home accounts have quotas, the largest of which is 7 GB, indexing shouldn't take too long, and any problems should be quickly mitigated.

The second possible problem I'd worried about was that Spotlight's insistance on indexing firewire drives would take forever to yield complete indexes on large drives -- a process that would likely be interrupted at logout before it was complete -- leaving many users with partially indexed drives. In my tests, however, Spotlight picked up right where it left of with drives that were partially indexed when ejected and then re-mounted, so I'm not convinced this will even be a problem. I also worried that Spotlight's firewire drive indexing would cause performance problems that would incapacitate users who were working on video. This may yet be the case. We'll know soon enough.

I came up with a number of possible solutions to these problems, all messy, all imperfect. Ultimately, I decided that the best option was to wait and see what problems arose, and then tackle those problems with the most appropriate of the solutions I devised. Who knows? Maybe it will all work out great without any intervention from me. And that'd be just ducky.

Network
We have an intersting and complex network. It consists of Mac, Linux, and Windows machines, a network RAID, and web and mail storage. And all of this is accessible, in one way or another, via the network. The problem comes when you try to explain all this to new (or sometimes even old) students. It's not an easy picture to paint. And overwh elmed first years usually don't quite catch on for some time. But as we design our network, and build our systems, we're very concerned about how all that is presented to the users. And we try to make it as simple and as understandable as possible, which is, after all, the Mac way, and is why people love the Mac. So let me just say, right here, right now, what the fuck is up with the "Network" view? That piece of shit gets worse with every OS revision. And no one seems to care. I can't find diddly about it on the forums.

So here's the background on this outburst. Back in the Panther years, the Network view offered up a pretty nifty way to present a great deal of our network -- the shared drives of all platforms -- to our Mac users. The coolest part was that it looked much like what you saw on Windows, which simplified things greatly by presenting a consisten network view across platforms. There were a couple things that made this possible. One was SMB workgroup names. SMB does a great job of not only sharing files, but broadcasting itself as a service on a network. Setting up computers under SMB with a workgroup name based on the platform of the given machine got all our Mac, Linux and Windows computers grouped appropriately on the network. And Mac can read those groups and smartly creates folders in the Network directory named after those groups and populated with the systems that belong to them. So in my Network browser, there'd be a folder called "Linux," and all the Linux computers would be in there. Same for Windows. Here's the catch: Our Macs share via SMB to Windows, but we want them to share via AFP to other Macs. Personal File Sharing, which uses AFP, does not, by default broadcast any kind of grouping. If you turn on AFP, you just broadcast the service, but you can't specify it's appearance or where on the client the shares will appear. Computers broadcast via AFP in Panther just show up in the "local" folder, not in a "Macs" folder like their SMB sharing counterparts. There was, however, a sneaky way to define this in Panther, by editing a little file called /etc/slpsa.conf, and thereby magically creating what are known as SLP scopes. In fact, SLP scopes are just that -- a way to group a particular service from a particualr device into logical groups. So we scoped our Macs to the "Macs" scope and they showed up in our "Macs" group in /Network, and all was well in the universe. If you're unbearably astute, you'll probably be wondering why, though, our Macs, when connecting to other Macs via the "Macs" scope in the Network folder, didn't connect to said Macs via SMB rather than AFP. And here was something amazingly cool in Panther. Panther smartly dealt with the folder full of Macs in our Network view. So, when it saw these other Macs broadcasting and sharing over two protocols -- AFP and SMB -- it intellegently chose the best one, which for a Mac was AFP, of course. That was the thing of beauty that allowed us to present the same view of the entire, heterogeneous network the same way on all platforms while still allowing us to connect with different protocols. It was frickin', frackin', goddamn sweet, is what it was. Brilliant. And now it's gone.

Not only is this intelligent decision making gone from Tiger, at least in the latest incarnation (10.4.2 as of this writing), but so too is the ability to define SLP scopes. So, this means two bad things: 1) I can no longer define specific locations for my AFP-shared Macs, and 2) any connection made via the "Macs" folder in the Network folder (which, remember, is now only defined by SMB, not by SLP) connects via SMB; my Macs connect to each other via the Windows file sharing protocol, which is a big fat problem.

So, I'm currently trying to devise a way to scope the Mac shares into some sort of logical grouping that will allow me to present those shares in a palatable way. But it ain't easy. I know you can do this with Tiger Server, and this is probably why they've crippled it on the client, but I don't a Tiger Server yet, and I'm hesitant to build one this close to the beginning of the semester. So we shall see...

The migration is nearly complete. Time permitting, I would abasolutely love to build a fresh new Tiger Server. I'll probably give it a shot in this last week before classes begin, but I'll be sure and keep a clone of the existing Panther Server in case things go awry. Beyond that, and the aforementioned wiggly-niggly, we're done. We're upgraded, on the workstations anyway. And it is good.

There will, most likely, be problems that arise once the students come in and start using the new systems. I will share those in a later post. Until then...