Wednesday, March 17, 2010

Troubleshooting an odd symlink bug

About a week ago an odd bug was brought to my attention: it occurs when people try to install the Puppet package into an image made with InstaDMG. The bug started out in private emails, but we got it moved over to the developer mailing list, and you can take a look at it. A group of us banged our collective heads over it for a while, and finally I found it by just going over every step to see what was wrong.
The problem manifested itself as the Puppet installer overwriting the symlink that you normally find at '/usr/lib/ruby/site_ruby', and instead putting a folder with the desired contents there. Replacing this symlink apparently broke other things, and thus began the bug-hunt. The bug was reported against InstaDMG because the installer works fine when used on a booted volume. My bet is that a similar problem would have manifested if someone had tried installing this to any volume other than the boot volume, thus clearing InstaDMG in this bug, but we didn't think of that at the time.
My first instinct was that there was something wrong with the code in the 'installer' program when faced with the complex series of symlinks that it had to follow (a listing of that appears in a moment). I even created a script that mounted a dmg and tried to re-create the problem in a much simpler manner, but with no success. I could still reproduce the originally observed behavior, though, so I knew that there was a problem in there somewhere, and I decided to figure out what was different about the symlink chain in this case compared with my test case. So I carefully followed the chain of symlinks on a mounted volume (an InstaDMG output dmg, since I have a few of those lying around).
Here is what I found:
/usr/lib/ruby -> ../../System/Library/Frameworks/Ruby.framework/Versions/Current/usr/lib/ruby
/System/Library/Frameworks/Ruby.framework/Versions/Current -> 1.8
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/site_ruby -> ../../../../../../../../../../Library/Ruby/Site
So if you follow this set of rules, on a booted volume '/usr/lib/ruby/site_ruby' winds up pointing at '/Library/Ruby/Site'. My understanding is that this is the same in both 10.5.x and 10.6.x, but was different in 10.4.
But look carefully and count the number of back-references in that last link. There are 10. If you count the number of folders in the chain back from the 'site_ruby' link, you will only find 9. When you are booted this does not matter: once you are at the root directory you can keep back-referencing all you like, and you still wind up in the same place. As a quick demonstration you can do this in the Terminal: 'cd /; cd ..; pwd' and you will still be at root. But when the volume is merely mounted rather than booted, the extra back-reference climbs one level above the volume's root, so the 'site_ruby' link winds up pointing outside the image.
So this explains the bad behavior: when the installer goes to look for the folder at this point it finds a broken symlink, and so it replaces that broken symlink with a valid folder. A pretty reasonable thing for the installer to do. I might have made this a failing error if I were the one programming it, but I am sure a lot of smart people came together in a meeting at Apple (or possibly NeXT) at some point in the past and decided that this was the correct behavior, and I can't call them wrong.
When I started looking back, it seems that this extra back-reference has been in place since 10.5.0, and has been kept all the way through 10.6.2 (and it might continue). It has been masked all along because it only becomes a problem when you are not booted from the volume.
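To make the counting concrete, here is a minimal Python sketch of the arithmetic. It is not what the 'installer' program does internally; it only normalizes the path lexically. The function names and the '/Volumes/Image' mount point are made up for illustration, and the link target is built with exactly ten back-references as counted above.

```python
import posixpath

# The last link in the chain described above.
LINK_PATH = "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/site_ruby"
LINK_TARGET = "../" * 10 + "Library/Ruby/Site"  # ten back-references


def resolve_link(mount_point, link_path, target):
    """Lexically work out where a symlink lands when its volume is at mount_point.

    Assumption: no other symlinks sit between the link and its target,
    so plain path normalization is a fair stand-in for real resolution.
    """
    if posixpath.isabs(target):
        return posixpath.normpath(target)
    link_dir = posixpath.dirname(posixpath.join(mount_point, link_path.lstrip("/")))
    return posixpath.normpath(posixpath.join(link_dir, target))


def escapes_volume(mount_point, resolved):
    """Return True if the resolved path lies outside the mounted volume."""
    root = mount_point.rstrip("/") + "/"
    return not (resolved + "/").startswith(root)


if __name__ == "__main__":
    # "/" stands for the booted case, "/Volumes/Image" for a mounted image.
    for mount in ("/", "/Volumes/Image"):
        resolved = resolve_link(mount, LINK_PATH, LINK_TARGET)
        status = "escapes the volume" if escapes_volume(mount, resolved) else "ok"
        print(mount, "->", resolved, "(" + status + ")")
```

With "/" as the mount point the extra back-reference is harmless and the result is '/Library/Ruby/Site'. With the hypothetical '/Volumes/Image' the same link resolves to '/Volumes/Library/Ruby/Site', one level above the image.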
I really should write up a small tool to comb through the whole filesystem and see if there are any other similar problems in any other symlinks, and report those as well in another Radar report, but I think I will leave that for another day. But I thought I would get this out there so if anyone else runs into some other similar bug they might remember this.
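For what it is worth, a first pass at such a tool could look something like the Python sketch below. It is only a starting point under some stated assumptions: the function name is made up, it checks relative targets lexically (so a target that passes through another symlink could be misjudged), and it skips absolute targets, which are a different class of problem on a mounted volume.

```python
import os
import posixpath
import sys


def find_escaping_symlinks(volume_root):
    """Yield (link_path, target) for relative symlinks that climb above volume_root."""
    volume_root = os.path.abspath(volume_root)
    for dirpath, dirnames, filenames in os.walk(volume_root, followlinks=False):
        # Directory of the link, expressed relative to the volume root.
        rel_dir = os.path.relpath(dirpath, volume_root)
        # Symlinks to folders show up in dirnames, all others in filenames.
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            if not os.path.islink(full_path):
                continue
            target = os.readlink(full_path)
            if os.path.isabs(target):
                continue  # absolute targets are not handled in this sketch
            resolved = posixpath.normpath(posixpath.join(rel_dir, target))
            # A normalized relative path that still starts with ".." has
            # more back-references than there are folders above the link.
            if resolved == ".." or resolved.startswith("../"):
                yield full_path, target


if __name__ == "__main__":
    # Usage: pass the mount point of the volume to check.
    for link_path, target in find_escaping_symlinks(sys.argv[1]):
        print(link_path, "->", target)
```

Run against a mounted image, this should flag the 'site_ruby' link described above, along with any others that have the same kind of extra back-reference.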

Wednesday, March 3, 2010

One installer issue down, one to go

I wrote recently about the two issues I have been trying to solve with some bad installers in 10.6. Well, with rev261 of InstaDMG I now have one of the issues solved. The solution is exactly as I described: replace the launchdaemon offering the installd service with one that is chrooted into my install target. I ran a pair of tests with the new version of InstaDMG on a 10.6.2 vanilla image: one with the new code, and one with it disabled (there is a switch for that). The results were exactly as I had hoped for: the iLife Support Update 9.0.3 components get installed with the new code, but get left out without it.
So I am marking one of those two issues as worked around. I wish the solution to the other one would suddenly present itself, but I am not going to hold my breath, as I am pretty convinced that one is going to take Apple making a change to solve.
As always, if you want it solved, then tell Apple how this is affecting you, and how many purchases it is affecting. This is not strictly a bug in Apple's code (at least not in the installer binary), so they are unlikely to consider it worth the engineering time (which would otherwise go into other features/changes/fixes) unless we can show them a reason that it is.

Tuesday, March 2, 2010

JavaScript reference for Apple .pkg makers

I have never seen it mentioned anywhere, and just stumbled across Apple's developer documentation on the JavaScript objects that are available to installer writers. I have probably missed something obvious telling me where to find it; until now I have just used what I gleaned from taking Apple's installers apart, but I now have actual documentation:
For those writing scripts that check for the presence of things to decide what to install (I am looking at you, iLife Support team) I will pointedly reference the "target" item, and its "mountpoint" property.