Every once in a while -- every year or two, as it happens -- I need to install Solaris on a SPARC machine here at home. From a Linux server.
This, you might think, shouldn't be too hard a thing to do. And you'd be right: it's not. What is hard to do, however, is configure your Linux server correctly, as the various scripts and whatnot that Sun provide aren't entirely suited to the GNU tools, and have a tendency to break.
Not good.
So, here is a shortlist of issues you may face when attempting this. Google
is not always helpful, and to save me the usual three hours or so it takes
to work all this lot out again from first principles, here's a list of what
goes wrong, how to fix it, and how to identify the problems. I make no
apologies for being somewhat terse and assuming you know what you're doing;
you're installing Solaris: if you don't understand how to dd
something to somewhere, you're in the wrong job.
Please note that for the purposes of this demonstration, I'm installing Solaris 10.hw0606, as it happens to have a SCSI driver for the machine I'm eventually going to be installing on that doesn't barf on a 2TB LUN. These hints and problems aren't in any particular order, as the thought to write them all down came somewhat late in the day and I've almost finished the install now. The server is a home-built AMD64 PC named 'newdesktop', running Debian testing/unstable with a custom 2.6.18.1 kernel (at present). The client is a Netra T1 105, with an entire 256MB RAM and a couple of old SCSI discs, to be called 'newmach' (for reasons which are unlikely to become clear, although they do exist).
Without further ado:
screen -fn /dev/ttyS0 9600 in a handy screen(1) session.
apt-get install atftpd. Start with 'atftpd --daemon /tftpboot'.
apt-get install rarpd. Add your client's MAC address to /etc/ethers:
    8:0:20:da:11:4e 172.29.23.54
apt-get install bootparamd. This is where the fun really starts.
/etc/bootparams: this file is superficially easy; however, there are a
number of gotchas with the Linux version that don't seem to apply to the
Solaris version:
boottype=172.29.23.206:in and rootopts=172.29.23.206:rsize=32768; the daemon returns:
    getfile got question for "newmach" and file "rootopts"
    getfile failed for newmach
otherwise.
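As a rough illustration of the shape such an entry takes, here is a small sketch that prints one /etc/bootparams line with the server address spelled out on every key, including boottype and rootopts. The function name bootparams_entry and both paths are assumptions made for the example, not taken from a real setup; adjust them to wherever you copied the DVD.

```shell
# Print one /etc/bootparams entry for a Solaris install client.
# The function name and the two paths passed below are illustrative assumptions.
bootparams_entry() {
    client=$1 server=$2 boot_root=$3 install_dir=$4
    # Every key carries the server address explicitly, boottype and rootopts included.
    printf '%s root=%s:%s install=%s:%s boottype=%s:in rootopts=%s:rsize=32768\n' \
        "$client" "$server" "$boot_root" "$server" "$install_dir" \
        "$server" "$server"
}

# Hypothetical layout: the DVD copied under /install on the server.
bootparams_entry newmach 172.29.23.206 /install/Solaris_10/Tools/Boot /install
```

The output is a single line; append it to /etc/bootparams once the paths match your copy of the install image.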
    WARNING: /pci@7c0/pci@0/pci@1/pci@0,2/scsi@1/sd@0,0 (sd1): disk capacity is too large for current cdb length
means that you're trying to install an older Solaris onto a machine with an MPT SCSI controller (e.g. a T2000) that has a LUN larger than 2TB attached to it. Bad luck. Try another version.
add_install_client to set up the boot environment.
They'll fail. Just copy the contents from the DVD (exercise left to
the reader), and edit /etc/ethers, /etc/bootparams, and /etc/exports
appropriately.
nfs-user-server not nfs-kernel-server.
The kernel server exports over NFS 3 by default, and this will fail.
I'm not entirely sure why, but I believe it's something to do with
NFS 3 and ACL support; Linux's ACL support is subtly different to
Solaris' ($current-contract had some fun with this causing a reboot
of their fileserver some time back) and either way it doesn't work.
Setting rootopts to mount it NFS 2 will probably work,
but I wouldn't like to guarantee your sanity when it comes to
installing packages later. Best be safe. A reader pointed me at this
blog post and suggested it works for him. I'm not convinced,
but haven't tried it. YMMV.
... nfs-user-server, and says that the blog post above works, with the
addition of :vers=2 to the rootopts.
    ifconfig eth1:1 netmask 255.255.0.0 broadcast 172.29.255.255
worked for me, but then I can get away with that here. Ask your network admin nicely. In theory the client sends an ICMP request for the network's netmask, but I've yet to see it.
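If your network uses a different netmask, the matching broadcast address is the interface address with every host bit set. A minimal sketch of that arithmetic in plain sh follows; the function name broadcast_addr is made up for the example, and it assumes a well-formed dotted-quad address and netmask.

```shell
# Compute the IPv4 broadcast address for a dotted-quad address and netmask.
# broadcast_addr is a hypothetical helper name; input is not validated.
broadcast_addr() {
    ip=$1 mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both arguments on dots: $1..$4 are the address octets,
    # $5..$8 the netmask octets.
    set -- $ip $mask
    IFS=$old_ifs
    # OR each address octet with the inverted netmask octet.
    printf '%d.%d.%d.%d\n' \
        $(( $1 | (255 - $5) )) \
        $(( $2 | (255 - $6) )) \
        $(( $3 | (255 - $7) )) \
        $(( $4 | (255 - $8) ))
}

broadcast_addr 172.29.23.206 255.255.0.0
```

With the server address and netmask used in this article, it prints 172.29.255.255, which is the broadcast value in the ifconfig line.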
    ERROR: bpgetfile unable to access network
    /sbin/install-discovery: information: not found
    #
(and it'll bomb out at that root prompt) as the broadcast RPC for the bootparams install_config file will fail. You can test things using bpgetfile:
    # bpgetfile rootopts
    0.0.0.0 rsize=32768,proto=udp
    #
for example. This also comes in handy when trying to debug the sixth point in this list.
    Configured interface eri0
(or similar interface name) when installing Solaris 8. This is caused by FreeBSD ignoring the ICMP netmask request, so Solaris just sits waiting. The solution is apparently:
    sysctl -w net.inet.icmp.maskrepl=1
and restart the installation. Please note that I haven't verified this.
    boot: cannot open kernel/sparcv9/unix
    Enter filename [kernel/sparcv9/unix]:
can be fixed by ensuring your client can NFS mount your server. Cryptic error message, I know. Might also be related to NFS 3 problems.
clntbtcp_call: rpc cansend error is, I think, the same problem.
    Using rules.ok from 132.185.128.21:/install/jumpstart/dickonhtest.
    Checking rules.ok file...
    awk: division by zero
     record number 17
    /sbin/install-solaris: test: unknown operator 1
    #
means you've got a disc with an EFI bootlabel on it that Jumpstart is attempting to consider for use as a root disc. Unfortunately, there's a bug in Jumpstart: the prtvtoc command prints a different set of details for EFI-formatted devices, and the awk script doesn't take this into account.
dd /dev/zero over the start of /dev/rdsk/cxtxdx, and another load over
the EFI bootlabel at the end. Note that, since you can't directly access
cxtxdxs8 (there's no device associated with it), you'll have to compute
the correct seek values for dd yourself. This isn't hard: just run
format(1M) on it and use the values in the slice 8 section.
format -e and installing an SMI bootlabel on the disc. I've yet to try this.
    svc.configd: smf(5) database integrity check of:
        /etc/svc/repository.db
    failed. The database might be damaged or a media error might have
    prevented it from being verified. Additional information useful to
    your service provider is in:
        /etc/svc/volatile/db_errors
    The system will not be able to boot until you have restored a working
    database. svc.startd(1M) will provide a sulogin(1M) prompt for recovery
    purposes. The command:
        /lib/svc/bin/restore_repository
    can be run to restore a backup version of your repository. See
    http://sun.com/msg/SMF-8000-MY for more information.
    Requesting System Maintenance Mode
    (See /lib/svc/share/README for more information.)
    svc.configd exited with status 102 (database initialization failure)
means that your filesystem is exported with your client's root user squashed to a nobody value. You can confirm this by trying to read /etc/svc/repository.db, which is 0600 root / sys. Ensure your exports line in /etc/exports reads something akin to:
    /install *(insecure,sync,ro,no_root_squash)
Don't ask how long it took me to find that one.
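A quick way to check for this on the Linux server before booting the client is to look for no_root_squash on the relevant exports line; Linux NFS servers squash root by default when the option is absent. The sketch below assumes one export per line and an awk on the PATH; the function name export_allows_root is made up for the example.

```shell
# Succeed only if the given exports file has a line for the given path
# that carries no_root_squash. export_allows_root is a hypothetical name;
# continuation lines in the exports file are not handled.
export_allows_root() {
    exports_file=$1 path=$2
    awk -v p="$path" '
        $1 == p && /no_root_squash/ { found = 1 }
        END { exit !found }
    ' "$exports_file"
}

# Example use against the real file:
#   export_allows_root /etc/exports /install || echo "root is squashed on /install"
```

Remember to re-export (for example by restarting the NFS server) after changing the file, or the old options stay in force.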
If you ifconfig up another interface on a Solaris box, don't expect it
to NFS export everything from it immediately. Or, indeed, after
restarting all the NFS and RPC services using svcadm. You need to
re-share(1M) the filesystem first. Very odd symptoms occur otherwise,
although I forget what. I gave up, and got my sysadmin to track it down.
How he found it I don't know.
    rtioctl: kstr_ioctl failed: error 128
    whoami: couldn't add route: error 128.
    WARNING: hme0: no response from interface
appears to be an error produced by the boot sequence when rpc.bootparamd feeds it a default route it can't get to (e.g. when you're booting it from a secondary interface and the server is sending back the address of that interface as its default route). This can be seen by:
    bootparamd: whoami got question for 172.29.23.54
    This is host newmach
    Returning newmach (none) 8x.158.xx.xxx
(i.e. my public interface) in the rpc.bootparamd logs; avoid it by starting it as 'rpc.bootparamd -r 172.29.23.1'. You'll see the error immediately after the copyright notice and 'Use is subject to license terms.' line. It appears to be harmless, but if you're installing packages from a machine which isn't on the local LAN (either by HTTP or NFS), you'll probably hit problems.
    Cannot open pkginfo file /cdrom/Solaris_10/Patches/./xxxxxx-xx/pkginfo
then you need to ensure your Jumpstart's Patches directory only contains patch directories, and no files. Yup, odd, especially considering that Solaris patches are order-sensitive. Apparently Jumpstart installs them in mtime order (according to something I read somewhere) rather than `cat patch_order` (which install_cluster in a *_Recommended does), although I've not verified this myself. By 'shedload', I mean 'at 9600 baud it's time to go for lunch'.
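To spot stray files in the Patches directory on the Linux server before kicking off an install, something like the following sketch will do. It relies on GNU find's -mindepth and -maxdepth options, and the function name stray_patch_files is made up for the example.

```shell
# List anything directly inside a Patches directory that is not itself
# a directory (README files, patch_order, stray tarballs and so on).
# stray_patch_files is a hypothetical name; requires GNU find.
stray_patch_files() {
    find "$1" -mindepth 1 -maxdepth 1 ! -type d
}

# Example use, with the path adjusted to your own install image:
#   stray_patch_files /install/Solaris_10/Patches
```

Empty output means the directory holds only subdirectories; anything listed is a candidate for moving elsewhere.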
That's it for the moment. Any questions, corrections, or clarifications,
let me know. No HTML mail, please; my spam filter has something
of a dislike of it, and generally files it appropriately.