Howdy all.
Sorry for the delay; I've been pretty busy working on this and
also starting to investigate Solstice DiskSuite and Veritas
Volume Manager.
I noticed that a number of the Sun-related newsgroups have
received a flurry of questions along the same lines:
disaster recovery, disk mirroring, and utilizing
duplicate boot disks.
My next related endeavour will be to create a generic
JAZ boot disk. I don't know if it's possible yet, but
I'm going to try.
I got a few quick responses... and not one of them
mentioned Solstice DiskSuite or Veritas Volume Manager.
My solution is right after the original post section.
I also enclosed a few of the more detailed responses I
received.
I did learn ONE thing from my Sun Service Representative:
apparently there was a design flaw in earlier Sun SPARC 20s,
which had only two fans in the power supply.
Later models of SPARC 20s contained 3 fans in the power
supply, and those containing only two were retrofitted
with an additional fan located near the SCSI disks.
The additional fan was used to alleviate overheating
problems that could result in a number of system
malfunctions/errors. One of the overheating side
effects has a problem description very similar to
what I describe below.
Original Problem:
Got to work at 8:45 AM on 8/8/97 only to find the primary
Mail, NIS, FTP..etc..etc..etc.. machine down.
When attempting to boot, the system did NOT see the system
disk. Every attempt at a boot from the 'ok>' prompt resulted
in a slightly different error message. Each message was
to the effect that /dev/dsk/c0t3d0sX was not responding or
the device was not ready.
A probe-scsi-all command resulted in listing all of the SCSI
devices twice: 5 devices on c0 and 1 device on c1.
2 1.05 GB Sun Internal Drives
1 Internal Sun CD-ROM
1 External HP 4mm DDS2 Tape
1 External 4GB 3rd Party SCSI II Drive
1 External 9GB Drive.
I finally did a power cycle on the system, and that appears to
have reset the SCSI bus. Even though the device WAS showing
up during the probe-scsi-all, it strikes me as a little odd
that all of the devices were listed twice.
Perhaps the SCSI controller is going bad and the disk is fine.
After the power cycle a loud grinding noise could be heard
over the din of the computer room. So maybe the disk really
is going bad. Sun engineers will be here Monday 8/11/97 to
replace the (bad) disk.
Since the power cycle the system disk has responded, and no
errors have been logged via syslog. Of course there may have
been some sort of SCSI contention not allowing writes.
probe-scsi-all listed the devices only once after the
power cycle.
Original Post: (Intended for FUTURE use/reference)
>What is the most effective method of maintaining and restoring
>a redundant system disk?
>
>I have a key Sparc 20 server with two internal system disks.
>Both 1.05 GB.
>
>My primary is running Solaris 2.5.1, and I just booted up my
>redundant disk and it is running Solaris 2.3.
>
>What would be the quickest method of duping the primary 2.5.1 disk to the
>backup 2.3 disk?
>
>I realize all partitions need to be the same size.
>
>
>Both disks are functional and I also have a complete Level 0 backup of
>the system disk.
>
>
>Here's my first thought:
>
> make all of the slices/partitions on the redundant disk
> /dev/dsk/c0t1d0sX the exact same size as /dev/dsk/c0t3d0sX
>
> Then use ufsdump and ufsrestore.
>
>/usr/sbin/ufsdump 0uf - /dev/dsk/c0t3d0sX \
> | (cd /mnt/junk; /usr/sbin/ufsrestore xf - )
>
>Would that work ? Would there still be a boot block ?
>
>Is there some method I can use to run suninstall and install a base
>Solaris 2.5.1 onto the redundant disk without having to shut down and
>then 'boot cdrom' and choose /dev/dsk/c0t1d0sX as the install disk?
>
>
>TIA
>
>Looks like it's time for a disaster recovery section in the Sun-Managers
>FAQ.
>
>BTW, if this is covered in the FAQ: I have a version from 4/15/1997
> and I don't have time currently to search through it.
> (301) 846-5721 | Frederick MD, 21702
What I did:
After the power cycle worked, I was able to boot off of the
default system disk without any problems.
1) Performed an immediate Level 0 dump of the system disk.
My system disk contains the following partitions:
/, swap, /usr, /usr/openwin, /var, and /export.
/opt and /var/mail are on a separate disk for just
this purpose.
2) Shut down the system and booted my redundant disk.
The redundant disk is /dev/dsk/c0t1d0s0.
For MY PROM I simply had to type 'boot disk1'.
I seem to recall having to set up this disk1 alias
a while ago; I do not think it is a standard default
Solarisism.
The steps for creating the NVRAM or PROM alias
are carried out at the PROM level (see the sketch
just below). In case you cannot set up an alias,
you should be able to just do the following:
boot /iommu/sbus/espdma@4,8400000/esp@4,8800000/sd@X,0:a
where 'X' is the SCSI ID of the redundant boot disk.
Likewise, this is the path for a disk off of SCSI
controller 0.
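Here is a minimal sketch of creating that alias at the ok
prompt, assuming an OpenBoot PROM that supports nvalias and
using SCSI target 1 as an example (adjust the device path
to match your own hardware):

    ok nvalias disk1 /iommu/sbus/espdma@4,8400000/esp@4,8800000/sd@1,0:a
    ok setenv use-nvramrc? true
    ok reset

After the reset, 'boot disk1' should pick up the new alias.
(Some PROM revisions set use-nvramrc? for you when nvalias
is used.)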
3) Ran 'ok> boot disk1 -s' and determined I was running Solaris 2.3.
Hm... that's kinda out of date, so I decided to try running
off of the system disk again (/dev/dsk/c0t3d0s0).
4) From the system disk I ran the format command and
partitioned my redundant disk /dev/dsk/c0t1d0sX with
partitions of the exact same size as the primary's.
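If you would rather not walk through format's menus by
hand, the label can be cloned in one shot. A sketch,
assuming both disks have identical geometry and using
slice 2 as the conventional whole-disk slice:

    prtvtoc /dev/rdsk/c0t3d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2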
5) newfs'd each of the new partitions.
6) Installed a boot block with the following command:
/usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \
/dev/rdsk/c0t1d0s0
(uname -i responds with the correct platform name.)
7) Then I did the following:
mount /dev/dsk/c0t1d0s0 /mnt/junk
** /usr/sbin/ufsdump 0f - /dev/rdsk/c0t3d0s0 | \
(cd /mnt/junk; ufsrestore xf - )
**This command comes from the Solaris 2.5.1 man page for
ufsrestore.
I repeated step 7 for EACH of the file systems on my primary
system disk (a loop that covers them all is sketched below):
/, /usr, /var, /usr/openwin, /export
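Something like this little Bourne shell loop would do all
five filesystems in one pass. A sketch only: the slice
numbers below are hypothetical and must be replaced with
the slices that actually hold /, /usr, /var, /usr/openwin,
and /export on your disks.

    #!/sbin/sh
    # same slice layout assumed on the primary (t3)
    # and redundant (t1) disks
    for slice in s0 s4 s5 s6 s7
    do
        mount /dev/dsk/c0t1d0$slice /mnt/junk
        /usr/sbin/ufsdump 0f - /dev/rdsk/c0t3d0$slice | \
            (cd /mnt/junk; ufsrestore xf - )
        umount /mnt/junk
    done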
8) fsck'd each of the NEW filesystems after the dump/restore.
9) Mounted /dev/dsk/c0t1d0s0 on /mnt/etc and modified
the necessary entries in /mnt/etc/vfstab.
These modifications are necessary to point to the new
swap areas and system disk partitions. If you can do
it in one substitution, replace c0t3d0 with
c0t1d0. ***
*** Remember we are changing TWO entries per line:
the device and the RAW device. (I almost forgot.)
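Substituting on the bare c0t3d0 string catches both the
block and raw device columns at once. A sketch with sed,
using the /mnt/etc mount point from above:

    sed 's/c0t3d0/c0t1d0/g' /mnt/etc/vfstab > /tmp/vfstab.new
    # eyeball /tmp/vfstab.new before putting it in place
    cp /tmp/vfstab.new /mnt/etc/vfstab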
10) Shut down the system and then booted disk1.
The system came up without a glitch, although I'm not sure
all of the permissions are correct in the / (root) partition.
What I expected to hear:
Use Solstice DiskSuite to maintain a duplicate; then there are
two writes for every system disk update, i.e., if /etc/hosts is
modified then the modification is automatically made to the
other, redundant system device.
All reads come from the primary system disk.
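For reference, here is roughly what the DiskSuite version of
this would look like. A minimal sketch only; the metadevice
names (d0/d10/d20) and the slice holding the state database
replicas (s3) are my own assumptions:

    # state database replicas (more than one is recommended)
    metadb -a -f c0t1d0s3
    # one submirror per disk
    metainit d10 1 1 c0t3d0s0
    metainit d20 1 1 c0t1d0s0
    # one-way mirror on the primary, point root at it,
    # reboot, then attach the second side to start the sync
    metainit d0 -m d10
    metaroot d0
    # (reboot here)
    metattach d0 d20

metaroot edits /etc/vfstab and /etc/system for you; the
non-root filesystems get the same treatment minus the
metaroot step (edit vfstab by hand instead).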
Alternative method:
If my system disk had NEVER responded, I would have been
forced to do the following:
1) Boot cdrom
2) Choose /dev/dsk/c0t1d0sX as the system install disk.
3) Refer to my printed hardcopy of the system configuration
to dupe the disk layout.
(Good sys admins have one... :0) You do too, don't you?!!)
The hardcopy printouts have the partition sizes and what
each partition correlates to as a filesystem, and contain
the starting and ending sector/cylinder for each partition.
(A sketch for generating such a hardcopy follows this list.)
4) Install Solaris 2.5.1
5) Refer to my hardcopy PATCH printout regarding which patches
are installed.
6) Download and install the necessary patches.
7) Install any local software. (Read: Sun compilers and FDDI driver.)
8) Install any necessary software patches.
9) Re-install the Legato NetWorker base.
10) Restore data through Legato NetWorker from the backup archive.
11) Read through the list of local customizations :0)
and duplicate where necessary.
12) Test the setup to make sure everything is functional.
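The hardcopy mentioned in steps 3 and 5 is easy to generate
from cron. A sketch; sending it to lp (or just filing the
output) is my assumption:

    #!/sbin/sh
    # capture disk layout, installed patches, and packages
    prtvtoc /dev/rdsk/c0t3d0s2 > /var/tmp/sysconfig.txt
    showrev -p >> /var/tmp/sysconfig.txt
    pkginfo >> /var/tmp/sysconfig.txt
    lp /var/tmp/sysconfig.txt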
Enclosed Responses:
Dave Haut Wrote:
Hi,
You are on the right track. Use ufsdump piped to ufsrestore.
Example:
# mount /dev/dsk/c0t0d0sx /mnt ( mount one of the old Sol2.3 partitions )
# cd /mnt
# ufsdump 0f - /dev/dsk/sol2.5.1part | ufsrestore vrf -
Also, you DO need to install the bootblock.
Check out the man page for installboot and use the example that is provided.
Mark Fromm wrote:
Greetings,
> Would that work ? Would there still be a boot block ?
You need to install the boot block after the fact.
This is how we do it here. We run this once a week from cron,
during a quiet period on the machine:
#!/sbin/sh
#
# Weekly dump of the operating system from c0t3d0s0
# to c0t1d0s0
# Modified on 01/14/97 to newfs drive and changed mount
# point from /mnt to /mntos.
# define variables
#
primaryosdisk="/dev/rdsk/c0t3d0s0"
secondaryosdisk="/dev/rdsk/c0t1d0s0"
blockdevicename="/dev/dsk/c0t1d0s0"
mountpoint="/mntos"
export primaryosdisk secondaryosdisk mountpoint blockdevicename
#
# Newfs the drive before dumping the OS.
/usr/sbin/newfs $secondaryosdisk << EOF
y
EOF
/usr/sbin/fsck $secondaryosdisk
# mount the secondary O/S disk onto temporary mount point
#
#
mount $blockdevicename $mountpoint
#
# Dump the primary O/S disk to the secondary O/S disk
#
ufsdump 0f - $primaryosdisk | (cd $mountpoint; ufsrestore rf - )
#
# Install the boot block
# Sun 4m architecture only!
#
/usr/sbin/installboot /usr/platform/sun4m/lib/fs/ufs/bootblk $secondaryosdisk
#
rm $mountpoint/restoresymtable
umount $mountpoint
Additional scriptage I have running on a couple of really
critical machines includes having 2 vfstab files
(vfstab.c0t3d0s0 and vfstab.c0t1d0s0). The vfstab.c0t1d0s0
is set up with the proper swap and root partitions, and the
script copies /mntos/etc/vfstab.c0t1d0s0 to /mntos/etc/vfstab
so I have a bootable disk ready to go without cleanup.
Hope that helps
Internet mail - mfromm@physio-control.com
Summary of responses for maintaining a redundant disk:
1) Those that suggested 'dd': what options would you use to
optimize the block size for such usage?
The last time I used 'dd' it took close to 8 hours to dump
200 megs. I must have had the wrong parameters, although
the new area did work without errors.
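For what it's worth, dd's default 512-byte block size is
probably what made it crawl; a large block size helps a lot.
A sketch of a whole-disk raw copy, assuming identical disks,
slice 2 as the whole-disk slice, and the target unmounted:

    dd if=/dev/rdsk/c0t3d0s2 of=/dev/rdsk/c0t1d0s2 bs=512k

Since slice 2 starts at cylinder 0, this copies the disk
label too, so the partition tables come out identical.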
2) To those suggesting newfs and dump: that seems like
quite a bit to do once a day, weekly, or even monthly.
I could see doing that, as I did, in certain cases, but
not on a regular basis.
3) I like the suggested method of having the alternate devices
mounted and then running a 'find -mtime -1 | cpio -p'
pipeline, or some derivative thereof, to simply copy the
updated/newer files into the related redundant location.
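Spelled out, the incremental copy might look like this.
A sketch; /altroot as the mount point of the redundant
root is my own assumption:

    cd / && find . -mount -mtime -1 -print | cpio -pdmu /altroot

-mount keeps find on the root filesystem, and -pdmu makes
cpio create directories, preserve modification times, and
overwrite older copies unconditionally.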
4) It would appear that the creation of the boot block
can happen before or after the dump/restore.
Most people suggested that the boot block be created
last, although I created it first, right after running
'newfs /dev/dsk/c0t1d0s0'. I do know that adding the
boot block needs to be done AFTER the newfs, but the
order appears to be irrelevant with regard to data
already existing on the disk: installboot writes to the
boot area at the very start of the slice, outside the
blocks the filesystem uses for data, so a later
restore does not disturb it.
Perhaps someone with more knowledge about disks and
the Solaris filesystems could elaborate on this matter.
--
Phil Poole        | Unix Systems Administrator
poole@ncifcrf.gov | Frederick Biomedical SuperComputing Center
(301) 846-5721    | Frederick MD, 21702