Hi,

Sorry about the delay in summarizing, but I wanted to test the solution out first, and my first attempt to reply failed for some reason.

Problem: Attempting to use DiskSuite 4.2.1 to mirror /, /usr, swap, etc. on a two-disk system, with 3 state database replicas on each disk (6 total). The system remained up and running when I simulated a disk failure, but it was unable to reboot because exactly half of the database replicas were available, and booting requires a majority (half plus one).

Solution: (verified) Boot into single-user mode and use "metadb -d _slice_", where _slice_ is the location of the replicas on the failed disk. Use "metadb -i" if you need to figure out which disk failed or where the replicas on that disk are located. If the primary boot disk failed, you may need to specify the bootable slice of the other disk in the boot command. The database replicas can be deleted even though the disk has failed. Upon reboot you will have all remaining replicas available (the ones on the failed disk no longer count), so the system will boot normally (again, you may need to specify the alternate boot device). Upon replacing the failed drive, partition the new drive if needed, use "metadb -a" to add replicas on the new disk, and use metareplace to synchronize the new slices with the valid filesystems. Of course, if one can catch the system before it reboots, one can remove the replicas on the failed disk ahead of time and bypass the need to get into single-user mode. (Rough command sequences for all of this are sketched in the P.S. below.)

A useful reference on the subject is:
http://www.slacksite.com/solaris/disksuite/SDSrecovery.html

Alternate solution: (not verified) It was also suggested that one could boot into single-user mode and edit the entries in vfstab to use physical slice names rather than DiskSuite pseudo-device names. It was unclear whether the database replicas on the failed disk would also need to be removed. Basically, just unmirror everything and recreate the mirrors when a replacement disk arrives.

The general consensus (with which I agree, having resolved the rebooting issue) is that mirroring is better than doing a periodic copy via cron, etc. There is some added complexity in rebooting after a disk failure, but this is more than made up for by having up-to-date copies and by the system staying up through a single disk failure.

There was, however, one noticeable holdout from the above opinion, who strongly prefers daily dd copies via cron, arguing that that way he always has a bootable hard disk. If I understand his argument, he is worried about human error or unexpected results of system maintenance damaging /, etc. to the point that the system ceases to function, in which case he would restore from the daily backup disk. However, that is only useful if the problem is discovered before the next backup runs, which might not be the case if the maintenance work was deemed "too minor" to have such a disastrous result (i.e., the damage was done, but won't be noticed until the next reboot). For more "dangerous" system work the argument has merit, but then it is probably best to use metadetach to temporarily disable mirroring until the work is done and the system is deemed working, then re-attach the mirrors.

Many thanks to: Sean Berry, Darren Dunham, Rasal Kumerage, Gary Cook, J. D. Baldwin, John Phillips, Scott Kulp, Jonathon Andrews, Gabriel Rosenkoetter, Mike Tuupola, Tony Walsh, Jason Shackelford, and everyone else who replied.

Tom Payerle
Dept of Physics                         payerle@physics.umd.edu
University of Maryland                  (301) 405-6973
College Park, MD 20742-4111             Fax: (301) 314-9525
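P.S. For the archives, here are rough command sketches for each step. All disk, slice, and metadevice names below (c0t0d0, c0t1d0, s7, d0, d20, etc.) are illustrative, not necessarily what any particular system uses.

The replica layout described in the Problem section (3 per disk on a dedicated slice) is created with something like:

   # metadb -a -f -c 3 c0t0d0s7     (-f is needed on the very first invocation)
   # metadb -a -c 3 c0t1d0s7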
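To recover after the failure (assuming the dead disk is c0t1d0 and its replicas lived on s7), boot single-user and do roughly:

   # metadb -i                      (replicas with error flags, shown as capital
                                     status letters, are the dead ones)
   # metadb -d -f c0t1d0s7          (-f may be needed since the disk is
                                     unreachable)
   # reboot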
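If the failed disk was the primary boot disk, boot from the other disk at the OBP prompt, e.g.:

   ok boot disk1

("disk1" only works if your PROM already has a devalias for the second disk; otherwise give the full device path of its bootable slice, or set up an alias with nvalias beforehand.)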
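After the failed drive is replaced, the repartition/re-add/resync sequence looks roughly like:

   # prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
                                    (copy the VTOC from the surviving disk)
   # metadb -a -c 3 c0t1d0s7        (recreate the replicas on the new disk)
   # metareplace -e d0 c0t1d0s0     (resync the slice in place; repeat for each
                                     mirror/slice pair)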
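The unverified alternate solution amounts to pointing vfstab back at the raw slices, e.g. changing

   /dev/md/dsk/d0       /dev/md/rdsk/d0      /    ufs    1    no    -

back to

   /dev/dsk/c0t0d0s0    /dev/rdsk/c0t0d0s0   /    ufs    1    no    -

For a mirrored root you would presumably also have to remove the "rootdev:/pseudo/md..." line from /etc/system before the change takes effect.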
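And breaking a mirror around risky maintenance, as suggested at the end, would look roughly like:

   # metadetach d0 d20              (detach the second submirror before the work)
   ... do the maintenance, verify the system still works ...
   # metattach d0 d20               (re-attach; DiskSuite resyncs d20 from the
                                     live submirror)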