SUMMARY: Networker Disaster Recovery Fails

From: Damon_LaCaille@dgii.com
Date: Thu Nov 19 1998 - 09:39:35 CST


Sun-Managers:

My original post is listed at the end of the message, but in a nutshell I
was inquiring into the problems that Networker has when completely
restoring a system from scratch. I had terrible problems with the system
being corrupt after I tried to recover from the Networker server.

Two people hit the solution right on the head, but each went about it a
different way, so instead of paraphrasing them I'll just include their
replies. To summarize: with the first (from Birger Wethne), you boot off
the CD-ROM, get networking running, then drop out of the install and
mount/load/run the Networker client. With the second (from Mark Fromm),
you take the disk(s) out of the dead box, install them in a working Sun
box, and run the Networker client there, relocating the restored data to
the newly mounted partitions.

The whole problem was trying to restore to a non-quiescent system: most
likely the /etc and /usr directories contain files that should not be
touched at run time. Both solutions get around this by booting off of
alternate media (or using another machine), mounting the drive, and
restoring to the mounted drive, which is not being used by the running
system.

It appears a great many people have had problems with disaster recovery
and Networker, but with these two solutions I've successfully restored my
system without any problems, and it's back to 100%. Thanks also to all of
the following people who responded:

Rich Kulawiec (rsk@gsp.org)
ganeshan@gcs.com.au
Niall Obroin (nobroin@sced.esoc.esa.de)
Steve Boronski (spb@stoke.gov.uk)
James Wendling (jbwendl@bnpcn.com)
Sean Ward (sdward@uswest.com) (Thanks Sean for the detailed help!)

==============================================
(Mark Fromm's response)

Greetings,

The way I have accomplished this (total recovery of networker client) is:

1. Take boot disk for hosed Networker client, put on a working Networker
     client with the same OS and architecture (so the
     /usr/platform/"arch"/lib/fs/ufs/bootblk file matches)

2. Partition disk, create file systems on the disk using the working
     client.

3. Restore the "dead" client's files (including OS) using a recover path
     pointing to the mounted disk drive on the working client.

4. Install boot block on the drive. Example:
installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0

5. Pull disk off working client, install in dead client, cross fingers and
     try booting.
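
As a rough sketch of steps 2 through 4 for a single slice, the script below
only prints the commands in order and runs nothing. The slice c1t0d0s0 is
taken from the installboot example above; the mount point /mnt/deadroot and
the helper names are hypothetical. Partitioning with format is interactive
and is left out, and the Networker restore itself is shown only as a
placeholder comment.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for steps 2-4, executes none of them.
# /mnt/deadroot is a hypothetical mount point; adjust the slice to your disk.

# Map a block device (/dev/dsk/...) to its raw counterpart (/dev/rdsk/...).
raw_device() {
    echo "$1" | sed 's|^/dev/dsk/|/dev/rdsk/|'
}

# Print the per-slice commands in the order they would be run.
plan_restore() {
    blk=$1
    mnt=$2
    raw=`raw_device "$blk"`
    echo "newfs $raw"
    echo "mkdir -p $mnt"
    echo "mount $blk $mnt"
    echo "# restore the dead client's files here, relocated to $mnt"
    echo "installboot /usr/platform/\`uname -i\`/lib/fs/ufs/bootblk $raw"
}

plan_restore /dev/dsk/c1t0d0s0 /mnt/deadroot
```

Printing the plan first makes it easy to double-check the device names
before anything destructive such as newfs is actually typed.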

For what it's worth, I have had headaches doing this when GNU dd is first
in the path: it screws up installboot. Make sure /bin/dd is first in your
path before using "installboot".
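
One way to arrange that, as a minimal sketch: put /bin at the front of PATH
for the current session and check which dd the shell now finds. This
assumes a POSIX shell; on an older Bourne shell without "command -v",
"type dd" should give similar information.

```shell
#!/bin/sh
# Put /bin ahead of everything else so the system dd is found before GNU dd.
PATH=/bin:$PATH
export PATH

# Show which dd the shell will pick up now.
command -v dd
```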

Hope that helps. Good luck.

Mark

================================================
(Birger Wethne's response)

It could be that the root file system isn't at the same patch level
as the restored /usr. There could also be problems if the new and
old Box A are different hardware.

If the new Box A is the same kind of hardware, try the following
restore procedure:

- Boot machine from Solaris 2.x CD. Answer all questions until you get
  'system identification completed' or similar. Then break out of
  suninstall.
- Mount Networker client software from a file server
- Partition/newfs disk
- Mount all partitions to be restored under /a (/ as /a, /usr as /a/usr,
  etc)
- Run nwrecover. Select all partitions to be restored. Relocate restore to
  /a
- Run installboot
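
The mounting step can be sketched as below. The script only prints the
mkdir and mount commands and runs nothing; the device names are
hypothetical examples, and plan_mounts is an illustrative helper, not part
of Solaris or Networker. It sorts by mount point so that a parent file
system (/usr) is mounted before anything beneath it (/usr/openwin).

```shell
#!/bin/sh
# Dry-run sketch: prints mount commands for restoring under /a, runs nothing.
# Device names below are hypothetical examples.

# stdin: lines of "<block device> <original mount point>"
# $1:    alternate root to mount everything under
plan_mounts() {
    altroot=$1
    # Sort by mount point so a parent comes before its children.
    LC_ALL=C sort -k 2,2 | while read dev mnt; do
        if [ "$mnt" = "/" ]; then
            target=$altroot
        else
            target=$altroot$mnt
        fi
        echo "mkdir -p $target"
        echo "mount $dev $target"
    done
}

plan_mounts /a <<EOF
/dev/dsk/c0t3d0s6 /usr
/dev/dsk/c0t3d0s0 /
/dev/dsk/c0t3d0s5 /opt
EOF
```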

This has worked perfectly for me. I have even restored SunOS 4.x systems
using the same method. When restoring a SunOS 4.x machine, use a Solaris
2.x CD and this procedure, but run the 4.x installboot from the restored
disk as the last step.

With this method you are back to exactly the same / file system as well.
And you save a lot of time by not having to install the OS first just to
restore.

Birger

===================================================

MY ORIGINAL MESSAGE:

We're currently running Solstice Networker 4.2.6 on an Ultra-Enterprise
4000 with an external single DLT tape drive and an external DLT4700
jukebox. We have about 8 Sun (Solaris 2.4 - 2.5.1) clients backing up to
this machine. All backups work perfectly, 100% every time. The problem
comes when restoring a system.

For instance, I get in a new machine (Sparc Server 5) and install Solaris
2.5.1 and the networker client onto it. I'll shut down the original "Box
A" machine, and make this new machine "Box A". I then go into nwrecover
and select "Box A" as the client, and last night's backup as the time frame I
want for the files. I then select all relevant file systems, such as /usr,
/usr/openwin, /etc, /opt, etc. After the client has finished restoring all
files, I then type "reboot".

Now the problems start: First, when the system is shutting down, it stops
when it's saying "syslogd: terminating on signal 15" (or something similar)
- the system just hangs. I waited at least 10 minutes: no drive activity,
nothing. I attempted a Stop-A, but the system didn't even respond to that.
I power cycled the machine. It now comes up with the following errors:

not found:pageout_reserve
not found:pageout_reserve
not found:po_share
not found:po_share
not found:po_share
krtld: error during initial load/link phase
Memory Address not Aligned
Type help for more information
ok

And that's it. I've tried mounting the /etc directory after booting from
cdrom into single user mode. I thought the vfstab file might have been
restored from the old machine, but it was safe (and correct). I also
checked the /etc/system file: nothing in there at all (as it should be),
so it's not trying to load modules or metadevice information.

The networker manual (the 4.2.6 version at least) isn't very informative
when it comes to Disaster Recovery, only file-related restores, not whole
systems. This seems a huge issue for our DR plan. I've tested the memory
in the boot prom and it checks out ok.

Thanks for any help that can be offered. Will summarize immediately.

--Damon


