Hi Sun Managers,
my original question was:
----- Begin Included Message -----
Hi SUN-Gurus,
we have a critical problem with a Sparc 20/512 running
Solaris 2.4. The filesystem we use for non-Sun software
(/usr/local) gets destroyed without any disk error being
logged to /var/adm/messages. I should mention that we
export this filesystem to many clients via NFS/cachefs and
that it sits on the second SCSI interface.
What I see is that the root directory of the filesystem
gets corrupted. Running
    cd /usr/local
    ls -l
reports: cannot read .
After rebooting, fsck asks about every inode number
(starting with inode 2, the root directory of the filesystem).
Running fsck -y mostly leaves a clean (that is, empty!)
filesystem. Today the filesystem was not completely empty,
but every directory under /usr/local (and some more)
had been moved to lost+found.
Now some more specific information:
I installed the following patches with
a public-domain Perl script, fastpatch (!!!!!!):
101753-01 101933-01 102038-01
101829-01 101959-03 102044-01
101878-01 101979-03 102057-13
101879-01 102001-03 102057-14
101880-03 102002-01 102062-03
101902-01 102003-01 102066-04
101905-01 102007-01 102070-01
101907-02 102011-02 102079-01
101920-01 102020-02 102112-01
101921-04 102030-04 102137-01
101922-04 102035-01 102216-01
101923-03 102036-01 102292-01
101925-01 102037-01 102922-01
and last but not least 101945-27!
I believe the problem is caused by patches that the
fastpatch script did not install correctly.
showrev -p and installpatch -p both report everything as
installed, but reinstalling a client that had printer problems
showed that those problems went away!!!
I tried to reinstall the patches with installpatch -u -d,
but this did not work.
Any ideas or suggestions on the problem, or on reinstalling
the patches???
(Today I replaced the disk.)
----- End Included Message -----
First, let me thank everyone who has responded so far.
Most replies suggested that overlapping disk partitions caused
the problem.
format says that they do not overlap.
At the moment I am considering three possible causes:
1.) The filesystem was exported with write access and root access to some
clients that I administer. If a root process ran unlink() on the
root directory of that filesystem, the effect would be the same.
There are two possible candidates for this unlink:
nfsfind runs find on /usr without the -xdev option, so it also
descends into this filesystem, which is mounted under /usr/local.
cachefs is used on the clients, because /usr/local is rarely
updated; this could be a cache coherency problem.
I have revoked the root access!
2.) When the disk was first used under SunOS, its label was destroyed
because the power went off during formatting. I reconstructed the label
from the information that the "current" command in format reported for
another disk of the same type. Maybe something went wrong while
formatting the new disk. (I have swapped the disk.)
3.) I found a way to reinstall the patches:
deleting the SUNW_PATCHID=.... line from /var/sadm/pkg/..../pkginfo and
then running installpatch works!
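For anyone who has to do this for many packages, the edit in 3.) could be scripted. The following is only a minimal sketch under assumptions not confirmed above: that each patch is recorded as its own SUNW_PATCHID=<patch-id> line in a package's pkginfo file, and that the package directory (left elided above) is supplied by you. The helper names strip_patch_id and strip_patch_id_in_file are hypothetical. Keep a backup of pkginfo before touching it, and run installpatch by hand afterwards.

```python
import shutil
from pathlib import Path


def strip_patch_id(pkginfo_text: str, patch_id: str) -> str:
    """Return pkginfo text without the SUNW_PATCHID=<patch_id> line(s).

    Assumption: one patch ID per SUNW_PATCHID line, optionally quoted.
    All other lines are kept unchanged, including their line endings.
    """
    kept = []
    for line in pkginfo_text.splitlines(keepends=True):
        key, sep, value = line.strip().partition("=")
        if sep and key == "SUNW_PATCHID" and value.strip('"') == patch_id:
            continue  # drop the entry so installpatch no longer sees the patch
        kept.append(line)
    return "".join(kept)


def strip_patch_id_in_file(pkginfo_path: str, patch_id: str) -> None:
    """Hypothetical wrapper: back up pkginfo to pkginfo.orig, then rewrite it."""
    path = Path(pkginfo_path)
    shutil.copy2(path, path.with_name(path.name + ".orig"))  # keep a backup
    path.write_text(strip_patch_id(path.read_text(), patch_id))
```

For example, with a made-up pkginfo containing the lines PKG=..., SUNW_PATCHID=101945-27 and VERSION=..., calling strip_patch_id(text, "101945-27") returns the text without the SUNW_PATCHID line, while a patch ID that is not present leaves the text unchanged.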
I have seen several mysterious effects on Solaris 2.4 systems
patched with fastpatch.
I no longer use fastpatch.
Thanks,
Volker
This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:10:32 CDT