Thanks to Jason Grove, Chris Ruhnke and Brad Morrison for their input on this topic.

Possible reasons for the disk problems included high humidity, power supply problems, a dying motor, poor cabling and a problem with the SCSI interface to the server. Unfortunately, I didn't receive any information on the meaning of the vendor's error code. If anyone has access to this, I'd appreciate the information.

I suspect the drives' motors were slowly dying; I replaced both disks one week ago and have not experienced any SCSI errors since then, which would seem to rule out problems with power, cabling and the server's interface. Also, the climate in our server room is controlled, with humidity typically below 20%.

I've included my initial question along with some of the responses I received below.

-Damian

> Those drives (IBM) if I remember correctly had some problems with
> humidity. Make sure the environment they are in does not have high
> humidity.. Sun was replacing them a while ago... run an iostat -En and
> see how many hard and media errors there are. if it is over 10 I think,
> then you need to replace the drive.
>
> jason

> I have an E450 which has exhibited similar problems on Fuji and Sun SCSI harddrives.
>
> "device not ready" means exactly that -- the device has spun down for some reason.
> In my case I have been able to "unplug" the disk from the backplane, wait one minute and plug it back in and the
> drive will spin back up.
> You will then have to re-enable it with SVM and it should sync up with its mirror -- "# metareplace -e <metadevice>
> <slice>".
> If the drive is truly bad, it won't spin back up.
> It could also be an early indication of a failing drive; but you won't know for sure until it dies completely.
> Or your power supply may be marginal and under "stress" of heavy activity the power level may fall below the level
> needed by this drive.
>
>
> --CHRis
>
> Chris H. Ruhnke
> Technical Services Professional
> IBM Global Services
> Dallas, TX

> My opinion is that it's a bad cable or a physical problem with the interface on the machine. It seems like a very,
> very remote possibility that both drives have the same problem. Yes, they're old, but what are the odds of two
> having the same problem, i.e., transport failures at high bandwidth usage. Don't let the block identifier fool you:
> "Drive not ready" means that the operation was interrupted because the "drive ready" signal went to zero during the
> operation. Although this can be caused by a bad drive, it doesn't seem likely that both drives would come/go
> on/offline.
>
> Hmmm. It is possible (not too likely, IMHO) that you have a power problem. It's unlikely b/c a drive that fails in
> this way would have to spin up again after having lost power, i.e., you'd be seeing many more "drive not ready"
> messages during the spin-up.
>
> OTOH, given that you have replacement drives handy, you could prove this by swapping them out and causing the high
> traffic. In fact, since they're mirrored with SVM, you could perform one drive replacement to the mirror and
> determine whether the same errors happen with the replacement drive. I'm guessing that it will. :-)
>
> Be sure to summarize this one. SCSI errors (and their associated resolutions) can always use more exposure. ;-)
>
> Brad Morrison | The Capital Group Companies
> Location: SNO | x43199 | (210) 474-3199 | Cell: (281) 704-5375
> E-mail: Brad_Morrison@capgroup.com
> [ Mailing: 3500 Wiseman Blvd San Antonio, TX 78251-4320 USA ]

-----Original Message-----
From: Wiest, Damian [mailto:dmwiest@rc2corp.com]
Sent: Tuesday, January 31, 2006 8:25 AM
To: 'sunmanagers@sunmanagers.org'
Subject: SCSI Disk Errors - sense key: not ready

Greetings all,

I have a couple of IBM SCSI drives that are requiring maintenance on a weekly basis.
I have six 18GB drives installed in the first half of a D1000 array which is attached to a dual-channel Symbios SCSI card in an old E-250. Four of the disks are from IBM (product number DDYST1835SUN18G, revision S94A) and the other two are from Fujitsu (product number MAJ3182M SUN18G, revision 0804). I have configured the disks as three, two-way mirrors under SVM; one of the mirrors with IBM drives is logging errors. Here's a sample entry from /var/adm/messages:

Jan 28 06:30:01 lcidev01 unix: WARNING: /pci@1f,4000/scsi@5,1/sd@2,0 (sd47):
Jan 28 06:30:01 lcidev01 Error for Command: write(10) Error Level: Fatal
Jan 28 06:30:01 lcidev01 unix: Requested Block: 6137304 Error Block: 6137304
Jan 28 06:30:01 lcidev01 unix: Vendor: IBM Serial Number: 00361EE587
Jan 28 06:30:01 lcidev01 unix: Sense Key: Not Ready
Jan 28 06:30:01 lcidev01 unix: ASC: 0x4 (<vendor unique code 0x4>), ASCQ: 0x1, FRU: 0x0
Jan 28 06:30:01 lcidev01 unix: WARNING: md: d112: write error on /dev/dsk/c1t2d0s7
Jan 28 06:30:11 lcidev01 unix: WARNING: md: d112: /dev/dsk/c1t2d0s7 needs maintenance

The disks typically begin exhibiting this behavior during periods of high activity. I do have a couple of replacements lying around, but I'd like some advice as to whether this problem is related to the drives, or if it's indicative of a bigger problem before simply swapping them out.

TIA!

-Damian
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Tue Feb 7 11:16:22 2006
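
A note on the sense data in the log sample above: the sd driver labels ASC 0x4 as a vendor-unique code, but in the standard SCSI additional sense code tables, as far as can be told from the published assignments, 04h/01h is "Logical unit is in process of becoming ready", which would fit a drive that has spun down and is spinning back up. The following is a minimal Python sketch of pulling the ASC/ASCQ pair out of such a log line and looking it up. The lookup table is a small hand-copied excerpt, and the function and variable names are made up for illustration; none of this comes from the original thread.

```python
import re

# Hand-copied excerpt of standard SCSI ASC/ASCQ assignments for ASC 04h.
# This is an illustrative subset, not a complete table.
ASC_ASCQ = {
    (0x04, 0x00): "Logical unit not ready, cause not reportable",
    (0x04, 0x01): "Logical unit is in process of becoming ready",
    (0x04, 0x02): "Logical unit not ready, initializing command required",
    (0x04, 0x03): "Logical unit not ready, manual intervention required",
}

# Matches the "ASC: 0x4 (...), ASCQ: 0x1" portion of a Solaris sd error line.
SENSE_RE = re.compile(r"ASC:\s*(0x[0-9a-fA-F]+).*?ASCQ:\s*(0x[0-9a-fA-F]+)")


def decode_sense(line):
    """Return (asc, ascq, description) for a log line, or None if no sense data."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    asc = int(match.group(1), 16)
    ascq = int(match.group(2), 16)
    return asc, ascq, ASC_ASCQ.get((asc, ascq), "not in this excerpt")


# Line taken from the /var/adm/messages sample in the original question.
sample = ("Jan 28 06:30:01 lcidev01 unix: ASC: 0x4 "
          "(<vendor unique code 0x4>), ASCQ: 0x1, FRU: 0x0")
print(decode_sense(sample))
```

A description alone does not say why the drive stopped being ready, so this only narrows the symptom; it does not distinguish between a failing motor, marginal power, or a cabling fault.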
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:43:55 EST