SUMMARY: T3+ disk errors.

From: Anthony Miller <anthony__miller_at_yahoo.com>
Date: Fri Aug 11 2006 - 00:22:28 EDT
Hi All,

In short, the firmware was upgraded from version 2.0x
to 3.1.6.

Now the long version:

We had 8 disks in a RAID 5 config with a standby disk.
Disk 6 was faulty, but reconstruction to the standby
disk kept failing.

I replaced disk 6; however, the array would not enable
it, so reconstruction could not take place.

After I reset the array, it just kept looping through
the boot sequence, still complaining about disk 6.

While I was wrestling with this and considering setting
up a tftpboot server, it was recommended that I take
disk 6 out and reset. Yes!!! It booted up.

I then immediately installed the 3.1.6 firmware and
reset again.

Now when I pushed disk 6 back into place, it still
failed. :(

So I decided to switch disk 6 with the standby disk,
and it worked!!  Reconstruction to disk 6 was taking
place.  Seems you can never be too sure about your
replacement disks.

After 9-10 hours, reconstruction was complete, vol
verify was running, the standby disk was replaced, and
everyone was happy again.

Many thanks to Dave Markham for recommending the
firmware upgrade and Terix.com for their support.

-Anthony-
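
For anyone chasing similar symptoms, it can help to count which drives
the array syslog is actually complaining about. The following is a
minimal sketch only, assuming the "disk error" line shape seen in the
log excerpt quoted below and assuming the log has been wrapped and
quoted by a mail client. The names unwrap and tally_disk_errors are
made up for this illustration and are not part of any T3+ tooling.

```python
import re
from collections import Counter

# Assumed entry shape, taken only from the excerpt quoted in this thread:
#   Jul 06 08:29:03 ISR1[1]: N: u1d6 sid 944896 stype 2024 disk error 3
DISK_ERROR = re.compile(
    r"\b(u\d+d\d+)\s+sid\s+\d+\s+stype\s+\d+\s+disk error\s+\d+"
)
TIMESTAMP = re.compile(r"[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} ")


def unwrap(lines):
    """Rejoin entries that a mail client wrapped and quoted with '> '."""
    entries = []
    for raw in lines:
        text = raw.lstrip("> ").rstrip()
        if not text:
            continue
        if TIMESTAMP.match(text) or not entries:
            entries.append(text)        # a timestamp starts a new entry
        else:
            entries[-1] += " " + text   # otherwise it is a continuation
    return entries


def tally_disk_errors(lines):
    """Count 'disk error' entries per drive id (u1d6, u1d9, ...)."""
    counts = Counter()
    for entry in unwrap(lines):
        match = DISK_ERROR.search(entry)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = """\
> Jul 06 11:33:37 ISR1[1]: N: u1d6 sid 944892 stype
> 2024
> disk error 3
> Jul 06 11:35:53 ISR1[1]: N: u1d9 sid 1305262 stype
> 2024 disk error 3
> Jul 06 11:36:01 ISR1[1]: N: u1d9 sid 936401 stype
> 2024
> disk error 3
> Jul 06 20:41:51 ISR1[1]: N: u1d3 sid 2452927 stype
> 1003 disk error 3
""".splitlines()

for disk, count in tally_disk_errors(sample).most_common():
    print(disk, count)
```

Entries that carry no drive id (the "N:  sid ..." lines near the end of
the excerpt) are skipped by this pattern, so the counts are a lower
bound rather than a full picture.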




--- Anthony Miller <anthony__miller@yahoo.com> wrote:

> Hi All,
> 
> I'm having headaches over this T3+ unit.
> 
> The unit is configured as an 8 disk RAID 5 device
> with the 9th disk as standby.
> 
> First we had multiple errors of the form:
> 
> Jul 06 08:29:03 ISR1[1]: N: u1d6 sid 944896 stype 2024 disk error 3
> 
> Which would indicate a faulty disk 6.
> 
> This disk was replaced and reconstruction started
> followed shortly thereafter by the following errors:
> 
> Jul 06 11:33:34 ISR1[1]: N: u1d6 sid  944896 stype 2024 disk error 3
> Jul 06 11:33:36 LPCT[1]: N: u1d6: Not ready on loop 1
> Jul 06 11:33:36 LPCT[1]: N: u1d6: Bypassed on loop 1
> Jul 06 11:33:36 LPCT[1]: E: u1d6: Not present
> Jul 06 11:33:36 TMRT[1]: E: u1d6: Missing; system shutting down in 30 minutes
> Jul 06 11:33:37 ISR1[1]: N: u1d6 sid 944896 stype 2024 disk error 3
> Jul 06 11:33:37 ISR1[1]: N: u1d6 sid 944892 stype 2024 disk error 3
> Jul 06 11:33:37 LPCT[1]: N: u1d6: Not ready on loop 2
> Jul 06 11:33:37 LPCT[1]: N: u1d6: Bypassed on loop 2
> Jul 06 11:33:46 LT00[1]: N: u1d6 Reconstruction to standby disk started
> Jul 06 11:35:36 LPCT[1]: N: u1d6: Bypassed on loop 1
> Jul 06 11:35:36 LPCT[1]: N: u1d6: Bypassed on loop 2
> Jul 06 11:35:36 ISR1[1]: N: u1ctr ISP2200[0] Received LIP(f7,f7) async event
> Jul 06 11:35:37 ISR1[1]: N: u1ctr ISP2200[1] Received LIP(f7,f7) async event
> Jul 06 11:35:39 LPCT[1]: N: u1d6: Not bypassed on loop 1
> Jul 06 11:35:40 LPCT[1]: N: u1d6: Not bypassed on loop 2
> Jul 06 11:35:50 ISR1[1]: N: u1ctr ISP2200[1] Fatal timeout on u1d1
> Jul 06 11:35:50 ISR1[1]: N: u1ctr ISP2200[1] QLCF_ABORT_ALL_CMDS: Command Timeout Pre-Gauntlet Initiated
> Jul 06 11:35:50 ISR1[1]: N: u1d1 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d1 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d1 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d1 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d1 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d1 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d1 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d1 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d2 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d2 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d2 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d2 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d2 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d2 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d2 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d3 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d3 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d3 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d3 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d3 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d3 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1d3 SVD_CHECK_ERROR: Cmd Aborted (path = 1)
> Jul 06 11:35:50 ISR1[1]: N: u1ctr ISP2200[1] Received LIP(f7,ef) async event
> Jul 06 11:35:53 ISR1[1]: N: u1d9 sid 1305262 stype 2024 disk error 3
> Jul 06 11:36:01 ISR1[1]: N: u1d9 sid 936401 stype 2024 disk error 3
> Jul 06 11:42:21 ISR1[1]: N: u1d9 sid 1017153 stype 2024 disk error 3
> Jul 06 20:41:42 ISR1[1]: N: u1d9 sid 1280447 stype 2024 disk error 3
> Jul 06 20:41:45 ISR1[1]: N: u1d9 sid 1280448 stype 2024 disk error 3
> Jul 06 20:41:47 ISR1[1]: N: u1d9 sid 1280688 stype 2024 disk error 3
> Jul 06 20:41:48 ISR1[1]: N: u1d9 sid 1281168 stype 2024 disk error 3
> Jul 06 20:41:51 ISR1[1]: W: u1d3 SCSI Disk Error Occurred (path = 0x1)
> Jul 06 20:41:51 ISR1[1]: W: Sense Key = 0x3, Asc = 0x11, Ascq = 0x0
> Jul 06 20:41:51 ISR1[1]: W: Sense Data Description = Unrecovered Read Error
> Jul 06 20:41:51 ISR1[1]: W: Valid Information = 0x12bd255b
> Jul 06 20:41:51 ISR1[1]: N: u1d3 SVD_DONE: Command Error = 0x3
> Jul 06 20:41:51 ISR1[1]: N: u1d3 sid 2452927 stype 1003 disk error 3
> Jul 06 20:41:51 SX11[1]: W: u1ctr read failed during recon stripe scb=126cdb0
> Jul 06 20:41:51 SX11[1]: N: u1ctr Internal Command error (Multiple Disk Failed)
> Jul 06 20:41:51 SX11[1]: N: u1ctr Internal Command error (Terminated by system)
> Jul 06 20:41:51 LNXT[1]: W: u1ctr recon failed in vol (v0)
> Jul 06 20:41:54 ISR1[1]: N:  sid 1281169 stype 2024 disk error 3
> Jul 06 20:41:54 LT00[1]: N: u1d6 Reconstruction to standby drive failed
> Jul 06 20:41:54 LT00[1]: W: u1d6 Recon attempt failed
> Jul 06 20:41:54 ISR1[1]: N:  sid 1282120 stype 2024 disk error 3
> Jul 06 20:41:55 ISR1[1]: N:  sid 1000363 stype 2024 disk error 3
> Jul 06 20:42:02 ISR1[1]: N:  sid 1001324 stype 2024 disk error 3
> 
> 
> Disk 6 was replaced again, but the same errors
> occurred.
> 
> Disk 6 remains disabled.
> 
> It would seem that disk 3 is also faulty; however,
> until we can get disk 6 enabled, we can't replace
> disk 3.
> 
> My questions are:
> 
> 1. If disk 6 is replaced, why does an error on disk 3
> affect reconstruction, since disk 6 should be
> reconstructed from disk 9?
> 
> 2. Any reason why we get errors on disk 6, disks 1, 2, 3
> and disk 9?
> 
> 3. How can you tell if disk 9 (standby) is in fact
> being used?
> 
> Thanks for any help or advice.
> 
=== message truncated ===
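
On question 1 in the quoted message: in RAID 5 generally, a missing
member is rebuilt by XORing the blocks of every surviving member of the
stripe, and the standby disk is the target of that rebuild rather than
its source. An unreadable sector on a second member (disk 3 here, going
by the "Unrecovered Read Error" and "Multiple Disk Failed" lines) would
therefore leave the affected stripe unrecoverable. The sketch below
illustrates only that general parity arithmetic; the stripe layout,
block size, and the rebuild function are assumptions for illustration
and say nothing about how the T3+ firmware is implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 8-member stripe: 7 data blocks plus 1 parity block.
data = [bytes([i] * 4) for i in range(1, 8)]
stripe = data + [xor_blocks(data)]


def rebuild(stripe, missing, unreadable=()):
    """Recompute the block at index `missing` from all other members."""
    survivors = []
    for index, block in enumerate(stripe):
        if index == missing:
            continue
        if index in unreadable:
            # A second bad member leaves nothing to XOR against.
            raise IOError(f"member {index} unreadable; stripe cannot be rebuilt")
        survivors.append(block)
    return xor_blocks(survivors)


# Index 5 stands in for disk 6, index 2 for disk 3.
print(rebuild(stripe, missing=5) == stripe[5])

try:
    rebuild(stripe, missing=5, unreadable={2})
except IOError as err:
    print("rebuild failed:", err)
```

This matches the pattern in the log, where the rebuild of u1d6 ran for
hours and only failed once a read on u1d3 could not be recovered.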


_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Fri Aug 11 00:23:19 2006
