SUMMARY: SCSI transport failed

From: Wianecki, Christopher <Christopher.Wianecki_at_sothebys.com>
Date: Thu Feb 14 2002 - 14:59:54 EST

I would like to thank all the people who replied to my message and provided
me with valuable information. It was a very weird problem because in our
situation nothing indicated there was any hardware problem. I looked in the
system logs and saw only what I posted, while the system locked up and our
Veritas HA cluster neither switched over nor picked up a problem. Anyway, here
is some info I collected from people in order to determine which disk was in
question. Also, Doug Emby suggested setting sd_max_throttle = 15, so
that no more than 180 SCSI commands will be queued up in the UDWIS host
adapter memory, even if there are 12 disk drives in a D1000, A1000, or A3X00
disk array (15 x 12 = 180).
Add the following to /etc/system:
set sd:sd_max_throttle=15
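
The arithmetic behind that suggestion can be checked with a short sketch. The function names below are hypothetical, and the default throttle of 256 is an assumption based on commonly cited Solaris documentation, not something stated in this thread:

```python
def max_queued_commands(throttle, disks):
    """Upper bound on commands queued at the host adapter:
    each disk may have at most `throttle` commands outstanding."""
    return throttle * disks


def etc_system_line(throttle):
    """Format the /etc/system entry quoted in this message."""
    return "set sd:sd_max_throttle=%d" % throttle


SUGGESTED_THROTTLE = 15   # value suggested in this thread
ASSUMED_DEFAULT = 256     # assumption: commonly cited default, not from this thread
DISKS = 12                # fully populated array, as in the example above

if __name__ == "__main__":
    print(max_queued_commands(SUGGESTED_THROTTLE, DISKS))
    print(max_queued_commands(ASSUMED_DEFAULT, DISKS))
    print(etc_system_line(SUGGESTED_THROTTLE))
```

The sketch only multiplies the per-disk limit by the disk count; it says nothing about how many commands a given host adapter can actually hold.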
Anyway, I rebooted the system a few times because of those timeouts, and it
turned out that this particular disk had gone bad and we had to replace it. My
guess is that the disk was failing but the A5200 did not detect it soon enough
and did not warn us about it, which caused the system to stop responding. As
soon as the disk failed outright it was obvious. I did not use the
sd_max_throttle setting since the disk failed. After it failed we replaced it,
and we have not had problems since then... Well, it was a good experience and I
have learned a bit. Thank you once again to all who answered my email and
provided me with valuable pointers and solutions.

Chris

==============================================================================

logs
Feb 12 02:50:37 renoir scsi: [ID 243001 kern.warning] WARNING: /pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0 (ssd13):
Feb 12 02:50:37 renoir  SCSI transport failed: reason 'reset': retrying command
Feb 12 02:50:37 renoir
Feb 12 02:50:37 renoir scsi: [ID 243001 kern.warning] WARNING: /pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0 (ssd13):
Feb 12 02:50:37 renoir  SCSI transport failed: reason 'timeout': retrying command
Feb 12 02:50:37 renoir

==============================================================================

format
23. c1t89d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
/pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0 

==============================================================================

iostat -E
ssd13    Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: SEAGATE  Product: ST39103FCSUN9.0G Revision: 034A Serial No: 0031E51249
Size: 9.06GB <9055065600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 

==============================================================================

iostat -En
c1t89d0         Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: SEAGATE  Product: ST39103FCSUN9.0G Revision: 034A Serial No: 0031E51249
Size: 9.06GB <9055065600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
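
For anyone who wants to scan these counters across many disks, here is a minimal sketch that parses `iostat -En` style output like the sample above. The function names are hypothetical, and the regular expression assumes the header layout shown in this message. Note that in this incident every counter was 0 even though the disk was failing, so an empty result does not prove a disk is healthy.

```python
import re

# Matches the per-device header line, e.g.
# "c1t89d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0"
HEADER_RE = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)"
)


def error_counters(iostat_output):
    """Map each device name to its soft/hard/transport error counts."""
    counters = {}
    for line in iostat_output.splitlines():
        match = HEADER_RE.match(line)
        if match:
            counters[match.group(1)] = {
                "soft": int(match.group(2)),
                "hard": int(match.group(3)),
                "transport": int(match.group(4)),
            }
    return counters


def devices_with_errors(counters):
    """Return the sorted names of devices with any nonzero counter."""
    return sorted(dev for dev, c in counters.items() if any(c.values()))


SAMPLE = """\
c1t89d0         Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST39103FCSUN9.0G Revision: 034A Serial No: 0031E51249
Size: 9.06GB <9055065600 bytes>
"""

if __name__ == "__main__":
    print(devices_with_errors(error_counters(SAMPLE)))
```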

==============================================================================

luxadm probe
Found Enclosure(s):
SENA               Name:Zippo   Node WWN:50800200000613a0   
  Logical Path:/dev/es/ses0
  Logical Path:/dev/es/ses1
SENA               Name:Bic   Node WWN:50800200000709b0   
  Logical Path:/dev/es/ses2
  Logical Path:/dev/es/ses3

==============================================================================

bash-2.03# luxadm display Zippo

                                   SENA
                                 DISK STATUS
SLOT   FRONT DISKS       (Node WWN)          REAR DISKS        (Node WWN)
0      On (O.K.)         2000002037221ac4    On (O.K.)         20000020372217c0
1      On (O.K.)         200000203787708e    On (O.K.)         2000002037876b29
2      On (O.K.)         20000020371b86fc    On (O.K.)         20000020378770de
3      On (O.K.)         200000203787a00e    On (O.K.)         200000203722147b
4      On (O.K.)         20000020371b8624    On (O.K.)         20000020371b84f3
5      On (O.K.)         2000002037221b28    On (O.K.)         20000020371b8709
6      On (O.K.)         20000020378775b1    On (O.K.)         20000020372210b3
7      On (O.K.)         20000020371b87fa    On (O.K.)         20000020378776ee
8      On (O.K.)         200000203787787f    On (O.K.)         2000002037085e0b
9      On (O.K.)         20000020370fd1a4    On (O.K.)         2000002037877401 <= this disk
10     On (O.K.)         2000002037221a9a    On (O.K.)         2000002037a60869
                                SUBSYSTEM STATUS

Now in the output from 'luxadm display', look for the disk ending
in:
            2037877401
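
The lookup described above can be sketched in a few lines. This is only an illustration with hypothetical function names: it pulls the WWN out of the ssd@w... path from the log, keeps its last 10 hex digits, and scans 'luxadm display' text for a Node WWN with the same ending. It assumes every populated slot lists the front WWN before the rear one, and it tolerates the rear WWN being wrapped onto its own line.

```python
import re

WWN_RE = re.compile(r"\b[0-9a-f]{16}\b")
# Slot rows start with a short slot number; a wrapped 16-digit WWN does not match.
SLOT_RE = re.compile(r"^\s*(\d{1,3})\s")


def wwn_suffix_from_path(device_path, digits=10):
    """Return the last `digits` hex digits of the WWN in an ssd@w<WWN>,<lun> path."""
    match = re.search(r"ssd@w([0-9a-f]{16}),", device_path)
    if match is None:
        raise ValueError("no ssd@w<WWN> component in path")
    return match.group(1)[-digits:]


def find_disk(display_output, suffix):
    """Return (slot, 'front' or 'rear') for the first WWN ending in suffix, else None."""
    slot = None
    seen_in_slot = 0
    for line in display_output.splitlines():
        slot_match = SLOT_RE.match(line)
        if slot_match:
            slot = int(slot_match.group(1))
            seen_in_slot = 0
        for wwn in WWN_RE.findall(line):
            if slot is None:
                continue  # WWN seen before any slot row, e.g. in a header
            position = "front" if seen_in_slot == 0 else "rear"
            seen_in_slot += 1
            if wwn.endswith(suffix):
                return slot, position
    return None


SAMPLE_DISPLAY = """\
SLOT   FRONT DISKS       (Node WWN)          REAR DISKS        (Node WWN)
8      On (O.K.)         200000203787787f    On (O.K.)
2000002037085e0b
9      On (O.K.)         20000020370fd1a4    On (O.K.)
2000002037877401
"""

if __name__ == "__main__":
    path = "/pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0"
    print(find_disk(SAMPLE_DISPLAY, wwn_suffix_from_path(path)))
```

Matching on the trailing digits rather than the whole value is deliberate: the path in the log starts with 22 while the Node WWN in the display starts with 20, so only the tail is common to both.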

==============================================================================

-----Original Message-----
From: Wianecki, Christopher [mailto:Christopher.Wianecki@sothebys.com] 
Sent: Tuesday, February 12, 2002 4:15 AM
To: sunmanagers@sunmanagers.org
Subject: SCSI transport failed

I got the following error in the system log. Can someone help me understand
what has happened? This brought the system down; I mean the system was not
responding. When I ran df, for example, the shell locked up. I have an E250
with two A5200 arrays hooked up to it. It looks to me like one of the disks
failed. How can I find out exactly which disk failed? Is there any way to tell
from the log which disk it is? I do not understand this /pci path; maybe you
guys have some Sun FAQs that would help with this.

Thank you so much for your help.

Chris

Feb 12 02:50:37 renoir scsi: [ID 243001 kern.warning] WARNING: /pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0 (ssd13):
Feb 12 02:50:37 renoir  SCSI transport failed: reason 'reset': retrying command
Feb 12 02:50:37 renoir
Feb 12 02:50:37 renoir scsi: [ID 243001 kern.warning] WARNING: /pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0 (ssd13):
Feb 12 02:50:37 renoir  SCSI transport failed: reason 'timeout': retrying command

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers