I would like to thank everyone who replied to my message and provided valuable information. It was a very odd problem because nothing in our situation indicated a hardware fault. I looked in the system logs and saw only the messages I posted, logged while the system was locked up, and our Veritas HA cluster neither failed over nor detected a problem.

Anyway, here is the information I collected from people on how to determine which disk was in question.

Doug Emby also suggested setting sd_max_throttle to 15, so that no more than 180 SCSI commands will be queued up in the UDWIS host adapter memory, even if there are 12 disk drives in a D1000, A1000, or A3X00 disk array (15 x 12 = 180). Add the following to /etc/system:

set sd:sd_max_throttle=15

I rebooted the system a few times because of those timeouts, and it turned out that this particular disk had gone bad and we had to replace it. My guess is that the disk was failing but the A5200 neither detected it soon enough nor warned us, which caused the system to stop responding. Once the disk failed outright, it was obvious. I did not apply the sd_max_throttle setting because the disk failed first. We replaced it and have had no problems since.

It was a good experience and I learned a bit. Thank you once again to all who answered my email and provided valuable pointers and solutions.

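As a rough sanity check of the arithmetic behind the sd_max_throttle suggestion, here is a minimal Python sketch. The function names are made up for illustration. The ssd variant is an assumption: the logs show ssd devices, for which the tunable would presumably be ssd:ssd_max_throttle rather than sd:sd_max_throttle, so verify against your Solaris release before using either.

```python
def max_queued_commands(throttle: int, drives: int) -> int:
    """Worst case: every drive has `throttle` commands outstanding at once."""
    return throttle * drives


def etc_system_line(throttle: int, driver: str = "sd") -> str:
    """Build the /etc/system line for the given driver (hypothetical helper).

    driver="sd" gives the line quoted in the summary; driver="ssd" is an
    assumption for fibre-channel disks such as those in an A5200.
    """
    return f"set {driver}:{driver}_max_throttle={throttle}"


# 12 drives at a per-drive throttle of 15, as in the suggestion above
print(max_queued_commands(15, 12))
print(etc_system_line(15))
print(etc_system_line(15, "ssd"))
```

The point of the calculation is only that the per-drive limit multiplied by the number of drives bounds the total; the 180-command figure itself comes from the suggestion quoted above, not from any adapter documentation I have checked.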
Chris

==============================================================================
logs

Feb 12 02:50:37 renoir scsi: [ID 243001 kern.warning] WARNING: /pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0 (ssd13):
Feb 12 02:50:37 renoir  SCSI transport failed: reason 'reset': retrying command
Feb 12 02:50:37 renoir
Feb 12 02:50:37 renoir scsi: [ID 243001 kern.warning] WARNING: /pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0 (ssd13):
Feb 12 02:50:37 renoir  SCSI transport failed: reason 'timeout': retrying command
Feb 12 02:50:37 renoir

==============================================================================
format

23. c1t89d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
    /pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0

==============================================================================
iostat -E

ssd13 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST39103FCSUN9.0G Revision: 034A Serial No: 0031E51249
Size: 9.06GB <9055065600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

==============================================================================
iostat -En

c1t89d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST39103FCSUN9.0G Revision: 034A Serial No: 0031E51249
Size: 9.06GB <9055065600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

==============================================================================
luxadm probe

Found Enclosure(s):
SENA Name:Zippo Node WWN:50800200000613a0
  Logical Path:/dev/es/ses0
  Logical Path:/dev/es/ses1
SENA Name:Bic Node WWN:50800200000709b0
  Logical Path:/dev/es/ses2
  Logical Path:/dev/es/ses3

==============================================================================
bash-2.03# luxadm display Zippo

                          SENA
                      DISK STATUS
SLOT  FRONT DISKS  (Node WWN)        REAR DISKS  (Node WWN)
0     On (O.K.)    2000002037221ac4  On (O.K.)   20000020372217c0
1     On (O.K.)    200000203787708e  On (O.K.)   2000002037876b29
2     On (O.K.)    20000020371b86fc  On (O.K.)   20000020378770de
3     On (O.K.)    200000203787a00e  On (O.K.)   200000203722147b
4     On (O.K.)    20000020371b8624  On (O.K.)   20000020371b84f3
5     On (O.K.)    2000002037221b28  On (O.K.)   20000020371b8709
6     On (O.K.)    20000020378775b1  On (O.K.)   20000020372210b3
7     On (O.K.)    20000020371b87fa  On (O.K.)   20000020378776ee
8     On (O.K.)    200000203787787f  On (O.K.)   2000002037085e0b
9     On (O.K.)    20000020370fd1a4  On (O.K.)   2000002037877401  <= this disk
10    On (O.K.)    2000002037221a9a  On (O.K.)   2000002037a60869
                    SUBSYSTEM STATUS

Now, in the output from 'luxadm display', look for the disk whose Node WWN ends in 2037877401. These are the same digits that end the WWN in the ssd@w2200002037877401,0 device path from the log.

==============================================================================

-----Original Message-----
From: Wianecki, Christopher [mailto:Christopher.Wianecki@sothebys.com]
Sent: Tuesday, February 12, 2002 4:15 AM
To: sunmanagers@sunmanagers.org
Subject: SCSI transport failed

I got the following error in the system log. Can someone help me understand what happened? It brought the system down; I mean the system was not responding. When I ran df, for example, the shell locked up. I have an E250 with two A5200 arrays hooked up to it. It looks to me as if one of the disks failed. How can I find out exactly which disk failed? Is there any way to tell from the log? I do not understand this /pci path; maybe you have some Sun FAQs that would help with this.

Thank you so much for your help

Chris

Feb 12 02:50:37 renoir scsi: [ID 243001 kern.warning] WARNING: /pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0 (ssd13):
Feb 12 02:50:37 renoir  SCSI transport failed: reason 'reset': retrying command
Feb 12 02:50:37 renoir
Feb 12 02:50:37 renoir scsi: [ID 243001 kern.warning] WARNING: /pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0 (ssd13):
Feb 12 02:50:37 renoir  SCSI transport failed: reason 'timeout': retrying command

**********************************************************************
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the postmaster at postmaster@sothebys.com.
www.sothebys.com
**********************************************************************
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Thu Feb 14 14:00:35 2002
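
The matching step from the summary (take the WWN out of the ssd@w... device path in the log, then look for the 'luxadm display' entry ending in the same digits) can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, the slot table is typed in by hand from the output above (only three slots are shown here), and the 10-digit suffix length simply mirrors "ending in 2037877401" from the summary.

```python
import re

# Node WWNs copied by hand from the 'luxadm display Zippo' output:
# slot -> (front disk, rear disk). Only slots 8-10 are included here.
SLOTS = {
    8: ("200000203787787f", "2000002037085e0b"),
    9: ("20000020370fd1a4", "2000002037877401"),
    10: ("2000002037221a9a", "2000002037a60869"),
}


def wwn_from_device_path(path):
    """Return the 16-hex-digit WWN from an 'ssd@w<WWN>,<lun>' path, or None."""
    m = re.search(r"ssd@w([0-9a-fA-F]{16}),", path)
    return m.group(1).lower() if m else None


def find_slot(wwn, slots, suffix_len=10):
    """Find the (slot, position) whose Node WWN ends in the same digits."""
    suffix = wwn[-suffix_len:]
    for slot, (front, rear) in slots.items():
        if front.endswith(suffix):
            return slot, "front"
        if rear.endswith(suffix):
            return slot, "rear"
    return None


path = "/pci@1f,4000/SUNW,ifp@2/ssd@w2200002037877401,0"
wwn = wwn_from_device_path(path)
print(wwn, find_slot(wwn, SLOTS))
```

Comparing only the trailing digits is deliberate: the WWN in the device path begins with 2200 while the Node WWN reported by luxadm begins with 2000, so a full-string comparison would not match.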