Hi Everybody

Long overdue summary. No applicable answers were received, but there seemed to be some interest in this topic.

The Solaris system turned out to be healthy; the failure was way downstream in the SAN infrastructure (a fibre cable between switches). Somehow this slipped past the SAN supplier and was only found after it started impacting other servers. So much for logs... After the fibre was replaced, the errors stopped.

Best regards
Eugene

===============================================

Hope someone has seen this one and can help please?

Customer has an E4500 running Solaris 8, with 2 x newly attached EVA disk arrays connected via two QLogic 2200 SBus HBAs. Testing was 100% and fast. Secure Path 3.0D is loaded for channel failover.

Started experiencing hangs today. What had changed? The system was rebooted this morning; no changes were made prior to the reboot. Initially there were no errors in /var/adm/messages, but after a second reboot, errors started appearing:

Oct 8 11:00:41 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,1 (ssd5):
Oct 8 11:00:41 proddb   SCSI transport failed: reason 'aborted': retrying command
Oct 8 11:09:00 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):
Oct 8 11:09:00 proddb   SCSI transport failed: reason 'aborted': retrying command
Oct 8 11:58:52 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):
Oct 8 11:58:52 proddb   SCSI transport failed: reason 'aborted': retrying command
Oct 8 12:11:13 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):

Disks c7t0d0 and c7t0d1 are hanging. c6 performs beautifully. Switch logs and EVA logs show nothing. There are no other error messages except those shown above. Mounting a disk read-only and putting heavy I/O on it reproduces the problem. Also, iostat shows the disk as 100% busy, with no I/O passing through. The hsx dev (current path) has the same hung state:

"9 9 17 66
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 hsx1
....
    0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 hsx813
.....
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0    0.8    0.0    0.4  0.0  0.0    0.0   13.9   0   1 c0t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t6d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6t0d1
    0.0    4.2    0.0   18.6  0.0  0.0    0.0    0.4   0   0 c6t0d2
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6t0d3
    0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 c7t0d0
    0.0    0.0 ... "

Below are the lengthy config files as installed by the install script. I promise a summary.

Thx
E Schmidt

==========

"spmgr" display shows the following config:

# spmgr display
Server: acproddb10    Report Created: Fri, Oct 08 16:34:46 2004
Command: spmgr display
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Storage: 5000-1FE1-5002-81C0
Load Balance: Off    Auto-restore: Off
Path Verify: On      Verify Interval: 30
HBAs: qla2200-0  qla2200-2
Controller:  P5849D5AAPW01O, Operational
             P5849D5AAPW038, Operational
Devices: c6t0d0  c6t0d1  c6t0d2  c6t0d3

 TGT/LUN   Device   WWLUN_ID                                  #_Paths
  0/ 0     c6t0d0   6005-08B4-0001-3879-0000-D000-0150-0000   4

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPW01O                                 no
                    hsx-1-37-1       qla2200-0   no           Active
                    hsx-3655-36-1    qla2200-2   no           Available

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPW038                                 no
                    hsx-204-38-1     qla2200-0   no           Standby
                    hsx-3858-39-1    qla2200-2   no           Standby

 TGT/LUN   Device   WWLUN_ID                                  #_Paths
  0/ 1     c6t0d1   6005-08B4-0001-3879-0000-D000-0153-0000   4

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPW01O                                 no
                    hsx-2-37-2       qla2200-0   no           Standby
                    hsx-3656-36-2    qla2200-2   no           Standby

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPW038                                 no
                    hsx-205-38-2     qla2200-0   no           Active
                    hsx-3859-39-2    qla2200-2   no           Available

 TGT/LUN   Device   WWLUN_ID                                  #_Paths
  0/ 2     c6t0d2   6005-08B4-0001-3879-0000-D000-0156-0000   4

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPW01O                                 no
                    hsx-3-37-3       qla2200-0   no           Active
                    hsx-3657-36-3    qla2200-2   no           Available

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPW038                                 no
                    hsx-206-38-3     qla2200-0   no           Standby
                    hsx-3860-39-3    qla2200-2   no           Standby

 TGT/LUN   Device   WWLUN_ID                                  #_Paths
  0/ 3     c6t0d3   6005-08B4-0001-3879-0000-D000-0164-0000   4

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPW01O                                 no
                    hsx-4-37-4       qla2200-0   no           Standby
                    hsx-3658-36-4    qla2200-2   no           Standby

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPW038                                 no
                    hsx-207-38-4     qla2200-0   no           Active
                    hsx-3861-39-4    qla2200-2   no           Available

Storage: 5000-1FE1-5002-2510
Load Balance: Off    Auto-restore: Off
Path Verify: On      Verify Interval: 30
HBAs: qla2200-0  qla2200-2
Controller:  P5849D5AAPC09X, Operational
             P5849D5AAPC09E, Operational
Devices: c7t0d0  c7t0d1  c7t0d2  c7t0d3

 TGT/LUN   Device   WWLUN_ID                                  #_Paths
  0/ 0     c7t0d0   6005-08B4-0001-24D1-0000-A000-0193-0000   4

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPC09X                                 no
                    hsx-813-33-1     qla2200-0   no           Standby
                    hsx-4467-32-1    qla2200-2   no           Standby

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPC09E                                 YES
                    hsx-1016-34-1    qla2200-0   no           Active
                    hsx-4670-35-1    qla2200-2   no           Available

 TGT/LUN   Device   WWLUN_ID                                  #_Paths
  0/ 1     c7t0d1   6005-08B4-0001-24D1-0000-A000-0196-0000   4

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPC09X                                 no
                    hsx-814-33-2     qla2200-0   no           Active
                    hsx-4468-32-2    qla2200-2   no           Available

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPC09E                                 no
                    hsx-1017-34-2    qla2200-0   no           Standby
                    hsx-4671-35-2    qla2200-2   no           Standby

 TGT/LUN   Device   WWLUN_ID                                  #_Paths
  0/ 2     c7t0d2   6005-08B4-0001-24D1-0000-A000-0199-0000   4

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPC09X                                 no
                    hsx-815-33-3     qla2200-0   no           Standby
                    hsx-4469-32-3    qla2200-2   no           Standby

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPC09E                                 YES
                    hsx-1018-34-3    qla2200-0   no           Active
                    hsx-4672-35-3    qla2200-2   no           Available

 TGT/LUN   Device   WWLUN_ID                                  #_Paths
  0/ 3     c7t0d3   6005-08B4-0001-24D1-0000-A000-01A7-0000   4

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPC09X                                 no
                    hsx-816-33-4     qla2200-0   no           Active
                    hsx-4470-32-4    qla2200-2   no           Available

  Controller        Path_Instance    HBA         Preferred?   Path_Status
  P5849D5AAPC09E                                 no
                    hsx-1019-34-4    qla2200-0   no           Standby
                    hsx-4673-35-4    qla2200-2   no           Standby

======== END OF OUTPUT ============

Entries in /etc/system:

* Start of CPQhsv edits. DO NOT DELETE THIS LINE
forceload: drv/clone
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
* End of CPQhsv edits. DO NOT DELETE THIS LINE
* Start of HPfcraid edits. DO NOT DELETE THIS LINE
forceload: drv/clone
forceload: drv/ssd
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
set ssd:ssd_max_throttle=32
set ssd:ssd_io_time=180
* End of HPfcraid edits. DO NOT DELETE THIS LINE
set shmsys:shminfo_shmmax=4194304000

------- EOF ---------------

Entries in /kernel/drv/ssd.conf:

#
# Copyright (c) 1995-1999 by Sun Microsystems, Inc.
# All rights reserved.
#
#ident "@(#)ssd.conf 1.9 99/07/29 SMI"
name="ssd" parent="SUNW,pln" port=0 target=0;
....
name="ssd" parent="SUNW,pln" port=0 target=15;
name="ssd" parent="SUNW,pln" port=1 target=0;
name="ssd" parent="SUNW,pln" port=1 target=1;
..... ditto port=1 to port=5, with target=0 thru target=15 .....
name="ssd" parent="SUNW,pln" port=5 target=15;
name="ssd" parent="sf" target=0;
name="ssd" parent="fp" target=0;
name="ssd" parent="ifp" target=127;
name="ssd" parent="scsi_vhci" target=0;

---EOF --------------

/kernel/drv/hsx.conf:

#
# Compaq StorageWorks Secure Path
# hsx.conf - Hardware Configuration file for hsx, a Disk Array Block
#            SCSI Target driver. Refer to the driver.conf(4) manpage
#            for more information on the syntax of this file.
#
#       name        "hsx" - required
#       class       "scsi" - required
#       target      SCSI target-ID
#       lun         SCSI logical unit number
#       qdepth      depth of command queue (1,..,64)
#       parent      restrict parent HBA
#       preferred   this path is preferred for a controller when load
#                   balancing is disabled
#
# If no "parent=" qualifier is present, all SCSI-HBA adapters in
# the system will attempt to attach an HSX instance at the indicated
# target/lun on the SCSI bus.
#
# HSX will only attach device instances for Compaq StorageWorks HSx80
# disk array targets. The SD device will also want to claim these
# targets. Explicit use of "parent=" in sd.conf may be required to
# resolve conflicts.
#
# Each HSX instance found will result in a path being provided via
# the misc/path driver.
name="hsx" parent="qla2200" target=37 lun=0 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=1 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=2 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=3 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=4 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=5 qdepth=32;
.... etc, for targets = 32 to 39 (although not in sequence), lun = 0 thru 202

============= EOF

Contents of /kernel/drv/qla2300.conf

# Number of times to retry a SCSI queue full error.
#    Range: 0 - 255
hba0-queue-full-retry-count=16;

# Amount of time to delay after a SCSI queue full error before
# starting any new I/O commands.
#    Range: 0 - 255 seconds
hba0-queue-full-retry-delay=2;

# Maximum fibre channel frame size.
#    Range: 512, 1024 or 2048 bytes
hba0-max-frame-length=1024;

# Maximum number of commands queued on each logical unit.
#    Range: 1 - 65535
hba0-execution-throttle=16;

# Number of port login retry attempts.
#    Range: 0 - 255
hba0-login-retry-count=8;

# Enable/disable the use adapter hard loop ID address on the fibre
# channel bus.
#    0 = disable, 1 = enabled
hba0-enable-adapter-hard-loop-ID=0;

# Adapter hard loop ID address to use on the fibre channel bus.
#    Range: 0 - 125
hba0-adapter-hard-loop-ID=0;

# Enable/disable the use LIP reset for loop reset.
#    0 = disable, 1 = enabled
hba0-enable-LIP-reset=0;

# Enable/disable the use LIP full login for loop reset.
#    0 = disable, 1 = enabled
hba0-enable-LIP-full-login=1;

# Enable/disable the use of target reset for loop reset.
#    0 = disable, 1 = enabled
hba0-enable-target-reset=0;

# Amount of time to delay after a loop reset for starting any new
# I/O commands.
#    Range: 0 - 255 seconds
hba0-reset-delay=5;

# Number of times to retry a port that is not responding.
#    Range: 0 - 255
hba0-port-down-retry-count=90;

# Maximum number of LUNs to scan for, if a device does not
# support SCSI Report LUNs command.
#    Range: 1 - 256
hba0-maximum-luns-per-target=8;

# Connection options.
#    0 = loop only
#    1 = point-to-point only
#    2 = loop preferred, otherwise point-to-point
#    3 = point-to-point preferred, otherwise loop
hba0-connection-options=1;

# Fibre Channel tape support enable/disable.
#    0 = disable, 1 = enabled
hba0-fc-tape=1;

# PCI latency timer.
#    Range: 0 - 0xF8
#    Default: 0x40
hba0-pci-latency-timer=0x40;

# During link down conditions enable/disable the reporting of
# errors.
#    0 = disabled, 1 = enable
hba0-link-down-error=1;

# Amount of time to wait for loop to come up after it has gone down
# before reporting I/O errors.
#    Range: 0 - 240 seconds
hba0-link-down-timeout=10;

# Persistent binding only option.
#    0 = Reports to OS discovery of binded and non-binded devices
#    1 = Reports to OS discovery of persistent binded devices only
hba0-persistent-binding-configuration=1;

# Fast error reporting to Solaris, enabled/disabled.
#    0 = disabled, 1 = enable
hba0-fast-error-reporting=0;

# Enable extended logging.
#    0 = disabled, 1 = enable
hba0-extended-logging=0;

#####################################################################
# WARNING: Beginning of Configuration Data stored by the QLogic     #
#          Applications. Consult documentation before editing       #
#          any data passed this text.                               #
#####################################################################
# CPQ installation changes made.
# CPQswsp: start of Secure Path edits. Caution: do not remove! This line is used by pkgadd/pkgrm.
hba0-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba2-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba0-SCSI-target-id-38-fibre-channel-port-name="50001FE1500281CC";
hba2-SCSI-target-id-38-fibre-channel-port-name="50001FE1500281CC";
hba0-SCSI-target-id-36-fibre-channel-port-name="50001FE1500281C8";
hba2-SCSI-target-id-36-fibre-channel-port-name="50001FE1500281C8";
hba0-SCSI-target-id-39-fibre-channel-port-name="50001FE1500281CD";
hba2-SCSI-target-id-39-fibre-channel-port-name="50001FE1500281CD";
hba0-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba2-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba0-SCSI-target-id-34-fibre-channel-port-name="50001FE15002251C";
hba2-SCSI-target-id-34-fibre-channel-port-name="50001FE15002251C";
hba0-SCSI-target-id-32-fibre-channel-port-name="50001FE150022518";
hba2-SCSI-target-id-32-fibre-channel-port-name="50001FE150022518";
hba0-SCSI-target-id-35-fibre-channel-port-name="50001FE15002251D";
hba2-SCSI-target-id-35-fibre-channel-port-name="50001FE15002251D";
# CPQswsp: end of Secure Path edits. Caution: do not remove! This line is used by pkgadd/pkgrm.

=========== EOF =====================

/kernel/drv/swsp.conf

# Compaq StorageWorks Secure Path
# swsp.conf - Configuration file for swsp
#
# use swsp.conf to configure which arrays can be controlled by Secure Path
# add one entry of the following form per array:
#       name="swsp" class="root" portid=0 reg=0x0,0x(instance+1),0x1
#       instance=(instance #) array-name="ARRAY_WWID";
#
# configurable parameters can be set globally, or on an array basis by
# adding one of path-verify, path-verify-period load-balance or auto-restore
# to the line defining the array instance, or on a line by itself (for global)
#
# path-verify=?
#       1= path-verification enabled
#       0= path-verification disabled
# path-verify-period=X
#       X = number of seconds between path verification attempts
#
# load-balance=?
#       1= enabled
#       0= disabled
#
# auto-restore=?
#       1= enabled
#       0= disabled
#
path-verify=1;
name="swsp" class="root" portid=0 reg=0x0,0x1,0x1 instance=0 array-name="5000-1FE1-5002-81C0";
wwlid-0-0="6005-08B4-0001-3879-0000-D000-0150-0000@0,0";
wwlid-0-1="6005-08B4-0001-3879-0000-D000-0153-0000@0,1";
wwlid-0-2="6005-08B4-0001-3879-0000-D000-0156-0000@0,2";
wwlid-0-3="6005-08B4-0001-3879-0000-D000-0164-0000@0,3";
name="swsp" class="root" portid=0 reg=0x0,0x2,0x1 instance=1 array-name="5000-1FE1-5002-2510";
wwlid-1-0="6005-08B4-0001-24D1-0000-A000-0193-0000@0,0";
wwlid-1-1="6005-08B4-0001-24D1-0000-A000-0196-0000@0,1";
wwlid-1-2="6005-08B4-0001-24D1-0000-A000-0199-0000@0,2";
wwlid-1-3="6005-08B4-0001-24D1-0000-A000-01A7-0000@0,3";

======================== EOF ========================================
=====================================================================
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Tue Nov 2 18:28:09 2004
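
The hang signature reported in this thread (a device sitting at 100% busy while no I/O passes through) can be checked for mechanically. What follows is only a minimal sketch in Python, assuming the input is extended iostat text laid out like the sample in the original post: ten numeric columns (r/s through %b) followed by the device name. The function name find_hung_devices and the busy threshold are illustrative assumptions, not part of any Solaris or Secure Path tool, and the sample rows are copied from the iostat output quoted above.

```python
def find_hung_devices(iostat_text, busy_threshold=100):
    """Return device names that look hung: %b at or above the threshold
    while both r/s and w/s are zero.

    Assumes rows of ten numeric columns followed by a device name, as in
    the iostat sample quoted in the post (an assumption about the layout).
    """
    hung = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # blank, elided ("....") or truncated line
        try:
            values = [float(f) for f in fields[:10]]
        except ValueError:
            continue  # the column header row is not numeric
        reads, writes, busy = values[0], values[1], values[9]
        if busy >= busy_threshold and reads == 0 and writes == 0:
            hung.append(fields[10])
    return hung


# Rows taken from the iostat output in the original post.
SAMPLE = """\
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 hsx1
    0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 hsx813
    0.0    4.2    0.0   18.6  0.0  0.0    0.0    0.4   0   0 c6t0d2
    0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 c7t0d0
"""

if __name__ == "__main__":
    print(find_hung_devices(SAMPLE))
```

On the sample rows this flags hsx813 and c7t0d0, the same two devices the post describes as hung. A device that is merely busy (non-zero r/s or w/s) is deliberately not flagged, since high %b with traffic flowing is normal under load.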