First, I'd like to thank all those who responded. Thank you. Sorry for the summary taking so long to be sent back to the group; given the sensitivity of the site where these warnings were appearing, it has taken quite a bit of time to get changes implemented and tested.

Getting to the bottom of this involved trial and error with a few things. I'll summarise those which made a positive impact on our system:

1. Implemented changes to the SD values in /etc/system. The settings which seem to work best for us are:

   set ssd:ssd_io_time=60
   set ssd:ssd_max_throttle=20

2. We patched the OS and Veritas software to the latest patch releases available. Our systems were nearly a year behind the current recommended patches.

3. We reset the vxdmp settings back to default, as we had played around with the iotimeouts and queuedepth:

   # Set queuedepth and io back to defaults:
   vxdmpadm setattr arraytype A/A-A-HDS recoveryoption=default

4. After doing some more research we discovered a little fact about the HDS SANs: they are not true Asymmetric Active-Active (A/A-A) arrays. They mimic A/A-A by performing internal switching in the HDS controllers. In theory this shouldn't affect performance or reliability. However, after talking to Veritas, it was decided to set vxdmp to use a single path to the SAN for all I/O. This doesn't exclude the other path from being used: it will still take over in the event of an HBA failure, and with multiple LUNs you can still load balance over your two HBAs. But once a path is selected, DMP will keep using that HBA until a failure on the path is detected.
   # Trying this as a setting to stop VxDMP from flapping about on the HDS SAN:
   vxdmpadm setattr enclosure AMS_WMS0 iopolicy=singleactive use_all_paths=no
   vxdmpadm setattr enclosure AMS_WMS1 iopolicy=singleactive use_all_paths=no

Now our system appears to be stable, and the number of SCSI warnings has dropped to 1 or 2 per day, which we can correlate with errors occurring on the SAN fabric between the two sites (set and out of frame errors).

Regards
Graham

Subject: Are SCSI Warnings Normal When Using Extended SAN Fabrics?
------------------------
From: *Graham Leggate* <graham.leggate@gmail.com>
Date: 2008/7/31
To: sunmanagers@sunmanagers.org

Hi,

I have a question regarding what would be considered a "normal" number of SCSI warnings when using remote SANs.

We have a number of Sun servers (E2900, V890, X4200M2) with dual HBAs, running Solaris 10 U3 and Veritas Storage Foundation 5, connected to an HDS SAN. We have two SANs, located in two physical datacentres (prod & DRC) approximately 40 km apart. We run dark fibre between the two sites and use CWDMs to provide 2 x 2Gbps data networking + 6 x 2Gbps Fibre Channel.

The Sun servers use vxdmp to connect to 2 Brocade switches, and each Brocade switch has 3 x 2Gbps trunked ISLs to connect to the switches at the remote datacentre; we also use the Extended Fabric licenses in the switches.

The servers' data volumes are located on the SANs: we have a LUN presented by the local SAN and a second LUN presented by the remote SAN, and each volume is then mirrored using Veritas. The Sun servers run a mix of Oracle RAC 10gR2 and an in-house transaction processing engine and custom database.

Each day the servers produce a number of warnings to syslog, as shown below. Each time the system warns of a SCSI transport issue, it is always the remote LUN that the problem is reported against.
These warnings are not causing the systems to fail in any way; however, the customer is asking for an explanation as to why these messages are occurring.

Previously we did not have the Extended Fabric licenses or the Trunking licenses, and we would see many of these SCSI errors in succession, which would then cause Veritas to mark a disk as failing or failed, meaning we had to re-mirror the disk. Since the Extended Fabric licenses were installed on the Brocade switches, the number of SCSI warnings has greatly decreased and we haven't had any disk failures.

I do not know if these types of messages are "normal" when running systems with remote mirrors, or if this is something we need to investigate further for other underlying problems. Any insight from those of you who run Solaris with remote mirrors would be greatly appreciated.

---messages---
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/SUNW,emlxs@1/fp@0,0/ssd@w50060e80102a00f2,8 (ssd166):
Jul 31 02:00:24 SERVER001 Error for Command: write(10) Error Level: Retryable
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Requested Block: 11880000 Error Block: 11880000
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 750409750029
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] ASC: 0xc0 (<vendor unique code 0xc0>), ASCQ: 0x3, FRU: 0x0
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/SUNW,emlxs@1/fp@0,0/ssd@w50060e80102a0082,3 (ssd144):
Jul 31 02:00:24 SERVER001 Error for Command: write(10) Error Level: Retryable
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Requested Block: 11880000 Error Block: 11880000
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 750409680012
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] ASC: 0xc0 (<vendor unique code 0xc0>), ASCQ: 0x3, FRU: 0x0
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/SUNW,emlxs@1/fp@0,0/ssd@w50060e80102a0082,5 (ssd165):
Jul 31 02:00:24 SERVER001 Error for Command: write(10) Error Level: Retryable
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Requested Block: 1132771808 Error Block: 1132771808
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 750409680023
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] ASC: 0xc0 (<vendor unique code 0xc0>), ASCQ: 0x3, FRU: 0x0
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/SUNW,emlxs@1/fp@0,0/ssd@w50060e80102a0082,7 (ssd169):
Jul 31 02:00:24 SERVER001 Error for Command: write(10) Error Level: Retryable
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Requested Block: 327259936 Error Block: 327259936
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 750409680029
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] ASC: 0xc0 (<vendor unique code 0xc0>), ASCQ: 0x3, FRU: 0x0
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/SUNW,emlxs@1/fp@0,0/ssd@w50060e80102a0082,8 (ssd171):
Jul 31 02:00:24 SERVER001 Error for Command: write(10) Error Level: Retryable
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Requested Block: 1132650832 Error Block: 1132650832
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 750409680024
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] ASC: 0xc0 (<vendor unique code 0xc0>), ASCQ: 0x3, FRU: 0x0
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/SUNW,emlxs@1/fp@0,0/ssd@w50060e80102a00f2,1 (ssd150):
Jul 31 02:00:24 SERVER001 Error for Command: write(10) Error Level: Retryable
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Requested Block: 3407136 Error Block: 3407136
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 750409750014
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] ASC: 0xc0 (<vendor unique code 0xc0>), ASCQ: 0x1, FRU: 0x0
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/SUNW,emlxs@1/fp@0,0/ssd@w50060e80102a0082,1 (ssd146):
Jul 31 02:00:24 SERVER001 Error for Command: write(10) Error Level: Retryable
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Requested Block: 1331088 Error Block: 1331088
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 750409680014
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Jul 31 02:00:24 SERVER001 scsi: [ID 107833 kern.notice] ASC: 0xc0 (<vendor unique code 0xc0>), ASCQ: 0x1, FRU: 0x0
Jul 31 02:04:07 SERVER001 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/SUNW,emlxs@1/fp@0,0/ssd@w50060e80102a00f2,3 (ssd148):
Jul 31 02:04:07 SERVER001 Error for Command: write(10) Error Level: Retryable
Jul 31 02:04:07 SERVER001 scsi: [ID 107833 kern.notice] Requested Block: 12554768 Error Block: 12554768
Jul 31 02:04:07 SERVER001 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 750409750012
Jul 31 02:04:07 SERVER001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Jul 31 02:04:07 SERVER001 scsi: [ID 107833 kern.notice] ASC: 0xc0 (<vendor unique code 0xc0>), ASCQ: 0x3, FRU: 0x0
Jul 31 02:11:05 SERVER001 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/SUNW,emlxs@1/fp@0,0/ssd@w50060e80102a0082,3 (ssd144):
Jul 31 02:11:05 SERVER001 SCSI transport failed: reason 'tran_err': retrying command
Jul 31 03:37:08 SERVER001 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/SUNW,emlxs@1/fp@0,0/ssd@w50060e80102a0082,5 (ssd165):
Jul 31 03:37:08 SERVER001 Error for Command: write(10) Error Level: Retryable
Jul 31 03:37:08 SERVER001 scsi: [ID 107833 kern.notice] Requested Block: 1132772880 Error Block: 1132772880
Jul 31 03:37:08 SERVER001 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 750409680023
Jul 31 03:37:08 SERVER001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Jul 31 03:37:08 SERVER001 scsi: [ID 107833 kern.notice] ASC: 0xc0 (<vendor unique code 0xc0>), ASCQ: 0x3, FRU: 0x0

Many Thanks
Regards
Graham
____________________
Graham Leggate -
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Sun Sep 21 23:31:18 2008
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:44:12 EST
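To put a number on what counts as "normal" at a given site, and to track a figure like the 1 or 2 warnings per day mentioned in the summary, it can help to tally the WARNING lines per day and per ssd instance. The following is a minimal sketch, not something from the thread: it assumes the default Solaris syslog destination /var/adm/messages and the message layout shown in the excerpt above (one `kern.warning] WARNING:` line per event, ending in the instance name such as `(ssd144):`). The function name `count_scsi_warnings` is hypothetical.

```shell
#!/bin/sh
# count_scsi_warnings (hypothetical helper, not from the original thread)
# Usage: count_scsi_warnings [logfile]   (defaults to /var/adm/messages)
# Prints one line per day and ssd instance: "<Mon> <Day> <ssdNNN> <count>"
count_scsi_warnings() {
    awk '
    # Each SCSI warning event starts with exactly one kern.warning line.
    index($0, "kern.warning] WARNING:") > 0 {
        # The instance name appears as "(ssdNNN)" on that line.
        if (match($0, /\(ssd[0-9]+\)/)) {
            inst = substr($0, RSTART + 1, RLENGTH - 2)
            # $1 and $2 are the syslog month and day fields.
            count[$1 " " $2 " " inst]++
        }
    }
    END {
        for (k in count) print k, count[k]
    }
    ' "${1:-/var/adm/messages}" | sort
}
```

Run against the excerpt above, this would report, for example, two warnings for ssd144 and two for ssd165 on Jul 31; the instance names can then be matched against the remote LUNs. The sort is purely lexical, so output spanning several months will not be in calendar order.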