SUMMARY: SAN - HDS - JNI HBA - SCSI tran_err

From: <ron.gulls_at_talk21.com> Date: Fri Nov 29 2002 - 10:58:58 EST

Many thanks for the replies. It appears that there may be many reasons for
these errors. All the replies are included, as all of them contain very
valuable information.

I think the bottom line is that SANs are still in their infancy and have a
long way to go before they reach maturity. The situation is complicated
because so many parties are involved: HBA vendors, storage vendors, disk
management software vendors, switch vendors and the computer manufacturers.

Care should be taken whenever work is done on the SAN, contrary to the glossy
pictures presented by the SAN vendors. If possible, keep separate mini SAN
islands for production/live, development and test. I know this defeats the
purpose of the SAN world, but we should recognise the limitations of the
various components that make up a SAN, and the consequences when things go
horribly wrong.

Best Regards,

Ron.

1. From Ken,

We don't have the budget for HDS, but we have experienced an identical problem
with JNI FCE-1063s, Brocades, VM, and Clariion JBODs.

The problem is caused by too many asynchronous I/Os going to a single disk and
overflowing the buffer in the disk. In our case, we set sd_max_throttle in
/etc/system to 8 (the default is 256 if there is no sd_max_throttle entry)
and the problem went away.

While researching our problem, I think I read somewhere that you should set
sd_max_throttle to 16 for HDS and JNI (though a friend of mine said Hitachi
told them to set it to 2).

Let me know if it helps,

2. From Johan

I'm having problems as well, but my symptoms are slightly different.

The solution is to download the latest JNI drivers, e.g. 2.9.11 for the FC64-1063.
Make sure your OBPs are patched.
Make sure your kernel is patched.
For Sol 2.6, add this to the /etc/system file:
set kobj_map_space_len=0x200000
The line above helps reconfiguration to complete properly!

Make sure you have this in /etc/system on all versions of Solaris:
* Hitachi Disk / JNI HBA settings
set sd:sd_io_time=0x3c
set sd:sd_max_throttle=8
* End of JNI HBA settings
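
As a quick sanity check, the settings above can be read back out of a copy of
/etc/system. The following is only a minimal Python sketch, not a Sun or JNI
tool. The function name parse_system_tunables is made up for illustration, it
only understands simple "set module:variable=value" lines with decimal or 0x
hexadecimal values, and it treats lines starting with "*" as comments. The
fallback of 256 for sd_max_throttle is the default quoted in Ken's reply, not
something the script discovers for itself.

```python
import re

# Matches lines such as "set sd:sd_io_time=0x3c" (the module prefix is optional).
SET_LINE = re.compile(
    r"^set\s+(?:(\w+):)?(\w+)\s*=\s*(0[xX][0-9a-fA-F]+|\d+)\s*$"
)

def parse_system_tunables(text):
    """Return {(module, variable): int_value} for simple 'set' lines."""
    tunables = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):  # '*' starts a comment line
            continue
        match = SET_LINE.match(line)
        if match:
            module, variable, value = match.groups()
            # Assumption: only hexadecimal (0x...) and decimal values are handled.
            if value.lower().startswith("0x"):
                tunables[(module, variable)] = int(value, 16)
            else:
                tunables[(module, variable)] = int(value)
    return tunables

sample = """\
* Hitachi Disk / JNI HBA settings
set sd:sd_io_time=0x3c
set sd:sd_max_throttle=8
* End of JNI HBA settings
"""

settings = parse_system_tunables(sample)
io_time = settings.get(("sd", "sd_io_time"))
# 256 is the default quoted in Ken's reply, used when no entry is present.
throttle = settings.get(("sd", "sd_max_throttle"), 256)
print("sd_io_time =", io_time)
print("sd_max_throttle =", throttle)
```

Printing the values in decimal makes it easier to see that 0x3c is simply 60,
and that the throttle really is 8 rather than the default.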

In your sd.conf, put only the entries you need.

ZONE ZONE ZONE.

Put every host in its own zone if possible. If there are two HBAs in a host,
give it two zones. This has given me stability on my SAN.
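
To make the zoning rule concrete, here is a small Python sketch that plans one
zone per HBA. It is an illustration only: the host names, the WWPNs and the
function name single_hba_zones are all hypothetical, and the assumption that
each zone also contains the storage ports the host needs is mine, not
something stated above. It produces a plan to review, not switch commands.

```python
def single_hba_zones(hosts, storage_ports):
    """Build one zone per HBA: that HBA's WWPN plus the storage ports."""
    zones = {}
    for host, hba_wwpns in hosts.items():
        for index, wwpn in enumerate(hba_wwpns):
            zone_name = f"{host}_hba{index}"
            # Exactly one host HBA per zone, followed by the storage ports.
            zones[zone_name] = [wwpn] + list(storage_ports)
    return zones

# Hypothetical hosts and WWPNs, for illustration only.
hosts = {
    "dbhost": ["10:00:00:00:00:00:00:01", "10:00:00:00:00:00:00:02"],
    "apphost": ["10:00:00:00:00:00:00:03"],
}
storage_ports = ["50:00:00:00:00:00:00:aa"]

zones = single_hba_zones(hosts, storage_ports)
for name, members in sorted(zones.items()):
    print(name, "->", ", ".join(members))
```

A host with two HBAs ends up with two zones, so a rezoning or a fault on one
path does not touch the zone of the other HBA.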

And there is good news: Sun and JNI have committed to start working together
to resolve our woes.

3. From John

We have the same HBAs in our servers, but don't get the errors when rezoning.
We're running version 2.5.9 of the JNI drivers with Brocade Silkworm 2800s and
Compaq storage. Absolutely no problems. We do see the pause, but we don't lose
the LUN information.

Are you running any kind of multipathing software?

4. From Mohamed

Since many servers are losing LUNs, I would focus on the switches and the
storage. We are using McDATA switches here with an IBM ESS. I have had a
problem showing tran_err, and it was caused by the storage HBA card.

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers