SUMMARY: vxconfigd hanging system on boot

From: <colin_at_ccsisupport.com>
Date: Mon Nov 06 2006 - 18:11:47 EST
Well, the problem is fixed. It turned out to be a wedged SAN switch. I had
spent about seven hours casually troubleshooting the problem and tracked
it down to vxconfigd hanging on the second controller, which is indicative
of a bad path. Then, after our power outage, we started to bring up
everything else in the data centre and found that every other machine
attached to the same SAN switch got stuck at the same point.

Rebooting the switch fixed everything.

Thanks all,
Colin

> Hey all. Bit of a grey area here between Veritas and Solaris, and probably
> a bit off-topic, but not too far.
>
> We were changing some root disk mirroring (all under SDS) on a V880 running
> Solaris 8, and managed to hang the box. Forced a break and booted, and now
> it hangs during the /etc/rcS.d/S25vxvm-sysboot script, right after it
> displays "VxVM starting in boot mode..." and then starts "vxconfigd -m boot".
>
> I can disable the Veritas startup scripts and the machine comes up fine,
> but of course without Veritas, which means we don't have access to our
> Hitachi disk.
>
> I can run cfgadm -al -o show_FCP_dev and all paths to LUNs show up properly,
> with two paths to each. However, when I run the Veritas sysboot script, let
> it hang, and then CTRL-C it, I see that the second paths to the Hitachi LUNs
> have all changed to status=failing. When I turn on debugging in the VxVM
> script, it tells me that it hangs while opening the path to that second
> controller.
>
> The tail of the debug output is:
>
> 11/03 21:35:47: DEBUG: Controller 0: Open device /dev/rdsk/c0t6d0s0
> 11/03 21:35:47: DEBUG: Controller 1: Open device /dev/rdsk/c1t3d0s0
> 11/03 21:35:47: DEBUG: Controller 2: Open device /dev/rdsk/c2t3d0s0
> 11/03 21:35:47: DEBUG: Controller 3: Open device /dev/rdsk/c3t500060E8027A8202d1s0
> 11/03 21:35:47: DEBUG: Controller 4: Open device /dev/rdsk/c4t500060E8027A8212d1s0
>
> And then it hangs.
>
> Has anyone seen this before? We're partway through a 36-hour data centre
> outage, and when it's done, this machine really needs to come back up.
>
> Don't know what other information to provide, other than that we're not
> using MPxIO on this system (vxdmp takes care of it all).
>
> Thanks,
> Colin
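
For anyone chasing a similar hang: the last "Open device" entry in the
vxconfigd debug tail is the likeliest candidate for the path that never
returned. Below is a rough Python sketch, not anything shipped with VxVM,
that pulls that entry out of a saved copy of the debug output. The function
name last_open is made up for illustration, and the line format is assumed
to match the tail quoted above. It tolerates "> " quote prefixes and device
paths that a mailer wrapped onto their own line.

```python
import re

# Assumed entry format, taken from the debug tail quoted in this thread:
#   11/03 21:35:47: DEBUG: Controller 4: Open device /dev/rdsk/c4t...d1s0
ENTRY_RE = re.compile(r"DEBUG: Controller (\d+): Open device\s+(.+)$")


def last_open(log_text):
    """Return (controller, device) for the last 'Open device' entry, or None.

    last_open is a hypothetical helper name, not part of any Veritas tool.
    """
    entries = []
    for raw in log_text.splitlines():
        # Drop mail quote markers and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        if not line:
            continue
        if line.startswith("/dev/") and entries:
            # A device path wrapped onto its own line belongs to the
            # previous entry, so glue it back on.
            entries[-1] += " " + line
        else:
            entries.append(line)

    # Walk backwards so the first match is the most recent open attempt.
    for entry in reversed(entries):
        match = ENTRY_RE.search(entry)
        if match:
            controller = int(match.group(1))
            # Remove any whitespace that line wrapping left inside the path.
            device = "".join(match.group(2).split())
            return controller, device
    return None
```

Feed it the text of the saved debug output; the controller number it returns
can then be compared against the paths that cfgadm reports as failing.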
> _______________________________________________
> sunmanagers mailing list
> sunmanagers@sunmanagers.org
> http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Mon Nov 6 18:18:41 2006