SUMMARY mutex_error

From: David Knight <knight_at_atmos.albany.edu>
Date: Fri Jul 06 2001 - 09:27:05 EDT

Well, the problem is not resolved, but I'll summarize anyway.

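This is a panic related to mutual exclusion locks (mutexes), which are
common locking devices used in kernel code (i.e. the kernel and any
drivers/kernel modules loaded into it). In our panic message the mutex
owner is the same thread that panicked (40037e60), which suggests the
thread tried to acquire a lock it already held.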

Several people suggested looking at the stack traceback from a crash dump:

cd /var/crash/`hostname`
adb -k ./unix.x ./vmcore.x
 - where x is the number of the dump created after the panic
$c
 - a stack traceback should appear
control-d to exit.
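
For anyone with several dumps in the crash directory, here is a rough sketch
(not something from the replies) of picking the newest unix.x/vmcore.x pair
before running adb. The function name latest_dump_number is made up for
illustration, and the sketch assumes a POSIX shell such as ksh or
/usr/xpg4/bin/sh rather than the legacy Bourne /bin/sh.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not a Solaris tool).
# Prints the highest number N for which both unix.N and vmcore.N exist
# in the directory given as $1. Returns 1 if no complete pair is found.
latest_dump_number() {
    dir=$1
    best=""
    for f in "$dir"/unix.*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] || continue
        # Take the text after the last dot as the dump number.
        n=${f##*.}
        # Ignore suffixes that are not purely numeric.
        case $n in
            ''|*[!0-9]*) continue ;;
        esac
        # Only count dumps that also have the matching vmcore file.
        [ -f "$dir/vmcore.$n" ] || continue
        if [ -z "$best" ] || [ "$n" -gt "$best" ]; then
            best=$n
        fi
    done
    [ -n "$best" ] || return 1
    echo "$best"
}

# Example use (commented out so that sourcing this file has no side effects):
#   n=$(latest_dump_number "/var/crash/$(hostname)") &&
#       echo "adb -k ./unix.$n ./vmcore.$n"
```

The comparison is numeric, so dump 10 sorts after dump 2, which a plain
"ls | tail -1" would get wrong.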

Unfortunately in our case the machine hung hard so no crash dumps
were generated. The traceback on the console was apparently also
incomplete. Bad luck...

Justin.Stringfellow@Sun.COM suggested putting:
set snooping=1
in your /etc/system. The file is read at boot, so a reboot is needed
before the setting takes effect.
This enables a timer ("the deadman timer") in the kernel which, if you have a 
hard hung kernel, _may_ allow the kernel to drop itself out to an OK prompt, 
where you can then type 'sync' and get a crash dump. No guarantees though.
We'll try this and see what happens. He also suggested trying to disconnect
the keyboard to see if that might get me an OK prompt.
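
As a small sanity check before waiting for the next hang, something like the
following could confirm that the line really is active in /etc/system and not
commented out (comment lines in that file start with "*"). This is only a
sketch: the function name snooping_enabled is invented here, it assumes a
POSIX grep, and it only inspects the file, not the running kernel.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption).
# Succeeds if the file given as $1 contains an uncommented line of the
# form "set snooping=1". Whitespace around "=" is tolerated as a
# convenience; lines starting with "*" are comments and do not match
# because the pattern is anchored at the start of the line.
snooping_enabled() {
    grep '^[[:space:]]*set[[:space:]][[:space:]]*snooping[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1" >/dev/null 2>&1
}

# Example use (commented out so that sourcing this file has no side effects):
#   if snooping_enabled /etc/system; then
#       echo "deadman timer requested in /etc/system"
#   else
#       echo "set snooping=1 not found in /etc/system"
#   fi
```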

Some people suggested this could be related to a CPU hardware
problem, or perhaps a software problem. We still have some
detective work to do; so far we have been unable to recreate the
problem on demand...

Thanks for your help and suggestions
David

> 
> Hi,
>     We have an Ultra 10 running Solaris 8 with the latest patches.
> It occasionally panics with the error:
> 
> 
> panic[cpu0]/thread=40037e60 recursive mutex_error Lp=70357f40 owner=40037e60
> 40037a78 unix:mutex_vector_error+208 (0, 0, 20, 1040d45c, 104169e8, 70357f40)
> 
> Only one of our Ultra 10s seems to be affected by this.
> 
> Sunsolve search turned up nothing useful.
> 
> Any ideas what the problem might be and how to fix or
> diagnose it?
> 
> Thanks
> David
>