I got a pretty good number of responses (and miraculously no out-of-office messages) to this one. Most of the responders suggested trying tools I had mentioned in my posts. A couple provided some great insight into why a kernel can be slammed, but alas Solaris 9 really doesn't have the tools to dig into the kernel. At least not for one who is terrified of adb (panicking a running production system is a real CLM).

I was lucky enough to move some of the load to a couple of Solaris 10 systems. To date all systems are healthy (knock on wood). We're guessing that spreading the load across more servers has given us a reprieve from our problem.

For a good set of DTrace tools see http://users.tpg.com.au/adsln4yb/dtrace.html. We've started looking at these and we are firmly convinced that DTrace really rocks.

Great thanks to Rich Kulawiec and Darren Dunham for providing kind and insightful information.

Original question:

1 CPU system running Resonant and apache
Solaris 9 (with opportunity to go to 10)
System goes nuts with 100% cpu, very high kernel usage (99% vs. user at 1%) and very high load averages (> 30) (uptime, top)
Little or no I/O load (top, iostat)
Lots of memory, lots of swap free (top, vmstat)
No significant mutex locks (mpstat)

The question: Is there a tool for Solaris 9 that will tell me what process is using so much kernel code? Or, if I have the opportunity to go to Solaris 10, can someone suggest a dtrace script that will show me the same thing?

Follow-on information

To add more information and restate the issue: I'm looking for something to identify the specific culprit that is using the excessive kernel code (e.g. the 'sy' column from vmstat) while the user code (e.g. the 'us' column from vmstat) is very low. There is LOTS of free memory (seen in vmstat and top) so memtool isn't much use.

I have run prstat, vmstat, top, mpstat, and lsof (though this doesn't show process consumption). I'm still digging through lockstat. Running prstat (e.g. prstat -va) shows interesting information, but the numbers don't add up. I don't see any one process really eating system mode (SYS column), and the CPU column (prstat -t) does not total to the figures from top or vmstat.

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Thu Nov 17 15:51:17 2005
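As a rough illustration of the prstat check described in the follow-on information above, here is a minimal Python sketch that ranks processes by the SYS column and totals it, so the sum can be compared against the 'sy' figure from vmstat. It assumes the text comes from a single report in the microstate layout printed by prstat -m (a header line starting with PID and including a SYS column). The function names rank_by_sys and total_sys are made up for this example and are not part of any Solaris tool.

```python
def rank_by_sys(prstat_text, top=5):
    """Return (process, pid, sys_pct) tuples sorted by SYS, highest first.

    Assumes prstat_text is one report in `prstat -m` layout: a header
    line beginning with PID, then one row per process.
    """
    rows = []
    columns = None
    for line in prstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":
            # Header line: remember the column names for the rows below.
            columns = fields
            continue
        # Skip anything before the header and non-process lines
        # such as the trailing "Total: ..." summary.
        if columns is None or not fields[0].isdigit():
            continue
        if len(fields) < len(columns):
            continue
        record = dict(zip(columns, fields))
        try:
            sys_pct = float(record["SYS"])
        except (KeyError, ValueError):
            # No SYS column or a non-numeric value: ignore the row.
            continue
        rows.append((record[columns[-1]], int(record["PID"]), sys_pct))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:top]


def total_sys(prstat_text):
    """Sum the SYS column over every process row in the report."""
    return sum(r[2] for r in rank_by_sys(prstat_text, top=None))
```

Passing the captured text of one report to rank_by_sys lists the heaviest system-mode consumers first, and total_sys gives a single figure to set beside vmstat. If that total falls well short of what vmstat reports, the gap itself is the part that prstat is not attributing to any process.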