Thanks for the help everyone...

The machine did end up restarting after a /usr/sbin/reboot (thanks Simon Convey). I mistakenly thought a reboot == shutdown -i 6.

As for the problem, Rainer Heilke said he ran into a similar problem a while ago and increased shmsys:shminfo_shmmax in /etc/system (an ipcs -a before the reboot did show a lot of shared memory, but that is not unusual in our dev environment).

BTW the machine is a Dual 450 220R, running a pitifully old Solaris 8 kernel (there are reasons why it's still at 108528-03). There were no messages in the logs.

Thanks again everyone. Below are the full responses.

shawn

----------
From: Jonathan A. Zdziarski [mailto:jonathan@networkdweebs.com]
Subject: RE: Invincible

Do you have enough CPU cycles to do an lsof and find out what resources ps might be spinning on? What if you kill the ps (somehow) and truss it or sotruss it? Do you still have a /proc?

-------------
From: Convey, Simon [mailto:simon.convey@csfb.com]
Subject: RE: Invincible

/usr/sbin/reboot

or if that doesn't work,

/usr/sbin/uadmin 2 0

......
Simon.

------------
From: Yura Pismerov [mailto:ypismerov@tucows.com]
Subject: Re: Invincible

It might be that some system limits (number of processes, open files, etc.) are not tuned properly, so something is eating up all the resources and the rest of the users/processes are queued up waiting for them. Check both system-wide and user limits once you bring the box back online.

-------------
From: Heilke, Rainer [mailto:Rainer.Heilke@atcoitek.com]
Subject: RE: Invincible

We had similar problems on one of our servers. We found one issue, and Sun suggested a second fix.

The first that we found was that our set shmsys:shminfo_shmmax= setting in /etc/system was too low. Depending upon what the server is used for, you may want to bump this up to 60% of the physical RAM (remember to adjust it when you add more RAM).

There is also a bug on Solaris 2.6 that Sun alerted us to.
If the system is fairly fully configured, add a kobj setting into the /etc/system file as well. The default was 100000, and we bumped it to 200000. The line looks like:

    set kobj_map_space_len=0x200000

This is as per SRDB 20267. This was supposedly fixed in Sol7 and 8. You didn't say which OS you were running, so I'm throwing it in as an "in case". The value must always be on an even boundary. Start with 200000, go to 300000, etc.

Rainer Heilke

-------------
From: Daniel Zhuang [mailto:daniel.zhuang@amdocs.com]
Subject: RE: Invincible

Did you try to exit the telnet session from which you issued the "shutdown ..." command? In some cases, it then starts to reboot. As far as I know, Oracle internal processes can prevent the system from rebooting. Kill them if there are any.

-------------
From: Gaziz Nugmanov [mailto:sunman@lists.gaziz.ca]
Subject: Re: Invincible

1/ check your PATH
2/ check/verify all the packages installed
3/ have you been hacked? Sure?
4/ reinstall OS

-------------
-----Original Message-----
From: Shawn Tagseth [mailto:Shawn.Tagseth@crystaldecisions.com]
Sent: Thursday, November 22, 2001 10:19 AM
To: Sunmanagers (E-mail)
Subject: Invincible

I have a machine that is acting very strangely. I can log into the machine and run various processes (e.g. vmstat, uptime, etc.), but as soon as I try to run commands like ps, pkill or top they hang and do not come back even with a ^C. Vmstat is showing 100% sys usage. It's a machine used by our developers and seemed to go into this state while running Sun's WorkShop debugger.

I call the machine invincible because I thought "I could work all day to find out what is causing this (without ps etc.) or I can restart it". Deadlines being what they are (and because I figured I can still log in as of this moment) we decided to restart. A shutdown -g 0 -i 6 is not bringing the system down.

Shawn K. Tagseth

PS it's been about 20 minutes since the shutdown command and the machine is still running :(

Received on Thu Nov 22 19:24:46 2001
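
Rainer's rule of thumb in the thread (shmmax at roughly 60% of physical RAM) can be turned into an /etc/system line with a small helper. This is only a minimal Python sketch and is not from the thread: the function name shmmax_line and the 60 percent default are illustrative assumptions, and the right value depends on what the server is used for.

```python
def shmmax_line(physical_ram_bytes: int, percent: int = 60) -> str:
    """Return an /etc/system line setting shmmax to a percentage of RAM.

    Hypothetical helper: the name and the 60% default are assumptions
    based on the rule of thumb in the thread, not a Sun recommendation.
    """
    if physical_ram_bytes <= 0:
        raise ValueError("physical_ram_bytes must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    # Integer arithmetic avoids floating-point rounding on large values.
    shmmax = physical_ram_bytes * percent // 100
    return f"set shmsys:shminfo_shmmax={shmmax}"


if __name__ == "__main__":
    # Example input: a machine with 2 GiB of RAM (an assumed figure).
    print(shmmax_line(2 * 1024**3))
```

Settings in /etc/system are read at boot, so a new value only takes effect after the next restart, and, as Rainer notes, it should be recomputed when RAM is added.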