Dear all,

The problem was finally fixed. In a desperate attempt to fix the fault as quickly as possible, I took several steps in parallel:

1. Installed the latest patch cluster (including the glm patch).
2. Replaced the LAN card hme0 with qfe0.
3. Found errors in the ARP caches: the ARP caches of the other servers held a MAC address for the problematic machine that differed from its real one. I cleared the ARP cache on all machines.

After these three steps, I found that the hang problem was gone. Since then the server has been working fine without any problem.

Thanks to all who responded:

Aaron Daniel Vega Villa
Check your firmware level on the disks and the system board; maybe there have been memory or CPU errors that your OBP is not handling properly. Try running at run level 2 or run level 1 so you can determine whether a service, program, or specific process is affecting the whole system. Hope this helps.

Cian O'Sullivan
Sounds like it could be an IP address conflict.

Ghassan Qanzu'a
It seems that your system has been hacked. Could you run the following two commands on your server:

    # ps -ef | wc -l
    # /usr/ucb/ps aux | wc -l

Do both commands give the same number? If not, then your system has definitely been hacked.

Ed Guenther
Well, in hindsight you should have built a new box from scratch and not touched this one, then swapped the new box in for this one. That way, if there were problems, switching back would be no trouble. I would say that your problem could be incomplete network connections, i.e. ping of death and the like. You need to work with your networking people and determine what connections are getting to the box. The connections could be at such a low level that your box may not even show them in netstat output.

My original post:

Dear Admins,

I am managing a Sun E-450 server running Solaris 8 with two processors and 512 MB RAM. Since yesterday evening I have been facing a strange problem: the server hangs suddenly. If we isolate the server from the network by pulling out the LAN cable, it does not hang.
But when the server is on the network, it hangs within just 15 minutes. Surprisingly, the load average, swap utilization, I/O wait state, top, etc. show normal values just seconds before it hangs. I am running Apache and qmail on it with effective RBL spam blocking. There are no signs of any intrusion. We are using a PIX firewall for security.

I have a standby server. I changed the IP address of that server and used it to replace the problematic one. The standby server shows the same behaviour.

The log files (syslog and messages) do not show any error messages except the following:

    SCSI: Warning: pci@1f,4000/scsi@3 (glm0)

or occasionally this error:

    SCSI bus reset

I get this error only on the actual server. The standby server does not give any error; it just hangs without any error message.

I applied the Sun recommended patch cluster in October 2003, and I am now downloading the latest patch cluster. This server has been running without any problem for the last 3 years or so, and there has been no recent change or upgrade. I wonder why this error is appearing. Can anyone guide me on this problem? We are an ISP and cannot afford such hangs, as this machine works as our RADIUS, mail, and web server. The load average of the machine is less than 2.0 at most, and typically it is around 0.5.

Please help me in solving this issue.

Regards,
Rizwan H. Sadiq

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Sun Nov 28 08:45:29 2004
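
The ARP inconsistency described in step 3 of the summary (other servers caching a different MAC address for the affected machine) is the kind of thing that can be checked mechanically. The following is a minimal sketch, not the procedure actually used in this thread. It assumes you have already gathered ARP entries from each server into lines of the form "observer ip mac" (for example, by trimming the output of `arp -a` on each host by hand or with a script of your own). The function name `find_mac_conflicts` and the input format are hypothetical, chosen only for illustration. It also assumes a POSIX awk; on Solaris 8 that would likely mean `nawk` or `/usr/xpg4/bin/awk` rather than the old `/usr/bin/awk`.

```shell
#!/bin/sh
# Hypothetical helper: reads "observer ip mac" triples on stdin and prints
# each IP address that was seen with more than one distinct MAC address.
# MAC addresses are lowercased and zero-padded first, because Solaris
# prints octets without leading zeros (8:0:20:...) while other systems
# print 08:00:20:...
find_mac_conflicts() {
    awk '
    # Normalize a MAC address to lowercase, two hex digits per octet.
    function norm(mac,    n, parts, i, out) {
        n = split(tolower(mac), parts, ":")
        out = ""
        for (i = 1; i <= n; i++) {
            if (length(parts[i]) == 1)
                parts[i] = "0" parts[i]
            out = out (i > 1 ? ":" : "") parts[i]
        }
        return out
    }
    NF >= 3 {
        ip = $2
        mac = norm($3)
        key = ip SUBSEP mac
        # Count each (ip, mac) pair only once, however many hosts report it.
        if (!(key in seen)) {
            seen[key] = 1
            count[ip]++
            if (ip in macs)
                macs[ip] = macs[ip] " " mac
            else
                macs[ip] = mac
        }
    }
    END {
        for (ip in count)
            if (count[ip] > 1)
                print ip ": " macs[ip]
    }' | sort
}
```

Piping the collected triples into `find_mac_conflicts` prints one line per IP address that maps to more than one MAC address, and prints nothing if every host agrees. An address listed there is a candidate for either a stale cache entry or an IP address conflict, which matches both the symptom reported in the summary and the suggestion made by one of the responders.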