Hello,

Thanks to everyone who replied. Well, I didn't find any trace of why the server crashed. The POR in the power log is only the "power on reset". As the console wasn't connected at the time of the crash, there's nothing there either. Also, savecore didn't work. I did get some good suggestions, though, that you might find useful.

Thanks to Bertrand, Jon, Mehran, Chuck, Brad & JV for the following replies:

Console, log & maintenance tool stuff:

----
POR: Power On Request. Check the front panel switch position; it must be in the lock position. Also use /opt/FJSVmadm/sbin/hrdconf -l and /opt/FJSVhwr/sbin/fjprtdiag -v. The XSCF shell may give more information (telnet to the SCF IP address on port 8010).
----
Check that your PC does not go into sleep mode. This could cause a break on the serial line and may reset the PrimePower. Putting the switch in the lock position makes the machine ignore the break.
----
Yes, that is indeed true, i.e., a PC console will issue a BREAK signal on resume. I've also seen assertions that it sends a BREAK over the serial line when it goes to sleep, though that doesn't seem possible. That seems like the most likely scenario, e.g., a BREAK-initiated reset won't trigger a core dump. If you have the space & time, you might want to test this: connect the PC to a non-critical system, set it to go to sleep in 60 seconds, and watch the machine. That way you can (a) prove the assertion, and (b) determine whether the BREAK is sent when the PC goes to sleep and/or when it wakes up. This is good information, please do summarize. I just got back from our DR site, where all machines are set to ignore the BREAK for this reason, and I needed to send a BREAK!!! :-)
----
Not familiar with Fujitsu hardware, but on Sun hardware, POR indicates a "power on reset": either the system crashed or was rebooted.
I was hoping to see FATAL there, which generally indicates a fatal hardware problem that happened so fast that the system couldn't log anything. If it's any consolation, it *does* output stuff to the console when that happens. You might want to consider hooking something up to capture the serial console output in case it happens again.
----

Crash dump stuff:

----
There might be something on the console, and there might be a dump; use isda (it's on SunSolve) to analyze it.
Good luck,
Mike Salehi
----
boot cdrom and run SUNWvts for a couple of days. If it doesn't crash, it's your OS image; if it does crash, it's your HW.
JV711
----
Unless you'd already turned on the crash dump facility, there's no evidence besides what was written to the logs on the filesystems, and those may have been clobbered by the fsck at reboot. Unless you're running VxFS, that is. To enable crash dumps, see http://slacksite.com/solaris/crashdump.html. In general, look at /etc/init.d/savecore. You can quickly verify whether crash dumps are enabled by running dumpadm(1M) with no arguments, which prints the current dump configuration.

Check for loose power cables, and see about replacing the power supply. If this machine has more than one power supply, e.g., for failover, this is a bad sign. Also check where this machine is connected to AC power. Even though other machines don't crash like this, this machine may not be connected to the same power source.
----

Power failure stuff:

----
Jarkko,
I have had 2 systems that had these symptoms, and the fix was a new power supply.
chuck

-----Original Message-----
From: sunmanagers-bounces@sunmanagers.org [mailto:sunmanagers-bounces@sunmanagers.org] On Behalf Of Jarkko Airaksinen
Sent: Wednesday, 7 February 2007 15:05
To: sunmanagers@sunmanagers.org
Subject: "frequent" crashes without a trace

Hello to all Gurus out there,

One of our Fujitsu-Siemens PP450's running sol8 just rebooted.
It didn't leave anything in the messages files: the last entry before the crash is just a normal ftp login message, and then 20 minutes later the normal boot messages start. This has happened twice before as well, the last time 105 days ago. I don't think we had a power outage, as there are other servers connected to the same power rails; more servers would at least have shown "psu failures", but there's nothing there.

In madmin, the power log has two entries at the time of the crash:

1. Feb 7 14:37:41 2007 CET Reset-Release [Unlock] Nothing (Detail=00,00,00,00)
2. Feb 7 14:37:35 2007 CET POR [Unlock]

How should I interpret those messages? Any ideas on how to start investigating what caused the server to crash out of the blue like that?

Thanks to everyone,
Jarkko
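The two power log entries quoted in the question are easier to read once they are parsed and put in time order. Below is a minimal Python sketch, assuming that madmin power log lines always follow the layout of those two entries (date and time, time zone label, event name, key switch position in brackets, optional detail text). That layout is inferred from just those two lines, and `parse_entry` and `summarize` are hypothetical helpers, not part of any Fujitsu tool. It shows that the POR came 6 seconds before the Reset-Release and that the key switch was in Unlock, whereas the replies recommend Lock so that a serial BREAK is ignored.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List

# Layout inferred (assumption) from the two entries in the question, e.g.
#   Feb 7 14:37:41 2007 CET Reset-Release [Unlock] Nothing (Detail=00,00,00,00)
ENTRY_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+"  # date and time
    r"(?P<tz>\S+)\s+"            # time zone label, e.g. CET
    r"(?P<event>\S+)\s+"         # event name, e.g. POR or Reset-Release
    r"\[(?P<switch>[^\]]+)\]"    # key switch position, e.g. Unlock
    r"(?:\s+(?P<detail>.*))?$"   # optional trailing detail text
)


@dataclass
class PowerLogEntry:
    when: datetime  # naive timestamp; the tz label is kept but not applied
    tz: str
    event: str
    switch: str
    detail: str


def parse_entry(line: str) -> PowerLogEntry:
    """Parse one power log line (hypothetical helper)."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised power log line: {line!r}")
    return PowerLogEntry(
        when=datetime.strptime(m["ts"], "%b %d %H:%M:%S %Y"),
        tz=m["tz"],
        event=m["event"],
        switch=m["switch"],
        detail=m["detail"] or "",
    )


def summarize(lines: Iterable[str]) -> List[str]:
    """Order entries by time, report gaps, and flag PORs seen in Unlock."""
    entries = sorted((parse_entry(line) for line in lines), key=lambda e: e.when)
    notes = []
    # Time between consecutive events, oldest first.
    for prev, cur in zip(entries, entries[1:]):
        gap = (cur.when - prev.when).total_seconds()
        notes.append(f"{prev.event} -> {cur.event}: {gap:.0f} s")
    # The replies recommend the Lock position so that a serial BREAK is ignored.
    for e in entries:
        if e.event == "POR" and e.switch.lower() == "unlock":
            notes.append(f"POR at {e.when:%H:%M:%S} with key switch in Unlock")
    return notes


if __name__ == "__main__":
    log = [
        "Feb 7 14:37:41 2007 CET Reset-Release [Unlock] Nothing (Detail=00,00,00,00)",
        "Feb 7 14:37:35 2007 CET POR [Unlock]",
    ]
    for note in summarize(log):
        print(note)
```

Run on the two entries from the question, in either order, it prints `POR -> Reset-Release: 6 s` followed by `POR at 14:37:35 with key switch in Unlock`. It says nothing about *what* triggered the reset. It only makes the sequence easier to line up against other timestamps, such as when a console PC went to sleep or woke up.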
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Thu Feb 8 07:47:54 2007
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:44:04 EST