Sun Managers,

I have a Sun Blade 100 that, after an OBP firmware update to version 4.17.1 and installation of Solaris 10 01/06, kept dropping to the ok prompt with a 'RED State Exception' after exactly 15 minutes of system inactivity. This appears to be a problem in Solaris 10 01/06 (installed from Sun DVD p/n 708-0118-10), not in OBP firmware version 4.17.1.

The solution is to run this command under Solaris 10:

# svcadm disable system/power:default

There appears to be an incompatibility between Solaris 10 01/06 power management and the mainboard in this Sun Blade 100, Sun part number 375-0096.

Thanks to Filo Smith, who wrote:

> Gotta be power management then.

Filo's email caused me to redouble my efforts to find a solution that involved power management.

Simply killing the powerd process did not work. Renaming powerd so the system could not find it at reboot did not work:

mv /usr/lib/power/powerd /usr/lib/power/powerd-DISABLED

Downgrading the OBP firmware from 4.17.1 back to the original version 4.0.45 (followed by the set-defaults command) did not work. Reinstalling Solaris 9 09/04 did not work. Swapping components with another Sun Blade 100 revealed that the 'RED State Exception' problem was resident on the mainboard, but it stubbornly refused to clear itself.

It should be noted that the Sun Blade 100 has two batteries on the mainboard: one inside the old-style (large) IDPROM chip, and a second lithium CR2032 battery. I pulled the IDPROM chip off the mainboard, pulled the CR2032 battery, and waited some time, hoping the errant power management settings would be forgotten by the mainboard, but this did not work either.

In the end, I upgraded the OBP firmware back to version 4.17.1, installed Solaris 10 01/06 again, then just ran the command:

# svcadm disable system/power:default

A simple solution, but not the course of action I took the first time the problem appeared.
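The fix comes down to one SMF command. As a minimal sketch, assuming Solaris 10 with SMF and a root shell, the two helpers below (pm_state and pm_disable are hypothetical names, not part of the original post) read the current state of system/power:default with svcs and disable it only if it is not already disabled. A plain svcadm disable without -t is persistent across reboots, which suits a headless server. Do not re-enable the service on an affected board just to test it; as described further on, that locks the system immediately.

```shell
#!/bin/sh
# Minimal sketch (assumes Solaris 10 with SMF, run as root): disable the
# power management service identified in this thread as the trigger.
# pm_state and pm_disable are hypothetical helper names.
# Written for the Solaris Bourne shell, hence backticks instead of $(...).

FMRI=system/power:default

# Print the current SMF state of the power service (e.g. "online").
# -H suppresses the header line; -o state selects only the STATE column.
pm_state() {
    svcs -H -o state "$FMRI"
}

# Disable the service persistently (no -t flag), unless already disabled.
pm_disable() {
    state=`pm_state` || return 1
    if [ "$state" = "disabled" ]; then
        echo "$FMRI already disabled"
        return 0
    fi
    svcadm disable "$FMRI"
}
```

After sourcing these functions into a root shell, running pm_disable and then svcs system/power should show the service as disabled.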
My years of experience told me to put the machine back in its original state when the problem arose (old OBP version, old Solaris version), but this time that was not the correct action to take. It appears Solaris 10 01/06 broke something, and only by using Solaris 10 could I fix it.

Since this machine is used as a server with no display, keyboard, or mouse, power management needs to be disabled anyway, so having it disabled is not an issue for this machine.

While most 'RED State Exception' errors are solved by finding and replacing defective hardware, this was not the case this time.

Thanks very much to all the Sun Managers who took time to email me their ideas and experiences to help me solve this problem.

For anyone seeing the same problem in the future, I'll throw in a bit more info to help the search engines find this email. On ttya, the system drops to the ok prompt with these messages:

RED State Exception
TL=0000.0000.0000.0005 TT=0000.0000.0000.0064 TPC=ffff.ffff.d6ca.2bfc TnPC=ffff.ffff.9100.726c TSTATE=0000.0099.5800.1505
TL=0000.0000.0000.0004 TT=0000.0000.0000.0010 TPC=0000.0000.0100.87fc TnPC=0000.0000.0100.8800 TSTATE=0000.0099.5804.1405
TL=0000.0000.0000.0003 TT=0000.0000.0000.0064 TPC=ffff.ffff.d6ca.2bfc TnPC=ffff.ffff.9100.726c TSTATE=0000.0044.5800.1505
TL=0000.0000.0000.0002 TT=0000.0000.0000.0010 TPC=0000.0000.0100.0688 TnPC=0000.0000.0100.068c TSTATE=0000.0044.5800.1505
TL=0000.0000.0000.0001 TT=0000.0000.0000.0034 TPC=0000.0000.0104.0ad4 TnPC=0000.0000.0104.0ad8 TSTATE=0000.0044.0000.1605
ERROR: error-reset-cleanup: Externally Initiated Reset has occurred.
ERROR: Last Trap: Externally Initiated Reset
ok

If input/output is changed from ttya to keyboard/screen, these messages are printed on the screen instead:

ok FATAL: no exception frames available, forcing misaligned trap
ok FATAL: no exception frames available, NESTED ERRORs, going interactive
(repeats several dozen times, then):
Rejecting alloc-mem!Rejecting alloc-mem!...(repeats)

Under Solaris 10, if power management has been disabled via:

# svcadm disable system/power:default

and the system has been up more than 15 minutes, then running this command locks the system immediately:

# svcadm enable system/power:default

One would think 15 minutes of system inactivity would need to elapse before the system crashed after power management was re-enabled, but whatever power management timer is involved (ACPI?) has apparently already counted down to zero, so the system reacts with hair-trigger speed (immediately).

Scott Mickey

-------- Original Message --------
Subject: Sun Blade 100 - strange behavior after firmware update.
Date: Fri, 18 Aug 2006 12:52:37 -0600
From: Scott Mickey <mickey@denver.net>
To: sunmanagers@sunmanagers.org

Sun Managers,

I updated the firmware on a Sun Blade 100, and now after exactly 15 minutes with the system idle, it drops to the ok prompt with these messages:

> RED State Exception
> ERROR: error-reset-cleanup: Externally Initiated Reset has occurred.
> ERROR: Last Trap: Externally Initiated Reset

If the system is booted in single-user mode, or if it is kept busy, this never happens; the system stays up indefinitely. Solaris 10 01/06 and Solaris 9 09/04 both install without error (as the machine is kept busy). However, after OS installation is complete and the machine goes idle, 15 minutes later the 'RED State Exception' happens and it drops to the ok prompt.

Background info: This machine was very reliable and trouble-free with the original OBP firmware, version 4.0.45.
Ran Solaris 9, headless (no USB keyboard or mouse, no monitor), with 2x 80GB IDE disks, primarily as a JumpStart and Samba server. Idle nights and weekends, and sometimes extremely busy during work days. Never a crash, no errors, no problems. A good little machine.

Upgraded to OBP firmware 4.17.1 using Sun patch 119235-01, dated Apr/29/2005. Installed Solaris 10 from DVD without error, but then the 'RED State Exception' happened. Downgraded the OBP firmware back to 4.0.45 using patch 111179-01 and reinstalled Solaris 9, but the 'RED State Exception' problem remained. Again, it happens only after 15 minutes of system inactivity at run level 3 or run level 2.

Using parts from another Sun Blade 100, I swapped the memory, then the CPU, then the IDPROM chip, and then the power supply. The problem remained. I put the mainboard (Sun p/n 375-0096) into another Sun Blade 100 chassis (this one had just one 10 GB IDE drive) and did a Solaris 9 install. The problem remained.

The problem is on the mainboard, but it is NOT random. I can tell within 30 seconds when the 'RED State Exception' will occur by running this script in an ssh window immediately after boot:

$ cat show_uptime
#!/bin/sh -
while :
do
        uptime
        sleep 60
done

Here is the output:

$ ./show_uptime
4:18pm up 1 min(s), 1 user, load average: 0.35, 0.15, 0.06
4:19pm up 2 min(s), 1 user, load average: 0.14, 0.13, 0.05
4:20pm up 3 min(s), 1 user, load average: 0.05, 0.11, 0.05
4:21pm up 4 min(s), 1 user, load average: 0.02, 0.09, 0.05
4:22pm up 5 min(s), 1 user, load average: 0.01, 0.07, 0.05
4:23pm up 6 min(s), 1 user, load average: 0.00, 0.06, 0.04
4:24pm up 7 min(s), 1 user, load average: 0.00, 0.05, 0.04
4:25pm up 8 min(s), 1 user, load average: 0.00, 0.04, 0.04
4:26pm up 9 min(s), 1 user, load average: 0.00, 0.03, 0.04
4:27pm up 10 min(s), 1 user, load average: 0.00, 0.03, 0.04
4:28pm up 11 min(s), 1 user, load average: 0.00, 0.02, 0.03
4:29pm up 12 min(s), 1 user, load average: 0.00, 0.02, 0.03
4:30pm up 13 min(s), 1 user, load average: 0.00, 0.02, 0.03
4:31pm up 14 min(s), 1 user, load average: 0.00, 0.01, 0.03
4:32pm up 15 min(s), 1 user, load average: 0.00, 0.01, 0.03
(Then RED State Exception and drops to the ok prompt.)

In single-user mode, the system runs fine:

# uptime
6:34pm up 17:42, 0 users, load average: 0.00, 0.00, 0.01

Or if I open a second ssh window and run this script, it runs fine:

$ cat find_usr
#!/bin/sh -
while :
do
        find /usr -print
        sleep 5
done

I need to be honest and admit that neither Sun Blade 100 has Sun-branded memory or Sun-branded hard disks. However, this isn't an enterprise-class machine by any stretch or measure, so that should not be a factor. The memory is good memory, as are the disks.

I guess I could do another OBP firmware upgrade on another Sun Blade 100 to see if this is a repeatable error, but then I might have two useless Sun Blade 100s.

Doing an OBP firmware upgrade and OS reinstall is a very common procedure. I'm sure someone out there must have seen this problem also. I know this machine is a FRU, but I would like to get it working again rather than throw it in the recycle bin.

I look forward to your emails, with accounts of successful and unsuccessful Sun Blade 100 OBP firmware updates. Thanks!

Oh, and why did I do an OBP firmware update in the first place? I wanted to try out the OBP 'wanboot' feature, available only in OBP versions 4.17 and above.

Also, if someone at Sun Microsystems could please forward this to the person or persons in charge of OBP firmware for the Sun Blade 100/150 series, I would really appreciate it.

Scott Mickey

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Tue Aug 22 12:13:32 2006
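The show_uptime script in the original message only prints to the ssh window, so the last sample is lost when the machine drops to the ok prompt. Here is a minimal sketch of a logging variant, assuming a Bourne-compatible /bin/sh: log_uptime (a hypothetical helper, not from the post) appends a timestamped uptime line to a log file and runs sync after each sample, which makes it likely that the final entry before the reset is still on disk after the next boot (on some systems sync only schedules the writes). COUNT and INTERVAL are illustrative parameters; with the defaults it behaves like show_uptime, sampling every 60 seconds forever.

```shell
#!/bin/sh -
# Minimal sketch: a logging variant of show_uptime. log_uptime is a
# hypothetical helper; LOGFILE, COUNT and INTERVAL are illustrative.
# Usage: log_uptime LOGFILE [COUNT] [INTERVAL]
#   COUNT    samples to take; 0 (the default) loops forever like show_uptime
#   INTERVAL seconds between samples (default 60, as in show_uptime)
# Uses backticks and expr so it also runs under the Solaris Bourne shell.

log_uptime() {
    logfile=$1
    count=${2:-0}
    interval=${3:-60}
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]
    do
        # One line per sample: wall-clock timestamp, then uptime's output.
        echo "`date '+%Y-%m-%d %H:%M:%S'` `uptime`" >> "$logfile"
        # Push buffered data toward disk so the sample can survive a reset.
        sync
        n=`expr $n + 1`
        sleep "$interval"
    done
}
```

For example, starting log_uptime /var/tmp/uptime.log in the background right after boot, then running tail -1 /var/tmp/uptime.log after the next boot, would show the last sample taken before the reset. /var/tmp is used rather than /tmp because /tmp on Solaris is swap-backed and does not survive a reboot.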
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:44:00 EST