SUMMARY: Help Needed, server down - supply rail 4 FATAL FAULT: failed -shutdown req'd

From: Roe, Patrick <patrick.roe_at_thus.net>
Date: Sat Jul 01 2006 - 07:05:28 EDT
Hi there.

Got to the bottom of it and all responses were good.

Replaced the power supply, which got the server a bit healthier. The boot
disk was bad, so I relied on the mirror.
The mirror wasn't as healthy as I'd hoped, so I moved it into a disk
cage I had and fsck'd the relevant slices. I then altered the vfstab etc.
and was able to boot off this disk.
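
For anyone taking the same route, here is a minimal sketch of the vfstab
part of that fix. It is an illustration only: the post does not say which
mirroring software or device names were involved, so the metadevice and
slice paths below are hypothetical, and repoint_vfstab is a made-up helper
name. It only swaps the "device to mount" and "device to fsck" columns and
leaves everything else alone.

```python
def repoint_vfstab(text, mapping):
    """Return vfstab text with device paths swapped according to mapping.

    Only the first two fields (device to mount, device to fsck) are
    touched; comment lines and blank lines pass through unchanged.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        fields = line.split()
        for i in (0, 1):
            if i < len(fields):
                # Unknown devices (or "-") are left exactly as they were.
                fields[i] = mapping.get(fields[i], fields[i])
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical device names, not taken from the original post.
    sample = (
        "#device\tdevice\tmount\tFS\tfsck\tmount\tmount\n"
        "/dev/md/dsk/d0\t/dev/md/rdsk/d0\t/\tufs\t1\tno\t-\n"
        "/proc\t-\t/proc\tproc\t-\tno\t-\n"
    )
    mapping = {
        "/dev/md/dsk/d0": "/dev/dsk/c0t1d0s0",
        "/dev/md/rdsk/d0": "/dev/rdsk/c0t1d0s0",
    }
    print(repoint_vfstab(sample, mapping), end="")
```

Work on a copy of the file on the mounted slice rather than the live one,
and check the result by eye before booting from it. Depending on how the
mirror was set up, other files may need changing too (the "etc" above).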

Back up and running.

Thanks again
Patrick

-----Original Message-----
From: sunmanagers-bounces@sunmanagers.org
[mailto:sunmanagers-bounces@sunmanagers.org] On Behalf Of Roe, Patrick
Sent: 28 June 2006 09:16
To: sunmanagers@sunmanagers.org
Subject: Help Needed, server down - supply rail 4 FATAL FAULT: failed
-shutdown req'd

Hi,
I've tried to post this to the mailing list before but it hasn't appeared.
Can you let me know how to get it on the list, or even forward it on?
I wonder if anyone can help me with a major server problem: at present
the server does not boot up properly at all.
I had to reboot it a few times to get any info from the LOM; it is all below.


lom>showlogs
Eventlog:
   +29d+0h33m49s supply rail 4 FAULT: failed
   +29d+0h44m34s supply rail 4 FATAL FAULT: failed - shutdown req'd
   +29d+0h44m34s Fault LED 3Hz
   +29d+0h44m39s host power off
   +34d+1h51m41s Fault LED OFF
   +34d+1h56m12s host power on
   +34d+1h56m14s host FAULT: unexpected power off
   +34d+1h56m14s Fault LED ON
   +0h0m0s LOM booted
   +0h0m0s host power on
lom>environment
Fault  OFF
Alarm1 OFF
Alarm2 OFF
Alarm3 ON

Fans:
1 fan1 OK speed 92%
2 fan2 OK speed 94%
3 cpu OK speed 100%
4 psu OK speed 100%

PSUs:
1 OK

Temperature sensors:
1 Enclosure 25degC OK

Overheat sensors:
1 CPU OK

Circuit breakers:
1 SCSI-Term OK
2 USB0 OK
3 USB1 OK
4 SCC OK

Supply rails:
1 5V OK
2 3V3 OK
3 +12V OK
4 -12V OK
5 CPU core OK
6 +3VSB OK

lom>
lom>
lom>
lom>break
Drive not ready

lom>
lom>
lom>
lom>break
-------------------------------------------------

lom>
lom>
lom>environment
Fault  OFF
Alarm1 OFF
Alarm2 OFF
Alarm3 ON

Fans:
1 fan1 OK speed 96%
2 fan2 OK speed 97%
3 cpu OK speed 100%
4 psu OK speed 100%

PSUs:
1 OK

Temperature sensors:
1 Enclosure 26degC OK

Overheat sensors:
1 CPU OK

Circuit breakers:
1 SCSI-Term OK
2 USB0 OK
3 USB1 OK
4 SCC OK

Supply rails:
1 5V OK
2 3V3 OK
3 +12V OK
4 -12V OK
5 CPU core OK
6 +3VSB OK

lom>poweron
lom>poweroff
lom>
LOM event: +0h8m55s host power off
lom>bootmode reset_nvram
lom>poweron
lom>
LOM event: +0h9m33s host power on
Drive not ready
Setting NVRAM parameters to default values.
Sun Fire V120 (UltraSPARC-IIe 548MHz), No Keyboard
OpenBoot 4.0, 512 MB memory installed, Serial #51813843.
Ethernet address 0:3:ba:16:9d:d3, Host ID: 83169dd3.


(output trimmed) - it gets as far as initialising memory and stops

------------------------------------------------------

        PBM Diag Reg Test
        CPU PBM Reg Test

All Basic APB Simba Tests

Init PCI

All Basic RIO Tests (RIO# 1)
        RIO Ebus Config Space Reg Test
        RIO Network Config Space Reg Test
        RIO Firewire Config Space Reg Test
        RIO USB Config Space Reg Test
All Basic RIO Tests (RIO# 2)
        RIO Ebus Config Space Reg Test
        RIO Network Config Space Reg Test
        RIO Firewire Config Space Reg Test
        RIO USB Config Space Reg Test
All Basic SCSI Controller Tests
        Symbios SCSI Controller PCI Config Space Test
All Basic SCSI Controller Tests
        Symbios SCSI Controller PCI Config Space Test
Basic SouthBridge Tests
        Southbridge ISA Config Space Reg Test
        Southbridge PMU Config Space Reg Test
        Southbridge IDE Config Space Reg Test
        Southbridge Audio Config Space Reg Test
All Memory Stress Tests
        Consist Write Data Test
Resetting...

Processor Speed = 548 MHz
Baud rate is 9600
8 Data bits, 1 stop bits, no parity (configured from lom)

Firmware CORE  Sun Microsystems, Inc.
@(#) core 1.0.12 2002/01/08 13:00
Software Power ON
Verifying NVRAM...Done
Bootmode is 0
[New I2C DIMM address]
MCR0 = 37b2c206
MCR1 = 80008000
MCR2 = cf10000f
MCR3 = a9000086
Ecache Size = 512 KB
Clearing E$ Tags Done
Clearing I/D TLBs Done
Probing memory
Done
MEMBASE=0x0
MEMSIZE=0x20000000
Clearing memory...Done
Turning ON MMUs Done
Copy ROM to RAM (170040 bytes) Done
Orig PC=0x1fff0007e44  New PC=0xf0f07e9c
Processor Speed=548MHz
Looking for Dropin FVM ... found
Decompressing Client Done
Transferring control to Client...

ttya initialized
Reset Control: BXIR:0 BPOR:0 SXIR:0 SPOR:1 POR:0
Probing upa at 1f,0 pci pci pci
Probing upa at 0,0 SUNW,UltraSPARC-IIe SUNW,UltraSPARC-IIe (512 Kb)
Loading Support Packages: kbd-translator
Loading onboard drivers: ebus flashprom eeprom idprom SUNW,lomh
Probing /pci@1,1 Device 3  pmu i2c temperature dimm i2c-nvram idprom
   motherboard-fru fan-control
lomp
Probing Memory Bank #0 512 Megabytes
Probing Memory Bank #1   0 Megabytes
Probing Memory Bank #2   0 Megabytes
Probing Memory Bank #3   0 Megabytes
Probing /pci@1,1 Device 7  isa power serial serial
Probing /pci@1,1 Device c  network firewire usb
Probing /pci@1,1 Device 3
Probing /pci@1,1 Device d  ide disk cdrom
Probing /pci@1,1 Device 5  pci108e,1100 network firewire usb
Probing /pci@1 Device 8  scsi disk tape scsi disk tape
Probing /pci@1 Device 5  Nothing there
Probing /pci@1 Device 6  Nothing there
Probing /pci@1 Device 7  Nothing there
Sun Fire V120 (UltraSPARC-IIe 548MHz), No Keyboard
OpenBoot 4.0, 512 MB memory installed, Serial #51813843.
Ethernet address 0:3:ba:16:9d:d3, Host ID: 83169dd3.



Environment monitoring: disabled
Boot device: net  File and args:
Using Onboard Transceiver - Link Up.
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet

ok boot disk
Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0  File and args

+++++++++then just hangs forever+++++++++++

dodgy disk?????????

...............last time

        CPU PBM Reg Test

All Basic APB Simba Tests

Init PCI

All Basic RIO Tests (RIO# 1)
        RIO Ebus Config Space Reg Test
        RIO Network Config Space Reg Test
        RIO Firewire Config Space Reg Test
        RIO USB Config Space Reg Test
All Basic RIO Tests (RIO# 2)
        RIO Ebus Config Space Reg Test
        RIO Network Config Space Reg Test
        RIO Firewire Config Space Reg Test
        RIO USB Config Space Reg Test
All Basic SCSI Controller Tests
        Symbios SCSI Controller PCI Config Space Test
All Basic SCSI Controller Tests
        Symbios SCSI Controller PCI Config Space Test
Basic SouthBridge Tests
        Southbridge ISA Config Space Reg Test
        Southbridge PMU Config Space Reg Test
        Southbridge IDE Config Space Reg Test
        Southbridge Audio Config Space Reg Test
All Memory Stress Tests
        Consist Write Data Test
Resetting...

Processor Speed = 548 MHz
Baud rate is 9600
8 Data bits, 1 stop bits, no parity (configured from lom)

Firmware CORE  Sun Microsystems, Inc.
@(#) core 1.0.12 2002/01/08 13:00
Software Power ON
Verifying NVRAM...Done
Bootmode is 1
[New I2C DIMM address]
MCR0 = 37b2c206
MCR1 = 80008000
MCR2 = cf10000f
MCR3 = a9000086
Ecache Size = 512 KB
Clearing E$ Tags Done
Clearing I/D TLBs Done
Probing memory
Done
MEMBASE=0x0
MEMSIZE=0x20000000
Clearing memory...Done
Turning ON MMUs Done
Copy ROM to RAM (170040 bytes) Done
Orig PC=0x1fff0007e44  New PC=0xf0f07e9c
Processor Speed=548MHz
Looking for Dropin FVM ... found
Decompressing Client Done
Transferring control to Client...

ttya initialized
Reset Control: BXIR:0 BPOR:0 SXIR:0 SPOR:1 POR:0
Probing upa at 1f,0 pci pci pci
Probing upa at 0,0 SUNW,UltraSPARC-IIe SUNW,UltraSPARC-IIe (512 Kb)
Loading Support Packages: kbd-translator
Loading onboard drivers: ebus flashprom eeprom idprom SUNW,lomh
Probing /pci@1,1 Device 3  pmu i2c temperature dimm i2c-nvram idprom
   motherboard-fru fan-control
lomp
Probing Memory Bank #0 512 Megabytes
Probing Memory Bank #1   0 Megabytes
Probing Memory Bank #2   0 Megabytes
Probing Memory Bank #3   0 Megabytes
setting input and output device to ttya because of lom bootmode "-u".
input-device =        ttya
output-device =       ttya
Probing /pci@1,1 Device 7  isa power serial serial
Probing /pci@1,1 Device c  network firewire usb
Probing /pci@1,1 Device 3
Probing /pci@1,1 Device d  ide disk cdrom
Probing /pci@1,1 Device 5  pci108e,1100 network firewire usb
Probing /pci@1 Device 8  scsi disk tape scsi disk tape
Probing /pci@1 Device 5  Nothing there
Probing /pci@1 Device 6  Nothing there
Probing /pci@1 Device 7  Nothing there
Sun Fire V120 (UltraSPARC-IIe 548MHz), No Keyboard
OpenBoot 4.0, 512 MB memory installed, Serial #51813843.
Ethernet address 0:3:ba:16:9d:d3, Host ID: 83169dd3.



Environment monitoring: disabled
Boot device: net  File and args:
Using Onboard Transceiver - Link Up.
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet

ok boot disk
Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0  File and args:

-----------------------------------------------------------

At last I got the error from before:

...........................
lom>poweroff
lom>
LOM event: +0h50m12s host power off
lom>poweron
lom>
LOM event: +0h50m58s host power on
 Power ON
Verifying NVRAM...Done
Bootmode is 0
[New I2C DIMM address]
MCR0 = 37b2c206
MCR1
LOM event: +0h50m59s supply rail 1 FATAL FAULT: failed - shutdown req'd

LOM event: +0h50m59s Fault LED 3Hz
= 512 KB
NVRAM Test
Icache Test

lom>
LOM event: +0h51m4s host power off
lom>......................................
...............................
.
This time it was supply rail 1 FATAL FAULT: failed - shutdown req'd. I've
seen some views that it may need a new board, disk, NVRAM chip, power supply
or all of the above, but I was hoping to narrow that down, or at least get
into a position to narrow it down.
Any ideas greatly appreciated.
Patrick
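
One way to start narrowing it down is to pull every supply-rail event out
of the captured console text and see which rails are involved. The sketch
below is a rough aid, not a Sun tool: it assumes the event lines look like
the ones pasted above (an optional "LOM event:" prefix, then a "+NdNhNmNs"
style uptime stamp), and the names parse_events and rail_faults are made up
for this example.

```python
import re

# Matches lines such as
#   +29d+0h44m34s supply rail 4 FATAL FAULT: failed - shutdown req'd
#   LOM event: +0h50m59s supply rail 1 FATAL FAULT: failed - shutdown req'd
EVENT_RE = re.compile(
    r"^(?:LOM event:\s*)?"
    r"\+(?:(?P<d>\d+)d\+)?(?P<h>\d+)h(?P<m>\d+)m(?P<s>\d+)s\s+"
    r"(?P<msg>.*)$"
)
RAIL_RE = re.compile(r"supply rail (\d+) (FATAL FAULT|FAULT)")


def parse_events(text):
    """Return a list of (seconds_since_lom_boot, message) tuples."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # prompts, POST output, blank lines
        days = int(m["d"] or 0)
        secs = ((days * 24 + int(m["h"])) * 60 + int(m["m"])) * 60 + int(m["s"])
        events.append((secs, m["msg"]))
    return events


def rail_faults(events):
    """Group fault severities by supply rail number, in log order."""
    faults = {}
    for _, msg in events:
        m = RAIL_RE.search(msg)
        if m:
            faults.setdefault(int(m.group(1)), []).append(m.group(2))
    return faults


if __name__ == "__main__":
    captured = """\
   +29d+0h33m49s supply rail 4 FAULT: failed
   +29d+0h44m34s supply rail 4 FATAL FAULT: failed - shutdown req'd
   +29d+0h44m34s Fault LED 3Hz
lom>
LOM event: +0h50m59s supply rail 1 FATAL FAULT: failed - shutdown req'd
"""
    for rail, kinds in rail_faults(parse_events(captured)).items():
        print(f"rail {rail}: {', '.join(kinds)}")
```

Going by the "Supply rails" listing in the environment output above, rail 4
is -12V and rail 1 is 5V, so the faults here are on two different rails.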
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Sat Jul 1 07:06:12 2006
