SUMMARY: ES5000 - FATAL ERROR

From: David Pope (pope@vicinity.com)
Date: Wed Jul 31 1996 - 18:10:50 CDT


Dear Sun Managers:

You are a great group! This was my first time using this mailing list and
I am truly amazed by the responsiveness.

It turns out that the board in slot 0 was going bad. I swapped boards 0 &
2 and the error followed the board. The next time I rebooted the system
recognized the board as being bad and disabled it. Someone from Sun came
out and replaced the board and the system has been humming ever since.

The responses I received contained some GREAT info. If you have an ES5k,
it may be in your best interest to give a glance...

* special thanks goes to:

 - jeffw@smoe.org (Jeff Wasilko)
 - gibian@stars1.hanscom.af.mil (Marc S. Gibian)
 - Bert Shure <Bert_Shure%SOLSOURCE@notes.worldcom.com>
 - fletcherc@postoffice.ttmc.com (Fletcher Cocquyt)
 - etolido@flight.cnsgroup.com (Erik J. Tolido)

** jeffw@smoe.org (Jeff Wasilko) wrote:

Two more questions:

What does prtdiag -v and prtdiag -l show? (prtdiag is in
/usr/platform/sun4u/sbin).

* great commands to know!

When the POST runs, does it deconfigure the failed CPU?

* it finally did after several reboots...

If you have a dumb terminal (or another sun), connect
it to ttya on the E5000 and put the key in DIAG mode. You'll see
the diags run. If you press 's' once while the POST is running,
you'll get a menu at the end of the post that summarizes the
system status.

* cool!

Please keep in touch with me as you work thru this
problem...Since the system is under warranty, have you opened a
call with Sun?

* i sure did..

I work for Sun, and have been doing a lot with the Ex000 servers..
I hope I can help you a bit...

Jeff

Did you install the CPUs, or did they come from the factory
already installed on a CPU/Memory board?

* already installed..

The Ex000 uses a new style connector (not an MBus/XDbus). When the
CPU is being screwed down, you have to rotate between the 5
screws like you are putting lugnuts on a wheel--you want even
pressure.

If the CPU was installed too loosely, or with uneven pressure
you'll get intermittent CPU failures.

Keep in mind that you have the whole weight of the module hanging
upside down, pulling it away from the connector...

Jeff

** gibian@stars1.hanscom.af.mil (Marc S. Gibian) wrote:

Never seen this, but sure looks like hardware or a driver.

* yep - hardware it seems

Marc S. Gibian
Telos Consulting Services phone: (617) 377-6350
PRISM/TFS email: gibian@stars1.hanscom.af.mil

** Bert Shure <Bert_Shure%SOLSOURCE@notes.worldcom.com> wrote:

dave:

you're in a bit of a spot because there aren't that many e5000's in the field.

i have a customer who has an e5000 with 2 cpu's, 1.5gb ram, loaded ssa model
112 and oracle. no netscape, and i'm not sure about the version of oracle.

no problems, no crashes. please post your findings!

==bert==

** fletcherc@postoffice.ttmc.com (Fletcher Cocquyt) wrote:

It's probably not causing your error, but have you applied patch 103451-01
which upgrades the firmware on Seagate ST3255 disks common in storage arrays?

* applying...

----------------------------------------------------------------------
Fletcher Cocquyt fletch@ttmc.com (441) 299-2900
System Administrator Trout Trading Hamilton, Bermuda
----------------------------------------------------------------------

** etolido@flight.cnsgroup.com (Erik J. Tolido) wrote:

Dave,

Sounds like a hardware problem to me, but I would check to see if
you have the HW patch #103346-02 from Sun installed.

* checking on this patch also...

-Erik

Patch-ID# 103346-02
Keywords: Ultra Enterprise flashprom update
Synopsis: SunOS 5.5.1: Ultra Enterprise flashprom update
Date: Jun/10/96

Solaris Release: 2.5.1

SunOS Release: 5.5.1

Unbundled Product:

Unbundled Release:

Relevant Architectures: sun4u Ultra Enterprise 3000/4000/5000/6000 only

BugId's fixed with this patch: 1245745

Changes incorporated in this version:

Patches accumulated and obsoleted by this patch:

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:
Files included with this patch:
 
Problem Description:
 

1245745: Create unix version of flash-update for UltraEnterprise Servers

Patch Installation Instructions:
--------------------------------

Special Install Instructions:
-----------------------------

        NOTE: Please read this entire file before proceeding to update
                the flash proms on your system.

The above mentioned series of machines contain their firmware in flashproms
that can be programmed with new code while the machine is up and running
unix. The reprogrammed flashproms take effect the next time you reset
the machine.

Here are the recommended steps to download the latest firmware on
your system:

    1) Login as root on the system whose firmware needs to be
        upgraded.

    2) Execute the binary './flash-update' in this directory.

                a) It first extracts the flashprom driver from itself
                   and installs the driver on the system.

                b) Next it extracts the program that actually flash
                   updates the proms with the right images.

    3) The program displays the current revision of the proms in
        your system. It then displays the versions that are available
        in the release of the flash-update-<n> program that you have
        executed.

    3) The program will display its progress at every step. Here
        is a sample output where a machine has boards 0, 1, 4, 5
        and 6 and its flash proms are being updated:

# ./flash-update-<latest-rev>
Generating flashprom driver...
Generating SUNW,Ultra-Enterprise flash-update program...

Current System Board PROM Revisions:
------------------------------------
Board 0: cpu OBP 3.2.2 1996/03/20 10:07 POST 3.0.3 1996/03/16 17:54
Board 4: cpu OBP 3.2.2 1996/03/20 10:07 POST 3.0.3 1996/03/16 17:54
Board 1: dual-sbus FCODE 1.7.0 1996/03/20 10:07 iPOST 3.0.3 1996/03/16 17:55
Board 5: upa-sbus FCODE 1.7.0 1996/03/20 10:07 iPOST 3.0.3 1996/03/16 17:55
Board 6: dual-sbus FCODE 1.7.0 1996/03/20 10:07 iPOST 3.0.3 1996/03/16 17:55

Available 'Update' Revisions:
-----------------------------

                 cpu OBP 3.2.3 1996/04/04 20:23 POST 3.1.4 1996/04/04 20:23
           dual-sbus FCODE 1.7.0 1996/04/04 20:23 iPOST 3.1.4 1996/04/04 20:23
            upa-sbus FCODE 1.7.0 1996/04/04 20:23 iPOST 3.1.4 1996/04/04 20:23

Verifying Checksums: Okay

Do you wish to flash update your firmware? y/[n] : y <----- Enter y here

Are you sure? y/[n] : y <----- Enter y here

Updating Board 0: Type 'cpu'
1 Erasing ... Done.
1 Verifying Erase ... Done.
1 Programming ... Done.
1 Verifying Program ... Done.

Updating Board 4: Type 'cpu'
1 Erasing ... Done.
1 Verifying Erase ... Done.
1 Programming ... Done.
1 Verifying Program ... Done.

Updating Board 1: Type 'dual-sbus'
1 Erasing ... Done.
1 Verifying Erase ... Done.
1 Programming ... Done.
1 Verifying Program ... Done.

Updating Board 5: Type 'upa-sbus'
1 Erasing ... Done.
1 Verifying Erase ... Done.
1 Programming ... Done.
1 Verifying Program ... Done.

Updating Board 6: Type 'dual-sbus'
1 Erasing ... Done.
1 Verifying Erase ... Done.
1 Programming ... Done.
1 Verifying Program ... Done.
#

        NOTE: The flash proms are write protected by either of the
                following two conditions:
                
                        a) Front panel key switch in secure mode.
                        b) Jumper (P601) removed on clock board.

                At the time of writing this document systems are
                shipped with the jumper on the clock board installed.
                This means that only the front panel key switch
                being in secure position write protects the proms.

                If the proms are detected to be write protected then
                the flash update process will fail with the following
                message:

FPROM Write Protected: Check Write Enable Jumper or Front Panel Key Switch.

                If there is a power failure while the flash proms are
                being upgraded then you need to follow the steps listed
                in the document:

                        Ultra Enterprise 6000/5000/4000/3000
                        System Flash PROM Programming Guide
                        Part No.: 802-5579-10
                        Rev. A, March 1996

        The data under the "Current System Board PROM Revisions:"
        label lists:

                a) Board number
                b) Board type:
                        1) cpu: CPU/Memory
                        2) dual-sbus: IO Type 1
                        3) upa-sbus: IO Type 2
                c) OpenBoot/FCODE revision
                d) OpenBoot/FCODE date and time
                e) POST/iPOST revision
                f) POST/iPOST date and time

        The data under the "Available 'Update' Revisions:" lists the
        following (for each type of board). These are the images that
        are going to be installed on your boards if you answer 'y' to
        the two questions:

                a) Type of image
                b) OpenBoot/FCODE revision
                c) Openboot/FCODE date/time
                d) POST/iPOST revision
                e) POST/iPOST date and time

        
        Once you answer yes to the two questions the flash update
        of the proms images starts. Any other response gets you out
        of the flash-update process.

        Information is displayed for each board being updated.

        There are four parts to the update process:

                1) Erasing the flash prom.
                2) Verifying that the erase actually happened.
                3) Programming flash proms with the new images.
                4) Verifying that the programming was without errors.

        The number displayed in the leftmost column is the pass number.
        If any of the 4 steps listed above fails then all 4 steps
        have to be repeated and the pass number is incremented.

    4) Once the flash update program completes it removes the flashprom
        driver from the machine. It also cleans up the temporary files
        that it created in the /tmp directory.

    5) The next time you halt and reboot your system please
        power-cycle the system for the new flashproms to take
        effect.

Since this directory will contain all revisions of the firmware
released for these machines you can follow the same steps to downrev
your firmware if the need ever arises. Execute the correct revision
of the flash-update-<n> program. The higher revision number of this
patch will contain a newer release of the firmware.
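
The four-part update pass described in the instructions above (erase,
verify erase, program, verify program, with all four repeated under a
higher pass number if any one fails) can be sketched in Python roughly
as follows. This is only an illustration of the retry logic, not the
flash-update program itself: the names update_board, run_step and
flaky_step, the max_passes limit and the "Failed." text are all
assumptions made for the example. The README does not say how many
passes the real tool attempts or what it prints on a failed step.

```python
# Illustrative sketch only; not Sun's flash-update implementation.
STEPS = ("Erasing", "Verifying Erase", "Programming", "Verifying Program")


def update_board(board, board_type, run_step, max_passes=3):
    """Run the four steps in order for one board.

    If any step fails, start again from the first step with the pass
    number incremented. Returns the pass number that completed, or
    None if max_passes (a hypothetical limit) is exhausted.
    """
    print(f"Updating Board {board}: Type '{board_type}'")
    for pass_no in range(1, max_passes + 1):
        for step in STEPS:
            ok = run_step(board, step, pass_no)
            # "Failed." is a made-up message for the sketch.
            print(f"{pass_no} {step} ... {'Done.' if ok else 'Failed.'}")
            if not ok:
                break  # abandon this pass, repeat all four steps
        else:
            return pass_no  # all four steps succeeded
    return None


def flaky_step(board, step, pass_no):
    """Hypothetical hardware stand-in: erase verification fails once."""
    return not (pass_no == 1 and step == "Verifying Erase")


if __name__ == "__main__":
    update_board(0, "cpu", flaky_step)
```

With the stand-in above, the first pass stops at "Verifying Erase" and
the second pass runs all four steps, so the lines for the second pass
are prefixed with 2 instead of 1.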

Version -01 specific information:
---------------------------------

The version -01 of flash-update in this directory contains the following rev
of the firmware:

        CPU/Memory Board:
                 Ultra Enterprise 3.2 Version 3 created 1996/04/04 20:23
                 POST 3.1.4 1996/04/04 20:23

        IO Graphics Board:
                 I/O Type 2 1996/04/04 20:23
                 iPOST 3.1.4 1996/04/04 20:23

        Dual Sbus IO Board:
                 I/O Type 1 1996/04/04 20:23
                 iPOST 3.1.4 1996/04/04 20:23

Version -02 specific information:
---------------------------------

        CPU/Memory Board:
                 Ultra Enterprise 3.2 Version 4 created 1996/05/30 11:17
                 POST 3.2.2 1996/05/29 19:46

        IO Graphics Board:
                 I/O Type 2 1996/05/30 11:17
                 iPOST 3.2.2 1996/05/29 19:46

        Dual Sbus IO Board:
                 I/O Type 1 1996/05/30 11:17
                 iPOST 3.2.2 1996/05/29 19:46
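
To see which boards are still behind the POST/iPOST 3.2.2 level listed
for version -02, the "Current System Board PROM Revisions:" lines can
be compared against that revision. The Python sketch below assumes the
listing looks exactly like the sample output earlier in this README;
the function names and the regular expression are illustrative and are
not part of the patch.

```python
import re

# One entry of the "Current System Board PROM Revisions:" listing, e.g.
# Board 0: cpu OBP 3.2.2 1996/03/20 10:07 POST 3.0.3 1996/03/16 17:54
# \s+ also matches a newline, so mail-wrapped entries still parse.
BOARD_LINE = re.compile(
    r"Board\s+(\d+):\s+(\S+)\s+(?:OBP|FCODE)\s+(\S+)\s+\S+\s+\S+\s+"
    r"(?:POST|iPOST)\s+(\S+)"
)


def version_tuple(text):
    """Turn '3.0.3' into (3, 0, 3) so revisions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def boards_behind(listing, target_post):
    """Return {board number: POST/iPOST rev} for boards older than target."""
    target = version_tuple(target_post)
    behind = {}
    for match in BOARD_LINE.finditer(listing):
        board, _board_type, _obp_rev, post_rev = match.groups()
        if version_tuple(post_rev) < target:
            behind[int(board)] = post_rev
    return behind


if __name__ == "__main__":
    # Two entries copied from the sample output in this README.
    listing = """\
Board 0: cpu OBP 3.2.2 1996/03/20 10:07 POST 3.0.3 1996/03/16 17:54
Board 1: dual-sbus FCODE 1.7.0 1996/03/20 10:07 iPOST 3.0.3 1996/03/16 17:55
"""
    print(boards_behind(listing, "3.2.2"))
```

Comparing the revisions as integer tuples rather than as strings keeps
a hypothetical 3.10.0 from sorting below 3.2.2.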

Bugs fixed in this release for the firmware:

1246116 OBP prints incorrect amount of memory in its banner
1244993 Unable to boot SunFire from NPI FDDI card 3.0/4.0
1248141 OBP needs to inform failed SIMM information to system
1250943 .speed command missing in Sunfire OBP
1246458 OBP prints incorrect ratio frequency in 2/3 mode
1254307 TODC checksum destroyed on power cycle

===========================================================
Erik Tolido Phone: (203) 921 - 4450
Unix Systems Engineer (800) 437 - 2778
CNS Group Fax: (203) 921 - 4489
500 West Avenue E-mail: etolido@flight.cnsgroup.com
Stamford, CT 06902 Web: http://flight.cnsgroup.com
===========================================================


