Thanks to those who responded to my original posting:
Subject: 128Mb memory added to 4/470--now system crashing without a trace
Message-ID: <cornell.700211987@vivaldi>
which commented on mysterious crashes after adding a 128Mb memory board
(Parity Systems) to our Sun 4/470's original 32Mb (for a total of 160Mb).
The problem was that the system would go down without giving a panic or
any messages in /var/adm/messages. This would happen during moderate
and light loads on the system (haven't had a heavy load on it yet).
The responses I got fell into two categories--heat and jumper positions.
I did get a panic message with the last crash and I'd like to know if
this verifies the jumpering issue.
4/470 setup:
Original 32Mb board in slot 1 jumper position 0
128Mb addition in slot 2 jumper position 2
ALM-2 board in slot 7
Since the ALM-2 board was already present in slot 7, I chose
not to use that slot (which was recommended) for the 128Mb
board.
eeprom -> memsize=160; memtest=160
This worked fine for two days over the weekend, and the system
has since crashed three times--twice without a panic message and
once with.
SUMMARY of responses:
One category of responses dealt with jumper positions. Several
people suggested that my problem lies with the 32Mb board being
jumpered "logically" (position 0) before the 128Mb board
(position 2). It was recommended that the board with larger
simms come logically before the board with smaller simms
(i.e. jumper the 128Mb board to 0 and the 32Mb board to 1).
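
The ordering rule the respondents suggested can be sketched as below. This
only illustrates the suggestion and says nothing about how the 4/470
actually decodes its jumpers. The helper name assign_jumper_positions and
the board labels are made up for the example, and board capacity is used
as a stand-in for simm size, which is an assumption.

```python
def assign_jumper_positions(boards):
    """Return {label: position}, largest board first.

    boards maps a board label to its capacity in Mb. Positions are handed
    out as 0, 1, 2, ... in descending capacity order (hypothetical helper
    that mirrors the respondents' suggestion, nothing more).
    """
    ordered = sorted(boards.items(), key=lambda item: item[1], reverse=True)
    return {label: pos for pos, (label, _size) in enumerate(ordered)}


# The two boards described in this posting.
boards = {"32Mb original": 32, "128Mb Parity Systems": 128}
positions = assign_jumper_positions(boards)
for label, pos in positions.items():
    print(f"{label}: jumper position {pos}")
```

With the two boards here, this puts the 128Mb board at position 0 and the
32Mb board at position 1, which matches the recommendation.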
The other category of responses dealt with heat considerations.
Heat built up by the new board could cause problems with the
system if airflow is obstructed.
ADDITIONAL information:
I have not changed any jumpers or slot positions at this time,
and the system crashed again. This time I got the following messages:
vmunix: Memory Error Register 74d3<INTR,INTENA,CE_ENA,UE,CE>
vmunix: DVMA = 0, context = 3a, virtual address = 13f10
vmunix: pme = 82003cc7, physical address = 798ff10
vmunix: panic: uncorrectable ECC error
vmunix: mem2: soft ecc addr 398ff00 syn e9<S32,S16,S8,S2,SX> No bit information
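
As an aid to reading the two addresses in these messages, here is a small
sketch that pulls them out of the log text and does the arithmetic. It
makes no claim about how the hardware maps boards to addresses. The
regular expressions are written for the two lines as they appear here and
are not a general parser for vmunix messages.

```python
import re

# Two lines copied from the panic output in this posting.
log = """\
vmunix: pme = 82003cc7, physical address = 798ff10
vmunix: mem2: soft ecc addr 398ff00 syn e9<S32,S16,S8,S2,SX> No bit information
"""

MB = 1 << 20  # bytes per Mb

# Both addresses are printed in hex without a 0x prefix.
phys = int(re.search(r"physical address = ([0-9a-f]+)", log).group(1), 16)
ecc = int(re.search(r"soft ecc addr ([0-9a-f]+)", log).group(1), 16)

print(f"physical address: {phys:#x} = {phys / MB:.2f} Mb")
print(f"soft ecc addr:    {ecc:#x} = {ecc / MB:.2f} Mb")

# How far apart are the two reported addresses?
diff = phys - ecc
print(f"difference: {diff:#x} = {diff // MB} Mb plus {diff % MB} bytes")
```

The panic address falls at roughly 121.56 Mb, which is above the first
32Mb and below the 160Mb total, and the two addresses differ by 64 Mb plus
16 bytes. Whether the second address is relative to a board, and what that
would say about the jumpering, is not something these messages establish
on their own.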
QUESTION:
Does anyone know if these messages verify the first category
of suggestions (jumper positioning)? The power-up memory check
goes through without any problems.
Since things appear to run OK for quite a while and then crash
out of nowhere, it's difficult to test configurations.
--
| Cornell Kinderknecht      | Email: cornell@csl.dl.nec.com |
| CNAD/CCSL                 | UUCP:  uunet!necbsd!cornell   |
| NEC USA/NEC America, Inc. | Phone: 214-518-3509           |
| Irving, TX                | Fax:   214-518-3552           |