Many thanks to Moore, L. Bryan, Joe Fletcher, Eduardo Sanchez M., Shawn Russell, Tim Chipman, Hindley Nick, Smith Cathy-CSMITH4, Steve Beuttel, Edward Scown, Chris Keladis, Vlad, Buddy Lumpkin, and Rick McKinney.

My original question was:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The machine rebooted automatically yesterday, and there was no clue in the log files. I found a crash dump in /var/crash/<machine name>/vmcore.1, ran adb -k unix vmcore.1 followed by $<msgbuf, and got:
_______________________________________________________
........
WARNING: [AFT1] Uncorrectable Memory Error on CPU1 Data access at TL=0, errID 0x00013161.1f9f27e0
    AFSR 0x00000000.80200000<PRIV,UE> AFAR 0x00000000.f9569c00
    AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10072e18
    UDBH 0x0203<UE> UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
    UDBH Syndrome 0x3 Memory Module Board 0 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800
WARNING: [AFT1] errID 0x00013161.1f9f27e0 Syndrome 0x3 indicates that this may not be a memory module problem
[AFT2] errID 0x00013161.1f9f27e0 PA=0x00000000.f9569c00
    E$tag 0x00000000.08401f2a E$State: Shared E$parity 0x04
[AFT2] E$Data (0x00): 0x65b631b8.20000000 *Bad* PSYND=0xff00
[AFT2] E$Data (0x08): 0x122e7c40.11d2fbc0
[AFT2] E$Data (0x10): 0x1243fc00.1243fc00
[AFT2] E$Data (0x18): 0x00000000.00f72000
[AFT2] E$Data (0x20): 0x00000000.00000000
[AFT2] E$Data (0x28): 0x00000000.00000000
[AFT2] E$Data (0x30): 0x00000000.0006eabd
[AFT2] E$Data (0x38): 0x02020000.00000000
WARNING: [AFT1] CP event on CPU5 (caused Data access error on CPU1), errID 0x00013161.1f9f27e0
    AFSR 0x00000000.01000800<CP> AFAR 0x00000000.f9569c00
    AFSR.PSYND 0x0800(Score 95) AFSR.ETS 0x00
    UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
[AFT2] errID 0x00013161.1f9f27e0 PA=0x00000000.f9569c00
    E$tag 0x00000000.19401f2a E$State: Owner E$parity 0x0c
[AFT2] E$Data (0x00): 0x65b631b8.20000000 *Bad* PSYND=0x0800
[AFT2] E$Data (0x08): 0x122e7c40.11d2fbc0
[AFT2] E$Data (0x10): 0x1243fc00.1243fc00
[AFT2] E$Data (0x18): 0x00000000.00f72000
[AFT2] E$Data (0x20): 0x00000000.00000000
[AFT2] E$Data (0x28): 0x00000000.00000000
[AFT2] E$Data (0x30): 0x00000000.0006eabd
[AFT2] E$Data (0x38): 0x02020000.00000000
panic[cpu1]/thread=0x63f03ba0: [AFT1] errID 0x00013161.1f9f27e0 UE Error(s)
    See previous message(s) for details

syncing file systems... 2 2 2
panic[cpu1]/thread=0x30053e80: panic sync timeout
---------------------------------------------------------------------
From the output above, I can't tell whether CPU1 or the memory caused the panic. Can anyone tell me what I should do next to determine what made the system panic?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Steve Beuttel pointed out the cause, so I am copying his answer below:

If this is a 400 MHz or faster CPU, then I believe this means CPU5's E-cache lost address data, which hosed an address location in the RAM on Board 0 that CPU1 later tried to access. What happens is that the access indexes a location (no longer in the E-cache) that is gone, so an out-of-bounds read or write results, causing the panic. It may happen again in 5 minutes, or not for months. I would at least get CPU5 replaced (it's that old E-cache problem). They'll want to wait until it happens again, but this is classic.

Thank you all.

Received on Fri Feb 1 12:56:30 2002
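The AFSR values in the log above can be decoded by hand to see which CPU reported what. Below is a minimal sketch, assuming the UltraSPARC-II AFSR bit layout (PRIV at bit 31, CP at bit 24, UE at bit 21, ETS in bits 19:16, P_SYND in bits 15:0). Those positions agree with the <PRIV,UE> and <CP> tags and the PSYND/ETS fields Solaris printed, but other AFSR bits are not covered, so check the processor manual before relying on this. The helper names parse_split_hex and decode_afsr are hypothetical.

```python
# Hypothetical helpers for decoding Solaris-style "0xHHHHHHHH.LLLLLLLL" AFSR values.
# Bit positions are ASSUMED from the UltraSPARC-II AFSR layout; only the bits
# listed here are checked, and they match the <PRIV,UE> and <CP> tags in the log.
AFSR_FLAGS = {
    "PRIV": 31,  # error occurred while in privileged (kernel) mode
    "CP": 24,    # E-cache parity error on copyout to another CPU (assumed meaning)
    "UE": 21,    # uncorrectable ECC error
}


def parse_split_hex(text: str) -> int:
    """Join the high and low 32-bit halves of '0xHHHHHHHH.LLLLLLLL' into one integer."""
    digits = text.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    high, low = digits.split(".")
    return (int(high, 16) << 32) | int(low, 16)


def decode_afsr(text: str) -> dict:
    """Return the set flag names plus the ETS and P_SYND fields of an AFSR value."""
    value = parse_split_hex(text)
    flags = [name for name, bit in AFSR_FLAGS.items() if (value >> bit) & 1]
    return {
        "flags": flags,
        "ets": (value >> 16) & 0xF,  # E-cache tag parity syndrome, bits 19:16
        "psynd": value & 0xFFFF,     # data parity syndrome, bits 15:0
    }


if __name__ == "__main__":
    for cpu, afsr in [("CPU1", "0x00000000.80200000"), ("CPU5", "0x00000000.01000800")]:
        decoded = decode_afsr(afsr)
        print(cpu, decoded["flags"], hex(decoded["psynd"]))
```

Both reports carry the same AFAR (0x00000000.f9569c00). Decoding shows CPU1 took a UE in privileged mode while CPU5 flagged CP with PSYND 0x0800 on that same line, which matches the log's own note that the CP event on CPU5 caused the data access error on CPU1.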
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:42:33 EST