Thank you very much to all who responded:

Jed Dobson <jed@wgtech.com>
"Miller Sutfin" <millersutfin@earthlink.net>
Ray Ballisti <ballisti@ifh.ee.ethz.ch>
Aleksey Tsalolikhin <eesti@corp.earthlink.net>
crpollino@e-milio.com
"Craig Scott" <craig.scott@stc.ac.uk>
"JULIAN, JOHN C (AIT)" <jj2195@sbc.com>
Eric Priebe <epriebe@ACUS.com>

I received mixed answers as to exactly what it was. Some said the ecache parity error does not affect the U10s; others said this is the ecache parity error. A few were nice enough to point out that EDP stands for Ecache Data Parity, so whatever the underlying issue is, it lies in the ecache.

Ray Ballisti <ballisti@ifh.ee.ethz.ch> suggested running the POST diagnostics, which actually came up clean in this case. Thanks for the suggestion, though!

The overwhelming consensus was that the CPU needs to be replaced.

Thank you to everyone who responded, as well as anyone who may be responding as I type this!

- Dan

ORIGINAL MESSAGE:

Hello, all.

We have a machine that keeps crashing, and I think it is the ecache parity error. I waited for it to happen again before sending an e-mail to this list. Could anyone look at this and tell me whether they think it is the ecache error? If not, any clues as to what it is?

Thanks in advance! I will summarize.
- Dan

uname -a:
SunOS netdev 5.8 Generic_108528-14 sun4u sparc SUNW,Ultra-5_10

I have tracked two crashes. Here is the info for the first one (note that the two are slightly different):

echo '$c' | adb -k unix.1 vmcore.1:
physmem 173a7
panicsys(104234b0,1040c198,10050068,78002000,57542400,c) + 44
vpanic(10050068,1040c198,16e76a3d8cac,10,30000689ea8,30000068438) + cc
panic(10050068,804,1,1041a798,fffd,20) + 1c
sync_handler(1041a980,10400000,0,0,0,2) + 150
prom_rtt(10000000,16,f0000000,16e7332a6da9,0,2)
client_handler(f0066d2c,2a10007d6e8,1,104283d8,1,1041a980) + 2c
prom_enter_mon(0,6,b,2a10004bd40,2a10007dd40,0) + 28
debug_enter(0,16e73315c8c5,16e73315c8c9,0,30000ddf1e8,0) + d0
kbdinput(1045a400,4d,30000689d68,300001b5000,0,1013dd4c) + 304
kbdrput(30000adabe8,30000f7e340,30000ad3a98,30000f7e340,30000689d68,30000ad3a20) + 13c
putnext(30000adae48,30000ad9a90,30000adb0a8,30000f7e340,0,0) + 1cc
async_softint(30000f7e340,1,ffff,20000,0,30000adae48) + 568
asysoftintr(3000017a008,30000b7e000,1,2a10007dd40,10180,1026fba8) + 70
intr_thread(2a10001fd40,1041b180,10423890,10423890,0,0) + a4
idle(1040f864,0,0,1041b180,3000005d6c8,0) + 54
thread_start(0,0,0,0,0,0) + 4

/var/adm/messages from this one:
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 932869 kern.warning] WARNING: [AFT1] EDP event on CPU0 Data access at TL=0, errID 0x00015289.afcae2ba
Apr 12 17:59:18 netdev AFSR 0x00000000.80400080<PRIV,EDP> AFAR 0x00000000.3d41fa68
Apr 12 17:59:18 netdev AFSR.PSYND 0x0080(Score 95) AFSR.ETS 0x00 Fault_PC 0x10031cc8
Apr 12 17:59:18 netdev UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 683009 kern.info] [AFT2] errID 0x00015289.afcae2ba PA=0x00000000.3d41fa68
Apr 12 17:59:18 netdev E$tag 0x00000000.0003cf50 E$State: Modified E$parity 0x03 Badlines found=6
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000000.10041eb0
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.10041eb4
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.0247e008
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00000000.10423890
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.10041eb0
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 989652 kern.info] [AFT2] E$Data (0x28): 0x80000000.00000000 *Bad* PSYND=0x0080
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x000002a1.000b7d20
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 601312 kern.info] [AFT2] errID 0x00015289.afcae2ba AFAR was derived from E$Tag
Apr 12 17:59:18 netdev unix: [ID 836849 kern.notice]
Apr 12 17:59:18 netdev ^Mpanic[cpu0]/thread=2a10007dd20:
Apr 12 17:59:18 netdev unix: [ID 455523 kern.notice] [AFT1] errID 0x00015289.afcae2ba EDP Error(s)
Apr 12 17:59:18 netdev See previous message(s) for details
Apr 12 17:59:18 netdev unix: [ID 100000 kern.notice]
Apr 12 17:59:18 netdev genunix: [ID 723222 kern.notice] 000002a10007d200 SUNW,UltraSPARC-IIi:cpu_aflt_log+4e0 (2a10007d2be, 1, 101483a0, 2a10007d448, 2a10007d30b, 101483c8)
Apr 12 17:59:19 netdev genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 000002a10007d510 0000000000000003 0000000000000010
Apr 12 17:59:19 netdev %l4-7: 0000000000200000 0000000000400000 0000000000000000 000002a10001f9c0
Apr 12 17:59:19 netdev genunix: [ID 723222 kern.notice] 000002a10007d450 SUNW,UltraSPARC-IIi:cpu_async_error+868 (1, 2a10007d510, 80400080, 0, 640000080400080, 2a10007d6d0)
Apr 12 17:59:19 netdev genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000000032 0000000000000000 0000000000000000
Apr 12 17:59:19 netdev %l4-7: 0000000000000219 0000000000000000 000003000005d748 0000000000000000
Apr 12 17:59:19 netdev genunix: [ID 723222 kern.notice] 000002a10007d620 unix:prom_rtt+0 (300001b2000, 8000000000000000, a, a, 0, 0)
Apr 12 17:59:19 netdev genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000001400 0000000000001600 000000001013fb54
Apr 12 17:59:19 netdev %l4-7: 0000030000697ea0 0000000000000001 000000000000000a 000002a10007d6d0
Apr 12 17:59:19 netdev genunix: [ID 723222 kern.notice] 000002a10007d770 genunix:callout_schedule_1+4 (300001b2000, 10443508, 300001b5000, 10072cf4, 0, 101424b0)
Apr 12 17:59:20 netdev genunix: [ID 179002 kern.notice] %l0-3: 0000000000000008 0000000000000002 0000000000000001 000000001041b718
Apr 12 17:59:20 netdev %l4-7: 000000001041b338 0000000000000016 000000001041baf8 000002a10007d7b0
Apr 12 17:59:20 netdev genunix: [ID 723222 kern.notice] 000002a10007d820 genunix:callout_schedule+54 (104391fc, 1, 10439178, 8, 1, 300000683c8)
Apr 12 17:59:20 netdev genunix: [ID 179002 kern.notice] %l0-3: 00000000100d312c 0000030000cec000 0000030000d79602 0000030000cec000
Apr 12 17:59:20 netdev %l4-7: 000003000188f040 0000000000000000 000003000148af00 000002a10051dba0
Apr 12 17:59:20 netdev genunix: [ID 723222 kern.notice] 000002a10007d8d0 genunix:clock+474 (1045a800, 1041b338, 1042dc00, 94f476874837, 0, 0)
Apr 12 17:59:20 netdev genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000000000001 000002a10007dd20 0000000000000000
Apr 12 17:59:20 netdev %l4-7: 000000001045a000 000000003b9aca00 000000001041baf8 00000000fed3a004
Apr 12 17:59:20 netdev genunix: [ID 723222 kern.notice] 000002a10007d9a0 genunix:cyclic_softint+a4 (1041b338, 30000057928, 1, 3, 30000068478, 10073f0c)
Apr 12 17:59:20 netdev genunix: [ID 179002 kern.notice] %l0-3: 0000030000057930 800000000237f894 0000000000000000 0000030000068478
Apr 12 17:59:20 netdev %l4-7: 00000300000578c8 000003000068dea8 0000000000000000 000003000068ded0
Apr 12 17:59:21 netdev genunix: [ID 723222 kern.notice] 000002a10007da60 unix:cbe_level10+8 (0, 803, 1041b338, 2a10007dd20, 10060, 1000b34c)
Apr 12 17:59:21 netdev genunix: [ID 179002 kern.notice] %l0-3: 00000000102e4934 0000000000000001 0000000000000001 0000030000070ed8
Apr 12 17:59:21 netdev %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Apr 12 17:59:21 netdev unix: [ID 100000 kern.notice]
Apr 12 17:59:21 netdev genunix: [ID 672855 kern.notice] syncing file systems...
Apr 12 17:59:21 netdev genunix: [ID 904073 kern.notice] done
Apr 12 17:59:22 netdev genunix: [ID 353387 kern.notice] dumping to /dev/dsk/c0t0d0s1, offset 322174976
Apr 12 17:59:22 netdev uata: [ID 606412 kern.warning] WARNING: timeout: reset bus chno = 0 targ = 0
Apr 12 17:59:38 netdev genunix: [ID 409368 kern.notice] ^M100% done: 8116 pages dumped, compression ratio 3.96,
Apr 12 17:59:38 netdev genunix: [ID 851671 kern.notice] dump succeeded

And now for the second crash:

echo '$c' | adb -k unix.0 vmcore.0:
physmem 173a7
panicsys(104234b0,1040c198,10050068,78002000,39ff00,c) + 44
vpanic(10050068,1040c198,faabfb648,10,30000689ea8,30000068438) + cc
panic(10050068,804,1,1041a798,fffd,20) + 1c
sync_handler(1041a980,10400000,0,0,0,2) + 150
prom_rtt(10000000,16,f0000000,f810ca9c6,0,2)
client_handler(f0066d2c,2a10007d6e8,1,104283d8,1,1041a980) + 2c
prom_enter_mon(0,6,b,2a10004bd40,2a10007dd40,0) + 28
debug_enter(0,f80db6987,f80db698a,0,30001092020,0) + d0
kbdinput(1045a400,4d,30000689d68,300001b5000,0,1013dd4c) + 304
kbdrput(30000adabe8,3000108f080,30000ad3a18,3000108f080,30000689d68,30000ad39a0) + 13c
putnext(30000adae48,30000ad9a90,30000adb0a8,3000108f080,0,0) + 1cc
async_softint(3000108f080,1,ffff,20000,0,30000adae48) + 568
asysoftintr(3000017a008,30000b7e000,1,2a10007dd40,10180,1026fba8) + 70
intr_thread(2a10001fd40,1041b180,10423890,10423890,0,0) + a4
idle(1040f864,0,0,1041b180,3000005d6c8,0) + 54
thread_start(0,0,0,0,0,0) + 4

/var/adm/messages leading up to the reboot:
Apr 24 12:20:07 netdev SUNW,UltraSPARC-IIi: [ID 370172 kern.warning] WARNING: [AFT1] EDP event on CPU0 Instruction access at TL=0, errID 0x0001d01e.baad443a
Apr 24 12:20:07 netdev AFSR 0x00000000.004000f0<EDP> AFAR 0xffffffff.ffffffff
Apr 24 12:20:07 netdev AFSR.PSYND 0x00f0(Score 45) AFSR.ETS 0x00 Fault_PC 0x97560
Apr 24 12:20:07 netdev UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Apr 24 12:20:07 netdev SUNW,UltraSPARC-IIi: [ID 798591 kern.info] [AFT2] errID 0x0001d01e.baad443a No error found in ecache (No fault PA available)
Apr 24 12:20:07 netdev unix: [ID 836849 kern.notice]
Apr 24 12:20:07 netdev ^Mpanic[cpu0]/thread=3000165a440:
Apr 24 12:20:07 netdev unix: [ID 424580 kern.notice] [AFT1] errID 0x0001d01e.baad443a EDP Error(s)
Apr 24 12:20:07 netdev See previous message(s) for details
Apr 24 12:20:08 netdev unix: [ID 100000 kern.notice]
Apr 24 12:20:08 netdev genunix: [ID 723222 kern.notice] 000002a1005dd6d0 SUNW,UltraSPARC-IIi:cpu_aflt_log+4e0 (2a1005dd78e, 1, 101483a0, 2a1005dd918, 2a1005dd7db, 101483c8)
Apr 24 12:20:08 netdev genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 000002a1005dd9e0 0000000000000003 0000000000000010
Apr 24 12:20:08 netdev %l4-7: 0000000000200000 0000000000400000 0000000000000001 0000000000000080
Apr 24 12:20:08 netdev genunix: [ID 723222 kern.notice] 000002a1005dd920 SUNW,UltraSPARC-IIi:cpu_async_error+868 (1, 2a1005dd9e0, 4000f0, 0, 1400000004000f0, 2a1005ddba0)
Apr 24 12:20:08 netdev genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 000000000000000a 0000000000000000 0000000000000000
Apr 24 12:20:08 netdev %l4-7: 0000000000004208 0000000000000000 00000000007fbdd0 0000000000000084
Apr 24 12:20:08 netdev unix: [ID 100000 kern.notice]
Apr 24 12:20:08 netdev genunix: [ID 672855 kern.notice] syncing file systems...
Apr 24 12:20:09 netdev genunix: [ID 733762 kern.notice] 1
Apr 24 12:20:10 netdev genunix: [ID 904073 kern.notice] done

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Thu Apr 25 09:53:24 2002
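For anyone comparing the two panics in this thread: both log the Asynchronous Fault Status Register (AFSR), and the bit Solaris labels EDP is set in both values, while PRIV appears only in the first. The Python sketch below decodes just those fields. It is an illustration, not a Sun tool: the function names are hypothetical, and the bit positions (EDP = bit 22, PRIV = bit 31, PSYND in the low 16 bits) are assumptions inferred from the two logged values and their <PRIV,EDP> / <EDP> annotations rather than taken from the UltraSPARC-IIi manual.

```python
# Hypothetical helpers (not part of Solaris): decode the AFSR fields that can be
# read off the two panics in this thread. Bit positions are ASSUMPTIONS inferred
# from the logged values and their <PRIV,EDP> / <EDP> annotations.
PRIV_BIT = 1 << 31    # 0x80000000: set only in the Apr 12 AFSR (<PRIV,EDP>)
EDP_BIT = 1 << 22     # 0x00400000: set in both AFSRs (Ecache Data Parity)
PSYND_MASK = 0xFFFF   # assumed: parity syndrome in the low 16 bits (AFSR.PSYND)


def parse_afsr(text: str) -> int:
    """Turn Solaris' 'hi.lo' hex form, e.g. '0x00000000.80400080<PRIV,EDP>', into an int."""
    s = text.strip().lower().split("<", 1)[0]  # drop the <...> annotation, if pasted
    if s.startswith("0x"):
        s = s[2:]
    hi, lo = s.split(".")
    return (int(hi, 16) << 32) | int(lo, 16)


def decode_afsr(afsr: int) -> dict:
    """Report which of the assumed flag bits are set, plus the parity syndrome."""
    flags = []
    if afsr & PRIV_BIT:
        flags.append("PRIV")
    if afsr & EDP_BIT:
        flags.append("EDP")
    return {"flags": flags, "psynd": afsr & PSYND_MASK}


if __name__ == "__main__":
    for label, raw in [("Apr 12", "0x00000000.80400080<PRIV,EDP>"),
                       ("Apr 24", "0x00000000.004000f0<EDP>")]:
        info = decode_afsr(parse_afsr(raw))
        print(label, info["flags"], hex(info["psynd"]))
    # Apr 12 ['PRIV', 'EDP'] 0x80
    # Apr 24 ['EDP'] 0xf0
```

Only the flag bits named in the logs are decoded; any other AFSR bits are deliberately ignored, so this is a quick consistency check on the pasted values, not a full fault decoder.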
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:42:41 EST