It took a while, but I finally found the source of the problem. My best guess is a power-management bug that appears when more than 4GB of RAM is installed.

I tried various core analysis tools, including this excellent free Sun core analysis tool,
http://wwws.sun.com/software/download/products/3fce7df0.html
and this one,
http://sunsolve.sun.com/diag/iscda/iscda.sh
but they were not very useful in my case without more knowledge of how the kernel works. They appear to be powerful tools in the hands of someone who understands kernels in general.

I used low-tech methods to find my problem: duplicate the problem on another system and compare the systems using diff. I moved the RAM to a second Blade 1000 and waited for a crash; when it did not crash, I upgraded the OpenBoot version on the Blade 1000. It seems that the latest OBP version creates this variable and turns it on by default:

energystar-enabled?=true

I assume it triggers something in the power management daemon, powerd, that causes the panic. Setting the variable to false or disabling powerd stops the crashes. I did not find any Sun bug or patch reported against powerd/panic, but I did not look that hard.

Thanks to Joe Fletcher, Jeff Cole and Casper Dik for some suggestions.

Marcelino

> -----Original Message-----
>From: Marcelino Mata
>Sent: Monday, May 31, 2004 5:26 PM
>To: Sunmanagers (E-mail)
>Subject: kernel panics (BAD TRAP)
>
>
>Running Solaris 8 with April 04 patch cluster.
>
>I have upgraded a Blade 1000 from 1.5Gb to 5Gb several months
>ago and I started to get serious kernel panics on a regular
>basis (every 40-75 minutes). The computer is fine when it is
>under any significant or heavy load but once it becomes idle
>for awhile, the kernel panics. The problem seems to happen
>the first 15 minutes after the user logs out and then it keeps
>recurring until the system no longer boots due to corrupted
>/etc/path_to_inst file. I am not 100% certain that the
>problem started with the additional 5GB RAM.
>It has been so
>long since I have started experimenting with the root cause of
>the problem that I vaguely remember that the computer did have
>several vmcore files. At the time, I did not have a serious
>problem and I was getting low on disk space so I deleted the
>original vmcore files (each vmcore is 400Mb). At one point
>the computer was fine for about one week so I thought I had
>the problem licked. I never figured out why it was fine for
>several days. SunVTS 4.6 produced no hardware problems.
>
>I tried doing some core analysis and initially thought the
>problem was related to fsflush operations. I performed the
>analysis based on the information from this site
>http://www.princeton.edu/~psg/unix/solaris/troubleshoot/adbcore.html.
>However, checking other core files, I see references to other
>processes. I am either not doing the analysis correctly or
>the problem is not being reflected in the core file (is this
>possible?).
>
>I ran across a posting for core analysis which says to do the
>following and forward the information to the poster.
>
>cd /var/crash/`uname -n`
>echo '$<msgbuf' | adb -k unix.4 vmcore.4
>echo '$c' | adb -k unix.4 vmcore.4
>
> Can anyone explain what this is showing or give advice on
>what to do next?
>
>Marcelino
>
># echo '$<msgbuf' | adb -k unix.4 vmcore.4
>physmem 9b34e
>30001fdc223: /pci@8,700000/usb@5,3/keyboard@1 (hid0) online
>3000206dee2: PCI-device: SUNW,XVR-500@1, ifb0
>3000206dac3: ifb0 is /pci@8,700000/SUNW,XVR-500@1
>3000206d800: cpu0: UltraSPARC-III+ (portid 0 impl 0x15 ver 0x22 clock 900 MHz)
>3000206d540: se0 at ebus0: offset 1,400000
>3000206d283: se0 is /pci@8,700000/ebus@5/serial@1,400000
>3000206cfc2: PCI-device: firewire@5,2, hci13940
>3000206cd03: hci13940 is /pci@8,700000/firewire@5,2
>3000206ca42: PCI-device: network@5,1, eri0
>3000206c783: eri0 is /pci@8,700000/network@5,1
>3000206c203: dump on /dev/dsk/c1t1d0s1 size 2048 MB
>30005a2bc42: pseudo-device: devinfo0
>30005a2b983: devinfo0 is /pseudo/devinfo@0
>30005a2b55f: SUNW,eri0 : 100 Mbps full duplex link up
>30005a2b140: /pci@8,700000/scsi@6 (glm0): glm0 supports power management.
>30005a2ae80: /pci@8,700000/scsi@6 (glm0): Rev. 7 Symbios 53c875 found.
>30005a2abc2: PCI-device: scsi@6, glm0
>30005a2a903: glm0 is /pci@8,700000/scsi@6
>30005a2a640: /pci@8,700000/scsi@6,1 (glm1): glm1 supports power management.
>30005a2a380: /pci@8,700000/scsi@6,1 (glm1): Rev. 7 Symbios 53c875 found.
>30005a2a0c2: PCI-device: scsi@6,1, glm1 >30005b95d83: glm1 is /pci@8,700000/scsi@6,1 >30005b95ac0: sd6 at glm0: target 6 lun 0 >30005b95803: sd6 is /pci@8,700000/scsi@6/sd@6,0 >30005b95280: /pci@8,700000/scsi@6,1/st@5,0 (st12): ><Vendor 'COMPAQ ' Product 'SDX-300C '> >30005b94fc0: st12 at glm1: target 5 lun 0 >30005b94d03: st12 is /pci@8,700000/scsi@6,1/st@5,0 >30005b94780: ecpp0 at ebus0: offset 1,300278 >30005b944c3: ecpp0 is /pci@8,700000/ebus@5/parallel@1,300278 >30005b94200: scmi2c0 at ebus0: offset 0,40 >30005bd1f03: scmi2c0 is >/pci@8,700000/ebus@5/i2c@1,30/card-reader@0,40 >30005bd1c42: pseudo-device: winlock0 >30005bd1983: winlock0 is /pseudo/winlock@0 >30005bd16c2: pseudo-device: lockstat0 >30005bd1403: lockstat0 is /pseudo/lockstat@0 >30005bd1142: pseudo-device: vol0 >30005bd0e83: vol0 is /pseudo/vol@0 >30005bd0bc3: upa64s0 at root: SAFARI 0x8 0x480000 >30005bd0902: pseudo-device: llc10 >30005bd04e3: llc10 is /pseudo/llc1@0 >30005bd00c0: audiocs0 at ebus0: offset 1,200000 >30005bc3d83: audiocs0 is /pci@8,700000/ebus@5/audio@1,200000 >30005bc3ac2: pseudo-device: pm0 >30005bc3803: pm0 is /pseudo/pm@0 >30005bc3542: pseudo-device: tod0 >30005bc3283: tod0 is /pseudo/tod@0 >30005bc2e62: pseudo-device: lofi0 >30005bc2ba3: lofi0 is /pseudo/lofi@0 >30005bc28e2: pseudo-device: fcp0 >30005bc2623: fcp0 is /pseudo/fcp@0 >30005bc2360: fcip attach for port instance (0x0) successful >30005bc20a2: PCI-device: pci108e,7063@2, sunpci2drv0 >30005d0bda3: sunpci2drv0 is /pci@8,700000/pci108e,7063@2 >30005d0bae2: pseudo-device: fssnap0 >30005d0b823: fssnap0 is /pseudo/fssnap@0 >30005d0b560: gpio_873170 at ebus0: offset 1,300600 >30005d0b2a3: gpio_873170 is /pci@8,700000/ebus@5/gpio@1,300600 >30005d0ad22: pseudo-device: devinfo0 >30005bc2d03: devinfo0 is /pseudo/devinfo@0 >30005d0b140: /pci@8,700000/scsi@6,1/st@5,0 (st12): ><Vendor 'COMPAQ ' Product 'SDX-300C '> >30005bc33e0: st12 at glm1: target 5 lun 0 >30005bc3963: st12 is /pci@8,700000/scsi@6,1/st@5,0 >30005bc3c20: ecpp0 
at ebus0: offset 1,300278 >30005bd0223: ecpp0 is /pci@8,700000/ebus@5/parallel@1,300278 >30005bc2a40: scmi2c0 at ebus0: offset 0,40 >30005bd0a63: scmi2c0 is >/pci@8,700000/ebus@5/i2c@1,30/card-reader@0,40 >30005bc3ee2: pseudo-device: lockstat0 >30005bd0fe3: lockstat0 is /pseudo/lockstat@0 >30005bd0642: pseudo-device: llc10 >30005bd1563: llc10 is /pseudo/llc1@0 >30005bd0d20: audiocs0 at ebus0: offset 1,200000 >30005bd1ae3: audiocs0 is /pci@8,700000/ebus@5/audio@1,200000 >30005bd12a2: pseudo-device: tod0 >30005b940a3: tod0 is /pseudo/tod@0 >30005bd1822: pseudo-device: lofi0 >30005b94623: lofi0 is /pseudo/lofi@0 >30005bd1da0: fcip attach for port instance (0x0) successful >30005b948e2: PCI-device: pci108e,7063@2, sunpci2drv0 >30005b95123: sunpci2drv0 is /pci@8,700000/pci108e,7063@2 >30005b94362: pseudo-device: fssnap0 >30005b95963: fssnap0 is /pseudo/fssnap@0 >30005b94e60: gpio_873170 at ebus0: offset 1,300600 >30005b95ee3: gpio_873170 is /pci@8,700000/ebus@5/gpio@1,300600 >30005b953e2: pseudo-device: devinfo0 >30005a2a4e3: devinfo0 is /pseudo/devinfo@0 >30005bc36a0: >panic[cpu0]/thread=30006cf0000: > >30005d0b980: BAD TRAP: type=34 rp=2a10093d0b0 >addr=baddcafebaddcafe mmu_fsr=0 >30005d0bc40: >30005a2b2a0: mibiisa: >30005d0b400: alignment error: >30005a2bae0: addr=0xbaddcafebaddcafe >30005a2b820: pid=420, pc=0x100f5288, sp=0x2a10093c951, >tstate=0x4480001606, context=0x8b8 >30005a2afe0: g1-g7: 1043ac00, 0, 30005b96fc8, 5, 1, 0, 30006cf0000 >30005b94a40: >3000206d6a3: 000002a10093cde0 unix:die+a4 (34, 2a10093d0b0, >baddcafebaddcafe, 0, 2a10093d0b0, 0) >3000206d3e3: %l0-3: 0000000000000000 0000030006c9b4d0 >0000000000000005 0000030006cf970a > %l4-7: 0000030006cf9688 0000030006c9b4c8 0000030006c9b448 >0000030006cf942830005a2a223: 000002a10093cec0 unix:trap+5d0 >(baddcafebaddcafe, 0, 80000d, 10000, 2a10093d0b0, 0) >30005a2b403: %l0-3: 000003000002cec0 0000030006a44240 >0000030006a9eaa0 0000000000000000 %l4-7: 0080000d00000034 >0000030006ce35380000000000010000 
0000000000000000 >3000206ce63: 000002a10093d000 unix:prom_rtt+0 >(baddcafebaddcafe, 0, b8, 3000002cb40, 1, 300000ed618) >30005bc24c3: %l0-3: 0000000000000007 0000000000001400 >0000004480001606 000000001002bf0c > %l4-7: 00000000ff095c68 0000000000000000 0000000000000000 >000002a10093d0b0 >3000007fda3: 000002a10093d150 genunix:build_sqlist+40 >(30006cf8508, 30006c9b2c0, 0, 30006cf8508, b8, 0) >3000206cba3: %l0-3: 0000030006cf8650 0000000000020000 >0000000000000000 0000000000000000 > %l4-7: 00000000000000b0 00000000104121a0 0000000000000000 >0000000000000000 >30005a2bda3: 000002a10093d200 genunix:removeq+184 >(30006cf8508, 30006cf8508,30006cf86c8, 0, 30006cf85e8, 30006c9b2c0) >30005bc2fc3: %l0-3: 0000030006cf8508 0000000000020000 >0000000000000001 0000 >000000007fff %l4-7: 0000000000000000 0000000000000000 >0000030006ce36b0 0000000000000000 >30005d0afe3: 000002a10093d2b0 udp:udp_close+8 (30006cf8508, >30006ca0458, 30001243f28, f500, 4400, 30006cf86c8) >30005b956a3: %l0-3: 0000000000000400 0000000000000000 >0000000000000100 0000 >030006cf8710 %l4-7: 0000000000007fff 000000000000008f >0000030006ce2580 000002a100951af0 >30005a2ad23: 000002a10093d360 genunix:qdetach+90 (4400, >30001243f28, 3, 30006cf85e8, 0, 30006cf8508) >3000007f983: %l0-3: 000000001025a4c4 0000000000000003 >0000030006cf9688 0000 >000000000000 %l4-7: 0000000000007fff 000000000000008f >00000300067af0e0 000002a100957af0 >3000206c8e3: 000002a10093d410 genunix:strclose+3c8 >(30006a44148, 0, 30001243f28, 3, 30006cf8e88, 200000) >30005a2a7a3: %l0-3: 000000001046ec58 0000030006c9b348 >0000000000000005 0000030006cf8fea > %l4-7: 0000030006cf8f68 0000030006c9b340 0000030006c9b2c0 >0000030006cf85e8 >30005d0ae83: 000002a10093d4e0 specfs:device_close+8c >(30006a44248, 3, 300000051, 30001243f28, 0, 0) >30001fdc383: %l0-3: 0000000000000000 0000030006c9b4d0 >0000000000000005 0000030006cf970a %l4-7: 0000030006cf9688 >0000030006c9b4c8 0000030006c9b448 0000030006cf9428 >30001fdc7a3: 000002a10093d590 specfs:spec_close+124 
(fc00, >30001243f28, 300000051, 3, 100, 0) >30001fdcbc3: %l0-3: 0000030006a44248 0000030006a44240 >0000030006a44228 0000030006a44140 %l4-7: 0000030006cedea0 >0000030006cedea0 0000000000002000 0000000000000000 >30001fdcfe3: 000002a10093d640 genunix:closef+58 (1046f000, >30006bb2738, 0, 30006a44248, 1, 300000ed618) >30001fdd403: %l0-3: 000000001015be40 0000000010474ee0 >0000000000000040 0000030006ce0de0 > %l4-7: 00000000ff095c68 0000000000000000 0000030006ce2210 >000000000000000030001fdd823: 000002a10093d6f0 >genunix:closeall+30 (2, 30006aa6070, 20, 30006a9f428, 2c7cf00, 0) >30001fddc43: %l0-3: 0000030000031dc0 000000001013ddd0 >0000000000000000 0000 >000000000000 %l4-7: 00000000000000b0 00000000104121a0 >0000000000000000 0000000000000000 >300002180a3: 000002a10093d7a0 genunix:proc_exit+2bc >(30006cfbc58, 1041c2c8, 30006ce3538, 30006c743d8, a, 2) >300002184c3: %l0-3: 000000000000000d 0000030006cf0000 >0000030006a9eaa0 0000000000000000 > %l4-7: 0000000000000000 0000000000000000 0000030006ce36b0 >0000000000000000 >300002188e3: 000002a10093d850 genunix:exit+8 (2, a, 48, a, 2, 0) >30000218d03: %l0-3: 0000000000000000 0000030006a9eaa0 >0000000000000200 0000 >030006ce3538 %l4-7: 000000000000000a 0000030006a9ec08 >000000000000000a 0000000000000200 >30000219123: 000002a10093d900 unix:trap_cleanup+1cc >(2a10093daa0, 1, 30006ce3 >538, 30006a9eaa0, 2a10093dba0, ffffffffc0226008) >30000219543: %l0-3: 0000000000000004 000000000000003e >0000000000000000 0000 >000000000028 %l4-7: 0000000000000011 0000000000060e00 >0000000000044968 000000000005dfa0 >30000219963: 000002a10093d9b0 unix:trap+16e8 (a9, >2a10093daf0, 800005, 10000, 2a10093dba0, 0) >30000219d83: %l0-3: 00000000ff1427f0 0000000000000000 >0000030006a9eaa0 0000000000000005 > %l4-7: 0080000500010034 0000030006ce3538 0000000000010000 >0000000000000000 >30000238220: >30000238643: syncing file systems... 
>30000238a63: 3
>30000238e83: 3
>300002392a3: 1
>300002396c3: done
>30000239ae3: dumping to /dev/dsk/c1t1d0s1, offset 429588480
>
># echo '$c' | adb -k unix.0 vmcore.0
>physmem 9b350
>panicsys(10423d20,2a100ac5128,10053a98,78002000,30006c16018,0) + 44
>vpanic(10053a98,2a100ac5128,1,1a,8,8) + cc
>panic(10053a98,31,2a100ac5490,20393735000028,0,0) + 1c
>die(31,2a100ac5490,20393735000028,0,2a100ac5490,d05ea028) + a4
>trap(20393735000000,1,5,0,2a100ac5490,0) + 8b8
>sfmmu_tsb_miss(10428dc8,0,3000004df88,0,3000004df88,19) + 66c
>prom_rtt(20393735000000,0,8,300051a6b70,1000c19c,0)
>rm_assize(af2c,1,20393735000000,1fff,1042fe90,30002024d08) + 188
>prgetpsinfo(30002024d08,2a100ac57d0,30006c16018,2a100ac57d0,30006c16018,30002032d90) + 38c
>pr_read_psinfo(30006c16018,2a100ac5a28,3000634a1d8,30007777390,30007158168,30007158000) + 3c
>read(0,0,2001,30001fd4708,9,1a0) + 25c
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Tue Jun 22 23:03:59 2004
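
The original message asks what the '$<msgbuf' output is showing. Most of it is boot-time device messages; the part that matters for a panic is the run of stack frames after the BAD TRAP line. The following is only a minimal sketch of one way to pull that call chain out of the noise. The function name panic_frames is hypothetical and not part of adb or Solaris, and the sketch assumes each frame carries one "module:function+hexoffset" token, as the frames in this post do (for example unix:die+a4).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): list the call chain
# found in adb '$<msgbuf' output read on standard input.
# Assumption: each stack frame contains one "module:function+hexoffset"
# token, as in "unix:die+a4" or "udp:udp_close+8".
panic_frames() {
    # Put every whitespace-separated token on its own line (octal 011 is
    # tab, 012 is newline), then keep only tokens shaped like frame names.
    tr ' \011' '\012\012' |
        grep '^[a-z][a-z0-9_]*:[A-Za-z_][A-Za-z0-9_]*+[0-9a-f][0-9a-f]*$'
}

# Example use on the crash directory from the post (unix.4 / vmcore.4):
#   cd /var/crash/`uname -n`
#   echo '$<msgbuf' | adb -k unix.4 vmcore.4 | panic_frames
```

The frames come out in the order the kernel printed them, most recent first. Run against the raw adb output behind the dump in this post (not the '>'-quoted email copy, whose line wrapping glues '>' onto some tokens), it should begin with unix:die+a4 and unix:trap+5d0, which are the panic path itself, followed by genunix:build_sqlist+40, genunix:removeq+184 and udp:udp_close+8, with genunix:proc_exit+2bc further down. That suggests the trap happened while a UDP stream was being closed as the mibiisa process (pid 420) exited.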