Sorry for the late summary. Thanks to Michael, Joe, Grzegorz, John, and David.

To find the DIMM you replaced:

a) Interesting. I do the following:

   1. Connect a Unix or Windows box to ttya (on a Sun, run "tip hardwire";
      this connects from ttyb on that machine to your server, where you
      have cabled to ttya).
   2. Halt the system.
   3. Set the OBP values (this varies from system to system): diag-switch?
      to true, diag-level to max, mfg-mode to true or on, and then power
      off the server completely.
   4. Power on.
   5. Record the output; this takes about 2-5 minutes.

b) I realized this when I was shipping back my swapped-out DIMM. If you
   happen to have the packing material they sent you, it has the serial
   number on the shipping label and also on the anti-static cover. Compare
   this against the SN on the DIMM. My observation is that the SN is the
   part number followed by more characters (SN=PartNumber"*").
   Ex: 501503078B3F212. Here PN=5015030.

I used the above to at least be sure that the old DIMM was replaced. However, enlightened by forum members, I am now sure that it is the fault of the system. I will change the Uniboard and memory if I start getting panics or a large number of errors per day. For now I don't want to mess around with the system.

I am adding these emails so that they will be helpful to others.

Michael:

If you replaced the memory module that was in the dmesg output, etc., you only replaced the memory module that was reporting the error. The actual error could come from any memory module in that specific memory bank. The Sun System Handbook has a diagram of the system board with the slot numbers. Of course, other parts (like the system board) could be failing, though that is not likely. I AM SURPRISED THAT SUN DID NOT OFFER TO REPLACE ALL FOUR.

Joe:

Get SUN to replace the system board. The construction of many of the Uniboards was not all it should have been. I've lost count of the ones I've replaced in V880s. SUN's UltraIII kit has a history of memory and CPU problems. There was an advisory issued last year which actually admitted to the manufacturing problems. I'll forward it to you if I can find it.
If you look in the archives you'll find an old post of mine where I did a survey of people with V880s: an 80% failure rate, based on the responses I received. SUN's standard response to anything like this is "patch/update firmware, then send us an explorer". Send them the explorer by all means, but insist that you want the board replaced. If they've already replaced the supposedly bad DIMM and the error persists, then the problem is either with the entire bank or with the board.

Grzegorz:

If this happens with no side effects, it means that you have a non-fatal memory error, i.e. a one-bit error (intermittent or persistent). The memory is ECC, so the system can correct the error. The only pain would be if you had so many such errors that corrections happened too often. SUN has its own "standards" about that (some 30 ECC errors during a 6-hour period or so). The periodic messages are rather the result of an internal watchdog checking for memory errors ... Depending on an algorithm unknown to me, it periodically (e.g. every 12 hours) looks into the offending memory banks. I have such behaviour also :-)

Well, maybe this is the same problem I had half a year ago ... I had panics and reboots because of memory errors. A SUN engineer replaced a few offending memory modules according to the logs in the messages file and also according to extensive power-on tests (OBP). After a week or so I got the same problems again, also pointing to the just-replaced memory modules. Then the SUN engineer replaced all (!!) memory modules on the CPU/memory board. He told me that a long series of memory modules for V880 servers produced in 2002 (or maybe a little earlier or later) had an inconsistent chip on the memory module, the chip responsible for reporting errors (he called it the "Philips chip", if I remember correctly). This inconsistency means that an error on one module may be reported by another module ... That was the case for my V880, produced in June 2002 (we got it in August 2002).
After replacement of all the memory modules, the panics and reboots went away :-) SUN has an internal problem report (PR) on this. One can check whether this is the problem by looking at the memory module, and especially at the name and numbers on the chip ... What is interesting is that the problem appears only in the case of the V480/880. The same memory modules put in other servers behave normally ...

Hope this helps.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

> Hello Gurus,
>
> As I was getting persistent errors on one of the DIMMs for a while,
> last week I replaced one of the DIMMs on our V880 server. The DIMM was
> shipped by Sun. After a week, I am getting errors again on the same
> DIMM. This makes me worry: "Did I put it in the wrong place?" The only
> thing to reduce my grief is that the error moved to a different bit.
> Earlier, I got the error consistently on bit 5; now it has moved to
> bit 35. I am sure I put it in the right CPU slot. The most probable
> mistake I could have made is putting the DIMM in the DIMM slot
> above/below. If I had a chance (which is not likely to happen) to open
> the box, is there any number on the new DIMM I can look for? I wish I
> had put some kind of mark on the new DIMM. I feel I am being too
> imaginative.
>
> I will summarize.
>
> Thank you,
> J.

----------------------------------------------------------------

Thank you,
J.

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Fri May 13 09:51:21 2005
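
The serial-number check from point (b) of the summary can be sketched in a few lines of Python. This is only an illustration of the poster's observation that the serial number appears to start with the part number; it is based on a single DIMM and is not a documented Sun convention. The function names serial_matches_part and same_module are hypothetical.

```python
def serial_matches_part(serial: str, part_number: str) -> bool:
    """True if the serial looks like the part number plus extra characters.

    Assumption (from one observed DIMM only): SN = PartNumber + suffix.
    """
    serial = serial.strip().upper()
    part_number = part_number.strip().upper()
    return (
        bool(part_number)
        and len(serial) > len(part_number)
        and serial.startswith(part_number)
    )


def same_module(label_serial: str, dimm_serial: str) -> bool:
    """Compare the SN on the shipping label with the SN read off the DIMM."""
    return label_serial.strip().upper() == dimm_serial.strip().upper()


if __name__ == "__main__":
    label_sn = "501503078B3F212"  # example serial quoted in the summary
    print(serial_matches_part(label_sn, "5015030"))
    print(same_module(label_sn, " 501503078b3f212 "))
```

If same_module returns False for the label from the replacement shipment and the DIMM now in the slot, the new module probably went into a different slot than intended.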