[SUMMARY] UltraSparcII Ecache parity errors ["CBI event on CPU1" / "*Bad* PSYND=0x0004"]

From: David Foster <foster_at_dim.ucsd.edu>
Date: Wed Nov 20 2002 - 20:13:55 EST
My apologies; the Manager's List archives were down, so I couldn't
tell that there were already many posts about this.

This is an Ecache parity error on the CPU, a known problem with
UltraSPARC-II CPUs. It can happen, extremely intermittently, when the
CPU is under heavy load; if it happens multiple times, Sun will
replace the CPU under contract support. I just heard from a Sun
engineer that "best practices" is to wait for 3 occurrences. It has
happened once here, so they recommended upgrading to the latest kernel
patch (108528-17 for Solaris 8) and seeing whether it recurs.
Apparently rev -16 included some fixes to prevent spurious CPU
errors.
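Counting occurrences by hand gets tedious on a busy machine. Below is a minimal sketch, assuming the log entries appear one per line in the format quoted further down (for example in /var/adm/messages, the usual Solaris syslog file). It counts distinct errIDs from the "... event on CPU<n>" NOTICE lines, so the follow-up [AFT2] lines that repeat the same errID are not double-counted. The function names and the threshold constant are hypothetical; the threshold simply mirrors the 3-occurrence advice above.

```python
import re
from collections import defaultdict

# Assumption: one syslog entry per line, in the format quoted in this thread, e.g.
# "... NOTICE: [AFT2] errID 0x000644be.021b33e1 CBI event on CPU1"
EVENT_RE = re.compile(r"errID (0x[0-9a-fA-F]+\.[0-9a-fA-F]+) (\w+) event on CPU(\d+)")

# Hypothetical threshold mirroring the "wait for 3 occurrences" advice.
REPLACE_THRESHOLD = 3


def count_events_per_cpu(lines):
    """Map CPU number -> number of distinct errIDs seen in '<kind> event on CPU<n>' lines."""
    seen = defaultdict(set)
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            err_id, _kind, cpu = m.groups()
            seen[int(cpu)].add(err_id)  # a set ignores repeated errIDs
    return {cpu: len(ids) for cpu, ids in seen.items()}


def cpus_to_report(lines, threshold=REPLACE_THRESHOLD):
    """Sorted list of CPUs whose event count has reached the threshold."""
    counts = count_events_per_cpu(lines)
    return sorted(cpu for cpu, n in counts.items() if n >= threshold)
```

For example, count_events_per_cpu(open("/var/adm/messages")) returns a dict mapping each CPU number to its event count, and cpus_to_report lists the CPUs that have reached the threshold. Keying on errID rather than on raw line count is the design choice that keeps one multi-line report from looking like several events.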

Apparently this usually hits CPUs with 8 MB of cache, but sometimes
those with 4 MB as well.

Rant (source anonymous):

   It never ceases to amaze me how well SUN kept the UltraII design
   problems quiet. In effect, virtually a whole year's
   production of chips was broken. A shortcut in the design
   (using parity instead of ECC on the cache) meant that
   thousands of these things had to be replaced. It never
   quite made the news, though, and how loudly did they
   shout about the first Pentium being unable to add up?

Thanks to:

steven.ruby
Ryan Bishop
Will Enestvedt
rene_casalme
Tim Chipman
joe.fletcher

> 
> Can anyone help with this, it doesn't look good...
> 
> Nov 18 17:31:44 cressida SUNW,UltraSPARC-II: [ID 672871 kern.info] NOTICE: [AFT2] errID 0x000644be.021b33e1 CBI event on CPU1
> Nov 18 17:31:44 cressida SUNW,UltraSPARC-II: [ID 192776 kern.info] [AFT2] errID 0x000644be.021b33e1 PA=0x00000000.00565000
> Nov 18 17:31:44 cressida     E$tag 0x00000000.0e40000a E$State: Shared E$parity 0x07
> Nov 18 17:31:44 cressida SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000000.00000000
> Nov 18 17:31:44 cressida SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x08): 0x00000000.00080000 *Bad* PSYND=0x0004
> Nov 18 17:31:44 cressida SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.00000000
> 
> Dave
> 
> 
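The *Bad* E$Data line can be read with a little byte-parity arithmetic. The sketch below assumes (this is not confirmed anywhere in the thread) that the E-cache keeps one parity bit per byte of each 64-bit word, that PSYND sets the bit of every byte whose recomputed parity disagrees with the stored parity, and that byte 0 is the least-significant byte. Under those assumptions, a single flip of bit 19 (the logged 0x00000000.00080000) lands in byte 2 and produces a syndrome of 0x0004, matching the log. The function names are hypothetical.

```python
def byte_parity(value: int) -> int:
    """Return 1 if the low byte of value has an odd number of set bits, else 0."""
    return bin(value & 0xFF).count("1") & 1


def parity_bits(doubleword: int) -> int:
    """One parity bit per byte of a 64-bit word; bit i covers byte i.

    Assumption for illustration: byte 0 is the least-significant byte.
    """
    bits = 0
    for i in range(8):
        bits |= byte_parity(doubleword >> (8 * i)) << i
    return bits


def parity_syndrome(read_data: int, stored_parity: int) -> int:
    """Stored parity XOR parity recomputed from the data; set bits mark bad bytes."""
    return parity_bits(read_data) ^ stored_parity


# Hypothetical scenario: the word was written as all zeros (parity 0x00)
# and bit 19 later flipped, giving the logged data value.
written = 0x0000000000000000
read_back = 0x0000000000080000
print(hex(parity_syndrome(read_back, parity_bits(written))))  # prints 0x4

# Two flips in the same byte (bits 19 and 20) cancel out: syndrome 0, undetected.
print(hex(parity_syndrome(0x0000000000180000, parity_bits(written))))  # prints 0x0
```

The second case shows the limit behind the parity-vs-ECC complaint quoted earlier: per-byte parity can say which byte went bad, but not which bit, so it can detect a single-bit error without correcting it, and an even number of flips within one byte goes unnoticed.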

   << All opinions expressed are mine, not the University's >>

  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
   David Foster    National Center for Microscopy and Imaging Research
    Programmer/Analyst     University of California, San Diego
    dfoster@ucsd.edu       Department of Neuroscience, Mail 0608
    (858) 534-7968         http://ncmir.ucsd.edu/
  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

   "The reasonable man adapts himself to the world; the unreasonable one
   persists in trying to adapt the world to himself.  Therefore, all progress
   depends on the unreasonable."   -- George Bernard Shaw
Received on Wed Nov 20 20:16:47 2002
