SUMMARY: Sunblade 1000 lockups

From: Evan Oulashin <eno_at_bandwidth.net>
Date: Wed Apr 10 2002 - 13:09:23 EDT
Thanks to all who replied regarding the lockup problem I described a
week ago.

I believe there were six replies. Of those, two mentioned this type of
problem with the Sunblade 100, which of course is a completely different
platform and not really relevant here. One manager said they had the
same problem we described, but with a different memory manufacturer.
Another mentioned that they had this problem until they removed the
power management package and deleted unused language support, which
seemed to fix it.

Sean Burke reported:  "The locking happens to us when a user logs out,
normally for the weekend, so when we come back in on Mon morning, we
have several to reboot. We've put it down to mwm and are currently
testing CDE."  Sounds like this may remotely have something to do with
our experience, in that the lockup appears to be more consistent with a
machine at idle, although he did say he had the problem with Sun memory,
not aftermarket.

Long and short of it may just be that we ran into several batches of
questionable memory from this particular vendor, although they're still
not willing to admit that.  Given past experience with the Blade 1000
and this vendor's product working perfectly, I'm not sure what else to
believe, and due to the relatively low number of comments I received,
I'm not sure any conclusions can be drawn here, which might still be a
good thing.

Thanks again to all.

-----Original Message-----
Sent: 05 April 2002 19:00
To: sunmanagers@sunmanagers.org; ericw@bandwidth.net
Subject: Sunblade dies while idle

We've had four different Sunblade 1000s through our shop recently with
varying degrees of unreliable performance, and now believe we have
proven without doubt that this relates to some kind of memory issue.

The machines were all tested with VTS for periods of time ranging from
24 to 96 hours, and all ran without a hitch.  However, once in the
field, users began reporting various failures, lockups, red state
exceptions, etc.

After much testing, we find that if we keep the machines busy with VTS
or some other activity, they rarely, if ever, fail.  However, if left to
idle at the login prompt, they will invariably die with a lockup or hang
of some kind, typically within 3 hours or less, sometimes within 45
minutes.  Once hung, they're unresponsive to console input, and they do
not reply to pings.
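
The idle test described above could be timed from a second machine with
something like the following. This is only a sketch; nothing in this
thread says how the hangs were actually timed. The names ping_once and
time_to_hang, the 60-second interval, and the three-misses-in-a-row rule
are all assumptions made for illustration. It also assumes a ping that
accepts "-c 1" (as on Linux and the BSDs); the Solaris ping takes
different arguments, so the command would need adjusting there.

```python
import subprocess
import time
from typing import Callable, Optional


def ping_once(host: str, timeout_s: int = 5) -> bool:
    """Return True if a single ICMP echo to `host` is answered.

    Assumption: the local ping accepts `-c 1` (Linux/BSD style).
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def time_to_hang(
    host: str,
    probe: Callable[[str], bool] = ping_once,
    interval_s: float = 60.0,
    misses_needed: int = 3,
    max_probes: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Probe `host` until it misses `misses_needed` probes in a row.

    Returns the seconds elapsed from the start of monitoring to the
    first miss of that final run, or None if `max_probes` probes were
    sent without the host being declared hung.
    """
    start = clock()
    first_miss = None
    misses = 0
    sent = 0
    while max_probes is None or sent < max_probes:
        ok = probe(host)
        sent += 1
        if ok:
            # Host answered: forget any earlier isolated misses.
            misses = 0
            first_miss = None
        else:
            if misses == 0:
                first_miss = clock()
            misses += 1
            if misses >= misses_needed:
                return first_miss - start
        sleep(interval_s)
    return None
```

Run against a machine left at the login prompt, e.g.
time_to_hang("blade1"), where "blade1" is a made-up hostname, this would
return the number of seconds the machine stayed reachable. Requiring
several consecutive misses keeps a single dropped packet from being
counted as a hang.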

In our initial testing, we typically didn't just let them idle for long
periods of time; they usually were fired up, and had VTS running on them
within a few minutes after O/S installation.  It was only by accident
one day that we noticed that we could cause the failure to repeat simply
by letting the machine sit at the login prompt.

These failures occurred with a certain brand (one of the leaders in the
aftermarket memory business) of non-Sun memory installed.  When we
remove the non-Sun memory and install Sun-labeled memory, the machines
run perfectly whether idling or busy.  Going back to non-Sun allows us
to reproduce the errors.

Over the last weekend, we ran extensive tests with 3 different sets of
this manufacturer's memory, all of which acted the same.  Two different
sets of Sun memory both were perfectly reliable when swapped in and out
between runs with the non-Sun.  We have a listing of 17 tests which
clearly demonstrates the pattern. In NO case did we have a single
failure while running Sun memory.

Has anyone else ever run across this?  The vendor of course maintains
they've never heard of any troubles like this and swears there's nothing
wrong with their memory.  We're of the mind that others must have had
similar experiences.
---
Received on Wed Apr 10 13:17:22 2002
