Re: 3/50 freezing - Summary

From: C.Eagle@massey.ac.nz
Date: Tue Apr 02 1991 - 21:10:04 CST


Thanks to the people who replied to my query. The problem *seems* to have
gone away now, but I have kept the replies in case it recurs. (I don't know
what the problem was - maybe it just needed a break!).
Time to lurch onto the next crisis...

Here is a summary of the useful replies, for those of you who asked for it.
It sounds like other Suns can have this problem as well.

****************

From: anonymous@somewhere

As I said before, you should panic the machine and get a crash dump to
look at. It might not tell you much, but then again it may tell you
where things were wedged.

From: dal@gcm.com (Dan Lorenzini)

It could be that your portmapper is dying. Have you installed patch
100034-02? If not, you might try that.

From: David.Maynard@CS.CMU.EDU

I'm afraid that I can't help other than by suggesting a common 3/50
failure mode. I've had several 3/50s die when their power supplies
became marginal. This usually manifests itself as unexplained reboots,
but could cause a hang (especially if a different component were to
fail).

From: tots!tots.Logicon.COM!louis@UCSD.EDU

Okay, you asked for "Any ideas?", so you get this:

I have a diskless 3/50 on which I can simulate your symptoms at will
by disconnecting the transceiver cable. If it's off long enough,
i.e., a few minutes, the freeze occurs. I see this because we've been
messing around with our net topology a bit, and the occasional
disconnect is useful to do.

Can you look (say, with etherfind) at traffic to/from your machine and
its server(s) from another (preferably third) host?

From: sundev!ronin!kevin@Sun.COM (Kevin Sheehan {Consulting Poster Child})

Have your FE check to make sure you have an upgraded CPU - there was a
problem with early 3/50 CPUs. Basically, it couldn't handle a page
fault that straddled two pages...

From: david@srv.PacBell.COM (David St. Pierre)

Are you by any chance using the automounter and mounting /usr/share via that?
There were some known problems with trying to do that: /usr/share/lib/zoneinfo
is used in the kernel, and if the mount times out you're lost.

I had this problem a *lot* until I found out about this. You can statically
NFS-mount /usr/share if you want.

From: Rick Niziak <ontologic!gremlin!rickn@uu.psi.com>

To have it run at all and do any decent work, you need at LEAST 8 MB of RAM.

From: jerry@asc.UUCP (Jerry Stachowski)

While this is not likely, you might check to see if a PC (or something else)
is occasionally sending packets with the same IP address as your 3/50. (I
had this happen to me a few weeks ago.)

From: cadence!esanborn@uunet.UU.NET (Ed Sanborn)

Just a thought. Check on your transceiver connection. Make sure you
have SQE turned off.

From: sroth@eastend.jpr.com (Steven Roth)

If it's diskless, I'd tell you to look at its file system(s) on the server to
see if any key areas are full. I'd suspect the server, not the client. The
first thing I'd look for is whether the file systems holding the 3/50's /
and/or /usr are filling up. Also check the load on the server.

Note that there is a major NFS patch from Sun for SunOS 4.x that may also solve your problem.
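Steven Roth's first check, whether the server file systems holding a diskless client's / and /usr are filling up, is the kind of thing `df` on the server shows. Purely as a modern illustration (Python; nothing like this would have run on a 1991 SunOS server), a minimal sketch might flag any file system at or above a chosen fill level. The 90% threshold and the /export/... paths below are hypothetical placeholders, not taken from any of the replies.

```python
import os
import shutil
from typing import Iterable, List, Tuple

# Hypothetical cut-off: report any file system that is 90% or more in use.
FULL_THRESHOLD = 0.90


def usage_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total if usage.total else 0.0


def nearly_full(paths: Iterable[str],
                threshold: float = FULL_THRESHOLD) -> List[Tuple[str, float]]:
    """Return (path, fraction_used) for every path at or above `threshold`."""
    flagged = []
    for path in paths:
        fraction = usage_fraction(path)
        if fraction >= threshold:
            flagged.append((path, fraction))
    return flagged


if __name__ == "__main__":
    # Hypothetical server-side locations of the diskless client's / and /usr;
    # substitute the directories your server actually exports.
    candidates = ["/export/root/client", "/export/exec"]
    present = [p for p in candidates if os.path.exists(p)]
    for missing in sorted(set(candidates) - set(present)):
        print(f"{missing}: not found on this host")
    for path, fraction in nearly_full(present):
        print(f"{path}: {fraction:.0%} in use")
```

The threshold is a parameter so it can be tightened for small root partitions, which tend to fill from core files and logs well before a large /usr does.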

From: sroth@eastend.jpr.com (Steven Roth)

I'd guess that it's one or a combination of the following (now that I've received your second message):

1. Too little memory or swap space for what you're trying to do. Memory is cheap these days. Buy another 4 MB and increase your swap space accordingly.

2. Get and apply the major NFS patch. As this would have to be done on your servers and clients, it's no small task. But it should be done.

A frozen machine is never accessible from other nodes, so that shouldn't be a surprise. It's not a function of NFS; it's just that the machine is totally locked up.



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:06:12 CDT