Thank you to everyone for the input; you have been most helpful. The issue has been resolved: the client wants the system board pulled until their new system arrives, and I have seen that you can't live by OS and software alone. Again, thanks for the help. The responses are below.

Plug your laptop into the 25-pin (serial) port on the back of the system and run a reboot. It will give you all the information that you need to tell your client. Good luck.

Hoang
Verizon Global Network Inc.

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

A bad system board could show up as any number of errors. It depends on whether it is an I/O board or a CPU/memory board. If it is an I/O board, I'd expect random problems with some or all of the I/O devices connected to it (ethernet controller, SCSI controllers, etc.). If it is a CPU/memory board, probably random CPU panics, watchdog resets, memory allocation errors, ECC errors, etc. Of course, it could show almost any random error that could be caused by the memory or processor hiccuping because of the board.

What kind of errors are you seeing? Is this the only board of its type in the machine? If not, can you pull it out? Let the system run for a while without it and see if it stabilizes. If it does, that certainly points to the board or the components on the board. If not, then it points to something else.

In any case, take the board out and make sure everything is seated correctly. If it is a CPU/memory board, try to torque down the processors to make sure they have a good connection.

Why doesn't the client believe SUN? Do they have a specific reason to think they know better than the manufacturer? If the system is under maintenance, let SUN come out and replace the board and see if that solves the problem. If not, then you know they were wrong.
-spp
--
Stephen P Potter
Columbus, Ohio, USA
spp@spotter.yi.org

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Run SunVTS (Validation Test Suite).

-Val
Val Popa

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

Tell them SUN wants to replace the system board. Let them bitch about it, and let them tell SUN to replace everything else. Then, when they finally have to replace the system board, know that they are sitting in their beancounter offices waiting for you to come by and tell them "I told you so". You're the tech person there, why are they telling you how to diagnose hardware anyway :)

--Mark

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I have had experience with Sun in which they have been wrong about the system board, so I can sympathize with both you and your customer, having seen it from both sides. Sun, in my experience, is very reluctant to recommend a system board replacement if you are under contract, so that might be the best indicator, if your customer is under contract. If not...

In our experience with the Non-ultra (read Sparc II platforms), a failure of the system board can result in system panics for unknown reasons, some even at a level that does not allow you to gather "savecore" after the panic. System board problems that we have encountered have even shown up as memory problems.

I am assuming that running advanced diagnostics (diag-mode true on eeprom, and diag switch set at reset) has shown nothing (not uncommon), and that setting KADB mode and the deadman kernel switch has also not resolved the problem. When all of these fail to indicate a problem, it has been my experience that you are dealing with a system board, CPU, or internal component problem.

On the UE10000, system board problems manifest themselves in strange ways also. We have seen system boards fail in a way that appears to be a qfe network interface card failure.

Hope this helps you.
Feel free to contact me if you have any more questions, or want to discuss my opinion on a specific set of circumstances.

Glenn M. Richards
Senior Systems Administrator
Yellow Technologies, Inc.
glenn.richards@yellowcorp.com

############################################################################

If they refuse to believe Sun, go ahead and rebuild the server for them, BUT provide them with a written release stating that YOU feel that it is a bad system board, that they are refusing the recommended repair of the unit, and that you can make no guarantees about the rebuild. Odds are that you will NOT be able to convince them that it is a bad system board.

You can also load SunVTS (it's on the Solaris 8 media) and run it to isolate any problems on the system.

Geoff Reed

*************************************************************************************************************

I trust you know about the "-v" switch for prtdiag. "v" for verbose. It will give you quite a bit more information: CPU temperatures, fans, power supplies, memory, etc. You can redirect the output to a file and email it to Sun and let them diagnose it. (I don't have prtdiag loaded on my machine, so I can't see what all it does or if the man pages for it are loaded. All of this is from my less-than-clear memory.)

Bad system board: memory errors (permanent and transitory), I/O problems (hard drives disappear, arrays dropped, network cards not working, etc.), CPUs not seen, incorrect time, just not working.

If the CPUs are screwed down, they may need to be re-torx-ed. Loose CPUs will cause many of the same types of problems as a bad system board. Sun has a special tool to re-torx the CPUs.

I am a Sun Field Engineer with a Sun partner and I have never seen the 1-800-USA-4-SUN hardware guys NOT get it right. They are usually Sun FEs that retire to nice air-conditioned call centers. And they like to be accurate.

I hope you are paid by the hour.
If I were, I would tell them that I agreed with Sun but would be more than happy to reload the server. (Do you want that during 8-to-5 or would you prefer after hours/overtime?)

But I have been in similar situations to what you have described. When I consulted, I told the younger/newer folks to remember what we are in it for: the paycheck! That always seemed to relieve some of the stress for me.

(The last paragraph was meant to be encouraging. Being a consultant, or a contractor like me, can be tough at times.)

Hope this helps,
Michael Horton

__________________________________________________________________

I had a bad MB on a SB1000 that caused the machine to dump core randomly. I just had Sun swap the MB. Hasn't crashed since.

I've also had a couple of E450s that would just drop off the network, but when I went to check them out everything seemed fine. The on-board NIC had gone bad, and a motherboard replacement was the cure.

HTH,
Will

MIS
Will Froning

-----Original Message-----
Subject: SUN system board

Help please, I'm at my wits' end. As a consultant I have relied on SUN for some diagnosis, but this client doesn't believe what SUN has to say, so I need help.

SUN has diagnosed a problem with a server as being a bad system board, but the client doesn't think that this is the problem. I know that the prtdiag output isn't going to be enough to convince them, so does anyone have any other ideas as to what I could use to show them the problem? Rather than replace the board, they want to rebuild the server. What types of problems have been seen out there that were the result of a bad system board?

Thanks so much, it's hard being a consultant some days.

Dottie Weaver

Received on Thu Aug 30 14:05:03 2001