Problem resolved. Original email at bottom.

Thanks to Sean Walmsley, Francisco Roque, Paul Kraus, Bryan Hodgson, Tim Bradshaw, Michael Horton, J.E., Todd A. Cox, and Joseph A. Belford. Todd nailed it, and a couple of others had pieces of the same puzzle.

First I took the original "failed" power supplies and put them in a spare E250 that had been set up with enough hardware to run. They came up fine and stayed fine for over 24 hours.

Then we took an E250 that we had been gutting for parts and removed its Power Distribution Board (the E250 Owner's Manual has simple directions). Last night, around 8pm, my boss and I both came in, took down the server, replaced the power distribution board, and brought it back up. I had set the eeprom for diag-level=max before we took it down. It came up clean without any trouble, and it has remained clean through today. Previously, we had incessant complaints in /var/adm/messages about power supply 1 having failed again.

The only slight difficulty was that a lot of cables connect to the power distribution board. They all have to be disconnected and reconnected correctly, and the excess has to hang out of the way of things that move during hot swap. The first time we tried to slide a power supply back in, part of a cable loop had gotten in the way. We had to rearrange things and then try again.

Note: in looking over the internals of several E250's, most of them approaching 10 years old, there was no sign of any of the capacitor swelling or leakage that we routinely see in low cost PCs.
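For anyone wanting to confirm that the complaints in /var/adm/messages have really stopped after a swap like this, a rough tally per supply can help. The following is a minimal Python sketch, not something used in this thread. The function names are hypothetical, and because the exact wording of the Solaris messages is not quoted here, the pattern only assumes that each relevant line contains the phrase "power supply" followed by a number. The sample lines are made-up placeholders, not real E250 output.

```python
import re
from collections import Counter

# Assumption: relevant log lines contain "power supply <n>" in some letter case.
# The real message text is not quoted in this thread, so nothing else is matched.
SUPPLY_RE = re.compile(r"power supply\s+(\d+)", re.IGNORECASE)


def count_supply_mentions(lines):
    """Count how many lines mention each power supply number."""
    counts = Counter()
    for line in lines:
        match = SUPPLY_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def count_in_file(path="/var/adm/messages"):
    """Run the same tally over a log file (hypothetical helper)."""
    with open(path, errors="replace") as handle:
        return count_supply_mentions(handle)


# Made-up placeholder lines for illustration only; not real Solaris output.
sample = [
    "placeholder line: power supply 0 faulted",
    "placeholder line: Power Supply 1 not ok",
    "placeholder line: unrelated message",
    "placeholder line: power supply 1 not ok",
]

print(dict(count_supply_mentions(sample)))
```

Running count_in_file() before and after the repair and comparing the two tallies gives a quick check that no new lines are being logged for either supply.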
---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<hoogendyk@bio.umass.edu>

---------------

Erdvs 4

-------- Original Message --------
Subject: Multiple Power Supply Replacements in Sun E250
Date: Mon, 21 Sep 2009 16:19:49 -0400
From: Chris Hoogendyk <hoogendyk@bio.umass.edu>
To: Sun Managers List <sunmanagers@sunmanagers.org>

I'm tossing this to the list because I'm sure there is something I'm missing here.

We have a number of E250's that have been in operation for a number of years. We haven't had any trouble with any of them. A couple of years ago, we also took in 10 used E250's that were being discarded by another department on campus. We put 3 of them into operation, collecting parts from some of the others and adding new disk drives. The rest were set aside in our store room for scavenging. They've just been sitting there for a couple of years now.

Now to the problem. Around the beginning of September we noticed a service light on the front of one of our E250's. Turns out it was complaining that power supply 1 had faulted. That power supply showed AC in but no DC out on its indicator lights. So, we went back to our store room, pulled a power supply, and hotswapped it. Since the hotswapped supply had been in the off mode when it was put in, we had to turn the switch on the front of the E250 to diagnostic and back to run. That turned off the service light. Cool. That was Sept. 3.

Then on the weekend of Sept. 12/13 there were 3 warnings in /var/adm/messages on Saturday night, saying first that power supply 0 was faulting and then that power supply 1 was faulting. However, the faults seemed to be separated enough in time that they didn't take down the server. Then, on Sunday around 4pm, the server went down. The indicator lights pointed to power supply 0. My boss swapped that out. Weird.
Then the same E250 started reporting power supply 1 faulted midweek the following week. We've been under an onslaught of other work, so we didn't notice it right away. Anyway, when we did notice it, I did an inventory of our stored E250's, picked the newest one (based on serial numbers) that had been stored above ground level (paranoia about water leakage), pulled its upper power supply 1, and used it to replace the "faulted 1" in our running E250. That gave us about 10 minutes of respite from the warnings. Then the warnings resumed, saying power supply 1 not ok.

This just doesn't make sense. Is there something we are doing wrong? Is flipping the switch to diagnostic and back to run inadequate to really set the power supply to be in the on mode? Is there likely something more serious wrong with this E250? Should we be looking at swapping out the whole box? Have these additional power supplies just gone stale from sitting idle for a couple of years? And, can anyone give any guidance on how to authoritatively diagnose what the problem really is?

This happens to be the one department that has the most trouble coming up with money for any kind of equipment updates/additions/repairs.

Thanks,

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Thu Sep 24 15:19:18 2009