Hi Guys,

Replies below.

Basically, the cables were OK, and it was a combination of two problems:

1) As Brad said, the controller... one of the 3 controllers on the machine was broken. After checking everything we managed to find the issue there and isolate it, and voila! No more errors.

2) Condensation on the back of the server... really odd, but at the back of the server there were some tiny drops of water. We checked everything and nothing was leaking or dripping... then we saw that the server is installed in a communication rack with a solid door at the front (air flow on the V240 is from the front to the back). We removed the front door, and magic! No more condensation...

It seems to be working now... but this is one I won't forget easily...

Cheers and thanks to everyone!

Pablo.-

*Forwarded Conversation*
Subject: *SCSI Bus Transition*
------------------------

From: Pablo Jejcic <pablo.jejcic@gmail.com>
To: sunmanagers@sunmanagers.org
Date: 2 December 2007 20:07

Hi Gurus,

I need some confirmation... I'm troubleshooting a remote server: SunFire V240 + D2 JB. We have two RAID 5 sets configured on the server. Everything was working fine until we moved the box to the server room, into a controlled environment... now, 2-3 times a week, we get the following set of errors:

WARNING: /pci@1d,700000/pci@1/scsi@4 (qus0): SCSI Bus Transition
WARNING: /pci@1d,700000/pci@1/scsi@4 (qus0): Received unexpected SCSI Reset

Then, a few hours after we get the warnings, we lose the disks, the RAIDs, everything on the external array....

My guesses here:

1- A problem with the termination of the SCSI chain - the D2 has automatic terminators, but I'm guessing some problem with them could be causing this.

2- Problems with the SCSI cables: some of the pins, or something else, is wrong and they might be a bit loose; with the vibration from the storage array they pop out, and we start getting the issues.

3- SCSI controller issues... but I don't understand how this could be the cause, as the errors should be more frequent, or they should be there all the time.

4- A couple of the disks in one of the pictures I got of the array look like they don't have the "cover" on (the tray that slots them into the array)... so they might be moving... but why would all the other ones go off?

The server is in a very humid environment, but we moved it into the data centre because we thought that the A/C would help to reduce the problem... but it just made it worse.

Any comments, ideas, suggestions are very welcome.

Thanks a lot in advance,

Pablo.-

--------
From: Jeff Marble <jrmarble@gmail.com>
Reply-To: JRMarble@gmail.com
To: Pablo Jejcic <pablo.jejcic@gmail.com>
Date: 3 December 2007 02:19

Another common problem with the copper cables is bent pins. Check each cable end carefully for any of the small pins that might not be straight.

Jeff

--
Jeff Marble
JRMarble@GMail.com

--------
From: Sajan <sajhnair@yahoo.co.in>
To: Pablo Jejcic <pablo.jejcic@gmail.com>
Date: 3 December 2007 07:17

Hi,

The same thing happened to me 2 months ago. Ours is in a humid environment too. What I did was open up the whole server and reseat everything, including the HDDs, the RAM and even the SCSI connectors. Try it; this will surely solve the problem. Don't do it in a hurry. Take your time over it.

Regards

--------
From: Pablo Jejcic <pablo.jejcic@gmail.com>
To: Sajan <sajhnair@yahoo.co.in>
Date: 3 December 2007 08:35

Thanks Sajan,

But the server is in Angola, and I'm in Scotland at the moment :S

Cheers!

Pablo.-

--
#########################################
Pablo Jejcic

"He sospechado alguna vez que la única cosa sin misterio es la felicidad, porque se justifica por sí sola."
Jorge Luis Borges, escritor argentino (1899-1986)

Blog with me at http://hachetheboss.blogspot.com
#########################################

--------
From: Pablo Jejcic <pablo.jejcic@gmail.com>
To: JRMarble@gmail.com
Date: 3 December 2007 08:36

Thanks Jeff, will check and see... the only problem is that the server is in Angola, and I'm in Aberdeen... :S

Pablo.-

--------
From: Christopher Barnard <cbarnar1@earthlink.net>
To: Pablo Jejcic <pablo.jejcic@gmail.com>
Date: 3 December 2007 14:10

Moving the server to an A/C room was most definitely the right thing. A/C both cools and dehumidifies.

I do not believe there is any issue with not having the cover on the case. It is purely decorative.

I would suspect the cables first, since they are easily replaced. Do you have someone who can support the machines at this remote site? I would have him or her replace all of the SCSI cables. If the problem recurs after a couple of weeks, then it's time to suspect the (much harder to fix) internal bus termination.

Christopher L. Barnard
cbarnar1@earthlink.net
-----------------------------------------------------------------------
When I was a boy, I was told that anyone could be president.
Now I am beginning to believe it.
-- Clarence Darrow

--------
From: Pablo Jejcic <pablo.jejcic@gmail.com>
To: Christopher Barnard <cbarnar1@earthlink.net>
Date: 4 December 2007 23:32

Thanks!

I'm getting the cables replaced this week. If that fails, I will connect the whole array to only one dual-controller, and if that fails, then I will set the server on fire and ask for a new one ;)

Thanks!!!

Pablo.-

--------
From: Brad Morrison <brad.morrison@gmail.com>
To: Pablo Jejcic <pablo.jejcic@gmail.com>
Date: 5 December 2007 18:26

It's almost certainly the controller, if you've checked all of the other physical elements you listed. Bad SCSI controllers are 100% unpredictable. Unfortunately, they're on the mainboard on a V240.

--------
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Wed Dec 12 10:10:23 2007
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:44:07 EST