Thanks to all who replied, including: Scott Lawson, Tim Bradshaw, Stefan Varga, Sandesh Kubde, 'hike', Bryan Bahnmiller and Robert M. Martel.

We were successful in getting the disk swapped without having to reboot or panic the box. Some suggested I would need a reboot to fix it, which the customer was having none of.

Mr. Lawson suggested we use cfgadm. The drive's WWN did show up in 'cfgadm -al', but nowhere in the server documentation did it say to use cfgadm on any of the FC-AL disks. One of my colleagues suggested it yesterday while we were preparing for the change, and we considered it for a bit last night, but since this is such a mission-critical, eggs-all-in-one-basket server, I opted to go strictly by the book, so that in case of catastrophe I could at least claim I had followed the book. A couple of you offered that since the disk is dead to luxadm, you can just pull it. It would be interesting to try these things just to see whether they work; unfortunately, my lab is customers' production boxes, so the opportunity to experiment is limited.

We determined that, as others suggested, the disk was too far gone for luxadm to communicate with it. When we executed 'luxadm remove_device <devicename>' (here <devicename> is /dev/rdsk/c1t0d0s2), luxadm couldn't check the status of the drive: the procedure got as far as printing the first line ('Make sure the filesystems are backed up...') and then failed out with a SCSI error. We studied the steps of remove_device and determined that it roughly removes the device from the device tree, offlines it, and possibly even powers it down.

So, after executing 'luxadm -e offline <devicename>', we verified the disk no longer showed up in 'luxadm inq /dev/rdsk/c?t?d?s2' or in format. We then executed 'devfsadm -C' to clear the device from the /dev device list. After that, I had the DC Engineer check whether the light on the drive was out. It wasn't, but it was burning solidly, whereas the light on the other disk was showing activity. Since the system otherwise no longer knew about the disk, I crossed my fingers and had the Engineer swap it. I monitored the system via the console and noted that picld saw the drive pulled and re-inserted. I then verified the disk showed up in format, and executed 'devfsadm -C' again to rebuild the /dev device list. From then on, it was the usual Disk Suite disk replacement process.

Mr. Martel offered these steps for a failed disk on an A5200 array:

"I had this problem with a Sun A5200 array - disk too far gone for luxadm to talk to it. The procedure Sun gave me had me bypassing the ports on the failed disk using the front panel controls - I don't know the 280R, but I'd guess you don't have such controls available."

"What happened after I followed Sun's special procedure to replace the failed disk was that the new disk was not accessible. I then ran luxadm remove_device, popped the disk out when prompted, and ran luxadm insert_device and re-installed the replacement disk. From then on all was normal again."

Unfortunately, I couldn't talk to Sun, as the status of the maintenance contract on this system is being investigated. Even then, most of the support we have is Gold, and this was well outside Gold support hours. The new disk may end up being T&M.

Thanks to all who responded; it is nice to know people are out there listening and offering help when you are stressed out, sitting at the keyboard all alone in the middle of the night, trying to keep the machine from falling over.
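For anyone who finds this in the archives, here is the working sequence condensed from the narrative above. The /dev/rdsk/c1t0d0s2 path is our failed disk0; adjust the controller/target for your own layout, and treat this as a rough recap rather than a canned procedure:

  (confirm the drive really is dead to luxadm; ours just returned 'Error: SCSI failure')
  # luxadm inq /dev/rdsk/c1t0d0s2

  (take the dead drive out of the driver's view, since remove_device won't run against it)
  # luxadm -e offline /dev/rdsk/c1t0d0s2

  (clear the stale entries out of /dev and confirm the disk is gone from format)
  # devfsadm -C
  # format

  -- physically swap the drive here; picld should log the removal and
     insertion on the console --

  (rebuild /dev and confirm the new disk is visible before handing it back to Disk Suite)
  # devfsadm -C
  # format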
Regards,
Gene Beaird
Pearland, Texas

-----Original Message-----
From: Gene Beaird [mailto:bgbeaird@sbcglobal.net]
Sent: Wednesday, July 09, 2008 10:34 PM
To: 'sunmanagers@sunmanagers.org'
Subject: luxadm remove_device SCSI failed error on Sunfire 280R

I have a failed disk0 on a SunFire 280R. It is part of a mirrored pair, mirrored with Disk Suite. I have broken the mirror and metacleared the devices. According to the SunFire 280R Service Manual and Owner's Manual, I am supposed to remove the bad disk from the system using the luxadm remove_device command before I physically swap the drive out. When I execute luxadm remove_device /dev/rdsk/c1t0d0s2, I get:

  Error: SCSI failure. - /dev/rdsk/c1t0d0s2

which is the same message I get for that disk when I execute luxadm inq /dev/rdsk/c?t?d?s2. I don't see a WWN in luxadm for that device. What's wrong, and how do I get this fixed? Thank you all.

Regards,
Gene Beaird, CISSP
Unix Support Engineer
Pearland, Texas
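One footnote on the Disk Suite side, which both messages gloss over: the 'break the mirror / metaclear' step and the rebuild afterwards would look roughly like the sketch below. The metadevice names (d0 as the mirror, d20 as the submirror on the failed disk), the s0 and s7 slices, and c1t1d0 as the surviving disk are all assumptions for illustration; only the failed c1t0d0 target comes from the messages above.

  (force-detach and clear the submirror that lived on the dead disk -- names assumed)
  # metadetach -f d0 d20
  # metaclear d20

  (after the physical swap: copy the label from the surviving disk onto the new one)
  # prtvtoc /dev/rdsk/c1t1d0s2 | fmthard -s - /dev/rdsk/c1t0d0s2

  (if state database replicas lived on the failed disk, delete the stale ones and recreate
   them -- s7 is just the conventional replica slice, not taken from the original post)
  # metadb -d c1t0d0s7
  # metadb -a -c 2 c1t0d0s7

  (recreate the submirror on the new disk, reattach it, and watch the resync)
  # metainit d20 1 1 c1t0d0s0
  # metattach d0 d20
  # metastat d0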