We had a drive fail in our StorEdge A1000 (running RAID 5), and my question was: how do I replace it?

Thanks to Julie Firmin, who told me that it was as simple as pulling the failed drive and replacing it with the new one. Thanks also to Tony Walsh, who told me the same thing as Julie, but went into excellent detail on some of the problems that might be encountered. The advice that helped me the most was:

"Remove the failed 1,1 drive from the array, wait 10-15 seconds, and then replace it with the new drive. You should then wait approx. 30 seconds before you physically do anything else with the array (i.e. don't remove the new drive before it has had a chance to spin up and be integrated into the array). You need to wait this long for the "dacstore" area on the drive to be updated with the existing array configuration. You need to do this operation with power still applied to the array so that the current DAC information is applied to the new drive."

He also explained how I could do this using the RAID Manager 6 GUI, but I do not believe in X11 on servers, so that was not an option for me. (In case someone out there is reading this message in the list archives and would prefer to use the GUI, the long and short of it is: start the GUI and use the Recovery option, which walks you through the process, Wizard-style.)

After the replacement, he continues:

"As a result of either of these actions, you should see the LEDs for the drives in 2,5 (my hot spare) and 1,1 (my failed drive) flashing fairly constantly for some time (2-3 hours or longer is quite possible for a 36GB drive). What is happening at this point is that the hot spare is being released: all the data on 2,5 is copied back to 1,1. (FYI, you could still have lost one more drive in this configuration without losing any data, as the RAID 5 layout will run in degraded mode without a hot spare to swap to, and the data will remain good.)"
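For anyone else driving this from the command line, Tony's timing advice can be sketched as a small shell script. This is only a rough sketch under assumptions from my setup: drivutil is the RM6 CLI in /usr/lib/osa/bin, the RAID module is named fd026_002, and [1,1] is the drive being replaced; adjust all of those for your array.

```shell
#!/bin/sh
# Sketch of the swap-and-wait sequence. Assumes RAID Manager 6's
# drivutil lives in /usr/lib/osa/bin and the RAID module is named
# fd026_002 (both from my setup; change them for yours).

DRIVUTIL=/usr/lib/osa/bin/drivutil
RAIDDEV=fd026_002

# True (exit 0) when a 'drivutil -i' line reports the drive Optimal.
drive_optimal() {
    echo "$1" | grep -q 'Optimal'
}

swap_and_wait() {
    echo "Pull the failed [1,1] drive now; waiting 15 seconds..."
    sleep 15
    echo "Insert the replacement; waiting 30 seconds for spin-up"
    echo "and for dacstore to absorb the array configuration..."
    sleep 30
    # Copy-back from the hot spare can take hours on a 36GB drive,
    # so poll every 5 minutes until [1,1] reports Optimal again.
    while ! drive_optimal "$("$DRIVUTIL" -i "$RAIDDEV" | grep '\[1,1\]')"
    do
        sleep 300
    done
    echo "Drive [1,1] is Optimal; copy-back complete."
}

# Only run against real hardware; harmless to source otherwise.
if [ -x "$DRIVUTIL" ]; then
    swap_and_wait
fi
```

The sleeps only cover the minimum waits Tony describes; the physical pull and insert are still manual steps you do while the script prompts you.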
Finally, he suggests applying some patches (which I had already done prior to this issue):

"As a further recommendation (after you have fixed this problem), I would advise you to upgrade your RM6 version to 6.22.1 with the appropriate patch: 112126-05 (for Solaris 8 or 9) or 112125-04 (for Solaris 2.6 or 7). When you do this, make sure you perform the firmware flash upgrade and the NVSRAM upgrade on the array as soon as you can (use the RM6 GUI for the best results). The NVSRAM upgrade file is called "sie3240c.dl" and should be found in /usr/lib/osa/fw/ after RM6.22.1 has been installed."

With Tony and Julie's great advice, the replacement went off without a hitch. Thanks, guys!

My original message follows:

-----------------------------------------------------------------------------

Our StorEdge A1000 recently lost a drive. Luckily, we had set it up to use a hot spare, and the spare took over, allowing the RAID to stay up and functioning. Sun is sending a replacement drive, which should be here in a day or two. My question is: when said drive arrives, what is involved in replacing it? The A1000 has hot-swappable SCSI drives, so we can definitely physically replace the bad drive while the array is up. From what I am reading, once the new drive is in there, we just need to unfail the drive (unless that is automatic?), and the array should reconstruct the data.
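(A side note for anyone who wants to watch the array from a script or cron job: the Status column of the drivutil -i output is easy to check mechanically. A rough sketch follows; the awk field positions assume output shaped like the listing from my array, and fd026_002 is my module name, so verify both against yours.)

```shell
#!/bin/sh
# check_drives: read 'drivutil -i' output on stdin, flag any drive
# whose Status column is not Optimal, and print a per-status count.
# Field positions ($1 = [channel,id] location, $3 = Status) assume
# output formatted like my array's listing; verify against yours.
check_drives() {
    awk '
    /^\[[0-9]+,[0-9]+\]/ {
        status = $3
        count[status]++
        if (status != "Optimal")
            printf "ATTENTION: drive %s is %s\n", $1, status
    }
    END {
        for (s in count)
            printf "%s: %d drive(s)\n", s, count[s]
    }'
}

# Typical use against the real array (module name from my setup):
#   /usr/lib/osa/bin/drivutil -i fd026_002 | check_drives
```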
We are using RAID Manager 6.22, and here is the output of drivutil -i fd026_002:

Drive Information for fd026_002

Location  Capacity (MB)  Status      Vendor   Product ID        Firmware Version
[1,0]     34732          Optimal     FUJITSU  MAN3367M SUN36G   1502
[2,0]     34732          Optimal     FUJITSU  MAN3367M SUN36G   1502
[1,1]     34732          Failed      FUJITSU  MAN3367M SUN36G   1502
[2,1]     34732          Optimal     FUJITSU  MAN3367M SUN36G   1502
[1,2]     34732          Optimal     FUJITSU  MAN3367M SUN36G   1502
[2,2]     34732          Optimal     FUJITSU  MAN3367M SUN36G   1502
[1,3]     34732          Optimal     FUJITSU  MAN3367M SUN36G   1502
[2,3]     34732          Optimal     FUJITSU  MAN3367M SUN36G   1502
[1,4]     34732          Optimal     FUJITSU  MAN3367M SUN36G   1502
[2,4]     34732          Optimal     FUJITSU  MAN3367M SUN36G   1502
[1,5]     34732          Optimal     FUJITSU  MAN3367M SUN36G   1502
[2,5]     34732          Spare[1,1]  FUJITSU  MAN3367M SUN36G   1502

drivutil succeeded!

If I am reading the man page right, all we should have to do after replacing the failed drive ([1,1]) is to run the command:

drivutil -U 11 fd026_002

This should, according to the man page, unfail the drive and reconstruct the data (we are running this as RAID 5). If anyone has done this before, I would appreciate some feedback. Please do tell me if I need to take the array offline, back up data, anything like that.

--
Josh Glover <jmglov@incogen.com>
Associate Systems Administrator
INCOGEN, Inc.
http://www.incogen.com/
GPG keyID 0x62386967 (7479 1A7A 46E6 041D 67AE 2546 A867 DBB1 6238 6967)
gpg --keyserver pgp.mit.edu --recv-keys 62386967

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Fri Oct 18 14:44:52 2002
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:42:56 EST