Hello Sun Managers,

We received additional info from Sun, which I'd like to pass along.

When you offline a device (zpool offline poolName device), ZFS still
"tracks" that device. In our case, the device disappeared and then
reappeared (we took the array down for maintenance), causing ZFS to
report CKSUM errors once the device was onlined again in ZFS (zpool
online).

The recommended action for future work (where the array will be
offlined) is to detach the legs of our ZFS mirrors, then offline the
device in ZFS. I'm not sure whether you can offline a device at that
point, or whether there would be any point in doing so, since the device
would already be detached from the mirror.

Detach with something like:

   zpool status   # Take record of which devices the mirrors are made of, for later
   zpool detach poolName device
   # Offline the array, once it's back online:
   zpool attach poolName existingDevice device   # This is why you need zpool status output

We fixed the climbing CKSUM errors by detaching then re-attaching those
legs of our mirrors. I'd like to get more definitive info on when zpool
offline / online is appropriate, and why the CKSUM errors kept climbing
after we onlined the devices. This is something we'll probably
experiment with further, and keep asking Sun about.

Thanks,

Ivan.

On Wed, 7 Jan 2009, Ivan Fetch wrote:

> Hello Sun Managers,
>
>
> We've been working on a weird ZFS issue, and not getting very far with
> Sun.
>
> We needed to relocate a storage array, so "zpool offlined" the second half
> of mirrors on multiple machines. Once the array was back online, and we
> verified the LUNs were seen in the OS, we did "zpool online" for each of the
> previously offlined LUNs.
>
> The first LUN took about 35 minutes to resilver, and the mirror was fine;
> no errors in "zpool status." Subsequent mirrors reported resilver completed
> in a matter of seconds, and we got quite a few CKSUM errors (in one case, a
> few thousand in 12 hours), but no read or write errors.
>
> We're experiencing this identical issue on three boxes so far, a couple of
> them are:
>
> 5.10 Generic_127127-11 sun4v sparc SUNW,SPARC-Enterprise-T2000
>
> 5.10 Generic_127111-06 sun4v sparc SUNW,Sun-Fire-T200
>
>
> Sun's answer is to "Just upgrade the kernel, a lot of ZFS bugs have been
> fixed, but only upgrade to 137137-06 as later kernels will introduce other
> ZFS issues."
>
> We ended up detaching, then re-attaching the second leg of the mirrors,
> and all of them resilvered and do not have CKSUM errors. We will probably end
> up doing this on our remaining ZFS boxes but would like to match our symptoms
> with a particular bug / resolution / patch, and have more complete answers.
>
> I've found a few similar cases on the ZFS Discuss list, but no resolutions
> there.
>
>
> Has anyone else run into this issue?
>
>
> Thanks,
>
> Ivan.
>
>
> ---
> Ivan Fetch
> University of Denver
> Computer Operations, University Technology Services
> 303-871-3092
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Thu Jan 15 10:46:55 2009
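
The detach / re-attach sequence described in the summary can be sketched as a small dry-run shell helper. This is only an illustration, not something Sun supplied: the function name print_mirror_leg_cmds and the pool and device names in the example call are hypothetical, and the helper only prints the zpool commands rather than running them, so the output can be checked against "zpool status" first.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the zpool commands for detaching
# one mirror leg before array maintenance and re-attaching it afterwards.
# The function name and the example pool/device names are hypothetical.
print_mirror_leg_cmds() {
    if [ "$#" -ne 3 ]; then
        echo "usage: print_mirror_leg_cmds pool existingDevice device" >&2
        return 1
    fi

    pool=$1        # pool containing the mirror
    existing=$2    # leg that stays in the pool during maintenance
    leg=$3         # leg that lives on the array being taken down

    # Record the mirror layout first; it is needed for the attach step.
    echo "zpool status $pool"
    # Before the array goes down: drop the leg from the mirror.
    echo "zpool detach $pool $leg"
    # After the array is back: re-attach the leg next to the surviving device.
    echo "zpool attach $pool $existing $leg"
}

# Example call with made-up names.
print_mirror_leg_cmds tank c1t0d0 c2t0d0
```

The example call prints the status, detach, and attach commands in that order. Running them by hand, one at a time, keeps a person in the loop between the detach and the attach, which is where the array maintenance happens.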