Thanks to Don Mies and Jay Lessert for their suggestions. I have narrowed down the cause of the slowdown, but I haven't entirely solved the problem. I'm still hoping someone might have some good information about NFS to help me out. See below for a further description of my problem.

Don reminded me of the auto-negotiation problem that sometimes occurs on network interfaces (especially with Suns connected to Cisco equipment). I checked all of the NICs and ports, and everything was configured for 1000 Mbps, full-duplex.

Jay wrote me with this:

> You don't give us any details about the file systems on the destination.
>
> If the destination file system is generic, default Solaris ufs with no
> NVRAM cache used in the disk controllers, the operation you describe
> will depend on the data:
>
>  - If small numbers of very large files: very fast.
>  - If large numbers of very small files: very, very slow.
>
> In this case the culprit is synchronous file creation, and you can't
> even fill a 10BaseT pipe.
>
> One solution is fastfs (http://www.science.uva.nl/pub/solaris/fastfs.c.gz),
> which would be the very fastest, but requires some knowledge and care on
> the part of the admin. You would run in fast mode only for the duration
> of the copy.
>
> Another solution is to turn on logging in the destination vfstab.
>
> If, on the other hand, you're already running VxFS and the controller
> has 256MB of NVRAM, then I'm out of ideas. I assume you've checked out
> the patch situation on both boxes already.

Good information, but as it turns out my destination filesystem is VxFS, and the dual controllers both have 256MB of NVRAM.

A key point in my original message was at the end, where iostat -x showed that the nfs "device" was 100% busy. (The source is an NFS-mounted volume.) I checked the NFS server and it was running with "nfsd -a 64". I restarted nfsd with "nfsd -a 128" and saw an immediate, significant speed increase. I also saw iostat -x start to show the destination drives pegged at 100% busy, which is what I would expect (writes should be slower than reads). But after a while, iostat showed nfs hitting 100%b again, and the speed of the copies dropped significantly again.

I'm now running "nfsd -a 256", and I'm wondering what this should be set to. I found some Sun documentation that suggested using 16 threads per 10 Mbps of network bandwidth, which would be "nfsd -a 1600" for a gigabit network. That's a big jump from the original 64, though.

So the questions I have at this point are:

- Is there a downside to a high number of NFS threads (memory usage, etc.)? Would "nfsd -a 1600" be reasonable, or is that too high? (It is an E420R with 4 CPUs and 4 GB of memory.)

- Is there a way to tell how many NFS threads are in use?

- If 128 threads were not enough, why did it take over an hour for NFS to hit 100% busy? (The NFS server is currently being used *only* for this copy operation.) Is something else going on here? Maybe idle threads are not being released quickly enough? Is there a timeout parameter that can be tuned for this?

- Does anyone know of some good, up-to-date information on NFS performance tuning?

I'll re-summarize. Thanks. :)

Doug

----------------- Original message:

I am trying to copy a large amount of data (about 500 GB) from a Solaris 7 server ('servera') with 3 A1000s to a Solaris 8 server ('serverb') with a Compaq storage array. I am doing this by NFS-mounting the file systems from servera on serverb over a gigabit Ethernet link -- both servers are plugged into the same switch.
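(For context, the source filesystems are just ordinary NFS mounts on serverb. Something along these lines, where the paths and mount options are illustrative rather than the actual values:)

    # On serverb: mount one of servera's filesystems over NFSv3/TCP.
    # Paths and option values here are examples only, not the real setup.
    mkdir -p /mnt/servera/fs1
    mount -F nfs -o vers=3,proto=tcp,rsize=32768,wsize=32768 \
        servera:/export/fs1 /mnt/servera/fs1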
I start up several cpio commands to copy the files from the NFS-mounted filesystems to their destination on the Compaq array. Initially, "netstat -i 60" shows over 300,000 packets per minute going across the wire (there is no other network activity -- just the NFS traffic and my ssh sessions to run the cpio commands). I left it to run overnight, and this morning "netstat -i 60" is showing 30,000 - 40,000 packets per minute -- a 90% decrease. (Also, "iostat -x 15" on servera showed anywhere from 6,000 - 9,000 kr/s yesterday, and this morning shows about 600 - 700 kr/s.)

None of the cpio commands I started have finished. They have not stalled either, but they are going very slowly now. Both servers have load averages below 0.10 (both have 4 CPUs), and top shows CPU as > 90% idle. "iostat -x" on both servers shows %b < 10 for all devices *except* for the following on serverb:

                      extended device statistics
    device    r/s   w/s   kr/s   kw/s  wait  actv   svc_t  %w   %b
    nfs1      0.0   0.0    0.0    0.0   0.0   0.0     0.0   0    0
    nfs2      0.9   0.0   23.5    0.0   0.0   0.9   942.1   0   46
    nfs3      2.2   0.0   67.6    0.0   0.0   1.4   619.1   0   64
    nfs4     18.7   0.0  559.9    0.0   8.6  12.4  1128.2  94  100

So NFS appears to be holding things up (100 %b, svc_t over 1000, wait at 8.6), but why? NFS does not appear to be fully utilizing CPU, disk, or network, so what is slowing it down? Is there anything I can do to get this back up to the speeds I was seeing when it started?

Will summarize, thanks in advance.

Doug
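For anyone trying the same thing: the nfsd thread-count changes described at the top were made by restarting nfsd on servera. The usual way to make a new value stick across reboots is to edit the nfsd line in /etc/init.d/nfs.server and restart the NFS server scripts -- roughly like this (the "-a 16" shown as the stock value and the 256 are only the numbers I happen to be testing, not a recommendation):

    # On servera: pick a new thread count by editing /etc/init.d/nfs.server,
    # changing the nfsd invocation, e.g.
    #     /usr/lib/nfs/nfsd -a 16    -->    /usr/lib/nfs/nfsd -a 256
    # then restart the NFS server daemons so the new count takes effect
    # (note: this briefly interrupts NFS service to all clients):
    /etc/init.d/nfs.server stop
    /etc/init.d/nfs.server start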