First off, thank all of you for your very helpful replies. Most of you helped me prove that the problem is not at the OS level but rather at the Oracle or application level (which I've always believed). However, I am still unable to identify the source of my high iowait time.

mpstat verifies what top is telling me, so top is not reporting false statistics, as many replies suggested. It was also suggested that the source of my iowait may be network related, but netstat -i doesn't show anything out of the ordinary. Another reply suggested running vxstat to see the statistics for my VxFS file system. The output shows an average read time of around 5.5 ms and a write time of 0.7 ms (yes, less than 1 ms).

So, I guess I really haven't solved anything, but I have managed to narrow down the field. My next steps will be into Oracle itself.

Thank you all,
-Aaron

--------------------
Aaron Dokey - MIS
Reid Tool Supply
2265 Black Creek Rd.
Muskegon, MI 49444
(231) 777-3951
--------------------

Replies:

Erwin Fritz [efritz@glja.com]

I'd take a look at the Oracle side of things, using either the utlbstat/utlestat scripts, or the newer perfstat utility. Chances are it's Oracle itself that's the culprit, either through poorly-written queries, missing indexes, or a misconfigured instance.

Kerekes, Ed [Ed_Kerekes@steris.com]

Are you running Hitachi-graph track? If you are, take a look at "total I/O rate" and "% write pending".

Casper Dik [Casper.Dik@Sun.COM]

>However in TOP there is a TON of iowait, and there is never any free CPU
>time while oracle is running:
>
>last pid: 5601; load averages: 0.56, 0.73, 0.73
>09:54:53
>103 processes: 99 sleeping, 4 on cpu
>CPU states: 0.0% idle, 17.1% user, 9.2% kernel, 73.6% iowait, 0.0% swap
>Memory: 4096M real, 2887M free, 1063M swap in use, 5716M swap free

Strange question perhaps but have you tried recompiling top?

If vmstat says the CPU is idle, it really is.
Top gets its information from a kernel data structure and might be wrong if the binary doesn't exactly match your kernel.

Casper

Jeff Kennedy [jlkennedy@amcc.com]

Is there any tool that will give you the fcal statistics? Sun doesn't really like JNI fcal cards, even though I think they are superior to QLogic. I would start looking from the fcal out.

We had a similar problem with an EDA tool; it kept going to sleep after a few minutes and would pick up again after a few more minutes. It turned out to be an NFS locking problem with the tool. My system showed no problems either, but it also didn't show a high wait. Maybe not related, but that's where I would start looking.

The other, slight, possibility is the filesystem itself. Who configured it? Is it possible the block/stripe/depth sizes are all off?

~JK

Kevin Buterbaugh [Kevin.Buterbaugh@lifeway.com]

Aaron,

Don't rely on top. It's not a Sun tool; they don't support it. I have personally seen it give incorrect information. Run mpstat instead and see what it says for the I/O wait. As an aside, Sun includes a top-like tool called prstat in Solaris 2.8.

If mpstat doesn't agree with top, believe mpstat. Ask your app vendor to produce evidence about the "slow I/O."

If mpstat agrees with top, then you'll need to do some more digging, obviously. One thing I did notice in your iostat output is that the load is not evenly spread across all the disks. While those that do show activity are not very busy, they could be "bursty," i.e. there could be brief spikes of activity which cause things to slow down, but which don't last long enough to show up in your stats. What's the interval you're running iostat at?

Another thing to look at is fsflush if your databases are in filesystems (as I believe you indicate they are). You may want to increase the interval at which it runs to prevent brief bursts of activity.

HTH...

Kevin Buterbaugh
LifeWay

"Anyone can build a fast CPU. The trick is to build a fast system."
- Seymour Cray

Brett Lanham [blanham@cleartrack.com]

I am sorry that I do not have the answer for you, but I wanted to make sure you followed up with a summary of what you learned from the list, or could pass on what you found out directly to me.

I have seen somewhat the same thing you are seeing. I am running Oracle 8.1.6 on Solaris 8, and our database resides partly on the local drives and partly on external storage (some EMC SAN storage device) connected via FC adapter. I have seen a lot of CPU time consumed by Oracle, and top also reports a fair amount of iowait. I have spent quite a bit of time looking into it, but I am not extremely experienced with this type of thing. I eventually passed it on to my DBA and asked him to fix his queries. :-)

BTW I've got version 3.5beta12 of top. What version are you running?

Brett

Greg Gallagher [ggallag@foc.com]

Hi Aaron,

yeah, that seems a little funny. I had a similar problem a few months ago, and it turned out that we were causing a large amount of I/O but with very small amounts of data (i.e. several hundred I/Os a second on a particular device, but with just 1k on each I/O). It turned out to be a developer flippantly running a flush() every time they wrote a line to a log file.

Anywho, the only thing I see in your case is that NFS should be looked into. The service time is just a little high. Check out the NFS/NIS Tuning guide and look into nfsstat numbers.

Also, since you're using VxFS, you may want to look at the stats that way. For example:

lancelot:/root/burns# vxstat
                    OPERATIONS              BLOCKS  AVG TIME(ms)
TYP NAME        READ     WRITE      READ     WRITE   READ  WRITE
vol opt       957886    982686  37495760   7304136    5.7   12.1
vol rootvol    98621    180169   2879102    297291    6.6   13.2
vol swapvol    39605     12115    633680   3086432    9.9   83.4
vol usr       477696    516392  14468454    822445    6.1   16.5
vol var       404089   1462950  20596707  16080130    5.7    8.0

Hope this helps!

cheers,

Rakthet, Jay [Jay.Rakthet@caltech.edu]

Aaron,

You have an interesting problem.
I suggest you look at your network bandwidth with 'netstat'; that could be a source of I/O wait.

Let me know if you figure it out.

Jay Rakthet
Unix Systems Administrator
Administrative Technology Center, Caltech
626-395-3518
jay.rakthet@caltech.edu

Tim Chipman [chipman@ecopiabio.com]

Sounds like Oracle is thrashing, i.e. tons of I/O for your Oracle processes hitting the disks. (I'm assuming you have no other services running on this box that would be generating I/O?)

I'll be very interested to hear in your summary if other people believe this to be true. Certainly I've noted similar behaviour on our Oracle server here - an e450 with 2 gigs RAM, 4 x 400 MHz CPUs, and an a1000 as the direct-attached storage. Typically when people complain of slow response time for Oracle, top (or iostat) indicates iowaits > 50% even though the CPU usage reported for the Oracle processes is never absurdly high.

I get the feeling that top's process load reporting is ignoring I/O waits generated by a given process, i.e. treating them as a separate issue from the actual CPU loading reported for that process (?)

Clearly I'm not absolutely positive here though (hence my interest in the pending summary :-)

--Tim Chipman

Dave Weis [djweis@sjdjweis.com]

One place you can look for more Oracle information is here:

http://www.tusc.com/oracle/books/overbook.html

The Oracle Performance Tuning book has lots of great stuff in it.

dave

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Wed Mar 27 10:55:35 2002
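
Editorial sketch (not part of the original thread): Greg Gallagher's reply describes a case where a device saw several hundred I/Os a second with only about 1k per I/O. One rough way to look for that pattern is to divide the BLOCKS columns of vxstat output by the OPERATIONS columns to get an average I/O size per volume. The Python below is a minimal illustration under stated assumptions: the function name avg_io_kb and the 512-byte block size are assumptions introduced here (check the vxstat documentation for your release), and the two sample rows are copied from Greg's example output.

```python
# Sketch: average I/O size per volume from vxstat-style rows.
# Assumption: vxstat BLOCKS are 512-byte sectors; verify on your system.
SECTOR_BYTES = 512

# Two rows taken from the vxstat example quoted in the thread.
SAMPLE = """\
vol opt 957886 982686 37495760 7304136 5.7 12.1
vol swapvol 39605 12115 633680 3086432 9.9 83.4
"""


def avg_io_kb(vxstat_text, sector_bytes=SECTOR_BYTES):
    """Return {volume: (avg_read_kb, avg_write_kb)} for 'vol' rows.

    avg_io_kb is a hypothetical helper name, not part of any Veritas tool.
    Header lines and anything that is not an 8-field 'vol' row are skipped.
    """
    result = {}
    for line in vxstat_text.splitlines():
        fields = line.split()
        if len(fields) != 8 or fields[0] != "vol":
            continue
        name = fields[1]
        ops_r, ops_w, blk_r, blk_w = (int(f) for f in fields[2:6])
        # Guard against volumes with zero operations.
        read_kb = blk_r * sector_bytes / ops_r / 1024 if ops_r else 0.0
        write_kb = blk_w * sector_bytes / ops_w / 1024 if ops_w else 0.0
        result[name] = (read_kb, write_kb)
    return result


if __name__ == "__main__":
    for vol, (r_kb, w_kb) in avg_io_kb(SAMPLE).items():
        print(f"{vol}: avg read {r_kb:.1f} KB, avg write {w_kb:.1f} KB")
```

A very small average write size combined with a high operation count would point toward the flush-per-line behaviour Greg mentions; note that vxstat counters are cumulative unless reset, so differences between two samples are more telling than the raw totals.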