Greetings All,
Well, since it's Friday afternoon I thought I'd go ahead and send a
summary on this post even though I don't (and may never) have the problem
solved. If, after reading this, anyone has any more ideas, I'd like to
hear them.
I did receive replies from the following people: Eugene Kramer, Eric
Maes, Harlan Braude, Marty Backe, and David Schwarze. Thank you,
gentlemen, for your attempts to help.
I've included their suggestions below, as they were all excellent
ideas. We'd checked most of the obvious ones already. By that I mean that
I've checked all the Sun related things I could find; hardware, tuneables,
etc., read Adrian Cockcroft's book cover to cover (as well as Brian
Wong's), been to Sun's performance and tuning class (both old and new
versions). My Sybase DBA has checked Sybase tuneables, indexing, etc. I
suspect what this is going to boil down to is something within the
application (it is homegrown for the most part).
Harlan provided the most detailed reply, which I forwarded to my
Sybase DBA. He said he'd looked into those things already. I'll either
have to take his word for that, or convince my manager to let me go to
Sybase's administration and performance and tuning classes!
The main thing I still don't understand is why the first server is
doing all the kaio's with the AIOWAIT flag. If anybody can explain that one,
I'll post another summary and thank you in all capital letters! Both these
servers run Sybase ASE 12.0, both have the databases in raw partitions with
Sybase handling the mirroring, and both show no signs of an I/O bottleneck
that I can see. The main difference between the 2 boxes is that the first
server is on Solaris 2.6 and the second on Solaris 2.7. We've even thought
of re-installing 2.6 on the second server (it is a test box) just to see if
we could reproduce the problem!
Also, most suggestions for solving the perceived performance problem
on the Ultra 5 suggested moving the database to SCSI disks. I neglected to
mention that while the OS is on the internal IDE drive, all of Sybase is on
UltraSCSI disks (raw partitions).
Anyway, here are the replies I received...
---------------
It's been some time since I worked with Sybase. I recall that in version 11
there were tunable parameters that control how many simultaneous async I/Os
you can run. You might want to play with those to see if you can reduce the
kaio wait calls.
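For reference, those knobs surface through sp_configure in ASE. A hedged sketch (parameter names as I recall them from ASE 12.x; the numeric value is illustrative only - check your version's documentation and current settings before changing anything):

```sql
-- Show the current async I/O settings (names may vary by ASE version):
sp_configure "max async i/os per engine"
go
sp_configure "max async i/os per server"
go
-- To raise a limit (1024 is an illustrative value, not a recommendation):
sp_configure "max async i/os per server", 1024
go
```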
As to differences between the U1 & U5 - one has SCSI drives and the other
IDE, which makes it way slower. I actually saw a SPARCstation 5 beat a U5
on some disk-heavy activities.
0.02
Eugene.
-----------------
I think you stand a better chance of getting some useful replies
from the Sybase list at
SYBASE-L@LISTSERV.UCSB.EDU
As far as problem 1 is concerned, I believe the fact that Sybase does
'busy waits' when it has nothing else to do may be part of the answer.
Cheers - Eric
Eric Maes EDS
Noorderlaan 147 email : emm@ti.emea.eds.com
B-2030 Antwerp Tel : +32 3 544 6328
BELGIUM FAX : +32 3 544 6123
-------------------
I'm new to the Solaris arena, but I have quite a bit of experience with
Sybase on HP servers. As you noticed, Sybase depends heavily on the async
I/O driver to manage its resources. The fact is that Sybase manages both
its memory pool (from the shared memory chunk it gobbles up when it starts
up) and its disk writes.
It may sound counter-intuitive that the OS isn't directly involved in disk
I/O activities, but that's the whole point of async I/O. The way I
understand it, async I/O allows the DB server to continue on to the next
disk I/O request without waiting for the OS to acknowledge that the
physical write has occurred. That's a significant time-saver, though it has
implications for crash recovery, etc., which Sybase deals with using the
transaction logs.
You'll find that the standard sar and other system monitoring tools provide
misleading information about what Sybase is up to (though you can't beat
truss to tell you the nitty-gritty details - I wish I had had that on the
HP platform.)
Probably the best you can do is to allocate as much shared memory to Sybase
as you can afford based on the size of your system. Fast disk arrays with
large cache pools will give you significant performance boosts (despite my
earlier comments on how the async driver works, having large chunks of disk
data stored in the array cache in anticipation of future reads can't be
beat.) In our case, we had enough money and a DB large enough to justify an
EMC (more recently replaced by a similar product from Hitachi)
On another front, your application folks have to scrutinize their code
(stored procedures, sql scripts, etc.) to make it more efficient. For
example, the quick and dirty coding practice of generating multiple queries
with the 'sql' tool in shell scripts or what have you must be modified to
do large reads from the DB within fewer calls to sql. Each call to sql
requires that new virtual connections be established, with all the
authentication processing that goes along with that. Then there's the
housekeeping that occurs when the sql completes. You can imagine the impact
of a script that does an sql call for each name in a database of a million
customer names, etc.
Also, your DBA should work with your application developers to avoid
situations where data needed by every function is contained within one or
two tables. If you have many simultaneous requests to update one table,
you'll have a lot of processes waiting around for the resource. Have the
DBA monitor for deadly embraces, i.e. deadlocks. I think the newer versions
of Sybase support row-level locking, which, if used correctly, can reduce
the impact of a lot of these issues.
Your DBA should also run "update statistics" on critical tables (especially
the ones with many rows) on a regular basis (e.g., daily.)
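For concreteness, a hedged sketch of what that maintenance looks like in ASE ("customer" is a hypothetical table name; run it for each large, busy table):

```sql
-- Refresh the optimizer's distribution statistics for the table:
update statistics customer
go
-- Mark stored procedures that reference it for recompilation, so they
-- pick up query plans based on the new statistics:
exec sp_recompile customer
go
```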
Another thing you may need to consider regarding the table/row contention
issue is setting a separate database server for some processing. For
example, if you run reports for the business managers against the same
database as your application is running, you may see some benefit if you
could have the reports run against a copy of the "primary" database. The
sticky issue is how to make that copy and how in-sync with the "primary"
that second DB must be (most mgmt reports don't need up-to-the-minute data;
they're more interested in trending, and revenue and customer counts and
such things can be a few days old and still be meaningful.) What your
second DB is doing will determine whether you can just do a daily dump/load
or need something more sophisticated like Replication Server.
Anyway, that's all I can think of for now. If you're still stuck you can
always open a case with Sybase. They should have someone who can give you
some pointers.
Good luck
hARLAN bRAUDE
Audible
65 Willowbrook Blvd
Wayne, NJ 07470
------------------------
random thoughts...
o Did you see if you happened to have a high number of minor faults? I
guess this would bump up system time.
o here is a link for tuning async i/os in Sybase on AIX:
http://www.sybase.com/detail/1,3151,205909,00.html
o Oracle async I/O discussion in a Solaris environment:
http://gsite.www.ecn.purdue.edu/Gsite/oracle_doc/platform/doc/SunTune/suntune.htm#index0002
o development machines tend to lose indices in the database, so the
queries will be running much longer.
Eugene.
----------------------
Sorry, I don't have time to dig into your problem personally, but for your
future reference I can highly recommend these two books:
Sun Performance and Tuning, by Adrian Cockcroft / Richard Pettit
Solaris 7 Performance Administration Tools, by Frank Cervone
Both of these books are generic for all versions of Solaris, so don't let
the 'Solaris 7' keep you away from them.
Good luck,
Marty Backe
----------------------
I asked a guy here working with me who is good with sybase. His comments
were as follows:
1) Don't mirror with Sybase - it's slow and possibly trouble-prone. (my
comment: Why didn't you just mirror everything with DiskSuite?)
2) He thinks your CPU utilization might be high from I/O caused by the
Sybase mirroring. He suggests turning mirroring off on the test box and
seeing if the CPU utilization drops.
3) What is happening in the database when the CPU utilization is high?
-David
----------------------
Kevin Buterbaugh
LifeWay
"Failure is not an option ... it's a standard feature of every Microsoft
product."
---------------------- Forwarded by Kevin Buterbaugh/Nashville/BSSBNOTES on
09/15/2000 03:57 PM ---------------------------
Sent by: owner-sun-managers@sunmanagers.ececs.uc.edu
To: sun-managers@sunmanagers.ececs.uc.edu
cc:
Subject: Performance issues
Greetings All,
I have been subscribed to this list for several months now and have
answered several questions, but this is my first post. I have two Sybase
servers that I am having some performance issues with that I am unable to
understand. This post is going to be somewhat lengthy, for which I
apologize...
PROBLEM 1:
This system is an Ultra 1 (1 x 167 MHz CPU) with 1 GB RAM running
Solaris 2.6 and Sybase Adaptive Server Enterprise 12.0. Sybase databases
(there's actually only one main database) are in raw partitions, with
mirroring done by Sybase. The root disk is mirrored via DiskSuite. The
system is up to date (as of this past weekend) on all recommended /
security / Y2K patches. I have allocated 750 MB of shared memory for
Sybase via /etc/system. My Sybase DBA tells me that he is actually only
using 600 MB of that at this point in time.
No one is complaining about the performance of this system. However,
whenever I do a "sar -u," I notice that system time is almost always
significantly more than user time. Note that wait for I/O time is next to
nothing. I am trying to figure out why system time is higher than user
time.
SunOS demetrius 5.6 Generic_105181-23 sun4u 09/13/00
08:00:02 %usr %sys %wio %idle
08:20:00 10 40 3 47
08:40:01 17 44 1 38
09:00:01 18 45 1 36
09:20:00 19 38 0 42
09:40:00 25 46 0 28
10:00:01 20 43 1 37
10:20:01 21 49 1 29
10:40:00 19 46 1 34
11:00:01 27 55 1 17
11:20:00 25 49 0 26
11:40:00 21 41 1 38
12:00:00 23 46 0 31
12:20:00 21 42 1 37
12:40:00 18 42 6 34
13:00:01 44 55 0 1
13:20:00 46 54 0 0
13:40:01 19 44 0 37
14:00:01 19 45 1 35
14:20:00 23 50 1 27
14:40:00 21 48 0 31
15:00:01 25 47 1 28
15:20:00 29 46 0 25
15:40:00 19 43 0 38
16:00:01 41 33 0 26
16:20:00 21 46 0 33
16:40:01 16 46 1 38
17:00:01 17 38 0 44
17:20:00 14 34 0 51
17:40:00 16 27 1 57
18:00:01 13 28 1 58
19:00:02 16 26 0 57
20:00:01 22 24 1 54
21:00:01 11 29 1 59
22:00:01 12 35 0 53
23:00:01 24 30 0 46
Average 20 39 1 40
Here's the output of "sar -g," which shows that the system is not
short on memory, as well as the average for the day from the output of "sar
-d," which doesn't show any disks with high service times or high percent
busy times (I don't want to include the full output from sar -d, but trust
me that there is never a point during the day when the service times go
above 50 ms on any disk or the percent busy goes over 25%):
SunOS demetrius 5.6 Generic_105181-23 sun4u 09/13/00
08:00:02 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
08:20:00 21.31 111.84 113.31 1.48 0.00
08:40:01 0.26 0.28 0.72 0.39 0.00
09:00:01 0.23 0.25 1.00 0.72 0.00
09:20:00 0.28 0.37 1.26 1.39 0.00
09:40:00 0.47 0.55 1.26 2.38 0.00
10:00:01 0.27 0.34 1.09 3.17 0.00
10:20:01 0.44 0.55 2.18 2.24 0.00
10:40:00 0.10 0.11 0.90 0.94 0.00
11:00:01 0.56 0.60 1.04 0.64 0.00
11:20:00 0.28 0.41 1.16 1.03 0.00
11:40:00 0.23 0.26 0.99 0.90 0.00
12:00:00 0.11 0.12 0.36 0.31 0.00
12:20:00 0.37 0.53 1.21 0.84 0.00
12:40:00 1.80 1.81 2.22 0.51 0.00
13:00:01 0.05 0.06 0.06 0.00 0.00
13:20:00 0.53 1.74 2.38 0.72 0.00
13:40:01 0.11 0.12 0.69 0.61 0.00
14:00:01 0.23 0.61 1.21 0.67 0.00
14:20:00 0.23 0.52 1.66 2.11 0.00
14:40:00 0.06 0.07 0.11 0.07 0.00
15:00:01 0.37 0.39 0.63 0.45 0.00
15:20:00 0.23 0.24 0.95 1.31 0.00
15:40:00 0.01 0.01 0.01 0.00 0.00
16:00:01 0.04 0.05 0.17 0.25 0.00
16:20:00 0.16 0.25 0.34 0.16 0.00
16:40:01 0.19 0.19 1.20 1.93 0.00
17:00:01 0.02 0.02 0.02 0.00 0.00
17:20:00 0.01 0.02 0.02 0.00 0.00
17:40:00 0.17 0.18 0.73 1.02 0.00
18:00:01 0.23 0.24 0.83 1.17 0.00
19:00:02 0.18 0.73 2.07 2.58 0.00
20:00:01 0.02 0.02 0.02 0.01 0.00
21:00:01 0.01 0.01 0.01 0.00 0.00
22:00:01 0.04 0.04 0.06 0.02 0.00
23:00:01 0.03 0.03 0.15 0.15 0.00
Average 0.67 2.78 3.25 0.79 0.00
(sar -d columns: device %busy avque r+w/s blks/s avwait avserv)
Average md0 0 0.0 0 2 0.0 33.7
md1 0 0.0 0 2 0.0 32.7
md2 0 0.0 0 2 24.6 36.5
md3 0 0.0 0 0 0.0 13.4
md4 0 0.0 0 0 0.0 13.2
md5 0 0.0 0 0 0.2 13.0
md6 0 0.0 0 3 0.0 30.9
md7 0 0.0 0 3 0.0 31.6
md8 0 0.0 0 3 23.0 33.5
md9 0 0.0 0 2 0.0 18.1
md10 0 0.0 0 2 0.0 18.8
md11 0 0.0 0 2 19.0 19.9
nfs1 0 0.0 0 0 0.0 12.9
nfs2 0 0.0 0 0 0.0 0.0
nfs22 0 0.0 0 0 0.0 13.1
nfs31 0 0.0 0 0 23.5 16.3
nfs60 0 0.0 0 0 0.0 6.9
sd0 1 0.0 1 7 0.0 29.8
sd0,a 0 0.0 0 2 0.0 33.7
sd0,b 0 0.0 0 0 0.0 13.3
sd0,c 0 0.0 0 0 0.0 0.0
sd0,d 0 0.0 0 3 0.0 30.9
sd0,f 0 0.0 0 2 0.0 18.1
sd0,h 1 0.0 0 0 0.0 31.2
sd1 1 0.0 1 7 0.0 29.6
sd1,a 0 0.0 0 2 0.0 32.7
sd1,b 0 0.0 0 0 0.0 13.2
sd1,c 0 0.0 0 0 0.0 0.0
sd1,d 0 0.0 0 3 0.0 31.5
sd1,f 0 0.0 0 2 0.0 18.8
sd1,h 1 0.0 0 0 0.0 30.8
sd3 4 0.2 7 52 4.1 21.2
sd3,a 4 0.2 7 52 4.1 21.2
sd3,c 0 0.0 0 0 0.0 0.0
sd5 2 0.1 4 16 0.0 16.9
sd5,a 2 0.1 4 16 0.0 16.9
sd5,c 0 0.0 0 0 0.0 0.0
sd6 0 0.0 0 0 0.0 0.0
sd8 1 0.0 2 59 0.0 15.0
sd8,a 1 0.0 1 14 0.0 11.3
sd8,b 0 0.0 1 44 0.0 19.7
sd8,c 0 0.0 0 0 0.0 0.0
sd8,d 0 0.0 0 1 0.0 10.2
sd8,h 0 0.0 0 0 0.0 15.3
sd9 1 0.0 2 29 0.0 13.5
sd9,a 1 0.0 1 7 0.0 32.7
sd9,c 0 0.0 0 0 0.0 0.0
sd9,h 1 0.0 2 22 0.0 7.4
sd10 1 0.0 2 11 0.0 10.0
sd10,a 0 0.0 0 5 0.0 27.0
sd10,c 0 0.0 0 0 0.0 0.0
sd10,h 1 0.0 2 6 0.0 6.6
As this screen paste from top shows, the only thing getting any
significant CPU time is the Sybase data server (top is usually under 10%,
BTW):
last pid: 24326; load averages: 0.93, 0.66, 0.61
08:52:16
79 processes: 77 sleeping, 1 running, 1 on cpu
CPU states: 11.2% idle, 52.1% user, 35.4% kernel, 1.2% iowait, 0.0% swap
Memory: 1024M real, 16M free, 91M swap in use, 2204M swap free
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
2546 sybase 17 0 0 593M 584M run 38.9H 47.28% dataserver
24320 root 1 11 0 2264K 2088K cpu 0:07 18.49% top
24323 stats 1 51 0 5952K 2616K sleep 0:00 2.09% sqlsrv
24316 stats 1 58 0 5952K 2616K sleep 0:00 0.53% sqlsrv
24311 stats 1 58 0 5952K 2616K sleep 0:00 0.20% sqlsrv
24306 stats 1 58 0 5928K 2600K sleep 0:00 0.19% sqlsrv
24322 root 1 48 0 928K 800K sleep 0:00 0.09% sh
24301 stats 1 58 0 5952K 2616K sleep 0:00 0.07% sqlsrv
24296 stats 1 58 0 8968K 2680K sleep 0:00 0.06% sqlsrv
24315 root 1 49 0 928K 800K sleep 0:00 0.02% sh
23041 root 1 48 0 1648K 1208K sleep 0:00 0.02% ksh
24310 root 1 48 0 928K 800K sleep 0:00 0.01% sh
530 root 1 59 0 6496K 1680K sleep 0:05 0.00% dtterm
24300 root 1 48 0 928K 800K sleep 0:00 0.00% sh
24305 root 1 58 0 928K 800K sleep 0:00 0.00% sh
So, of course, I did a truss on the dataserver process. Here's some
of the output from truss:
poll(0xCCD27F00, 36, 0) = 0
poll(0xCCD27F00, 36, 0) = 0
poll(0xCCD27F00, 36, 0) = 0
poll(0xCCD27F00, 36, 0) = 1
getmsg(53, 0xCA44F64C, 0xCA44F63C, 0xCA44F67C) = 0
kaio(AIOWRITE|AIO_POLL_BIT, 14, 0xEC989800, 2048, 0x620A6000, 0xEDE9BBE0) = 0
poll(0xCCD27F00, 36, 0) = 1
kaio(AIOWAIT, 0xFFFFFFFF) = 0
getmsg(40, 0xCA44F64C, 0xCA44F63C, 0xCA44F67C) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
write(40, "0401\0 q\0\0\0\083\v\0 D".., 113) = 113
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA44F030)
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = -303449120
kaio(AIOWRITE|AIO_POLL_BIT, 29, 0xEC989800, 2048, 0x620A6000, 0xEDE9BBE0) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA4D9528)
poll(0xCCD27F00, 36, 0) = 1
kaio(AIOWAIT, 0xFFFFFFFF) = -303449120
getmsg(40, 0xCA44F64C, 0xCA44F63C, 0xCA44F67C) = 0
kaio(AIOWRITE|AIO_POLL_BIT, 14, 0xEB10C000, 2048, 0x620A6800, 0xEDE9A020) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
write(40, "0401\0 "\0\0\0\0D1BD8F\0".., 34) = 34
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA4D9528)
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = -303456224
kaio(AIOWRITE|AIO_POLL_BIT, 29, 0xEB10C000, 2048, 0x620A6800, 0xEDE9A020) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 1
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA4D9590)
kaio(AIOWAIT, 0xFFFFFFFF) = -303456224
getmsg(40, 0xCA44F64C, 0xCA44F63C, 0xCA44F67C) = 0
kaio(AIOWRITE|AIO_POLL_BIT, 14, 0xEB10C000, 2048, 0x620A6800, 0xEDE9A9E0) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
write(40, "0401\011\0\0\0\0FD10\002".., 17) = 17
poll(0xCCD27F00, 36, 0) = 0
signotifywait() = 14
Received signal #14, SIGALRM [caught]
siginfo: SIGALRM pid=2546 uid=1002 code=-3
lwp_sigredirect(1, SIGALRM) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA4D9510)
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA4D9528)
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = -303453728
kaio(AIOWRITE|AIO_POLL_BIT, 29, 0xEB10C000, 2048, 0x620A6800, 0xEDE9A9E0) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA4D9590)
kaio(AIOWAIT, 0xFFFFFFFF) = -303453728
kaio(AIOWRITE|AIO_POLL_BIT, 14, 0xEB10C000, 2048, 0x620A6800, 0xEDE9E5E0) = 0
kaio(AIOWRITE|AIO_POLL_BIT, 14, 0xED059000, 2048, 0x620A7000, 0xEDE9D320) = 0
poll(0xCCD27F00, 36, 0) = 1
kaio(AIOWAIT, 0xFFFFFFFF) = 0
getmsg(40, 0xCA44F64C, 0xCA44F63C, 0xCA44F67C) = 0
poll(0xCCD27F00, 36, 0) = 0
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA4D9590)
kaio(AIOWAIT, 0xFFFFFFFF) = -303438368
kaio(AIOWAIT, 0xFFFFFFFF) = -303443168
kaio(AIOWRITE|AIO_POLL_BIT, 29, 0xEB10C000, 2048, 0x620A6800, 0xEDE9E5E0) = 0
kaio(AIOWRITE|AIO_POLL_BIT, 29, 0xED059000, 2048, 0x620A7000, 0xEDE9D320) = 0
write(40, "0401\0 %\0\0\0\08307\0 D".., 37) = 37
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA4D9528)
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = -303438368
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA4D9528)
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = -303443168
kaio(AIOWRITE|AIO_POLL_BIT, 14, 0xEC5AA800, 2048, 0x620A7800, 0xEDE9A2C0) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = 0
poll(0xCCD27F00, 36, 0) = 0
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA4D9590)
kaio(AIOWAIT, 0xFFFFFFFF) = -303455552
kaio(AIOWRITE|AIO_POLL_BIT, 29, 0xEC5AA800, 2048, 0x620A7800, 0xEDE9A2C0) = 0
poll(0xCCD27F00, 36, 0) = 1
kaio(AIOWAIT, 0xFFFFFFFF) = 0
getmsg(40, 0xCA44F64C, 0xCA44F63C, 0xCA44F67C) = 0
poll(0xCCD27F00, 36, 0) = 1
kaio(AIOWAIT, 0xFFFFFFFF) = 0
getmsg(40, 0xCA44F64C, 0xCA44F63C, 0xCA44F67C) = 0
signotifywait() = 22
Received signal #22, SIGPOLL [caught]
lwp_sigredirect(1, SIGPOLL) = 0
sigprocmask(SIG_SETMASK, 0xEF797E20, 0x00000000) = 0
setcontext(0xCA44F000)
poll(0xCCD27F00, 36, 0) = 0
kaio(AIOWAIT, 0xFFFFFFFF) = -303455552
write(40, "0401\01B\0\0\0\08307\0 E".., 27) = 27
poll(0xCCD27F00, 36, 0) = 0
write(53, "040101C2\0\0\0\0FF A\002".., 450) = 450
poll(0xCCD27F00, 36, 0) = 0
poll(0xCCD27F00, 36, 0) = 0
poll(0xCCD27F00, 36, 0) = 0
The thing that caught my eye here is the kaio system calls with the
first field as "AIOWAIT" and the return value being "-303455552." Reading
the man pages and searching SunSolve told me that this is a "kernel
asynchronous I/O" system call that is waiting to complete. But why? If my
problem is caused by this (and that would explain why system time is high),
should I not see that reflected in some of the disk statistics?
I have run (in fact I have a cron job which automatically runs and
logs the output to files) vmstat, iostat, and mpstat. If you're
interested, I can send you the output. I'm not going to waste any more
bandwidth than necessary, but suffice it to say that the output of the
xxstat commands agrees with sar. One thing I looked for was potential lock
contention (the smtx field of mpstat). Values there were extremely low.
Any thoughts?
PROBLEM 2:
This system is an Ultra 5 (1 x 360 MHz CPU) with 512 MB RAM running
Solaris 2.7 and Sybase ASE 12.0. This is our Sybase "test" server. Sybase
databases are again in raw partitions, with mirroring done by Sybase. The
system is reasonably up to date on patch clusters. I have allocated 450 MB
of shared memory for Sybase via /etc/system. My Sybase DBA is using it
all.
The "problem" is that my Sybase DBA / developers are complaining of
sluggish response time on Sybase queries on this box. Specifically, they
say that a query that usually comes back in about 2 seconds on the first
box usually takes about 10 seconds on this box (and they're quick to point
out that this box has a processor more than twice as fast).
All performance statistics on this box look great to me, however. The
only thing I've noticed (via the xxstat commands) is that there are brief
periods where CPU utilization does hit 100%. However, no significant run
queue forms, and the busy periods usually don't last for more than a couple
of minutes. I've included some sar output here and can send any interested
parties xxstat output.
SunOS corinth 5.7 Generic_106541-11 sun4u 09/13/00
00:00:00 %usr %sys %wio %idle
01:00:01 0 0 0 100
02:00:00 0 0 0 100
03:00:00 0 0 0 100
04:00:00 0 0 0 100
05:00:00 0 0 0 100
06:00:00 0 0 0 100
07:00:00 21 2 0 77
08:00:01 32 4 0 64
08:20:00 41 5 0 55
08:40:00 26 3 0 71
09:00:00 39 7 0 54
09:20:00 21 3 0 76
09:44:23 17 3 3 77
10:00:00 3 4 0 94
10:20:00 6 6 0 88
10:40:00 8 4 0 88
11:00:00 10 6 0 84
11:20:01 8 9 0 83
11:40:00 10 22 0 67
12:00:00 4 11 0 85
12:20:00 3 3 0 94
12:40:00 7 4 0 89
13:00:00 15 6 0 79
13:20:00 44 8 0 48
13:40:01 30 5 0 65
14:00:00 20 4 0 76
14:20:00 21 4 1 75
14:40:00 34 7 0 59
15:00:00 1 3 0 96
15:20:00 11 9 0 80
15:40:00 35 9 0 56
16:00:00 28 12 0 60
16:20:00 46 22 0 32
16:40:01 10 4 0 85
17:00:00 3 3 0 94
17:20:00 2 2 0 97
17:40:00 1 2 0 98
18:00:00 1 2 0 97
19:00:01 7 7 3 84
20:00:00 2 7 19 72
21:00:01 0 3 2 95
22:00:00 1 1 0 98
23:00:00 2 2 0 96
Average 10 4 1 85
/home/kbuterb
kbuterb@corinth: sar -g -f /var/adm/sa/sa13
SunOS corinth 5.7 Generic_106541-11 sun4u 09/13/00
00:00:00 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
01:00:01 0.00 0.00 0.00 0.00 0.00
02:00:00 0.00 0.00 0.00 0.00 0.00
03:00:00 0.00 0.00 0.00 0.00 0.00
04:00:00 0.00 0.00 0.00 0.00 0.00
05:00:00 0.00 0.00 0.00 0.00 0.00
06:00:00 0.00 0.00 0.00 0.00 0.00
07:00:00 0.00 0.01 0.03 0.11 0.00
08:00:01 0.43 0.45 0.42 0.07 0.00
08:20:00 0.56 0.58 0.56 0.11 0.00
08:40:00 0.43 0.46 0.57 1.00 0.00
09:00:00 0.70 0.73 0.69 0.01 0.00
09:20:00 0.42 0.45 0.42 0.00 0.00
09:44:23 0.52 1.85 8.88 103.32 0.00
10:00:00 0.28 0.30 0.35 0.15 0.00
10:20:00 0.24 0.25 0.40 0.42 0.00
10:40:00 0.19 0.20 0.22 0.04 0.00
11:00:00 0.49 0.50 0.56 0.21 0.00
11:20:01 0.31 0.31 0.52 0.53 0.00
11:40:00 0.37 0.40 0.41 0.03 0.00
12:00:00 0.12 0.12 0.13 0.00 0.00
12:20:00 0.16 0.16 0.18 0.04 0.00
12:40:00 0.21 0.22 0.34 0.28 0.00
13:00:00 0.32 0.34 0.55 0.62 0.00
13:20:00 0.00 0.00 0.00 0.00 0.00
13:40:01 0.00 0.00 0.00 0.00 0.00
14:00:00 0.00 0.00 0.00 0.00 0.00
14:20:00 0.24 0.26 0.87 2.00 0.00
14:40:00 0.69 0.70 0.65 0.00 0.00
15:00:00 0.10 0.11 0.11 0.00 0.00
15:20:00 0.51 0.51 0.50 0.00 0.00
15:40:00 0.65 0.66 0.64 0.00 0.00
16:00:00 0.35 0.35 0.45 0.31 0.00
16:20:00 0.28 0.29 0.29 0.02 0.00
16:40:01 0.12 0.12 0.13 0.03 0.00
17:00:00 0.12 0.13 0.15 0.07 0.00
17:20:00 0.15 0.17 0.17 0.00 0.00
17:40:00 0.12 0.12 0.13 0.02 0.00
18:00:00 0.21 0.23 0.29 0.17 0.00
19:00:01 0.28 0.32 0.58 0.93 0.00
20:00:00 6.98 102.74 103.18 14.76 0.00
21:00:01 0.00 0.00 0.00 0.00 0.00
22:00:00 0.00 0.00 0.00 0.00 0.00
23:00:00 0.00 0.00 0.00 0.00 0.00
Average 0.46 4.67 4.82 2.28 0.00
(sar -d columns: device %busy avque r+w/s blks/s avwait avserv)
Average dad0 1 0.0 4 100 2.4 4.3
dad0,a 0 0.0 0 2 31.7 8.0
dad0,b 0 0.0 0 1 2.2 14.0
dad0,c 0 0.0 0 0 0.0 0.0
dad0,d 0 0.0 0 2 6.2 6.3
dad0,e 0 0.0 2 50 1.1 4.2
dad0,f 0 0.0 2 45 0.8 3.7
dad0,h 0 0.0 0 0 0.0 0.0
fd0 0 0.0 0 0 0.0 0.0
md0 2 0.0 7 178 0.0 5.2
nfs1 0 0.0 0 0 0.0 0.0
nfs10 0 0.0 0 0 0.0 0.0
nfs15 0 0.0 0 0 13.6 16.5
nfs24 0 0.0 0 0 0.0 20.2
sd0 1 0.0 3 43 0.0 10.4
sd0,a 1 0.0 2 14 0.0 8.7
sd0,c 0 0.0 0 0 0.0 0.0
sd0,d 0 0.0 0 0 0.0 0.0
sd0,e 0 0.0 0 12 0.0 24.4
sd0,f 0 0.0 0 0 0.0 6.6
sd0,g 0 0.0 0 0 0.0 0.0
sd0,h 0 0.0 0 17 0.0 17.0
sd1 2 0.1 4 77 0.0 13.5
sd1,a 0 0.0 0 0 0.0 0.0
sd1,c 0 0.0 0 0 0.0 0.0
sd1,d 2 0.1 4 68 0.0 13.7
sd1,e 0 0.0 0 0 0.0 9.9
sd1,f 0 0.0 0 0 0.0 3.3
sd1,g 0 0.0 0 9 0.0 12.1
sd1,h 0 0.0 0 0 0.0 10.8
sd2 0 0.0 2 46 0.0 6.7
sd2,a 0 0.0 0 0 0.0 8.3
sd2,c 0 0.0 0 0 0.0 0.0
sd2,d 0 0.0 0 3 0.0 28.4
sd2,e 0 0.0 0 0 0.0 23.8
sd2,f 0 0.0 0 0 0.0 0.0
sd2,g 0 0.0 0 0 0.0 0.0
sd2,h 0 0.0 2 43 0.0 5.6
sd3 0 0.0 2 40 0.0 5.2
sd3,a 0 0.0 0 0 0.0 6.5
sd3,c 0 0.0 0 0 0.0 0.0
sd3,d 0 0.0 0 0 0.0 0.0
sd3,e 0 0.0 0 0 0.0 9.0
sd3,f 0 0.0 0 0 0.0 0.0
sd3,g 0 0.0 0 0 0.0 0.0
sd3,h 0 0.0 2 40 0.0 5.2
sd15 0 0.0 0 0 0.0 0.0
With this box, as with the first, the only significant user of CPU
time is the Sybase dataserver process. Interestingly enough in relation to
my first problem, a truss of the dataserver process on this box never
shows it doing any kaio calls with the first field as "AIOWAIT."
Of course, the main difference between these 2 boxes is that the first
runs Solaris 2.6 and the second Solaris 2.7. Are there any issues there
that I should be aware of? Anybody running Sybase ASE 12.x on Solaris 2.8?
I think that my DBA / developers may be overreacting; i.e., they submit
a query once or twice which happens to run during a "busy" period, but
then complain that performance is "always" poor. Any thoughts?
Again, I apologize for the length of this post. Thanks for any input
you can provide, especially on "Problem 1"...
Kevin Buterbaugh
LifeWay
"Anyone can build a fast CPU. The trick is to build a fast system." -
Seymour Cray
BEFORE POSTING please READ the FAQ located at
ftp://ftp.cs.toronto.edu/pub/jdd/sun-managers/faq
and the list POLICY statement located at
ftp://ftp.cs.toronto.edu/pub/jdd/sun-managers/policy
To submit questions/summaries to this list send your email message to:
sun-managers@sunmanagers.ececs.uc.edu
To unsubscribe from this list please send an email message to:
majordomo@sunmanagers.ececs.uc.edu
and in the BODY type:
unsubscribe sun-managers
Or
unsubscribe sun-managers original@subscription.address
To view an archive of this list please visit:
http://www.latech.edu/sunman.html
This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:14:17 CDT