Hi!

Ok, I've got the following answer from Don (thanks a bunch, Don) and it seems
it's a good explanation of what was going on with my system:

> There is a problem in Solaris 7 and early Solaris 8 OS systems with regard
> to large numbers of mutexes. As the number of mutexes increases in the
> system, the amount of cpu time they consume increases linearly after a
> certain point until they are consuming virtually all of the available cpu
> time, even if the system is essentially idle!
> There is a patch for Solaris 7 (which I don't have the number of any more)
> and one for the early Solaris 8 systems, which is 108827-05.

I've found that patch for Solaris 7 - it's 106980-17. I applied it and
rebooted the machine. Now, after 2 days, it still works like a charm. We'll
see what it will be like after a few weeks...

And here is my original mail:

> Hi people!
>
> I've got a strange problem:
> The machine is a 14-processor domain on an E10k with Solaris 7 11/99. It
> runs Oracle 8.1.7 and some processes that use this database. There are
> about 1000 Oracle sessions. The machine is _heavily_ loaded - loadavg is
> 95. And...
> Well, just look at some statistics.
>
> "vmstat 3" shows:
>
>   r b w     swap    free re  mf pi po fr de sr s0 s1 s2 s8         in     sy    cs  us sy  id
>   3 0 0    16928    1384  0   0  0  0  0  0  0  0  0  0  0 4294967196      0     0 -46 -8 -95
>  81 1 0 17316992 6124176  0  66  0  0  0  0  0  0  0  0  0       7880 589133 12364  75 25   0
>  87 3 0 17315136 6122600  0 133  0  0  0  0  0  0  0  0  0       9435 552462 14111  74 26   0
>  95 2 0 17311744 6120176  0 242  0  0  0  0  0  4  3  0  0       9574 493991 14674  75 25   0
>  90 0 0 17308128 6117512  0 301  0  0  0  0  0  0  0  0  0       8812 594396 13331  75 25   0
> 102 1 0 17305520 6115024  0 245  0  0  0  0  0  0  0  0  0       6746 590357 11123  76 24   0
>
> And "mpstat 3" shows:
>
> CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
>   0    2   0    0  226    1  563  232   29 6601   0 43940  74  26  0   0
>   1   18   0   13 1005  474 1975  783   30 5776   0 30448  71  29  0   0
>   2   11   0   93  911  533 1736  668   35 6544   0 33818  80  20  0   0
>   3   28   0  815  173    1  413  167   23 5810   0 28223  68  32  0   0
>  28    0   0  537  456    1 1140  469   20 8601   0 46569  78  22  0   0
>  30    0   0    0  385    1 1080  404   23 7813   0 48830  76  24  0   0
>  32    0   0   34  262    1  673  270   22 8893   0 54499  78  22  0   0
>  33    1   0   17  302    1  792  316   16 7328   0 47958  79  21  0   0
>  34   19   0 5169  172    1  439  174   19 7511   0 42699  78  22  0   0
>  35    0   0   86  366    1  939  383   20 9280   0 55249  73  27  0   0
>  48  117   0  489  244    1  573  257   32 4971   0 27181  84  16  0   0
>  49   12   0 1248  205    1  530  209   26 8644   0 53641  74  26  0   0
>  50   12   0   13  428   71  585  212   20 9953   0 65631  54  46  0   0
>  51    0   0   43 2699 2276  596  209   20 7892   0 43045  64  36  0   0
>
> Look at the smtx column (!) - it's incredibly high!
> Then I ran "lockstat sleep 5", and here is a bit of its output:
>
> Adaptive mutex spin: 467216 events
>
>  Count indv cuml rcnt     spin Lock          Caller
> -------------------------------------------------------------------------
> 439790  94%  94% 1.00       36 tod_lock      uniqtime+0x10
>   1841   0%  95% 1.00        3 0x30000c6c000 untimeout+0x18
>   1828   0%  95% 1.00        2 0x30000c72000 untimeout+0x18
>   1663   0%  95% 1.00        4 0x30000c72000 timeout_common+0x4
>   1547   0%  96% 1.00        3 0x30000c6c000 timeout_common+0x4
>   1337   0%  96% 1.00        3 0x30000c75000 timeout_common+0x4
>   1305   0%  96% 1.00        2 0x30000c75000 untimeout+0x18
>   1255   0%  96% 1.00      102 0x30005c703c8 qfestart+0x204
>
> I have totally no clue what to do with this :-(
> Any suggestions will be helpful.
> I will summarize, of course.
>
> best regards,
> Jedrzej Nasiadek
>
> p.s.
> One more thing - I've looked at ps's output, and there is no CPU-consuming
> "pig": the busiest process occupies 1.8% of a cpu.

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Mon Jan 7 03:58:11 2002