A **BIG** thanks goes out to those who responded promptly, leading me straight to a painless solution (Casper Dik, Andy Lee, Frank Smith). The consensus was:

- do some tcp tuning/monitoring ("netstat -an" gives a feel for the state of connections, i.e., in use or in a wait state)

- tweak the tcp stack to decrease the socket timeout value from the default 4 minutes to 1 minute:

      ndd -set /dev/tcp tcp_time_wait_interval 60000     [solaris 8]
      ndd -set /dev/tcp tcp_close_wait_interval 60000     [solaris 2.6]

(NOTE that these tweaks do *not* persist across reboots; I've created a "kludge" script, /etc/rc2.d/S99_ndd_tcp_kludge, which sets these parameters at system boot. A rough sketch of such a script appears at the end of this message.)

It seems that rsh makes use of "reserved" ports (i.e., < 1024), of which only ~400 are available inherently. With a session timeout of 4 minutes, and most of my rsh jobs taking < 20 seconds, I was typically saturating the ~400 available reserved ports, leaving most of them waiting to time out but blocking new rsh sessions from being established. Hence the "socket: All ports in use" type error messages. I've tweaked the parameters on my solaris 8 and 2.6 boxes and now they both run equally well (i.e., no errors).

Another option mentioned by Casper Dik (in addition to the "ndd" tweaking) was to use something like "ssh" instead of "rsh"; ssh is not limited to reserved ports, hence no ~400 port limit, hence this entire kettle of worms becomes far less relevant. (Initially, when setting up this software, I had been concerned that ssh might introduce significant encryption overhead on the CPU in particular, but preliminary benchmarks run today suggest this concern is almost certainly unfounded, especially if we force the use of ciphers that are NOT "the MOST robust available".) Public/private key pairs let me push "no-password" ssh commands in the same manner as rsh, so it is a very strong candidate to replace rsh when we scale this thing up to a higher number of parallel tasks (i.e., from the current 14-at-once to .. more .. later).

Finally: I was *TOTALLY* led down the garden path with my tweaking of pt_cnt, npty, etc. In this case, the google searches that took me down this avenue were really more of a hindrance / distraction than a help. "Old style" BSD ttys were simply not an issue here (and apparently they are rarely an issue with most solaris apps).

Anyhow. A big thank-you to everyone for their help. I hope that the archives of this posting help prevent others from making the same mistakes I did :-)

---Tim Chipman

original posting follows:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Hi all,

I've got a bit of locally-developed software which is giving me a fair bit of grief. I'm attempting to find out how to resolve these errors, and after extensive crawling through google and the list archives, all the apparently obvious fixes still leave me in a bind, so I'm hoping someone can comment on what else can be done about the situation.

The basic scoop is: the program initially spawns N parallel processes, where each process rsh's a command through to another server, captures the stdout it receives, parses the output, and saves it to a local file. As one thread completes, another is spawned to ensure there are always N parallel processes grinding away. For the moment N=14, but in the future I expect this to increase to 28 or higher.
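For illustration, here is a stripped-down Bourne-shell sketch of this kind of runner. The host names, the remote command, and the output paths are placeholders only, and the real program replaces each finished job immediately rather than waiting for a whole batch to drain:

    #!/bin/sh
    # Rough sketch of an N-way parallel rsh job runner (placeholders throughout).
    N=14
    HOSTS="node01 node02 node03 node04 node05 node06 node07"   # hypothetical targets
    CMD="/usr/local/bin/collect_stats"                          # hypothetical remote command
    OUTDIR=/var/tmp/results

    mkdir -p $OUTDIR
    i=0
    for h in $HOSTS; do
        # capture the remote stdout into a per-host file for later parsing
        rsh -n $h "$CMD" > $OUTDIR/$h.out &
        i=`expr $i + 1`
        # once N jobs are in flight, wait for the batch to finish
        if [ $i -ge $N ]; then
            wait
            i=0
        fi
    done
    wait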
The Problem:

Initially this was observed most critically on the Solaris 2.6 box running this software, which would often throw error messages to the console:

    socket: All ports in use

This appeared to be a function of how many concurrent rsh sessions the solaris 2.6 box was happy with. A bit of reading suggested there were some kernel parameters in /etc/system which could possibly be tuned to alleviate the situation, including:

- set pt_cnt=512 (it was already high; this box is a sunray server with plenty of users logged in with many term windows open)

- set npty=176 (didn't exist initially; it was set and didn't seem to help as much as expected; I also tweaked the "ptsl" line in /etc/iu.ap to jive with this npty setting)

- tried "pty_cnt=number", but apparently this isn't a legitimate variable in the solaris 2.6 kernel, only in 7 and 8 (?)

After making these changes (along with a reconfiguration reboot), the problem still crops up for N > 4 (approx). I had given up hope on this solaris 2.6 box running this app, since we had a solaris 8 box that was behaving better and could do the same thing. Or so I thought. This AM, the solaris 8 box is now throwing similar errors in the same circumstances. The error message is:

    rcmd: socket: Cannot assign requested address

I've tweaked the "pty_cnt=number" setting in /etc/system, which seemed the most obvious place to muck with the config for this, but there is no joy. According to my reading on the problem (?), these resources are managed dynamically on Solaris 8 (at least, far more so than on solaris 2.6), hence my hope that the problem would not persist in this environment. Alas, that may not be the case. (?)

If anyone has some suggestions on other things I'm overlooking, it certainly would be greatly appreciated. Thanks!

---Tim Chipman
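As promised above, here is a minimal sketch of the sort of boot-time "kludge" script I mean. It is not the exact file from my boxes; it assumes only the one TIME_WAIT parameter needs re-setting and simply keys off the OS release:

    #!/bin/sh
    # /etc/rc2.d/S99_ndd_tcp_kludge -- re-apply the TIME_WAIT shortening at
    # every boot, since ndd settings do not persist across reboots.
    case `uname -r` in
        5.8)  /usr/sbin/ndd -set /dev/tcp tcp_time_wait_interval 60000 ;;
        5.6)  /usr/sbin/ndd -set /dev/tcp tcp_close_wait_interval 60000 ;;
    esac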
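And for anyone weighing the ssh route: once a passphrase-less key pair is in place, each rsh invocation becomes roughly a one-line swap. The key path, cipher name, host, and command below are examples only (cipher names vary between ssh versions and protocols):

    # one-time setup: generate a passphrase-less key and append the public
    # key to ~/.ssh/authorized_keys on each target host
    ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_batch

    # then, instead of:   rsh -n somehost /usr/local/bin/collect_stats
    ssh -i $HOME/.ssh/id_batch -o BatchMode=yes -c blowfish-cbc somehost /usr/local/bin/collect_stats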
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers