Well, I got two replies:

1) Check for interrupts. They aren't sharing interrupts.
2) Check auto-negotiation / bad hub.

Well, neither of those really hit the nail on the head. But I did start to
notice that this occurred consistently after about 15 days of uptime.

The fix (yeah, this is a hack!): I put in a cron job to "down/up" iprb0 (the
external network card, not the card that appeared to be having the problem)
and the system immediately responded. After doing this, I noticed that the
number of idle connections (reported by netstat -a) dropped significantly.

Kinda strange, totally a hack, but it works!

--Brett

----- Original Message -----
From: "Brett Thorson" <bthorson@ekosystems.com>
To: <sunmanagers@sunmanagers.org>
Sent: Monday, October 01, 2001 10:52 AM
Subject: Solaris Networking (Application Layer Snooping?)

> Preamble: Sorry if this is a repeat; first version didn't go through.
> -------
>
> I've seen this twice now, and I am not even sure where to begin looking.
>
> Consider a Solaris x86 box (BOX) with two network cards. No routing between
> the two.
> iprb0 goes to the outside world
> iprb1 goes to an internal hub connected to devices (DEVS) that get their
> address via DHCP from BOX.
>
> I come up to one of these machines, and find that the application that
> communicates between the DEVS and the BOX is no longer communicating. But
> DEVS does have a DHCP address from BOX.
>
> I try telnetting to DEVS from BOX. No response.
> So I ping DEVS from BOX, no reply.
> Start snooping on iprb1.
>
> Ping DEVS from BOX. Ping (the little program) receives no reply, but I can
> see the packet going
> from BOX to DEVS, and I can see the packet replying from DEVS to BOX.
> However ping (the program) says no response.
>
> Unplug & reset DEVS. DEVS gets a DHCP address no problem (Confirmed: it is
> not re-using an old address, it actually gets assigned a new address from
> BOX).
> This leads me to believe (along with the initial communication
> architecture between BOX & DEVS) that UDP / broadcast stuff is working and
> moving around.
>
> So that means (here is my jump/stab at it) that there is something going on
> in the Session/Presentation/Application layers for things to not be working
> right?!?!
>
> The routing tables haven't changed. (I checked netstat -rn) The IP
> addresses haven't changed.
> Nothing changed (as far as I have determined) on the box to
> precipitate this problem.
>
> The first time that this occurred, I solved the problem (after these
> diagnostics) by rebooting BOX.
>
> The second time, I brought the card down and then back up with ifconfig, and
> the system flew, it ran just fine. I watched the snoop traffic on iprb1,
> and it looked exactly the same as when the system was failing. The routing
> looked the same.
>
> There weren't any errors in syslog or dmesg. The only errors I really saw
> were Java socket timeout failures (when it was trying to open a TCP/IP
> socket after finding DEVS via broadcast).
>
> There isn't a whole lot of traffic on iprb1, but I did see a few Oerrs
> (about 0.1%) in a netstat -i.
>
> If the packets are actually moving, but the apps aren't seeing them, where
> do I start poking my nose to see what's going on?
>
> And even so, would this mean that broadcast (UDP) stuff works, but peer to
> peer (TCP) stuff doesn't for some reason? And why would ping not work?
>
> Any help, advice, or whatever would be great for this one.
>
> --Brett
>
>
Received on Wed Oct 10 15:21:04 2001
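
For anyone wanting to try the same workaround, the cron job from the summary
at the top might look roughly like the sketch below. The post does not show
the actual script, so the script path, the weekly schedule, and the DRY_RUN
switch are assumptions made for illustration. Only the interface name iprb0
and the use of ifconfig to bring the card down and back up come from the post.

```shell
#!/bin/sh
# Sketch of the "down/up" workaround; not the author's actual script.
#
# Hypothetical root crontab entry (path and schedule are assumptions;
# the post only says the problem showed up after about 15 days of uptime):
#   0 3 * * 0 DRY_RUN=0 /usr/local/bin/bounce_iface.sh

IFACE="${IFACE:-iprb0}"    # interface to bounce (iprb0 in the post)
DRY_RUN="${DRY_RUN:-1}"    # assumption: default to printing, not executing

run() {
    # In dry-run mode only print the command; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

bounce_iface() {
    # Take the interface down, then bring it straight back up.
    run ifconfig "$1" down
    run ifconfig "$1" up
}

bounce_iface "$IFACE"
```

Running it by hand without DRY_RUN=0 just prints the two ifconfig commands,
which is a cheap way to check the crontab wiring before letting it touch the
external interface. Expect connections through that interface to be
interrupted while it is down.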