SUMMARY: input errors on hme NICs

From: Alex <ded_subs_at_tumko.org>
Date: Thu Jun 06 2002 - 19:44:03 EDT

Thanks to everyone who replied and shared ideas about what might
have caused the problem:

topher
Darren Dunham
Gaziz Nugmanov
tony bourke
Vlade
Steve Mickeler

Almost everyone advised double-checking the 100 Mbit full-duplex settings
on all NICs and switch ports. I did, and the settings were correct.
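
For anyone repeating that check, here is a minimal sketch of reading the
link state back from the hme driver with ndd. The describe_link helper is
hypothetical (it is not part of Solaris); it only turns the link_speed and
link_mode values into text, assuming the usual hme convention that 1 means
100 Mbit and full duplex respectively. The ndd calls themselves are left as
comments because they need root on the Solaris box.

```shell
#!/bin/sh
# Hypothetical helper: translate hme link_speed/link_mode values into text.
# Assumes the hme driver convention: link_speed 1 = 100 Mbit, 0 = 10 Mbit;
# link_mode 1 = full duplex, 0 = half duplex.
describe_link() {
    case "$1" in
        1) speed="100 Mbit" ;;
        0) speed="10 Mbit" ;;
        *) speed="unknown speed" ;;
    esac
    case "$2" in
        1) mode="full duplex" ;;
        0) mode="half duplex" ;;
        *) mode="unknown duplex" ;;
    esac
    echo "$speed, $mode"
}

# On the Solaris box (as root), select the instance first, then read back:
#   ndd -set /dev/hme instance 0
#   describe_link "$(ndd -get /dev/hme link_speed)" "$(ndd -get /dev/hme link_mode)"
```

The same two ndd reads would be repeated with instance 1 for hme1.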

topher suggested trying the following steps:
> My first step would be a "showmount -a" - and make sure that there aren't any
> systems in there that have been turned off - I found a BUNCH of packet errors
> while the system was trying to figure out NFS connections - something along the
> same line as what you see now...  you can manually edit /etc/rmtab and either
> comment out, or completely remove, the systems that are no longer in existence,
> then stop and start the nfs server (/etc/init.d/nfs.server
> stop;/etc/init.d/nfs.server start) and you should be all good to go (the nfs
> thing won't even drop the existing connections, worst case they get a blip of a
> message on the end users' terminals, but that's it...)
> 
> ----
> 
> My next step would be to increase the xmit and recv buffers with ndd:
> 
> ndd -set /dev/tcp tcp_xmit_hiwat 32768
> ndd -set /dev/tcp tcp_recv_hiwat 32768
> 
> these default to 16,384 and 24,576 respectively, and the range for each is from
> 4096-1,073,741,824 and 2048-1,073,741,824 respectively
> 
> These parameters specify the default value for a connection's receive and
> transmit buffer space; that is, the amount of buffer space allocated for
> received data (and thus the maximum possible advertised receive window) -
> usually they should be set to be the same...
> 
> try the 32768 number, and if they make the errors a little better, you can
> slowly increase it - just remember it's 'bytes', so if you make it too big,
> you'll clobber your RAM with tcp packets... that'd make things worse...
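
The "slowly increase it" advice above could be sketched roughly as follows.
The next_hiwat helper is hypothetical, and both the doubling step and the
default ceiling of 262144 bytes are arbitrary assumptions for illustration,
not values recommended in the replies. The ndd calls are left as comments
since they need root on the Solaris box.

```shell
#!/bin/sh
# Hypothetical helper: print the next buffer size to try, doubling the
# current value but never exceeding a ceiling (second argument, default
# 262144 bytes; both the doubling and the ceiling are arbitrary choices).
next_hiwat() {
    cur=$1
    max=${2:-262144}
    next=$((cur * 2))
    if [ "$next" -gt "$max" ]; then
        next=$max
    fi
    echo "$next"
}

# On the Solaris box (as root), one step might look like:
#   cur=$(ndd -get /dev/tcp tcp_recv_hiwat)
#   new=$(next_hiwat "$cur")
#   ndd -set /dev/tcp tcp_xmit_hiwat "$new"
#   ndd -set /dev/tcp tcp_recv_hiwat "$new"
```

Values set with ndd do not survive a reboot, so a step that helps would also
have to be added to a startup script.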

The box was not an NFS server, so the first step did not apply, and the
second step did not help either. I also tried bringing down several network
services running on that box, but nothing changed.

Finally, I installed the latest Solaris 2.6 Recommended patches and rebooted
the box, which solved the problem (I hope at least for the next ~600 days ;)).

Thanks everyone for your help!

Alex

The initial message was:

On Thu, Jun 06, 2002 at 12:13:59PM -0700, Alex wrote:
> Hi,
> 
> I have a Solaris 2.6 box with 2 hme NICs (active and standby) connected
> to different switches. Since yesterday, netstat -in has been showing
> a lot of input errors on hme0. When I switched to hme1, the input errors
> started to appear on it at the same rate (ratio is ~4.2%):
> 
> # netstat -in
> Name  Mtu  Net/Dest   Address      Ipkts      Ierrs    Opkts      Oerrs Collis Queue
> lo0   8232 127.0.0.0  127.0.0.1    36838233   0        36838233   0     0      0
> hme0  1500 10.16.0.0  10.16.0.110  3348788030 74996574 2221455560 0     0      0
> hme1  1500 10.16.0.0  10.16.0.110  304529     12973    176126     0     0      0
> 
> # ifconfig -a
> lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
>         inet 127.0.0.1 netmask ff000000
> hme0: flags=862<BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
>         inet 10.16.0.110 netmask ffffff00 broadcast 10.16.0.255
>         ether 8:0:20:8f:24:29
> hme1: flags=843<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
>         inet 10.16.0.110 netmask ffffff00 broadcast 10.16.0.255
>         ether 8:0:20:8f:24:29
> 
> Both nics are explicitly set to 100 Full Duplex, autonegotiation
> is turned off (the same on switch ports). snoop is not showing anything
> suspicious. The box's uptime is 649 days.
> 
> Does anyone have any idea what could be causing these input errors?
> Will summarize.
> 
> Thank you!
> 
> Alex
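
The error ratio quoted in the initial message can be recomputed from the
netstat -in counters with a short awk sketch. The ierr_ratio function name
is hypothetical, and the column positions (Ipkts in field 5, Ierrs in
field 6) are assumed from the Solaris output shown in that message.

```shell
#!/bin/sh
# Hypothetical helper: read "netstat -in" output on stdin and print the
# input error ratio (Ierrs / Ipkts, as a percentage) per interface.
# Assumes the column order: Name Mtu Net/Dest Address Ipkts Ierrs ...
ierr_ratio() {
    awk '$1 != "Name" && $5 > 0 { printf "%s %.2f%%\n", $1, 100 * $6 / $5 }'
}

# Sample data copied from the netstat -in output in the initial message.
ierr_ratio <<'EOF'
Name  Mtu  Net/Dest      Address        Ipkts  Ierrs Opkts  Oerrs Collis Queue
lo0   8232 127.0.0.0     127.0.0.1      36838233 0     36838233 0     0      0
hme0  1500 10.16.0.0     10.16.0.110    3348788030 74996574 2221455560 0     0      0
hme1  1500 10.16.0.0     10.16.0.110    304529 12973 176126 0     0      0
EOF
```

On the live box the function would be fed directly, as in
"netstat -in | ierr_ratio". With the sample counters, hme1 works out to
about 4.26%, in line with the ~4.2% figure in the message, while the
long-running hme0 counters average about 2.24% over the whole uptime.
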