Isilon nodes running two Intel ports in link aggregation can lose connectivity after boot or major network changes

Summary: Connectivity fails intermittently when link aggregation is configured over Intel network cards.

Symptoms



Isilon nodes running two Intel ports in link aggregation can lose connectivity after boot or major network changes (such as MTU change or adding a port to a network pool). The node will be unreachable via ping or any other protocol.

Cause

When an Intel interface is added to a lagg, if_setlladdr() changes the MAC address of each member port to match the lagg's MAC address. This function changes the MAC address in the ifnet and then clears and sets the IFF_UP flag on the interface to force it to re-initialize.

Unfortunately, there is no locking on the ifnet struct, so setting flags on the interface produces a race condition. On some Infinity models, flexnet triggers this race extremely frequently, so the drivers do not see the IFF_UP flag transition to down and therefore do not re-initialize. For most drivers this happens not to be a problem, but the Intel drivers have been optimized to avoid resets as much as possible. As a result, the second port in the lagg has the wrong MAC address programmed into its Receive Address Filter (that is, its MAC filter), so it does not receive any packets addressed to the lagg.

 

 
 

Resolution

 - The fix is in OneFS 8.1.0.4.
 - The issue is present in OneFS 8.1.0.0-8.1.0.3 and 8.1.1.0-8.1.1.1.
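
As a quick aid, the version ranges above can be expressed as a small shell check. This is only a sketch: the function name onefs_affected and its output labels are made up for illustration, and it expects a plain four-part version string (for example 8.1.0.3) passed as an argument. How that string is obtained on a node is left out here.

```sh
#!/bin/sh
# Hypothetical helper: classify a OneFS version string against the
# ranges listed in this article. Names and labels are illustrative.
onefs_affected() {
    case "$1" in
        8.1.0.[0-3]) echo "affected" ;;    # 8.1.0.0-8.1.0.3
        8.1.1.[0-1]) echo "affected" ;;    # 8.1.1.0-8.1.1.1
        8.1.0.4)     echo "fixed" ;;       # release that contains the fix
        *)           echo "not-listed" ;;  # this article makes no statement
    esac
}
```

For example, onefs_affected 8.1.0.2 prints "affected". A result of "not-listed" only means that this article does not mention the version; it is not a statement that the version is safe.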

DA Patches are now available for:
8.1.1.1 (bug 229557) - patch-229557

Roll-Up patch is now available for:
8.1.0.2 (bug 226323) - patch-226323

RUP patch for 8.1.1.1 (bug 227312) is still being tested.  This KB will be updated once it becomes available.

WORKAROUND:
=============

There are two options to mitigate the issue immediately:

======
1) Node reboot

This step carries a risk because the issue could return immediately upon boot. Changes to MTU and adding or removing aggregate interfaces from a network can increase the chances of the race condition occurring. If that happens, another reboot is needed. This is the riskiest mitigation: it can correct the issue but does not prevent it from returning.

======
2) Re-initialize the interfaces
 
/sbin/ifconfig ix0 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso

/sbin/ifconfig ix1 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso

NOTE: Change the interface name (ix0 or ix1) to match the affected interface on the node.

This option still carries a risk, but it is much less invasive than a reboot. The command re-initializes the NIC and should correct the issue. Like a reboot, it can correct the problem but does not prevent the issue from returning. Changes to MTU and adding or removing aggregate interfaces from a network can increase the chances of the race condition occurring. If a change is made and the race condition returns, the command must be run again to correct the issue.

This command will not persist through a reboot.  If a reboot is performed on the node, the commands must be run again should the issue return after boot.
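
When several member ports are affected, the same command can be wrapped in a small loop. The following is a minimal sketch, not a supported tool: the function names reinit_cmd and reinit_ifaces and the DRY_RUN variable are made up for illustration, and only the ifconfig flags are taken from this article. By default it only prints the commands; set DRY_RUN=0 to run them.

```sh
#!/bin/sh
# Hypothetical helper: print the re-initialization command from this
# article for one interface name.
reinit_cmd() {
    echo "/sbin/ifconfig $1 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso"
}

# Hypothetical helper: for each interface given as an argument, print the
# command (default, DRY_RUN=1) or actually run it (DRY_RUN=0).
reinit_ifaces() {
    for iface in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            reinit_cmd "$iface"
        else
            /sbin/ifconfig "$iface" -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso
        fi
    done
}
```

Calling reinit_ifaces ix0 ix1 prints the two commands shown above so they can be reviewed before anything is changed on the node.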

======

Additional Information

The following conditions must exist for this issue to occur.
  • Node must be configured for aggregation.
  • Node aggregation must be configured for Intel interfaces.

NOTE 1: Nodes running bxe, cxgb, and mlxen interfaces are NOT at risk for this issue. 

NOTE 2: The issue occurs frequently on Gen 6 nodes.

NOTE 3: Gen 4 and Gen 5 hardware with Intel interfaces configured in aggregation is susceptible to this issue, but it is less likely to occur there, and there are no recorded incidents on 8.1.0.0-8.1.0.2. Significant changes to the Intel 10G driver in 8.1.0.3 can increase the chances of the issue occurring on ix interfaces.
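
The driver names in the notes above can be turned into a rough triage helper. This is a sketch under stated assumptions: the function name iface_risk and its output labels are made up for illustration, and it knows only the driver names this article mentions (ix, bxe, cxgb, mlxen). Any other driver is reported as "unknown" rather than guessed.

```sh
#!/bin/sh
# Hypothetical helper: classify an interface by its driver prefix, using
# only the driver names mentioned in this article.
iface_risk() {
    driver="${1%%[0-9]*}"   # strip the unit number, e.g. ix0 -> ix
    case "$driver" in
        ix)             echo "at-risk-if-aggregated" ;;  # Intel 10G driver
        bxe|cxgb|mlxen) echo "not-at-risk" ;;            # per NOTE 1
        *)              echo "unknown" ;;                # not covered here
    esac
}
```

For example, iface_risk ix0 prints "at-risk-if-aggregated" and iface_risk mlxen1 prints "not-at-risk". The result still has to be combined with the aggregation conditions listed above, because an ix port that is not in a lagg does not meet them.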

Affected Products

Isilon Gen6

Products

Isilon A100, Isilon A200, Isilon A2000, Isilon Gen6, Isilon H400, Isilon H500, Isilon HD400, Isilon NL400, Isilon NL410, Isilon S200, Isilon S210, Isilon X200, Isilon X210, Isilon X400, Isilon X410

Article Properties

Article Number: 000167681
Article Type: Solution
Last Modified: 20 Nov 2020
Version: 2