
Cisco UCS LACP Port Channel Flapping

I recently ran into this issue during a deployment and couldn't find much information about it online, so I figured I'd write a quick blog post documenting the problem and its solution in case anyone else hits it.

When connecting the UCS Fabric Interconnects (FIs) to non-Cisco switches (in this case, Juniper EX Series), we noticed some strange behavior. Roughly every 30 minutes, the uplink port-channel members would briefly go down, come back up, and then stay healthy until the next occurrence. Each event generated an intermittent F0727 fault, with UCS complaining that its port channels had no operational members.

I enabled LACP traceoptions logging on the Juniper side to check whether the problem lay in the LACP protocol itself. It yielded nothing useful about the root cause.

Next I turned to the UCS logs for more information and noticed these entries repeated many times:

2018 Apr 11 13:25:51 ucscluster-A %LLDP-1-NO_DCBX_ACKS_RECV_FOR_LAST_10_PDUs
2018 Apr 11 13:28:07 ucscluster-A %LLDP-1-NO_DCBX_ACKS_RECV_FOR_LAST_10_PDUs
2018 Apr 11 13:30:21 ucscluster-A %LLDP-1-NO_DCBX_ACKS_RECV_FOR_LAST_10_PDUs
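If you want to confirm that these messages recur on a regular schedule, a small script can pull them out of an exported log and measure the gaps between them. The following is a minimal Python sketch. It assumes the FI log has been saved as plain text, with each line following the layout shown above (year, month, day, time, hostname, message). The regex and function names are hypothetical helpers, not part of any Cisco tool.

```python
import re
from datetime import datetime

# Assumed line layout, based on the entries above (hypothetical helper):
# "<year> <Mon> <day> <HH:MM:SS> <hostname> %LLDP-<sev>-NO_DCBX_ACKS_RECV_FOR_LAST_<n>_PDUs"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"%LLDP-\d-NO_DCBX_ACKS_RECV_FOR_LAST_(?P<pdus>\d+)_PDUs\s*$"
)


def parse_dcbx_events(lines):
    """Yield (timestamp, hostname, pdu_count) for each NO_DCBX_ACKS line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y %b %d %H:%M:%S")
            yield ts, m.group("host"), int(m.group("pdus"))


def gaps_seconds(events):
    """Return the number of seconds between consecutive events."""
    times = [ts for ts, _, _ in events]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    sample = [
        "2018 Apr 11 13:25:51 ucscluster-A %LLDP-1-NO_DCBX_ACKS_RECV_FOR_LAST_10_PDUs",
        "2018 Apr 11 13:28:07 ucscluster-A %LLDP-1-NO_DCBX_ACKS_RECV_FOR_LAST_10_PDUs",
        "2018 Apr 11 13:30:21 ucscluster-A %LLDP-1-NO_DCBX_ACKS_RECV_FOR_LAST_10_PDUs",
    ]
    events = list(parse_dcbx_events(sample))
    print(gaps_seconds(events))  # [136, 134]
```

Lines that don't match the pattern are skipped, so you can feed it a whole log export rather than a pre-filtered one.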


A Google search then turned up this Cisco support article (https://www.cisco.com/c/en/us/support/docs/switches/nexus-5000-series-switches/116249-troubleshoot-nexus-00.html), which explains that LLDP was the actual root cause of the behavior I was seeing.

From the Cisco.com Support article:

"Data Center Bridging Capability Exchange (DCBX) Type Length Values (TLV) are packaged within a Link Layer Discovery Protocol (LLDP) frame that is exchanged between the switch and the converged network adapter (CNA). One such Control Sub-TLV is used for acknowledgement (ACK), which is sequence-based. For example, the switch sends a Control Sub-TLV with a SeqNo of 1 and an AckNo of 2. The host is supposed to inverse this, and send an LLDP frame with a Control Sub-TLV with a SeqNo of 2 and an AckNo of 1. Refer to the Packet Captures section of this article for more details.

The switch expects this exchange from the host every 30 seconds. If the switch does not see this exchange for 100 Protocol Data Units (PDUs) , which is 3000 seconds or 50 minutes, the switch disables with this error."
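To make the quoted mechanism concrete, here is a simplified Python model of the Control Sub-TLV exchange and the timeout arithmetic. It is a sketch based only on the article's description and is not how NX-OS or UCS actually implement DCBX. In this model, a well-behaved peer swaps the SeqNo/AckNo pair, and any reply that is missing or malformed counts as a missed PDU. The class and function names are hypothetical. The miss limit is a parameter because the UCS log above reports 10 PDUs, while the article cites 100.

```python
from dataclasses import dataclass
from typing import Optional

# Numbers from the Cisco article quoted above; this is a simplified model,
# not the actual NX-OS/UCS implementation.
PDU_INTERVAL_S = 30      # one DCBX/LLDP exchange expected every 30 seconds
MISSED_PDU_LIMIT = 100   # unacknowledged PDUs tolerated before the error


@dataclass(frozen=True)
class ControlSubTLV:
    """Sequence/acknowledgement pair carried in the DCBX Control Sub-TLV."""
    seq_no: int
    ack_no: int


def peer_reply(sent: ControlSubTLV) -> ControlSubTLV:
    """A DCBX-aware peer 'inverses' the pair, per the article's example."""
    return ControlSubTLV(seq_no=sent.ack_no, ack_no=sent.seq_no)


def is_ack(sent: ControlSubTLV, reply: Optional[ControlSubTLV]) -> bool:
    """The reply acknowledges `sent` when its AckNo echoes our SeqNo."""
    return reply is not None and reply.ack_no == sent.seq_no


class AckMonitor:
    """Counts consecutive PDUs that went unacknowledged (hypothetical)."""

    def __init__(self, limit: int = MISSED_PDU_LIMIT) -> None:
        self.limit = limit
        self.missed = 0

    def record(self, sent: ControlSubTLV, reply: Optional[ControlSubTLV]) -> bool:
        """Update the miss counter; return True once the limit is reached."""
        self.missed = 0 if is_ack(sent, reply) else self.missed + 1
        return self.missed >= self.limit


def seconds_until_error(limit: int = MISSED_PDU_LIMIT,
                        interval_s: int = PDU_INTERVAL_S) -> int:
    """Time before the error fires if no valid ACKs arrive at all."""
    return limit * interval_s


if __name__ == "__main__":
    sent = ControlSubTLV(seq_no=1, ack_no=2)
    print(peer_reply(sent))            # ControlSubTLV(seq_no=2, ack_no=1)
    print(seconds_until_error() / 60)  # 50.0

    # A peer that never returns a Control Sub-TLV trips the monitor on PDU 100.
    monitor = AckMonitor()
    pdus = 1
    while not monitor.record(sent, None):
        pdus += 1
    print(pdus)                        # 100
```

With the article's numbers, 100 missed PDUs at 30 seconds each works out to 3000 seconds (50 minutes), matching the quote. A single valid reply resets the counter in this model.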

Now that I knew what the issue was, I looked for a way to disable LLDP within UCSM and came up empty. More Googling didn't turn up anything definitive either, so I finally admitted defeat and opened a Cisco TAC case. The engineer responded quickly. The reason I couldn't find a way to disable LLDP on the FIs is that the capability isn't exposed via UCSM or the CLI; it has to be done via a debug plugin (dplug). He also linked me to an enhancement request that would let customers enable/disable LLDP (you'll need a Cisco account to read it):

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCue05053

A 15-minute call later, the dplug was loaded and we verified that LLDP was indeed disabled. Further monitoring confirmed that disabling LLDP had resolved the flapping port channels.

An alternative would be to disable LLDP on the upstream switches instead of on the UCS. I chose to make the change on the UCS side so that our switches' LLDP configuration stays standardized.

Hopefully this helps someone else out there with the same issue. Thanks for reading!
