Tuesday, May 10, 2011

Cisco UCS B-Series Infrastructure IOM (FEX) Modules throw the major error: “IOM 1/x (a or b) current connectivity does not match discovery policy: unsupported-connectivity”

I was asked a while ago to look at why a new Cisco UCS B-Series infrastructure environment was lighting up like a Christmas tree. Eager to find out the reason, I logged on and, lo and behold, this is what I saw:

image

As shown in the screenshot above, all of the IO modules listed have errors. What may not be as obvious at first glance is that every 5108 chassis was missing either its first or its second IOM. Have another look at the screenshot and you’ll notice that none of the chassis lists both of its 2 IOMs.

Before looking into why only 1 IOM was showing up for each chassis, I tried to fix the first one by re-acknowledging chassis 1:

image

Once the discovery process completed, the errors cleared for IOM 1 on the chassis:

image

I also noticed that once the re-acknowledge of the chassis completed, the missing IOM began showing up, but highlighted with a major fault. A bit more browsing around revealed the following on the 6120s:

Fabric A

image

Fabric B

image

It’s probably not easy to see what I’m seeing without knowing the cabling, so let me begin by explaining how the IOMs were plugged into the fabric interconnects:

Port 1 and 2 on Fabric A –> IOM 1 Chassis 1

Port 3 and 4 on Fabric A –> IOM 1 Chassis 1

… and …

Port 1 and 2 on Fabric B –> IOM 1 Chassis 2

Port 3 and 4 on Fabric B –> IOM 1 Chassis 2

 

Reviewing the ports on the fabric interconnects, the pattern I saw was that the ports with faults were the ones connected to the IOMs that had previously been missing from the chassis listed in UCS Manager. Now that I knew there was a problem with the connections, I dug into the events of the IOMs that showed up after the re-acknowledge operation on the chassis and saw the following error:

Configuration State: unsupported-connectivity

image

Overall Status: fabric-unsupported-conn

image

Clicking on the fault showed the following details of the error:

Affected object: sys/chassis-1/slot-2

Description: IOM 1/2 (B) current connectivity does not match discovery policy: unsupported-connectivity

ID: 415387

Cause: unsupported-connectivity-configuration

Code: F0401

Original severity: major

Previous severity: major

image
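If you end up collecting a number of these faults, the affected object string tells you which chassis and IOM slot is involved. Here is a minimal Python sketch, not part of UCS Manager, that pulls those two numbers out of a string shaped like the one above. The function name `parse_iom_dn` is my own invention for illustration, and the pattern assumes only the `sys/chassis-N/slot-M` shape seen in this fault.

```python
import re

# Pattern for the "Affected object" shape seen in the fault above,
# e.g. "sys/chassis-1/slot-2". Other object shapes are not handled.
_IOM_DN = re.compile(r"^sys/chassis-(\d+)/slot-(\d+)$")


def parse_iom_dn(dn):
    """Return (chassis_id, slot_id) for an IOM object string, or None if it does not match."""
    match = _IOM_DN.match(dn.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # The affected object from the fault details above.
    print(parse_iom_dn("sys/chassis-1/slot-2"))
```

Anything that does not match the pattern comes back as `None` instead of raising an error, so it is easy to skip faults that belong to other kinds of objects.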

I had a hunch as soon as I read the error: check the Chassis Discovery Policy. Just as suspected, it was set to 2-link, but I knew the environment was short of cables (it’s a new setup) and that the 2nd cable was not plugged in:

image

Once I changed the Chassis Discovery Policy to 1-link:

image

… then initiated another re-acknowledge of the chassis:

image

… the error went away:

image

So to summarize, this was simply a case of the Chassis Discovery Policy being set to a higher number of links than were actually connected. There has been some debate about what the best practice is. I have done some tests myself and will write another blog post on it when I have more time.
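The rule behind the fault can be sketched in a few lines of Python. This is only a model of the behaviour I saw, not how UCS Manager actually implements it: the function names `required_links` and `check_discovery_policy` are made up for illustration, and the sketch assumes policy labels of the form `N-link`, as with the 1-link and 2-link settings in this post.

```python
def required_links(policy):
    """Turn a policy label such as "2-link" into the number of links it expects."""
    count, _, suffix = policy.partition("-")
    if suffix != "link" or not count.isdigit():
        raise ValueError(f"unrecognised policy label: {policy!r}")
    return int(count)


def check_discovery_policy(policy, connected_links):
    """Model the observed rule: fewer cabled links than the policy expects is a fault.

    Assumption: having at least as many links as the policy expects is
    treated as no fault here; this post does not cover that case.
    """
    if connected_links < required_links(policy):
        return "unsupported-connectivity"
    return "ok"


if __name__ == "__main__":
    # The situation in this post: policy set to 2-link, one cable per IOM.
    print(check_discovery_policy("2-link", 1))
    # After changing the policy to 1-link.
    print(check_discovery_policy("1-link", 1))
```

The first call models the state before the fix and the second models the state after the policy change. In the real environment a chassis re-acknowledge was still needed before the fault cleared.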

2 comments:

Anonymous said...

Hi Terence,

Nice write up!

When you do a chassis acknowledgement, does the chassis experience any outage in traffic flow between the IOMs and FIs?

I only ask as I have a chassis in production and need to re-acknowledge it due to some fabric ports being in an uninitialized state.

Cheers,
Trev

Anonymous said...

Trev,

You probably figured this out already, but the chassis does experience a network outage when acknowledged. I would recommend either scheduling an outage window for the acknowledge or, if you are running all ESX/ESXi servers, using vMotion to move all virtual machines to a different chassis.

Bryan