I was asked a while ago to look at why a new Cisco UCS B-Series infrastructure environment was lighting up like a Christmas tree. Eager to find out the reason, I logged on and, lo and behold, this is what I saw:
As shown in the screenshot above, it’s obvious that all of the IO modules (IOMs) listed have errors. What may not be as obvious at first glance is that every 5108 chassis was missing either its first or its second IOM. Have another look at the screenshot and you’ll notice that none of the chassis shows both of its 2 IOMs.
Before looking into why only 1 IOM was showing up for each chassis, I tried to fix the first problem by re-acknowledging chassis 1:
Once the discovery process completed, the errors cleared for IOM 1 on the chassis:
I also noticed that once the re-acknowledge of the chassis completed, the missing IOM began showing up, but every one of them was highlighted with a major fault. A bit more browsing around revealed the following on the 6120s:
Fabric A
Fabric B
It’s probably not easy to spot what I’m seeing without knowing the connections, so let me begin by explaining how the IOMs were cabled to the fabric interconnects:
Port 1 and 2 on Fabric A –> IOM 1 Chassis 1
Port 3 and 4 on Fabric A –> IOM 1 Chassis 1
…
… and …
Port 1 and 2 on Fabric B –> IOM 1 Chassis 2
Port 3 and 4 on Fabric B –> IOM 1 Chassis 2
Reviewing the ports on the fabric interconnects, the pattern I saw was that the ports with faults were the ones connected to the IOMs that had previously been missing from the chassis listed in UCS Manager. Now that I knew there was a problem with the connections, I dug into the events of the missing IOMs that showed up after the re-acknowledge operation on the chassis and saw the following error:
Configuration State: unsupported-connectivity
Overall Status: fabric-unsupported-conn
Clicking on the fault showed the following details of the error:
Affected object: sys/chassis-1/slot-2
Description: IOM 1/2 (B) current connectivity does not match discovery policy: unsupported-connectivity
ID: 415387
Cause: unsupported-connectivity-configuration
Code: F0401
Original severity: major
Previous severity: major
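The "Affected object" value in the fault identifies exactly which IOM is complaining: chassis 1, slot 2, which matches the "IOM 1/2" in the description. As a minimal sketch, assuming you have copied that string out of the fault details, a small Python helper can split it into its parts. The function name parse_iom_dn is hypothetical and is not part of any Cisco tooling.

```python
import re

# Hypothetical helper, not a Cisco API: splits an "Affected object" string
# such as "sys/chassis-1/slot-2" into (chassis number, IOM slot number).
DN_PATTERN = re.compile(r"^sys/chassis-(\d+)/slot-(\d+)$")


def parse_iom_dn(dn):
    """Return (chassis, slot) as integers, or raise ValueError if dn does not match."""
    match = DN_PATTERN.match(dn)
    if match is None:
        raise ValueError(f"not a chassis/slot object: {dn!r}")
    return int(match.group(1)), int(match.group(2))


chassis, slot = parse_iom_dn("sys/chassis-1/slot-2")
print(f"chassis {chassis}, IOM slot {slot}")  # prints "chassis 1, IOM slot 2"
```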
I had a hunch as soon as I read the error: check the Chassis Discovery Policy. Just as I suspected, it was set to 2-link, but I knew the environment was short of cables (it’s a new setup) and that the 2nd cable to each IOM was not plugged in:
Once I changed the Chassis Discovery Policy to 1-link:
… then initiated another re-acknowledge of the chassis:
… the error went away:
To summarize, this was simply a case of the Chassis Discovery Policy being set to a higher link count than the number of links actually connected. There is an ongoing debate about what the best practice is; I have done some tests myself and will write another blog post on it when I have more time.
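The rule behind this fault can be sketched in a few lines of Python. This is only a simplified model of the behaviour described in this post, not UCS Manager's actual logic: the function name discovery_state and the "ok" label are made up for illustration, and "unsupported-connectivity" is the state shown in the fault above.

```python
def discovery_state(policy_links, connected_links):
    """Simplified model: an IOM with fewer cabled links than the policy requires is flagged."""
    if connected_links < policy_links:
        return "unsupported-connectivity"
    return "ok"  # hypothetical label for "no connectivity fault"


# The situation in this post: 1 cable per IOM, policy changed from 2-link to 1-link.
for policy in (2, 1):
    print(f"{policy}-link policy, 1 cable: {discovery_state(policy, 1)}")
# prints:
# 2-link policy, 1 cable: unsupported-connectivity
# 1-link policy, 1 cable: ok
```

Remember that in the real environment the fault only cleared after the chassis was re-acknowledged with the new policy in place.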
2 comments:
Hi Terence,
Nice write up!
When you do a chassis acknowledgement does the chassis experience any outages in regards to traffic flow between the IOM's and FI's?
I only ask as I have a chassis in production and need to re-acknowledge it due to some fabric ports being in an uninitialized state.
Cheers,
Trev
Trev,
You probably figured this out already, but the chassis does experience a network outage when acknowledged. I would recommend either scheduling an outage window for the acknowledge or, if you are running all ESX/ESXi servers, using vMotion to move all virtual machines to a different chassis first.
Bryan