One of the emails I sent out after completing my first firmware update:
Some other notes worth mentioning from the firmware update:
Updating Passive Fabric
I followed Cisco's step-by-step firmware upgrade document: http://www.cisco.com/en/US/docs/unified_computing/ucs/sw/upgrading/from1.1.1/to1.2.1/UpgradingCiscoUCSFrom1.1.1To1.2.1_chapter4.html
While performing the following step:
Activating the Fabric Interconnect Firmware for a Cluster Configuration
Once I upgraded the firmware from 1.1 to 1.2, Fabric B (passive) threw an IOM 1 error on Chassis 2. When I navigated to the "High Availability" status of Fabric B, Ready was No but State was Up. The description of the problem was: chassis configuration incomplete. When I viewed the properties of IOM 1 on Chassis 2, the Faults tab indicated that the module had been removed. I checked the status of the failed IOM and noticed that all the servers were in the failed state. I confirmed that all four servers in Chassis 2 were offline, because I could neither open a KVM session nor ping the service console IP of any of the four ESX servers.
The document states the following:
If the High Availability Status area for the fabric interconnect does not show the following values, contact Cisco Technical Support immediately. Do not continue to update the primary fabric interconnect.
I was a bit worried that I'd have to call Cisco tonight, but after about 5 to 10 minutes the missing IOM came back and the Ready field on Fabric B changed to Yes.
Another note: updating the fabric takes a long time. Don't sit on the Firmware Activation page watching the status read Activating. The fabric's properties page shows the progress as a percentage.
Updating Active Fabric
Five minutes into activating the active fabric, I got kicked out of UCSM. I was able to reconnect to UCSM through the passive fabric, but once I was connected, Chassis 1 and 2 and Fabrics A and B were all highlighted in red, indicating faults. After about 5 minutes they started turning orange and yellow, indicating that they were slowly returning to health. While Chassis 1 and FI A were still yellow/orange, I tried to ping the service console of one of the ESX blades on Fabric A and got no reply. The blades on Fabric B did reply, though.
I think it's safe to say that while the active fabric is being updated, servers whose traffic goes through that fabric will be disrupted.
Once the activation completed and the activation status showed Ready, I ran into the same situation I had when updating Fabric B: UCSM displayed an error indicating that IOM 1 on Chassis 1 was missing.
What was interesting this time (though it makes sense, since Fabric A is the primary) is that an additional IOM, IOM 2 on Chassis 2, was also flagged as failed/missing.
The status gradually changed from red to orange, then to yellow, and finally to green. When the update for Fabric A finally completed, I noticed that it was now the subordinate.
The whole firmware update took more than an hour and a half, so remember to schedule enough time for these updates in the future.