Showing posts with label UCS. Show all posts

Wednesday, June 10, 2015

Recovering Cisco UCS Fabric Interconnect from the loader prompt

Problem

I recently had an issue with a Cisco UCS 6120 fabric interconnect we received from RMA that would not boot properly and simply presented the loader prompt no matter how many times it was restarted:

image

Hitting the question mark ? would display the following available commands:

  • dir
  • reboot
  • serial
  • show
  • boot
  • help
  • resetcmos
  • set

image

Executing the dir command would display the following files:

image

A bit of research on Google turns up blogs and forum posts recommending that you simply execute the boot command along with the kickstart file as such:

boot ucs-6100-k9-kickstart.4.1.3.N2.1.1l.bin

imageimage

The boot process eventually brings you to the switch(boot)# prompt:

image

From here, some blog posts indicate that you can use the erase configuration command to erase the configuration on the fabric interconnect and start fresh, but the command does not work as suggested:

erase configuration

% invalid command detected at ‘^’ marker.

image

It’s no surprise because executing the question mark ? command brings up the following available commands in this context:

  • clear
  • config
  • copy
  • delete
  • dir
  • exit
  • find
  • format
  • init
  • load
  • mkdir
  • move
  • no
  • pwd
  • rmdir
  • show
  • sleep
  • tail
  • terminal

image

It is possible to assign an IP address under this switch(boot) prompt as such:

config t

interface mgmt 0

ip address <ipAddress> <subnetMask>

no shut

exit

ip default-gateway <defaultGateway>

exit

image
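Because a typo in any of the three placeholders above only shows up later as a failed ping, it can help to sanity-check the values before typing them at the console. The following is a minimal sketch using Python's standard ipaddress module; the function name check_mgmt_settings and the sample addresses are made up for illustration and are not part of any Cisco tooling. It assumes an ordinary subnet (not a /31 or /32).

```python
import ipaddress


def check_mgmt_settings(ip, netmask, gateway):
    """Validate mgmt 0 settings and return the network they describe.

    Hypothetical helper: it only checks the values, it does not talk
    to the fabric interconnect.
    """
    # IPv4Interface rejects malformed addresses and non-contiguous masks.
    iface = ipaddress.IPv4Interface(f"{ip}/{netmask}")
    gw = ipaddress.IPv4Address(gateway)
    network = iface.network
    # The network and broadcast addresses cannot be assigned to a host.
    if iface.ip in (network.network_address, network.broadcast_address):
        raise ValueError(f"{iface.ip} is not a usable host address in {network}")
    # The default gateway has to sit in the same subnet as mgmt 0.
    if gw not in network:
        raise ValueError(f"gateway {gw} is outside {network}")
    return network


# Documentation-range addresses used purely as sample values.
print(check_mgmt_settings("192.0.2.10", "255.255.255.0", "192.0.2.1"))
```

A ValueError here means one of the values would not have worked on the switch either, so it is cheaper to catch it before reaching the console.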

While the interface will respond to pings once an IP address is assigned, you won’t be able to browse to it via HTTP or HTTPS:

image
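The difference between "answers ping" and "serves the web page" can be checked from a workstation without a browser by testing whether the TCP port accepts connections. This is a rough sketch using only the Python standard library; the function name tcp_port_open is hypothetical, and ports 80 and 443 are assumed here as the usual HTTP and HTTPS ports.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling tcp_port_open("<ipAddress>", 443) with the address assigned above would be expected to return False while the fabric interconnect is sitting at the switch(boot) prompt, since nothing is listening for web traffic at that stage.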

Solution

The way to properly boot the fabric interconnect from the loader prompt is to restart the fabric interconnect:

image

Boot the fabric interconnect with the kickstart and system bin files as such:

boot ucs-6100-k9-kickstart.4.1.3.N2.1.1l.bin ucs-6100-k9-system.4.1.3.N2.1.1l.bin

imageimage

imageimage

imageimage

image
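The boot command takes the kickstart and system images together, and the file names are long enough that a 1 and an l are easy to mix up. As a small illustration, the sketch below takes the file names from a dir listing and pairs up kickstart and system images whose platform and version strings match exactly. The naming pattern in the regular expression is an assumption inferred from the kickstart file name shown earlier, and boot_commands is a made-up helper name.

```python
import re

# Assumed naming pattern: <platform>-<kind>.<version>.bin
IMAGE_RE = re.compile(
    r"^(?P<platform>ucs-\d+-k9)-(?P<kind>kickstart|system)\.(?P<version>.+)\.bin$"
)


def boot_commands(filenames):
    """Return one 'boot <kickstart> <system>' line per matching image pair."""
    kickstart, system = {}, {}
    for name in filenames:
        match = IMAGE_RE.match(name)
        if match is None:
            continue  # not a kickstart or system image
        key = (match["platform"], match["version"])
        target = kickstart if match["kind"] == "kickstart" else system
        target[key] = name
    # Only pairs with identical platform and version are offered.
    return [
        f"boot {kickstart[key]} {system[key]}"
        for key in sorted(kickstart.keys() & system.keys())
    ]


# Sample listing; the system and manager file names are illustrative.
listing = [
    "ucs-6100-k9-kickstart.4.1.3.N2.1.1l.bin",
    "ucs-6100-k9-system.4.1.3.N2.1.1l.bin",
    "ucs-manager-k9.1.1.1l.bin",
]
for command in boot_commands(listing):
    print(command)
```

An empty result means no kickstart and system image share the same platform and version, which is worth resolving before attempting the boot.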

Once the boot process has completed, the IP address assigned earlier should now respond to pings:

image

… and you should be able to browse to the web page:

image

From here, you can use the console prompt to execute connect local-mgmt:

image

… and then execute an erase configuration to remove the config:

image

Wednesday, September 18, 2013

Cisco UCS Server Configuration Utility hangs at “Initializing the kernel…” process on a Cisco UCS C220 M3 server

Problem

You’ve downloaded the Cisco UCS Server Configuration Utility to perform a Windows Server install on a new Cisco UCS C220 M3 server but noticed that it hangs at the:

Initializing the kernel…

image

… for a long time and never continues, eventually ending with a black screen.

Solution

I’m not sure if this is common across all Cisco UCS C servers but the cause of this issue at one of my clients was that he was using a USB Lenovo DVD-ROM drive for the install. After trying several older versions of the Cisco UCS Server Configuration Utility without any luck, I went ahead and connected through the CIMC and used the KVM console Virtual Media tab to mount the ISO and noticed that the problem went away.

Not sure if it’s the DVD-ROM drive because I didn’t have any other DVD-ROM drive available to test but I hope this post will save another person a bit of time.

Unable to install Windows Server 2008 R2 onto a Cisco UCS C Series C220 M3 server with Microsoft media

As easy as it may seem to Cisco UCS administrators, I’ve found that people who are new to UCS typically run into problems with bare metal Windows Server 2008 R2 installs onto Cisco UCS C Series servers. I’m writing this blog post because I received a call yesterday from a client who purchased 4 new Cisco UCS C220 M3 servers, booted from Windows Server 2008 R2 media and ran into the following issue where the RAID driver isn’t present in the media, which prompted the message:

Select the driver to be installed.

A required CD/DVD drive device driver is missing. If you have a driver floppy disk, CD, DVD, or USB flash drive, please insert it now.

Note: If the Windows installation media is in the CD/DVD drive, you can safely remove it for this step.

image

What the client did was go ahead to download the 3.5GB ISO package containing the UCS drivers and try to load it at this prompt.

UCS administrators would know this is not the right approach so my first response to them was to download the Unified Computing System (UCS) Server Configuration Utility to perform the install:

image 

The example I gave them for what this does was that it’s the same as HP SmartStart for HP servers.

I was curious as to whether they did any searches and was told they did but couldn’t find anything so I hope this post will help anyone in the future who may come across this problem.  The following is an old post I wrote that demonstrates what the Unified Computing System (UCS) Server Configuration Utility process looks like:

Installing Windows on a Cisco UCS C Series server with Cisco UCS Server Configuration Utility
http://terenceluk.blogspot.com/2011/07/installing-windows-on-cisco-ucs-c.html

Wednesday, July 10, 2013

Cisco UCS Manager reports the error: “VLAN default is error-misconfigured because of conflicting vlan-id with an fcoe-vlan”

Problem

You’ve recently updated your UCS infrastructure’s firmware to 2.0 or higher and noticed the following errors reported in the UCS Manager:

VLAN default is error-misconfigured because of conflicting vlan-id with an fcoe-vlan

image

Description: VLAN default is error-misconfigured because of conflicting vlan-id with an fcoe-vlan

ID: 10637116

Cause: vlan-misconfigured

Code: F0833

image

Solution

This error is reported because Cisco no longer allows overlapping VLAN IDs for LAN and FCoE. This usually isn’t a problem if the UCS deployment started on firmware 2.0 or higher, because the FCoE storage port native VLAN uses VLAN 4048 by default, but if you’re upgrading from an earlier firmware, the FCoE VLAN will most likely be set to 1, which overlaps with the LAN default VLAN as shown here:

image

As shown in the following documentation for firmware 2.0:

http://www.cisco.com/en/US/docs/unified_computing/ucs/sw/gui/config/guide/2.0/b_UCSM_GUI_Configuration_Guide_2_0_chapter_010110.html#task_BECC98E803CB4DE39D256F525C556D89

… you must change the FCoE VLAN ID to a different value that is unique within the UCS infrastructure.  

Note that changing the FCoE VLAN ID may cause a temporary outage of traffic on the SAN (until the VLAN re-converges) so schedule this small change after hours.
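The rule behind the fault is simply that no VLAN ID may be used by both a LAN VLAN and an FCoE VLAN. If you have the IDs written down, a quick check like the following sketch shows which ones collide before you touch UCS Manager. The dictionaries and the function name conflicting_vlan_ids are hypothetical; only the IDs 1 and 4048 come from the scenario described above.

```python
def conflicting_vlan_ids(lan_vlans, fcoe_vlans):
    """Return (vlan_id, lan_name, fcoe_name) for every ID used on both sides.

    Both arguments map a VLAN name to its numeric ID.
    """
    conflicts = []
    for fcoe_name, fcoe_id in fcoe_vlans.items():
        for lan_name, lan_id in lan_vlans.items():
            if lan_id == fcoe_id:
                conflicts.append((lan_id, lan_name, fcoe_name))
    return sorted(conflicts)


# Sample data: an upgraded system where the FCoE VLAN was left at 1.
lan = {"default": 1, "servers": 100}
print(conflicting_vlan_ids(lan, {"fcoe-fabric-a": 1}))
print(conflicting_vlan_ids(lan, {"fcoe-fabric-a": 4048}))
```

The first call reports the clash with the LAN default VLAN, while the second returns an empty list once the FCoE VLAN has a unique ID.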

image

image

image

image

Note that the error immediately goes away once the overlapping FCoE VLAN has been corrected.

image

Tuesday, July 9, 2013

Logging onto Cisco UCS Manager throws the error: “Login Error: java.net.SocketTimeoutException: Read timed out”

Problem

You attempt to log into the Cisco UCS Manager via the VIP of your clustered 6100 series Fabric Interconnects but receive the following error:

Login Error: java.net.SocketTimeoutException: Read timed out

image

Solution

I’ve come across this several times in the past as well as received quite a few calls over the past months so I thought I’d write a post about this in case anyone is searching this on the internet.

One of the reasons why this error would be presented while you log into the UCS Manager is if there is a switchover in progress between the 2 clustered 6100 series fabric interconnects.  To determine whether this is the case, you can either console or SSH into the fabric interconnect and execute the following command:

show cluster state

image

Note how in the above screenshot both of the fabric interconnects have the status of:

Management services: SWITCHOVER IN PROGRESS

In the event that both fabrics are stuck in this state for a long time, one of the ways to fix this is to actually reboot both fabrics one after another giving enough time in between (say 5 minutes) so that the first fabric that you reboot becomes the primary fabric.
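If you capture the output of show cluster state to a text file (for example from an SSH session log), the check described above can be automated. The sketch below only assumes that the output contains lines of the form "Management services: <state>", as in the status quoted above; the rest of the sample text and the function names are invented for illustration and do not reflect the full real output.

```python
def management_states(cluster_state_output):
    """Collect the value of every 'Management services:' line."""
    states = []
    for line in cluster_state_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "management services":
            states.append(value.strip())
    return states


def switchover_in_progress(cluster_state_output):
    """True if any fabric reports SWITCHOVER IN PROGRESS."""
    return any(
        state.upper() == "SWITCHOVER IN PROGRESS"
        for state in management_states(cluster_state_output)
    )


# Made-up sample text; only the 'Management services' lines matter here.
sample = """\
Fabric A
  Management services: SWITCHOVER IN PROGRESS
Fabric B
  Management services: SWITCHOVER IN PROGRESS
"""
print(management_states(sample))
print(switchover_in_progress(sample))
```

Running this against captures taken a few minutes apart gives a simple way to tell a switchover that is progressing from one that is stuck.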

Sunday, October 2, 2011

Updating Cisco UCS B Series Infrastructure from firmware 1.4 to 2.0

I’ve been keeping a close eye on when Cisco was going to release their UCS B Series infrastructure 2.0 firmware ever since VMware released vSphere 5, as many customers wanted to know whether they could begin testing how stable the new hypervisor would be in their existing infrastructure. Needless to say, I was extremely excited when Cisco released the firmware a little more than a week ago. With that being said, I didn’t have any time to try out the new firmware until mid last week when I was providing UCS training to a customer. Since it was my first time upgrading the firmware and I was explaining the process step-by-step, I took the opportunity to screenshot every step so I could write this blog. What I noticed during the upgrade was how much more content Cisco included to show how to validate and test the firmware for administrators who are upgrading their production infrastructure. While the process was very much the same as the 1.x version with some minor differences in ordering, I have to say Cisco did a great job with the extra content they included.

I’m going to bypass the overview, downloading and prerequisites sections of the upgrade guide but if you’re performing this upgrade on your production environment, please go through those sections and the release notes so you are aware of any issues that may pertain to your environment.

Begin by ensuring that you have the correct firmware packages uploaded to your 6100 series fabric interconnects:

image

Updating the Firmware on the Adapters, CIMCs, and IOMs

From within UCS Manager, navigate to the Equipment tab, click on the Equipment node, Firmware Management tab, Installed Firmware sub-tab and finally the Update Firmware button:

image

From within the Update Firmware Window, select ALL for the Filter list and 2.0(1m) for the Set Version list:

image

Proceed with clicking on the Apply button and you’ll notice the Update Status field on the right change from ready to updating:

image

Click on the OK button to close the Update Firmware window and continue monitoring the status from within the Installed Firmware sub-tab window:

image

Continue waiting until the status for all of the components is listed as ready:

image

Activating the Firmware on the Adapters

Now that the firmware is updated, we can proceed with activating the firmware on all of the components and the first set of components to activate are the adapters.  Proceed with clicking on the Activate Firmware button from the Equipment tab, Equipment node, Firmware Management tab, Installed Firmware sub-tab:

image

From within the Activate Firmware window, select Adapters for the Filter list, 2.0(1m) for the Set Version list, and check the Ignore Compatibility Check and Startup Version Only check boxes:

image

Continue by clicking the Apply button and you’ll see the status of the activation in the Activate Status column:

image

Click on the OK button to close the Activate Firmware window and continue monitoring the status from within the Installed Firmware sub-tab window until you see the status for all of the adapters being listed as pending-next-boot:

image

Activating the Firmware on the CIMCs

Now that the firmware for the adapters has been activated, proceed with activating the firmware for the CIMCs by opening up the Activate Firmware window, selecting CIMC for the Filter list and 2.0(1m) for the Set Version list, and checking the Ignore Compatibility Check and Startup Version Only check boxes:

image

Continue by clicking the Apply button and you’ll see the status of the activation in the Activate Status column:

image

Click on the OK button to close the Activate Firmware window and continue monitoring the status from within the Installed Firmware sub-tab window until you see the status for all of the CIMCs being listed as ready:

image

Activating the Board Controller Firmware on a Server

Unfortunately, I didn’t have any B440 blades in the environment we had in our demo infrastructure so I won’t be able to provide screenshots for it.  Please refer to the manual for more details:

image

Activating the Cisco UCS Manager Software to Release 2.0

Proceed with activating the firmware for the UCS Manager by opening up the Activate Firmware window and select UCS Manager for the Filter list, 2.0(1m) for the Set Version list, check the Ignore Compatibility Check check box:

image

You’ll be prompted with a warning message as soon as you click on the Apply button:

image

Once you’ve answered Yes to the warning message, you’ll be brought back to the Activate Firmware window and the Activate Status will now read scheduled:

image

Click on the OK button to close the Activate Firmware window and continue waiting till you lose your connection to UCS Manager:

image

image

image

Give the 6100 series fabric interconnects time to activate the UCS Manager and within a minute or two, you’ll be able to reconnect again:

image

image

Once you’re back into UCS Manager, you should now see that the UCS Manager has 2.0(1m) listed as the Running Version:

image

Activating the Firmware on the IOMs

Proceed with activating the firmware for the IOMs by opening up the Activate Firmware window, selecting IOM for the Filter list and 2.0(1m) for the Set Version list, and checking the Ignore Compatibility Check and Startup Version Only check boxes:

image

Continue by clicking the Apply button and you’ll see the status of the activation in the Activate Status column:

image

Click on the OK button to close the Activate Firmware window and continue monitoring the status from within the Installed Firmware sub-tab window until you see the status for all of the IOMs being listed as Pending Next Boot.
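At this point each component type should have settled on a different Activate Status: adapters and IOMs on pending-next-boot, CIMCs on ready. When there are many blades it is easy to miss one that is still activating, so here is a small sketch of the comparison. The tuple layout, the component names and the helper names are all hypothetical; in practice the statuses would be read off the Installed Firmware sub-tab by hand.

```python
# Status each component type should show once activation has finished,
# as described in the adapter, CIMC and IOM sections above.
EXPECTED_AFTER_ACTIVATION = {
    "adapter": "pending-next-boot",
    "cimc": "ready",
    "iom": "pending-next-boot",
}


def normalize(status):
    """Treat 'Pending Next Boot' and 'pending-next-boot' as the same status."""
    return status.strip().lower().replace(" ", "-")


def still_pending(components):
    """Return names whose status is not yet the expected one.

    components is an iterable of (name, kind, activate_status) tuples.
    """
    return [
        name
        for name, kind, status in components
        if normalize(status) != EXPECTED_AFTER_ACTIVATION[kind]
    ]


# Sample data with one CIMC that has not finished yet.
observed = [
    ("chassis-1/blade-1/adapter-1", "adapter", "pending-next-boot"),
    ("chassis-1/blade-1/cimc", "cimc", "activating"),
    ("chassis-1/iom-1", "iom", "Pending Next Boot"),
]
print(still_pending(observed))
```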

Activating the Fabric Interconnect Firmware for a Cluster Configuration

Activating the firmware on the fabric interconnects requires a specific order so the first step is to identify which fabric interconnect is the primary and subordinate.  This can be easily done by expanding the Fabric Interconnects node under the Equipment tab’s Equipment node:

image

Once you have identified which fabric interconnect is primary and subordinate, we can proceed with activating the subordinate fabric. 
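The ordering rule is short but unforgiving: the subordinate is always activated first and the primary last. A sketch of that rule, using a hypothetical dictionary that maps each fabric interconnect to the leadership role shown in UCS Manager:

```python
def activation_order(leadership):
    """Return fabric interconnect names in activation order.

    leadership maps a name such as "A" or "B" to "primary" or
    "subordinate". The subordinate always comes first.
    """
    subordinates = [n for n, role in leadership.items() if role.lower() == "subordinate"]
    primaries = [n for n, role in leadership.items() if role.lower() == "primary"]
    # Anything other than exactly one of each means the cluster state
    # should be investigated before touching the firmware.
    if len(subordinates) != 1 or len(primaries) != 1:
        raise ValueError("expected exactly one primary and one subordinate")
    return [subordinates[0], primaries[0]]


# Sample roles; which fabric is primary varies per environment.
print(activation_order({"A": "primary", "B": "subordinate"}))
```

The function deliberately refuses to return an order when it does not see exactly one primary and one subordinate, since that usually means HA is not healthy.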

Note that in a VMware ESXi environment, depending on which path the virtual machines are bound to, some of them may experience a brief network disruption when you activate the fabric interconnect. The disruption will be minimal as they will fail over to the other fabric. With that being said, if your environment has sensitive applications that are dependent on other servers, disruptions in services can occur, so contact the application owners to determine what the best approach for the firmware upgrade is, because there will be scenarios where it is best to simply shut down the servers during the upgrade.

Proceed with ensuring that HA is operating correctly for both the primary and subordinate fabric interconnects:

image image

Once you’ve verified that HA is operating correctly, proceed with opening the Activate Firmware window, set the Startup Version for the subordinate fabric interconnect to 5.0(3)N2(2.1m) and check the Ignore Compatibility Check check box:

image

Proceed by clicking the Apply button and you’ll see the Activate Status column update to Activating status:

image

Continue by clicking the OK button to close the Activate Firmware window.  From here, navigate to the subordinate fabric interconnect node, open the General tab then scroll down to the Update Status section:

image

From here, you will be able to monitor the status of the activation:

image

image

As with the earlier 1.x versions, the subordinate fabric interconnect will light up like a Christmas tree, cycling through red, orange, yellow and the other available colours, with the statuses:

Ready: No

State: Down

Failure Reason: Node Down

Leadership: Inapplicable

Cluster Link State: Full

This is all normal so just wait for the activation to complete:

image

Keep monitoring the High Availability Details section and you will eventually see the statuses revert back to normal:

Ready: Yes

State: Up

Leadership: Subordinate

Cluster Link State: Full

image

Navigating back to the Installed Firmware tab should now indicate that the firmware version for the Kernel and System is 5.0(3)N2(2.1m):

image

With the subordinate fabric interconnect activated with the new 2.0(1m) firmware, continue with activating the primary fabric interconnect by opening the Activate Firmware window, set the Startup Version for the primary fabric interconnect to 5.0(3)N2(2.1m) and check the Ignore Compatibility Check check box:

image

Clicking the Apply button will prompt you with a warning message indicating that the fabric interconnect will reboot and thus kick you off of UCS Manager:

image

Once you have answered Yes to the warning message, you will be returned to the Activate Firmware window where you’ll notice that the Activate Status for the primary fabric interconnect is now listed as Activating:

image

Continue by clicking the OK button to close the Activate Firmware window.  From here, navigate to the primary fabric interconnect node, open the General tab then scroll down to the Update Status section:

image

From here, you will be able to monitor the status of the activation:

image

The activation process of the primary fabric interconnect will eventually reboot the 6120/6140 so you’ll eventually get kicked out:

image

You should be able to immediately log back into UCS Manager because the subordinate is supposed to become the primary fabric interconnect:

image

As with the activation of the subordinate fabric interconnect, the primary will light up like a Christmas tree, cycling through red, orange, yellow and the other available colours, with the statuses:

Ready: No

State: Down

Failure Reason: Node Down

Leadership: Inapplicable

Cluster Link State: Full

This is all normal so just wait for the activation to complete:

image

When you see the High Availability Details listed as:

Ready: No

State: Up

Failure Reason: Chassis Configuration Incomplete

Leadership: Subordinate

Cluster Link State: Full

… you can expect the clustered fabric interconnect to be fully operational soon:

image

Keep monitoring the High Availability Details section and you will eventually see the statuses revert back to normal:

Ready: Yes

State: Up

Leadership: Subordinate

Cluster Link State: Full

image
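The High Availability Details pass through three distinct states during each activation: node down, up but not ready, and finally back to normal. If you copy the fields into a text file while monitoring, a check like the sketch below distinguishes the final healthy state from the two intermediate ones. The field names are taken from the listings above; the function names and the idea of pasting the fields as "Key: Value" text are assumptions for illustration.

```python
def parse_details(text):
    """Turn 'Key: Value' lines into a dictionary."""
    details = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            details[key.strip()] = value.strip()
    return details


def ha_is_healthy(details):
    """Healthy means ready, up, and with a full cluster link."""
    return (
        details.get("Ready") == "Yes"
        and details.get("State") == "Up"
        and details.get("Cluster Link State") == "Full"
    )


# The intermediate and final states quoted in this post.
almost_there = """\
Ready: No
State: Up
Failure Reason: Chassis Configuration Incomplete
Leadership: Subordinate
Cluster Link State: Full
"""
back_to_normal = """\
Ready: Yes
State: Up
Leadership: Subordinate
Cluster Link State: Full
"""
print(ha_is_healthy(parse_details(almost_there)))
print(ha_is_healthy(parse_details(back_to_normal)))
```

Only the second listing passes, which matches the point in the process where it is safe to move on to the next step.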

Updating a Management and Host Firmware Package

Once you have the core components of the UCS B series infrastructure upgraded, it’s important to also update the rest of the components on the blade servers. I find that many administrators tend to miss this, and what’s nice about firmware versions 1.4 and newer is that you will actually get warnings when the firmware versions on the blades are old (older firmware never warned you).

Updating a Management Firmware Package

To update the management firmware of the components on the blades, navigate to the Servers tab, Servers node, Policies then right click on the Management Firmware Packages node and select Create Management Firmware Package:

image

Select the packages for the blades you have in the environment, choose the firmware version and provide a name for this package:

image

Once you’ve completed your selection, click the OK button and your firmware package will be created:

image

With the new Management Firmware Package created, you have the option of immediately assigning it to your service profile or service profile templates.  I prefer to complete the creation of the Host Firmware Packages so I don’t need to go back to the same tab to assign the package again.

Updating a Host Firmware Package

To update the other hardware components on the blades, navigate to the Servers tab, Servers node, Policies then right click on the Host Firmware Packages node and select Create Host Firmware Package:

image

Select the packages for the blades you have in the environment, choose the firmware version and provide a name for this package:

image

Make sure you also navigate to the other tabs listed in the window (i.e. Adapter, BIOS, Board Controller, etc):

image

Once you’ve completed your selection, click the OK button and your firmware package will be created.

image

With the new packages created, you can now navigate to the Policies tab of the service profile or service profile templates and assign them:

image

If you were assigning the policies to an actual service profile, you’ll be warned that the operation will cause the blade to reboot:

image

If you answer yes, your blade will reboot and then go through the updating process:

image

image

It’s important that you navigate to the Installed Firmware tab to ensure that the components have updated:

image

… and that’s it. The process is very much the same as the 1.x version and while I understand that the manual is extremely straightforward about what needs to be done, I hope this post provides some visual aids through the use of screenshots to demonstrate what the process looks like.