Tuesday, January 11, 2011

Oops, I deleted all of my active / running Cisco UCS B Series Servers’ Service Profiles! What do I do now?

I took a bit of time to think about what title I should give this post, and while I’m sure no one is going to Google/Bing “Oops”, I figured someone might type in “deleted service profiles” and end up finding this post. So without further delay, let’s dive into the problem and resolution.

Problem

In case you’re wondering, no, I didn’t delete any active / running Cisco UCS B Series servers’ Service Profiles. I was, however, asked to look into a problem where I eventually found that someone else had. What was interesting about the troubleshooting process was that I was told the following:

“There’s a problem with all of the ESX and ESXi hosts on the blades.”

“It looks like the network is down.”

“The only server that didn’t have a problem was the Windows 2008 server.”

Reviewing the warnings and faults, I found that the logs didn’t really tell me that the service profile associated with the blade was no longer there. The following is essentially what I saw when my colleague asked me to look at it:

image

Ok, so there are errors and faults with the components, plus mismatched firmware versions.

Let’s try reverting / updating the firmware to match the other components.  Result: problem still exists.

Let’s try resetting the connectivity of the NICs:

image

Result: problem still exists.

Ok, let’s try shutting down the server and re-acknowledging it:

image

Result: problem still exists.

What sort of caught my eye during the Re-acknowledge process was this:

image

Notice how the Service Profile field is blank?

Interestingly, when I left the blade shut down, I saw the following:

image

The Service Profile field’s blank!  What’s going on?

That was when I realized someone had deleted the 3 service profiles that belonged to the blades hosting ESX and ESXi. The service profile for the Windows Server was still there, which is why that blade was still humming along.

I then created a new service profile that matched the deleted one and tried to assign it to the blade.

image

Not good. As you can see, it failed with:

Description: Service profile <profilename> configuration failed due to compute-unavailable,insufficient-resources

ID: 6241031

Cause: configuration-failure

Code: F0327

Original severity: major

Definitely not good. Can I still boot up this server with an empty service profile? Hitting the Reset option, I get the following:

image image

Error performing reset

image

Error performing reset

Parsing Exception at line : 1 invalid XML character: 23

image

Digging deeper into the minor faults shows:

Description: Assignment of service profile <profilename> to server sys/chassis-1/blade-4 failed

ID: 6241036

Cause: assignment-failed

Code: F0689

Original severity: minor
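If you end up triaging several blades at once, it can help to pull these fields out of the fault details rather than eyeballing each one. The following is a minimal Python sketch, assuming the fault details are copied out of UCS Manager as the “Field: value” lines shown above. The function names and the `GHOST_PROFILE_CODES` set are hypothetical, not part of any Cisco tool, and F0327/F0689 are only the two codes I saw here: they are a hint to go check the Service Profile field, not proof that a profile was deleted.

```python
# Hypothetical triage helper: not a Cisco tool or API. It only parses the
# "Field: value" text shown in a UCS Manager fault's details.

# The two fault codes seen in this post after the service profiles were deleted.
GHOST_PROFILE_CODES = {"F0327", "F0689"}


def parse_fault(text: str) -> dict[str, str]:
    """Turn copied 'Field: value' lines into a dict keyed by lowercase field name."""
    fault: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fault[key.strip().lower()] = value.strip()
    return fault


def failure_reasons(fault: dict[str, str]) -> list[str]:
    """Return the comma-separated reasons after 'due to' in the description, if any."""
    _, sep, reasons = fault.get("description", "").partition("due to ")
    return [r.strip() for r in reasons.split(",")] if sep else []


def needs_profile_check(fault: dict[str, str]) -> bool:
    """True if the code is one of those seen here; check the Service Profile field next."""
    return fault.get("code") in GHOST_PROFILE_CODES


sample = """Description: Service profile <profilename> configuration failed due to compute-unavailable,insufficient-resources
ID: 6241031
Cause: configuration-failure
Code: F0327
Original severity: major"""

fault = parse_fault(sample)
print(fault["code"], fault["cause"], failure_reasons(fault), needs_profile_check(fault))
```

Splitting on the first colon only keeps the parser tolerant of values that contain colons; blank lines are simply skipped.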

image

Ok, so now I’m stuck with a server that I shut down and can no longer boot up. I have to say I started feeling a bit nervous, even though I had asked which non-critical server I could test on and this was the one I was given. Note that I was eventually able to boot it up, but because I’m writing this post 3 weeks later, I can’t remember exactly how. What I do remember is that it wasn’t too difficult: it was just a small workaround using the controls available in UCS Manager.

Solution

Now that I finally understood the problem, and knowing there probably wasn’t a Cisco document that tells you what to do, I sat down and thought about it logically. The bright idea I had (or at least I thought so) was that since:

  • The server fails to associate with a new profile.
  • The server can still be booted up with the old profile that’s no longer there.
  • UCS Manager maintains information about blades that are added to the chassis.

Perhaps the only way to remove this ghost profile is to decommission the server? I remembered reading in the Cisco documentation that the decommission option allows you to remove a blade from the configuration. Before asking for permission to do this, I did a quick search to confirm what I remembered:

http://www.cisco.com/en/US/docs/unified_computing/ucs/sw/gui/config/guide/GUI_Config_Guide_chapter28.html#task_2280EA0F991A4BF7963357C5C20360EF

Decommissioning a Server

This procedure removes the server from the configuration. As long as the server physically remains in the Cisco UCS instance, Cisco UCS Manager considers the server to be decommissioned and ignores it.

image
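The reasoning above boils down to a short checklist. Here it is as a minimal Python sketch, purely to make the decision explicit; `BladeObservation`, its fields, and `next_step` are hypothetical names for things I checked by hand in UCS Manager, not fields or calls from any Cisco API, and the returned strings are my own summary rather than Cisco guidance.

```python
from dataclasses import dataclass


@dataclass
class BladeObservation:
    """Things checked by hand in UCS Manager (hypothetical field names)."""
    profile_field_blank: bool       # Service Profile field is empty for the server
    new_profile_fails: bool         # assigning a new profile raises faults such as F0327/F0689
    boots_on_deleted_profile: bool  # blade still runs with the old, deleted profile


def next_step(obs: BladeObservation) -> str:
    """Encode the reasoning in this post as a simple decision (a sketch only)."""
    if not obs.profile_field_blank:
        return "profile still present: keep troubleshooting elsewhere"
    if obs.new_profile_fails and obs.boots_on_deleted_profile:
        # UCS Manager still holds state for the blade, so remove it from the
        # configuration and let re-acknowledgement rebuild it.
        return "decommission, then Resolve Slot Issue to re-acknowledge"
    return "assign a new service profile"


print(next_step(BladeObservation(True, True, True)))
```

The order of the checks matters: a blank Service Profile field is what separates this case from the firmware and NIC issues I chased first.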

Ok, with that confirmed, I asked the manager who owns these blades, and who had been getting a lot of heat to get them back up and running. The response was: “Just make it happen and get it back up!”

So I proceeded with this theory and did the following:

Server Maintenance -> Decommission

image

image

I made sure there were no errors during the decommissioning process, then chose Resolve Slot Issue to re-acknowledge the server, thus putting it back into the database:

image

image

image

No errors so far:

image

Discovery has started:

image

Overall Status is now unassociated:

image

Let’s now try to associate this blade to a service profile:

image

Looking good:

image

image

Success!

image

From there, I repeated the same procedure for the other 2 servers and got all the blades back up. I hope this helps anyone out there who encounters this problem.
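For anyone who prefers to see the procedure as a sequence of states, here is a minimal Python sketch of what the blade went through. It is a simplified model of what I observed in UCS Manager, not how UCS Manager’s FSM is actually implemented; the state names and action labels (`decommission`, `resolve_slot_reacknowledge`, and so on) are hypothetical stand-ins for the menu items used above. The key point it captures is that assigning a profile directly to the ghost-associated blade is rejected, which is what the F0327 fault was telling me.

```python
from enum import Enum


class Blade(Enum):
    # Hypothetical states, named after what UCS Manager showed at each step.
    GHOST = "associated with a deleted profile"
    DECOMMISSIONED = "decommissioned"
    DISCOVERY = "discovery in progress"
    UNASSOCIATED = "unassociated"
    ASSOCIATED = "associated"


# Allowed (state, action) -> next state. Action names are illustrative labels
# for the UCS Manager steps in this post, not API calls.
TRANSITIONS = {
    (Blade.GHOST, "decommission"): Blade.DECOMMISSIONED,
    (Blade.DECOMMISSIONED, "resolve_slot_reacknowledge"): Blade.DISCOVERY,
    (Blade.DISCOVERY, "discovery_complete"): Blade.UNASSOCIATED,
    (Blade.UNASSOCIATED, "associate_profile"): Blade.ASSOCIATED,
}

RECOVERY = ["decommission", "resolve_slot_reacknowledge",
            "discovery_complete", "associate_profile"]


def step(state: Blade, action: str) -> Blade:
    """Apply one action; reject anything this simplified workflow doesn't allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} not allowed while {state.value}") from None


def recover(blade: str) -> Blade:
    """Walk one blade through decommission, re-acknowledge, discovery, associate."""
    state = Blade.GHOST
    for action in RECOVERY:
        state = step(state, action)
        print(f"{blade}: {action} -> {state.value}")
    return state


# Assigning a profile straight to the ghost-associated blade fails (cf. F0327).
try:
    step(Blade.GHOST, "associate_profile")
except ValueError as err:
    print("direct association:", err)

# The same procedure was then repeated for each affected blade.
recover("sys/chassis-1/blade-4")
```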

5 comments:

Anonymous said...

You could unbind the profile from the template for the server which is associated with this profile.

Disassociate the Service Profile from the server. Now it is not associated and the server is freed up as well. Once the tasks are complete in FSM, associate the profile back to the appropriate server and then bind the profile back to the template you want to.

This is helpful when you have vNICs with VLANs and vHBAs configured, so you don't have to do any zoning on your FC switch side, etc. Hope it helps someone in a similar situation...

Valerian Crasto said...

I have followed the steps described to remove the GHOST profile, but no luck. I can still see GHOST profiles, which are not visible in UCS Central. Any advice appreciated.

John said...

I encountered something similar too - a ghosted service profile. Luckily, I still have the choice to erase and start from scratch (my entire setup), or open a TAC case and see if they can assist me.

John said...

Follow-up: I did the same for a ghosted blank profile, which caused a similar mismatch (vif-mismatch). Decommissioning, resolving, and re-acknowledging fixed my issue!!!

Thank you!

Ryan Betts said...

Another good one, Terence. Cheers.