Monday, January 24, 2011

Loss of network connectivity when tagging ESX/ESXi port group with a VLAN on a Cisco UCS B Series infrastructure

During the first few Cisco UCS B Series deployments I was part of, I kept running into a problem that I knew how to work around but never found the reason for.  Almost 2 years later, I still don’t know the reason.  I’m currently working with some network engineers on a datacenter project, so I asked them, and while I did get some answers, none of us is sure they are correct.  Despite the uncertainty, I’m writing this post in case someone comes across this problem and needs to know how to get around it.  I will update this post when I figure out why it needs to be configured this way.

Problem

You have just completed a Cisco UCS B Series infrastructure setup with VLANs defined for your service profile.

image

For your service profile, you’ve set the native VLAN to your ESX/ESXi management VLAN (VLAN 105 in this example):

image

You proceed with the configuration of your ESX/ESXi server and configure the IP address and then tag that port group with VLAN 105:

image

With this configuration, you notice that you can no longer ping your gateway or other hosts in the same subnet:

image

Solution

You’ve lost connectivity because you have set the native VLAN in this server’s service profile to the VLAN of your management network and have also tagged your management port group with that same VLAN:

image image
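The conflict can be sketched as a small model in Python. This is only an illustration under stated assumptions, not a description of how UCS is implemented: the function names are hypothetical, VLAN 999 stands in for any unused VLAN, and the rule that a frame tagged with the native VLAN ID does not reach that VLAN is taken from the behaviour observed in this post.

```python
from typing import Optional


def port_group_tag(port_group_vlan: int) -> Optional[int]:
    """Tag the ESX/ESXi port group puts on outgoing frames (VLAN 0 = untagged)."""
    return None if port_group_vlan == 0 else port_group_vlan


def fabric_classify(tag: Optional[int], native_vlan: int) -> Optional[int]:
    """VLAN the fabric places a frame in, or None if it is not delivered.

    Assumption (matches the behaviour seen in this post): the native VLAN
    only carries untagged frames, so a frame tagged with the native VLAN ID
    never reaches it.
    """
    if tag is None:
        return native_vlan      # untagged frames land in the native VLAN
    if tag == native_vlan:
        return None             # tagged with the native VLAN ID: lost
    return tag                  # any other tag maps to its own VLAN


def management_reachable(port_group_vlan: int, native_vlan: int, mgmt_vlan: int) -> bool:
    """True if frames from the management port group end up in mgmt_vlan."""
    return fabric_classify(port_group_tag(port_group_vlan), native_vlan) == mgmt_vlan


# The problem: port group tagged 105 while 105 is also the native VLAN.
print(management_reachable(port_group_vlan=105, native_vlan=105, mgmt_vlan=105))  # False
# Option 1: leave the port group untagged (VLAN 0).
print(management_reachable(port_group_vlan=0, native_vlan=105, mgmt_vlan=105))    # True
# Option 2: keep the tag and move the native VLAN to an unused VLAN (999 is hypothetical).
print(management_reachable(port_group_vlan=105, native_vlan=999, mgmt_vlan=105))  # True
```

The last two calls correspond to the two workarounds described next.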

What you need to do is either:

1. Untag your ESX/ESXi management network by setting its VLAN ID back to 0:

image

OR

2. Change the native VLAN from within UCS Manager:

image

Note that it is a best practice from a network perspective for the native VLAN in any environment to be a VLAN that is not otherwise in use.  This helps prevent malicious attempts to hack into your network because if, say, someone plugged into a switch, they would end up on a VLAN that isn’t used.  For security reasons such as this, we always get a list of VLANs in use from the client and then settle on an unused VLAN for the native VLAN.

I’m still curious as to why this is the case and you can probably tell by now that I’m not much of a network guy (my CCNA expired 2 years ago).  I understand enough to get the deployment done but need the help of network engineers to configure the northbound 3750 connections.  One of the initial answers I received was something about the MTU size being 1504 instead of 1500 when a tag is added but I can’t validate that so please feel free to comment on this post if you know why.  Thanks.
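For what it’s worth, the 1504 figure lines up with the size of an 802.1Q tag, which is 4 bytes (a 2-byte TPID of 0x8100 followed by a 2-byte field holding priority, DEI and the 12-bit VLAN ID).  The sketch below only shows where the extra 4 bytes come from; it does not confirm that MTU has anything to do with the lost connectivity.  The function name `dot1q_tag` is made up for this example.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks a frame as 802.1Q-tagged


def dot1q_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag for the given VLAN ID."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    # Tag control information: 3 bits priority, 1 bit DEI, 12 bits VLAN ID
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


tag = dot1q_tag(105)
print(tag.hex())        # 81000069
print(1500 + len(tag))  # 1504
```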

3 comments:

franjimecsco said...

Yes, once you've defined the management VLAN as being the Native VLAN, then setting the VLAN to 105 (or any VLAN, really) on the ESX host port group for management leads to a misconfiguration.

The reason for this is that once the management VLAN is set to Native VLAN, then only untagged frames will be passed to the management VLAN.

Since you've now set the tag on the frames coming from the ESX management port group as VLAN 105, they won't be seen on the management VLAN (which is looking for untagged frames only now).

You've got the solution exactly correct, btw. - either have the ESX host send only untagged frames (which are then picked up and sent to the management VLAN), or explicitly set VLAN 105 to be management (and set some other VLAN to be assigned to untagged frames as the Native VLAN).

Terence Luk said...

Thank you for your detailed reply Frank! I've also sent you a thank you note via LinkedIn.

Richie said...

Worked a treat. Many thanks