Thursday, February 3, 2011

Part #2 - Problem when upgrading from ESX 3.5 to ESXi 4.0 or 4.1 – Why are my RDM LUNs showing up as “0.00 B” for Capacity?

To follow up on a post I wrote back on October 14, 2010, which was titled:

Problem when upgrading from ESX 3.5 to ESXi 4.0 or 4.1 – Why are my RDM LUNs showing up as “0.00 B” for Capacity?

http://terenceluk.blogspot.com/2010/10/problem-when-upgrading-from-esx-35-to.html

As well as a few other posts related to this issue:

Experiencing slow boot up times during an update from ESX 3.5 to ESX/ESXi 4.0 or 4.1? Check your RDMs.

http://terenceluk.blogspot.com/2010/10/slow-boot-up-times-on-esxi-41-do-you.html

VMware ESXi 4.1 slow boot up seemingly stuck at: vmw_vaaip_netapp, vmkibft, vmfs3 modules / processes

http://terenceluk.blogspot.com/2011/01/vmware-esxi-41-slow-boot-up-seemingly.html

VMware ESXi 4.0 Update 2 slow boot up seemingly stuck at: vmw_psp_mru, multiextent, dvfilter, vmfs3, nfsclient modules / processes

http://terenceluk.blogspot.com/2011/01/vmware-esxi-40-update-2-slow-boot-up.html

-------------------------------------------------------------------------------------------------------------------------------------------------------------------

I finally got a small window last Sunday to do a few tests with both ESXi 4.0 Update 2 and ESXi 4.1 with the problematic RDMs.  As noted in the first post linked at the top, the problem, reason and solution can be summarized as follows:

Summary

Problem: When you perform a rescan operation on a new ESX host’s storage adapter that has been given access to RDMs that are being used on other ESX hosts for MSCS/SQL Clusters, the capacity column displays zero bytes.

Reason: MSCS/SQL clusters hold persistent SCSI reservations on the LUNs, which causes the newly added host to get a permission denied error.

Solution: Simply shut down the MSCS/SQL cluster virtual machines that access these LUNs and reboot the new host. Once the host is rebooted, the LUNs’ capacity will show up correctly.

… so what I did during the small window I had on Monday during the early morning hours was to:

  1. Shut down the MSCS SQL Cluster active node on ESX01 (version 3.5)
  2. Shut down the MSCS SQL Cluster passive node on ESX02 (version 3.5)
  3. Remove the MSCS SQL Cluster active node from ESX01 and re-inventory the virtual machine on ESX04 (version 4.0 Update 2)
  4. Boot up MSCS SQL Cluster active node.
  5. Check RDM mappings.
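For anyone who wants to script the check in step 5, here is a minimal Python sketch that flags LUNs still reporting zero capacity. It assumes you have already collected the capacity strings exactly as the client displays them (for example “0.00 B”); the names parse_capacity and zero_capacity_luns and the input dictionary are hypothetical and not part of any VMware tool.

```python
import re

# Binary multipliers; assumes the sizes are reported in powers of 1024.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_capacity(text):
    """Convert a display string such as '0.00 B' or '500.00 GB' to bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised capacity string: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


def zero_capacity_luns(capacities):
    """Return the sorted names of LUNs whose reported capacity is zero."""
    return sorted(
        name for name, text in capacities.items()
        if parse_capacity(text) == 0
    )
```

Calling zero_capacity_luns with a dictionary of LUN name to capacity string returns only the entries that still show zero, which is a quick way to see whether any RDM is affected.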

What I found was that within the first 5 minutes after booting up the MSCS SQL Cluster active node, the capacity within VI Client was still labeled as “0.00 B”:

image

However, by the time I had finished checking that the MSCS SQL cluster resources came up properly and that I was able to access the shared RDM drives:

image image

image

… the capacity for the RDMs eventually refreshed and I was then able to see the capacity:

image

Note that I had not executed a rescan operation.  The capacity was refreshed automatically with the correct size of the LUNs displayed.
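Since the capacity refreshed on its own after a few minutes, a simple way to watch for it is to poll until the value becomes non-zero. The Python sketch below is only an illustration: get_capacity_bytes is a hypothetical callable that you would supply yourself to return the LUN's currently reported capacity in bytes, and the timeout and interval defaults are arbitrary.

```python
import time


def wait_for_capacity(get_capacity_bytes, timeout_s=600.0, interval_s=15.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until get_capacity_bytes() returns a non-zero value.

    Returns the capacity in bytes, or None if the timeout expires first.
    The sleep and clock parameters exist so the loop can be tested
    without actually waiting.
    """
    deadline = clock() + timeout_s
    while True:
        capacity = get_capacity_bytes()
        if capacity > 0:
            return capacity
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

The function never triggers a rescan itself; it only observes, which matches the behaviour described above, where the capacity appeared without one.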

The next test was to repeat the steps above with ESXi 4.1, and the results were exactly the same.  The cluster resources started properly and the shared RDMs showed up properly for the virtual machine:

image

As with ESXi 4.0, a rescan operation was not necessary and the RDM LUNs’ capacity automatically refreshed:

image

Unfortunately, the window was extremely short and I ran out of time, so I did not get a chance to reboot the hosts to see whether the capacity of the LUNs would revert to “0.00 B”.  We’ll be physically moving these servers to another datacenter soon and I will definitely test that then.  For anyone who runs into the same situation with the same setup, what I can tell you is that, to the degree I was able to test, this setup also runs on 4.0 Update 2 and 4.1.

Stay tuned for more test results when I get the chance to run them.
