During the implementation of a private cloud solution using vSphere and vCloud Director, we needed to reinstall the ESXi hosts with a custom ISO. At that point the vSphere platform was already deployed, vShield Manager and vCloud Director were running and integrated with each other, and the VXLAN preparations (transport VLAN added, vmknics deployed and active, segment networks configured) were already done. VXLAN network deployment was working, and vShield Manager and vCloud Director were happy with the vSphere environment.
After the reinstall, the ESXi host got its Host Profile back and was added back to the cluster. It was at that moment that we were confronted with this error:
The failing task was preparing the host for the VXLAN network: adding the virtual NIC to the host and the distributed virtual switch, so that the host can do VXLAN translations. The description of the error is really great and very helpful™. Whenever I get this kind of error in an interface, I dive into the command line and see if the system logging helps a bit in describing the error. Here’s what I got:
2014-03-18T11:52:57.182Z cpu14:44483)WARNING: vxlan: VDL2PortPropSet:170: Failed to create VXLAN IP vmknic on port[0x500000e] of VDS[DvsPortset-0] : Failure
2014-03-18T11:52:57.182Z cpu14:44483)WARNING: NetDVS: 2006: failed to init client for data com.vmware.net.vxlan.vmknic on port 252
2014-03-18T11:52:57.182Z cpu14:44483)WARNING: NetPort: 1388: failed to enable port 0x500000e: Failure
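Warnings like these are easy to miss in a busy log. The following is a minimal sketch for filtering them out of a copy of the log, assuming every warning follows the same layout as the three lines above. The pattern is inferred from those lines only, and the names `parse_warning` and `vxlan_warnings` are made up for this example.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Pattern inferred from the three sample lines only; other vmkernel
# entries may be laid out differently and will simply not match.
_WARNING_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"WARNING:\s+(?P<module>[^:]+):\s*(?P<message>.*)$"
)


class VmkWarning(NamedTuple):
    timestamp: str
    module: str
    message: str


def parse_warning(line: str) -> Optional[VmkWarning]:
    """Return the parsed warning, or None if the line does not match."""
    match = _WARNING_RE.match(line.strip())
    if match is None:
        return None
    return VmkWarning(match["timestamp"], match["module"], match["message"])


def vxlan_warnings(lines: Iterable[str]) -> Iterator[VmkWarning]:
    """Yield warnings that mention VXLAN in the module name or the message."""
    for line in lines:
        warning = parse_warning(line)
        if warning and "vxlan" in f"{warning.module} {warning.message}".lower():
            yield warning
```

Fed the three lines above, this keeps the `vxlan` and `NetDVS` warnings and drops the `NetPort` one, which never mentions VXLAN. That also shows the limit of a keyword filter: related failures on the same port can slip through.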
So the system logging did not help either; again a very helpful description. After searching the VMware KB and Community forums for a while, I found a post which explained that the preparation of ESXi hosts for VXLAN was moved from vShield Manager to vCloud Director. On closer inspection, it was indeed the service account for vCloud Director that was executing the “Add virtual NIC” task.
Now here’s the kicker. It looks like vShield Manager does not allow other applications to add a VXLAN vmknic to an ESXi host. It wants to do that itself. Try adding a vmknic for VXLAN manually; you’ll get the same error. Which is fine, except that vCloud Director gets a notification when this preparation is starting and picks up the task itself. vShield Manager in turn sees another application trying to do the preparation and shuts it down.
What ended up being the fix was to temporarily shut down the vCloud Director service and then start the preparation. This made vShield Manager handle the preparation itself, successfully. Start the vCloud Director service again afterwards and everyone is happy.
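The order matters here: stop vCloud Director, prepare the host, start vCloud Director again. Below is a small sketch of that sequence, assuming vCloud Director runs as the `vmware-vcd` service on a Linux cell and that the script runs on that cell with enough privileges. The helper `vcd_stopped` is hypothetical, and the preparation itself is still started by hand in vShield Manager.

```python
import subprocess
from contextlib import contextmanager

# Assumption: vCloud Director is installed as the "vmware-vcd" service.
STOP_CMD = ["service", "vmware-vcd", "stop"]
START_CMD = ["service", "vmware-vcd", "start"]


def _run(cmd):
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


@contextmanager
def vcd_stopped(run=_run):
    """Keep vCloud Director stopped for the duration of the with-block."""
    run(STOP_CMD)
    try:
        yield
    finally:
        # Always bring the service back, even if the preparation step fails.
        run(START_CMD)


def main():
    with vcd_stopped():
        input("Prepare the host for VXLAN in vShield Manager, then press Enter...")
```

Wrapping the stop and start in a context manager means the service is restarted even when the step in between raises an error, so the cell is not left down by accident.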
The main cause of this issue was that the reinstalled ESXi host had not been properly de-configured within vCloud Director. If you need to do something similar, make sure you “Unprepare” the host within vCloud Director first.