During the implementation of a private cloud solution using vSphere and vCloud Director, we needed to reinstall the ESXi hosts with a custom ISO. At that point the vSphere platform was already deployed, vShield Manager and vCloud Director were running and integrated with each other, and the VXLAN preparations (transport VLAN added, vmknics deployed and active, and the segment network configuration) were already done. VXLAN network deployment was working, and vShield Manager and vCloud Director were happy with the vSphere environment.

 

After the reinstall, the ESXi host got its Host Profile back and was added back to the cluster. At that moment we were confronted with this error:

 

[Screenshot: Add virtual NIC: Unknown error]

The error appeared while the host was being prepared for the VXLAN network: the task adds a virtual NIC to the host and the distributed virtual switch, so the host can do VXLAN translations. The description of the error is really great and very helpful ™. Whenever I get this kind of error in an interface, I dive into the command line to see whether the system logging describes the error any better. Here’s what I got:

 

2014-03-18T11:52:57.182Z cpu14:44483)WARNING: vxlan: VDL2PortPropSet:170: Failed to create VXLAN IP vmknic on port[0x500000e] of VDS[DvsPortset-0] : Failure
2014-03-18T11:52:57.182Z cpu14:44483)WARNING: NetDVS: 2006: failed to init client for data com.vmware.net.vxlan.vmknic on port 252
2014-03-18T11:52:57.182Z cpu14:44483)WARNING: NetPort: 1388: failed to enable port 0x500000e: Failure
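
If you need to check several hosts for the same failure, a small script can pull the relevant warning out of the log. The sketch below is only an illustration: the regular expression is modelled on the three lines quoted above, so other ESXi builds may word the message differently, and the function name `find_vmknic_failures` is made up for this example.

```python
import re

# Pattern modelled on the VDL2PortPropSet warning quoted above.
# Assumption: other ESXi builds may format this message differently.
VMKNIC_FAILURE = re.compile(
    r"^(?P<timestamp>\S+) \S+\)WARNING: vxlan: VDL2PortPropSet:\d+: "
    r"Failed to create VXLAN IP vmknic on port\[(?P<port>0x[0-9a-fA-F]+)\] "
    r"of VDS\[(?P<vds>[^\]]+)\]"
)


def find_vmknic_failures(lines):
    """Return (timestamp, port, vds) for each VXLAN vmknic creation failure."""
    failures = []
    for line in lines:
        match = VMKNIC_FAILURE.search(line)
        if match:
            failures.append(
                (match.group("timestamp"), match.group("port"), match.group("vds"))
            )
    return failures


# The three log lines from this post, used as sample input.
SAMPLE = [
    "2014-03-18T11:52:57.182Z cpu14:44483)WARNING: vxlan: VDL2PortPropSet:170: "
    "Failed to create VXLAN IP vmknic on port[0x500000e] of VDS[DvsPortset-0] : Failure",
    "2014-03-18T11:52:57.182Z cpu14:44483)WARNING: NetDVS: 2006: "
    "failed to init client for data com.vmware.net.vxlan.vmknic on port 252",
    "2014-03-18T11:52:57.182Z cpu14:44483)WARNING: NetPort: 1388: "
    "failed to enable port 0x500000e: Failure",
]

if __name__ == "__main__":
    for timestamp, port, vds in find_vmknic_failures(SAMPLE):
        print(f"{timestamp}: vmknic creation failed on port {port} of {vds}")
```

Only the first of the three lines names the port and the VDS, so that is the one the pattern matches; the other two are follow-on warnings for the same failure.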

 

So the system logging did not help either; again, a very helpful description. After searching the VMware KB and community forums for a while, I found a post explaining that the preparation of ESXi hosts for VXLAN had moved from vShield Manager to vCloud Director. On closer inspection, it was indeed the vCloud Director service account that was executing the “Add virtual NIC” task.

 

Now here’s the kicker. It looks like vShield Manager does not allow other applications to add a VXLAN vmknic to an ESXi host; it wants to do that itself. Try adding a vmknic for VXLAN manually and you’ll get the same error. That would be fine, except that vCloud Director gets a notification when this preparation starts and picks up the task itself. vShield Manager in turn sees another application trying to do the preparation and shuts it down.

 

The fix turned out to be temporarily shutting down the vCloud Director service and then starting the preparation. This made vShield Manager handle the preparation itself, successfully. Start the vCloud Director service again afterwards and everyone is happy.
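
The order of operations matters here, so this is a rough sketch of it in Python. It rests on assumptions not stated in this post: that vCloud Director runs as the `vmware-vcd` service on a Linux cell, that the script runs on that cell with sufficient privileges, and that there is only one cell to stop. The `prepare_host` callback is hypothetical; it stands in for whatever triggers the preparation, which was done by hand in the vSphere client in my case.

```python
import subprocess

# Assumption: vCloud Director runs as the "vmware-vcd" service on a Linux cell.
VCD_SERVICE = "vmware-vcd"


def vcd_service_command(action):
    """Build the command line that stops or starts the vCD cell service."""
    if action not in ("stop", "start"):
        raise ValueError(f"unsupported action: {action}")
    return ["service", VCD_SERVICE, action]


def prepare_with_vcd_stopped(prepare_host, run=subprocess.check_call):
    """Stop vCD, run the preparation step, then start vCD again.

    prepare_host is a hypothetical callback for the preparation itself.
    The service is started again even if the preparation step raises.
    """
    run(vcd_service_command("stop"))
    try:
        prepare_host()
    finally:
        run(vcd_service_command("start"))
```

A call such as `prepare_with_vcd_stopped(lambda: input("Prepare the host, then press Enter "))` would stop the service, wait for you to finish the preparation by hand, and then start the service again. The `finally` block is there so vCloud Director is not left stopped if the preparation fails.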

 

The root cause of this issue was that the reinstalled ESXi host had not been properly de-configured within vCloud Director. If you need to do something similar, make sure you “Unprepare” the host within vCloud Director before reinstalling it.




9 comments on “VMware VXLAN Host Preparation: Add virtual NIC: Unknown error”

  • I’m having the same issue. When you say disable vcloud service you mean go into vcloud director and stop the cell service inside of Linux? Or the host? If on the host where is that done? If inside of the cell itself, that is a terrible fix if you’re a service provider. So let’s say you shut down the vcloud service, then how do you manually prepare the vxlan?

    • Hi Billy,

      Shutting down either should be fine, but it’s faster to just shut down the vCD service on the Linux server than to take the entire server offline. Not sure what you mean by “that is a terrible fix if you’re a service provider” – it would take a regular maintenance window. All that is lost is management for your customers.

      The preparation of the host is no different than with vCD running: vCenter -> Hosts & Clusters -> Datacenter X -> Network Virtualization -> Preparation.

  • I’ve had countless issues with VXLAN and worked hard with a number of people to try and overcome issues trying to get it set up. It seems that once it goes wrong, you have to take quite drastic corrective action. In my case I set up a fake cluster – pushed hosts out of the production cluster and into the fake one, then reconfigured VDS and VXLAN there. It seems, to me at least, that this move between clusters is required to trigger the rescan needed by vSphere to force a reinstall of the agent and the associated vmk port. In my case, without that step, the port would never get commissioned.

    However!!! When I did this today, I witnessed the VXLAN agent get installed and then removed shortly after. I was left baffled but I stumbled upon this blog and decided to stop the vCloud Director service. After that, I moved the host back into the production cluster where the nic was immediately commissioned and the VXLAN agent installed. It hasn’t vanished since I re-enabled the vCloud Director service.

    So something tells me that it’s the same issue as the one that you’ve faced. VCD and vShield (VCN) fighting over the process in the background.

    • Thank you for sharing that, it’s really interesting to hear that they can cross each other in multiple cases. 🙂

  • Just had something similar to this problem – the solution in the end was to re-connect vShield Manager to vCenter (in the settings edit the vCenter connection properties and put the password in, click OK).

    • Had the same issue when adding new hosts to the cluster. Reconnecting vShield to vCenter as indicated by Jonathan above solved the problem.

      “WARNING: vxlan: VDL2PortPropSet:170: Failed to create VXLAN IP vmknic on port[0x4000006] of VDS[DvsPortset-1] : Failure
      WARNING: NetDVS: 2017: failed to init client for data com.vmware.net.vxlan.vmknic on port 137”

      Thanks for sharing the solution!

    • Hi Andy,

      A reconnect basically does a resynchronisation between vShield and vCenter. I have not seen it do anything destructive, like removing a VXLAN portgroup, if that is what you’re afraid of.
