This post is part of my VMware VCIX-NV Study Guide and covers some troubleshooting tips for common NSX connectivity issues.

Troubleshoot virtual machine connectivity to Logical Switches

I’m not really sure what VMware means by this objective. Several things can fall under it, such as VXLAN connectivity problems that prevent a virtual machine from communicating across the logical switch. We’ll cover those in the other topics.

 

Troubleshoot dynamic routing protocols

The NSX Edge can use OSPF, BGP and IS-IS for dynamic routing with other network components (other Edges or physical devices). Below are some troubleshooting tips for dynamic routing:

Show active neighbors

Show installed dynamic routes

Show interfaces listening for neighbors

Show link state database for OSPF/IS-IS

Detect Authentication Failure

Detect OSPF Area Misconfiguration
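
For the last two checks, it helps to remember that OSPF neighbors only form an adjacency when both sides agree on the area ID, the authentication settings and the hello/dead timers. The sketch below is a minimal Python illustration of that comparison. The `OspfInterface` class, its field names and the sample values are hypothetical and not taken from any NSX API or CLI output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OspfInterface:
    """Hypothetical container for the OSPF settings of one interface."""
    area_id: str
    auth_type: str       # e.g. "none", "password" or "md5"
    auth_key: str
    hello_interval: int  # seconds
    dead_interval: int   # seconds


def adjacency_mismatches(local, remote):
    """Return the names of the settings that differ between two neighbors."""
    fields = ("area_id", "auth_type", "auth_key",
              "hello_interval", "dead_interval")
    return [name for name in fields
            if getattr(local, name) != getattr(remote, name)]


# Made-up example values: the Edge sits in a different area than the router.
edge = OspfInterface("0.0.0.51", "md5", "s3cret", 10, 40)
router = OspfInterface("0.0.0.0", "md5", "s3cret", 10, 40)

print(adjacency_mismatches(edge, router))  # ['area_id']
```

An empty list means the two sides agree on everything this sketch compares. Any name in the list is a setting to line up on both devices before looking further.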

Debugging
In addition to using show commands to determine the status of IS-IS, OSPF or BGP, you can also debug the protocols to get a lot more information about what the processes are doing in the background. To start the debugging process:

When enabled, the log will fill up with messages from the protocol. You should never leave this running continuously; always disable it when you’re done. To stop the debugging process:

A sample output of the debug messages OSPF sends when establishing a neighbor relationship:

 

Troubleshoot Virtual Private Networks (VPNs)

VPNs can be tricky, especially between two vendors. So when you’re configuring them, you should know where to look if one doesn’t come up.

The NSX Edge keeps logs of the events, which are stored in /var/log/messages. The contents can be viewed through the command “show log”. You can either check that or check the central syslog facility, if you have one. The following log lines are taken from the output of “show log reverse”.

Phase 1 or 2 Policy Mismatch
When the VPN on the NSX Edge hangs in the “STATE_MAIN_I1” state, there’s something wrong with the Phase 1 or 2 negotiation. Look for “s1-c1” and “NO_PROPOSAL_CHOSEN” in the logs:

Pre-Shared Key Mismatch
When the PSK does not match, the log will show “INVALID_ID_INFORMATION” after the Edge initiates “Quick Mode” for the information exchange.
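
If you export the Edge log or pull it from a syslog server, the two keywords above are easy to search for automatically. The following Python sketch maps each keyword to the likely cause described in this section. The function name and the sample lines are placeholders for illustration only; they are not real NSX log output.

```python
# Keyword -> likely cause, as described in the two cases above.
HINTS = {
    "NO_PROPOSAL_CHOSEN": "Phase 1 or 2 policy mismatch",
    "INVALID_ID_INFORMATION": "pre-shared key mismatch",
}


def classify_vpn_log(lines):
    """Return the likely causes found in the log lines, without duplicates."""
    findings = []
    for line in lines:
        for keyword, cause in HINTS.items():
            if keyword in line and cause not in findings:
                findings.append(cause)
    return findings


# Placeholder lines, NOT real log output; only the keyword matters here.
sample = [
    "placeholder line mentioning NO_PROPOSAL_CHOSEN",
    "placeholder line about something unrelated",
]

print(classify_vpn_log(sample))  # ['Phase 1 or 2 policy mismatch']
```

Treat the result as a pointer for where to look on both VPN endpoints, not as a final diagnosis.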

 

Troubleshoot VXLAN, VTEP, and VNI configuration and connectivity

MTU Size
VXLAN encapsulation adds roughly 50 bytes of overhead to every frame, so the transport network needs a larger MTU than the default 1500. The recommended size is 1600. From the ESXi CLI you can check whether the VXLAN stack has issues and whether the correct MTU has been configured on the ESXi host uplinks by simply doing a (special) ping:

If the destination ESXi host (192.168.99.103 here, which is a different host from the one running the test) does not respond, try again with a smaller packet size such as 1472. If it does respond that time, the MTU is not configured correctly somewhere along the path.
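
The packet sizes used in this test follow from simple arithmetic: the ping payload is the MTU minus the IPv4 header (20 bytes, assuming no IP options) and the ICMP header (8 bytes). A small Python sketch of that calculation, with a hypothetical helper that interprets the two ping results:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def mtu_verdict(large_ok, small_ok):
    """Interpret the results of the large and the small ping test."""
    if large_ok:
        return "path carries the larger MTU"
    if small_ok:
        return "MTU is not configured correctly"
    return "no response at all, check basic connectivity first"


print(max_ping_payload(1600))  # 1572
print(max_ping_payload(1500))  # 1472
print(mtu_verdict(large_ok=False, small_ok=True))
```

The last branch is an assumption on my part: if even the small ping fails, the problem is probably not limited to the MTU.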

If the large ping does succeed but VXLAN issues persist, zoom in on the communication between the controller and the ESXi hosts. First, get the ID of the logical switch you’re having issues with (through the GUI or CLI) and log in to a controller to see whether your ESXi hosts are connected to the controller for this logical switch:

If that looks okay, check whether the ESXi hosts have registered as VTEPs with the controller:

If there are no VTEPs registered, there might be an issue with multicast on the network (if configured). If you’ve discovered that the ESXi hosts have registered as VTEPs, check whether any MAC addresses of virtual machines are registering with the controller for the logical switch:

If there are no MAC addresses present, multicast (if configured) might be the culprit. If everything looks fine and you still don’t have connectivity, start checking firewalls. 😉
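
The order of the controller checks above can be summarised as a small decision function. This is only a sketch of the reasoning in this section; the function and parameter names are made up, and the three booleans stand for what you observed on the controller.

```python
def next_vxlan_suspect(hosts_connected, vteps_registered, macs_registered):
    """Return where to look next, following the check order described above."""
    if not hosts_connected:
        # The ESXi hosts are not connected to the controller for this switch.
        return "ESXi host to controller communication"
    if not vteps_registered:
        return "multicast on the network (if configured)"
    if not macs_registered:
        return "multicast on the network (if configured)"
    # Everything is registered, so the control plane looks healthy.
    return "firewalls"


print(next_vxlan_suspect(True, True, False))
print(next_vxlan_suspect(True, True, True))
```

Each check only makes sense once the previous one passes, which is why the function returns at the first failure.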


