This post is part of my VMware VCIX-NV Study Guide and covers some troubleshooting tips for common NSX connectivity issues.

Troubleshoot virtual machine connectivity to Logical Switches

I’m not really sure what VMware means by this subject. Several things can fall under it, such as VXLAN connectivity problems that prevent a virtual machine from communicating across the logical switch. We’ll cover those in the other topics.

 

Troubleshoot dynamic routing protocols

The NSX Edge can use OSPF, BGP and IS-IS for dynamic routing with other network components (other Edges or physical devices). Below are some troubleshooting tips for dynamic routing:

Show active neighbors

vShield-edge-12-0> show ip ospf neighbor
Neigbhor ID         Priority    Address             Dead Time   State
1.1.1.1             128         192.168.99.1        36          Full/DR
vShield-edge-2-0> show ip bgp neighbors
vShield-edge-2-0> show isis neighbors
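
If you have to check a lot of Edges, you can screen the neighbor table with a script. Below is a minimal Python sketch, assuming you saved the output of “show ip ospf neighbor” as text in the column layout shown above. The function name stuck_neighbors and the second sample row are made up for illustration.

```python
def stuck_neighbors(cli_output):
    """Return (neighbor_id, state) for every neighbor that is not Full.

    Assumes the five-column layout shown above:
    neighbor ID, priority, address, dead time, state.
    """
    problems = []
    for line in cli_output.splitlines():
        fields = line.split()
        # Data rows have at least five columns and a numeric priority;
        # this skips the header row and the prompt line.
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        neighbor_id, state = fields[0], fields[4]
        if not state.startswith("Full"):
            problems.append((neighbor_id, state))
    return problems


# First data row is from the output above; the second row is invented.
sample = """\
Neigbhor ID         Priority    Address             Dead Time   State
1.1.1.1             128         192.168.99.1        36          Full/DR
2.2.2.2             128         192.168.99.2        31          ExStart/BDR
"""
print(stuck_neighbors(sample))
```

Treat the result as a hint only: on a broadcast network, two routers that are neither DR nor BDR stay in the 2-Way state with each other, and that is normal.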

Show installed dynamic routes

vShield-edge-2-0> sh ip route bgp
vShield-edge-2-0> sh ip route isis
vShield-edge-12-0> show ip route ospf

Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
C - connected, S - static, L1 - IS-IS level-1, L2 - IS-IS level-2,
IA - OSPF inter area, E1 - OSPF external type 1, E2 - OSPF external type 2,
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2


O   E2  1.1.1.0/24           [110/0]       via 192.168.99.1
O   E2  2.2.2.0/24           [110/0]       via 192.168.99.1
O   E2  10.1.5.0/24          [110/0]       via 192.168.99.1
O   E2  192.168.1.0/24       [110/0]       via 192.168.99.1
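
To verify that the prefixes you expect have actually been installed, you can parse the route lines. This is a rough Python sketch, assuming the line format shown above; the names ospf_routes and expected are my own, not something NSX provides.

```python
import re

# Matches lines such as: "O   E2  1.1.1.0/24   [110/0]   via 192.168.99.1"
ROUTE_RE = re.compile(
    r"^O\s+(?:(IA|E1|E2|N1|N2)\s+)?"      # optional OSPF sub-type
    r"(\d+\.\d+\.\d+\.\d+/\d+)\s+"        # prefix
    r"\[(\d+)/(\d+)\]\s+via\s+(\S+)"      # [distance/metric] and next hop
)


def ospf_routes(cli_output):
    """Map each OSPF prefix to its sub-type and next hop."""
    routes = {}
    for line in cli_output.splitlines():
        match = ROUTE_RE.match(line.strip())
        if match:
            subtype, prefix, _distance, _metric, next_hop = match.groups()
            routes[prefix] = {"subtype": subtype, "next_hop": next_hop}
    return routes


sample = """\
Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
O   E2  1.1.1.0/24           [110/0]       via 192.168.99.1
O   E2  10.1.5.0/24          [110/0]       via 192.168.99.1
"""
routes = ospf_routes(sample)

# Hypothetical list of prefixes you expect to learn through OSPF.
expected = {"1.1.1.0/24", "10.1.5.0/24", "172.16.0.0/16"}
print(sorted(expected - routes.keys()))  # prefixes that are missing
```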

Show interfaces listening for neighbors

vShield-edge-2-0> show ip ospf interface
vNic_3 is activated
  Internet Address 192.168.99.1, Network Mask 255.255.255.0, Area 0.0.0.0
  Transmit Delay is 1 sec, Network Type BROADCAST, State DR, Priority 128
  Designated Router's Interface Address 192.168.99.1
  Backup Designated Router's Interface Address 0.0.0.0
  Timer intervals configured, Hello 10, Dead 40, Retransmit 5
vShield-edge-2-0> show isis interface
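
The OSPF interface output is mostly useful for comparing both ends of a link: the area, network mask, hello timer and dead timer have to match on a broadcast segment before an adjacency forms. Here is a small Python sketch of that comparison, assuming the output format shown above. The peer values are hypothetical and typed in by hand, and the helper names are my own.

```python
import re

# Fields that must be identical on both ends of a broadcast OSPF link.
PATTERNS = {
    "area": r"Area ([\d.]+)",
    "mask": r"Network Mask ([\d.]+)",
    "hello": r"Hello (\d+)",
    "dead": r"Dead (\d+)",
}


def ospf_interface_params(cli_output):
    """Pull the comparable fields out of 'show ip ospf interface' text."""
    params = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, cli_output)
        params[name] = match.group(1) if match else None
    return params


def mismatches(local, peer):
    """Return the names of the fields that differ between both ends."""
    return sorted(n for n in PATTERNS if local.get(n) != peer.get(n))


edge = ospf_interface_params("""\
vNic_3 is activated
  Internet Address 192.168.99.1, Network Mask 255.255.255.0, Area 0.0.0.0
  Timer intervals configured, Hello 10, Dead 40, Retransmit 5
""")

# Hypothetical values read from the device on the other end.
peer = {"area": "0.0.0.0", "mask": "255.255.255.0", "hello": "30", "dead": "120"}
print(mismatches(edge, peer))
```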

Show link state database for OSPF/ISIS

vShield-edge-2-0> show isis database
vShield-edge-2-0> show ip ospf database

Detect Authentication Failure

vShield-edge-12-0> show log reverse
2015-01-31T12:38:18+00:00 vShield-edge-12-0 routing[876]:  [user.info] AUDIT 0x3e02-39 (0000): OSPF 1 Packet received with unexpected authentication type 1.

Detect OSPF Area Misconfiguration

vShield-edge-12-0> show log reverse
2015-01-31T12:40:58+00:00 vShield-edge-12-0 routing[876]:  [user.emerg] EXCEPTION 0x3e01-110 (0000): OSPF 1 OSPF packet dropped because it was received on non-existent or inactive virtual or sham link
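
If the Edge logs to a central syslog server, you can scan for both messages in one go. A minimal Python sketch, assuming the log lines look like the two examples above; the mapping simply restates the causes from this post and is not an official list of OSPF log messages.

```python
# Message fragment -> likely cause, taken from the two examples above.
OSPF_HINTS = {
    "unexpected authentication type": "authentication mismatch",
    "non-existent or inactive virtual or sham link": "area misconfiguration",
}


def classify_ospf_log(lines):
    """Return (timestamp, hint) for every line that matches a known fragment."""
    findings = []
    for line in lines:
        for fragment, hint in OSPF_HINTS.items():
            if fragment in line:
                # The timestamp is the first whitespace-separated field.
                findings.append((line.split()[0], hint))
    return findings


log = [
    "2015-01-31T12:38:18+00:00 vShield-edge-12-0 routing[876]:  [user.info] "
    "AUDIT 0x3e02-39 (0000): OSPF 1 Packet received with unexpected "
    "authentication type 1.",
    "2015-01-31T12:40:58+00:00 vShield-edge-12-0 routing[876]:  [user.emerg] "
    "EXCEPTION 0x3e01-110 (0000): OSPF 1 OSPF packet dropped because it was "
    "received on non-existent or inactive virtual or sham link",
]
for timestamp, hint in classify_ospf_log(log):
    print(timestamp, hint)
```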

Debugging
Besides using show commands to check the status of IS-IS, OSPF or BGP, you can also debug the protocols to get a lot more information about what the processes are doing in the background. To start the debugging process:

vShield-edge-12-0> debug ip ospf
vShield-edge-12-0> debug ip bgp
vShield-edge-12-0> debug isis

When enabled, the log will fill up with messages from the protocol. Never leave this running continuously; always disable it when you’re done. To stop the debugging process:

vShield-edge-12-0> no debug ip ospf
vShield-edge-12-0> no debug ip bgp
vShield-edge-12-0> no debug isis

A sample of the debug messages OSPF logs when establishing a neighbor relationship:

2015-01-31T12:46:58+00:00 vShield-edge-12-0 routing[876]:  [user.info] AUDIT 0x3e01-226 (0000): OSPF 1  i/f idx 0X00000004  rtr ID 1.1.1.1 IP addr 192.168.99.1 neighbor FSM has processed an input.
2015-01-31T12:46:58+00:00 vShield-edge-12-0 routing[876]:  [user.info] AUDIT 0x3e01-200 (0000): OSPF 1 Database exchange with an adjacent OSPF neighbor has been completed.
2015-01-31T12:46:58+00:00 vShield-edge-12-0 routing[876]:  [user.info] AUDIT 0x3e01-226 (0000): OSPF 1  i/f idx 0X00000004  rtr ID 1.1.1.1 IP addr 192.168.99.1 neighbor FSM has processed an input.

 

Troubleshoot Virtual Private Networks (VPNs)

VPNs can be tricky, especially between two vendors. So when you’re configuring them, you should know where to look if one doesn’t come up.

The NSX Edge logs its events to /var/log/messages, which you can view with the command “show log”. You can check that or your central syslog facility, if you have one. The following log lines are taken from the output of “show log reverse”.

Phase 1 or 2 Policy Mismatch
When the VPN on the NSX Edge hangs in the “STATE_MAIN_I1” state, there’s something wrong with the Phase 1 or 2 negotiation, usually because the policies on both sides don’t match. Look for the name of your tunnel (“s1-c1” in this example) and “NO_PROPOSAL_CHOSEN” in the logs:

Jan 31 13:11:35 gw-vpn01 ipsec[6769]: | got payload 0x800(ISAKMP_NEXT_N) needed: 0x0 opt: 0x0
Jan 31 13:11:35 gw-vpn01 ipsec[6769]: | ***parse ISAKMP Notification Payload:
Jan 31 13:11:35 gw-vpn01 ipsec[6769]: |    next payload type: ISAKMP_NEXT_NONE
Jan 31 13:11:35 gw-vpn01 ipsec[6769]: |    length: 96
Jan 31 13:11:35 gw-vpn01 ipsec[6769]: |    DOI: ISAKMP_DOI_IPSEC
Jan 31 13:11:35 gw-vpn01 ipsec[6769]: |    protocol ID: 0
Jan 31 13:11:35 gw-vpn01 ipsec[6769]: |    SPI size: 0
Jan 31 13:11:35 gw-vpn01 ipsec[6769]: |    Notify Message Type: NO_PROPOSAL_CHOSEN
Jan 31 13:11:35 gw-vpn01 ipsec[6769]: "s1-c1" #1: ignoring informational payload, type NO_PROPOSAL_CHOSEN msgid=00000000

Pre-Shared Key Mismatch
When the PSK does not match, the log shows “INVALID_ID_INFORMATION” right after the Edge initiates “Quick Mode” for the information exchange:

Jan 31 13:15:00 gw-vpn01 ipsec[3855]: "s1-c1" #1: transition from state STATE_MAIN_I3 to state STATE_MAIN_I4
Jan 31 13:15:00 gw-vpn01 ipsec[3855]: "s1-c1" #1: STATE_MAIN_I4: ISAKMP SA established {auth=OAKLEY_PRESHARED_KEY cipher=oakley_3des_cbc_192 prf=oakley_sha group=modp1024}
Jan 31 13:15:00 gw-vpn01 ipsec[3855]: "s1-c1" #1: Dead Peer Detection (RFC 3706): enabled
Jan 31 13:15:00 gw-vpn01 ipsec[3855]: "s1-c1" #2: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP+SAREFTRACK  {using isakmp#1 msgid:e8add10e proposal=3DES(3)_192-SHA1(2)_160 pfsgroup=OAKLEY_GROUP_MODP1024}
Jan 31 13:15:00 gw-vpn01 ipsec[3855]: "s1-c1" #1: ignoring informational payload, type INVALID_ID_INFORMATION msgid=00000000
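
When an Edge terminates several tunnels, it helps to group these notifications by tunnel name. The following is a small Python sketch, assuming the log format of the lines above; the hints only repeat the two causes described in this post, and vpn_findings is a name I made up.

```python
import re

# Captures the tunnel name and the notification type from lines such as:
#   "s1-c1" #1: ignoring informational payload, type NO_PROPOSAL_CHOSEN ...
NOTIFY_RE = re.compile(
    r'"([^"]+)" #\d+: ignoring informational payload, type (\w+)'
)

# Notification type -> likely cause, as described in this post.
HINTS = {
    "NO_PROPOSAL_CHOSEN": "Phase 1 or 2 policy mismatch",
    "INVALID_ID_INFORMATION": "pre-shared key mismatch",
}


def vpn_findings(log_text):
    """Return {tunnel name: set of hints} for all notifications in the log."""
    findings = {}
    for tunnel, notify in NOTIFY_RE.findall(log_text):
        # Unknown notification types are reported under their raw name.
        findings.setdefault(tunnel, set()).add(HINTS.get(notify, notify))
    return findings


log_text = "\n".join([
    'Jan 31 13:11:35 gw-vpn01 ipsec[6769]: "s1-c1" #1: ignoring informational '
    'payload, type NO_PROPOSAL_CHOSEN msgid=00000000',
    'Jan 31 13:15:00 gw-vpn01 ipsec[3855]: "s1-c1" #1: ignoring informational '
    'payload, type INVALID_ID_INFORMATION msgid=00000000',
])
print(vpn_findings(log_text))
```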

 

Troubleshoot VXLAN, VTEP, and VNI configuration and connectivity

MTU Size
VXLAN encapsulation adds roughly 50 bytes of headers to every frame, so the transport network needs a larger MTU. The recommended size is 1600. From the ESXi CLI you can check whether the VXLAN stack has issues and whether the correct MTU has been configured on the ESXi host uplinks by doing a (special) ping:

~ # ping ++netstack=vxlan -d -s 1572 -I vmk3 192.168.99.103
PING 192.168.99.103 (192.168.99.103): 1572 data bytes
1580 bytes from 192.168.99.103: icmp_seq=0 ttl=64 time=1.108 ms
1580 bytes from 192.168.99.103: icmp_seq=1 ttl=64 time=3.246 ms

If the other ESXi host (192.168.99.103 in this example, a different host from the one running the test) does not respond, try again with a smaller packet size like 1472. If it does respond that time, the MTU is not configured correctly.
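
In case you’re wondering where 1572 and 1472 come from: the -s option sets the ICMP payload size, and the IPv4 header (20 bytes) and ICMP header (8 bytes) are added on top of that. With the -d option (don’t fragment) set, the largest payload that fits is the MTU minus 28. A tiny Python sketch of that arithmetic, assuming IPv4 without IP options:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest value for 'ping -s' that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(max_ping_payload(1600))  # 1572, the VXLAN test above
print(max_ping_payload(1500))  # 1472, the fallback for a standard MTU
```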

If it does respond but VXLAN issues persist, zoom in on the communication between the controller and the ESXi hosts. First, get the ID of the logical switch you’re having issues with (through the GUI or CLI) and log in to a controller to see whether your ESXi hosts are connected to the controller for this logical switch:

nsx-controller # show control-cluster logical-switches connection-table 5003
Host-IP         Port  ID
192.168.99.103  43261 1
192.168.99.104  42155 2

If that looks okay, check whether the ESXi hosts have registered as VTEPs with the controller:

nsx-controller # show control-cluster logical-switches vtep-table 5003
VNI      IP              Segment         MAC               Connection-ID
5003     192.168.99.103  192.168.99.0    00:50:56:63:18:db 1
5003     192.168.99.104  192.168.99.0    00:50:56:66:08:fe 2

If there are no VTEPs registered, there might be an issue with multicast on the network (if configured). If you’ve discovered that the ESXi hosts have registered as VTEPs, check whether any MAC addresses of virtual machines are registering with the controller for the logical switch:

nsx-controller # show control-cluster logical-switches mac-table 5003
VNI      MAC               VTEP-IP         Connection-ID
5003     00:50:56:bc:21:ab 192.168.99.103  1
5003     00:50:56:ed:1a:bc 192.168.99.104  2

If there are no MAC addresses present, multicast (if configured) might be the culprit. If everything looks fine and you still don’t have connectivity, start checking firewalls. 😉
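
The three controller tables above can also be compared with a script to find the step where a host drops out. This is a rough Python sketch, assuming you saved the three outputs as text in the column layouts shown above. The function names are my own, and the sample VTEP table has one host removed on purpose to show a gap.

```python
def data_rows(cli_output):
    """Split a controller table into rows, skipping header and prompt lines."""
    rows = []
    for line in cli_output.splitlines():
        fields = line.split()
        # Data rows start with a digit (IP or VNI) and end with a numeric ID.
        if fields and fields[0][0].isdigit() and fields[-1].isdigit():
            rows.append(fields)
    return rows


def vxlan_gaps(connection_table, vtep_table, mac_table):
    """Compare the three tables for one logical switch."""
    connected = {row[0] for row in data_rows(connection_table)}  # Host-IP
    vteps = {row[1] for row in data_rows(vtep_table)}            # IP
    mac_vteps = {row[2] for row in data_rows(mac_table)}         # VTEP-IP
    return {
        "connected_but_no_vtep": sorted(connected - vteps),
        "vtep_without_macs": sorted(vteps - mac_vteps),
    }


connection_table = """\
Host-IP         Port  ID
192.168.99.103  43261 1
192.168.99.104  42155 2
"""
# Hand-edited sample: 192.168.99.104 is left out to simulate a problem.
vtep_table = """\
VNI      IP              Segment         MAC               Connection-ID
5003     192.168.99.103  192.168.99.0    00:50:56:63:18:db 1
"""
mac_table = """\
VNI      MAC               VTEP-IP         Connection-ID
5003     00:50:56:bc:21:ab 192.168.99.103  1
"""
print(vxlan_gaps(connection_table, vtep_table, mac_table))
```

A VTEP without MAC addresses is not necessarily broken; it can simply mean that no virtual machines on that host are powered on and attached to this logical switch.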


