VMware NSX Best Practices from VMworld

There were a lot of technical sessions at VMworld about VMware NSX. 30 sessions were about or touched on NSX, and interest (judging by the waiting-list queues) was enormous; a lot of people wanted to know more.

As a network guy, my VMworld was mostly about NSX as well. I joined 7 sessions to get better acquainted and to learn about best practices for designing virtual network environments. This post summarises those best practices. As soon as the VMworld presentations come online, I will update this post with the actual network diagrams.

Cluster Design (types)
VMware NSX should be deployed with management decoupled from production. Management covers the NSX management functions, such as the NSX Manager, the NSX Controllers and the NSX Edge Services gateways; production is the compute environment for your virtual machines.

Recommended practice is to divide the management cluster into two functions: an NSX Management cluster and an NSX Edge cluster. The NSX Management cluster only hosts the NSX Managers and Controllers (along with other vSphere management functions), while the NSX Edge cluster hosts all the NSX Edge gateways that will be deployed. The reason for separating the NSX Edge cluster from the other management functions is that you can position your WAN uplinks closer to the Edge gateways, for smarter bandwidth management and fewer stretched VLANs. Use a different layer-3 subnet per Top-of-Rack switch for your VXLAN/VTEP networks to minimise stretched VLANs as well.
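As a rough illustration of the per-rack subnet idea, the sketch below carves one layer-3 VTEP subnet per rack out of a larger block using Python's standard ipaddress module. The supernet 10.20.0.0/22, the /24 size and the rack names are made-up assumptions for the example, not values from the sessions.

```python
import ipaddress


def vtep_subnets_per_rack(supernet, racks, new_prefix=24):
    """Assign one layer-3 VTEP subnet to each rack (Top-of-Rack switch).

    supernet and racks are hypothetical inputs for illustration only.
    """
    pool = ipaddress.ip_network(supernet).subnets(new_prefix=new_prefix)
    plan = {}
    for rack in racks:
        try:
            plan[rack] = next(pool)
        except StopIteration:
            raise ValueError(
                f"{supernet} has no /{new_prefix} left for {rack}"
            ) from None
    return plan


# Example addressing plan; all values are assumptions.
plan = vtep_subnets_per_rack("10.20.0.0/22", ["rack-01", "rack-02", "rack-03"])
for rack, net in plan.items():
    gateway = next(net.hosts())  # first usable address, e.g. the ToR interface
    print(f"{rack}: VTEP subnet {net}, gateway {gateway}")
```

Because every rack gets its own routed subnet, no VTEP VLAN has to be stretched between Top-of-Rack switches.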

NSX Edge Services
There are three types of high availability for the NSX Edge gateways, each with its own pros and cons. The implementation needs to fit your requirements, so there is no single recommendation (although the sessions leaned heavily towards the last one listed below).

Stand-alone: One Edge appliance, needing the least resources. Suited to small deployments. Failover is stateless and handled by VMware HA; there is one peering to the external network.
Edge HA: Two Edge appliances in Active/Standby mode. Suited to deployments where VMware HA isn't fast enough. Failover is stateful and takes between 30 and 60 seconds (internal failover plus dynamic routing failover); there is one peering to the external network.
Edge ECMP: Multiple Edge appliances (up to 8), all active. Suited to large deployments. Failover is stateless and takes between 3 and 10 seconds; there are multiple peerings (one per Edge) to the external network. This feature requires NSX 6.1 or higher.

When it comes to high availability, the only benefit of Edge HA is the stateful failover. If you can afford to lose the state table and are running NSX 6.1+, go with ECMP, as the failover time is a lot better in this mode. When using HA mode, you should tune the OSPF or BGP timers to minimise failover times.
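The choice between the three modes can be written down as a small decision helper. This is only a sketch of the reasoning in this post: the class, its field names and the order of the checks are my own assumptions, not an NSX API or an official sizing rule.

```python
from dataclasses import dataclass


@dataclass
class EdgeRequirements:
    # All fields are hypothetical names for the criteria discussed above.
    needs_stateful_failover: bool   # can you afford to lose the state table?
    nsx_version: tuple              # (major, minor), e.g. (6, 1)
    vmware_ha_fast_enough: bool     # is a VMware HA restart acceptable?


def pick_edge_mode(req):
    """Return the Edge availability mode suggested by the criteria in this post."""
    if req.needs_stateful_failover:
        return "Edge HA"        # the only mode with stateful failover
    if req.nsx_version >= (6, 1):
        return "Edge ECMP"      # fastest failover, needs NSX 6.1 or higher
    if req.vmware_ha_fast_enough:
        return "Stand-alone"    # least resources, relies on VMware HA
    return "Edge HA"            # pre-6.1 and VMware HA is too slow


print(pick_edge_mode(EdgeRequirements(False, (6, 1), False)))
```

Your own requirements (throughput, number of peerings, available resources) will add more inputs than these three.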

NSX Transport Zone
The Transport Zone is the backbone of the VXLAN network and spans all ESXi hosts that carry a given logical switch. Because of that span, keeping logical switches local to a zone, say a rack, keeps their VXLAN traffic local instead of crossing your entire datacenter. VXLAN is a good technology for stretching layer-2 segments across routed networks, and doing so is entirely possible. Still, keeping most logical switches local and letting only the essential ones span the datacenter makes the VXLAN tunnels more effective.

NSX Distributed Firewall
The best practice for the Distributed Firewall is pretty straightforward: use it. Beyond that, the basic best practices for regular firewall devices apply (with a few exceptions):

Enable Role Based Access Control (RBAC) for your network and security guys, so they only see their own objects.
Set up logging to a central syslog device and NetFlow to a NetFlow collector, so you can analyse the events in your virtual network. Use Splunk, VMware vRealize Log Insight or something similar for this purpose.
Set up NSX threshold alerts to monitor the CPU and memory usage of the NSX components.
Exclude your management virtual machines from per-VM firewalling, but firewall them at the management zone level (that way a mistake cannot accidentally take your vCenter/NSX Manager offline). Roie Ben Haim has an excellent post on how to configure this.
When using automation (vRealize Automation), build predefined security models into the blueprints to standardise your security policies.

NSX Multi-tenancy
Multi-tenancy inside NSX can be achieved in the design of your virtual network. Use a Distributed Logical Router (DLR, stand-alone or HA) and an NSX Edge gateway per tenant, and create a backbone network where the tenant networks connect to an aggregation Edge cluster (HA or ECMP) that connects back to the outside network. The network policies reside on the Edge gateway and the Distributed Logical Router of each tenant, giving tenants full control over their own network. The aggregation Edge gateways should use route import policies to make sure the tenant Edge gateways only advertise the appropriate routes.
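To make the route import policy idea concrete, here is a minimal sketch of the check an aggregation Edge would apply to routes received from a tenant Edge. It is plain Python using the standard ipaddress module, not NSX configuration; the tenant names and prefixes are invented for the example.

```python
import ipaddress

# Hypothetical allocation: the address space each tenant may advertise.
TENANT_PREFIXES = {
    "tenant-a": [ipaddress.ip_network("10.1.0.0/16")],
    "tenant-b": [ipaddress.ip_network("10.2.0.0/16")],
}


def accept_route(tenant, advertised):
    """Accept a route only if it falls inside the tenant's own allocation."""
    route = ipaddress.ip_network(advertised)
    allowed = TENANT_PREFIXES.get(tenant, [])
    return any(route.subnet_of(prefix) for prefix in allowed)


for tenant, prefix in [
    ("tenant-a", "10.1.5.0/24"),   # inside tenant-a's block
    ("tenant-a", "10.2.0.0/24"),   # belongs to tenant-b
    ("tenant-a", "0.0.0.0/0"),     # a default route should never be imported
]:
    verdict = "accept" if accept_route(tenant, prefix) else "reject"
    print(f"{tenant} advertises {prefix}: {verdict}")
```

The same logic is what a prefix list on the aggregation Edges expresses: one tenant cannot inject routes for another tenant's networks or for the outside world.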

Credits
The credits for the diagrams and content go to the presenters of the VMworld sessions. Below is the list of sessions this content is taken from:

NET1586 - Advanced Network Services with NSX by Dimitri Desmidt and Max Ardica
NET1589 - Reference Design for SDDC with NSX & vSphere by Nimish Desai
SEC1746 - NSX Distributed Firewall Deep Dive by Anirban Sengupta and Kausum Kumar

2 Comments

martin
June 30, 2015 at 22:22

You have a great site and thank you for all your hard work. Quick question: I cannot determine how many ECMP clusters NSX 6.1 will support. If NSX can support 2000 ESGs, I suspect NSX 6.1 will support multiple ECMP clusters of 8 Edges each. All I can find is a maximum of 8 ECMP Edges per cluster; how many clusters, 100, 200?
Thanks in advance!!

Martijn (Post author)
July 1, 2015 at 14:48

Hi Martin,

Thanks for your post. As far as I can tell, the number of clusters itself is “unlimited”; the only limit is the maximum number of Edges. So if you have an average of 4 Edges per ECMP cluster, that gives 2000 / 4 = 500 clusters. If you have an average of 8 (the max), it is 2000 / 8 = 250.
