Learning NVP, Part 1: High-Level Architecture

This blog post kicks off a new series of posts describing my journey to become more knowledgeable about the Nicira Network Virtualization Platform (NVP). NVP is, in my opinion, an awesome platform, but there hasn’t been a great deal of information shared about the product, how it works, how you configure it, etc. That’s something I’m going to try to address in this series of posts. In this first post, I’ll start with a high-level description of the NVP architecture. Don’t worry—more in-depth information will come in future posts.

Before continuing, it might be useful to set some context around NVP and NSX. This series of posts will focus on NVP—a product that is available today and is currently in use in production. The architecture I’m describing here will also be applicable to NSX, which VMware announced in early March. Because NSX will leverage NVP’s architecture, spending some time with NVP now will pay off with NSX later. Make sense?

Let’s start with a figure. The diagram below illustrates the NVP architecture at a high level:

[Figure: High-level NVP architecture diagram]

The key components of the NVP architecture include:

  • A scale-out controller cluster: The NVP controllers handle computing the network topology and providing configuration and flow information to create logical networks. The controllers support a scale-out model for high availability and increased scalability. The controller cluster supplies a northbound REST API that can be consumed by cloud management platforms such as OpenStack or CloudStack, or by home-grown cloud management systems.
  • A programmable virtual switch: NVP leverages Open vSwitch (OVS), an independent open source project with contributors from across the industry, to fill this role. OVS communicates with the NVP controller cluster to receive configuration and flow information.
  • Southbound communications protocols: NVP uses two open communications protocols to communicate southbound to OVS. For configuration information, NVP leverages OVSDB; for flow information, NVP uses OpenFlow. The management (OVSDB) communication between the controller cluster and OVS is encrypted using SSL.
  • Gateways: Gateways provide the “on-ramp” to enter or exit NVP logical networks. Gateways can provide L2 gateway services (to bridge NVP logical networks onto physical networks) as well as L3 gateway services (to route between NVP logical networks and physical networks). In either case, the gateways are also built using a scale-out model that provides high availability and scalability for the L2 and L3 gateway services they host.
  • Encapsulation protocol: To provide full independence and isolation of logical networks from the underlying physical networks, NVP uses encapsulation protocols for transporting logical network traffic across physical networks. Currently, NVP supports both Generic Routing Encapsulation (GRE) and Stateless Transport Tunneling (STT), with additional encapsulation protocols planned for future releases.
  • Service nodes: To offload the handling of BUM (Broadcast, Unknown Unicast, and Multicast) traffic, NVP can leverage one or more service nodes. Service nodes are optional; customers can instead choose to have BUM traffic handled locally on each hypervisor node. (Service nodes are not shown in the diagram above.)
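
To make the encapsulation and service node ideas a bit more concrete, here’s a minimal Python sketch. It is a toy model under stated assumptions, not NVP code: the class names, function names, field names, IP addresses, and context ID values are all hypothetical, and the real data path lives in OVS flow tables programmed by the controller cluster, not in a Python dictionary. The sketch only shows two ideas: a logical frame is wrapped in an outer header that carries an identifier for its logical network (such as the Context ID in STT, which comes up in the comments below), and BUM traffic is either replicated by the source hypervisor or handed to a service node, reaching only hypervisors that host VMs on that same logical network.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class Frame:
    """A tenant's Ethernet frame as seen inside the logical network."""
    src_mac: str
    dst_mac: str
    payload: str


@dataclass(frozen=True)
class TunnelPacket:
    """A logical frame wrapped for transport across the physical IP network."""
    outer_src_ip: str
    outer_dst_ip: str
    context_id: int  # hypothetical field: identifies the logical network
    inner: Frame


@dataclass
class Controller:
    """Toy stand-in for the controller cluster's view of VM placement."""
    # context_id -> {vm_mac: hypervisor_ip}
    placements: Dict[int, Dict[str, str]] = field(default_factory=dict)

    def attach(self, context_id: int, vm_mac: str, hypervisor_ip: str) -> None:
        self.placements.setdefault(context_id, {})[vm_mac] = hypervisor_ip

    def locate(self, context_id: int, vm_mac: str) -> Optional[str]:
        return self.placements.get(context_id, {}).get(vm_mac)

    def hypervisors_for(self, context_id: int) -> Set[str]:
        return set(self.placements.get(context_id, {}).values())


def replicate(ctrl: Controller, frame: Frame, context_id: int,
              sender_ip: str, skip_ip: str) -> List[TunnelPacket]:
    """Send one copy to every hypervisor on this logical network except skip_ip."""
    targets = sorted(ctrl.hypervisors_for(context_id) - {skip_ip})
    return [TunnelPacket(sender_ip, ip, context_id, frame) for ip in targets]


def send_from_hypervisor(ctrl: Controller, frame: Frame, context_id: int,
                         src_ip: str,
                         service_node_ip: Optional[str] = None) -> List[TunnelPacket]:
    """Return the tunnel packets the source hypervisor puts on the wire."""
    dst_ip = ctrl.locate(context_id, frame.dst_mac)
    if dst_ip == src_ip:
        return []  # destination VM is local; nothing crosses the physical network
    if dst_ip is not None:
        return [TunnelPacket(src_ip, dst_ip, context_id, frame)]  # known unicast
    if service_node_ip is not None:
        # BUM traffic offloaded: a single copy goes to the service node.
        return [TunnelPacket(src_ip, service_node_ip, context_id, frame)]
    # BUM traffic handled locally: the source hypervisor replicates it.
    return replicate(ctrl, frame, context_id, src_ip, skip_ip=src_ip)


def service_node_fan_out(ctrl: Controller, packet: TunnelPacket,
                         service_node_ip: str) -> List[TunnelPacket]:
    """The service node replicates within the packet's own logical network only."""
    return replicate(ctrl, packet.inner, packet.context_id,
                     service_node_ip, skip_ip=packet.outer_src_ip)


# Example: three hypervisors host logical network 100, one hosts network 200.
ctrl = Controller()
ctrl.attach(100, "aa:aa:aa:aa:aa:01", "10.0.0.1")
ctrl.attach(100, "aa:aa:aa:aa:aa:02", "10.0.0.2")
ctrl.attach(100, "aa:aa:aa:aa:aa:03", "10.0.0.3")
ctrl.attach(200, "bb:bb:bb:bb:bb:01", "10.0.0.4")

arp = Frame("aa:aa:aa:aa:aa:01", BROADCAST_MAC, "who-has")
local_copies = send_from_hypervisor(ctrl, arp, 100, "10.0.0.1")
offloaded = send_from_hypervisor(ctrl, arp, 100, "10.0.0.1", service_node_ip="10.0.0.9")
fan_out = service_node_fan_out(ctrl, offloaded[0], "10.0.0.9")

print([p.outer_dst_ip for p in local_copies])
print([p.outer_dst_ip for p in offloaded])
print([p.outer_dst_ip for p in fan_out])
```

In this toy example, both paths end up delivering the broadcast to 10.0.0.2 and 10.0.0.3, and the hypervisor at 10.0.0.4 (which only hosts the other logical network) never receives it. The difference is who does the copying: without a service node, the source hypervisor sends every copy itself; with one, it sends a single packet and the service node does the fan-out.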

Now that you have an idea of the high-level architecture, let me briefly outline how the rest of this series will be organized. The basic outline of this series will roughly correspond to how NVP would be deployed in a real-world environment.

  1. In the next post (or two), I’ll be focusing on getting the controller cluster built and diving a bit deeper into the controller cluster architecture.
  2. Once the controller cluster is up and running, I’ll take a look at getting NVP Manager up and running. NVP Manager is an application that consumes the northbound REST APIs from the controller cluster in order to view and manage NVP logical networks and NVP components. In most cases, this function is part of a cloud management platform (such as OpenStack or CloudStack), but using NVP Manager here allows me to focus on NVP instead of worrying about the details of the cloud management platform itself.
  3. The next step will be to bring hypervisor nodes into NVP. I’ll focus on using nodes running KVM, but keep in mind that Xen is also supported by NVP. If time (and resources) permit, I may try to look at bringing up Xen-based hypervisor nodes as well. Because NVP leverages OVS as the edge virtual switch, I’ll naturally be discussing some OVS-related tasks and topics as well.
  4. Following the addition of hypervisor nodes into NVP, I’ll look at creating a simple logical network, and we’ll examine how this logical network works with the underlying physical network.
  5. To add more flexibility to our logical network, we need to be able to bring physical resources into NVP logical networks. To enable that functionality, we’ll need to add gateways and gateway services to our configuration, so I’ll discuss gateways and L2 gateway services, how they work, and how we add them to an NVP configuration.
  6. The next step is to enable L3 (routing) functionality within NVP, and that is enabled by L3 gateway services. I’ll spend some time talking about the L3 gateway services, their architecture, adding them to NVP, and including L3 functionality in an NVP logical network. I’ll also explore distributed L3 routing, where the L3 routing is actually distributed across hypervisors in an NVP environment (this is a new feature just added in NVP 3.1).
  7. Now that we have both L2 and L3 gateway services in NVP, I’ll take a look at building more intricate logical networks.

Beyond that, it’s hard to say where the series will go. I’ll likely also take a look at some of NVP’s security features, and examine a few more complex NVP use cases. If there are additional topics you’d like to see beyond what I’ve outlined above, please feel free to speak up in the comments below.

I’m excited about this journey to learn NVP in more detail, and I’m looking forward to taking all of you along with me. Ready? Let’s go!


  1. Brent Salisbury

    Great post, Scott. It already cleared up many of my questions.

    On the service node, is that hashing dmac/smac k/v pairs to build a tenant’s broadcast domain tree/NIB/topology?

    I ask because anything other than OFP_NORMAL/STP for L2 learning involves reactive processing. Trying to figure out something similar on the physical side for a current deployment outside the DC. If you all are keeping consistency, that’s very much awesome. Cool to hear it’s not the easy way out w/ an agent :)

    Looking forward to the next post, this was excellent info!
    -Brent

  2. piaopingchen

    Thank you for your post. I think posts like this are very helpful for me.

  3. cryptochrome

    What Brent said: Great post! Can’t wait to read the next part. And as a security guy, I am especially eager to read on the security features.

  4. James Knapp

    Thanks Scott. Great post and looking forward to the series.

    We virtualisation and storage guys need someone who comes from the same background to explain these topics. When the explanations come from people with a networking background, I find that either the terms used are assumed to be understood when they may not be, or the context and approach to the problem are different and so not as easy to relate to.

    So, does NVP require a symmetric network configuration (all switches interconnected etc.) or will the controller deduce the configuration and advertise this to the OVS so it can make decisions?

    Also what happens when there must be physical separation of networks (mandated by the current security policy)? Will there need to be separate NVP configurations connected with gateways, each with their own dedicated controller cluster? Or can a set of controllers manage multiple NVP ‘domains’?

    Thanks for the post and looking forward to the series

  5. vsoft

    As per the above, NVP leverages Open vSwitch. Now, OVS is available with Red Hat and Xen.

    My question is how NVP will work with vSphere.

    Looking forward to your reply.

  6. slowe

    Brent, thanks for the feedback. If I’m understanding your question correctly, then I believe the answer is that each tenant’s network topology is maintained by the controllers, and is distinguished in the encapsulation header (the Context ID in STT, for example). Thus, the service node knows that if it receives BUM traffic from a particular tenant, it should only be replicated to hypervisor nodes hosting VMs for that same tenant. Does that answer your question?

    James, I’m hopeful that I’m able to present some of these concepts and ideas in a way that will be helpful to both network *and* server professionals. (Time will tell.) With regards to your question, the only thing NVP requires from the network is IP (layer 3) connectivity. How you build that connectivity is up to you. We’d recommend something like a leaf/spine architecture, but it is not required. As for physically separate networks, right now each controller cluster manages a single NVP domain. The real question is whether your physically separate networks would require multiple NVP domains, and the only answer I can give to that question (right now) is “It depends.” Stay tuned—this sort of thing might be something I can address later in the series when I talk about use cases and such. Thanks!

    Vsoft, right now OVS operates on vSphere as a vApp. It’s not ideal, but it works. That architecture will evolve over time as we work closer to the release of NSX, which—as I mentioned in the article—VMware announced back in March. I hope this helps!

  7. Amit

    Hi Scott,

    I am looking forward to the series. One thing I have always wondered is how OVS instances dynamically build overlays between themselves using the controller, and also how STT or VXLAN packets are encapsulated by OVS depending on the destination.

    Regards,
    Amit.

  8. Rynardt Spies

    Yet again a great post Scott, and in good time as well.

    I was actually toying with the idea of doing a similar series after attending a 4-day vCloud LiVeFire session this week. This topic was covered quite extensively in two sessions that I can only describe as mind warping.

    It will be interesting to see how big network vendors will respond, as this could potentially have a big impact on their cash cow of physical layer 2 features.

    Again, great post!

  9. David Pasek

    Scott, excellent introduction, and I am eagerly looking forward to the other posts. It is valuable that you are trying to consolidate all the information and your lab experience into a single, consistent resource: your blog stream about NVP. I really appreciate that you share your knowledge with the public community. It’s not easy to integrate the open source and commercial worlds, and it’s even more difficult to explain technology to siloed IT infrastructure experts (network, server, or storage oriented). I cannot wait for the time when software-defined infrastructure is ready and standardized. Can you imagine VMware ESX + VMware NSX (NVP) + PernixData FVP + Zerto replication with journaling? An amazing future … there are only two things that scare me. The first is THE SOFTWARE, because all programmers know that every piece of code has bugs, and when you fix one bug, another two appear. The second is the new INFRASTRUCTURE SPECIALIZATION: the DC admin, who must have broad but sufficiently deep knowledge across all infrastructure disciplines and, on top of that, software awareness (read: the art of abstraction and generalization). However, there is no other way, and we cannot stop evolution. The main differentiator between platforms will be software architecture and quality. Thanks again for teaching us (or learning together with us) NVP.

  10. Nick Day

    Great post, Scott. I’ve been waiting to see parts 2 and onwards… Any news?

    I can see you’ve been busy, having also read your co-authored article, the first in a series, on Network Heresy.

    Keep up the excellent content, always great topics selected.
