Some Thoughts and Questions About STT

I just finished reading the Internet Draft for Stateless Transport Tunneling (STT), a proposed protocol for network virtualization. STT’s contemporaries are VXLAN (Virtual eXtensible Local Area Network) and NVGRE (Network Virtualization using Generic Routing Encapsulation), both of which are also described in IETF Internet Drafts. The goal of all of these protocols is to virtualize (abstract) the physical network topology and bring functionality like isolation of multiple tenants, isolation of overlapping address space between multiple tenants, expanded VLAN/tenant ID address space, and enhanced VM mobility (by providing L2 services over an L3 network, for example).

I’ve written a little bit about VXLAN before; see my post titled Examining VXLAN for more details.

After reading the STT draft, I’m struck by a few initial thoughts:

  • The STT draft proposes the use of “TCP-like” headers to take advantage of hardware-based TCP Segmentation Offload (TSO) support. The headers look enough like TCP to allow existing TSO support in network interface cards (NICs) and other devices to work, but are different enough that an STT-unaware receiver would simply drop the packets. STT even uses the same IP protocol number (6) as TCP.
  • Although the STT headers “look” like TCP, there is no TCP handshake or connection state maintained. The draft authors acknowledge that this is both good and bad; it’s good in that the overhead of managing TCP connections is avoided, but bad that intermediary devices (like firewalls) that might ordinarily leverage TCP state information will no longer be able to do so with STT flows.
  • Like VXLAN and NVGRE, STT can be negatively impacted by “middle boxes,” devices in the physical network between the STT endpoints that might interfere with the traffic. Firewalls are one key example; without STT support in the firewalls, it’s an all-or-nothing proposition: either all STT traffic passes (based on the STT “TCP-like” destination port) or none of it does. As I said, though, this is not unique to STT; VXLAN and NVGRE suffer from the same issue.
  • The STT protocol draft makes special accommodations to enable more efficient processing of STT frames, such as providing an “L4 offset” that allows an STT-aware device to know immediately where the inner TCP/UDP header begins (instead of having to perform extensive parsing of the headers).
  • To provide an expanded address space, STT leverages a 64-bit “Context ID,” which could be used in any number of ways: expanded VLAN IDs, customer/tenant IDs, etc. VXLAN, on the other hand, supplies a 24-bit ID that is explicitly called out as the VXLAN Network Identifier (VNI). Similarly, NVGRE uses a 24-bit Tenant Network Identifier (TNI). STT’s Context ID is both larger and more generic.
  • As with VXLAN and NVGRE, physical networking equipment will not be aware of STT.
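To make the “L4 offset” point above a bit more concrete, here is a minimal Python sketch of what a receiver has to do to locate the inner TCP/UDP header with and without a precomputed offset. This is my own illustration, not the wire format from the draft: the function names and the test frame are hypothetical, the offset is assumed to be counted from the start of the inner Ethernet frame, and only inner IPv4 is handled.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_VLAN = 0x8100  # 802.1Q tag: 4 extra bytes after the MAC addresses


def l4_offset_by_parsing(inner_frame: bytes) -> int:
    """Slow path: walk the inner Ethernet and IPv4 headers to find L4."""
    offset = 12  # skip destination and source MAC addresses
    (ethertype,) = struct.unpack_from("!H", inner_frame, offset)
    offset += 2
    if ethertype == ETHERTYPE_VLAN:
        # Skip the 2-byte tag control field, then read the real EtherType.
        (ethertype,) = struct.unpack_from("!H", inner_frame, offset + 2)
        offset += 4
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("this sketch only handles inner IPv4")
    ihl_words = inner_frame[offset] & 0x0F  # IPv4 header length in 32-bit words
    return offset + ihl_words * 4


def read_ports(inner_frame: bytes, l4_offset: int) -> tuple:
    """Fast path: jump straight to the L4 header using a supplied offset."""
    return struct.unpack_from("!HH", inner_frame, l4_offset)


def build_inner_frame(vlan_tagged: bool) -> bytes:
    """Hypothetical test frame: zeroed MACs, minimal IPv4, TCP-sized L4."""
    macs = bytes(12)
    tag = struct.pack("!HH", ETHERTYPE_VLAN, 100) if vlan_tagged else b""
    ethertype = struct.pack("!H", ETHERTYPE_IPV4)
    ipv4 = bytes([0x45]) + bytes(19)  # version 4, IHL 5, other fields zeroed
    l4 = struct.pack("!HH", 49152, 80) + bytes(16)  # src port, dst port
    return macs + tag + ethertype + ipv4 + l4


frame = build_inner_frame(vlan_tagged=True)
offset = l4_offset_by_parsing(frame)  # what the sender would compute once
print(offset, read_ports(frame, offset))  # prints: 38 (49152, 80)
```

The idea is that the sending endpoint already knows the layout of the frame it is encapsulating, so it can do the work in `l4_offset_by_parsing` once and carry the result in the header; a receiver then only needs the equivalent of `read_ports`.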

A few questions also arise:

  • Is the performance benefit realized by—as the draft authors state—having STT “explicitly designed to leverage the TSO capabilities of currently available NICs” really that significant? The STT draft states that the protocol was designed this way to improve the performance of data transfers, implying that STT’s architecture and use of TSO enables it to provide better performance than VXLAN (which encapsulates inside UDP) or NVGRE (which encapsulates inside GRE).
  • What are the drawbacks of re-using the TCP protocol number and repurposing TCP header structures for different purposes? Does this set a precedent that could potentially have negative repercussions? Does this open the door for vendors to “co-opt” standard protocol numbers and header structures for their own purposes, and what effect will that have on the network industry?
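One rough way to frame the first question is to count how many wire-sized packets a single large send turns into, since without segmentation offload the host has to do the per-packet work for each of them in software. The sketch below is back-of-the-envelope arithmetic only; the 64 KB send size, the MTU values, and the 40 bytes of IP plus TCP-like headers are my own assumptions, and it ignores the encapsulation overhead entirely. It says nothing about actual measured performance.

```python
def segments_needed(payload_bytes: int, mtu: int, header_bytes: int = 40) -> int:
    """Number of MTU-sized packets needed to carry one large send.

    header_bytes is an assumed 20-byte IPv4 header plus a 20-byte
    TCP-like header with no options.
    """
    mss = mtu - header_bytes  # payload bytes that fit in each packet
    if mss <= 0:
        raise ValueError("MTU too small for the assumed headers")
    return -(-payload_bytes // mss)  # ceiling division


# Assumed 64 KB send at a standard and a jumbo MTU.
for mtu in (1500, 9000):
    print(mtu, segments_needed(64 * 1024, mtu))
# prints:
# 1500 45
# 9000 8
```

With offload, the host hands the NIC one large segment and the NIC produces those packets; without it, the host builds each one itself. Whether that difference is significant in practice is exactly the open question.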

The one key takeaway that I have from reading the draft is just how much the “network virtualization” space is like a Wild West shoot-out: it’s a big free-for-all with competing (proposed) standards and competing (proposed) protocols like VXLAN, NVGRE, and STT; each of these proposals offers some unique strengths but also has some drawbacks and limitations. Maybe I’m just naive, but it seems to me the sort of broad interoperability to which we’ve become accustomed in IP-based networks is in danger of being fragmented along vendor lines: VMware-based networks deploying VXLAN, Microsoft-based networks deploying NVGRE, and Nicira throwing STT in there for good measure (presumably with STT support in Open vSwitch). It will be interesting—perhaps even challenging—as these various proposals sort themselves out. That is, of course, assuming that they sort themselves out.

Courteous comments are always welcome; please feel free to speak up in the comments! Also, please disclose affiliations where appropriate; for example, although I work for EMC, these thoughts are mine and mine alone.


  1. Iben Rodriguez

    As always – an insightful perspective on one of the newly proposed network virtualization protocols. Change is rough but offers risk takers opportunities to experience new levels of efficiency or perhaps crash and burn. Will this be a Betamax versus VHS or the Blu-ray/HD DVD format wars? Open Source Software has made a powerful impact on how clouds are built and there seems to be no slowing down of adoption.

    It will probably come down to the success of the cloud management systems’ ability to efficiently scale and support the security needs of multiple tenants. OpenStack Quantum Essex is still in beta but already being used at some big installations. Stay tuned for more on this…

  2. EtherealMind

    You said “it’s a big free-for-all with competing (proposed) standards and competing (proposed) protocols like VXLAN, NVGRE, and STT;”

    I’ll take umbrage at that. Network standards are different from pseudo-standards like those in Windows or VMware (such as AD, VAAI, VASA or vCenter). Networking standards are presented openly and developed in concert between vendors.

    A “network system” is very different from operating systems, many times larger and distributed over a large area (continents), and devices act autonomously. This makes them different than a monoculture developed by a single software vendor.

    Further, the debates and discussions are conducted in the open. I doubt that VMware or EMC (as examples of server software or storage industry) would regularly offer up their internal projects for open debate and external review and criticism in the same way that networkers do.

    But that’s just my opinion.

  3. slowe

    Greg, thanks for your comment. There are two aspects to this particular part of the discussion. First, let me start by saying that I agree that the open development of standards is far preferable to a closed, proprietary process. So, no need for you to defend the value of that process. That’s the first aspect of this discussion, and we are in agreement on the benefit of an open process. Where I take concern is in the implementation of proposed “standards” before they are actually standards, as we are seeing with VXLAN (and potentially with STT, if I’m not mistaken). At least the STT draft authors acknowledge that they don’t expect STT to make it all the way to RFC, but I don’t know if that’s the case for NVGRE or VXLAN. This is the second aspect, and in my mind it is separate and distinct from the first. It’s all well and good to submit protocol proposals, but it’s something else entirely to implement proposals before they are actually finalized. (And yes, I suppose there is value in the implementation of pre-standard protocols “driving” the industry forward in order to achieve progress, but as with all things in life there has to be a balance.)

    Thanks for your comment!

  4. EtherealMind

    “It’s all well and good to submit protocol proposals, but it’s something else entirely to implement proposals before they are actually finalized.”

    Actually, the IETF process is intended to handle both standard and non-standard protocols. There are many examples of non-standard and proprietary RFCs, such as Cisco’s HSRP, which serve to communicate with the wider network community. Many proposals fail to gain traction or adoption and remain as Informational or Draft. Therefore, the draft that Nicira have submitted is nothing more than a “here is our idea” so that everyone can see what they have done.

    If only Nicira choose to deploy STT, then it’s effectively a proprietary protocol. But Nicira might be hoping that vendors will deploy STT in hardware for tunnel termination and that it will gain wider deployment.

    On the bright side, STT highlights the deficiencies in VXLAN and NVGRE quite nicely, while the reverse is also true. STT has some limits that VXLAN/NVGRE overcome. It’s possible that one of the protocols will absorb features of the others and become a better protocol.

    In terms of fragmentation, we’ve seen this before, around the 2000/2001 timeframe, when a lot of new protocols arrived and very few of them survived. This, overall, is a good thing. Today, virtual networking is still developing, and strong debate around protocols is a good thing. Customers will be well served by the review that is brought to bear.

  5. David Le Goff

    Rewinding to the aim of STT…

    Why don’t we use advanced packet processing software to overcome performance challenges with the existing protocols (GRE for instance) rather than faking TCP packets to benefit from hardware NIC capabilities?

    I am still curious to know…

  6. EtherealMind

    In broad terms because the OS’s aren’t able to run real time. That is, the “software” you talk about is subject to indeterminate CPU timing in a multi-threaded kernel. The process of calculating the CRC, building the IP frame and Ethernet frame is quite CPU intensive and, like graphics processing, works best in silicon. If you allocate an entire core to network processing (required for 10GbE handling at low latency) then your server performance drops a fair bit.

    Additionally, GRE doesn’t load balance well in an Ethernet network, and we need a modified Ethernet/IP header combination to overcome this limitation.

    Hope this helps.

  7. slowe

    David Le Goff, the answer is that most NICs available on the market today already have TCP offload capabilities built into them. Few NICs (if any) have hardware offloads to handle GRE. That would be a great feature to have, but it will take time for the NIC silicon vendors to engineer those features (and then additional time for those features to make it to market). Great question though!

  8. David Le Goff

    Hi Scott,
    That was not my point, actually :)
    Indeed, current NICs are not able to support GRE offloading.
    But what about the pure software companies that can deliver high-performance networking in C alone? (like us… :)

    As a first example, 6WIND can perform 30 Gbps of GRE tunneling on a single core. Pretty good, no?

  9. slowe

    David, in the future please be sure to disclose any affiliations that might be pertinent–such as the fact that you appear to work for 6WIND. :-)

    The only issue with doing it all in software is the need for all the operating systems and hypervisors to implement the feature–a process that I suspect would take longer than building it into the hardware. In addition, there are the variability concerns that Greg Ferro mentions above.

    Thanks for your comments!

  10. David Le Goff

    Coming back to Greg’s statement, with which I fully agree: “OS’s aren’t able to run real time. That is, the “software” you talk about is subject to indeterminate CPU timing in a multi-threaded kernel”

    This is why I stressed looking at advanced packet processing, such as 6WIND’s, rather than bringing in new tunneling protocols (STT).
    We have designed and developed the fast path as an add-on to Linux that isolates most of the networking traffic so that it is not preempted or penalized by Linux. You process almost all of your packets, GRE included, outside of Linux, which gives you high performance, high scalability, etc., while remaining fully transparent to Linux and applications.
    But don’t get me wrong, I’m not taking shots at companies like Nicira; I love this technology, and they have disrupted the market in the right way over the past several months. I just want to know why they don’t look at enhancing the software first. Or maybe they are doing it in parallel… :)

    And yes I am part of 6WIND but it was not a secret :D

  11. Lennie

    GRE-based tunnels might have the biggest chance of being adopted right now. OpenStack with Open vSwitch can run with GRE, and it’s available in your Linux distribution of choice right now.

    VXLAN is in the Linux kernel, but there is no Open vSwitch or OpenStack release that supports it, so it can’t be deployed right now with working code.

    Microsoft also has a shipping product.

    So those are two pretty big camps with GRE.

    Anyway, I’ve been wanting to ask some of you folks about another idea I had.

    Have any of you looked at what the IETF working group for Multipath TCP has been up to? Multipath TCP can be very useful in the datacenter. TCP has congestion control, so it can deal with congestion, and Multipath TCP can obviously handle failures of paths.

    It’s been worked on for Linux and FreeBSD for years now. The Linux implementation recently had its first stable release. I doubt the standard will get any big changes anymore. So it is pretty much ready for use.

    So my real question is, why muck around with STT if you can use Multipath TCP?

    I can see how people think they want a connectionless protocol, but with virtualization you are probably creating many flows between the same servers, am I right?

    You might only need to establish one or a few Multipath TCP connections between two servers when you use tunneling like the other protocols. I could be wrong, but head-of-line blocking seems unlikely to happen in the datacenter. I don’t think Multipath TCP can utilize segmentation offload, though. But I’ll let people smarter than me answer that one, like David from the previous comment.

    The other obvious alternative is SCTP; it can also use multiple paths and it has been in Linux for ages. I think it even supports multiple streams and a more datagram-like API too, for people who want to prevent head-of-line blocking. People keep saying they are scared of middle boxes, but SCTP also supports UDP tunneling, I believe.

    So do you think something like that wouldn’t be better than the other tunneling protocols?

    Why are we all trying to solve multipathing at layer 2 when people are already deploying tunneling? With tunnels in place, you can solve your multipath problems at layer 3 too.