Am I Understanding RoCE Correctly?

A couple of days ago I posted a tweet inquiring about RDMA (Remote Direct Memory Access) over Converged Ethernet, affectionately known as RoCE and even more affectionately pronounced “Rocky”. At the time I was unclear on exactly what RoCE was and what it was trying to accomplish.

Since then, I’ve done a bit more research and I think that I have a better idea of RoCE now. In particular, this EE Times article provided some information that I found useful in putting the pieces together.

If I understand things correctly, RoCE does for InfiniBand what FCoE did for Fibre Channel: it replaces the physical transport mechanism with 10 Gigabit Ethernet. More specifically, it uses Converged Ethernet, the particular flavor of 10Gb Ethernet that supports the IEEE Data Center Bridging (DCB) standards such as Priority-Based Flow Control (802.1Qbb), Enhanced Transmission Selection (802.1Qaz), and Congestion Notification (802.1Qau). These DCB standards are intended to make Ethernet less “lossy” and “chatty” and more reliable, predictable, and lossless, like Fibre Channel (or InfiniBand).

Fibre Channel over Ethernet (FCoE) took the physical transport layers of traditional Fibre Channel and replaced them with Ethernet, and the IEEE created the DCB efforts to make sure that the underlying Ethernet transport was reliable and lossless so that it could support FCoE. Because it still “looked” like Fibre Channel at the upper layers, there is a great deal of interoperability between Fibre Channel and FCoE.

In a similar fashion, RDMA over Converged Ethernet (RoCE) takes the RDMA interfaces that are common to InfiniBand and puts them on Ethernet, again relying upon the IEEE DCB standards to make Ethernet reliable and lossless with predictable latencies. No more proprietary fabrics; with RoCE-capable adapters, you’ll be able to reap the ultra-low latency benefits of InfiniBand over standards-based 10Gb Converged Ethernet.
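
To make that layering concrete, here is a minimal Python sketch of how I picture a RoCE frame, assuming the layout in the original RoCE specification: an Ethernet header carrying the RoCE EtherType (0x8915), followed directly by the InfiniBand headers, with no IP header in between. The function names are my own, made up for illustration, and the InfiniBand headers are zero-filled placeholders rather than real field values.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # shown only for contrast
ETHERTYPE_ROCE = 0x8915  # EtherType registered for RoCE

# Sizes of the InfiniBand pieces carried inside the Ethernet frame.
GRH_LEN = 40   # Global Route Header
BTH_LEN = 12   # Base Transport Header
ICRC_LEN = 4   # Invariant CRC


def ethernet_header(dst_mac: bytes, src_mac: bytes, ethertype: int) -> bytes:
    """Pack a 14-byte Ethernet II header (no VLAN tag)."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes each")
    return struct.pack("!6s6sH", dst_mac, src_mac, ethertype)


def roce_frame(dst_mac: bytes, src_mac: bytes, payload: bytes) -> bytes:
    """Hypothetical helper: lay out a RoCE frame with placeholder IB headers.

    The GRH, BTH, and ICRC are zero-filled stand-ins; a real adapter fills
    in addressing, opcode, queue pair number, sequence number, and CRC.
    """
    grh = bytes(GRH_LEN)
    bth = bytes(BTH_LEN)
    icrc = bytes(ICRC_LEN)
    return (ethernet_header(dst_mac, src_mac, ETHERTYPE_ROCE)
            + grh + bth + payload + icrc)


def ethertype_of(frame: bytes) -> int:
    """Read the EtherType field from bytes 12-13 of an untagged frame."""
    return struct.unpack("!H", frame[12:14])[0]


frame = roce_frame(b"\x02" * 6, b"\x04" * 6, b"hello")
is_ip = ethertype_of(frame) == ETHERTYPE_IPV4  # False: this is not an IP packet
```

The point of the sketch is only the ordering: the InfiniBand transport headers sit immediately after the Ethernet header, so a switch sees an ordinary Ethernet frame while the adapters at each end see InfiniBand transport.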

At least, that’s how I understand it. Anyone else have a better explanation?

  1. kommy

    I think you’re mostly correct. You can find many good presentations on this site:

    http://www.openfabrics.org/archives/sonoma2010.htm

    # RDMA is just Remote DMA not stands for InfiniBand as you know.

  2. Hari

    Companies like Xsigo send both IP (Ethernet) and FC traffic over IB interfaces, especially in the context of virtualization. Do you envision similar things with RoCE? I can’t imagine it, but thought I would ask…

  3. slowe

    Hari,

    No. In fact, some of the documents I’ve seen on RoCE specifically indicate that IP-based traffic is a “no go”. As a result, I’m hard pressed to find uses for RoCE outside of HPC environments.

    I could be missing something. RoCE experts, feel free to clarify or expand…

  4. AFidel

    RoCE could be useful for Oracle environments, as we’ve seen some impressive stats on the reduced overhead and increased throughput RDMA can provide between middleware and DB server, but the high cost in terms of hardware and an additional fabric to manage made it a non-starter.

  5. Markus

    Hi,
    Take a look at this blog: http://www.mellanox.com/blog/
    They are RoCE-compliant now, either through their own adapters or via software.
    RoCE is definitely targeted at doing InfiniBand over Ethernet.

  6. Paul Grun

    Your understanding is mostly correct, with one or two very important clarifications.

    First, RoCE does NOT require nor depend on DCB for correct operation. There is a school of thought which holds that RoCE may run a bit better over a lossless fabric such as that provided by DCB, but again DCB is NOT required by the new standard. The degree to which it will run better is inversely proportional to the lossiness of the fabric.

    Second, the best way to think of RoCE is as the InfiniBand layer 4 transport running directly on layer 2 Ethernet. Therefore, it is not an IP network.

    I expect that RoCE will be very very successful in environments that do not require routing.

    RoCE is not an InfiniBand replacement, since RoCE is limited to Ethernet speeds whereas IB is currently well beyond those speeds with a roadmap that goes waay out into the future.

    Paul Grun
    Chief Scientist, System Fabric Works, Inc.
    Chair, InfiniBand Trade Association RoCE Taskgroup

  7. Paul Grun

    One other point that bears mentioning: up above, Kommy said: “# RDMA is just Remote DMA not stands for InfiniBand as you know.”

    To be precise about it, IB provides a ‘channel semantic’ (SEND/RECEIVE) and a ‘memory semantic’ (RDMA READ, RDMA WRITE).

    When iWARP was introduced, it included both semantics under the umbrella name ‘RDMA’, and that is the state of affairs today.

    The way to think of it is this: “RDMA” is the name for a memory access method that includes both a channel semantic and a memory semantic, and InfiniBand is the best known implementation of RDMA today, with iWARP being another.

    Paul Grun
    Chief Scientist, System Fabric Works, Inc.
    Chair, InfiniBand Trade Association RoCE Taskgroup

  8. slowe

    Paul,

    Thanks for the additional information—it’s very helpful!
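
The distinction Paul draws in comment 7 between the channel semantic (SEND/RECEIVE) and the memory semantic (RDMA READ, RDMA WRITE) can be illustrated with a toy model. This is only a sketch in plain Python: `ToyEndpoint` and its methods are hypothetical names invented here, not a real verbs API, and it leaves out details such as memory registration keys, queue pairs, and completions.

```python
from collections import deque


class ToyEndpoint:
    """Hypothetical stand-in for one side of an RDMA connection."""

    def __init__(self, memory_size: int):
        self.memory = bytearray(memory_size)  # stands in for registered memory
        self.posted_receives = deque()        # (offset, length) buffers

    # Channel semantic: the RECEIVER chooses where incoming data lands.
    def post_receive(self, offset: int, length: int) -> None:
        self.posted_receives.append((offset, length))

    def send(self, peer: "ToyEndpoint", data: bytes) -> int:
        """SEND consumes the peer's next posted receive buffer."""
        if not peer.posted_receives:
            raise RuntimeError("peer has no posted receive")
        offset, length = peer.posted_receives.popleft()
        if len(data) > length:
            raise ValueError("message larger than posted receive buffer")
        peer.memory[offset:offset + len(data)] = data
        return offset

    # Memory semantic: the INITIATOR names the remote address directly.
    def rdma_write(self, peer: "ToyEndpoint", remote_offset: int, data: bytes) -> None:
        if remote_offset < 0 or remote_offset + len(data) > len(peer.memory):
            raise ValueError("write outside peer memory")
        peer.memory[remote_offset:remote_offset + len(data)] = data

    def rdma_read(self, peer: "ToyEndpoint", remote_offset: int, length: int) -> bytes:
        if remote_offset < 0 or remote_offset + length > len(peer.memory):
            raise ValueError("read outside peer memory")
        return bytes(peer.memory[remote_offset:remote_offset + length])


a, b = ToyEndpoint(16), ToyEndpoint(16)

b.post_receive(0, 8)             # b prepares a buffer first
a.send(b, b"ping")               # channel semantic: lands where b decided

a.rdma_write(b, 8, b"data")      # memory semantic: a picks the address
echoed = a.rdma_read(b, 8, 4)    # reads it back without b posting anything
```

In the model, `send` fails unless the peer has posted a receive, while `rdma_write` and `rdma_read` never touch the peer's receive queue. That difference in who chooses the destination is the heart of the two semantics.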