A Deeper Look at VPLEX

I’ve been promising a post on VPLEX for quite some time, and it’s finally here. Having spent some hands-on time with VPLEX this past week, I think I’m ready to discuss it in some detail.

The Basics

First, I need to cover the basics of what VPLEX is as well as what it isn’t. VPLEX is another step in delivering EMC’s vision of Virtual Storage and storage federation. There has been quite a bit of discussion over the difference between storage federation and storage virtualization (see here and here for two examples). Personally, I like the phrase that Joe Kelly used in a VPLEX post (emphasis mine):

This is the ability to create a consistent view of a volume, independent of its location. This is the core behind Storage Federation.

With this definition in mind, you can see that EMC has already delivered and will be delivering a number of technologies that support storage federation. Take sub-LUN FAST, for example; sub-LUN FAST presents a consistent view of a LUN regardless of the specific storage tier hosting the underlying blocks for that LUN. Blocks can (and will) be migrated automatically between tiers, yet the consistent view of the volume remains unchanged.

VPLEX accomplishes this definition of storage federation through in-band storage virtualization, which I personally think is why so many people are comparing it directly to IBM SVC, HDS USP-V, and NetApp V-Series. Yes, VPLEX does perform storage virtualization—but it’s storage virtualization as part of delivering storage federation.

So, what is VPLEX exactly, then? Using in-band storage virtualization, VPLEX acts as a scale-out cluster delivering both local (within a data center) storage federation and metro (between data centers at synchronous distances) storage federation. A single VPLEX cluster can scale up to four engines; each engine contains two directors. Each director is equipped with loads of RAM (64GB of RAM, if I recall correctly), eight front-end 8Gb Fibre Channel ports, and eight back-end 8Gb Fibre Channel ports. This means a four-engine cluster offers 512GB of cache, 64 front-end 8Gb Fibre Channel ports, and 64 back-end 8Gb Fibre Channel ports. A single VPLEX cluster can support up to 8,000 virtualized LUNs.

VPLEX clusters can be combined into a metro-plex to provide storage federation between two data centers at synchronous data replication distances (less than 100km today). A metro-plex would consist of eight engines (sixteen directors), 1TB of cache, 128 front-end 8Gb Fibre Channel ports, and 128 back-end 8Gb Fibre Channel ports.
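For anyone who wants to check the math, here is a minimal sketch of how those totals fall out of the per-director figures. It assumes the numbers quoted above are right (remember that the 64GB of RAM per director is from memory), and the names `Totals` and `cluster_totals` are made up for illustration; they aren't part of any EMC tool.

```python
from dataclasses import dataclass

# Per-director figures as quoted in the post. The RAM figure is
# "if I recall correctly", so treat it as an assumption.
DIRECTORS_PER_ENGINE = 2
RAM_GB_PER_DIRECTOR = 64
FE_PORTS_PER_DIRECTOR = 8
BE_PORTS_PER_DIRECTOR = 8


@dataclass(frozen=True)
class Totals:
    directors: int
    cache_gb: int
    front_end_ports: int
    back_end_ports: int


def cluster_totals(engines: int) -> Totals:
    """Multiply the per-director figures out across a number of engines."""
    directors = engines * DIRECTORS_PER_ENGINE
    return Totals(
        directors=directors,
        cache_gb=directors * RAM_GB_PER_DIRECTOR,
        front_end_ports=directors * FE_PORTS_PER_DIRECTOR,
        back_end_ports=directors * BE_PORTS_PER_DIRECTOR,
    )


if __name__ == "__main__":
    print("four-engine cluster:", cluster_totals(4))
    print("metro-plex (two four-engine clusters):", cluster_totals(8))
```

Four engines gives eight directors, 512GB of cache, and 64 ports on each side; doubling to eight engines gives the metro-plex figures of sixteen directors, 1,024GB (1TB) of cache, and 128 ports on each side.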

In addition to understanding what VPLEX is, it’s also important to understand what VPLEX isn’t. It’s not a replacement for Invista, EMC’s out-of-band storage virtualization solution. It’s not a solution meant only for EMC arrays; VPLEX is also supported for non-EMC arrays, with support for more arrays in the works. And finally, it’s not a VMware-only solution; VPLEX fully supports physical instances of Windows Server, Linux, Solaris, AIX, and other operating systems.

Making it Real: Specifics in the Real World

If I were reading this post, I’d be asking myself right now, “OK, that’s all great and wonderful, but what does it really mean?” I’m glad you asked.

Storage federation as provided by VPLEX means that the storage managed by VPLEX is active, read-writable storage across the entire VPLEX cluster or metro-plex (remember that a metro-plex is a pair of VPLEX clusters separated by synchronous replication distances). This means that if you have a VPLEX Local configuration with 2 engines, all the storage managed by this VPLEX Local cluster is read-writable across the entire cluster. Similarly, if you have a VPLEX Metro configuration with 4 engines (2 in each site), you can have storage that is read-writable in both locations simultaneously.

Consider a traditional storage replication solution: data exists in Site A and the array replicates the data to Site B. While the data is present at both sites, it’s only writable at Site A. Site B is read-only. This is true of every replication solution of which I am aware on the market today. EMC’s own replication products—like SRDF or RecoverPoint—behave this way. Sure, there are workarounds to that limitation, like image access with RecoverPoint. In the end, though, these are workarounds to the underlying replication model. VPLEX breaks that model by allowing you to have writable storage in Site A and Site B at the same time. The same LUN visible in two sites at the same time, writable in both locations.
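To make the contrast concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not of how VPLEX (or SRDF, or RecoverPoint) works internally: the class names are hypothetical, writes are applied to every copy immediately, and real-world concerns such as cache coherence, locking, and link failures are ignored.

```python
class ReplicatedVolume:
    """Toy model of traditional replication: only the source site accepts writes."""

    def __init__(self, source_site, target_site):
        self.source_site = source_site
        self.copies = {source_site: {}, target_site: {}}

    def write(self, site, block, data):
        if site != self.source_site:
            raise PermissionError(f"{site} is read-only")
        # The array replicates each write from the source to the target.
        for copy in self.copies.values():
            copy[block] = data

    def read(self, site, block):
        return self.copies[site].get(block)


class FederatedVolume:
    """Toy model of a federated volume: every site accepts writes."""

    def __init__(self, *sites):
        self.copies = {site: {} for site in sites}

    def write(self, site, block, data):
        if site not in self.copies:
            raise KeyError(site)
        # A write from any site is applied to all copies before returning.
        for copy in self.copies.values():
            copy[block] = data

    def read(self, site, block):
        return self.copies[site].get(block)


if __name__ == "__main__":
    replicated = ReplicatedVolume("Site A", "Site B")
    replicated.write("Site A", 0, "hello")
    try:
        replicated.write("Site B", 1, "world")
    except PermissionError as err:
        print("replication:", err)

    federated = FederatedVolume("Site A", "Site B")
    federated.write("Site B", 1, "world")
    print("federation:", federated.read("Site A", 1))
```

In the first model a write at Site B is rejected; in the second, a write at Site B is immediately readable at Site A. That difference is the whole point.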

Just think about that for a moment. You’ll need a clustered file system to take advantage of this underlying storage functionality, but imagine something like Windows Server with Sanbolic’s Melio FS to provide writable Windows LUNs in multiple sites at the same time. Of course, there’s also the VMware use case where VPLEX provides writable access to a VMFS datastore between multiple data centers. Talk about making the hybrid cloud a reality—consider the use of VPLEX Metro between your on-site data center and a vCloud provider’s data center. It would be the ultimate in workload mobility.

And those are just the VPLEX Metro examples. What about VPLEX Local? Ever had to migrate from one storage array to another storage array? Yes, you could use Storage vMotion. Or you could use VPLEX Local and not even have to get the VMware administrators involved—it would all happen under the covers. Think about being able to transparently migrate storage volumes among various arrays within your data center to meet the SLAs of the workload. Need Tier 1 storage? No problem, we’ll use VPLEX Local to transparently migrate you to a VMAX. Don’t need that level of performance or availability any more? No problem, we’ll use VPLEX Local again to transparently migrate to a midrange storage platform.

Want to really freak your brain out? Think about VPLEX with sub-LUN FAST integrated into it…

I have so much more about VPLEX to share, but in the interest of keeping this already long blog post from getting even longer, I’ll wrap it up here. Feel free to share your thoughts or questions about VPLEX in the comments below.


  1. Dave Alexander

    Thanks for the writeup, Scott. I too had been a bit confused on the difference between VPLEX and, say, IBM SVC. The active/active nature of LUNs across distance and the clusters – the federation component – is what I was missing.

  2. Duncan

    Thanks Scott, good info!

  3. Marc Farley (3PARFarley)

    Scott, I’m surprised at the sloppiness in this post. You linked to previous posts including one of mine and then ignored their content – which is OK – before loosely paraphrasing something from Joe Kelly’s post in order to emphasize that you believe cache coherence is the core behind storage federation. That certainly puts you into the camp of EMC blogger – just in case anybody was wondering about your objectivity.

    Cache coherence may be the core behind what EMC is trying to sell as storage federation, but that would be a company-specific engineering solution and not any sort of definition for the concept of federation. In fact, I’d say we already have a term for what you and Joe are talking about, and that’s distributed write caching.

    The definition you chose for federation: “the ability to create a consistent view of a volume, independent of its location” is far too broad to be useful. You even start out by talking about sub-lun tiering – which definitely should not be included in any definition of what storage federation is. Sub-lun tiering is a matter of virtualization within an array and may be done across arrays at some point if those arrays are federated, but it is important to make distinctions about these things or we’ll have people saying things like “virtualized federated pools of disks”, when they could just say RAID instead.

    Federation is less about the presentation of volumes than about the group functionality provided by multiple arrays. There are many functions besides presenting LUNs that federation can aggregate or consolidate such as snapshot management and retention, remote copy (many will not want this to be done by distributed cache) and consolidated resource management. Tying federation to virtualization is seriously dumbing it down.

  4. slowe

    (Note: For those that are not aware, Marc works for an EMC competitor, 3PAR.)

    Marc,

    Thanks for reading and responding. I appreciate your viewpoint and your opinion. It’s always nice to have a well-known and well-respected industry expert comment on your work.

    First, I’d like to say that you seemed to take a bit of a personal approach in your comment. While you and I might disagree over the definition of storage federation and what it is (or what it isn’t), I’d appreciate if we could leave the personal comments out of the discussion. I’m sure that wasn’t your intent, but that is how it comes across.

    Second, I’d like to state that my discussion of storage federation is really not that incompatible with your own: “the transparent, dynamic and non-disruptive distribution of storage resources across self-governing, discrete, peer storage systems”. Although you specifically exclude in-band storage virtualization systems such as VPLEX and IBM SVC from the concept of storage federation, I personally don’t find them in violation of your own definition. Yes, in-band storage virtualization systems such as VPLEX do govern the behavior of other storage systems, but they also provide the transparent, dynamic, and non-disruptive distribution of storage resources across peer storage systems. It seems to me that the exclusion of in-band storage virtualization is more of an arbitrary exclusion than anything else. Of course, I have not been privy to the extended conversations that you and other industry experts have been conducting, so there might be more to the story. If there is more to the story, I would certainly love to hear it.

    Third, I will restate that while I personally feel that saying VPLEX delivers storage federation is an acceptable statement given the definitions (both yours and mine), I am not saying that storage federation is nothing more than storage virtualization. I said that VPLEX achieves storage federation via storage virtualization. I didn’t say that storage federation was the same as storage virtualization. I would agree with you that storage federation can be so much more (but doesn’t necessarily have to be more).

    Consider this: both Microsoft and VMware provide a means of virtualizing hardware and allowing multiple operating systems to run simultaneously on the same hardware. We call both of them “virtualization,” yet they are dramatically different, both in functionality as well as in architecture. Why can the term “storage federation” not be used in much the same way? Again, if there is more information that I am failing to see or recognize, I’d love to hear.

    Fourth, I’d like to clarify that I didn’t state that distributed cache coherency (or distributed write caching, as you refer to it) was the core of storage federation. I said that it was, in my opinion, the ability to present a consistent view of a volume, regardless of its physical location, that was the core of storage federation. Distributed cache coherency is merely one mechanism that is used to help achieve that functionality.

    Finally, I’d just like to point out the discussion of storage federation in this post is simply my perspective and my viewpoint. While you might not agree with it—and clearly you don’t—it doesn’t make my viewpoint any less valid. I neither disagreed with your own personal assessment of the meaning of storage federation nor slammed your definition; I merely stated that I personally felt my definition made the most sense to me. I’m not saying that everyone has to agree with me. I’m just saying that this is a definition that seems to fit and seems to make a fair amount of sense.

    Again, Marc, thanks for your comments.

  5. Marc Farley (3PARFarley)

    Scott, you’ve apparently never had comments from other EMC bloggers other than to cheer you on. I didn’t mean to be offensive, but I did intend to push your buttons. That’s life in the storage blogosphere as practiced by EMC bloggers. It’s part of the territory, and if you are uncomfortable with it, check with your comrades.

    The thing I need to apologize for is not IDing myself. Bad me. Thanks for covering this for me. I’ll get back to you on the other points you raised about different perspectives and all that but I have to run out for a bit.

  6. Marc Farley (3PARFarley)

    Continuing with my last comment…

    The biggest proponent of excluding in-band virtualization was Barry Burke, aka The Storage Anarchist (EMC blogger). The argument about in-line virtualization is that you don’t want to call an address aggregator that maps volume space across “downstream targets” federation. THAT is virtualization, not federation.

    Considering that Vplex’s main function appears to be distributed caching – as opposed to virtualizing downstream targets – I’d say that it probably does qualify for consideration as a storage federation appliance. As a marketing guy I really appreciate not wanting to sell this as distributed caching.

    But there is the question of whether or not the federation function is intrinsic or extrinsic to the arrays. There are a lot of people that would say storage federation needs to be intrinsic to the array itself and not provided by external products that require a separate layer of management, as Vplex does. The argument I made in a blog post on Vplex was that the minimal integration required to support 3rd party arrays demonstrates the lack of integration with arrays.

    The definition you attributed to me was arrived at through previous online discussions, although I did summarize it as: “the transparent, dynamic and non-disruptive distribution of storage resources across self-governing, discrete, peer storage systems”

    Yours and Joe’s differs considerably in significant ways: “The ability to create a consistent view of a volume, independent of its location.”

    Storage resources are more than LUNs; they include things like snapshots, policies, system metadata and all the things a team of storage systems might use to be self-governing, peer storage systems. So, no, I don’t agree that our definitions are very close.

    I would argue that if Vplex achieves storage federation it does it through distributed caching much more than through any storage virtualization functionality it provides. Distributed caching may provide a first level of storage federation much the same way that FAST1 provides the first level of tiering (full-volume, not sub-volume). There are still problems with Vplex being an external entity requiring additional management, though.

    I’m confused by your saying that you didn’t say distributed cache coherency was the core of storage federation. That’s the way it still reads to me after reading several times. I’m pretty sure that’s what Joe meant when he wrote: “…so something I forgot to mention in the previous post is the idea of Cache Coherence, which provides active-active data sharing. This is the ability to create a consistent view of a volume, independent of its location. This is the core behind Storage Federation.”

    Definitions in this business are very important. There was a lot of discussion about this a couple months ago and much cynicism over how watered down the term storage federation would become. I know how confounded things can become having spent a few years of my life writing books about storage and trying to make sense out of the multiple, overlapping vendor terms and finding generic terms to describe concepts. I don’t believe that any definition someone comes up with is as good as any other.

    So, you are now part of the discussion, whether or not you wanted to be – and that’s good. You are a smart guy and the fact that you work for EMC is not a problem. I agree and disagree with all of you from time to time and there are always customers to help keep us in line. These discussions can be direct and aren’t always going to be “polite first”, which is something that I’ve become accustomed to and you can too by growing a thicker skin.

  7. slowe

    Marc,

    Thanks for your response. I’m just going to respond to one thing real quick, as I just landed in Boston and need to run a few errands. I’ll respond in more detail to the rest of your points later.

    I’d like to comment on your characterization of me based on the behavior of other EMC bloggers. I’m not other EMC bloggers; I’m me. You might have gotten accustomed to them interacting with you in a certain way, but that’s not necessarily the way I will interact with you. And in a related note, perhaps all other storage bloggers—perhaps all other bloggers, regardless of industry—want to be direct and dispense with politeness, but that’s not how I operate.

    Thanks for continuing the discussion! I’ll respond to the rest of your points shortly.

  8. slowe

    Marc,

    You make some great points. I have just a few thoughts to add.

    With regard to the relationship between distributed cache coherency and storage federation, I can see your confusion. My emphasis was really only on Joe’s last statement, the one that I quoted (the statement about being able to present a consistent view of a volume independent of its location). My focus is not on distributed cache coherency as the heart of storage federation but rather on the effect of using distributed cache coherency—among other technologies—to provide a consistent view of a volume independent of its location. That effect is what I consider to be the heart of storage federation. I should have been clearer about that.

    Although you don’t think so—and that’s fine—I really do think that our definitions are far closer than you believe. Your definition (or, as you clarified, the definition derived from multiple online discussions) is a great technical definition. It is a solid academic definition. But it leaves people asking, “What does that mean, exactly?” The definition that I’m leaning toward is more of a real-world, in the trenches “common man’s” definition of storage federation. Yours describes the technical details behind the scenes, mine describes the net effect of those technical details. After all, what is the result of storage federation? What is the end result of discrete peer systems that dynamically and non-disruptively distribute storage resources? The end result is a consistent view of a volume regardless of where that volume actually resides.

    It’s quite likely that we won’t come to any sort of agreement here, and that’s OK. Different opinions are what drive the world and make it an interesting place! We can agree to disagree.

    Thanks!

  9. Marc Farley (3PARFarley)

    Thanks Scott, I understand that you are an individual and not part of a tag team of bloggers at EMC. Nonetheless, the aggregate effect of EMC bloggers on competitors is like the good cop, bad cop treatment. You might be a good cop, but you are also now part of that bigger team and that is a change to your blogging context.

    We will disagree. The result of distributing storage resources can be much more than creating a consistent view of a volume. There are many important storage management functions that are not involved with how systems access storage.

    Thanks for the dialogue here.

  10. Harm

    Great discussion, much learned :-). I hope a following post can describe the extra functionality, if there is any. For example, snapshotting? Or is it at the moment ‘just disks’?

  11. Arun

    Do VPLEX and storage federation work across vendor platforms?

  12. slowe

    Arun,

    VPLEX supports both EMC arrays and a limited number of non-EMC arrays. The number of non-EMC arrays will increase over time as additional products are tested and qualified.

    Harm,

    At the moment, it’s “just disks”. I would imagine that additional functionality is planned, but I don’t know that for certain.