Deduplication


PHD Virtual Technologies, maker of the esXpress backup solution, is today announcing the availability of the latest version of its backup product. esXpress version 3.6 offers support for VMware vSphere 4 and, according to PHD Virtual, also offers improvements for all versions of VMware ESX since version 3.0.2.

esXpress 3.6 features a new deduplication engine that, according to PHD Virtual, delivers substantial performance improvements. For example:

  • File-level restores are now up to four times faster
  • Image-level restores are up to twice as fast
  • Initial backups are up to twice as fast

esXpress performs source-side deduplication at the block level, which helps reduce backup storage requirements and keeps traffic to a minimum when backing up over a WAN.

It’s unclear to me whether esXpress leverages any new vSphere-specific features, like Changed Block Tracking, in the new release. PHD Virtual has committed to providing a review copy for me to run in the lab; once I’ve had the chance to “kick the tires,” so to speak, I’ll post more information here.

More details are available from PHD Virtual’s web site.


This session describes NetApp’s MultiStore functionality. MultiStore is the name given to NetApp’s functionality for secure logical partitioning of network and storage resources. The presenters for the session are Roger Weeks, TME with NetApp, and Scott Gelb with Insight Investments.

When using MultiStore, the basic building block is the vFiler. A vFiler is a logical construct within Data ONTAP that contains a lightweight instance of the Data ONTAP multi-protocol server. vFilers provide the ability to securely partition both storage resources and network resources. Storage resources are partitioned at either the FlexVol or Qtree level; it’s recommended to use FlexVols instead of Qtrees. (The presenters did not provide any further information beyond that recommendation. Do any readers have more information?) On the network side, the resources that can be logically partitioned are IP addresses, VLANs, VIFs, and IPspaces (logical routing tables).
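
To make that a bit more concrete, here’s a minimal sketch of creating a vFiler from vFiler0 (the vFiler name, IP address, and volume paths are hypothetical, and the exact syntax may vary slightly between Data ONTAP releases):

vfiler create vfiler1 -i 192.168.10.50 /vol/vfiler1_root /vol/vfiler1_data

The first path becomes the vFiler’s root volume (where its configuration lives), and any additional paths are storage resources assigned to it.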

Some reasons to use vFilers would include storage consolidation, seamless data migration, simple disaster recovery, or better workload management. MultiStore integrates with SnapMirror to provide some of the functionality needed for some of these use cases.

MultiStore uses vFiler0 to denote the physical hardware, and vFiler0 “owns” all the physical storage resources. You can create up to 64 vFiler instances, and active/active clustered configurations can support up to 130 vFiler instances (128 vFilers plus 2 vFiler0 instances) during a takeover scenario.

Each vFiler stores its configuration in a separate FlexVol (its own root volume, if you will). All the major protocols are supported within a vFiler context: NFS, CIFS, iSCSI, HTTP, and NDMP. Fibre Channel is not supported; you can only use Fibre Channel with vFiler0. This is due to the lack of NPIV support within Data ONTAP 7. (It’s theoretically possible, then, that if/when NetApp adds NPIV support to Data ONTAP, Fibre Channel would be supported within vFiler instances.)

Although it is possible to move resources between vFiler0 and a separate vFiler instance, doing so may impact client connections.

Managing vFilers appears to be the current weak spot; you can manage vFiler instances using the Data ONTAP CLI, but vFiler instances don’t have an interactive shell. Therefore, you have to direct commands to vFiler instances via SSH or RSH or using the vFiler context in vFiler0. You access the vFiler context by prepending the “vfiler” keyword to the commands at the CLI in vFiler0. Operations Manager 3.7 and Provisioning Manager can manage vFiler instances; FilerView can start, stop, or delete individual vFiler instances but cannot direct commands to an individual vFiler. If you need to manage CIFS on a vFiler instance, you can use the Computer Management MMC console to connect remotely to that vFiler instance to manage shares and share permissions, just as you can with vFiler0 (assuming CIFS is running within the vFiler, of course).
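
To make the vFiler context idea a little more concrete, here’s a hedged sketch (the vFiler name is hypothetical): from vFiler0 you can either run a single command against a vFiler or switch into its context for subsequent commands:

vfiler run vfiler1 cifs shares
vfiler context vfiler1

The first form executes one command (“cifs shares” in this case) within vfiler1 and returns; the second switches the CLI session into vfiler1’s context until you switch back.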

IPspaces are logical routing constructs that allow each vFiler to have its own routing table. For example, you may have a DMZ vFiler and an internal vFiler, each with its own separate routing table. Up to 101 IPspaces are supported per controller. You can’t delete the default IPspace, as it’s the routing table for vFiler0. Using VLANs and/or VIFs with IPspaces is recommended as a best practice.
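
A minimal sketch of the DMZ example above (the interface, IPspace, and vFiler names are all hypothetical): create the IPspace, assign an interface to it, and then create the vFiler within that IPspace:

ipspace create ips-dmz
ipspace assign ips-dmz e0b
vfiler create vfiler-dmz -s ips-dmz -i 172.16.1.10 /vol/dmz_root

Because vfiler-dmz lives in its own IPspace, it gets its own routing table, separate from vFiler0 and from any vFiler in the default IPspace.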

One of the real advantages of using MultiStore and vFilers is the data migration and disaster recovery functionality that it enables when used in conjunction with SnapMirror. There are two sides to this:

  • “vfiler migrate” allows you to move an entire vFiler instance, including all data and configuration, from one physical storage system to another physical storage system. You can keep the same IP address or change the IP address. All other network identification remains the same: NetBIOS name, host name, etc., so the vFiler should look exactly the same across the network after the migration as it did before the migration.
  • “vfiler dr” is similar to “vfiler migrate” but uses SnapMirror to keep the source and target vFiler instances in sync with each other.
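
For reference, both operations are run from the destination storage system; a rough sketch of the syntax (hostnames are hypothetical, and the options vary somewhat by Data ONTAP release) looks something like this:

vfiler migrate vfiler1@source-filer
vfiler dr configure vfiler1@source-filer
vfiler dr activate vfiler1@source-filer

In the “vfiler dr” case, the configure step sets up the SnapMirror-based replication of the vFiler’s volumes, and the activate step brings the destination copy online if the source fails.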

As you might expect, you can’t use “vfiler dr” or “vfiler migrate” on vFiler0 (the physical storage system). My own personal thought regarding “vfiler dr”: what would this look like in a VMware environment using NFS? There could be some interesting possibilities there.

With regard to security, a Matasano security audit was performed and the results showed that there were no vulnerabilities that would allow “data leakage” between vFiler instances. This means that it’s OK to run a DMZ vFiler and an internal vFiler on the same physical system; the separation is strong enough.

Other points of interest:

  • Each vFiler instance consumes roughly 400KB of system memory, so keep that in mind when creating additional vFiler instances.
  • A MultiStore-enabled system can’t handle any more load than a non-MultiStore-enabled system; the ability to create logical vFilers doesn’t mean the physical storage system can suddenly handle more IOPS or more capacity.
  • You can use FlexShare on a MultiStore-enabled system to adjust priorities for the FlexVols assigned to various vFiler instances (see the sketch after this list).
  • As of Data ONTAP 7.2, SnapMirror relationships created in a vFiler context are preserved during a “vfiler migrate” or “vfiler dr” operation.
  • More enhancements are planned for Data ONTAP 7.3, including deduplication support, SnapDrive 5.0 or higher support for iSCSI with vFiler instances, SnapVault additions, and SnapLock support.
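
As for the FlexShare point above, here’s a hedged illustration (the volume names are hypothetical, and the priority levels and defaults may differ across releases) of favoring one vFiler’s FlexVol over another’s from vFiler0:

priority on
priority set volume vfiler1_data level=VeryHigh
priority set volume vfiler2_data level=Low
priority show volume

FlexShare operates on FlexVols, so prioritizing a vFiler really means prioritizing the volumes assigned to it.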

Some of the potential use cases for MultiStore include file services consolidation (preserving existing file server identities on separate vFiler instances), data migration, and disaster recovery. You might also use MultiStore if you needed support for multiple Active Directory domains with CIFS.

UPDATE: Apparently, my recollection of the presenters’ information was incorrect, and FTP is not a protocol supported with vFilers. I’ve updated the article accordingly.


This session provided information on enhancements to NetApp’s cloning functionality. These enhancements are due to be released along with Data ONTAP 7.3.1, which is expected out in December. Of course, that date may shift, but that’s the expected release timeframe.

The key focus of the session was new functionality that allows Data ONTAP to clone individual files without a backing Snapshot. This new functionality is an extension of NetApp’s deduplication functionality and is made possible by changes within Data ONTAP that enable block sharing, i.e., the ability for a single block to appear in multiple files or in multiple places in the same file. The number of times these blocks appear is tracked using a reference count. The actual reference count is always one less than the number of times the block appears: a block that is not shared has no reference count, while a block shared in two locations has a reference count of 1. The maximum reference count is 255, which means a single block can be shared up to 256 times within a single FlexVol. Unfortunately, there’s currently no way to view the reference count, as it’s stored in the WAFL metadata.

As with other cloning technologies, the only space required is for incremental changes from the base. (There is a small overhead for metadata as well.) This functionality is going to be incorporated into the FlexClone license and will likely be referred to as “file-level FlexClone”; I suppose that cloning volumes will be referred to as “volume FlexClone” or similar.

This functionality will be command-line driven, but only from advanced mode (must do a “priv set adv” in order to access the commands). The commands are described below.

To clone a file or a LUN (the command is the same in both cases):

clone start <src_path> <dst_path> -n -l

To check the status of a cloning process or stop a cloning process, respectively:

clone status
clone stop
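
Here’s a hedged example of what the workflow might look like in practice (the paths and volume name are hypothetical):

priv set advanced
clone start /vol/nfs_vol/vm1-flat.vmdk /vol/nfs_vol/vm1-clone-flat.vmdk
clone status nfs_vol

Use “clone status” to watch progress, and “clone stop” to cancel a running clone operation if necessary.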

Existing commands for Snapshot-backed clones (“lun clone” or “vol clone”) will remain unchanged.

File-level cloning will integrate with Volume SnapMirror (VSM) without any problems; the destination will be an exact copy of the source, including clones. Not so for Qtree SnapMirror (QSM) and SnapVault, which will re-inflate the clones to full size. Users will need to run deduplication on the destination to try to regain the space. Dump/restores will work like QSM or SnapVault.

Now for the limitations, caveats, and gotchas:

  • Users can’t run single-file SnapRestore and a clone process at the same time.
  • Users can’t clone a file or a LUN that exists only in a Snapshot. The file or LUN must exist in the active file system.
  • ACLs and streams are not cloned.
  • The “clone” command does not work in a vFiler context.
  • Users can’t use synchronous SnapMirror with a volume that contains cloned files or LUNs.
  • Volume SnapRestore cannot run while cloning is in progress.
  • SnapDrive does not currently support this method of cloning. It’s anticipated that SnapManager for Virtual Infrastructure (SMVI) will be the first to leverage this functionality.
  • File-level FlexClone will be available for NFS only at first. Although it’s possible to clone data regions within a LUN, support is needed at the host level that isn’t present today.
  • Because blocks can only be shared 256 times (within a file or across files), it’s possible that some blocks in a clone will be full copies. This is especially true if there are lots of clones. Unfortunately, there is no easy way to monitor or check this. “df -s” can show space savings due to cloning, but that isn’t very granular.
  • There can be a maximum of 16 outstanding clone operations per FlexVol.
  • There is a maximum of 16TB of shared data among all clones. Trying to clone more than that results in full copies.
  • The maximum volume size for being able to use cloning is the same as for deduplication.

Obviously, VMware environments—VDI in particular—are a key use case for this sort of technology. (By the way, in case no one has yet connected the dots, this is the technology that I discussed here). To leverage this functionality, NetApp will update a tool known as the Rapid Cloning Utility (RCU; described in more detail in TR-3705) to take full advantage of file-level FlexCloning after Data ONTAP 7.3.1 is released. Note that the RCU is available today, but it only uses volume-level FlexClone.


FalconStor today announced a new file-interface deduplication system and enhancements to its Virtual Tape Library (VTL) software. FalconStor’s VTL product is resold by a number of other vendors, including EMC, Pillar, and Sun.

The new functionality being announced by FalconStor today is actually two-fold:

  1. First, FalconStor will now offer NFS and CIFS access to the deduplication repository through its file interface deduplication system (FIDS), enabling NAS-based disk-to-disk (D2D) backup solutions to integrate seamlessly with the FalconStor data repository. D2D backup solutions that are designed to write to NFS or CIFS can now dump backup data to a FalconStor appliance and take advantage of the deduplication functionality.
  2. Second, version 5.1 of the FalconStor VTL software adds support for 8Gb Fibre Channel and 10Gb Ethernet, enabling faster backups via these high-speed technologies. For backup solutions that utilize a “traditional” VTL, FalconStor can now offer higher throughput in those environments.

Overall, FalconStor’s offering retains or enhances features like many-to-one data replication and global deduplication across multiple appliances, and it remains available as a virtual appliance.

More information should be available on FalconStor’s web site, although at the time of publication of this article their site had not yet been updated.


Storage Short Take #4

Last week I provided a list of virtualization-related items that had made their way into my Inbox in some form or another; today I’ll share storage-related items with you in Storage Short Take #4! This post will also be cross-published to the Storage Monkeys Blogs.

  • Stephen Foskett has a nice round-up of some of the storage-related changes available to users in VMware ESX 3.5 Update 3. Of particular note to many users is the VMDK Recovery Tool. Oh, and be sure to have a look at Stephen’s list of top 10 innovative enterprise storage hardware products. He invited me to participate in creating the list, but I just didn’t feel like I would have been able to contribute anything genuinely useful. Storage is an area I enjoy, but I don’t think I’ve risen to the ranks of “storage guru” just yet.
  • And in the area of top 10 storage lists, Marc Farley shares his list of top 10 network storage innovations as well. I’ll have to be honest—I recognize more of these products than I did ones on Stephen’s list.
  • Robin Harris of StorageMojo provides some great insight into the details behind EMC’s Atmos cloud storage product. I won’t even begin to try to summarize some of that information here as it’s way past my level, but it’s fascinating reading. What’s also interesting to me is that EMC chose to require users to use an API to really interact with Atmos (more detailed reasons why are provided here by Chad Sakac), while child company VMware is seeking to prevent users from having to modify their applications to take advantage of “the cloud.” I don’t necessarily see a conflict between these two approaches, as they are seeking to address two different issues. Actually, I see similarities between EMC’s Atmos approach and Microsoft’s Azure approach, both of which require retooling applications to take advantage of the new technology.
  • Speaking of Chad, here’s a recent post on how to add storage to the Celerra Virtual Appliance.
  • Andy Leonard took up a concern about NetApp deduplication and volume size limits a while back. The basic gist of the concern is that, in its current incarnation, NetApp deduplication limits the size of the volume that can be deduplicated. If the size of the volume ever exceeds that limit, it can’t be deduplicated—even if the volume is subsequently resized back within the limit. With that in mind, users must actively track deduplication space savings so that, in the event they need to undo the deduplication, they don’t inadvertently lose the ability to deduplicate because they exceeded the size limit. Although Larry Freeman, aka “Dr Dedupe,” responded in the comments to Andy’s post, I don’t think he actually addressed the problem Andy was trying to state. Although the logical data size can grow to 16TB within a deduplicated volume, you’ll still need to watch deduplication space savings if you think you might need to undo the deduplication for whatever reason. Otherwise, you could exceed the volume size limitations and lose the ability to deduplicate that volume.
  • And while we are on the subject of NetApp, a blog post by Beth Pariseau from earlier in the year recently caught my attention; it concerned NetApp Snapshots in LUN environments. I’ve discussed a little bit of this before in my post about managing space requirements with LUNs. The basic question: how much additional space is recommended—or required—when using Snapshots and LUNs? Before the advent of Snapshot auto-delete and volume autogrow, the mantra from NetApp was “2x + Delta”: two times the size of the LUN plus changes. With the addition of these features, deduplication, and additional thin provisioning functionality, NetApp has now shifted its guidance to “1x + Delta”: the size of the LUN plus space needed for changes. It’s not surprising to me that there is confusion in this area, as NetApp itself worked so hard to preach “2x + Delta” and now has to go back and change the message. Bottom line: you’re going to need additional space for storing Snapshots of your LUNs, and the real amount is determined by your change rate, how many Snapshots you keep, and for how long you keep them. 20% might be enough, or you might need 120%. It all depends on your applications and your business needs. (See the rough sizing illustration after this list.)
  • If you’re into Solaris ZFS, be sure to have a look at this NFS performance white paper by Sun. It provides some good details on recent changes to how NFS exports are implemented in conjunction with ZFS.
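
To put some very rough numbers behind the “1x + Delta” discussion above (these figures are purely illustrative, not a NetApp sizing formula): consider a 100GB LUN with roughly 5% daily change and 7 daily Snapshots retained. The LUN itself needs 100GB, and the retained Snapshot deltas need roughly 7 x 5GB = 35GB, for a total of about 135GB of volume space (about 1.35x). Under the old “2x + Delta” guidance, the same LUN would have been sized at roughly 235GB. Your actual change rate and retention schedule will move these numbers significantly in either direction.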

That’s it for this time around, but feel free to share any interesting links and your thoughts on them in the comments!
