Data protection techniques for object storage systems


Techniques such as replication and erasure coding protect data on object storage systems and other high-capacity primary storage systems when traditional backup is difficult.

Object storage systems are designed to cost-effectively store a lot of data for a very long period of time. However, that makes traditional backup difficult, if not impossible. To ensure data is protected from both disk failure and corruption, vendors use replication, erasure coding or a combination of the two.

Even if you are not considering object storage, understanding the differences between these data protection techniques is important since many primary storage arrays are beginning to use them. We explore the pros and cons of each approach so you can determine which method of data protection is best for your data center.

Scale-out basics

Most object storage systems, as well as converged systems, rely on scale-out storage architectures. These architectures are built around a cluster of servers that provide storage capacity and performance. Each node added to the cluster increases the overall cluster's performance and capacity.

These systems require redundancy across multiple storage nodes so that if one node fails, data can still be accessed. Typical RAID levels such as RAID 5 and RAID 6 are particularly ill-suited for this multi-node data distribution because of their slow rebuild times.

Replication pros and cons

Replication was the most prevalent form of data protection in early object storage systems and is becoming a common data protection technique in converged infrastructures, which are also node-based.

In this protection scheme, each unique object is copied a given number of times to a specified number of nodes, where the number of copies and how they're distributed (how many nodes receive a copy) is set manually or by policy. Many of these products also have the ability to control the location of the nodes that will receive the copies. They can be in different racks, different rows and, of course, different data centers.
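The rack-aware placement described above can be sketched as a small policy function. This is a minimal illustration, not any vendor's actual algorithm; the node and rack names are hypothetical. The idea is simply to spread copies across distinct failure domains (racks) before doubling up within one.

```python
import itertools

def place_replicas(nodes, num_copies):
    """Pick nodes for `num_copies` replicas, preferring distinct racks.

    `nodes` maps node name -> rack label (hypothetical names).
    Copies land in different racks first; only when racks are
    exhausted does a rack receive a second copy.
    """
    by_rack = {}
    for node, rack in nodes.items():
        by_rack.setdefault(rack, []).append(node)

    # Round-robin across racks so each failure domain is used once
    # before any rack holds two copies.
    placement = []
    for group in itertools.zip_longest(*by_rack.values()):
        for node in group:
            if node is not None and len(placement) < num_copies:
                placement.append(node)
    return placement

nodes = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackC"}
print(place_replicas(nodes, 3))  # ['n1', 'n3', 'n4'] -- one copy per rack
```

The same round-robin idea extends to rows and data centers by using a compound label (for example, "dc1/rackA") as the failure domain.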

The advantage of replication is that it is a relatively lightweight process, in that no complex calculations have to be made (compared with erasure coding). Also, it creates fully usable, standalone copies that are not dependent on any other data set for access. In converged or hyperconverged architectures, replication also allows for better virtual machine performance since all data can be served up locally.

The obvious downside to replication is that full, complete copies are made, and each redundant copy consumes that much more storage capacity. For smaller environments, this can be a minor detail. For environments with multiple petabytes of information, it can be a real problem. For example, a 5 PB environment could require 15 PB of total capacity, assuming a relatively common three-copy strategy.

What is erasure coding?

Erasure coding is a parity-based data protection scheme similar to RAID 5 and 6, but it operates at a finer level of granularity. In RAID 5 and 6, the lowest common denominator is the volume, whereas with erasure coding it is the object. This means that if a drive or node fails, only the objects on that drive or node need to be re-created, not the entire volume.
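To make the parity idea concrete, here is the simplest possible case: a single XOR parity chunk, the same principle RAID 5 uses, applied at object granularity. Production erasure codes (typically Reed-Solomon) tolerate multiple failures, but the recover-from-survivors-plus-parity mechanism is the same. This is a teaching sketch, not a production codec.

```python
def xor_parity(chunks):
    """Compute a parity chunk as the byte-wise XOR of equal-length data chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving, parity):
    """Recover one lost chunk: XOR of all survivors plus parity."""
    return xor_parity(surviving + [parity])

# Split one object into three chunks and compute parity.
data = [b"obj-part-1", b"obj-part-2", b"obj-part-3"]
parity = xor_parity(data)

# Simulate losing the node holding the second chunk, then rebuild it.
recovered = rebuild([data[0], data[2]], parity)
assert recovered == data[1]
```

Note that only the lost chunk is rebuilt; a real system would repeat this per affected object, which is why rebuilds touch far less data than a whole-volume RAID rebuild.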

Similar to replication, erasure coding can be set either manually or by policy to survive a certain number of node failures before there is data loss. Many systems extend erasure coding between data centers, so that the data can be automatically distributed between data centers and nodes within those data centers.

Since it is parity-based, erasure coding does not create multiple, redundant copies of data the way replication does. This means the cost of additional capacity "overhead" for erasure coding is measured in fractions of the primary data set instead of multiples. An erasure-coded layout designed to survive the same number of failures as a 3x replication scheme requires roughly 25% capacity overhead, whereas three full copies consume 300% of the primary data set's capacity.
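The overhead comparison follows directly from the shard counts. In a k+m erasure scheme (k data shards, m parity shards), any m shards can be lost, and the overhead is m/k. The 8+2 layout below is a hypothetical example chosen because, like three full copies, it survives any two losses:

```python
def erasure_overhead(data_shards, parity_shards):
    """Extra raw capacity as a fraction of the primary data set (k+m scheme)."""
    return parity_shards / data_shards

def replication_extra(copies):
    """Extra full copies kept beyond the primary."""
    return copies - 1

# Hypothetical 8+2 layout: tolerates any two lost shards,
# matching the failure tolerance of a three-copy policy.
print(f"{erasure_overhead(8, 2):.0%} overhead")   # 25% overhead
print(f"{replication_extra(3):.0%} overhead")     # 200% overhead (300% total)
```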

The downside to erasure coding is that it is not as lightweight as replication. It typically requires more CPU and RAM to calculate and manage parity. More importantly, every access requires data to be reconstituted, since erasure-coded data is split into fragments and distributed across nodes. This reconstitution generates considerable storage network traffic compared with replication, which can be designed to require little or no storage network traffic. The additional network traffic can be particularly troublesome in a WAN or cloud implementation, since the WAN adds latency to every access.

Blended model

In an attempt to deliver the best of both worlds, some vendors are creating blended models. The first form of this is one where replication is the method used within the data center, so that most accesses from storage have the benefit of LAN-like performance. Then, erasure coding is used for data distribution to the other data centers in the organization. While capacity consumption is still high, data integrity is equally high.

The other blended model is based solely on erasure coding, but the erasure coding is zoned by data center. In this model, erasure coding is used locally and across the WAN, but one copy of all the data remains in the data center that needs it most. Then data is erasure-coded remotely across the other data centers in the customer's ecosystem. While this method consumes more capacity than regular erasure coding, it is still more efficient than the other blended model.

What is the best data protection method?

As always, the answer is, "It depends." There is a lot to like about the simplicity of replication, and it works well for data centers with less than 25 TB of data. But as data grows, a replication-only protection strategy becomes untenable.

For larger data centers, replication becomes too expensive from a capacity consumption perspective. If they have high bandwidth or short interconnection distances, erasure coding provides excellent storage efficiency and ideal data distribution. Data centers with latency concerns should consider one of the blended models; most likely the second, which provides nearly the same storage efficiency while eliminating network latency on day-to-day data accesses.


Last modified on Friday, 15 January 2016 12:14
Data Recovery Expert

Viktor S., Ph.D. (Electrical/Computer Engineering), joined DataRecoup, the international data recovery corporation, in 2012, was promoted to Engineering Senior Manager, and became C.I.O. of DataRecoup in 2014. He is responsible for managing critical, high-priority RAID data recovery cases and for applying his comprehensive expertise in database data retrieval. He also plans and implements SEO/SEM and other internet-based marketing strategies, including the campaign for DataRecoup's proprietary application "Data Recovery for Windows," which he developed.
