
Simple Regenerating Codes:Network Coding for Cloud Storage

Kevin Tong

May 13, 2013

Transcript

  1. Simple Regenerating Codes: Network Coding for Cloud Storage Dimitris S.

    Papailiopoulos, Jianqiang Luo, Alexandros G. Dimakis, Cheng Huang, and Jin Li INFOCOM 2012 Presented by Tangkai
  2. About the author • Jianqiang Luo ◦ Experience ▪ Senior

    Software Engineer @ EMC ▪ Received PhD, Wayne State University ▪ Intern @ Microsoft, Data Domain ▪ Team Leader @ Actuate ▪ Received MS, SJTU ◦ Specialties ▪ Worked on distributed storage systems during PhD; performance profiling.
  3. About the author • Alexandros G. Dimakis ◦ Assistant Professor,

    Dept. of EE – Systems, USC ◦ Research interests: ▪ Communications, signal processing and networking ◦ INFOCOM 2012 – 2 ◦ Erasure codes: MDS, MSR, MBR, etc.
  4. About the author • Cheng Huang ◦ Experience ▪ Microsoft

    Research ◦ Education ▪ Ph.D., Washington University ▪ B.S. and M.S., EE Dept, SJTU ◦ Research interests ▪ Cloud services, internet measurements, erasure correction codes, distributed storage systems, peer-to-peer streaming, networking and multimedia communications ◦ INFOCOM 2011 ▪ Public DNS System and Global Traffic Management ▪ Estimating the Performance of Hypothetical Cloud Service Deployments: A Measurement-Based Approach
  5. About the author • Jin Li ◦ Experience ▪ Microsoft

    Research ▪ BS/MS/PhD, THU (within 7 years) ◦ Titles ▪ IEEE Fellow ▪ GLOBECOM/ICME/ACM MM Chair
  6. Introduction • Background ◦ We have entered the BIG DATA

    era! ▪ Digital Universe: 1.8 ZB (= 1.8e9 TB) ▪ Several PBs of photos stored on Facebook ▪ 14.1 PB of data stored on Taobao (2010) ◦ Data security is IMPORTANT ▪ Free from unwanted actions by unauthorized users ▪ Free from data loss caused by destructive forces
  7. Introduction • Background ◦ Recovery: rare exception -> regular

    operation ▪ GFS[1]: ▪ Hundreds or even thousands of machines ▪ Inexpensive commodity parts ▪ High concurrency/IO ◦ High failure tolerance, both for ▪ High availability and to prevent data loss [1] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” in SOSP ’03: Proc. of the 19th ACM Symposium on Operating Systems Principles, 2003.
  8. Introduction • Background ◦ Erasure coding > replication ▪ 1.

    Better reliability ▪ 2. Lower storage cost ◦ Some applications ▪ Cloud storage systems ▪ Archival storage ▪ Peer-to-peer storage systems
  9. Introduction • Erasure coding: MDS ◦ (3,2) MDS code

    (single parity): a file of k=2 blocks {A, B} is stored as n=3 blocks {A, B, A+B}; used in RAID 5 ◦ (4,2) MDS code: {A, B, A+B, A+2B} with n=4; tolerates any 2 failures; used in RAID 6
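The (3,2) single-parity code on this slide can be sketched in a few lines of Python (function names are mine; the (4,2) code with A+2B needs GF(2^m) arithmetic, so only the pure-XOR case is shown):

```python
# Toy (3,2) MDS single-parity code from the slide: a file split into
# two blocks A and B is stored as {A, B, A XOR B} (RAID 5 style).
# Any single lost block can be rebuilt from the two survivors.

def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))

def encode(a: bytes, b: bytes) -> list:
    """Return the three stored blocks [A, B, A+B] (+ is XOR over GF(2))."""
    return [a, b, xor_bytes(a, b)]

def repair(blocks: list, lost: int) -> bytes:
    """Rebuild the block at index `lost` by XORing the two survivors."""
    survivors = [blk for i, blk in enumerate(blocks) if i != lost]
    return xor_bytes(survivors[0], survivors[1])

a, b = b"hello wo", b"rld!!!!!"
stored = encode(a, b)
assert repair(stored, 0) == a          # lost A: B XOR (A+B) = A
assert repair(stored, 2) == stored[2]  # lost parity: A XOR B
```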
  10. Introduction • Erasure coding vs. replication[3] ◦ (4,2) MDS

    erasure code {A, B, A+B, A+2B}: any 2 blocks suffice to recover the file ◦ vs. 2x replication of {A, B} ◦ Erasure coding introduces redundancy in an optimal way; very useful in practice, e.g. Reed-Solomon codes, fountain codes (LT and Raptor)… [3] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. on Inform. Theory, vol. 56, pp. 4539–4551, Sep. 2010.
  11. Introduction • Metrics ◦ Storage per node (α) ◦ Repair

    bandwidth per single node repair (γ) ◦ Disk accesses per single node repair (d) ◦ Effective coding rate (R) • Contribution ◦ High R, small d ◦ Low repair computation complexity
  12. SRC • SRC: Simple Regenerating Codes ◦ Regenerating codes

    address the issue of rebuilding (also called repairing) lost encoded fragments from existing encoded fragments. This issue arises in distributed storage systems, where the communication needed to maintain encoded redundancy is a problem.
  13. SRC • Objective ▪ Requirement I: (n, k) property ▪

    MDS[2] [2] Alexandros G. Dimakis, Kannan Ramchandran, Yunnan Wu, Changho Suh, “A Survey on Network Codes for Distributed Storage,” Proceedings of the IEEE, 2011.
  14. SRC • Requirement II: efficient exact repair ◦ Efficient: low

    complexity ◦ Exact repair (vs. functional repair)[3]: ▪ 1. [demands] Data have to stay in systematic form ▪ 2. [complexity] Updating repair-and-decode rules -> additional overhead ▪ 3. [security] Dynamic repair-and-decode rules observed by eavesdroppers -> information leakage [4] Changho Suh, Kannan Ramchandran, “Exact Regeneration Codes for Distributed Storage Repair Using Interference Alignment,” IEEE Transactions on Information Theory, vol. 57, no. 3, March 2011.
  15. SRC • Solution ◦ MDS codes are used to provide

    reliability, meeting Requirement I ◦ Simple XORs applied over the MDS-coded packets provide efficient exact repair, meeting Requirement II
  16. (n,k,2)-SRC • Code construction ◦ File f, of size

    M = 2k ◦ Split into 2 parts ◦ 1. Two independent (n,k)-MDS encodings ◦ 2. Generate a parity sum vector using XOR
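The two construction steps above can be sketched as follows, using a toy (3,2) single-parity code as the inner MDS code (all names are mine; the paper additionally rotates packet placement across nodes so repair stays local, and that placement step is omitted here):

```python
# Sketch of (n,k,2)-SRC encoding with a toy inner MDS code (n=3, k=2):
# split the file into f=2 parts, MDS-encode each part independently,
# then form an extra parity vector as the XOR of the two coded vectors.

def xor_bytes(x, y):
    return bytes(p ^ q for p, q in zip(x, y))

def mds_3_2(part):          # toy (3,2) MDS: [a1, a2, a1 XOR a2]
    a1, a2 = part
    return [a1, a2, xor_bytes(a1, a2)]

def src_encode(part1, part2):
    x = mds_3_2(part1)                               # coded vector 1
    y = mds_3_2(part2)                               # coded vector 2
    s = [xor_bytes(xi, yi) for xi, yi in zip(x, y)]  # parity sum vector
    return x, y, s

x, y, s = src_encode([b"AA", b"BB"], [b"CC", b"DD"])
# The parity sum vector enables cheap XOR repair: a lost packet x_i
# can be rebuilt as y_i XOR s_i.
assert xor_bytes(y[0], s[0]) == x[0]
```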
  17. (n,k,f)-SRC • General code construction ◦ File f, of

    size M = fk ◦ Split into f parts ◦ 1. f independent (n,k)-MDS encodings ◦ 2. Generate a parity sum vector using XOR
  18. (n,k,f)-SRC • Theorem ◦ Effective coding rate (R) ▪ The SRC

    rate is a fraction f/(f+1) of the rate k/n of an (n, k) MDS code, hence R is upper bounded by f/(f+1)
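The rate claim above can be checked numerically (assumed formula, from "f/(f+1) of the MDS rate k/n": R = f·k / ((f+1)·n)):

```python
# Effective coding rate of an (n,k,f)-SRC, assuming
# R = f*k / ((f+1)*n), i.e. f/(f+1) of the MDS rate k/n.

def src_rate(n, k, f):
    return (f * k) / ((f + 1) * n)

# For (n,k,f) = (10, 8, 4): MDS rate is 0.8, SRC keeps 4/5 of it.
assert abs(src_rate(10, 8, 4) - 0.64) < 1e-12
# As f grows, R approaches the MDS rate k/n (see the asymptotics slide).
assert abs(src_rate(10, 8, 1000) - 0.8) < 1e-3
```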
  19. (n,k,f)-SRC • Theorem ◦ Storage per node (α) ◦ Repair

    bandwidth per single node repair (γ) ◦ Disk accesses per single node repair (d) ▪ Seek time
  20. (n,k,f)-SRC • Theorem ◦ Disk accesses per single node repair

    (d) ▪ Starting with f disk accesses for the first chunk repair
  21. (n,k,f)-SRC • Theorem ◦ Disk accesses per single node repair

    (d) ▪ Each additional chunk repair requires one additional disk access
  22. (n,k,f)-SRC • Asymptotics: SRC -> MDS ◦ Let

    the degree of parities f grow as a function of k ◦ Repair bandwidth per single node repair (γ) ◦ Effective coding rate (R)
  23. Simulations • Simulator introduction ◦ One master; the others are storage

    servers ◦ Chunks are the smallest accessible data units; in our system they are set to 64 MB • Simulator validation ◦ 16 machines ◦ 1 Gbps network ◦ 410 GB of data per machine ◦ Approximately 6400 chunks
  24. Simulations • Degraded read performance ◦ The only difference is that,

    after a chunk is repaired, we do not write it back
  25. Simulations • Data reliability analysis ◦ A simple Markov model to

    estimate the reliability ◦ 5 years / 1 PB of data ◦ 30 min for replication / 15 min for SRC
  26. Conclusions • Highlights ◦ vs. R-S ▪ Low IO/bandwidth -> scalability

    ◦ vs. replication ▪ High reliability ▪ Decent repair/degraded-read performance
  27. Critical thinking • Simulation ▪ (n, k): as n grows, erasure

    performance weakens • Comparisons ◦ MSR? ◦ Exact repair? ◦ Implementation -> simulation