Convertible Codes: New Class of Codes for Efficient Conversion of Coded Data in Distributed Storage

Authors Francisco Maturana , K. V. Rashmi



PDF
Thumbnail PDF

File

LIPIcs.ITCS.2020.66.pdf
  • Filesize: 0.67 MB
  • 26 pages

Document Identifiers

Author Details

Francisco Maturana
  • Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA
K. V. Rashmi
  • Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA

Acknowledgements

We thank Venkatesan Guruswami and Michael Rudow for their helpful suggestions on improving the writing for the paper.

Cite AsGet BibTex

Francisco Maturana and K. V. Rashmi. Convertible Codes: New Class of Codes for Efficient Conversion of Coded Data in Distributed Storage. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 151, pp. 66:1-66:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ITCS.2020.66

Abstract

Erasure codes are typically used in large-scale distributed storage systems to provide durability of data in the face of failures. In this setting, a set of k blocks to be stored is encoded using an [n, k] code to generate n blocks that are then stored on different storage nodes. A recent work by Kadekodi et al. [Kadekodi et al., 2019] shows that the failure rate of storage devices vary significantly over time, and that changing the rate of the code (via a change in the parameters n and k) in response to such variations provides significant reduction in storage space requirement. However, the resource overhead of realizing such a change in the code rate on already encoded data in traditional codes is prohibitively high. Motivated by this application, in this work we first present a new framework to formalize the notion of code conversion - the process of converting data encoded with an [n^I, k^I] code into data encoded with an [n^F, k^F] code while maintaining desired decodability properties, such as the maximum-distance-separable (MDS) property. We then introduce convertible codes, a new class of code pairs that allow for code conversions in a resource-efficient manner. For an important parameter regime (which we call the merge regime) along with the widely used linearity and MDS decodability constraint, we prove tight bounds on the number of nodes accessed during code conversion. In particular, our achievability result is an explicit construction of MDS convertible codes that are optimal for all parameter values in the merge regime albeit with a high field size. We then present explicit low-field-size constructions of optimal MDS convertible codes for a broad range of parameters in the merge regime. Our results thus show that it is indeed possible to achieve code conversions with significantly lesser resources as compared to the default approach of re-encoding.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Coding theory
  • Theory of computation → Error-correcting codes
Keywords
  • Coding theory
  • Reed-Solomon codes
  • Erasure codes
  • Code conversion
  • Distributed storage

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Vitaly Abdrashitov, N Prakash, and Muriel Médard. The storage vs repair bandwidth trade-off for multiple failures in clustered storage networks. In 2017 IEEE Information Theory Workshop (ITW), pages 46-50. IEEE, 2017. Google Scholar
  2. Abhishek Agarwal, Alexander Barg, Sihuang Hu, Arya Mazumdar, and Itzhak Tamo. Combinatorial alphabet-dependent bounds for locally recoverable codes. IEEE Transactions on Information Theory, 64(5):3481-3492, 2018. Google Scholar
  3. Omar Alrabiah and Venkatesan Guruswami. An Exponential Lower Bound on the Sub-packetization of MSR Codes. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 979-985, New York, NY, USA, 2019. ACM. Google Scholar
  4. Apache Software Foundation. Apache hadoop: HDFS erasure coding. Accessed: 2019-07-23. URL: https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html.
  5. SB Balaji and P Vijay Kumar. A tight lower bound on the sub-packetization level of optimal-access MSR and MDS codes. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 2381-2385. IEEE, 2018. Google Scholar
  6. Alexander Barg, Kathryn Haymaker, Everett W Howe, Gretchen L Matthews, and Anthony Várilly-Alvarado. Locally recoverable codes from algebraic curves and surfaces. In Algebraic Geometry for Coding Theory and Cryptography, pages 95-127. Springer, 2017. Google Scholar
  7. M. Blaum, J. Brady, J. Bruck, and Jai Menon. EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures. IEEE Transactions on Computers, 44(2):192-202, February 1995. Google Scholar
  8. Mario Blaum, James Lee Hafner, and Steven Hetzler. Partial-MDS codes and their application to RAID type of architectures. IEEE Transactions on Information Theory, 59(7):4510-4519, 2013. Google Scholar
  9. Dhruba Borthakur, Rodrigo Schmidt, Ramkumar Vadali, Scott Chen, and Patrick Kling. HDFS RAID - Facebook. URL: http://www.slideshare.net/ydn/hdfs-raid-facebook.
  10. Viveck R Cadambe, Cheng Huang, Jin Li, and Sanjeev Mehrotra. Polynomial length MDS codes with optimal repair in distributed storage. In 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pages 1850-1854. IEEE, 2011. Google Scholar
  11. Viveck R Cadambe, Syed A Jafar, Hamed Maleki, Kannan Ramchandran, and Changho Suh. Asymptotic interference alignment for optimal repair of MDS codes in distributed storage. IEEE Transactions on Information Theory, 59(5):2974-2987, 2013. Google Scholar
  12. Viveck R Cadambe and Arya Mazumdar. Bounds on the size of locally recoverable codes. IEEE Transactions on Information Theory, 61(11):5787-5794, 2015. Google Scholar
  13. Ameera Chowdhury and Alexander Vardy. New Constructions of MDS Codes with Asymptotically Optimal Repair. In 2018 IEEE International Symposium on Information Theory, pages 1944-1948, 2018. Google Scholar
  14. Hoang Dau, Iwan M Duursma, Han Mao Kiah, and Olgica Milenkovic. Repairing Reed-Solomon codes with multiple erasures. IEEE Transactions on Information Theory, 64(10):6567-6582, 2018. Google Scholar
  15. Hoang Dau and Olgica Milenkovic. Optimal repair schemes for some families of full-length Reed-Solomon codes. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 346-350. IEEE, 2017. Google Scholar
  16. A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran. Network Coding for Distributed Storage Systems. IEEE Transactions on Information Theory, 56(9):4539-4551, September 2010. Google Scholar
  17. S Luna Frank-Fischer, Venkatesan Guruswami, and Mary Wootters. Locality via Partially Lifted Codes. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017. Google Scholar
  18. S. Ghemawat, H. Gobioff, and S.T. Leung. The Google file system. In ACM SIGOPS Operating Systems Review, volume 37-5, pages 29-43. ACM, 2003. Google Scholar
  19. H. Gluesing-Luerssen, J. Rosenthal, and R. Smarandache. Strongly-MDS convolutional codes. IEEE Transactions on Information Theory, 52(2):584-598, February 2006. Google Scholar
  20. Parikshit Gopalan, Cheng Huang, Bob Jenkins, and Sergey Yekhanin. Explicit maximally recoverable codes with locality. IEEE Transactions on Information Theory, 60(9):5245-5256, 2014. Google Scholar
  21. Parikshit Gopalan, Cheng Huang, Huseyin Simitci, and Sergey Yekhanin. On the locality of codeword symbols. IEEE Transactions on Information Theory, 58(11):6925-6934, 2012. Google Scholar
  22. Sreechakra Goparaju, Arman Fazeli, and Alexander Vardy. Minimum Storage Regenerating Codes for All Parameters. IEEE Transactions on Information Theory, 63(10):6318-6328, 2017. Google Scholar
  23. Sreechakra Goparaju, Itzhak Tamo, and Robert Calderbank. An improved sub-packetization bound for minimum storage regenerating codes. IEEE Transactions on Information Theory, 60(5):2770-2779, 2014. Google Scholar
  24. Sivakanth Gopi, Venkatesan Guruswami, and Sergey Yekhanin. Maximally Recoverable LRCs: A field size lower bound and constructions for few heavy parities. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2154-2170. SIAM, 2019. Google Scholar
  25. Venkatesan Guruswami and Ankit Singh Rawat. MDS code constructions with small sub-packetization and near-optimal repair bandwidth. In ACM-SIAM Symposium on Discrete Algorithms, 2017. Google Scholar
  26. Venkatesan Guruswami and Mary Wootters. Repairing Reed-Solomon codes. In ACM Symposium on Theory of Computing, pages 216-226, 2016. Google Scholar
  27. Venkatesan Guruswami, Chaoping Xing, and Chen Yuan. How Long Can Optimal Locally Repairable Codes Be? In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. Google Scholar
  28. James Lee Hafner. WEAVER codes: Highly fault tolerant erasure codes for storage systems. In Proceedings of the 4th Conference on USENIX Conference on File and Storage Technologies - Volume 4, FAST'05, pages 16-16, Berkeley, CA, USA, 2005. USENIX Association. Google Scholar
  29. Yuchong Hu, Xiaoyang Zhang, Patrick PC Lee, and Pan Zhou. Generalized Optimal Storage Scaling via Network Coding. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 956-960. IEEE, 2018. Google Scholar
  30. C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, and S. Yekhanin. Erasure Coding in Windows Azure Storage. In Proceedings of USENIX Annual Technical Conference (ATC), 2012. Google Scholar
  31. Cheng Huang and Lihao Xu. STAR: An efficient coding scheme for correcting triple storage node failures. IEEE Transactions on Computers, 57(7):889-901, 2008. Google Scholar
  32. Saurabh Kadekodi, K. V. Rashmi, and Gregory R. Ganger. Cluster storage systems gotta have HeART: improving storage efficiency by exploiting disk-reliability heterogeneity. USENIX FAST, 2019. Google Scholar
  33. Govinda M Kamath, N Prakash, V Lalitha, and P Vijay Kumar. Codes with local regeneration and erasure correction. IEEE Transactions on Information Theory, 60(8):4637-4660, 2014. Google Scholar
  34. F.J. MacWilliams and N.J.A. Sloane. The Theory of Error-Correcting Codes. North-holland Publishing Company, 2nd edition, 1978. Google Scholar
  35. Kaveh Mahdaviani, Ashish Khisti, and Soheil Mohajer. Bandwidth adaptive & error resilient MBR exact repair regenerating codes. IEEE Transactions on Information Theory, 65(5):2736-2759, 2018. Google Scholar
  36. Kaveh Mahdaviani, Soheil Mohajer, and Ashish Khisti. Product matrix MSR codes with bandwidth adaptive exact repair. IEEE Transactions on Information Theory, 64(4):3121-3135, 2018. Google Scholar
  37. Jay Mardia, Burak Bartan, and Mary Wootters. Repairing multiple failures for scalar MDS codes. IEEE Transactions on Information Theory, 65(5):2661-2672, 2018. Google Scholar
  38. A. Mazumdar. Capacity of Locally Recoverable Codes. In 2018 IEEE Information Theory Workshop, pages 1-5, November 2018. Google Scholar
  39. S. Mousavi, T. Zhou, and C. Tian. Delayed Parity Generation in MDS Storage Codes. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 1889-1893, June 2018. Google Scholar
  40. D Papailiopoulos, A Dimakis, and V Cadambe. Repair Optimal Erasure Codes through Hadamard Designs. IEEE Transactions on Information Theory, 59(5):3021-3037, May 2013. Google Scholar
  41. Dimitris S Papailiopoulos and Alexandros G Dimakis. Locally repairable codes. IEEE Transactions on Information Theory, 60(10):5843-5855, 2014. Google Scholar
  42. David A Patterson, Garth Gibson, and Randy H Katz. A case for redundant arrays of inexpensive disks (RAID), volume 17-3. ACM, 1988. Google Scholar
  43. J.S. Plank. T1: Erasure codes for storage applications. Proceedings of the 4th USENIX Conference on File and Storage Technologies, pages 1-74, January 2005. Google Scholar
  44. Brijesh Kumar Rai, Vommi Dhoorjati, Lokesh Saini, and Amit K Jha. On adaptive distributed storage systems. In 2015 IEEE international symposium on information theory (ISIT), pages 1482-1486. IEEE, 2015. Google Scholar
  45. K. V. Rashmi, Nihar B Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, and Kannan Ramchandran. A Hitchhiker’s guide to fast and efficient data reconstruction in erasure-coded data centers. In ACM SIGCOMM, 2014. Google Scholar
  46. K. V. Rashmi, Nihar B Shah, and P Vijay Kumar. Enabling node repair in any erasure code for distributed storage. In 2011 IEEE International Symposium on Information Theory Proceedings, pages 1235-1239. IEEE, 2011. Google Scholar
  47. K. V. Rashmi, Nihar B Shah, and P Vijay Kumar. Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction. IEEE Transactions on Information Theory, 57(8):5227-5239, 2011. Google Scholar
  48. K. V. Rashmi, Nihar B Shah, and Kannan Ramchandran. A piggybacking design framework for read-and download-efficient distributed storage codes. In 2013 IEEE International Symposium on Information Theory, 2013. Google Scholar
  49. K. V. Rashmi, Nihar B Shah, and Kannan Ramchandran. A piggybacking design framework for read-and download-efficient distributed storage codes. IEEE Transactions on Information Theory, 63(9):5802-5820, 2017. Google Scholar
  50. Ankit Singh Rawat, O Ozan Koyluoglu, and Sriram Vishwanath. Progress on high-rate MSR codes: Enabling arbitrary number of helper nodes. In 2016 Information Theory and Applications Workshop (ITA), pages 1-6. IEEE, 2016. Google Scholar
  51. Ankit Singh Rawat, Onur Ozan Koyluoglu, Natalia Silberstein, and Sriram Vishwanath. Optimal locally repairable and secure codes for distributed storage systems. IEEE Transactions on Information Theory, 60(1):212-236, 2013. Google Scholar
  52. Ron M Roth and Gadiel Seroussi. On Generator Matrices of MDS Codes. IEEE Transactions on Infortmation Theory, 31(6):826-830, November 1985. Google Scholar
  53. Birenjith Sasidharan, Gaurav Kumar Agarwal, and P Vijay Kumar. A high-rate MSR code with polynomial sub-packetization level. In 2015 IEEE International Symposium on Information Theory (ISIT), pages 2051-2055. IEEE, 2015. Google Scholar
  54. Birenjith Sasidharan, Myna Vajha, and P Vijay Kumar. An explicit, coupled-layer construction of a high-rate MSR code with low sub-packetization level, small field size and d<(n- 1). In 2017 IEEE International Symposium on Information Theory (ISIT), pages 2048-2052. IEEE, 2017. Google Scholar
  55. Nihar B Shah, K. V. Rashmi, P Vijay Kumar, and Kannan Ramchandran. Interference alignment in regenerating codes for distributed storage: Necessity and code constructions. IEEE Transactions on Information Theory, 58(4):2134-2158, 2011. Google Scholar
  56. Nihar B Shah, KV Rashmi, and P Vijay Kumar. A flexible class of regenerating codes for distributed storage. In 2010 IEEE International Symposium on Information Theory, pages 1943-1947. IEEE, 2010. Google Scholar
  57. Karthikeyan Shanmugam, Dimitris S Papailiopoulos, Alexandros G Dimakis, and Giuseppe Caire. A repair framework for scalar MDS codes. IEEE Journal on Selected Areas in Communications, 32(5):998-1007, 2014. Google Scholar
  58. Kenneth W Shum. Cooperative regenerating codes for distributed storage systems. In 2011 IEEE International Conference on Communications (ICC), pages 1-5. IEEE, 2011. Google Scholar
  59. Mridupawan Sonowal and Brijesh Kumar Rai. On adaptive distributed storage systems based on functional MSR code. In 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), pages 338-343. IEEE, 2017. Google Scholar
  60. C. Suh and Kannan Ramchandran. Exact-Repair MDS Code Construction Using Interference Alignment. IEEE Transactions on Information Theory, pages 1425-1442, March 2011. Google Scholar
  61. Itzhak Tamo and Alexander Barg. A family of optimal locally recoverable codes. IEEE Transactions on Information Theory, 60(8):4661-4676, 2014. Google Scholar
  62. Itzhak Tamo, Alexander Barg, and Alexey Frolov. Bounds on the parameters of locally recoverable codes. IEEE Transactions on Information Theory, 62(6):3070-3083, 2016. Google Scholar
  63. Itzhak Tamo, Dimitris S Papailiopoulos, and Alexandros G Dimakis. Optimal locally repairable codes and connections to matroid theory. IEEE Transactions on Information Theory, 62(12):6661-6671, 2016. Google Scholar
  64. Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck. Zigzag codes: MDS array codes with optimal rebuilding. IEEE Transactions on Information Theory, 59(3):1597-1616, 2013. Google Scholar
  65. Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck. Access versus bandwidth in codes for storage. IEEE Transactions on Information Theory, 60(4):2028-2037, 2014. Google Scholar
  66. Itzhak Tamo, Min Ye, and Alexander Barg. Optimal repair of Reed-Solomon codes: Achieving the cut-set bound. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 216-227. IEEE, 2017. Google Scholar
  67. Itzhak Tamo, Min Ye, and Alexander Barg. The Repair Problem for Reed-Solomon Codes: Optimal Repair of Single and Multiple Erasures With Almost Optimal Node Size. IEEE Transactions on Information Theory, 65(5):2673-2695, 2018. Google Scholar
  68. Zhiying Wang, Itzhak Tamo, and Jehoshua Bruck. On codes for optimal rebuilding access. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1374-1381. IEEE, 2011. Google Scholar
  69. Zhiying Wang, Itzhak Tamo, and Jehoshua Bruck. Long MDS codes for optimal repair bandwidth. In 2012 IEEE International Symposium on Information Theory Proceedings, pages 1182-1186. IEEE, 2012. Google Scholar
  70. Lihao Xu and Jehoshua Bruck. X-code: MDS array codes with optimal encoding. IEEE Transactions on Information Theory, 45(1):272-276, 1999. Google Scholar
  71. M. Ye and A. Barg. Explicit Constructions of Optimal-Access MDS Codes With Nearly Optimal Sub-Packetization. IEEE Transactions on Information Theory, 63(10):6307-6317, October 2017. Google Scholar
  72. Min Ye and Alexander Barg. Explicit constructions of MDS array codes and RS codes with optimal repair bandwidth. In 2016 IEEE International Symposium on Information Theory (ISIT), pages 1202-1206. IEEE, 2016. Google Scholar
  73. Min Ye and Alexander Barg. Explicit constructions of high-rate MDS array codes with optimal repair bandwidth. IEEE Transactions on Information Theory, 63(4):2001-2014, 2017. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail