RADON: Repairable Atomic Data Object in Networks

Authors: Kishori M. Konwar, N. Prakash, Nancy A. Lynch, and Muriel Médard



File

LIPIcs.OPODIS.2016.28.pdf
  • Filesize: 0.59 MB
  • 17 pages

Cite As

Kishori M. Konwar, N. Prakash, Nancy A. Lynch, and Muriel Médard. RADON: Repairable Atomic Data Object in Networks. In 20th International Conference on Principles of Distributed Systems (OPODIS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 70, pp. 28:1-28:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.OPODIS.2016.28

Abstract

Erasure codes offer an efficient way to decrease storage and communication costs when implementing an atomic memory service in asynchronous distributed storage systems. In this paper, we provide erasure-code-based algorithms that additionally perform background repair of crashed nodes. A repair operation of a node in the crashed state is triggered externally, and is carried out by the node itself via message exchanges with other active nodes in the system. Upon completion of repair, the node re-enters the active state and resumes participation in ongoing and future read, write, and repair operations. To guarantee liveness and atomicity simultaneously, existing works assume either the presence of nodes with stable storage, or the presence of nodes that never crash during the execution. We require neither; instead, we consider a natural yet practical network stability condition N1 that only restricts the number of nodes in the crashed/repair state during the broadcast of any message. We present an erasure-code-based algorithm RADON_{C} that is always live, and guarantees atomicity as long as condition N1 holds. In situations where the number of concurrent writes is limited, RADON_{C} has significantly lower storage and communication costs than a replication-based algorithm RADON_{R}, which also works under N1. We further show how a slightly stronger network stability condition N2 can be used to construct algorithms that never violate atomicity. The guarantee of atomicity comes at the expense of an additional phase during read and write operations.
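
To make the storage-cost intuition concrete, the Python sketch below is a toy single-parity erasure code. It is not the MDS code or the read/write/repair protocol of RADON_{C}; all function names here are illustrative. It shows the two ideas the abstract relies on: an object split into k fragments plus a parity block costs far less to store than full replicas, and the fragment held by a crashed node can be rebuilt from the surviving fragments during background repair.

  from functools import reduce

  def xor_bytes(a, b):
      # Byte-wise XOR of two equal-length byte strings.
      return bytes(x ^ y for x, y in zip(a, b))

  def encode(data, k):
      # Split `data` into k equal-size blocks and append one XOR parity block,
      # giving n = k + 1 fragments that tolerate the loss of any one fragment.
      assert len(data) % k == 0, "pad the object so that k divides its length"
      size = len(data) // k
      blocks = [data[i * size:(i + 1) * size] for i in range(k)]
      parity = reduce(xor_bytes, blocks)
      return blocks + [parity]

  def repair(fragments):
      # Rebuild the single missing fragment (marked None) from the survivors:
      # the XOR of all n fragments is zero, so the XOR of the k survivors
      # equals the lost fragment.
      missing = [i for i, f in enumerate(fragments) if f is None]
      assert len(missing) == 1, "this toy code tolerates exactly one crash"
      survivors = [f for f in fragments if f is not None]
      fragments[missing[0]] = reduce(xor_bytes, survivors)
      return fragments

  def decode(fragments):
      # Reassemble the object from the k data blocks (drop the parity block).
      return b"".join(fragments[:-1])

  if __name__ == "__main__":
      obj = b"atomic-register-contents"        # 24-byte object, k = 4
      fragments = encode(obj, k=4)             # 5 fragments of 6 bytes each
      fragments[2] = None                      # node 2 crashes, loses its fragment
      assert decode(repair(fragments)) == obj  # background repair restores it
      # Storage cost: 5 * 6 = 30 bytes here, versus 2 * 24 = 48 bytes for
      # two-way replication that tolerates the same single crash.

A production system would instead use a general [n, k] MDS code such as Reed-Solomon, so that any k of the n fragments suffice to decode; that flexibility is what lets RADON_{C} reduce storage and communication costs relative to RADON_{R} when the number of concurrent writes is limited.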
Keywords
  • Atomicity
  • repair
  • fault-tolerance
  • storage cost
  • erasure codes
