Dynamic Atomic Snapshots

Authors: Alexander Spiegelman, Idit Keidar



Cite As

Alexander Spiegelman and Idit Keidar. Dynamic Atomic Snapshots. In 20th International Conference on Principles of Distributed Systems (OPODIS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 70, pp. 33:1-33:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.OPODIS.2016.33

Abstract

Snapshots are useful tools for monitoring big distributed and parallel systems. In this paper, we adapt the well-known atomic snapshot abstraction to dynamic models with an unbounded number of participating processes. Our dynamic snapshot specification extends the API to allow changing the set of processes whose values should be returned from a scan operation. We introduce the ephemeral memory model, which consists of a dynamically changing set of nodes; when a node is removed, its memory can be immediately reclaimed. In this model, we present an algorithm for wait-free dynamic atomic snapshots.
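
The extended API described above can be illustrated with a small sequential reference model. This is only a sketch under stated assumptions, not the paper's algorithm: the class name `DynamicSnapshotSpec` and the method names `add`, `remove`, `update`, and `scan` are hypothetical, and a single coarse lock stands in for atomicity, so the model is neither wait-free nor an implementation of the ephemeral memory model. It only shows the intended behavior: a scan returns values for exactly the processes currently in the set.

```python
import threading
from typing import Dict, Hashable, Optional


class DynamicSnapshotSpec:
    """Sequential reference model of a dynamic snapshot object.

    All names here are hypothetical. A coarse lock makes every operation
    atomic, which is enough to illustrate the specification but is NOT
    wait-free and is NOT the algorithm from the paper.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Maps each participating process id to its latest value.
        self._values: Dict[Hashable, Optional[object]] = {}

    def add(self, pid: Hashable) -> None:
        """Start including pid in scans (its value is None until updated)."""
        with self._lock:
            self._values.setdefault(pid, None)

    def remove(self, pid: Hashable) -> None:
        """Stop including pid in scans; its stored value is dropped at once."""
        with self._lock:
            self._values.pop(pid, None)

    def update(self, pid: Hashable, value: object) -> None:
        """Record a new value for pid; ignored if pid is not a participant."""
        with self._lock:
            if pid in self._values:
                self._values[pid] = value

    def scan(self) -> Dict[Hashable, Optional[object]]:
        """Return a consistent copy of the values of all current participants."""
        with self._lock:
            return dict(self._values)


snap = DynamicSnapshotSpec()
snap.add("p1")
snap.add("p2")
snap.update("p1", 5)
first = snap.scan()      # both p1 and p2 are participants here
snap.remove("p2")
second = snap.scan()     # p2 no longer appears
```

In this model, dropping the entry in `remove` loosely mirrors the idea that a removed node's memory can be reclaimed immediately; how the paper achieves this without locks is not shown here.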

Keywords

  • snapshots
  • shared memory
  • dynamic
  • ephemeral memory

