Fault-Tolerant Distributed Directories

Authors: Judith Beestermöller, Costas Busch, Roger Wattenhofer

File

LIPIcs.SAND.2024.5.pdf
  • Filesize: 1.04 MB
  • 20 pages

Author Details

Judith Beestermöller
  • ETH Zurich, Switzerland
Costas Busch
  • Augusta University, GA, USA
Roger Wattenhofer
  • ETH Zurich, Switzerland

Cite As

Judith Beestermöller, Costas Busch, and Roger Wattenhofer. Fault-Tolerant Distributed Directories. In 3rd Symposium on Algorithmic Foundations of Dynamic Networks (SAND 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 292, pp. 5:1-5:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.SAND.2024.5

Abstract

Many fundamental distributed computing problems require coordinated access to a shared resource. A distributed directory is an overlay data structure on an asynchronous graph G that provides access to a shared token t. The directory supports three basic operations: publish, to initialize the directory; lookup, to read the contents of the token; and move, to obtain exclusive update access to the token. Known directory schemes achieve message complexity within polylogarithmic factors of the optimal cost with respect to the number of nodes n and the diameter D of G. Motivated by fault-tolerant distributed computing implementations, we consider the impact of edge failures on distributed directories. We give a distributed directory overlay data structure that can tolerate edge failures without disrupting the directory operations. The directory can be repaired concurrently while it processes directory operations. We analyze the impact of the faults on the amortized cost of the three directory operations compared to the optimal cost. We show that f edge failures increase the amortized competitive ratio of the operations by at most a factor of f. We also analyze the message complexity of repairing the overlay structure, in terms of the number of messages that are sent and the maximum distance a message traverses. For an edge failure, the repair mechanism uses messages of size 𝒪(log n) that traverse distance at most D', the graph diameter after the fault. To our knowledge, this is the first asymptotic analysis of a fault-tolerant distributed directory.
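To make the three operations concrete, here is a minimal sequential sketch of a tree-based directory in the style of the arrow protocol of Demmer and Herlihy. It is not the fault-tolerant scheme of this paper. It assumes a fixed spanning tree of G given as an edge list, no edge failures, and no concurrent requests; the class ArrowDirectory and its method names are hypothetical, chosen only to mirror publish, lookup, and move. Each node keeps a link pointing one tree hop toward the token holder, and one message is counted per tree edge traversed.

```python
from collections import deque


class ArrowDirectory:
    """Sequential toy directory on a spanning tree (arrow-style links).

    Hypothetical illustration only: no concurrency, no edge failures.
    """

    def __init__(self, tree_edges):
        # Adjacency sets of the spanning tree.
        self.adj = {}
        for u, v in tree_edges:
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)
        self.link = {}     # link[x]: x's neighbor toward the holder; the holder links to itself
        self.messages = 0  # total tree edges traversed by lookup and move

    def publish(self, owner):
        # Initialize: BFS from the owner so every link points one hop closer to it.
        self.link = {owner: owner}
        queue = deque([owner])
        while queue:
            u = queue.popleft()
            for w in self.adj.get(u, ()):
                if w not in self.link:
                    self.link[w] = u
                    queue.append(w)

    def lookup(self, node):
        # Follow links to the holder without changing them; one message per hop.
        hops = 0
        while self.link[node] != node:
            node = self.link[node]
            hops += 1
        self.messages += hops
        return node, hops

    def move(self, requester):
        # Follow links to the holder, reversing each traversed link so it points
        # back toward the requester, who becomes the new holder.
        prev, node = requester, self.link[requester]
        self.link[requester] = requester
        if node == requester:  # requester already holds the token
            return 0
        hops = 0
        while True:
            nxt = self.link[node]
            self.link[node] = prev
            hops += 1
            if nxt == node:  # reached the old holder
                break
            prev, node = node, nxt
        self.messages += hops
        return hops


if __name__ == "__main__":
    d = ArrowDirectory([(0, 1), (1, 2), (2, 3)])  # path 0-1-2-3
    d.publish(0)
    print(d.lookup(3))  # (0, 3)
    print(d.move(3))    # 3
    print(d.lookup(0))  # (3, 3)
```

In this sketch a move from node u costs the tree distance from u to the current holder, so its cost relative to the distance in G depends on the stretch of the chosen spanning tree. A failed tree edge would leave dangling links, which is the kind of situation a fault-tolerant directory has to repair without stalling operations.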

ACM Subject Classification
  • Theory of computation → Distributed algorithms
  • Theory of computation → Shared memory algorithms
  • Software and its engineering → Software fault tolerance
Keywords
  • distributed directory
  • sparse partition
  • fault tolerance
  • message complexity
  • path dilation
