Scalable Termination Detection for Distributed Actor Systems

Authors Dan Plyukhin, Gul Agha



PDF
Thumbnail PDF

File

LIPIcs.CONCUR.2020.11.pdf
  • Filesize: 0.58 MB
  • 23 pages

Document Identifiers

Author Details

Dan Plyukhin
  • University of Illinois at Urbana-Champaign, Urbana, IL, USA
Gul Agha
  • University of Illinois at Urbana-Champaign, Urbana, IL, USA

Acknowledgements

We would like to thank Dipayan Mukherjee, Atul Sandur, Charles Kuch, Jerry Wu, and the anonymous referees for providing valuable feedback in earlier versions of this work.

Cite As Get BibTex

Dan Plyukhin and Gul Agha. Scalable Termination Detection for Distributed Actor Systems. In 31st International Conference on Concurrency Theory (CONCUR 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 171, pp. 11:1-11:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.CONCUR.2020.11

Abstract

Automatic garbage collection (GC) prevents certain kinds of bugs and reduces programming overhead. GC techniques for sequential programs are based on reachability analysis. However, testing reachability from a root set is inadequate for determining whether an actor is garbage because an unreachable actor may send a message to a reachable actor. Instead, it is sufficient to check termination (sometimes also called quiescence): an actor is terminated if it is not currently processing a message and cannot receive a message in the future. Moreover, many actor frameworks provide all actors with access to file I/O or external storage; without inspecting an actor’s internal code, it is necessary to check that the actor has terminated to ensure that it may be garbage collected in these frameworks. Previous algorithms to detect actor garbage require coordination mechanisms such as causal message delivery or nonlocal monitoring of actors for mutation. Such coordination mechanisms adversely affect concurrency and are therefore expensive in distributed systems. We present a low-overhead reference listing technique (called DRL) for termination detection in actor systems. DRL is based on asynchronous local snapshots and message-passing between actors. This enables a decentralized implementation and transient network partition tolerance. The paper provides a formal description of DRL, shows that all actors identified as garbage have indeed terminated (safety), and that all terminated actors - under certain reasonable assumptions - will eventually be identified (liveness).

Subject Classification

ACM Subject Classification
  • Computing methodologies → Concurrent algorithms
  • Software and its engineering → Garbage collection
Keywords
  • actors
  • concurrency
  • termination detection
  • quiescence detection
  • garbage collection
  • distributed systems

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Gul Agha. ACTORS - a Model of Concurrent Computation in Distributed Systems. MIT Press Series in Artificial Intelligence. MIT Press, 1990. Google Scholar
  2. Gul Agha. Concurrent object-oriented programming. Communications of the ACM, 33(9):125-141, September 1990. URL: https://doi.org/10.1145/83880.84528.
  3. Gul A. Agha, Ian A. Mason, Scott F. Smith, and Carolyn L. Talcott. A foundation for actor computation. Journal of Functional Programming, 7(1):1-72, January 1997. URL: https://doi.org/10.1017/S095679689700261X.
  4. Akka. URL: https://akka.io/.
  5. Joe Armstrong, Robert Virding, Claes Wikström, and Mike Williams. Concurrent Programming in ERLANG. Prentice Hall, Englewood Cliffs, New Jersey, second edition, 1996. Google Scholar
  6. Di Bevan. Distributed garbage collection using reference counting. In G. Goos, J. Hartmanis, D. Barstow, W. Brauer, P. Brinch Hansen, D. Gries, D. Luckham, C. Moler, A. Pnueli, G. Seegmüller, J. Stoer, N. Wirth, J. W. Bakker, A. J. Nijman, and P. C. Treleaven, editors, PARLE Parallel Architectures and Languages Europe, volume 259, pages 176-187. Springer Berlin Heidelberg, Berlin, Heidelberg, 1987. URL: https://doi.org/10.1007/3-540-17945-3_10.
  7. Sebastian Blessing, Sylvan Clebsch, and Sophia Drossopoulou. Tree topologies for causal message delivery. In Proceedings of the 7th ACM SIGPLAN International Workshop on Programming Based on Actors, Agents, and Decentralized Control - AGERE 2017, pages 1-10, Vancouver, BC, Canada, 2017. ACM Press. URL: https://doi.org/10.1145/3141834.3141835.
  8. Sergey Bykov, Alan Geller, Gabriel Kliot, James R. Larus, Ravi Pandya, and Jorgen Thelin. Orleans: Cloud computing for everyone. In Proceedings of the 2nd ACM Symposium on Cloud Computing - SOCC '11, pages 1-14, Cascais, Portugal, 2011. ACM Press. URL: https://doi.org/10.1145/2038916.2038932.
  9. K. Mani Chandy and Leslie Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1):63-75, February 1985. URL: https://doi.org/10.1145/214451.214456.
  10. Sylvan Clebsch and Sophia Drossopoulou. Fully concurrent garbage collection of actors on many-core machines. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications - OOPSLA '13, pages 553-570, Indianapolis, Indiana, USA, 2013. ACM Press. URL: https://doi.org/10.1145/2509136.2509557.
  11. Colin J Fidge. Timestamps in message-passing systems that preserve the partial ordering. Australian Computer Science Communications, 10(1):56-66, February 1988. Google Scholar
  12. Carl Hewitt and Henry G. Baker. Laws for communicating parallel processes. In Bruce Gilchrist, editor, Information Processing, Proceedings of the 7th IFIP Congress 1977, Toronto, Canada, August 8-12, 1977, pages 987-992. North-Holland, 1977. Google Scholar
  13. D. Kafura, M. Mukherji, and D.M. Washabaugh. Concurrent and distributed garbage collection of active objects. IEEE Transactions on Parallel and Distributed Systems, 6(4):337-350, April 1995. URL: https://doi.org/10.1109/71.372788.
  14. Ten-Hwang Lai. Termination detection for dynamically distributed systems with non-first-in-first-out communication. Journal of Parallel and Distributed Computing, 3(4):577-599, December 1986. URL: https://doi.org/10.1016/0743-7315(86)90015-8.
  15. Henry Lieberman and Carl Hewitt. A real-time garbage collector based on the lifetimes of objects. Commun. ACM, 26(6):419-429, 1983. URL: https://doi.org/10.1145/358141.358147.
  16. Jeff Matocha and Tracy Camp. A taxonomy of distributed termination detection algorithms. Journal of Systems and Software, 43(3):207-221, November 1998. URL: https://doi.org/10.1016/S0164-1212(98)10034-1.
  17. Friedemann Mattern. Algorithms for distributed termination detection. Distributed Computing, 2(3):161-175, September 1987. URL: https://doi.org/10.1007/BF01782776.
  18. NHS to Deploy Riak for New IT Backbone With Quality of Care Improvements in Sight, October 2013. URL: https://riak.com/nhs-to-deploy-riak-for-new-it-backbone-with-quality-of-care-improvements-in-sight.html.
  19. PayPal Blows Past 1 Billion Transactions Per Day Using Just 8 VMs With Akka, Scala, Kafka and Akka Streams. URL: https://www.lightbend.com/case-studies/paypal-blows-past-1-billion-transactions-per-day-using-just-8-vms-and-akka-scala-kafka-and-akka-streams.
  20. José M. Piquer. Indirect Reference Counting: A Distributed Garbage Collection Algorithm. In Emile H. L. Aarts, Jan van Leeuwen, and Martin Rem, editors, Parle '91 Parallel Architectures and Languages Europe, volume 505, pages 150-165. Springer Berlin Heidelberg, Berlin, Heidelberg, 1991. URL: https://doi.org/10.1007/978-3-662-25209-3_11.
  21. David Plainfossé and Marc Shapiro. A survey of distributed garbage collection techniques. In Memory Management, International Workshop IWMM 95, Kinross, UK, September 27-29, 1995, Proceedings, pages 211-249, 1995. URL: https://doi.org/10.1007/3-540-60368-9_26.
  22. Dan Plyukhin and Gul Agha. Concurrent garbage collection in the actor model. In Proceedings of the 8th ACM SIGPLAN International Workshop on Programming Based on Actors, Agents, and Decentralized Control - AGERE 2018, pages 44-53, Boston, MA, USA, 2018. ACM Press. URL: https://doi.org/10.1145/3281366.3281368.
  23. M. Schelvis. Incremental distribution of timestamp packets: A new approach to distributed garbage collection. In Conference Proceedings on Object-Oriented Programming Systems, Languages and Applications - OOPSLA '89, pages 37-48, New Orleans, Louisiana, United States, 1989. ACM Press. URL: https://doi.org/10.1145/74877.74883.
  24. Abhay Vardhan and Gul Agha. Using passive object garbage collection algorithms for garbage collection of active objects. ACM SIGPLAN Notices, 38(2 supplement):106, February 2003. URL: https://doi.org/10.1145/773039.512443.
  25. Nalini Venkatasubramanian, Gul Agha, and Carolyn Talcott. Scalable distributed garbage collection for systems of active objects. In Yves Bekkers and Jacques Cohen, editors, Memory Management, volume 637, pages 134-147. Springer-Verlag, Berlin/Heidelberg, 1992. URL: https://doi.org/10.1007/BFb0017187.
  26. Nalini Venkatasubramanian and Carolyn Talcott. Reasoning about meta level activities in open distributed systems. In Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing - PODC '95, pages 144-152, Ottowa, Ontario, Canada, 1995. ACM Press. URL: https://doi.org/10.1145/224964.224981.
  27. Stanislav Vishnevskiy. How Discord Scaled Elixir to 5,000,000 Concurrent Users, July 2017. URL: https://blog.discord.com/scaling-elixir-f9b8e1e7c29b.
  28. Wei-Jen Wang. Conservative snapshot-based actor garbage collection for distributed mobile actor systems. Telecommunication Systems, June 2011. URL: https://doi.org/10.1007/s11235-011-9509-1.
  29. Wei-Jen Wang, Carlos Varela, Fu-Hau Hsu, and Cheng-Hsien Tang. Actor Garbage Collection Using Vertex-Preserving Actor-to-Object Graph Transformations. In David Hutchison, Takeo Kanade, Josef Kittler, Jon M. Kleinberg, Friedemann Mattern, John C. Mitchell, Moni Naor, Oscar Nierstrasz, C. Pandu Rangan, Bernhard Steffen, Madhu Sudan, Demetri Terzopoulos, Doug Tygar, Moshe Y. Vardi, Gerhard Weikum, Paolo Bellavista, Ruay-Shiung Chang, Han-Chieh Chao, Shin-Feng Lin, and Peter M. A. Sloot, editors, Advances in Grid and Pervasive Computing, volume 6104, pages 244-255. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010. URL: https://doi.org/10.1007/978-3-642-13067-0_28.
  30. Wei-Jen Wang and Carlos A. Varela. Distributed Garbage Collection for Mobile Actor Systems: The Pseudo Root Approach. In Yeh-Ching Chung and José E. Moreira, editors, Advances in Grid and Pervasive Computing, volume 3947, pages 360-372. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006. URL: https://doi.org/10.1007/11745693_36.
  31. Paul Watson and Ian Watson. An efficient garbage collection scheme for parallel computer architectures. In G. Goos, J. Hartmanis, D. Barstow, W. Brauer, P. Brinch Hansen, D. Gries, D. Luckham, C. Moler, A. Pnueli, G. Seegmüller, J. Stoer, N. Wirth, J. W. Bakker, A. J. Nijman, and P. C. Treleaven, editors, PARLE Parallel Architectures and Languages Europe, volume 259, pages 432-443. Springer Berlin Heidelberg, Berlin, Heidelberg, 1987. URL: https://doi.org/10.1007/3-540-17945-3_25.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail