Zero-Copy, Minimal-Blackout Virtual Machine Migrations Using Disaggregated Shared Memory

Authors: Andreas Grapentin, Felix Eberhardt, Tobias Zagorni, Andreas Polze, Michele Gazzetti, Christian Pinto



Author Details

Andreas Grapentin
  • Operating Systems and Middleware Group, Hasso Plattner Institute, University of Potsdam, Germany
Felix Eberhardt
  • Operating Systems and Middleware Group, Hasso Plattner Institute, University of Potsdam, Germany
Tobias Zagorni
  • Operating Systems and Middleware Group, Hasso Plattner Institute, University of Potsdam, Germany
Andreas Polze
  • Operating Systems and Middleware Group, Hasso Plattner Institute, University of Potsdam, Germany
Michele Gazzetti
  • IBM Research Europe, Dublin, Ireland
Christian Pinto
  • IBM Research Europe, Dublin, Ireland

Cite As

Andreas Grapentin, Felix Eberhardt, Tobias Zagorni, Andreas Polze, Michele Gazzetti, and Christian Pinto. Zero-Copy, Minimal-Blackout Virtual Machine Migrations Using Disaggregated Shared Memory. In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024) https://doi.org/10.4230/OASIcs.PARMA-DITAM.2024.3

Abstract

We propose a new live-migration paradigm for virtual machines called zero-copy migration. By making the working set of the virtual machine available on the destination host through transparently byte-addressable disaggregated memory, we remove the need for a pre-copy phase while simultaneously reducing the performance impact of the post-copy phase. We describe an open-source implementation of the proposed paradigm based on QEMU-KVM and libvirt, and we evaluate the efficiency of the approach with a deployment on a functional hardware prototype of a memory disaggregation system realized using ThymesisFlow. Using a series of configurable benchmarks, we show that the lead time and blackout time of the migration match the best-case scenarios of traditional pre-copy, post-copy and hybrid approaches. Key performance metrics from the perspective of applications running in the virtual machine, such as memory latency and throughput, are improved by up to three orders of magnitude, increasing both the flexibility and the responsiveness of live migrations in the datacenter.
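To make the terms lead time and blackout time concrete, the following Python sketch models an idealized iterative pre-copy migration and contrasts it with a migration that transfers no guest memory at all. It is a toy model under stated assumptions, not the implementation or the evaluation method of the paper: the function names, the constant dirty rate, the fixed link bandwidth and the `state_bytes` parameter (non-memory state sent during the blackout) are all hypothetical and chosen only for illustration.

```python
def precopy_estimate(mem_bytes, dirty_rate, bandwidth, stop_bytes, max_rounds=30):
    """Idealized iterative pre-copy (assumed model, constant dirty rate).

    Each round sends the memory dirtied during the previous round while the
    guest keeps running. Once the remainder is at most stop_bytes (or the
    round limit is hit), the guest is paused and the rest is sent.
    Returns (lead_time_s, blackout_s, bytes_sent).
    """
    to_send = mem_bytes          # the first round copies all guest memory
    lead_time = 0.0              # time spent copying while the guest runs
    sent = 0.0
    for _ in range(max_rounds):
        if to_send <= stop_bytes:
            break
        round_time = to_send / bandwidth
        lead_time += round_time
        sent += to_send
        # Memory dirtied during this round must be re-sent in the next one.
        to_send = min(mem_bytes, dirty_rate * round_time)
    blackout = to_send / bandwidth   # guest is paused for the final transfer
    sent += to_send
    return lead_time, blackout, sent


def zero_copy_estimate(state_bytes, bandwidth):
    """Assumed zero-copy model: guest memory is already reachable from the
    destination, so only a small amount of non-memory state is transferred
    while the guest is paused. Returns (lead_time_s, blackout_s, bytes_sent).
    """
    return 0.0, state_bytes / bandwidth, float(state_bytes)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 units of memory, 10 units/s dirtied,
    # 100 units/s link, stop threshold of 5 units.
    print(precopy_estimate(1000, 10, 100, 5))
    print(zero_copy_estimate(1, 100))
```

In this model the pre-copy lead time grows with the memory size and with the ratio of dirty rate to bandwidth, whereas the zero-copy estimate depends only on the assumed size of the non-memory state.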

Subject Classification

ACM Subject Classification
  • Hardware → Memory and dense storage
  • Computer systems organization → Cloud computing
  • Software and its engineering → Virtual machines
  • Software and its engineering → Distributed memory
  • Software and its engineering → Cloud computing
  • Information systems → Data centers
  • Information systems → Computing platforms
Keywords
  • disaggregation
  • disaggregated memory
  • vm live migration
  • thymesisflow
  • power9
  • opencapi
  • performance evaluation
  • zero copy
