CROCHET: Checkpoint and Rollback via Lightweight Heap Traversal on Stock JVMs

Authors: Jonathan Bell, Luís Pina






Author Details

Jonathan Bell
  • George Mason University, Fairfax, VA, USA
Luís Pina
  • George Mason University, Fairfax, VA, USA

Cite As

Jonathan Bell and Luís Pina. CROCHET: Checkpoint and Rollback via Lightweight Heap Traversal on Stock JVMs. In 32nd European Conference on Object-Oriented Programming (ECOOP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 109, pp. 17:1-17:31, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ECOOP.2018.17

Abstract

Checkpoint/rollback (CR) mechanisms create snapshots of the state of a running application, allowing it to later be restored to that checkpointed snapshot. Support for checkpoint/rollback enables many program analyses and software engineering techniques, including test generation, fault tolerance, and speculative execution. Fully automatic CR support is built into some modern operating systems. However, such systems perform checkpoints at the coarse granularity of whole pages of virtual memory, which imposes relatively high overhead to incrementally capture the changing state of a process, and makes it difficult for applications to checkpoint only some logical portions of their state. CR systems implemented at the application level and with a finer granularity typically require complex developer support to identify: (1) where checkpoints can take place, and (2) which program state needs to be copied. A popular compromise is to implement CR support in managed runtime environments, e.g. the Java Virtual Machine (JVM), but this typically requires specialized, non-standard runtime environments, limiting portability and adoption of this approach. In this paper, we present a novel approach for Checkpoint ROllbaCk via lightweight HEap Traversal (Crochet), which enables fully automatic fine-grained lightweight checkpoints within unmodified commodity JVMs (specifically Oracle's HotSpot and OpenJDK). Leveraging key insights about the internal design common to modern JVMs, Crochet works entirely through bytecode rewriting and standard debug APIs, utilizing special proxy objects to perform a lazy heap traversal that starts at the root references and traverses the heap as objects are accessed, copying or restoring state as needed and removing each proxy immediately after it is used. 
We evaluated Crochet on the DaCapo benchmark suite, finding it to have very low runtime overhead in steady state (ranging from no overhead to 1.29x slowdown), and that it often outperforms a state-of-the-art system-level checkpoint tool when creating large checkpoints.
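The lazy copy-and-restore idea described in the abstract can be illustrated with a minimal sketch. This is not Crochet's implementation: the paper describes proxy objects inserted through bytecode rewriting, whereas the sketch below swaps in a hand-written per-object access check that is simpler to read. All names here (LazyCheckpointSketch, Context, Node, sync) are hypothetical, and the sketch assumes a single checkpoint and a single thread.

```java
final class LazyCheckpointSketch {

    /** Shared checkpoint state (hypothetical stand-in for runtime-wide state). */
    static final class Context {
        private boolean checkpointTaken;
        private int rollbackCount;

        /** Constant time: nothing is copied until an object is next accessed. */
        void checkpoint() {
            if (checkpointTaken) {
                throw new IllegalStateException("this sketch supports one checkpoint");
            }
            checkpointTaken = true;
        }

        /** Constant time: each object restores itself on its next access. */
        void rollback() {
            if (!checkpointTaken) {
                throw new IllegalStateException("no checkpoint to roll back to");
            }
            rollbackCount++;
        }
    }

    /** A heap object whose field accesses all pass through sync(). */
    static final class Node {
        private final Context ctx;
        private int value;
        private Node next;

        private boolean saved;     // true once a snapshot of this object exists
        private int savedValue;    // field values as of the checkpoint
        private Node savedNext;
        private int seenRollback;  // rollback count at this object's last sync

        Node(Context ctx, int value, Node next) {
            this.ctx = ctx;
            this.value = value;
            this.next = next;
        }

        /** Copies or restores this object's state the first time it is touched. */
        private void sync() {
            if (!ctx.checkpointTaken) {
                return;
            }
            if (!saved) {
                // First access since the checkpoint: fields are still unmodified.
                savedValue = value;
                savedNext = next;
                saved = true;
                seenRollback = ctx.rollbackCount;
            } else if (seenRollback != ctx.rollbackCount) {
                // First access since the latest rollback: restore the snapshot.
                value = savedValue;
                next = savedNext;
                seenRollback = ctx.rollbackCount;
            }
        }

        int getValue() { sync(); return value; }
        void setValue(int v) { sync(); value = v; }
        Node getNext() { sync(); return next; }
        void setNext(Node n) { sync(); next = n; }
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        Node tail = new Node(ctx, 1, null);
        Node head = new Node(ctx, 2, tail);

        ctx.checkpoint();
        head.setValue(20);   // head is snapshotted here, on first access
        head.setNext(null);  // tail becomes unreachable but is never copied
        ctx.rollback();

        System.out.println(head.getValue());         // prints 2
        System.out.println(head.getNext() == tail);  // prints true
    }
}
```

In this sketch both checkpoint() and rollback() only update shared state, so their cost does not depend on heap size; the copying work is deferred to the objects the program actually touches afterwards. Objects that are never accessed after the checkpoint are never copied.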

Subject Classification

ACM Subject Classification
  • Software and its engineering → Frameworks
Keywords
  • Checkpoint rollback
  • runtime systems
  • dynamic analysis

