Optimizing Layout of Recursive Datatypes with Marmoset: Or, Algorithms + Data Layouts = Efficient Programs

Authors: Vidush Singhal, Chaitanya Koparkar, Joseph Zullo, Artem Pelenitsyn, Michael Vollmer, Mike Rainey, Ryan Newton, Milind Kulkarni



File

LIPIcs.ECOOP.2024.38.pdf
  • Filesize: 2.29 MB
  • 28 pages

Author Details

Vidush Singhal
  • Purdue University, West Lafayette, IN, USA
Chaitanya Koparkar
  • Indiana University, Bloomington, IN, USA
Joseph Zullo
  • Purdue University, West Lafayette, IN, USA
Artem Pelenitsyn
  • Purdue University, West Lafayette, IN, USA
Michael Vollmer
  • University of Kent, UK
Mike Rainey
  • Carnegie Mellon University, Pittsburgh, PA, USA
Ryan Newton
  • Purdue University, West Lafayette, IN, USA
Milind Kulkarni
  • Purdue University, West Lafayette, IN, USA

Cite As

Vidush Singhal, Chaitanya Koparkar, Joseph Zullo, Artem Pelenitsyn, Michael Vollmer, Mike Rainey, Ryan Newton, and Milind Kulkarni. Optimizing Layout of Recursive Datatypes with Marmoset: Or, Algorithms + Data Layouts = Efficient Programs. In 38th European Conference on Object-Oriented Programming (ECOOP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 313, pp. 38:1-38:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024) https://doi.org/10.4230/LIPIcs.ECOOP.2024.38

Abstract

While programmers know that the memory representation of data structures can have significant effects on performance, compiler support to optimize the layout of those structures is an under-explored field. Prior work has optimized the layout of individual, non-recursive structures without considering how collections of those objects in linked or recursive data structures are laid out.
This work introduces Marmoset, a compiler that optimizes the layouts of algebraic datatypes, with a special focus on producing highly optimized, packed data layouts where recursive structures can be traversed with minimal pointer chasing. Marmoset performs an analysis of how a recursive ADT is used across functions to choose a global layout that promotes simple, strided access for that ADT in memory. It does so by building and solving a constraint system to minimize an abstract cost model, yielding a layout for the ADT that is predicted to be efficient. Marmoset then builds on top of Gibbon, a prior compiler for packed, mostly-serial representations, to synthesize optimized ADTs. We show experimentally that Marmoset is able to choose optimal layouts across a series of microbenchmarks and case studies, outperforming both Gibbon's baseline approach and MLton, a Standard ML compiler that uses traditional pointer-heavy representations.
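The abstract describes layout selection as minimizing an abstract cost model over candidate layouts of an ADT. The Python sketch below illustrates that idea in miniature, under heavy assumptions. The field names, traversal access orders, weights, and the cost function (`layout_cost`, `total_cost`, `best_layout`) are all hypothetical. An exhaustive search over field permutations stands in for the constraint system that Marmoset actually builds and solves. It is not Marmoset's cost model or algorithm. It only shows why a packed layout whose field order matches the dominant access order can be read as a forward scan.

```python
from itertools import permutations

# Toy model only (an assumption, not Marmoset's cost model): a packed
# constructor stores its fields back to back, so reading them in layout
# order is a forward scan; any other access needs a skip or a jump back.


def layout_cost(layout, access_order):
    """Count field accesses that are not the next field in `layout`."""
    pos = {field: i for i, field in enumerate(layout)}
    cost = 0
    cursor = 0  # index of the field a forward scan would read next
    for field in access_order:
        if pos[field] != cursor:
            cost += 1  # skip ahead over unread fields or jump back
        cursor = pos[field] + 1
    return cost


def total_cost(layout, traversals):
    """Weighted cost over all (access_order, weight) pairs."""
    return sum(weight * layout_cost(layout, order) for order, weight in traversals)


def best_layout(fields, traversals):
    """Exhaustively try every field order and keep the cheapest.

    Brute force is a stand-in for a real constraint solver and only
    scales to constructors with a handful of fields.
    """
    return min(permutations(fields), key=lambda layout: total_cost(layout, traversals))


# Hypothetical tree-node constructor and two hypothetical traversals:
# the first reads the right subtree first and is assumed to run 3x as often.
FIELDS = ("value", "left", "right")
TRAVERSALS = [
    (("right", "left", "value"), 3),
    (("value", "left", "right"), 1),
]

if __name__ == "__main__":
    layout = best_layout(FIELDS, TRAVERSALS)
    print(layout, total_cost(layout, TRAVERSALS))  # ('right', 'left', 'value') 3
```

In this toy setting the search picks the order (right, left, value): the more frequent traversal then streams forward through the node, and the rarer one pays for its out-of-order accesses. A real compiler would also have to account for the cost of skipping over variable-sized recursive fields, which this sketch ignores.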

Subject Classification

ACM Subject Classification
  • Software and its engineering → Compilers
  • Software and its engineering → Software performance
  • Information systems → Data layout
Keywords
  • Tree traversals
  • Compilers
  • Data layout optimization
  • Dense data layout
