Compiling Tree Transforms to Operate on Packed Representations

Authors Michael Vollmer, Sarah Spall, Buddhika Chamith, Laith Sakka, Chaitanya Koparkar, Milind Kulkarni, Sam Tobin-Hochstadt, Ryan R. Newton



PDF
Thumbnail PDF

File

LIPIcs.ECOOP.2017.26.pdf
  • Filesize: 0.69 MB
  • 29 pages

Document Identifiers

Author Details

Michael Vollmer
Sarah Spall
Buddhika Chamith
Laith Sakka
Chaitanya Koparkar
Milind Kulkarni
Sam Tobin-Hochstadt
Ryan R. Newton

Cite As Get BibTex

Michael Vollmer, Sarah Spall, Buddhika Chamith, Laith Sakka, Chaitanya Koparkar, Milind Kulkarni, Sam Tobin-Hochstadt, and Ryan R. Newton. Compiling Tree Transforms to Operate on Packed Representations. In 31st European Conference on Object-Oriented Programming (ECOOP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 74, pp. 26:1-26:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.ECOOP.2017.26

Abstract

When written idiomatically in most programming languages, programs
that traverse and construct trees operate over pointer-based data
structures, using one heap object per-leaf and per-node. This
representation is efficient for random access and shape-changing
modifications, but for traversals, such as compiler passes, that
process most or all of a tree in bulk, it can be inefficient. In this
work we instead compile tree traversals to operate on
pointer-free pre-order serializations of trees. On modern
architectures such programs often run significantly faster than
their pointer-based counterparts, and additionally are directly suited
to storage and transmission without requiring marshaling.

We present a prototype compiler, Gibbon, that compiles a
small first-order, purely functional language sufficient for tree
traversals. The compiler transforms this language into intermediate
representation with explicit pointers into input and output buffers
for packed data. The key compiler technologies include an effect
system for capturing traversal behavior, combined with an algorithm to
insert destination cursors. We evaluate our compiler on tree
transformations over a real-world dataset of source-code syntax trees.
For traversals touching the whole tree, such as maps and folds, packed
data allows speedups of over 2x compared to a highly-optimized
pointer-based baseline.

Subject Classification

Keywords
  • compiler optimization
  • program transformation
  • tree traversal

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18:509-517, September 1975. URL: http://dx.doi.org/10.1145/361002.361007.
  2. Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: an efficient multithreaded runtime system. SIGPLAN Notices, 30:207-216, August 1995. URL: http://dx.doi.org/10.1145/209937.209958.
  3. Kevin J. Brown, Arvind K. Sujeeth, Hyouk Joong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. A heterogeneous parallel framework for domain-specific languages. In Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, PACT '11, pages 89-100. IEEE, 2011. Google Scholar
  4. TM Chilimbi, MD Hill, and JR Larus. Cache-conscious structure layout. ACM SIGPLAN Notices, 1999. URL: http://dl.acm.org/citation.cfm?id=301633.
  5. Trishul M. Chilimbi, Bob Davidson, and James R. Larus. Cache-conscious structure definition. In Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, PLDI '99, pages 13-24, New York, NY, USA, 1999. ACM. URL: http://dx.doi.org/10.1145/301618.301635.
  6. Trishul M. Chilimbi and James R. Larus. Using generational garbage collection to implement cache-conscious data placement, 1999. URL: http://dx.doi.org/10.1145/301589.286865.
  7. Adam Chlipala. An optimizing compiler for a purely functional web-application language. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ICFP 2015, pages 10-21, New York, NY, USA, 2015. ACM. URL: http://dx.doi.org/10.1145/2784731.2784741.
  8. Matthew Flatt and PLT. Reference: Racket. Technical report, PLT Design, Inc., 2010. URL: http://racket-lang.org/tr1/.
  9. Michael Goldfarb, Youngjoon Jo, and Milind Kulkarni. General transformations for gpu execution of tree traversals. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Supercomputing), SC '13, 2013. Google Scholar
  10. Alexander G Gray and Andrew W Moore. N-body'problems in statistical learning. In NIPS, volume 4, pages 521-527. Citeseer, 2000. Google Scholar
  11. Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, and James Cheney. Region-based memory management in Cyclone. In PLDI, 2002. URL: http://dl.acm.org/citation.cfm?id=512563.
  12. Michael Hapala, Tomas Davidovic, Ingo Wald, Vlastimil Havran, and Philipp Slusallek. Efficient Stack-less BVH Traversal for Ray Tracing. In Proceedings 27th Spring Conference of Computer Graphics (SCCG) 2011, pages 29-34, 2011. Google Scholar
  13. Kohei Honda, Vasco Thudichum Vasconcelos, and Makoto Kubo. Language primitives and type discipline for structured communication-based programming. In Proceedings of the 7th European Symposium on Programming: Programming Languages and Systems, ESOP '98, pages 122-138, London, UK, UK, 1998. Springer-Verlag. URL: http://dl.acm.org/citation.cfm?id=645392.651876.
  14. Aaron W. Hsu. The Key to a Data Parallel Compiler. In Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ARRAY 2016, pages 32-40, New York, NY, USA, 2016. ACM. URL: http://dx.doi.org/10.1145/2935323.2935331.
  15. James Larus. Restructuring Symbolic Programs for Concurrent Execution on Multiprocessors. PhD thesis, University of California at Berkeley, 1989. Google Scholar
  16. Chris Lattner and Vikram Adve. Automatic pool allocation: improving performance by controlling data structure layout in the heap. ACM SIGPLAN Notices, 40:129-142, 2005. URL: http://dx.doi.org/10.1145/1065010.1065027.
  17. Chris Lattner and Vikram S. Adve. Transparent pointer compression for linked data structures. In Proceedings of the 2005 Workshop on Memory System Performance, MSP '05, pages 24-35, New York, NY, USA, 2005. ACM. URL: http://dx.doi.org/10.1145/1111583.1111587.
  18. Junichiro Makino. Vectorization of a treecode. J. Comput. Phys., 87:148-160, March 1990. URL: http://dx.doi.org/10.1016/0021-9991(90)90231-O.
  19. Trevor L. McDonell, Manuel M.T. Chakravarty, Gabriele Keller, and Ben Lippmeier. Optimising purely functional GPU programs. In ICFP: International Conference on Functional Programming, pages 49-60. ACM, 2013. Google Scholar
  20. Leo A. Meyerovich, Todd Mytkowicz, and Wolfram Schulte. Data parallel programming for irregular tree computations. In HotPAR. USENIX, May 2011. URL: https://www.microsoft.com/en-us/research/publication/data-parallel-programming-for-irregular-tree-computations/.
  21. Stefan Popov, Johannes Günther, Hans-Peter Seidel, and Philipp Slusallek. Stackless kd-tree traversal for high performance GPU ray tracing. Computer Graphics Forum, 26(3):415-424, September 2007. (Proceedings of Eurographics). Google Scholar
  22. Bin Ren, Gagan Agrawal, James R. Larus, Todd Mytkowicz, Tomi Poutanen, and Wolfram Schulte. SIMD parallelization of applications that traverse irregular data structures. In Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2013, Shenzhen, China, February 23-27, 2013, pages 20:1-20:10. IEEE Computer Society, 2013. URL: http://dx.doi.org/10.1109/CGO.2013.6494989.
  23. Bin Ren, Todd Mytkowicz, and Gagan Agrawal. A portable optimization engine for accelerating irregular data-traversal applications on SIMD architectures. TACO, 11(2):16:1-16:31, 2014. URL: http://dx.doi.org/10.1145/2632215.
  24. Bo Joel Svensson, Mary Sheeran, and Ryan R. Newton. Design exploration through code-generating dsls. Commun. ACM, 57(6):56-63, June 2014. Google Scholar
  25. Sam Tobin-Hochstadt and Matthias Felleisen. The design and implementation of typed scheme. In George C. Necula and Philip Wadler, editors, Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008, San Francisco, California, USA, January 7-12, 2008, pages 395-406. ACM, 2008. URL: http://dx.doi.org/10.1145/1328438.1328486.
  26. Mads Tofte and Jean-Pierre Talpin. Region-based memory management. Inf. Comput., 132(2):109-176, 1997. Google Scholar
  27. D. N. Truong, F. Bodin, and A. Seznec. Improving cache behavior of dynamically allocated data structures. In Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, PACT '98, pages 322-, Washington, DC, USA, 1998. IEEE Computer Society. URL: http://portal.acm.org/citation.cfm?id=522344.825680.
  28. Kenton Varda. Cap'n Proto, 2015. URL: https://capnproto.org/.
  29. Edward Z. Yang, Giovanni Campagna, Ömer S. Ağacan, Ahmed El-Hassany, Abhishek Kulkarni, and Ryan R. Newton. Efficient communication and collection with compact normal forms. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ICFP 2015, pages 362-374, New York, NY, USA, 2015. ACM. URL: http://dx.doi.org/10.1145/2784731.2784735.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail