Go Meta! A Case for Generative Programming and DSLs in Performance Critical Systems

Authors: Tiark Rompf, Kevin J. Brown, HyoukJoong Lee, Arvind K. Sujeeth, Manohar Jonnalagedda, Nada Amin, Georg Ofenbeck, Alen Stojanov, Yannis Klonatos, Mohammad Dashti, Christoph Koch, Markus Püschel, Kunle Olukotun



Cite As

Tiark Rompf, Kevin J. Brown, HyoukJoong Lee, Arvind K. Sujeeth, Manohar Jonnalagedda, Nada Amin, Georg Ofenbeck, Alen Stojanov, Yannis Klonatos, Mohammad Dashti, Christoph Koch, Markus Püschel, and Kunle Olukotun. Go Meta! A Case for Generative Programming and DSLs in Performance Critical Systems. In 1st Summit on Advances in Programming Languages (SNAPL 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 32, pp. 238-261, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


Most performance critical software is developed using very low-level techniques. We argue that this needs to change, and that generative programming is an effective avenue to enable the use of high-level languages and programming techniques in many such circumstances.
  • Performance
  • Generative Programming
  • Staging
  • DSLs



