Parallelizing Julia with a Non-Invasive DSL

Anderson, Todd A.; Liu, Hai; Kuper, Lindsey; Totoni, Ehsan; Vitek, Jan; Shpeisman, Tatiana

doi:10.4230/LIPIcs.ECOOP.2017.4

Abstract

Computational scientists often prototype software using productivity
languages that offer high-level programming abstractions. When higher
performance is needed, they are obliged to rewrite their code in a
lower-level efficiency language. Different solutions have been
proposed to address this trade-off between productivity and
efficiency. One promising approach is to create embedded
domain-specific languages that sacrifice generality for productivity
and performance, but practical experience with DSLs points to some
roadblocks preventing widespread adoption. This paper proposes a
non-invasive domain-specific language that makes as few visible
changes to the host programming model as possible. We present ParallelAccelerator,
a library and compiler for high-level, high-performance scientific
computing in Julia. ParallelAccelerator's programming model is aligned with existing
Julia programming idioms. Our compiler exposes the implicit
parallelism in high-level array-style programs and compiles them to
fast, parallel native code. Programs can also run in "library-only"
mode, letting users benefit from the full Julia environment and
libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary.

