The I/O Complexity of Hybrid Algorithms for Square Matrix Multiplication

Author Lorenzo De Stefani



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2019.33.pdf
  • Filesize: 0.63 MB
  • 16 pages

Document Identifiers

Author Details

Lorenzo De Stefani
  • Department of Computer Science, Brown University, United States of America

Acknowledgements

I want to thank Gianfranco Bilardi at the University of Padova (Italy) for inital conversations on the topic of this work, and Megumi Ando at Brown University (USA) for her feedback on early versions of this manuscript.

Cite As Get BibTex

Lorenzo De Stefani. The I/O Complexity of Hybrid Algorithms for Square Matrix Multiplication. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 33:1-33:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.ISAAC.2019.33

Abstract

Asymptotically tight lower bounds are derived for the I/O complexity of a general class of hybrid algorithms computing the product of n x n square matrices combining "Strassen-like" fast matrix multiplication approach with computational complexity Theta(n^{log_2 7}), and "standard" matrix multiplication algorithms with computational complexity Omega (n^3). We present a novel and tight Omega ((n/max{sqrt M, n_0})^{log_2 7}(max{1,(n_0)/M})^3M) lower bound for the I/O complexity of a class of "uniform, non-stationary" hybrid algorithms when executed in a two-level storage hierarchy with M words of fast memory, where n_0 denotes the threshold size of sub-problems which are computed using standard algorithms with algebraic complexity Omega (n^3). 
The lower bound is actually derived for the more general class of "non-uniform, non-stationary" hybrid algorithms which allow recursive calls to have a different structure, even when they refer to the multiplication of matrices of the same size and in the same recursive level, although the quantitative expressions become more involved. Our results are the first I/O lower bounds for these classes of hybrid algorithms. All presented lower bounds apply even if the recomputation of partial results is allowed and are asymptotically tight.
The proof technique combines the analysis of the Grigoriev’s flow of the matrix multiplication function, combinatorial properties of the encoding functions used by fast Strassen-like algorithms, and an application of the Loomis-Whitney geometric theorem for the analysis of standard matrix multiplication algorithms. Extensions of the lower bounds for a parallel model with P processors are also discussed.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • I/O complexity
  • Hybrid Algorithm
  • Matrix Multiplication
  • Recomputation

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alok Aggarwal, Bowen Alpern, Ashok Chandra, and Marc Snir. A model for hierarchical memory. In Proceedings of the nineteenth annual ACM symposium on Theory of computing, pages 305-314. ACM, 1987. Google Scholar
  2. Alok Aggarwal, Jeffrey Vitter, et al. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116-1127, 1988. Google Scholar
  3. G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Graph expansion analysis for communication costs of fast rectangular matrix multiplication. In Design and Analysis of Algorithms, pages 13-36. Springer, 2012. Google Scholar
  4. Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, and Oded Schwartz. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, pages 77-79. ACM, 2012. Google Scholar
  5. Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, and Oded Schwartz. Communication-optimal parallel algorithm for Strassen’s matrix multiplication. In Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, pages 193-204. ACM, 2012. Google Scholar
  6. Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. Communication-optimal parallel and sequential Cholesky decomposition. SIAM Journal on Scientific Computing, 32(6):3495-3523, 2010. Google Scholar
  7. Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. Minimizing communication in numerical linear algebra. SIAM Journal on Matrix Analysis and Applications, 32(3):866-901, 2011. Google Scholar
  8. Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. Graph expansion and communication costs of fast matrix multiplication. Journal of the ACM (JACM), 59(6):32, 2012. Google Scholar
  9. Sandeep N Bhatt, Gianfranco Bilardi, and Geppino Pucci. Area-time tradeoffs for universal VLSI circuits. Theoretical Computer Science, 408(2-3):143-150, 2008. Google Scholar
  10. Gianfranco Bilardi and Lorenzo De Stefani. The I/O complexity of Strassen’s matrix multiplication with recomputation. In Workshop on Algorithms and Data Structures, pages 181-192. Springer, 2017. Google Scholar
  11. Gianfranco Bilardi and Lorenzo De Stefani. The I/O complexity of Toom-Cook integer multiplication. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2034-2052. SIAM, 2019. Google Scholar
  12. Gianfranco Bilardi and Enoch Peserico. A characterization of temporal locality and its portability across memory hierarchies. In International Colloquium on Automata, Languages, and Programming, pages 128-139. Springer, 2001. Google Scholar
  13. Gianfranco Bilardi and Franco P Preparata. Horizons of parallel computation. Journal of Parallel and Distributed Computing, 27(2):172-182, 1995. Google Scholar
  14. Gianfranco Bilardi and Franco P Preparata. Processor—Time Tradeoffs under Bounded-Speed Message Propagation: Part II, Lower Bounds. Theory of Computing Systems, 32(5):531-559, 1999. Google Scholar
  15. Lynn Elliot Cannon. A cellular computer to implement the Kalman filter algorithm. PhD thesis, Montana State University-Bozeman, College of Engineering, 1969. Google Scholar
  16. National Research Council et al. Getting up to speed: The future of supercomputing. National Academies Press, 2005. Google Scholar
  17. Lorenzo De Stefani. On space constrained computations. PhD thesis, University of Padova, 2016. Google Scholar
  18. Lorenzo De Stefani. The I/O complexity of hybrid algorithms for square matrix multiplication. arXiv preprint, 2019. URL: http://arxiv.org/abs/1904.12804.
  19. Frédéric Desprez and Frédéric Suter. Impact of mixed-parallelism on parallel implementations of the Strassen and Winograd matrix multiplication algorithms. Concurrency and Computation: practice and experience, 16(8):771-797, 2004. Google Scholar
  20. Craig C Douglas, Michael Heroux, Gordon Slishman, and Roger M Smith. GEMMW: a portable level 3 BLAS Winograd variant of Strassen’s matrix-matrix multiply algorithm. Journal of Computational Physics, 110(1):1-10, 1994. Google Scholar
  21. Dmitrii Yur'evich Grigor'ev. Application of separability and independence notions for proving lower bounds of circuit complexity. Zapiski Nauchnykh Seminarov POMI, 60:38-48, 1976. Google Scholar
  22. John E Hopcroft and Leslie R Kerr. On minimizing the number of multiplications necessary for matrix multiplication. SIAM Journal on Applied Mathematics, 20(1):30-36, 1971. Google Scholar
  23. Steven Huss-Lederman, Elaine M Jacobson, Jeremy R Johnson, Anna Tsao, and Thomas Turnbull. Implementation of Strassen’s algorithm for matrix multiplication. In Supercomputing'96: Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, pages 32-32. IEEE, 1996. Google Scholar
  24. Dror Irony, Sivan Toledo, and Alexander Tiskin. Communication lower bounds for distributed-memory matrix multiplication. Journal of Parallel and Distributed Computing, 64(9):1017-1026, 2004. Google Scholar
  25. Hong Jia-Wei and Hsiang-Tsung Kung. I/O complexity: The red-blue pebble game. In Proceedings of the thirteenth annual ACM symposium on Theory of computing, pages 326-333. ACM, 1981. Google Scholar
  26. S Lennart Johnsson. Minimizing the communication time for matrix multiplication on multiprocessors. Parallel Computing, 19(11):1235-1257, 1993. Google Scholar
  27. Richard R. Koch, F. T. Leighton, Bruce M. Maggs, Satish B. Rao, Arnold L. Rosenberg, and Eric J. Schwabe. Work-preserving Emulations of Fixed-connection Networks. J. ACM, 44(1):104-147, January 1997. URL: https://doi.org/10.1145/256292.256299.
  28. François Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th international symposium on symbolic and algebraic computation, pages 296-303. ACM, 2014. Google Scholar
  29. Lynn H Loomis and Hassler Whitney. An inequality related to the isoperimetric inequality. Bulletin of the American Mathematical Society, 55(10):961-962, 1949. Google Scholar
  30. Roy Nissim and Oded Schwartz. Revisiting the I/O-Complexity of Fast Matrix Multiplication with Recomputations. In Proceedings of the 33rd IEEE International Parallel and Distributed Processing Symposium, pages 714-716, 2019. Google Scholar
  31. J. E. Savage. Models of Computation: Exploring the Power of Computing. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1997. Google Scholar
  32. John E Savage. Extending the Hong-Kung model to memory hierarchies. In International Computing and Combinatorics Conference, pages 270-281. Springer, 1995. Google Scholar
  33. Jacob Scott, Olga Holtz, and Oded Schwartz. Matrix multiplication I/O-complexity by path routing. In Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures, pages 35-45. ACM, 2015. Google Scholar
  34. Jacob N Scott. An I/O-Complexity Lower Bound for All Recursive Matrix Multiplication Algorithms by Path-Routing. PhD thesis, UC Berkeley, 2015. Google Scholar
  35. Edgar Solomonik and James Demmel. Communication-optimal parallel 2.5 D matrix multiplication and LU factorization algorithms. In European Conference on Parallel Processing, pages 90-109. Springer, 2011. Google Scholar
  36. Volker Streets. Gaussian elimination is not optimal. numerical mathematics, 13(4):354-356, 1969. Google Scholar
  37. Y. D. Burago V. A. Zalgaller, A. B. Sossinsky. Geometric Inequalities. The American Mathematical Monthly, 96(6):544-546, 1989. Google Scholar
  38. Shmuel Winograd. On multiplication of 2× 2 matrices. Linear algebra and its applications, 4(4):381-388, 1971. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail