The I/O Complexity of Hybrid Algorithms for Square Matrix Multiplication

De Stefani, Lorenzo

doi:10.4230/LIPIcs.ISAAC.2019.33

File

Cite AsGet BibTex

Lorenzo De Stefani. The I/O Complexity of Hybrid Algorithms for Square Matrix Multiplication. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 33:1-33:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ISAAC.2019.33

Abstract

Asymptotically tight lower bounds are derived for the I/O complexity of a general class of hybrid algorithms computing the product of n x n square matrices combining "Strassen-like" fast matrix multiplication approach with computational complexity Theta(n^{log_2 7}), and "standard" matrix multiplication algorithms with computational complexity Omega (n^3). We present a novel and tight Omega ((n/max{sqrt M, n_0})^{log_2 7}(max{1,(n_0)/M})^3M) lower bound for the I/O complexity of a class of "uniform, non-stationary" hybrid algorithms when executed in a two-level storage hierarchy with M words of fast memory, where n_0 denotes the threshold size of sub-problems which are computed using standard algorithms with algebraic complexity Omega (n^3). The lower bound is actually derived for the more general class of "non-uniform, non-stationary" hybrid algorithms which allow recursive calls to have a different structure, even when they refer to the multiplication of matrices of the same size and in the same recursive level, although the quantitative expressions become more involved. Our results are the first I/O lower bounds for these classes of hybrid algorithms. All presented lower bounds apply even if the recomputation of partial results is allowed and are asymptotically tight. The proof technique combines the analysis of the Grigoriev’s flow of the matrix multiplication function, combinatorial properties of the encoding functions used by fast Strassen-like algorithms, and an application of the Loomis-Whitney geometric theorem for the analysis of standard matrix multiplication algorithms. Extensions of the lower bounds for a parallel model with P processors are also discussed.

Subject Classification

ACM Subject Classification

Theory of computation → Design and analysis of algorithms

Keywords

I/O complexity
Hybrid Algorithm
Matrix Multiplication
Recomputation

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Alok Aggarwal, Bowen Alpern, Ashok Chandra, and Marc Snir. A model for hierarchical memory. In Proceedings of the nineteenth annual ACM symposium on Theory of computing, pages 305-314. ACM, 1987.
Alok Aggarwal, Jeffrey Vitter, et al. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116-1127, 1988.
G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Graph expansion analysis for communication costs of fast rectangular matrix multiplication. In Design and Analysis of Algorithms, pages 13-36. Springer, 2012.
Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, and Oded Schwartz. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, pages 77-79. ACM, 2012.
Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, and Oded Schwartz. Communication-optimal parallel algorithm for Strassen’s matrix multiplication. In Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, pages 193-204. ACM, 2012.
Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. Communication-optimal parallel and sequential Cholesky decomposition. SIAM Journal on Scientific Computing, 32(6):3495-3523, 2010.
Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. Minimizing communication in numerical linear algebra. SIAM Journal on Matrix Analysis and Applications, 32(3):866-901, 2011.
Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. Graph expansion and communication costs of fast matrix multiplication. Journal of the ACM (JACM), 59(6):32, 2012.
Sandeep N Bhatt, Gianfranco Bilardi, and Geppino Pucci. Area-time tradeoffs for universal VLSI circuits. Theoretical Computer Science, 408(2-3):143-150, 2008.
Gianfranco Bilardi and Lorenzo De Stefani. The I/O complexity of Strassen’s matrix multiplication with recomputation. In Workshop on Algorithms and Data Structures, pages 181-192. Springer, 2017.
Gianfranco Bilardi and Lorenzo De Stefani. The I/O complexity of Toom-Cook integer multiplication. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2034-2052. SIAM, 2019.
Gianfranco Bilardi and Enoch Peserico. A characterization of temporal locality and its portability across memory hierarchies. In International Colloquium on Automata, Languages, and Programming, pages 128-139. Springer, 2001.
Gianfranco Bilardi and Franco P Preparata. Horizons of parallel computation. Journal of Parallel and Distributed Computing, 27(2):172-182, 1995.
Gianfranco Bilardi and Franco P Preparata. Processor—Time Tradeoffs under Bounded-Speed Message Propagation: Part II, Lower Bounds. Theory of Computing Systems, 32(5):531-559, 1999.
Lynn Elliot Cannon. A cellular computer to implement the Kalman filter algorithm. PhD thesis, Montana State University-Bozeman, College of Engineering, 1969.
National Research Council et al. Getting up to speed: The future of supercomputing. National Academies Press, 2005.
Lorenzo De Stefani. On space constrained computations. PhD thesis, University of Padova, 2016.
Lorenzo De Stefani. The I/O complexity of hybrid algorithms for square matrix multiplication. arXiv preprint, 2019. URL: http://arxiv.org/abs/1904.12804.
Frédéric Desprez and Frédéric Suter. Impact of mixed-parallelism on parallel implementations of the Strassen and Winograd matrix multiplication algorithms. Concurrency and Computation: practice and experience, 16(8):771-797, 2004.
Craig C Douglas, Michael Heroux, Gordon Slishman, and Roger M Smith. GEMMW: a portable level 3 BLAS Winograd variant of Strassen’s matrix-matrix multiply algorithm. Journal of Computational Physics, 110(1):1-10, 1994.
Dmitrii Yur'evich Grigor'ev. Application of separability and independence notions for proving lower bounds of circuit complexity. Zapiski Nauchnykh Seminarov POMI, 60:38-48, 1976.
John E Hopcroft and Leslie R Kerr. On minimizing the number of multiplications necessary for matrix multiplication. SIAM Journal on Applied Mathematics, 20(1):30-36, 1971.
Steven Huss-Lederman, Elaine M Jacobson, Jeremy R Johnson, Anna Tsao, and Thomas Turnbull. Implementation of Strassen’s algorithm for matrix multiplication. In Supercomputing'96: Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, pages 32-32. IEEE, 1996.
Dror Irony, Sivan Toledo, and Alexander Tiskin. Communication lower bounds for distributed-memory matrix multiplication. Journal of Parallel and Distributed Computing, 64(9):1017-1026, 2004.
Hong Jia-Wei and Hsiang-Tsung Kung. I/O complexity: The red-blue pebble game. In Proceedings of the thirteenth annual ACM symposium on Theory of computing, pages 326-333. ACM, 1981.
S Lennart Johnsson. Minimizing the communication time for matrix multiplication on multiprocessors. Parallel Computing, 19(11):1235-1257, 1993.
Richard R. Koch, F. T. Leighton, Bruce M. Maggs, Satish B. Rao, Arnold L. Rosenberg, and Eric J. Schwabe. Work-preserving Emulations of Fixed-connection Networks. J. ACM, 44(1):104-147, January 1997. URL: https://doi.org/10.1145/256292.256299.
François Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th international symposium on symbolic and algebraic computation, pages 296-303. ACM, 2014.
Lynn H Loomis and Hassler Whitney. An inequality related to the isoperimetric inequality. Bulletin of the American Mathematical Society, 55(10):961-962, 1949.
Roy Nissim and Oded Schwartz. Revisiting the I/O-Complexity of Fast Matrix Multiplication with Recomputations. In Proceedings of the 33rd IEEE International Parallel and Distributed Processing Symposium, pages 714-716, 2019.
J. E. Savage. Models of Computation: Exploring the Power of Computing. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1997.
John E Savage. Extending the Hong-Kung model to memory hierarchies. In International Computing and Combinatorics Conference, pages 270-281. Springer, 1995.
Jacob Scott, Olga Holtz, and Oded Schwartz. Matrix multiplication I/O-complexity by path routing. In Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures, pages 35-45. ACM, 2015.
Jacob N Scott. An I/O-Complexity Lower Bound for All Recursive Matrix Multiplication Algorithms by Path-Routing. PhD thesis, UC Berkeley, 2015.
Edgar Solomonik and James Demmel. Communication-optimal parallel 2.5 D matrix multiplication and LU factorization algorithms. In European Conference on Parallel Processing, pages 90-109. Springer, 2011.
Volker Streets. Gaussian elimination is not optimal. numerical mathematics, 13(4):354-356, 1969.
Y. D. Burago V. A. Zalgaller, A. B. Sossinsky. Geometric Inequalities. The American Mathematical Monthly, 96(6):544-546, 1989.
Shmuel Winograd. On multiplication of 2× 2 matrices. Linear algebra and its applications, 4(4):381-388, 1971.

The I/O Complexity of Hybrid Algorithms for Square Matrix Multiplication

Author Lorenzo De Stefani

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

The I/O Complexity of Hybrid Algorithms for Square Matrix Multiplication

Author Lorenzo De Stefani

File

Document Identifiers

Author Details

Funding

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References