Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling

Authors Deeparnab Chakrabarty, Sanjeev Khanna



PDF
Thumbnail PDF

File

OASIcs.SOSA.2018.4.pdf
  • Filesize: 0.53 MB
  • 11 pages

Document Identifiers

Author Details

Deeparnab Chakrabarty
Sanjeev Khanna

Cite As Get BibTex

Deeparnab Chakrabarty and Sanjeev Khanna. Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling. In 1st Symposium on Simplicity in Algorithms (SOSA 2018). Open Access Series in Informatics (OASIcs), Volume 61, pp. 4:1-4:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/OASIcs.SOSA.2018.4

Abstract

Given a non-negative real matrix A, the matrix scaling problem is to determine if it is possible to scale the rows and columns so that each row and each column sums to a specified target value for it. 

The matrix scaling problem arises in many algorithmic applications, perhaps most notably as a preconditioning step in solving linear system of equations. One of the most natural and by now classical approach to matrix scaling is the Sinkhorn-Knopp algorithm (also known as the RAS method) where one alternately scales either all rows or all columns to meet the target values. In addition to being extremely simple and natural, another appeal of this procedure is that it easily lends itself to parallelization. A central question is to understand the rate of convergence of the Sinkhorn-Knopp algorithm. 

Specifically, given a suitable error metric to measure deviations from target values, and an error bound epsilon, how quickly does the Sinkhorn-Knopp algorithm converge to an error below epsilon? While there are several non-trivial convergence results known about the Sinkhorn-Knopp algorithm, perhaps somewhat surprisingly, even for natural error metrics such as ell_1-error or ell_2-error, this is not entirely understood. 

In this paper, we present an elementary convergence analysis for the Sinkhorn-Knopp algorithm that improves upon the previous best bound. In a nutshell, our approach is to show (i) a simple bound on the number of iterations needed so that the KL-divergence between the current row-sums and the target row-sums drops below a specified threshold delta, and (ii) then show that for a suitable choice of delta, whenever KL-divergence is below delta, then the ell_1-error or the ell_2-error is below epsilon. The well-known Pinsker's inequality immediately allows us to translate a bound on the KL divergence to a bound on ell_1-error. To bound the ell_2-error in terms of the KL-divergence, we establish a new inequality, referred to as (KL vs ell_1/ell_2) inequality in the paper. This new inequality is a strengthening of the Pinsker's inequality that we believe is of independent interest. Our analysis of ell_2-error significantly improves upon the best previous convergence bound for ell_2-error. 

The idea of studying Sinkhorn-Knopp convergence via KL-divergence is not new and has indeed been previously explored. Our contribution is an elementary, self-contained presentation of this approach and an interesting new inequality that yields a significantly stronger convergence guarantee for the extensively studied ell_2-error.

Subject Classification

Keywords
  • Matrix Scaling
  • Entropy Minimization
  • KL Divergence Inequalities

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Scott Aaronson. Quantum computing and hidden variables. Phys. Rev. A, 71:032325, Mar 2005. URL: http://dx.doi.org/10.1103/PhysRevA.71.032325.
  2. Zeyuan Allen Zhu, Yuanzhi Li, Rafael Oliveira, and Avi Wigderson. Much faster algorithms for matrix scaling. 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS, 2017. URL: http://arxiv.org/abs/1704.02315.
  3. Zeyuan Allen Zhu and Lorenzo Orecchia. Using optimization to break the epsilon barrier: A faster and simpler width-independent algorithm for solving positive linear programs in parallel. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, San Diego, CA, USA, pages 1439-1456, 2015. Google Scholar
  4. Michael Bacharach. Estimating nonnegative matrices from marginal data. International Economic Review, 6(3):294-310, 1965. URL: http://www.jstor.org/stable/2525582.
  5. H. Balakrishnan, Inseok Hwang, and C. J. Tomlin. Polynomial approximation algorithms for belief matrix maintenance in identity management. In 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No.04CH37601), volume 5, pages 4874-4879 Vol.5, Dec 2004. URL: http://dx.doi.org/10.1109/CDC.2004.1429569.
  6. R.B. Bapat and T.E.S. Raghavan. An extension of a theorem of Darroch and Ratcliff in loglinear models and its application to scaling multidimensional matrices. Linear Algebra and its Applications, 114:705-715, 1989. Special Issue Dedicated to Alan J. Hoffman. URL: http://dx.doi.org/10.1016/0024-3795(89)90489-8.
  7. L.M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200-217, 1967. URL: http://dx.doi.org/10.1016/0041-5553(67)90040-7.
  8. Richard A Brualdi, Seymour V Parter, and Hans Schneider. The diagonal equivalence of a nonnegative matrix to a stochastic matrix. Journal of Mathematical Analysis and Applications, 16(1):31-50, 1966. URL: http://dx.doi.org/10.1016/0022-247X(66)90184-3.
  9. Michael B. Cohen, Aleksander Madry, Dimitris Tsipras, and Adrian Vladu. Matrix scaling and balancing via box constrained newton’s method and interior point methods. 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS, 2017. URL: http://arxiv.org/abs/1704.02310.
  10. Imre Csiszar. I-divergence geometry of probability distributions and minimization problems. Ann. Probab., 3(1):146-158, 02 1975. URL: http://dx.doi.org/10.1214/aop/1176996454.
  11. Imre Csiszar. A geometric interpretation of darroch and ratcliff’s generalized iterative scaling. Ann. Statist., 17(3):1409-1413, 09 1989. URL: http://dx.doi.org/10.1214/aos/1176347279.
  12. W. Edwards Deming and Frederick F. Stephan. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann. Math. Statist., 11(4):427-444, 12 1940. URL: http://dx.doi.org/10.1214/aoms/1177731829.
  13. Joel Franklin and Jens Lorenz. On the scaling of multidimensional matrices. Linear Algebra and its Applications, 114:717-735, 1989. Special Issue Dedicated to Alan J. Hoffman. URL: http://dx.doi.org/10.1016/0024-3795(89)90490-4.
  14. Leonid Gurvits and Peter N. Yianilos. The deflation-inflation method for certain semidefinite programming and maximum determinant completion problems. Technical report, NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, 1998. Google Scholar
  15. Martin Idel. A review of matrix scaling and Sinkhorn’s normal form for matrices and positive maps. ArXiv e-prints, 2016. URL: http://arxiv.org/abs/1609.06349.
  16. Bahman Kalantari and Leonid Khachiyan. On the rate of convergence of deterministic and randomized RAS matrix scaling algorithms. Oper. Res. Lett., 14(5):237-244, 1993. URL: http://dx.doi.org/10.1016/0167-6377(93)90087-W.
  17. Bahman Kalantari and Leonid Khachiyan. On the complexity of nonnegative-matrix scaling. Linear Algebra and its Applications, 240:87-103, 1996. URL: http://dx.doi.org/10.1016/0024-3795(94)00188-X.
  18. Bahman Kalantari, Isabella Lari, Federica Ricca, and Bruno Simeone. On the complexity of general matrix scaling and entropy minimization via the RAS algorithm. Technical Report, n.24, Department of Statistics and Applied probability, La Sapienza University, Rome, 2002. Google Scholar
  19. Bahman Kalantari, Isabella Lari, Federica Ricca, and Bruno Simeone. On the complexity of general matrix scaling and entropy minimization via the RAS algorithm. Math. Program., 112(2):371-401, 2008. URL: http://dx.doi.org/10.1007/s10107-006-0021-4.
  20. Nathan Linial, Alex Samorodnitsky, and Avi Wigderson. A deterministic strongly polynomial algorithm for matrix scaling and approximate permanents. Combinatorica, 20(4):545-568, 2000. URL: http://dx.doi.org/10.1007/s004930070007.
  21. Michael Luby and Noam Nisan. A parallel approximation algorithm for positive linear programming. In S. Rao Kosaraju, David S. Johnson, and Alok Aggarwal, editors, Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, May 16-18, 1993, San Diego, CA, USA, pages 448-457. ACM, 1993. URL: http://dx.doi.org/10.1145/167088.167211.
  22. Sally M Macgill. Theoretical properties of biproportional matrix adjustments. Environment and Planning A, 9(6):687-701, 1977. URL: http://dx.doi.org/10.1068/a090687.
  23. Michael W. Mahoney, Satish Rao, Di Wang, and Peng Zhang. Approximating the Solution to Mixed Packing and Covering LPs in Parallel O(ε^-3) Time. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, pages 52:1-52:14, 2016. Google Scholar
  24. MV Menon. Reduction of a matrix with positive elements to a doubly stochastic matrix. Proceedings of the American Mathematical Society, 18(2):244-247, 1967. Google Scholar
  25. Arkadi Nemirovskii and Uriel Rothblum. On complexity of matrix scaling. Linear Algebra and its Applications., 302:435-460, 1999. Google Scholar
  26. Juan de Dios Ortúzar and Luis G. Willumsen. Modelling Transport. John Wiley &Sons, Ltd, 2011. URL: http://dx.doi.org/10.1002/9781119993308.fmatter.
  27. T.E.S. Raghavan. On pairs of multidimensional matrices. Linear Algebra and its Applications, 62:263-268, 1984. URL: http://dx.doi.org/10.1016/0024-3795(84)90101-0.
  28. Günter Rote and Martin Zachariasen. Matrix scaling by network flow. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07, pages 848-854, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=1283383.1283474.
  29. Uriel Rothblum and Hans Schneider. Scalings of matrices which have prespecified row sums and column sums via optimization. Linear Algebra and its Applications, 114:737-764, 1989. Special Issue Dedicated to Alan J. Hoffman. Google Scholar
  30. Ludger Ruschendorf. Convergence of the iterative proportional fitting procedure. Ann. Statist., 23(4):1160-1174, 1995. URL: http://dx.doi.org/10.1214/aos/1176324703.
  31. Igal Sason and Sergio Verdú. Upper bounds on the relative entropy and rényi divergence as a function of total variation distance for finite alphabets. In 2015 IEEE Information Theory Workshop - Fall (ITW), Jeju Island, South Korea, October 11-15, 2015, pages 214-218. IEEE, 2015. URL: http://dx.doi.org/10.1109/ITWF.2015.7360766.
  32. Erwin Schrödinger. Über die umkehrung der naturgesetze. Preuss. Akad. Wiss., Phys.-Math. Kl., 1931. Google Scholar
  33. Richard Sinkhorn. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4):402-405, 1967. URL: http://www.jstor.org/stable/2314570.
  34. Richard Sinkhorn and Paul Knopp. Concerning nonnegative matrices and doubly stochastic matrices. Pacific J. Math., 21(2):343-348, 1967. URL: https://projecteuclid.org:443/euclid.pjm/1102992505.
  35. George W. Soules. The rate of convergence of sinkhorn balancing. Linear Algebra and its Applications, 150:3-40, 1991. URL: http://dx.doi.org/10.1016/0024-3795(91)90157-R.
  36. Neal E. Young. Sequential and parallel algorithms for mixed packing and covering. In 42nd Annual Symposium on Foundations of Computer Science, FOCS, Las Vegas, Nevada, USA, pages 538-546, 2001. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail