Optimal Spectral-Norm Approximate Minimization of Weighted Finite Automata

Authors Borja Balle, Clara Lacroce , Prakash Panangaden , Doina Precup, Guillaume Rabusseau

Thumbnail PDF


  • Filesize: 0.87 MB
  • 20 pages

Document Identifiers

Author Details

Borja Balle
  • DeepMind, London, UK
Clara Lacroce
  • School of Computer Science, McGill University, Montréal, Canada
  • Mila, Montréal, Canada
Prakash Panangaden
  • School of Computer Science, McGill University, Montréal, Canada
  • Mila, Montréal, Canada
Doina Precup
  • School of Computer Science, McGill University, Montréal, Canada
  • Mila, Montréal, Canada
Guillaume Rabusseau
  • DIRO, Université de Montréal, Montréal, Canada
  • CIFAR AI Chair, Mila, Montréal, Canada


The authors would like to thank Tianyu Li, Harsh Satija and Alessandro Sordoni for feedback on earlier drafts of this work, Gheorghe Comanici for a detailed review, and Maxime Wabartha for fruitful discussions and comments on proofs.

Cite AsGet BibTex

Borja Balle, Clara Lacroce, Prakash Panangaden, Doina Precup, and Guillaume Rabusseau. Optimal Spectral-Norm Approximate Minimization of Weighted Finite Automata. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 118:1-118:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


We address the approximate minimization problem for weighted finite automata (WFAs) with weights in ℝ, over a one-letter alphabet: to compute the best possible approximation of a WFA given a bound on the number of states. This work is grounded in Adamyan-Arov-Krein approximation theory, a remarkable collection of results on the approximation of Hankel operators. In addition to its intrinsic mathematical relevance, this theory has proven to be very effective for model reduction. We adapt these results to the framework of weighted automata over a one-letter alphabet. We provide theoretical guarantees and bounds on the quality of the approximation in the spectral and 𝓁² norm. We develop an algorithm that, based on the properties of Hankel operators, returns the optimal approximation in the spectral norm.

Subject Classification

ACM Subject Classification
  • Theory of computation → Quantitative automata
  • Theory of computation → Probabilistic computation
  • Theory of computation → Markov decision processes
  • Weighted finite automata
  • approximate minimization
  • Hankel matrices
  • AAK Theory


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Vadim M. Adamyan, Damir Zyamovich Arov, and Mark Grigorievich Krein. Analytic Properties of Schmidt Pairs for a Hankel Operator and the Generalized Schur-Takagi problem. Mathematics of The Ussr-sbornik, 15:31-73, 1971. Google Scholar
  2. M.M Al-Hussari, I.M. Jaimoukha, and D.J.N. Limebeer. A Descriptor Approach for the Solution of the One-Block Distance Problem. In In Proceedings of the IFAC World Congress, 1993. Google Scholar
  3. Athanasios C. Antoulas. Approximation of Large-Scale Dynamical Systems. SIAM, 2005. Google Scholar
  4. Stéphane Ayache, Rémi Eyraud, and Noé Goudian. Explaining Black Boxes on Sequential Data Using Weighted Automata. In Proceedings of the 14th International Conference on Grammatical Inference, ICGI 2018, Wrocław, Poland, September 5-7, 2018, volume 93 of Proceedings of Machine Learning Research, pages 81-103. PMLR, 2018. URL: http://proceedings.mlr.press/v93/ayache19a.html.
  5. Raphaël Bailly, François Denis, and Liva Ralaivola. Grammatical Inference as a Principal Component Analysis Problem. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 33-–40, New York, NY, USA, 2009. Association for Computing Machinery. URL: https://doi.org/10.1145/1553374.1553379.
  6. Joseph A. Ball and Andre CM. Ran. Optimal Hankel Norm Model Reductions and Wiener-Hopf Factorization I: The Canonical Case. SIAM Journal on Control and Optimization, 25(2):362-382, 1987. Google Scholar
  7. Borja Balle, Xavier Carreras, Franco M. Luque, and Ariadna Quattoni. Spectral Learning of Weighted Automata - A Forward-Backward Perspective. Mach. Learn., 96(1-2):33-63, 2014. URL: https://doi.org/10.1007/s10994-013-5416-x.
  8. Borja Balle, Pascale Gourdeau, and Prakash Panangaden. Bisimulation Metrics and Norms for Weighted Finite Automata. In Ioannis Chatzigiannakis, Piotr Indyk, Fabian Kuhn, and Anca Muscholl, editors, 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland, volume 80 of LIPIcs, pages 103:1-103:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.ICALP.2017.103.
  9. Borja Balle, William L. Hamilton, and Joelle Pineau. Methods of moments for learning stochastic languages: Unified presentation and empirical comparison. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, volume 32 of JMLR Workshop and Conference Proceedings, pages 1386-1394. JMLR.org, 2014. URL: http://proceedings.mlr.press/v32/balle14.html.
  10. Borja Balle, Clara Lacroce, Prakash Panangaden, Doina Precup, and Guillaume Rabusseau. Optimal Spectral-Norm Approximate Minimization of Weighted Finite Automata. arXiv preprint, 2021. URL: http://arxiv.org/abs/2102.06860.
  11. Borja Balle, Prakash Panangaden, and Doina Precup. A Canonical Form for Weighted Automata and Applications to Approximate Minimization. In 30th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2015, Kyoto, Japan, July 6-10, 2015, pages 701-712. IEEE Computer Society, 2015. URL: https://doi.org/10.1109/LICS.2015.70.
  12. Borja Balle, Prakash Panangaden, and Doina Precup. Singular Value Automata and Approximate Minimization. Math. Struct. Comput. Sci., 29(9):1444-1478, 2019. URL: https://doi.org/10.1017/S0960129519000094.
  13. Borja Balle and Guillaume Rabusseau. Approximate minimization of weighted tree automata. Information and Computation, page 104654, 2020. Google Scholar
  14. R. H. Bartels and G. W. Stewart. Solution of the matrix equation ax + xb = c [f4]. Commun. ACM, 15(9):820–-826, 1972. Google Scholar
  15. Jean Berstel and Christophe Reutenauer. Noncommutative rational series with applications, volume 137. Cambridge University Press, 2011. Google Scholar
  16. J.W. Carlyle and A. Paz. Realizations by stochastic finite automata. Journal of Computer and System Sciences, 5(1):26-40, 1971. Google Scholar
  17. Charles K. Chui and Guanrong Chen. Discrete H^∞ Optimization With Applications in Signal Processing and Control Systems. Springer-Verlag, 1997. Google Scholar
  18. C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1:211-218, 1936. URL: https://doi.org/10.1007/BF02288367.
  19. Rémi Eyraud and Stéphane Ayache. Distillation of Weighted Automata from Recurrent Neural Networks Using a Spectral Approach. CoRR, abs/2009.13101, 2020. URL: http://arxiv.org/abs/2009.13101.
  20. M. Fliess. Matrice de Hankel. Journal de Mathématique Pures et Appliquées, 5:197-222, 1974. Google Scholar
  21. Keith Glover. All optimal Hankel-norm approximations of linear multivariable systems and their ℒ_∞-error bounds. International Journal of Control, 39(6):1115-1193, 1984. URL: https://doi.org/10.1080/00207178408933239.
  22. Guoxiang Gu. All optimal Hankel-norm approximations and their error bounds in discrete-time. International Journal of Control, 78(6):408-423, 2005. URL: https://doi.org/10.1080/00207170500110988.
  23. Daniel Hsu, Sham M. Kakade, and Tong Zhang. A spectral algorithm for learning hidden markov models. J. Comput. Syst. Sci., 78(5):1460–1480, 2012. URL: https://doi.org/10.1016/j.jcss.2011.12.025.
  24. Vlad Ionescu and Cristian Oara. The four-block Adamjan-Arov-Kein problem for discrete-time systems. In Linear Algebra and its Application, pages 95-119. Elsevier, 2001. Google Scholar
  25. L. Kronecker. Zur Theorie der Elimination einer Variablen aus zwei algebraischen Gleichungen. Montasber. Königl. Preussischen Acad Wies, pages 535-600, 1881. Google Scholar
  26. Alex Kulesza, Nan Jiang, and Satinder Singh. Low-Rank Spectral Learning with Weighted Loss Functions. In Artificial Intelligence and Statistics, pages 517-525. PMLR, 2015. Google Scholar
  27. Alex Kulesza, N. Raj Rao, and Satinder Singh. Low-Rank Spectral Learning. In Samuel Kaski and Jukka Corander, editors, Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, volume 33 of Proceedings of Machine Learning Research, pages 522-530, Reykjavik, Iceland, April 22-25 2014. PMLR. URL: http://proceedings.mlr.press/v33/kulesza14.html.
  28. Aersity M Lyapunov. The General Problem of the Stability of Motion [in Russian]. Gostekhizdat, Moscow, 1950. Google Scholar
  29. Jean Meinguet. A Simplified Presentation of the Adamjan-Arov-Krein Approximation Theory. In H. Werner, L. Wuytack, E. Ng, and H. J. Bünger, editors, Computational Aspects of Complex Analysis, pages 217-248, Dordrecht, 1983. Springer Netherlands. URL: https://doi.org/10.1007/978-94-009-7121-9_9.
  30. Zeev Nehari. On Bounded Bilinear Forms. Annals of Mathematics, 65(1):153-162, 1957. Google Scholar
  31. Nikolai K. Nikol'Skii. Operators, Functions and Systems: An Easy Reading, volume 92 of Mathematical Surveys and Monographs. American Mathematical Society, 2002. Google Scholar
  32. Takamasa Okudono, Masaki Waga, Taro Sekiyama, and Ichiro Hasuo. Weighted Automata Extraction from Recurrent Neural Networks via Regression on State Spaces. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pages 5306-5314. AAAI Press, 2020. URL: https://aaai.org/ojs/index.php/AAAI/article/view/5977.
  33. Alexander Ostrowski and Hans Schneider. Some Theorems on the Inertia of General Matrices. J. Math. Anal. Appl, 4(1):72-84, 1962. Google Scholar
  34. Vladimir Peller. Hankel Operators and their Applications. Springer Science & Business Media, 2012. Google Scholar
  35. Gelu Popescu. Multivariable Nehari Problem and Interpolation. Journal of Functional Analysis, 200:536-581, 2003. URL: https://doi.org/10.1016/S0022-1236(03)00078-8.
  36. Guillaume Rabusseau, Borja Balle, and Joelle Pineau. Multitask Spectral Learning of Weighted Automata. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 2585-2594, 2017. Google Scholar
  37. Guillaume Rabusseau, Tianyu Li, and Doina Precup. Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning. In Kamalika Chaudhuri and Masashi Sugiyama, editors, The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 16-18 April 2019, Naha, Okinawa, Japan, volume 89 of Proceedings of Machine Learning Research, pages 1630-1639. PMLR, 2019. URL: http://proceedings.mlr.press/v89/rabusseau19a.html.
  38. William E. Roth. The equations ax - yb = c and ax - xb = c in matrices. Proceedings of the American Mathematical Society, 3(3):392-396, 1952. Google Scholar
  39. B.De Schutter. Minimal State-Space Realization in Linear System Theory: an Overview. Journal of Computational and Applied Mathematics, 121(1):331-354, 2000. URL: https://doi.org/10.1016/S0377-0427(00)00341-1.
  40. Lloyd N Trefethen and David Bau III. Numerical linear algebra, volume 50. Siam, 1997. Google Scholar
  41. Gail Weiss, Yoav Goldberg, and Eran Yahav. Learning Deterministic Weighted Automata with Queries and Counterexamples. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 8558-8569, 2019. URL: https://proceedings.neurips.cc/paper/2019/hash/d3f93e7766e8e1b7ef66dfdd9a8be93b-Abstract.html.
  42. H.K Wimmer. On the Ostrowski-Schneider Inertia Theorem. Journal of Mathematical Analysis and Applications, 41(1):164-169, 1973. URL: https://doi.org/10.1016/0022-247X(73)90190-X.
  43. Kehe Zhu. Operator Theory in Function Spaces, volume 138. American Mathematical Society, 1990. Google Scholar