Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!

Jayaram, Rajesh; Saha, Barna

doi:10.4230/LIPIcs.ICALP.2017.19

Cite As

Rajesh Jayaram and Barna Saha. Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 19:1-19:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ICALP.2017.19

Abstract

In 1975, a breakthrough result of L. Valiant showed that parsing context-free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of O(n^omega) for parsing, where omega <= 2.373 is the exponent of fast matrix multiplication and n is the string length. Recently, Abboud, Backurs and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial o(n^3) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem: it asks for the minimum edit distance (using insertions, deletions, and substitutions) from a given string to any valid string in the language. It has received significant attention both in theory and practice since the seminal work of Aho and Peterson in 1972. Clearly, the lower bound for parsing rules out any algorithm running in o(n^omega) time that can return a nontrivial multiplicative approximation of the language edit distance problem. Furthermore, combinatorial algorithms with cubic running time or algorithms that use fast matrix multiplication are often not desirable in practice. To break this n^omega hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms to obtain a string with minimum edit distance, with performance dependencies on either the number of non-linear productions, k^*, or the number of nested non-linear productions, k, used in the optimal derivation. Explicitly, we give an additive O(k^* gamma) approximation in time O(|G|(n^2 + (n/gamma)^3)) and an additive O(k gamma) approximation in time O(|G|(n^2 + n^3/gamma^2)), where |G| is the grammar size and n is the string length. In particular, we obtain tight approximations for an important subclass of context-free grammars known as ultralinear grammars, for which k and k^* are naturally bounded.
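Reading the two stated bounds as functions of gamma, one way to see the trade-off is to pick the smallest gamma at which the cubic term stops dominating n^2. The particular choices of gamma below are illustrative arithmetic on the bounds quoted above, not settings taken from the paper.

```latex
\begin{align*}
T_1(\gamma) &= O\!\big(|G|\,(n^2 + (n/\gamma)^3)\big),
  & \gamma = n^{1/3} &\;\Longrightarrow\; T_1 = O(|G|\,n^2),
  \quad \text{additive error } O(k^{*}\, n^{1/3}),\\
T_2(\gamma) &= O\!\big(|G|\,(n^2 + n^3/\gamma^2)\big),
  & \gamma = n^{1/2} &\;\Longrightarrow\; T_2 = O(|G|\,n^2),
  \quad \text{additive error } O(k\, n^{1/2}).
\end{align*}
```

Under these choices both algorithms run in time quadratic in n, at the cost of an additive error that grows with k^* or k respectively; a smaller gamma reduces the error and moves the running time back toward cubic.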
Interestingly, we show that the same conditional lower bound for parsing context-free grammars holds for the class of ultralinear grammars as well, clearly marking the boundary where parsing becomes hard!
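To make the problem statement concrete, the following is a minimal sketch of the exact cubic-time baseline: a CYK-style dynamic program over substrings for a grammar in Chomsky normal form. It is not either of the paper's approximation algorithms. The function name `language_edit_distance`, the rule encoding, and the small parenthesis grammar are all assumptions made for illustration.

```python
from math import inf


def language_edit_distance(s, terminals, binaries, start):
    """Exact language edit distance for a grammar in Chomsky normal form.

    terminals: list of (A, a) for rules A -> a (a is a single character).
    binaries:  list of (A, B, C) for rules A -> B C.
    Returns the minimum number of insertions, deletions and substitutions
    turning s into some string derivable from `start`.
    """
    n = len(s)
    nonterminals = {A for A, _ in terminals} | {x for r in binaries for x in r}
    # d[i][j][A] = cheapest way to edit s[i:j] into a string derived from A.
    d = [[{A: inf for A in nonterminals} for _ in range(n + 1)]
         for _ in range(n + 1)]
    for length in range(n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell, sub = d[i][j], s[i:j]
            # A -> a: keep one matching character if present, delete the rest;
            # otherwise substitute (or insert, when the span is empty).
            for A, a in terminals:
                cost = max(length, 1) - (1 if a in sub else 0)
                cell[A] = min(cell[A], cost)
            # A -> B C with both halves covering a non-empty part of the span.
            for A, B, C in binaries:
                for k in range(i + 1, j):
                    cell[A] = min(cell[A], d[i][k][B] + d[k][j][C])
            # A -> B C where one half matches the empty string (pure insertion).
            # These cases refer back to the current cell, so relax to a fixpoint.
            changed = True
            while changed:
                changed = False
                for A, B, C in binaries:
                    best = min(d[i][i][B] + cell[C], cell[B] + d[j][j][C])
                    if best < cell[A]:
                        cell[A] = best
                        changed = True
    return d[0][n][start]


# Hypothetical example grammar: non-empty balanced parentheses in CNF.
PAREN_TERMINALS = [("L", "("), ("R", ")")]
PAREN_BINARIES = [("S", "S", "S"), ("S", "L", "R"), ("S", "L", "T"),
                  ("T", "S", "R")]

if __name__ == "__main__":
    for text in ["(()())", "(()", ")("]:
        print(text, language_edit_distance(text, PAREN_TERMINALS,
                                           PAREN_BINARIES, "S"))
```

The table has O(n^2) cells per nonterminal and each binary rule tries O(n) split points, which gives the cubic dependence on n that the additive approximations in the abstract are designed to avoid.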

References

Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. If the current clique algorithms are optimal, so is Valiant’s parser. In FOCS 2015, 2015.
Alfred V. Aho and John E. Hopcroft. The Design and Analysis of Computer Algorithms. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1974.
Alfred V. Aho and Thomas G. Peterson. A minimum distance error-correcting parser for context-free languages. SIAM J. Comput., 1(4), 1972.
Rolf Backofen, Dekel Tsur, Shay Zakov, and Michal Ziv-Ukelson. Sparse RNA Folding: Time and Space Efficient Algorithms. In Annual Symposium on Combinatorial Pattern Matching, pages 249-262. Springer, 2009.
Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In STOC 2015, 2015.
Arturs Backurs and Krzysztof Onak. Fast algorithms for parsing sequences of parentheses with few errors. In PODS, 2016.
Ulrike Brandt and Ghislain Delepine. Weight-reducing grammars and ultralinear languages. RAIRO-Theoretical Informatics and Applications, 38(1):19-25, 2004.
Karl Bringmann, Fabrizio Grandoni, Barna Saha, and Virginia V. Williams. Truly sub-cubic algorithms for language edit distance and RNA folding via fast bounded-difference min-plus product. In FOCS 2016, 2016.
Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In FOCS 2015, 2015.
J. A. Brzozowski. Regular-like expressions for some irregular languages. In IEEE Annual Symposium on Switching and Automata Theory, 1968.
Noam Chomsky. On certain formal properties of grammars. Information and control, 2(2):137-167, 1959.
Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13, 1970.
Sheila A. Greibach. The unsolvability of the recognition of linear context-free languages. Journal of the ACM (JACM), 13(4):582-587, 1966.
Steven Grijzenhout and Maarten Marx. The quality of the XML web. Web Semant., 19, 2013.
R. R. Gutell, J. J. Cannone, Z. Shang, Y. Du, and M. J. Serra. A story: unpaired adenosine bases in ribosomal RNAs. Journal of Molecular Biology, 2010.
John E. Hopcroft and Jeffrey D. Ullman. Formal languages and their relation to automata. Addison-Wesley Longman Publishing Co., Inc., 1969.
O. H. Ibarra and T. Jiang. On one-way cellular arrays. SIAM J. Comput., 16, 1987.
Russell Impagliazzo and Ramamohan Paturi. Complexity of k-SAT. In CCC 1999, pages 237-240, 1999.
Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? In FOCS 1998, pages 653-662, 1998.
Mark Johnson. PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names. In ACL 2010, pages 1148-1157, 2010.
Ik-Soon Kim and Kwang-Moo Choe. Error repair with validation in LR-based parsing. ACM Trans. Program. Lang. Syst., 23(4), July 2001.
Donald E Knuth. Semantics of context-free languages. Mathematical systems theory, 2(2):127-145, 1968.
Flip Korn, Barna Saha, Divesh Srivastava, and Shanshan Ying. On repairing structural problems in semi-structured data. In VLDB 2013, 2013.
Martin Kutrib and Andreas Malcher. Finite turns and the regular closure of linear context-free languages. Discrete Applied Mathematics, 155(5), October 2007.
Lillian Lee. Fast context-free grammar parsing requires fast boolean matrix multiplication. J. ACM, 49, 2002.
Andreas Malcher and Giovanni Pighizzini. Descriptional complexity of bounded context-free languages. Information and Computation, 227, June 2013.
Christopher D. Manning. Foundations of statistical natural language processing, volume 999. MIT Press, 1999.
Darnell Moore and Irfan Essa. Recognizing multitasked activities from video using stochastic context-free grammar. In NCAI 2002, pages 770-776, 2002.
E. Moriya and T. Tada. On the space complexity of turn bounded pushdown automata. Internat. J. Comput, 80:295-304, 2003.
Gene Myers. Approximately matching context-free languages. Information Processing Letters, 54, 1995.
Geoffrey K. Pullum and Gerald Gazdar. Natural languages and context-free languages. Linguistics and Philosophy, 4(4), 1982.
Sanguthevar Rajasekaran and Marius Nicolae. An error correcting parser for context free grammars that takes less than cubic time. Manuscript, 2014.
Andrea Rosani, Nicola Conci, and Francesco G. De Natale. Human behavior recognition using a context-free grammar. Journal of Electronic Imaging, 23(3), 2014.
Barna Saha. The Dyck language edit distance problem in near-linear time. In FOCS 2014, pages 611-620, 2014.
Barna Saha. Language edit distance and maximum likelihood parsing of stochastic grammars: Faster algorithms and connection to fundamental graph problems. In FOCS 2015, pages 118-135, 2015.
Jose M Sempere and Pedro Garcia. A characterization of even linear languages and its application to the learning problem. In International Colloquium on Grammatical Inference, pages 38-44. Springer, 1994.
Jeffrey Ullman and John Hopcroft. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 2001.
Leslie G. Valiant. General context-free recognition in less than cubic time. Journal of computer and system sciences, 10(2), 1975.
Balaji Venkatachalam, Dan Gusfield, and Yelena Frid. Faster Algorithms for RNA-Folding Using the four-Russians Method. In WABI 2013, 2013.
Robert A. Wagner. Order-n correction for regular languages. Communications of the ACM, 17(5), 1974.
Ye-Yi Wang, Milind Mahajan, and Xuedong Huang. A unified context-free grammar and n-gram model for spoken language processing. In ICASSP 2000, pages 1639-1642, 2000.
Glynn Winskel. The formal semantics of programming languages: an introduction, 1993.
D. A. Workman. Turn-bounded grammars and their relation to ultralinear languages. Inform. and Control, 32:188-200, 1976.
Shay Zakov, Dekel Tsur, and Michal Ziv-Ukelson. Reducing the worst case running times of a family of RNA and CFG problems, using Valiant’s approach. In WABI 2010, 2010.
