Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!

Authors Rajesh Jayaram, Barna Saha



Cite As

Rajesh Jayaram and Barna Saha. Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 19:1-19:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.ICALP.2017.19

Abstract

In 1975, a breakthrough result of L. Valiant showed that parsing context free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of O(n^ω) for parsing, where ω ≤ 2.373 is the exponent of fast matrix multiplication and n is the string length. Recently, Abboud, Backurs and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial o(n^3) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem, which computes the minimum edit distance of a given string (using insertions, deletions, and substitutions) to any valid string in the language, and has received significant attention both in theory and practice since the seminal work of Aho and Peterson in 1972. Clearly, the lower bound for parsing rules out any algorithm running in o(n^ω) time that can return a nontrivial multiplicative approximation of the language edit distance problem. Furthermore, combinatorial algorithms with cubic running time or algorithms that use fast matrix multiplication are often not desirable in practice.
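To make the problem statement above concrete, here is a minimal sketch of an exact cubic-time dynamic program for language edit distance. It is a textbook-style baseline and not the algorithm of this paper. It assumes the grammar is given in Chomsky normal form without epsilon productions; the function name, the rule encoding, and the small balanced-parentheses grammar are all hypothetical choices made for illustration.

```python
from math import inf


def language_edit_distance(s, terminal_rules, binary_rules, start):
    """Minimum number of insertions, deletions and substitutions needed to
    turn s into some string derivable from `start`.

    Assumed (hypothetical) encoding of a CNF grammar:
      terminal_rules: list of (A, a)    for productions A -> a
      binary_rules:   list of (A, B, C) for productions A -> B C
    """
    n = len(s)
    nts = {A for A, _ in terminal_rules} | {X for r in binary_rules for X in r}

    # E[A] = length of a shortest string derivable from A, i.e. the cost of
    # matching A against an empty span (everything must be inserted).
    E = {A: inf for A in nts}
    for A, _ in terminal_rules:
        E[A] = 1
    changed = True
    while changed:
        changed = False
        for A, B, C in binary_rules:
            if E[B] + E[C] < E[A]:
                E[A] = E[B] + E[C]
                changed = True

    D = {}  # D[A, i, j] = edit distance between s[i:j] and the language of A
    for length in range(n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 0:
                cur = dict(E)
            else:
                cur = {A: inf for A in nts}
                # A -> a: keep one matching symbol if present, delete the rest.
                for A, a in terminal_rules:
                    cost = length - 1 + (0 if a in s[i:j] else 1)
                    cur[A] = min(cur[A], cost)
                # A -> B C with both parts covering a non-empty piece of s[i:j].
                for A, B, C in binary_rules:
                    for k in range(i + 1, j):
                        cur[A] = min(cur[A], D[B, i, k] + D[C, k, j])
                # A -> B C where one side matches the empty span; these terms
                # refer to the same span, so relax them until nothing changes.
                changed = True
                while changed:
                    changed = False
                    for A, B, C in binary_rules:
                        best = min(cur[B] + E[C], E[B] + cur[C])
                        if best < cur[A]:
                            cur[A] = best
                            changed = True
            for A in nts:
                D[A, i, j] = cur[A]
    return D[start, 0, n]


# Hypothetical example grammar: non-empty balanced parentheses in CNF.
DYCK_TERMINAL_RULES = [("L", "("), ("R", ")")]
DYCK_BINARY_RULES = [("S", "S", "S"), ("S", "L", "R"), ("S", "L", "T"), ("T", "S", "R")]
```

With this grammar, `language_edit_distance("(()", DYCK_TERMINAL_RULES, DYCK_BINARY_RULES, "S")` evaluates to 1, since a single deletion or insertion balances the string. The triple loop over spans and split points is the source of the cubic running time that the additive approximations discussed in this abstract aim to avoid.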

To break this n^ω hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms to obtain a string with minimum edit distance with performance dependencies on either the number of non-linear productions, k^*, or the number of nested non-linear productions, k, used in the optimal derivation. Explicitly, we give an additive O(k^* γ) approximation in time O(|G|(n^2 + (n/γ)^3)) and an additive O(k γ) approximation in time O(|G|(n^2 + (n^3/γ^2))), where |G| is the grammar size and n is the string length. In particular, we obtain tight approximations for an important subclass of context free grammars known as ultralinear grammars, for which k and k^* are naturally bounded. Interestingly, we show that the same conditional lower bound for parsing context free grammars holds for the class of ultralinear grammars as well, clearly marking the boundary where parsing becomes hard!
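As a worked illustration of the two time bounds quoted above, one can ask which choice of the parameter gamma makes the cubic-looking term match the n^2 term. The parameter choices below are illustrative arithmetic on the stated bounds, not results claimed in the abstract.

```latex
% First algorithm: additive O(k^* \gamma) error, time O(|G|(n^2 + (n/\gamma)^3)).
\[
  \gamma = n^{1/3}
  \;\Longrightarrow\;
  \left(\frac{n}{\gamma}\right)^{3} = \left(n^{2/3}\right)^{3} = n^{2},
  \qquad
  \text{time } O(|G|\, n^{2}),\quad \text{additive error } O(k^{*} n^{1/3}).
\]
% Second algorithm: additive O(k \gamma) error, time O(|G|(n^2 + n^3/\gamma^2)).
\[
  \gamma = n^{1/2}
  \;\Longrightarrow\;
  \frac{n^{3}}{\gamma^{2}} = \frac{n^{3}}{n} = n^{2},
  \qquad
  \text{time } O(|G|\, n^{2}),\quad \text{additive error } O(k \sqrt{n}).
\]
```

Smaller values of gamma give a smaller additive error at the price of a running time closer to cubic, while larger values cannot push the running time below the O(|G| n^2) term.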

Subject Classification

Keywords
  • Approximation
  • Edit Distance
  • Dynamic Programming
  • Context Free Grammar
  • Hardness

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. If the current clique algorithms are optimal, so is Valiant's parser. In FOCS 2015, 2015.
  2. Alfred V. Aho and John E. Hopcroft. The Design and Analysis of Computer Algorithms. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1974.
  3. Alfred V. Aho and Thomas G. Peterson. A minimum distance error-correcting parser for context-free languages. SIAM J. Comput., 1(4), 1972.
  4. Rolf Backofen, Dekel Tsur, Shay Zakov, and Michal Ziv-Ukelson. Sparse RNA Folding: Time and Space Efficient Algorithms. In Annual Symposium on Combinatorial Pattern Matching, pages 249-262. Springer, 2009.
  5. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In STOC 2015, 2015.
  6. Arturs Backurs and Krzysztof Onak. Fast algorithms for parsing sequences of parentheses with few errors. In PODS, 2016.
  7. Ulrike Brandt and Ghislain Delepine. Weight-reducing grammars and ultralinear languages. RAIRO-Theoretical Informatics and Applications, 38(1):19-25, 2004.
  8. Karl Bringmann, Fabrizio Grandoni, Barna Saha, and Virginia V. Williams. Truly sub-cubic algorithms for language edit distance and RNA folding via fast bounded-difference min-plus product. In FOCS 2016, 2016.
  9. Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In FOCS 2015, 2015.
  10. J. A. Brzozowski. Regular-like expressions for some irregular languages. In IEEE Annual Symposium on Switching and Automata Theory, 1968.
  11. Noam Chomsky. On certain formal properties of grammars. Information and control, 2(2):137-167, 1959.
  12. Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13, 1970.
  13. Sheila A. Greibach. The unsolvability of the recognition of linear context-free languages. Journal of the ACM (JACM), 13(4):582-587, 1966.
  14. Steven Grijzenhout and Maarten Marx. The quality of the XML web. Web Semant., 19, 2013.
  15. R. R. Gutell, J. J. Cannone, Z. Shang, Y. Du, and M. J. Serra. A story: unpaired adenosine bases in ribosomal RNAs. Journal of Mol Biology, 2010.
  16. John E. Hopcroft and Jeffrey D. Ullman. Formal languages and their relation to automata. Addison-Wesley Longman Publishing Co., Inc., 1969.
  17. O. H. Ibarra and T. Jiang. On one-way cellular arrays. SIAM J. Comput., 16, 1987.
  18. Russell Impagliazzo and Ramamohan Paturi. Complexity of k-SAT. In CCC 1999, pages 237-240, 1999.
  19. Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? In FOCS 1998, pages 653-662, 1998.
  20. Mark Johnson. PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names. In ACL 2010, pages 1148-1157, 2010.
  21. Ik-Soon Kim and Kwang-Moo Choe. Error repair with validation in LR-based parsing. ACM Trans. Program. Lang. Syst., 23(4), July 2001.
  22. Donald E. Knuth. Semantics of context-free languages. Mathematical systems theory, 2(2):127-145, 1968.
  23. Flip Korn, Barna Saha, Divesh Srivastava, and Shanshan Ying. On repairing structural problems in semi-structured data. In VLDB 2013, 2013.
  24. Martin Kutrib and Andreas Malcher. Finite turns and the regular closure of linear context-free languages. Discrete Applied Mathematics, 155(5), October 2007.
  25. Lillian Lee. Fast context-free grammar parsing requires fast boolean matrix multiplication. J. ACM, 49, 2002.
  26. Andreas Malcher and Giovanni Pighizzini. Descriptional complexity of bounded context-free languages. Information and Computation, 227, June 2013.
  27. Christopher D. Manning. Foundations of statistical natural language processing, volume 999. MIT Press, 1999.
  28. Darnell Moore and Irfan Essa. Recognizing multitasked activities from video using stochastic context-free grammar. In NCAI 2002, pages 770-776, 2002.
  29. E. Moriya and T. Tada. On the space complexity of turn bounded pushdown automata. Internat. J. Comput, 80:295-304, 2003.
  30. Gene Myers. Approximately matching context-free languages. Information Processing Letters, 54, 1995.
  31. Geoffrey K. Pullum and Gerald Gazdar. Natural languages and context-free languages. Linguistics and Philosophy, 4(4), 1982.
  32. Sanguthevar Rajasekaran and Marius Nicolae. An error correcting parser for context free grammars that takes less than cubic time. Manuscript, 2014.
  33. Andrea Rosani, Nicola Conci, and Francesco G. De Natale. Human behavior recognition using a context-free grammar. Journal of Electronic Imaging, 23(3), 2014.
  34. Barna Saha. The Dyck language edit distance problem in near-linear time. In FOCS 2014, pages 611-620, 2014.
  35. Barna Saha. Language edit distance and maximum likelihood parsing of stochastic grammars: Faster algorithms and connection to fundamental graph problems. In FOCS 2015, pages 118-135, 2015.
  36. Jose M. Sempere and Pedro Garcia. A characterization of even linear languages and its application to the learning problem. In International Colloquium on Grammatical Inference, pages 38-44. Springer, 1994.
  37. Jeffrey Ullman and John Hopcroft. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 2001.
  38. Leslie G. Valiant. General context-free recognition in less than cubic time. Journal of computer and system sciences, 10(2), 1975.
  39. Balaji Venkatachalam, Dan Gusfield, and Yelena Frid. Faster Algorithms for RNA-Folding Using the four-Russians Method. In WABI 2013, 2013.
  40. Robert A. Wagner. Order-n correction for regular languages. Communications of the ACM, 17(5), 1974.
  41. Ye-Yi Wang, Milind Mahajan, and Xuedong Huang. A unified context-free grammar and n-gram model for spoken language processing. In ICASP 2000, pages 1639-1642, 2000.
  42. Glynn Winskel. The formal semantics of programming languages: an introduction, 1993.
  43. D. A. Workman. Turn-bounded grammars and their relation to ultralinear languages. Inform. and Control, 32:188-200, 1976.
  44. Shay Zakov, Dekel Tsur, and Michal Ziv-Ukelson. Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach. In WABI 2010, 2010.