License
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICALP.2017.19
URN: urn:nbn:de:0030-drops-74548
URL: http://drops.dagstuhl.de/opus/volltexte/2017/7454/
Go to the corresponding LIPIcs Volume Portal


Jayaram, Rajesh ; Saha, Barna

Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!

pdf-format:
LIPIcs-ICALP-2017-19.pdf (0.7 MB)


Abstract

In 1975, a breakthrough result of L. Valiant showed that parsing context free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of O(n^omega) for parsing where omega <= 2.373 is the exponent of fast matrix multiplication, and n is the string length. Recently, Abboud, Backurs and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial o(n^3) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem, which computes the minimum edit distance of a given string (using insertions, deletions, and substitutions) to any valid string in the language, and has received significant attention both in theory and practice since the seminal work of Aho and Peterson in 1972. Clearly, the lower bound for parsing rules out any algorithm running in o(n^omega) time that can return a nontrivial multiplicative approximation of the language edit distance problem. Furthermore, combinatorial algorithms with cubic running time or algorithms that use fast matrix multiplication are often not desirable in practice. To break this n^omega hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms to obtain a string with minimum edit distance with performance dependencies on either the number of non-linear productions, k^*, or the number of nested non-linear production, k, used in the optimal derivation. Explicitly, we give an additive O(k^*gamma) approximation in time O(|G|(n^2 + (n/gamma)^3)) and an additive O(k gamma) approximation in time O(|G|(n^2 + (n^3/gamma^2))), where |G| is the grammar size and n is the string length. In particular, we obtain tight approximations for an important subclass of context free grammars known as ultralinear grammars, for which k and k^* are naturally bounded. Interestingly, we show that the same conditional lower bound for parsing context free grammars holds for the class of ultralinear grammars as well, clearly marking the boundary where parsing becomes hard!

BibTeX - Entry

@InProceedings{jayaram_et_al:LIPIcs:2017:7454,
  author =	{Rajesh Jayaram and Barna Saha},
  title =	{{Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{19:1--19:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Ioannis Chatzigiannakis and Piotr Indyk and Fabian Kuhn and Anca Muscholl},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2017/7454},
  URN =		{urn:nbn:de:0030-drops-74548},
  doi =		{10.4230/LIPIcs.ICALP.2017.19},
  annote =	{Keywords:  Approximation, Edit Distance, Dynamic Programming,  Context Free Grammar, Hardness}
}

Keywords: Approximation, Edit Distance, Dynamic Programming, Context Free Grammar, Hardness
Seminar: 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)
Issue Date: 2017
Date of publication: 06.07.2017


DROPS-Home | Fulltext Search | Imprint Published by LZI