Jayaram, Rajesh ;
Saha, Barna
Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!
Abstract
In 1975, a breakthrough result of L. Valiant showed that parsing context free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of O(n^omega) for parsing where omega <= 2.373 is the exponent of fast matrix multiplication, and n is the string length. Recently, Abboud, Backurs and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial o(n^3) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem, which computes the minimum edit distance of a given string (using insertions, deletions, and substitutions) to any valid string in the language, and has received significant attention both in theory and practice since the seminal work of Aho and Peterson in 1972. Clearly, the lower bound for parsing rules out any algorithm running in o(n^omega) time that can return a nontrivial multiplicative approximation of the language edit distance problem. Furthermore, combinatorial algorithms with cubic running time or algorithms that use fast matrix multiplication are often not desirable in practice.
To break this n^omega hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms to obtain a string with minimum edit distance with performance dependencies on either the number of nonlinear productions, k^*, or the number of nested nonlinear production, k, used in the optimal derivation. Explicitly, we give an additive O(k^*gamma) approximation in time O(G(n^2 + (n/gamma)^3)) and an additive O(k gamma) approximation in time O(G(n^2 + (n^3/gamma^2))), where G is the grammar size and n is the string length. In particular, we obtain tight approximations for an important subclass of context free grammars known as ultralinear grammars, for which k and k^* are naturally bounded. Interestingly, we show that the same conditional lower bound for parsing context free grammars holds for the class of ultralinear grammars as well, clearly marking the boundary where parsing becomes hard!
BibTeX  Entry
@InProceedings{jayaram_et_al:LIPIcs:2017:7454,
author = {Rajesh Jayaram and Barna Saha},
title = {{Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!}},
booktitle = {44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
pages = {19:119:15},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770415},
ISSN = {18688969},
year = {2017},
volume = {80},
editor = {Ioannis Chatzigiannakis and Piotr Indyk and Fabian Kuhn and Anca Muscholl},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2017/7454},
URN = {urn:nbn:de:0030drops74548},
doi = {10.4230/LIPIcs.ICALP.2017.19},
annote = {Keywords: Approximation, Edit Distance, Dynamic Programming, Context Free Grammar, Hardness}
}
07.07.2017
Keywords: 

Approximation, Edit Distance, Dynamic Programming, Context Free Grammar, Hardness 
Seminar: 

44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

Issue date: 

2017 
Date of publication: 

07.07.2017 