The Complexity of Problems in P Given Correlated Instances

Language: eng. Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik. Series: Leibniz International Proceedings in Informatics, ISSN 1868-8969. Published 2017-11-28. Pages 13:1-13:19. DOI 10.4230/LIPIcs.ITCS.2017.13. Document type: article.

<records>

<record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Leibniz International Proceedings in Informatics</journalTitle>

<doi>10.4230/LIPIcs.ITCS.2017.13</doi>

<documentType>article</documentType>

<title language="eng">The Complexity of Problems in P Given Correlated Instances</title>

<authors>

<author>

<name>Goldwasser, Shafi</name>

</author>

<author>

<name>Holden, Dhiraj</name>

</author>

</authors>

<abstract language="eng">Instances of computational problems do not exist in isolation. Rather, multiple correlated instances of the same problem arise naturally in the real world. The challenge is how to gain computationally from correlations when they can be found. [DGH, ITCS 2015] showed that significant computational gains can be made by having access to auxiliary instances that are correlated with the primary problem instance via the solution space. They demonstrate this for constraint satisfaction problems, which are NP-hard in their general worst-case form. Here, we set out to study the impact of having access to correlated instances on the complexity of polynomial-time problems. Namely, for a problem P that is conjectured to require time n^c for some c > 0, we ask whether access to a few instances of P that are correlated in some natural way can be used to solve P on one of them (the designated "primary instance") faster than the conjectured lower bound of n^c. We focus our attention on a number of problems: the Longest Common Subsequence (LCS), the minimum Edit Distance (EDIT) between sequences, and the Dynamic Time Warping Distance (DTWD) of curves, for all of which the best known algorithms achieve O(n^2/polylog(n)) runtime via dynamic programming. These problems form an interesting case in point, as it has been shown that an O(n^(2 - epsilon)) time algorithm for worst-case instances would imply improved algorithms for a host of other problems and would disprove complexity hypotheses such as the Strong Exponential Time Hypothesis. We show how to use access to a logarithmic number of auxiliary correlated instances to design novel o(n^2) time algorithms for LCS, EDIT, and DTWD and, more generally, improved algorithms for computing any tuple-based similarity measure, a generalization on strings that we define in this work. For the multiple sequence alignment problem on k strings, this yields an O(nk log n) algorithm, in contrast to the classical O(n^k) dynamic programming.
Our results hold for several correlation models between the primary and the auxiliary instances. In the most general correlation model we address, we assume that the primary instance is a worst-case instance and that the auxiliary instances are chosen uniformly at random subject to the constraint that their alignments are epsilon-correlated with the optimal alignment of the primary instance. We emphasize that optimal solutions for the auxiliary instances will not generally coincide with optimal solutions for the worst-case primary instance. We view our work as pointing out a new avenue toward significant improvements for sequence alignment problems and for computing similarity measures, by taking advantage of access to sequences that are correlated through natural generating processes. In this first work we show how to take advantage of simple, clean, mathematically inspired models of correlation. The intriguing question, looking forward, is to find correlation models that coincide with evolutionary models and other relationships and for which our approach to multiple sequence alignment gives provable guarantees.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/00lipics/lipics-vol067-itcs2017/LIPIcs.ITCS.2017.13/LIPIcs.ITCS.2017.13.pdf</fullTextUrl>

<keywords>

<keyword>Correlated instances</keyword>

<keyword>Longest Common Subsequence</keyword>

<keyword>Fine-grained complexity</keyword>

</keywords>

</record>

</records>