Polynomial-Time Trace Reconstruction in the Low Deletion Rate Regime
In the trace reconstruction problem, an unknown source string x ∈ {0,1}ⁿ is transmitted through a probabilistic deletion channel which independently deletes each bit with some fixed probability δ and concatenates the surviving bits, resulting in a trace of x. The problem is to reconstruct x given access to independent traces. Trace reconstruction of arbitrary (worst-case) strings is a challenging problem, with the current state of the art for poly(n)-time algorithms being the 2004 algorithm of Batu et al. [T. Batu et al., 2004]. This algorithm can reconstruct an arbitrary source string x ∈ {0,1}ⁿ in poly(n) time provided that the deletion rate δ satisfies δ ≤ n^{-(1/2 + ε)} for some ε > 0.
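The deletion channel described above can be simulated in a few lines. The following is a minimal sketch for illustration only, not the reconstruction algorithm of the paper; the function names `deletion_channel`, `sample_traces`, and `is_subsequence` and the fixed seed are assumptions introduced here.

```python
import random

def deletion_channel(x, delta, rng):
    """Return one trace of x: each bit is deleted independently
    with probability delta, and the surviving bits are concatenated."""
    return "".join(bit for bit in x if rng.random() >= delta)

def sample_traces(x, delta, num_traces, seed=0):
    """Draw num_traces independent traces of x (seeded for repeatability)."""
    rng = random.Random(seed)
    return [deletion_channel(x, delta, rng) for _ in range(num_traces)]

def is_subsequence(trace, x):
    """Check that trace can be obtained from x by deleting bits."""
    it = iter(x)
    return all(bit in it for bit in trace)

if __name__ == "__main__":
    n = 64
    source_rng = random.Random(1)
    x = "".join(source_rng.choice("01") for _ in range(n))
    delta = n ** -0.5  # an illustrative low deletion rate
    for t in sample_traces(x, delta, num_traces=3):
        print(len(t), t)
```

Every trace is a subsequence of x, and its length is distributed as Binomial(n, 1 − δ), so at low deletion rates most traces differ from x in only a few positions.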
In this work we improve on the result of [T. Batu et al., 2004] by giving a poly(n)-time algorithm for trace reconstruction for any deletion rate δ ≤ n^{-(1/3 + ε)}. Our algorithm works by alternating an alignment-based procedure, which we show effectively reconstructs portions of the source string that are not "highly repetitive", with a novel procedure that efficiently determines the length of highly repetitive subwords of the source string.
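To put the two thresholds side by side, it may help to look at the expected number of deleted bits per trace, which is δn by linearity of expectation. The short calculation below is only an illustration of the stated bounds and is not taken from the paper's analysis.

```latex
\mathbb{E}[\#\text{deleted bits}] = \delta n, \qquad
\delta = n^{-(1/2+\varepsilon)} \;\Longrightarrow\; \delta n = n^{1/2-\varepsilon}, \qquad
\delta = n^{-(1/3+\varepsilon)} \;\Longrightarrow\; \delta n = n^{2/3-\varepsilon}.
```

For example, with n = 10^6 and ε close to 0, the earlier bound tolerates just under 10^3 expected deletions per trace, while the new bound tolerates just under 10^4.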
trace reconstruction
Mathematics of computing → Probabilistic inference problems
20:1-20:20
Regular Paper
Full Version: https://arxiv.org/abs/2012.02844
Xi Chen
Columbia University, New York, NY, USA
http://www.cs.columbia.edu/~xichen
Supported by NSF grants CCF-1703925 and IIS-1838154.
Anindya De
University of Pennsylvania, Philadelphia, PA, USA
https://www.seas.upenn.edu/~anindyad/
Supported by NSF grants CCF-1926872 and CCF-1910534.
Chin Ho Lee
Columbia University, New York, NY, USA
https://www.cs.columbia.edu/~chlee/
Supported by a grant from the Croucher Foundation and by the Simons Collaboration on Algorithms and Geometry.
Rocco A. Servedio
Columbia University, New York, NY, USA
http://www.cs.columbia.edu/~rocco
Supported by NSF grants CCF-1814873, IIS-1838154, CCF-1563155, and by the Simons Collaboration on Algorithms and Geometry.
Sandip Sinha
Columbia University, New York, NY, USA
https://sites.google.com/view/sandips
https://orcid.org/0000-0002-2592-175X
Supported by NSF grants CCF-1714818, CCF-1822809, IIS-1838154, CCF-1617955, CCF-1740833, and by the Simons Collaboration on Algorithms and Geometry.
10.4230/LIPIcs.ITCS.2021.20
Alexandr Andoni, Constantinos Daskalakis, Avinatan Hassidim, and Sebastien Roch. Global alignment of molecular sequences via ancestral state reconstruction. Stochastic Processes and their Applications, 122(12):3852-3874, 2012.
T. Batu, S. Kannan, S. Khanna, and A. McGregor. Reconstructing strings from random traces. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pages 910-918, 2004.
Z. Chase. New lower bounds for trace reconstruction. CoRR, abs/1905.03031, 2019. URL: http://arxiv.org/abs/1905.03031.
Anindya De, Ryan O'Donnell, and Rocco A. Servedio. Optimal mean-based algorithms for trace reconstruction. In Proceedings of the 49th ACM Symposium on Theory of Computing (STOC), pages 1047-1056, 2017.
N. Holden and R. Lyons. Lower bounds for trace reconstruction. CoRR, abs/1808.02336, 2018. URL: http://arxiv.org/abs/1808.02336.
Nina Holden, Robin Pemantle, and Yuval Peres. Subpolynomial trace reconstruction for random strings and arbitrary deletion probability. CoRR, abs/1801.04783, 2018. URL: http://arxiv.org/abs/1801.04783.
Nina Holden, Robin Pemantle, Yuval Peres, and Alex Zhai. Subpolynomial trace reconstruction for random strings and arbitrary deletion probability. CoRR, abs/1801.04783, 2020. URL: http://arxiv.org/abs/1801.04783.
T. Holenstein, M. Mitzenmacher, R. Panigrahy, and U. Wieder. Trace reconstruction with constant deletion probability and related results. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, pages 389-398, 2008.
V. V. Kalashnik. Reconstruction of a word from its fragments. Computational Mathematics and Computer Science (Vychislitel'naya matematika i vychislitel'naya tekhnika), Kharkov, 4:56-57, 1973.
Vladimir Levenshtein. Efficient reconstruction of sequences. IEEE Transactions on Information Theory, 47(1):2-22, 2001.
Vladimir Levenshtein. Efficient reconstruction of sequences from their subsequences or supersequences. Journal of Combinatorial Theory Series A, 93(2):310-332, 2001.
Andrew McGregor, Eric Price, and Sofya Vorotnikova. Trace reconstruction revisited. In Proceedings of the 22nd Annual European Symposium on Algorithms, pages 689-700, 2014.
Michael Mitzenmacher. A survey of results for deletion channels and related synchronization channels. Probability Surveys, 6:1-33, 2009.
Fedor Nazarov and Yuval Peres. Trace reconstruction with exp(O(n^{1/3})) samples. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 1042-1046, 2017.
Lee Organick, Siena Dumas Ang, Yuan-Jyue Chen, Randolph Lopez, Sergey Yekhanin, Konstantin Makarychev, Miklos Z. Racz, Govinda Kamath, Parikshit Gopalan, Bichlien Nguyen, et al. Random access in large-scale DNA data storage. Nature Biotechnology, 36(3):242, 2018.
S. M. Hossein Tabatabaei Yazdi, Ryan Gabrys, and Olgica Milenkovic. Portable and error-free DNA-based data storage. Scientific Reports, 7(1):5011, 2017.
Xi Chen, Anindya De, Chin Ho Lee, Rocco A. Servedio, and Sandip Sinha
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode