Efficient Average-Case Population Recovery in the Presence of Insertions and Deletions

Ban, Frank; Chen, Xi; Servedio, Rocco A.; Sinha, Sandip

doi:10.4230/LIPIcs.APPROX-RANDOM.2019.44

Subject Classification

ACM Subject Classification

Mathematics of computing → Information theory
Theory of computation → Machine learning theory

Keywords

population recovery
deletion channel
trace reconstruction

Abstract

A number of recent works have considered the trace reconstruction problem, in which an unknown source string x ∈ {0,1}^n is transmitted through a probabilistic channel that may randomly delete coordinates or insert random bits, resulting in a trace of x. The goal is to reconstruct the original string x from independent traces of x. While the asymptotically best algorithms known for worst-case strings use exp(O(n^{1/3})) traces [De et al., 2017; Nazarov and Peres, 2017], several highly efficient algorithms are known [Peres and Zhai, 2017; Holden et al., 2018] for the average-case version of the problem, in which the source string x is chosen uniformly at random from {0,1}^n.

In this paper we consider a generalization of the average-case trace reconstruction problem described above, which we call average-case population recovery in the presence of insertions and deletions. In this problem, rather than a single unknown source string, there is an unknown distribution over s unknown source strings x^1,...,x^s ∈ {0,1}^n, and each sample given to the algorithm is independently generated by drawing some x^i from this distribution and returning an independent trace of x^i.

Building on the results of [Peres and Zhai, 2017] and [Holden et al., 2018], we give an efficient algorithm for the average-case population recovery problem in the presence of insertions and deletions. For any support size 1 ≤ s ≤ exp(Θ(n^{1/3})), for a 1-o(1) fraction of all s-element support sets {x^1,...,x^s} ⊆ {0,1}^n, and for every distribution D supported on {x^1,...,x^s}, our algorithm can efficiently recover D up to total variation distance at most ε with high probability, given access to independent traces of independent draws from D. The running time of our algorithm is poly(n, s, 1/ε) and its sample complexity is poly(s, 1/ε, exp(log^{1/3} n)).

This polynomial dependence on the support size s is in sharp contrast with the worst-case version of the problem (when x^1,...,x^s may be any strings in {0,1}^n), in which the sample complexity of the most efficient known algorithm [Ban et al., 2019] is doubly exponential in s.
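To make the sampling model concrete, the following minimal Python sketch shows how a single sample given to the algorithm could be generated, and how total variation distance between two distributions over source strings is measured. The channel details are assumptions for illustration, not taken from the paper: each source bit is assumed to be deleted independently with probability delta and preceded by a geometrically distributed number of uniformly random inserted bits (continuation probability sigma). The names trace, population_sample and tv_distance, and the parameter values in the usage section, are hypothetical. The sketch only generates samples; it does not implement the authors' recovery algorithm.

```python
import random
from typing import Dict, List, Sequence, Tuple

Bits = Tuple[int, ...]


def trace(x: Bits, delta: float, sigma: float, rng: random.Random) -> Bits:
    """Send x through an insertion-deletion channel (an ASSUMED model).

    Before each source bit, a geometric number of uniformly random bits is
    inserted (each further insertion happens with probability sigma); the
    source bit itself is then deleted with probability delta, kept otherwise.
    """
    out: List[int] = []
    for bit in x:
        while rng.random() < sigma:  # insert uniformly random bits
            out.append(rng.randint(0, 1))
        if rng.random() >= delta:  # keep the source bit w.p. 1 - delta
            out.append(bit)
    return tuple(out)


def population_sample(support: Sequence[Bits], weights: Sequence[float],
                      delta: float, sigma: float, rng: random.Random) -> Bits:
    """Draw x^i from D = (support, weights), then return a trace of x^i.

    The recovery algorithm sees only the trace, never the index i.
    """
    x = rng.choices(support, weights=weights, k=1)[0]
    return trace(x, delta, sigma, rng)


def tv_distance(p: Dict[Bits, float], q: Dict[Bits, float]) -> float:
    """Total variation distance: half the L1 distance between p and q."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    rng = random.Random(0)
    n, s = 16, 3
    # A uniformly random s-element support set (the "average case").
    support = [tuple(int(b) for b in format(v, f"0{n}b"))
               for v in rng.sample(range(2 ** n), s)]
    weights = [0.5, 0.3, 0.2]  # hypothetical distribution D
    for _ in range(5):
        t = population_sample(support, weights, delta=0.1, sigma=0.1, rng=rng)
        print(len(t), "".join(map(str, t)))
    # A hypothetical estimate D_hat; recovery asks for tv_distance <= epsilon.
    D = dict(zip(support, weights))
    D_hat = dict(zip(support, [0.45, 0.35, 0.2]))
    print(tv_distance(D, D_hat))
```

Because insertions and deletions change the length of each trace, samples cannot be aligned position by position with the source strings that produced them.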

Cite As

Frank Ban, Xi Chen, Rocco A. Servedio, and Sandip Sinha. Efficient Average-Case Population Recovery in the Presence of Insertions and Deletions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 44:1-44:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2019.44

Author Details

Frank Ban

UC Berkeley, Berkeley, CA, USA

Xi Chen

Columbia University, New York, NY, USA

Rocco A. Servedio

Columbia University, New York, NY, USA

Sandip Sinha

Columbia University, New York, NY, USA

References

Alexandr Andoni, Mark Braverman, and Avinatan Hassidim. Phylogenetic Reconstruction with Insertions and Deletions. Manuscript, 2014.
Alexandr Andoni, Constantinos Daskalakis, Avinatan Hassidim, and Sébastien Roch. Global Alignment of Molecular Sequences via Ancestral State Reconstruction. In ICS, pages 358-369, 2010.
Frank Ban, Xi Chen, Adam Freilich, Rocco A. Servedio, and Sandip Sinha. Beyond trace reconstruction: Population recovery from the deletion channel. CoRR, abs/1904.05532, 2019. URL: http://arxiv.org/abs/1904.05532.
T. Batu, S. Kannan, S. Khanna, and A. McGregor. Reconstructing strings from random traces. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pages 910-918, 2004.
Zachary Chase. New lower bounds for trace reconstruction. arXiv preprint, 2019. URL: http://arxiv.org/abs/1905.03031.
Constantinos Daskalakis and Sébastien Roch. Alignment-Free Phylogenetic Reconstruction. In RECOMB, pages 123-137, 2010.
A. De, M. Saks, and S. Tang. Noisy population recovery in polynomial time. Technical Report TR-16-026, Electronic Colloquium on Computational Complexity, 2016. To appear in FOCS 2016.
Anindya De, Ryan O'Donnell, and Rocco A. Servedio. Optimal mean-based algorithms for trace reconstruction. In Proceedings of the 49th ACM Symposium on Theory of Computing (STOC), pages 1047-1056, 2017.
Anindya De, Ryan O'Donnell, and Rocco A. Servedio. Sharp bounds for population recovery. CoRR, abs/1703.01474, 2017. URL: http://arxiv.org/abs/1703.01474.
Z. Dvir, A. Rao, A. Wigderson, and A. Yehudayoff. Restriction access. In Innovations in Theoretical Computer Science, pages 19-33, 2012.
W. Feller. An introduction to probability theory and its applications. John Wiley & Sons, 1968.
Nina Holden and Russell Lyons. Lower bounds for trace reconstruction. arXiv preprint, 2018. URL: https://arxiv.org/abs/1808.02336.
Nina Holden, Robin Pemantle, and Yuval Peres. Subpolynomial trace reconstruction for random strings and arbitrary deletion probability. CoRR, abs/1801.04783, 2018.
T. Holenstein, M. Mitzenmacher, R. Panigrahy, and U. Wieder. Trace reconstruction with constant deletion probability and related results. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, pages 389-398, 2008.
Svante Janson. Tail bounds for sums of geometric and exponential variables. Statistics & Probability Letters, 135:1-6, 2018. URL: https://doi.org/10.1016/j.spl.2017.11.017.
Sampath Kannan and Andrew McGregor. More on Reconstructing Strings from Random Traces: Insertions and Deletions. In IEEE International Symposium on Information Theory, pages 297-301, 2005.
Akshay Krishnamurthy, Arya Mazumdar, Andrew McGregor, and Soumyabrata Pal. Trace Reconstruction: Generalized and Parameterized. arXiv preprint, 2019. URL: http://arxiv.org/abs/1904.09618.
S. Lovett and J. Zhang. Improved Noisy Population Recovery, and Reverse Bonami-Beckner Inequality for Sparse Functions. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 137-142, 2015.
Andrew McGregor, Eric Price, and Sofya Vorotnikova. Trace Reconstruction Revisited. In Proceedings of the 22nd Annual European Symposium on Algorithms, pages 689-700, 2014.
Ankur Moitra and Michael E. Saks. A Polynomial Time Algorithm for Lossy Population Recovery. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, 26-29 October, 2013, Berkeley, CA, USA, pages 110-116, 2013.
Fedor Nazarov and Yuval Peres. Trace reconstruction with exp(O(n^{1/3})) samples. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 1042-1046, 2017.
Lee Organick, Siena Dumas Ang, Yuan-Jyue Chen, Randolph Lopez, Sergey Yekhanin, Konstantin Makarychev, Miklos Z Racz, Govinda Kamath, Parikshit Gopalan, Bichlien Nguyen, et al. Random access in large-scale DNA data storage. Nature Biotechnology, 36(3):242, 2018.
Yuval Peres and Alex Zhai. Average-Case Reconstruction for the Deletion Channel: Subpolynomially Many Traces Suffice. In FOCS, pages 228-239, 2017.
Yury Polyanskiy, Ananda Theertha Suresh, and Yihong Wu. Sample complexity of population recovery. In Proceedings of the 30th Conference on Learning Theory, COLT 2017, Amsterdam, The Netherlands, 7-10 July 2017, pages 1589-1618, 2017.
Krishnamurthy Viswanathan and Ram Swaminathan. Improved string reconstruction over insertion-deletion channels. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 399-408, 2008.
A. Wigderson and A. Yehudayoff. Population recovery and partial identification. Machine Learning, 102(1):29-56, 2016. Preliminary version in FOCS 2012.
S.M. Hossein Tabatabaei Yazdi, Ryan Gabrys, and Olgica Milenkovic. Portable and Error-Free DNA-Based Data Storage. Scientific Reports, 7(1):5011, 2017.