Efficient Average-Case Population Recovery in the Presence of Insertions and Deletions

Authors: Frank Ban, Xi Chen, Rocco A. Servedio, Sandip Sinha

Author Details

Frank Ban
  • UC Berkeley, Berkeley, CA, USA
Xi Chen
  • Columbia University, New York, NY, USA
Rocco A. Servedio
  • Columbia University, New York, NY, USA
Sandip Sinha
  • Columbia University, New York, NY, USA

Cite As

Frank Ban, Xi Chen, Rocco A. Servedio, and Sandip Sinha. Efficient Average-Case Population Recovery in the Presence of Insertions and Deletions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 44:1-44:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Abstract

A number of recent works have considered the trace reconstruction problem, in which an unknown source string x in {0,1}^n is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of x. The goal is to reconstruct the original string x from independent traces of x. While the asymptotically best algorithms known for worst-case strings use exp(O(n^{1/3})) traces [De et al., 2017; Fedor Nazarov and Yuval Peres, 2017], several highly efficient algorithms are known [Yuval Peres and Alex Zhai, 2017; Nina Holden et al., 2018] for the average-case version of the problem, in which the source string x is chosen uniformly at random from {0,1}^n. In this paper we consider a generalization of the above-described average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, rather than a single unknown source string there is an unknown distribution over s unknown source strings x^1,...,x^s in {0,1}^n, and each sample given to the algorithm is independently generated by drawing some x^i from this distribution and returning an independent trace of x^i. Building on the results of [Yuval Peres and Alex Zhai, 2017] and [Nina Holden et al., 2018], we give an efficient algorithm for the average-case population recovery problem in the presence of insertions and deletions. For any support size 1 <= s <= exp(Theta(n^{1/3})), for a 1-o(1) fraction of all s-element support sets {x^1,...,x^s} subset {0,1}^n, for every distribution D supported on {x^1,...,x^s}, our algorithm can efficiently recover D up to total variation distance at most epsilon with high probability, given access to independent traces of independent draws from D. The running time of our algorithm is poly(n,s,1/epsilon) and its sample complexity is poly(s,1/epsilon,exp(log^{1/3} n)).
This polynomial dependence on the support size s is in sharp contrast with the worst-case version of the problem (when x^1,...,x^s may be any strings in {0,1}^n), in which the sample complexity of the most efficient known algorithm [Frank Ban et al., 2019] is doubly exponential in s.
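
The sampling model described above can be illustrated with a small simulation. The sketch below is not the paper's algorithm; it only generates samples of the kind the recovery algorithm would receive. The abstract does not fix the exact channel, so the code assumes one simple formalization: before each source bit, uniform random bits are inserted while a coin with bias insert_prob keeps coming up heads, and each source bit is then deleted independently with probability delete_prob. The names trace, sample_population, delete_prob, and insert_prob are hypothetical and chosen for illustration.

```python
import random


def trace(x, delete_prob, insert_prob, rng):
    """Return one trace of the bit list x.

    Assumed channel (an illustration, not taken from the paper):
    a geometric number of uniform random bits is inserted before
    each source bit, and each source bit is deleted independently.
    """
    out = []
    for bit in x:
        # Keep inserting uniform random bits while the coin says so.
        while rng.random() < insert_prob:
            out.append(rng.randint(0, 1))
        # The source bit survives with probability 1 - delete_prob.
        if rng.random() >= delete_prob:
            out.append(bit)
    return out


def sample_population(strings, weights, delete_prob, insert_prob, rng):
    """Draw one source string from the distribution, then return a trace of it."""
    x = rng.choices(strings, weights=weights, k=1)[0]
    return trace(x, delete_prob, insert_prob, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    n, s = 16, 3
    # Average-case setting: the s support strings are uniformly random.
    support = [[rng.randint(0, 1) for _ in range(n)] for _ in range(s)]
    weights = [0.5, 0.3, 0.2]  # the unknown distribution D to be recovered
    for _ in range(5):
        sample = sample_population(support, weights, 0.2, 0.1, rng)
        print("".join(map(str, sample)))
```

Each printed line is one sample: the algorithm sees only these traces, never the index of the source string that produced them, and must estimate both the support strings and their weights.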

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Information theory
  • Theory of computation → Machine learning theory

Keywords
  • population recovery
  • deletion channel
  • trace reconstruction




References
  1. Alexandr Andoni, Mark Braverman, and Avinatan Hassidim. Phylogenetic Reconstruction with Insertions and Deletions. Manuscript, 2014.
  2. Alexandr Andoni, Constantinos Daskalakis, Avinatan Hassidim, and Sébastien Roch. Global Alignment of Molecular Sequences via Ancestral State Reconstruction. In ICS, pages 358-369, 2010.
  3. Frank Ban, Xi Chen, Adam Freilich, Rocco A. Servedio, and Sandip Sinha. Beyond trace reconstruction: Population recovery from the deletion channel. CoRR, abs/1904.05532, 2019. URL: http://arxiv.org/abs/1904.05532.
  4. T. Batu, S. Kannan, S. Khanna, and A. McGregor. Reconstructing strings from random traces. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pages 910-918, 2004.
  5. Zachary Chase. New lower bounds for trace reconstruction. arXiv preprint, 2019. URL: http://arxiv.org/abs/1905.03031.
  6. Constantinos Daskalakis and Sébastien Roch. Alignment-Free Phylogenetic Reconstruction. In RECOMB, pages 123-137, 2010.
  7. A. De, M. Saks, and S. Tang. Noisy population recovery in polynomial time. Technical Report TR-16-026, Electronic Colloquium on Computational Complexity, 2016. To appear in FOCS 2016.
  8. Anindya De, Ryan O'Donnell, and Rocco A. Servedio. Optimal mean-based algorithms for trace reconstruction. In Proceedings of the 49th ACM Symposium on Theory of Computing (STOC), pages 1047-1056, 2017.
  9. Anindya De, Ryan O'Donnell, and Rocco A. Servedio. Sharp bounds for population recovery. CoRR, abs/1703.01474, 2017. URL: http://arxiv.org/abs/1703.01474.
  10. Z. Dvir, A. Rao, A. Wigderson, and A. Yehudayoff. Restriction access. In Innovations in Theoretical Computer Science, pages 19-33, 2012.
  11. W. Feller. An introduction to probability theory and its applications. John Wiley & Sons, 1968.
  12. Nina Holden and Russell Lyons. Lower bounds for trace reconstruction. Available at https://arxiv.org/abs/1808.02336, 2018.
  13. Nina Holden, Robin Pemantle, and Yuval Peres. Subpolynomial trace reconstruction for random strings and arbitrary deletion probability. CoRR, abs/1801.04783, 2018.
  14. T. Holenstein, M. Mitzenmacher, R. Panigrahy, and U. Wieder. Trace reconstruction with constant deletion probability and related results. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, pages 389-398, 2008.
  15. Svante Janson. Tail bounds for sums of geometric and exponential variables. Statistics & Probability Letters, 135:1-6, 2018. URL: https://doi.org/10.1016/j.spl.2017.11.017.
  16. Sampath Kannan and Andrew McGregor. More on Reconstructing Strings from Random Traces: Insertions and Deletions. In IEEE International Symposium on Information Theory, pages 297-301, 2005.
  17. Akshay Krishnamurthy, Arya Mazumdar, Andrew McGregor, and Soumyabrata Pal. Trace Reconstruction: Generalized and Parameterized. arXiv preprint, 2019. URL: http://arxiv.org/abs/1904.09618.
  18. S. Lovett and J. Zhang. Improved Noisy Population Recovery, and Reverse Bonami-Beckner Inequality for Sparse Functions. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 137-142, 2015.
  19. Andrew McGregor, Eric Price, and Sofya Vorotnikova. Trace Reconstruction Revisited. In Proceedings of the 22nd Annual European Symposium on Algorithms, pages 689-700, 2014.
  20. Ankur Moitra and Michael E. Saks. A Polynomial Time Algorithm for Lossy Population Recovery. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, 26-29 October, 2013, Berkeley, CA, USA, pages 110-116, 2013.
  21. Fedor Nazarov and Yuval Peres. Trace reconstruction with exp(O(n^{1/3})) samples. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 1042-1046, 2017.
  22. Lee Organick, Siena Dumas Ang, Yuan-Jyue Chen, Randolph Lopez, Sergey Yekhanin, Konstantin Makarychev, Miklos Z Racz, Govinda Kamath, Parikshit Gopalan, Bichlien Nguyen, et al. Random access in large-scale DNA data storage. Nature biotechnology, 36(3):242, 2018.
  23. Yuval Peres and Alex Zhai. Average-Case Reconstruction for the Deletion Channel: Subpolynomially Many Traces Suffice. In FOCS, pages 228-239, 2017.
  24. Yury Polyanskiy, Ananda Theertha Suresh, and Yihong Wu. Sample complexity of population recovery. In Proceedings of the 30th Conference on Learning Theory, COLT 2017, Amsterdam, The Netherlands, 7-10 July 2017, pages 1589-1618, 2017.
  25. Krishnamurthy Viswanathan and Ram Swaminathan. Improved string reconstruction over insertion-deletion channels. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 399-408, 2008.
  26. A. Wigderson and A. Yehudayoff. Population recovery and partial identification. Machine Learning, 102(1):29-56, 2016. Preliminary version in FOCS 2012.
  27. S.M. Hossein Tabatabaei Yazdi, Ryan Gabrys, and Olgica Milenkovic. Portable and Error-Free DNA-Based Data Storage. Scientific Reports, 7(1):5011, 2017.