Circular Trace Reconstruction

Authors Shyam Narayanan, Michael Ren



PDF
Thumbnail PDF

File

LIPIcs.ITCS.2021.18.pdf
  • Filesize: 0.54 MB
  • 18 pages

Document Identifiers

Author Details

Shyam Narayanan
  • Massachusetts Institute of Technology, Cambridge, MA, USA
Michael Ren
  • Massachusetts Institute of Technology, Cambridge, MA, USA

Acknowledgements

The first author thanks Professor Piotr Indyk for many helpful discussions and feedback, Mehtaab Sawhney for pointers to some references, and Professor Bjorn Poonen for a helpful discussion on sums of roots of unity. The second author thanks Professor Joe Gallian for running the Duluth REU at which part of this research was conducted, as well as program advisors Amanda Burcroff, Colin Defant, and Yelena Mandelshtam for providing a supportive environment. The authors also thank Amanda Burcroff for helpful edits on the paper’s writeup.

Cite AsGet BibTex

Shyam Narayanan and Michael Ren. Circular Trace Reconstruction. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ITCS.2021.18

Abstract

Trace reconstruction is the problem of learning an unknown string x from independent traces of x, where traces are generated by independently deleting each bit of x with some deletion probability q. In this paper, we initiate the study of Circular trace reconstruction, where the unknown string x is circular and traces are now rotated by a random cyclic shift. Trace reconstruction is related to many computational biology problems studying DNA, which is a primary motivation for this problem as well, as many types of DNA are known to be circular. Our main results are as follows. First, we prove that we can reconstruct arbitrary circular strings of length n using exp(Õ(n^{1/3})) traces for any constant deletion probability q, as long as n is prime or the product of two primes. For n of this form, this nearly matches what was the best known bound of exp(O(n^{1/3})) for standard trace reconstruction when this paper was initially released. We note, however, that Chase very recently improved the standard trace reconstruction bound to exp(Õ(n^{1/5})). Next, we prove that we can reconstruct random circular strings with high probability using n^O(1) traces for any constant deletion probability q. Finally, we prove a lower bound of Ω̃(n³) traces for arbitrary circular strings, which is greater than the best known lower bound of Ω̃(n^{3/2}) in standard trace reconstruction.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Probabilistic algorithms
Keywords
  • Trace Reconstruction
  • Deletion Channel
  • Cyclotomic Integers

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Circular DNA. Available at URL: https://en.wikipedia.org/wiki/Circular_DNA.
  2. Alexandr Andoni, Assaf Goldberger, Andrew McGregor, and Ely Porat. Homomorphic fingerprints under misalignments: sketching edit and shift distances. In Symposium on Theory of Computing (STOC), pages 931-940, 2013. URL: https://doi.org/10.1145/2488608.2488726.
  3. Frank Ban, Xi Chen, Adam Freilich, Rocco A. Servedio, and Sandip Sinha. Beyond trace reconstruction: Population recovery from the deletion channel. In Foundations of Computer Science (FOCS), pages 745-768, 2019. URL: https://doi.org/10.1109/FOCS.2019.00050.
  4. Frank Ban, Xi Chen, Rocco A. Servedio, and Sandip Sinha. Efficient average-case population recovery in the presence of insertions and deletions. In Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques, pages 44:1-44:18, 2019. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2019.44.
  5. Afonso S. Bandeira, Moses Charikar, Amit Singer, and Andy Zhu. Multireference alignment using semidefinite programming. In Innovations in Theoretical Computer Science (ITCS), pages 745-768, 2019. URL: https://doi.org/10.1145/2554797.2554839.
  6. Afonso S. Bandeira, Jonathan Niles-Weed, and Philippe Rigollet. Optimal rates of estimation for multi-reference alignment. Mathematical Statistics and Learning, 2(1):25-75, 2019. URL: https://doi.org/10.4171/MSL/11.
  7. Tugkan Batu, Sampath Kannan, Sanjeev Khanna, and Andrew McGregor. Reconstructing strings from random traces. In Symposium on Discrete Algorithms (SODA), pages 910-918, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982929.
  8. Vinnu Bhardwaj, Pavel A. Pevzner, Cyrus Rashtchian, and Yana Safonova. Trace reconstruction problems in computational biology. IEEE Transactions on Information Theory, pages 1-1, 2020. URL: https://doi.org/10.1109/TIT.2020.3030569.
  9. Peter Borwein and Tamás Erdélyi. Littlewood-type polynomials on subarcs of the unit circle. Indiana University Mathematics Journal, 46(4):1323-1346, 1997. URL: https://doi.org/10.1512/IUMJ.1997.46.1435.
  10. Peter Borwein, Tamás Erdélyi, and Géza Kós. Littlewood-type problems on [0, 1]. Proceedings of the London Mathematical Society, 3(79):22-46, 1999. URL: https://doi.org/10.1112/S0024611599011831.
  11. Joshua Brakensiek, Ray Li, and Bruce Spang. Coded trace reconstruction in a constant number of traces. In Foundations of Computer Science (FOCS), 2020. URL: http://arxiv.org/abs/1908.03996.
  12. Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Circular pattern matching with k mismatches. Journal of Computer and System Sciences, 115:73-85, 2021. URL: https://doi.org/10.1016/j.jcss.2020.07.003.
  13. Zachary Chase. New lower bounds for trace reconstruction. CoRR, abs/1905.03031, 2019. URL: http://arxiv.org/abs/1905.03031.
  14. Zachary Chase. New upper bounds for trace reconstruction. CoRR, abs/2009.03296, 2020. URL: http://arxiv.org/abs/2009.03296.
  15. Xi Chen, Anindya De, Chin Ho Lee, Rocco A. Servedio, and Sandip Sinha. Polynomial-time trace reconstruction in the smoothed complexity model. CoRR, abs/2008.12386, 2020. To appear in Symposium on Discrete Algorithms (SODA), 2021. URL: http://arxiv.org/abs/2008.12386.
  16. Mahdi Cheraghchi, Ryan Gabrys, Olgica Milenkovic, and João Ribeiro. Coded trace reconstruction. IEEE Trans. Inf. Theory, 66(10):6084-6103, 2020. URL: https://doi.org/10.1109/TIT.2020.2996377.
  17. George M. Church, Yuan Gao, and Sriram Kosuri. Next-generation digital information storage in DNA. Science, 337(6102):1628, 2012. URL: https://doi.org/10.1126/science.1226355.
  18. Sami Davies, Miklos Racz, and Cyrus Rashtchian. Reconstructing trees from traces. In Conference On Learning Theory (COLT), pages 961-978, 2019. URL: http://proceedings.mlr.press/v99/davies19a.html.
  19. Anindya De, Ryan O'Donnell, and Rocco A. Servedio. Optimal mean-based algorithms for trace reconstruction. Annals of Applied Probability, 29(2):851-874, 2019. URL: https://doi.org/10.1214/18-AAP1394.
  20. Lisa Hartung, Nina Holden, and Yuval Peres. Trace reconstruction with varying deletion probabilities. In Analytic Algorithmics and Combinatorics (ANALCO), pages 54-61, 2018. URL: https://doi.org/10.1137/1.9781611975062.6.
  21. Nina Holden and Russell Lyons. Lower bounds for trace reconstruction. Annals of Applied Probability, 30(2):503-525, 2020. URL: https://doi.org/10.1214/19-AAP1506.
  22. Nina Holden, Robin Pemantle, and Yuval Peres. Subpolynomial trace reconstruction for random strings and arbitrary deletion probability. In Conference On Learning Theory (COLT), pages 1799-1840, 2018. URL: http://proceedings.mlr.press/v75/holden18a.html.
  23. Thomas Holenstein, Michael Mitzenmacher, Rina Panigrahy, and Udi Wieder. Trace reconstruction with constant deletion probability and related results. In Symposium on Discrete Algorithms (SODA), pages 389-398, 2008. URL: https://doi.org/10.1145/1347082.1347125.
  24. Sampath Kannan and Andrew McGregor. More on reconstructing strings from random traces: insertions and deletions. In International Symposium on Information Theory (ISIT), pages 297-301, 2005. URL: https://doi.org/10.1109/ISIT.2005.1523342.
  25. Akshay Krishnamurthy, Arya Mazumdar, Andrew McGregor, and Soumyabrata Pal. Trace reconstruction: Generalized and parameterized. In European Symposium on Algorithms (ESA), pages 68:1-68:25, 2019. URL: https://doi.org/10.4230/LIPIcs.ESA.2019.68.
  26. Vladimir I. Levenshtein. Efficient reconstruction of sequences. IEEE Trans. Information Theory, 47(1):2-22, 2001. URL: https://doi.org/10.1109/18.904499.
  27. Vladimir I. Levenshtein. Efficient reconstruction of sequences from their subsequences or supersequences. J. Comb. Theory, Ser. A, 93(2):310-332, 2001. URL: https://doi.org/10.1006/jcta.2000.3081.
  28. Maurice Maes. On a cyclic string-to-string correction problem. Information Processing Letters, 35(2):73-78, 1990. URL: https://doi.org/10.1016/0020-0190(90)90109-B.
  29. Daniel A. Marcus. Number Fields. Springer International Publishing, 1977. Google Scholar
  30. Andrew McGregor, Eric Price, and Sofya Vorotnikova. Trace reconstruction revisited. In European Symposium on Algorithms (ESA), pages 689-700, 2014. Google Scholar
  31. Shyam Narayanan. Population recovery from the deletion channel: Nearly matching trace reconstruction bounds. CoRR, abs/2004.06828, 2020. To appear in Symposium on Discrete Algorithms (SODA), 2021. URL: http://arxiv.org/abs/2004.06828.
  32. Fedor Nazarov and Yuval Peres. Trace reconstruction with exp(o(n^1/3)) samples. In Symposium on Theory of Computing (STOC), pages 1042-1046, 2017. URL: https://doi.org/10.1145/3055399.3055494.
  33. Hoang Hiep Nguyen, Jeho Park, Seon Joo Park, Chang-Soo Lee, Seungwoo Hwang, Yong-Beom Shin, Tai Hwan Ha, and Moonil Kim. Long-term stability and integrity of plasmid-based dna data storage. Polymers, 10(1):28, 2018. URL: https://doi.org/10.3390/polym10010028.
  34. Lee Organick, Siena Dumas Ang, Yuan-Jyue Chen, Randolph Lopez, Sergey Yekhanin, Konstantin Makarychev, Miklos Z Racz, Govinda Kamath, Parikshit Gopalan, Bichlien Nguyen, Christopher N Takahashi, Sharon Newman, Hsing-Yeh Parker, Cyrus Rashtchian, Kendall Stewart, Gagan Gupta, Robert Carlson, John Mulligan, Douglas Carmean, Georg Seelig, Luis Ceze, , and Karin Strauss. Random access in large-scale dna data storage. Nature Biotechnology, 36:242-248, 2018. URL: https://doi.org/10.1038/nbt.4079.
  35. Yuval Peres and Alex Zhai. Average-case reconstruction for the deletion channel: Subpolynomially many traces suffice. In Foundations of Computer Science (FOCS), pages 228-239, 2017. URL: https://doi.org/10.1109/FOCS.2017.29.
  36. Amelia Perry, Jonathan Weed, Afonso S. Bandeira, Philippe Rigollet, and Amit Singer. The sample complexity of multireference alignment. SIAM Journal on Mathematics of Data Science, 1(3):497-517, 2019. URL: https://doi.org/10.1137/18M1214317.
  37. Jane B. Reece, Lisa A. Urry, Michael L. Cain, Steven A. Wasserman, Peter V. Minorsky, and Robert B. Jackson. Campbell Biology. Pearson, 9th edition, 2011. Google Scholar
  38. Krishnamurthy Viswanathan and Ram Swaminathan. Improved string reconstruction over insertion-deletion channels. In Symposium on Discrete Algorithms (SODA), pages 399-408, 2008. URL: https://doi.org/10.1145/1347082.1347126.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail