Streaming for Aibohphobes: Longest Palindrome with Mismatches

Authors Elena Grigorescu, Erfan Sadeqi Azer, Samson Zhou



PDF
Thumbnail PDF

File

LIPIcs.FSTTCS.2017.31.pdf
  • Filesize: 0.56 MB
  • 13 pages

Document Identifiers

Author Details

Elena Grigorescu
Erfan Sadeqi Azer
Samson Zhou

Cite AsGet BibTex

Elena Grigorescu, Erfan Sadeqi Azer, and Samson Zhou. Streaming for Aibohphobes: Longest Palindrome with Mismatches. In 37th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 93, pp. 31:1-31:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.FSTTCS.2017.31

Abstract

A palindrome is a string that reads the same as its reverse, such as "aibohphobia" (fear of palindromes). Given a metric and an integer d>0, a d-near-palindrome} is a string of Hamming distance at most d from its reverse. We study the natural problem of identifying the longest d-near-palindrome in data streams. The problem is relevant to the analysis of DNA databases, and to the task of repairing recursive structures in documents such as XML and JSON. We present the first streaming algorithm for the longest d-near-palindrome problem that returns a d-near-palindrome whose length is within a multiplicative (1+\eps)-factor of the longest d-near-palindrome. Our algorithm also returns the set of mismatched indices in the d-near-palindrome, and uses O{\frac{d\log^7 n}{\eps\log(1+\eps)}} bits of space, and O{\frac{d\log^6 n}{\eps\log(1+\eps)}} update time per arrival symbol. We show that for d=o(\sqrt{n}), any randomized algorithm with multiplicative approximation (1+\eps) that succeeds with probability at least 1-1/n requires \Omega(d\log n) space. We further obtain a streaming algorithm that returns a d-near-palindrome whose length is within an additive E-error of the longest d-near-palindrome. The algorithm uses O{\frac{dn\log^6 n}{E}} bits of space and O{\frac{dn\log^5 n}{E}} update time. As before, we show that any randomized streaming algorithm that solves the longest d-near-palindrome problem for additive error E with probability at least 1-\frac{1}{n}, uses \Omega\left(\frac{dn}{E}\right) space. Finally, we give an exact two-pass algorithm that solves the longest d-near-palindrome problem using O{d^2\sqrt{n}\log^6 n} bits of space.
Keywords
  • Longest palindrome with mismatches
  • Streaming algorithms
  • Hamming distance

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. List of open problems in sublinear algorithms: Problem 61. URL: http://sublinear.info/61.
  2. Stephen F Altschul, Warren Gish, Webb Miller, Eugene W Myers, and David J Lipman. Basic local alignment search tool. Journal of molecular biology, 215(3):403-410, 1990. Google Scholar
  3. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004. Google Scholar
  4. Alexandr Andoni, Assaf Goldberger, Andrew McGregor, and Ely Porat. Homomorphic fingerprints under misalignments: sketching edit and shift distances. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC, pages 931-940, 2013. Google Scholar
  5. Petra Berenbrink, Funda Ergün, Frederik Mallmann-Trenn, and Erfan Sadeqi Azer. Palindrome recognition in the streaming model. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS), pages 149-161, 2014. Google Scholar
  6. Dany Breslauer and Zvi Galil. Real-time streaming string-matching. ACM Trans. Algorithms, 10(4):22:1-22:12, 2014. Google Scholar
  7. Raphaël Clifford, Klim Efremenko, Benny Porat, and Ely Porat. A black box for online approximate pattern matching. Inf. Comput., 209(4):731-736, 2011. Google Scholar
  8. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. The k-mismatch problem revisited. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 2039-2052, 2016. Google Scholar
  9. Albert A. Conti, Tom Van Court, and Martin C. Herbordt. Processing repetitive sequence structures with mismatches at streaming rate. In Field Programmable Logic and Application, 14th International Conference , FPL Proceedings, pages 1080-1083, 2004. Google Scholar
  10. Pawel Gawrychowski, Oleg Merkurev, Arseny M. Shur, and Przemyslaw Uznanski. Tight tradeoffs for real-time approximation of longest palindromes in streams. In 27th Annual Symposium on Combinatorial Pattern Matching, CPM, pages 18:1-18:13, 2016. Google Scholar
  11. Elena Grigorescu, Erfan Sadeqi Azer, and Samson Zhou. Streaming for aibohphobes: Longest palindrome with mismatches. CoRR, abs/1705.01887, 2017. URL: http://arxiv.org/abs/1705.01887.
  12. Martin C. Herbordt, Josh Model, Bharat Sukhwani, Yongfeng Gu, and Tom Van Court. Single pass streaming BLAST on fpgas. Parallel Computing, 33(10-11):741-756, 2007. Google Scholar
  13. Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. Google Scholar
  14. Glenn K. Manacher. A new linear-time "on-line" algorithm for finding the smallest initial palindrome of a string. J. ACM, 22(3):346-351, 1975. Google Scholar
  15. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS, pages 315-323, 2009. Google Scholar
  16. Ely Porat and Ohad Lipsky. Improved sketching of hamming distance with error correcting. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM Proceedings, pages 173-182, 2007. Google Scholar
  17. Andrew Chi-Chih Yao. Probabilistic computations: Toward a unified measure of complexity (extended abstract). In 18th Annual Symposium on Foundations of Computer Science, FOCS, pages 222-227, 1977. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail