Approximate Hamming Distance in a Stream

Authors Raphaël Clifford, Tatiana Starikovskaya



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2016.20.pdf
  • Filesize: 476 kB
  • 14 pages

Document Identifiers

Author Details

Raphaël Clifford
Tatiana Starikovskaya

Cite As Get BibTex

Raphaël Clifford and Tatiana Starikovskaya. Approximate Hamming Distance in a Stream. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 55, pp. 20:1-20:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.ICALP.2016.20

Abstract

We consider the problem of computing a (1+epsilon)-approximation of the Hamming distance between a pattern of length n and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem. We show the following:

- If Alice and Bob both share the pattern and Alice has the first half of the stream and Bob the second half, then there is an O(epsilon^{-4}*log^2(n)) bit randomised one-way communication protocol.

- If Alice has the pattern, Bob the first half of the stream and Charlie the second half, then there is an O(epsilon^{-2}*sqrt(n)*log(n)) bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for (1 + epsilon)-approximate Hamming distance which give worst case running time guarantees per arriving symbol.

- For binary input alphabets there is an O(epsilon^{-3}*sqrt(n)*log^2(n)) space and O(epsilon^{-2}*log(n)) time streaming
(1 + epsilon)-approximate Hamming distance algorithm.

- For general input alphabets there is an O(epsilon^{-5}*sqrt(n)*log^4(n)) space and O(epsilon^{-4}*log^3(n)) time streaming
(1 + epsilon)-approximate Hamming distance algorithm.

Subject Classification

Keywords
  • Hamming distance
  • communication complexity
  • data stream model

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Karl Abrahamson. Generalized string matching. SIAM Journal on Computing, 16(6):1039-1051, 1987. Google Scholar
  2. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In STOC'00: Proc. 28superscriptth Annual ACM Symp. Theory of Computing, pages 20-29. ACM, 1996. Google Scholar
  3. Amihood Amir, Yonatan Aumann, Gary Benson, Avivit Levy, Ohad Lipsky, Ely Porat, Steven Skiena, and Uzi Vishne. Pattern matching with address errors: Rearrangement distances. Journal of Computer System Sciences, 75(6):359-370, 2009. Google Scholar
  4. Amihood Amir, Yonatan Aumann, Oren Kapah, Avivit Levy, and Ely Porat. Approximate string matching with address bit errors. In CPM'08: Proc. 19superscriptth Annual Symp. on Combinatorial Pattern Matching, pages 118-129, 2008. Google Scholar
  5. Amihood Amir, Yonatan Aumann, Moshe Lewenstein, and Ely Porat. Function matching. SIAM Journal on Computing, 35(5):1007-1022, 2006. Google Scholar
  6. Amihood Amir, Richard Cole, Ramesh Hariharan, Moshe Lewenstein, and Ely Porat. Overlap matching. Information and Computation, 181(1):57-74, 2003. Google Scholar
  7. Amihood Amir, Estrella Eisenberg, and Ely Porat. Swap and mismatch edit distance. Algorithmica, 45(1):109-120, 2006. Google Scholar
  8. Amihood Amir, Martin Farach, and S. Muthu Muthukrishnan. Alphabet dependence in parameterized matching. Information Processing Letters, 49(3):111-115, 1994. Google Scholar
  9. Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM Journal on Computing, 41(5):1299-1317, 2012. Google Scholar
  10. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. The k-mismatch problem revisited. In SODA'16: Proc. 27superscriptth ACM-SIAM Symp. on Discrete Algorithms, pages 2039-2052, 2016. Google Scholar
  11. Raphaël Clifford and Benjamin Sach. Pseudo-realtime pattern matching: Closing the gap. In CPM'10: Proc. 21superscriptst Annual Symp. on Combinatorial Pattern Matching, pages 101-111, 2010. Google Scholar
  12. Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan. Comparing data streams using Hamming norms (how to zero in). IEEE Trans. on Knowl. and Data Eng., 15(3):529-540, 2003. Google Scholar
  13. Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 31(6):1794-1813, 2002. Google Scholar
  14. M. Fischer and M. Paterson. String matching and other products. In Proc. 7superscriptth SIAM-AMS Complexity of Comp., pages 113-125, 1974. Google Scholar
  15. Wei Huang, Yaoyun Shi, Shengyu Zhang, and Yufan Zhu. The communication complexity of the Hamming distance problem. Information Processing Letters, 99(4):149-153, 2006. Google Scholar
  16. Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM, 53(3):307-323, 2006. Google Scholar
  17. Markus Jalsenius, Benny Porat, and Benjamin Sach. Parameterized matching in the streaming model. In STACS'13: Proc. 30superscriptth Annual Symp. on Theoretical Aspects of Computer Science, pages 400-411, 2013. URL: http://arxiv.org/abs/1109.5269.
  18. Thathachar S. Jayram and David P. Woodruff. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. ACM Transactions on Algorithms (TALG), 9(3):26, 2013. Google Scholar
  19. William Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Proc. of the Conference in Modern Analysis and Probability, volume 26 of Contemporary Mathematics, pages 189-206. American Mathematical Society, 1984. Google Scholar
  20. H. Karloff. Fast algorithms for approximately counting mismatches. Information Processing Letters, 48(2):53-60, 1993. Google Scholar
  21. Tsvi Kopelowitz and Ely Porat. Breaking the variance: Approximating the Hamming distance in 1/ε time per alignment. In FOCS'15: Proc. 56superscriptth Annual Symp. Foundations of Computer Science, pages 601-613, 2015. Google Scholar
  22. S. Rao Kosaraju. Efficient string matching. Manuscript, 1987. Google Scholar
  23. Gad M. Landau and Uzi Vishkin. Fast string matching with k differences. Journal of Computer System Sciences, 37(1):63-78, 1988. Google Scholar
  24. Ely Porat. Personal communication, 2016. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail