Creative Commons Attribution 3.0 Unported license
We consider the problem of computing a (1+epsilon)-approximation of the Hamming distance between a pattern of length n and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem. We show the following:
- If Alice and Bob both share the pattern and Alice has the first half of the stream and Bob the second half, then there is an O(epsilon^{-4}*log^2(n)) bit randomised one-way communication protocol.
- If Alice has the pattern, Bob the first half of the stream and Charlie the second half, then there is an O(epsilon^{-2}*sqrt(n)*log(n)) bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for (1 + epsilon)-approximate Hamming distance which give worst case running time guarantees per arriving symbol.
- For binary input alphabets there is an O(epsilon^{-3}*sqrt(n)*log^2(n)) space and O(epsilon^{-2}*log(n)) time streaming
(1 + epsilon)-approximate Hamming distance algorithm.
- For general input alphabets there is an O(epsilon^{-5}*sqrt(n)*log^4(n)) space and O(epsilon^{-4}*log^3(n)) time streaming
(1 + epsilon)-approximate Hamming distance algorithm.
@InProceedings{clifford_et_al:LIPIcs.ICALP.2016.20,
author = {Clifford, Rapha\"{e}l and Starikovskaya, Tatiana},
title = {{Approximate Hamming Distance in a Stream}},
booktitle = {43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)},
pages = {20:1--20:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-013-2},
ISSN = {1868-8969},
year = {2016},
volume = {55},
editor = {Chatzigiannakis, Ioannis and Mitzenmacher, Michael and Rabani, Yuval and Sangiorgi, Davide},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2016.20},
URN = {urn:nbn:de:0030-drops-62992},
doi = {10.4230/LIPIcs.ICALP.2016.20},
annote = {Keywords: Hamming distance, communication complexity, data stream model}
}