Improved Circular k-Mismatch Sketches

Authors Shay Golan , Tomasz Kociumaka , Tsvi Kopelowitz , Ely Porat , Przemysław Uznański



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2020.46.pdf
  • Filesize: 0.63 MB
  • 24 pages

Document Identifiers

Author Details

Shay Golan
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Tomasz Kociumaka
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Tsvi Kopelowitz
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Ely Porat
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Przemysław Uznański
  • Institute of Computer Science, University of Wrocław, Poland

Cite AsGet BibTex

Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, and Przemysław Uznański. Improved Circular k-Mismatch Sketches. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 46:1-46:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2020.46

Abstract

The shift distance sh(S₁,S₂) between two strings S₁ and S₂ of the same length is defined as the minimum Hamming distance between S₁ and any rotation (cyclic shift) of S₂. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings S₁ and S₂ of length n are given to two identical players (encoders), who independently compute sketches (summaries) sk(S₁) and sk(S₂), respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) sh(S₁,S₂) with high probability. This paper primarily focuses on the more general k-mismatch version of the problem, where the decoder is allowed to declare a failure if sh(S₁,S₂) > k, where k is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular k-mismatch sketches of size Õ(k+D(n)), where D(n) is the number of divisors of n. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches. We circumvent this lower bound by designing a (non-linear) exact circular k-mismatch sketch of size Õ(k); this size matches communication-complexity lower bounds. We also design (1± ε)-approximate circular k-mismatch sketch of size Õ(min(ε^{-2}√k, ε^{-1.5}√n)), which improves upon an Õ(ε^{-2}√n)-size sketch of Crouch and McGregor (APPROX'11).

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Sketching and sampling
Keywords
  • Hamming distance
  • k-mismatch
  • sketches
  • rotation
  • cyclic shift
  • communication complexity

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Karl R. Abrahamson. Generalized string matching. SIAM Journal on Computing, 16(6):1039-1051, 1987. URL: https://doi.org/10.1137/0216067.
  2. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137-147, 1999. URL: https://doi.org/10.1006/jcss.1997.1545.
  3. Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Ritu Kundu, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. How much different are two words with different shortest periods. In Lazaros S. Iliadis, Ilias Maglogiannis, and Vassilis P. Plagianakos, editors, 14th International Conference on Artificial Intelligence Applications and Innovations, AIAI 2018, Workshops, volume 520 of IFIP Advances in Information and Communication Technology, pages 168-178. Springer, 2018. URL: https://doi.org/10.1007/978-3-319-92016-0_16.
  4. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. Journal of Algorithms, 50(2):257-275, 2004. URL: https://doi.org/10.1016/S0196-6774(03)00097-X.
  5. Alexandr Andoni, Assaf Goldberger, Andrew McGregor, and Ely Porat. Homomorphic fingerprints under misalignments: sketching edit and shift distances. In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, 45th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2013, pages 931-940. ACM, 2013. URL: https://doi.org/10.1145/2488608.2488726.
  6. Alexandr Andoni, Piotr Indyk, and Robert Krauthgamer. Earth mover distance over high-dimensional spaces. In Shang-Hua Teng, editor, 19th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, pages 343-352. SIAM, 2008. URL: http://dl.acm.org/citation.cfm?id=1347082.1347120.
  7. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, pages 377-386. IEEE Computer Society, 2010. URL: https://doi.org/10.1109/FOCS.2010.43.
  8. Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, and Ravi Kumar. The sketching complexity of pattern matching. In Klaus Jansen, Sanjeev Khanna, José D. P. Rolim, and Dana Ron, editors, 8th International Workshop on Randomization and Computation, RANDOM 2004, volume 3122 of LNCS, pages 261-272. Springer, 2004. URL: https://doi.org/10.1007/978-3-540-27821-4_24.
  9. Djamal Belazzougui and Qin Zhang. Edit distance: Sketching, streaming, and document exchange. In Irit Dinur, editor, 57th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2016, pages 51-60. IEEE Computer Society, 2016. URL: https://doi.org/10.1109/FOCS.2016.15.
  10. Karl Bringmann, Marvin Künnemann, and Philip Wellnitz. Few matches or almost periodicity: Faster pattern matching with mismatches in compressed texts. In Timothy M. Chan, editor, 30th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 1126-1145. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975482.69.
  11. Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Streaming algorithms for embedding and computing edit distance in the low distance regime. In Daniel Wichs and Yishay Mansour, editors, 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, pages 712-725. ACM, 2016. URL: https://doi.org/10.1145/2897518.2897577.
  12. Timothy M. Chan, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. Approximating text-to-pattern Hamming distances. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 643-656. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384266.
  13. Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster approximate pattern matching: A unified approach, 2020. URL: http://arxiv.org/abs/2004.08350.
  14. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. The k-mismatch problem revisited. In Robert Krauthgamer, editor, 27th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, pages 2039-2052. SIAM, 2016. URL: https://doi.org/10.1137/1.9781611974331.ch142.
  15. Raphaël Clifford, Tomasz Kociumaka, and Ely Porat. The streaming k-mismatch problem. In Timothy M. Chan, editor, 30th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 1106-1125. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975482.68.
  16. Raphaël Clifford and Tatiana Starikovskaya. Approximate Hamming distance in a stream. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, volume 55 of LIPIcs, pages 20:1-20:14. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2016. URL: https://doi.org/10.4230/LIPIcs.ICALP.2016.20.
  17. Graham Cormode. Data sketching. Communications of the ACM, 60(9):48-55, 2017. URL: https://doi.org/10.1145/3080008.
  18. Graham Cormode, Minos Garofalakis, Peter J. Haas, and Chris Jermaine. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends in Databases, 4(1-3):1-294, 2011. URL: https://doi.org/10.1561/1900000004.
  19. Michael S. Crouch and Andrew McGregor. Periodicity and cyclic shifts via linear sketches. In Leslie Ann Goldberg, Klaus Jansen, R. Ravi, and José D. P. Rolim, editors, 14th International Workshop on Approximation Algorithms for Combinatorial Optimization, APPROX 2011, volume 6845 of LNCS, pages 158-170. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-22935-0_14.
  20. Benjamin Doerr. Probabilistic tools for the analysis of randomized optimization heuristics. In Natural Computing Series, pages 1-87. Springer International Publishing, 2020. URL: https://doi.org/10.1007/978-3-030-29414-4_1.
  21. Michael J. Fischer and Michael S. Paterson. String matching and other products. In Richard M. Karp, editor, Complexity of Computation, volume 7 of SIAM-AMS Proceedings, pages 113-125. AMS, 1974. Google Scholar
  22. Paweł Gawrychowski and Przemysław Uznański. Towards unified approximate pattern matching for Hamming and L₁ distance. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, volume 107 of LIPIcs, pages 62:1-62:13. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.62.
  23. Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards optimal approximate streaming pattern matching by matching multiple patterns in multiple streams. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, volume 107 of LIPIcs, pages 65:1-65:16. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.65.
  24. Shay Golan, Tsvi Kopelowitz, and Ely Porat. Streaming pattern matching with d wildcards. Algorithmica, 81(5):1988-2015, 2019. URL: https://doi.org/10.1007/s00453-018-0521-7.
  25. Richard W. Hamming. Error detecting and error correcting codes. Bell System Technical Journal, 29(2):147-160, 1950. URL: https://doi.org/10.1002/j.1538-7305.1950.tb00463.x.
  26. Wei Huang, Yaoyun Shi, Shengyu Zhang, and Yufan Zhu. The communication complexity of the Hamming distance problem. Information Processing Letters, 99(4):149-153, 2006. URL: https://doi.org/10.1016/j.ipl.2006.01.014.
  27. Howard J. Karloff. Fast algorithms for approximately counting mismatches. Information Processing Letters, 48(2):53-60, 1993. URL: https://doi.org/10.1016/0020-0190(93)90177-B.
  28. Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. URL: https://doi.org/10.1147/rd.312.0249.
  29. Subhash Khot and Assaf Naor. Nonembeddability theorems via fourier analysis. Mathematische Annalen, 334(4):821-852, 2006. URL: https://doi.org/10.1007/s00208-005-0745-0.
  30. Tsvi Kopelowitz and Ely Porat. Breaking the variance: Approximating the Hamming distance in 1/ε time per alignment. In Venkatesan Guruswami, editor, 56th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2015, pages 601-613. IEEE Computer Society, 2015. URL: https://doi.org/10.1109/FOCS.2015.43.
  31. Tsvi Kopelowitz and Ely Porat. A simple algorithm for approximating the text-to-pattern Hamming distance. In Raimund Seidel, editor, 1st Symposium on Simplicity in Algorithms, SOSA 2018, volume 61 of OASICS, pages 10:1-10:5. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/OASIcs.SOSA.2018.10.
  32. S. Rao Kosaraju. Efficient string matching. Manuscript, 1987. Google Scholar
  33. Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM Journal on Computing, 30(2):457-474, 2000. URL: https://doi.org/10.1137/S0097539798347177.
  34. Jelani Nelson. Sketching and streaming algorithms for processing massive data. ACM Crossroads, 19(1):14-19, 2012. URL: https://doi.org/10.1145/2331042.2331049.
  35. Rafail Ostrovsky and Yuval Rabani. Low distortion embeddings for edit distance. Journal of the ACM, 54(5):23, 2007. URL: https://doi.org/10.1145/1284320.1284322.
  36. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, pages 315-323. IEEE Computer Society, 2009. URL: https://doi.org/10.1109/FOCS.2009.11.
  37. Ely Porat and Ohad Lipsky. Improved sketching of Hamming distance with error correcting. In Bin Ma and Kaizhong Zhang, editors, 18th Annual Symposium on Combinatorial Pattern Matching, CPM 2007, volume 4580 of LNCS, pages 173-182. Springer, 2007. URL: https://doi.org/10.1007/978-3-540-73437-6_19.
  38. Jakub Radoszewski and Tatiana Starikovskaya. Streaming k-mismatch with error correcting and applications. Information and Computation, 271:104513, 2020. URL: https://doi.org/10.1016/j.ic.2019.104513.
  39. Tatiana Starikovskaya, Michal Svagerka, and Przemysław Uznański. L_p pattern matching in a stream. In Jarosław Byrka and Raghu Meka, editors, 23rd International Workshop on Approximation Algorithms for Combinatorial Optimization, APPROX 2020, volume 176 of LIPIcs, pages 35:1-35:23. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2020.35.
  40. David P. Woodruff. Optimal space lower bounds for all frequency moments. In J. Ian Munro, editor, 15th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pages 167-175. SIAM, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982817.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail