A Faster Algorithm for Finding Closest Pairs in Hamming Metric

Authors Andre Esser, Robert Kübler, Floyd Zweydinger

Thumbnail PDF


  • Filesize: 0.87 MB
  • 21 pages

Document Identifiers

Author Details

Andre Esser
  • Cryptography Research Center, Technology Innovation Institute, Abu Dhabi, UAE
Robert Kübler
  • Metro AG, Düsseldorf, Germany
Floyd Zweydinger
  • Ruhr Universität Bochum, Germany

Cite AsGet BibTex

Andre Esser, Robert Kübler, and Floyd Zweydinger. A Faster Algorithm for Finding Closest Pairs in Hamming Metric. In 41st IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 213, pp. 20:1-20:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


We study the Closest Pair Problem in Hamming metric, which asks to find the pair with the smallest Hamming distance in a collection of binary vectors. We give a new randomized algorithm for the problem on uniformly random input outperforming previous approaches whenever the dimension of input points is small compared to the dataset size. For moderate to large dimensions, our algorithm matches the time complexity of the previously best-known locality sensitive hashing based algorithms. Technically our algorithm follows similar design principles as Dubiner (IEEE Trans. Inf. Theory 2010) and May-Ozerov (Eurocrypt 2015). Besides improving the time complexity in the aforementioned areas, we significantly simplify the analysis of these previous works. We give a modular analysis, which allows us to investigate the performance of the algorithm also on non-uniform input distributions. Furthermore, we give a proof of concept implementation of our algorithm which performs well in comparison to a quadratic search baseline. This is the first step towards answering an open question raised by May and Ozerov regarding the practicability of algorithms following these design principles.

Subject Classification

ACM Subject Classification
  • Theory of computation → Nearest neighbor algorithms
  • closest pair problem
  • LSH
  • nearest neighbor


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Josh Alman. An illuminating algorithm for the light bulb problem. In Jeremy T. Fineman and Michael Mitzenmacher, editors, 2nd Symposium on Simplicity in Algorithms, SOSA@SODA 2019, January 8-9, 2019 - San Diego, CA, USA, volume 69 of OASICS, pages 2:1-2:11. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/OASIcs.SOSA.2019.2.
  2. Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 793-801, 2015. Google Scholar
  3. Yoshinori Aono, Phong Q Nguyen, Takenobu Seito, and Junji Shikata. Lower bounds on lattice enumeration with extreme pruning. In Annual International Cryptology Conference, pages 608-637. Springer, 2018. Google Scholar
  4. Anja Becker, Léo Ducas, Nicolas Gama, and Thijs Laarhoven. New directions in nearest neighbor searching with applications to lattice sieving. In Robert Krauthgamer, editor, 27th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 10-24, Arlington, VA, USA, January 10-12 2016. ACM-SIAM. URL: https://doi.org/10.1137/1.9781611974331.ch2.
  5. Jon Louis Bentley. Multidimensional divide-and-conquer. Communications of the ACM, 23(4):214-229, 1980. Google Scholar
  6. Leif Both and Alexander May. Decoding linear codes with high error rate and its impact for LPN security. In Tanja Lange and Rainer Steinwandt, editors, Post-Quantum Cryptography - 9th International Conference, PQCrypto 2018, pages 25-46, Fort Lauderdale, Florida, United States, April 9-11 2018. Springer, Heidelberg, Germany. URL: https://doi.org/10.1007/978-3-319-79063-3_2.
  7. Leif Both and Alexander May. Decoding linear codes with high error rate and its impact for lpn security. In International Conference on Post-Quantum Cryptography, pages 25-46. Springer, 2018. Google Scholar
  8. Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. Brief: Binary robust independent elementary features. In European conference on computer vision, pages 778-792. Springer, 2010. Google Scholar
  9. Moshe Dubiner. Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem. IEEE Transactions on Information Theory, 56(8):4166-4179, 2010. Google Scholar
  10. Cheikh Thiécoumba Gueye, Jean Belo Klamti, and Shoichi Hirose. Generalization of bjmm-isd using may-ozerov nearest neighbor algorithm over an arbitrary finite field 𝔽_q. In International Conference on Codes, Cryptology, and Information Security, pages 96-109. Springer, 2017. Google Scholar
  11. Shoichi Hirose. May-ozerov algorithm for nearest-neighbor problem over 𝔽_q and its application to information set decoding. In International Conference for Information Technology and Communications, pages 115-126. Springer, 2016. Google Scholar
  12. Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In 30th Annual ACM Symposium on Theory of Computing, pages 604-613, Dallas, TX, USA, May 23-26, 1998. ACM Press. URL: https://doi.org/10.1145/276698.276876.
  13. Matti Karppa, Petteri Kaski, and Jukka Kohonen. A faster subquadratic algorithm for finding outlier correlations. In Robert Krauthgamer, editor, 27th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1288-1305, Arlington, VA, USA, January 10-12, 2016. ACM-SIAM. URL: https://doi.org/10.1137/1.9781611974331.ch90.
  14. Samir Khuller and Yossi Matias. A simple randomized sieve algorithm for the closest-pair problem. Information and Computation, 118(1):34-37, 1995. Google Scholar
  15. Jiwen Lu, Venice Erin Liong, Xiuzhuang Zhou, and Jie Zhou. Learning compact binary face descriptor for face recognition. IEEE transactions on pattern analysis and machine intelligence, 37(10):2041-2056, 2015. Google Scholar
  16. Jonathan Marchini, Peter Donnelly, and Lon R Cardon. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature genetics, 37(4):413-417, 2005. Google Scholar
  17. Alexander May and Ilya Ozerov. On computing nearest neighbors with applications to decoding of binary linear codes. In Elisabeth Oswald and Marc Fischlin, editors, Advances in Cryptology - EUROCRYPT 2015, Part I, volume 9056 of Lecture Notes in Computer Science, pages 203-228, Sofia, Bulgaria, April 26-30, 2015. Springer, Heidelberg, Germany. URL: https://doi.org/10.1007/978-3-662-46800-5_9.
  18. Rajeev Motwani, Assaf Naor, and Rina Panigrahi. Lower bounds on locality sensitive hashing. In Proceedings of the twenty-second annual symposium on Computational geometry, pages 154-157, 2006. Google Scholar
  19. Solomon K Musani, Daniel Shriner, Nianjun Liu, Rui Feng, Christopher S Coffey, Nengjun Yi, Hemant K Tiwari, and David B Allison. Detection of gene× gene interactions in genome-wide association studies of human population data. Human heredity, 63(2):67-84, 2007. Google Scholar
  20. Christoph Strecha, Alex Bronstein, Michael Bronstein, and Pascal Fua. Ldahash: Improved matching with smaller descriptors. IEEE transactions on pattern analysis and machine intelligence, 34(1):66-78, 2011. Google Scholar
  21. Gregory Valiant. Finding correlations in subquadratic time, with applications to learning parities and juntas. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 11-20. IEEE, 2012. Google Scholar
  22. Leslie G Valiant. Functionality in neural nets. In COLT, volume 88, pages 28-39, 1988. Google Scholar
  23. Ning Xie, Shuai Xu, and Yekun Xu. A new coding-based algorithm for finding closest pair of vectors. Theoretical Computer Science, 782:129-144, 2019. Google Scholar