Exact and Approximate Range Mode Query Data Structures in Practice

Authors Meng He, Zhen Liu



Author Details

Meng He
  • Dalhousie University, Halifax, Canada
Zhen Liu
  • Dalhousie University, Halifax, Canada

Cite As

Meng He and Zhen Liu. Exact and Approximate Range Mode Query Data Structures in Practice. In 21st International Symposium on Experimental Algorithms (SEA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 265, pp. 19:1-19:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.SEA.2023.19

Abstract

We conduct an experimental study on the range mode problem. In the exact version of the problem, we preprocess an array A, such that given a query range [a, b], the most frequent element in A[a, b] can be found efficiently. For this problem, our most important finding is that the strategy of using succinct data structures to encode more precomputed information not only helped Chan et al. (Linear-space data structures for range mode query in arrays, Theory of Computing Systems, 2013) improve previous results in theory but also helps us achieve the best time/space tradeoff in practice; we even go a step further to replace more components in their solution with succinct data structures and improve the performance further.
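To make the exact query concrete, the following is a brute-force baseline in Python. It is not one of the data structures studied in the paper: it does no preprocessing and scans the range on every query, which is the behaviour the indexed solutions are designed to avoid. The function name `range_mode_naive`, the 0-based inclusive indexing, and the smallest-value tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter

def range_mode_naive(A, a, b):
    """Return (mode, frequency) for A[a..b], 0-based and inclusive.

    Hypothetical baseline: O(b - a + 1) time per query, no preprocessing.
    Ties are broken by returning the smallest element (an assumption).
    """
    if not 0 <= a <= b < len(A):
        raise IndexError("query range out of bounds")
    counts = Counter(A[a:b + 1])          # frequency of each element in the range
    best_freq = max(counts.values())      # F_{a,b}: frequency of the mode
    mode = min(x for x, c in counts.items() if c == best_freq)
    return mode, best_freq

A = [3, 1, 3, 2, 1, 1, 3]
print(range_mode_naive(A, 0, 3))  # range [3, 1, 3, 2]
print(range_mode_naive(A, 3, 5))  # range [2, 1, 1]
```

A baseline like this is mainly useful as a correctness oracle when testing faster structures on small inputs.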
In the approximate version of this problem, a (1+ε)-approximate range mode query looks for an element whose number of occurrences in A[a, b] is at least F_{a,b}/(1+ε), where F_{a,b} is the frequency of the mode in A[a, b]. We implement all previous solutions to this problem and find that, even when ε = 1/2, the average approximation ratio of these solutions is close to 1 in practice, and they provide much faster query time than the best exact solution. These solutions achieve different useful time-space tradeoffs, and among them, El-Zein et al. (On Approximate Range Mode and Range Selection, 30th International Symposium on Algorithms and Computation, 2019) provide us with one solution whose space usage is only 35.6% to 93.8% of the cost of storing the input array of 32-bit integers (in most cases, the space cost is closer to the lower end, and the average space cost is 20.2 bits per symbol among all datasets). Its non-succinct version also stands out with query support at least several times faster than other O(n/ε)-word structures while using only slightly more space in practice.
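The approximation condition can be checked directly against its definition. The sketch below, in Python, tests whether a candidate element is a valid (1+ε)-approximate range mode and computes the ratio between the true mode frequency and the candidate's frequency. The function names, 0-based inclusive indexing, and this reading of "approximation ratio" are assumptions for illustration; the paper's own measurement procedure may differ.

```python
from collections import Counter

def is_approx_range_mode(A, a, b, candidate, eps):
    """True if candidate occurs at least F_{a,b} / (1 + eps) times in A[a..b].

    Hypothetical checker: brute force, 0-based inclusive range.
    """
    window = Counter(A[a:b + 1])
    mode_freq = max(window.values())      # F_{a,b}
    # Multiply instead of dividing so small integer cases stay exact.
    return window[candidate] * (1 + eps) >= mode_freq

def approximation_ratio(A, a, b, candidate):
    """F_{a,b} divided by the candidate's frequency (1.0 means exact)."""
    window = Counter(A[a:b + 1])
    occurrences = window[candidate]
    if occurrences == 0:
        return float("inf")               # candidate does not occur in the range
    return max(window.values()) / occurrences

A = [1, 1, 1, 2, 2, 3]
print(is_approx_range_mode(A, 0, 5, 2, 0.5))  # 2 occurs twice, mode occurs 3 times
print(is_approx_range_mode(A, 0, 5, 3, 0.5))  # 3 occurs once
print(approximation_ratio(A, 0, 5, 2))
```

With ε = 1/2 an answer may occur as little as two thirds as often as the true mode, so a ratio near 1 means the returned elements are usually much better than the guarantee requires.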

Subject Classification

ACM Subject Classification
  • Information systems → Data structures
Keywords
  • range mode query
  • exact range mode query
  • approximate range mode query

References

  1. Project Gutenberg. (n.d.), retrieved in July 2021. Available from URL: https://www.gutenberg.org/.
  2. Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix multiplication. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 522-539. SIAM, 2021. URL: https://doi.org/10.1137/1.9781611976465.32.
  3. Diego Arroyuelo, Rodrigo Cánovas, Gonzalo Navarro, and Kunihiko Sadakane. Succinct trees in practice. In Guy E. Blelloch and Dan Halperin, editors, Proceedings of the Twelfth Workshop on Algorithm Engineering and Experiments, ALENEX 2010, Austin, Texas, USA, January 16, 2010, pages 84-97. SIAM, 2010. URL: https://doi.org/10.1137/1.9781611972900.9.
  4. Nikhil Bansal and Ryan Williams. Regularity lemmas and combinatorial algorithms. Theory of Computing, 8:69-94, 2012.
  5. Michael A Bender, Martin Farach-Colton, Giridhar Pemmasani, Steven Skiena, and Pavel Sumazin. Lowest common ancestors in trees and directed acyclic graphs. Journal of Algorithms, 57(2):75-94, 2005.
  6. Prosenjit Bose, Evangelos Kranakis, Pat Morin, and Yihui Tang. Approximate range mode and range median queries. In Annual Symposium on Theoretical Aspects of Computer Science, pages 377-388. Springer, 2005.
  7. Timothy M. Chan, Stephane Durocher, Kasper Green Larsen, Jason Morrison, and Bryan T. Wilkinson. Linear-space data structures for range mode query in arrays. Theory of Computing Systems, 55(4):719-741, March 2013.
  8. Francisco Claude, J Ian Munro, and Patrick K Nicholson. Range queries over untangled chains. In International Symposium on String Processing and Information Retrieval, pages 82-93. Springer, 2010.
  9. O'Neil Delpratt, Naila Rahman, and Rajeev Raman. Engineering the LOUDS succinct tree representation. In Carme Àlvarez and Maria J. Serna, editors, Experimental Algorithms, 5th International Workshop, WEA 2006, Cala Galdana, Menorca, Spain, May 24-27, 2006, Proceedings, volume 4007 of Lecture Notes in Computer Science, pages 134-145. Springer, 2006. URL: https://doi.org/10.1007/11764298_12.
  10. Erik D Demaine, Alejandro López-Ortiz, and J Ian Munro. Frequency estimation of internet packet streams with limited space. In European Symposium on Algorithms, pages 348-360. Springer, 2002.
  11. James R Driscoll, Neil Sarnak, Daniel D Sleator, and Robert E Tarjan. Making data structures persistent. Journal of Computer and System Sciences, 38(1):86-124, 1989.
  12. Hicham El-Zein, Meng He, J Ian Munro, Yakov Nekrich, and Bryce Sandlund. On approximate range mode and range selection. In 30th International Symposium on Algorithms and Computation (ISAAC 2019), volume 149, page 57. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
  13. Hicham El-Zein, Meng He, J Ian Munro, and Bryce Sandlund. Improved time and space bounds for dynamic range mode. In 26th Annual European Symposium on Algorithms (ESA 2018), volume 112, page 25. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
  14. Derya Erhan. Boğaziçi university DDoS dataset, 2019. Available from URL: https://dx.doi.org/10.21227/45m9-9p82.
  15. Luca Foschini, Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. When indexing equals compression: Experiments with compressing suffix arrays and applications. ACM Trans. Algorithms, 2(4):611-639, 2006. URL: https://doi.org/10.1145/1198513.1198521.
  16. Michael L Fredman and Dan E Willard. Blasting through the information theoretic barrier with fusion trees. In Proceedings of the twenty-second annual ACM symposium on Theory of Computing, pages 1-7, 1990.
  17. Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. From theory to practice: Plug and play with succinct data structures. In 13th International Symposium on Experimental Algorithms, (SEA 2014), pages 326-337, 2014. URL: https://doi.org/10.1007/978-3-319-07959-2_28.
  18. Mark Greve, Allan Grønlund Jørgensen, Kasper Dalgaard Larsen, and Jakob Truelsen. Cell probe lower bounds and approximations for range mode. In International Colloquium on Automata, Languages, and Programming, pages 605-616. Springer, 2010.
  19. Yuzhou Gu, Adam Polak, Virginia Vassilevska Williams, and Yinzhan Xu. Faster monotone min-plus product, range mode, and single source replacement paths. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021, July 12-16, 2021, Glasgow, Scotland (Virtual Conference), volume 198 of LIPIcs, pages 75:1-75:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.ICALP.2021.75.
  20. Meng He and Serikzhan Kazi. Path query data structures in practice. In 18th International Symposium on Experimental Algorithms, volume 160, pages 27:1-27:16, 2020.
  21. D. Jansens. Persistent Binary Search Trees. URL: https://cglab.ca/~dana/pbst/.
  22. Danny Krizanc, Pat Morin, and Michiel H. M. Smid. Range mode and range median queries on lists and trees. Nordic Journal of Computing, 12(1):1-17, 2005.
  23. Seattle Public Library. Seattle library checkout records, 2017. Available from URL: https://www.kaggle.com/seattle-public-library/seattle-library-checkout-records.
  24. Zhen Liu. Exact and approximate range mode query data structures in practice. Master’s thesis, Dalhousie University, 2023. URL: http://hdl.handle.net/10222/81772.
  25. Jianmo Ni, Jiacheng Li, and Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 188-197, 2019.
  26. City of New York. NYC parking tickets, 2017. Available from URL: https://www.kaggle.com/datasets/new-york-city/nyc-parking-tickets.
  27. Daisuke Okanohara and Kunihiko Sadakane. Practical entropy-compressed rank/select dictionary. In Proceedings of the Nine Workshop on Algorithm Engineering and Experiments, ALENEX 2007, New Orleans, Louisiana, USA, January 6, 2007. SIAM, 2007. URL: https://doi.org/10.1137/1.9781611972870.6.
  28. Mihai Patrascu. Succincter. In Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 305-313, 2008.
  29. Holger Petersen and Szymon Grabowski. Range mode and range median queries in constant time and sub-quadratic space. Information Processing Letters, 109(4):225-228, 2009.
  30. Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms (TALG), 3(4):43-es, 2007.
  31. Bryce Sandlund and Yinzhan Xu. Faster dynamic range mode. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
  32. Virginia Vassilevska Williams and Yinzhan Xu. Truly subcubic min-plus product for less structured matrices, with applications. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 12-29. SIAM, 2020.