Exact and Approximate Range Mode Query Data Structures in Practice

He, Meng; Liu, Zhen

doi:10.4230/LIPIcs.SEA.2023.19

Subject Classification

ACM Subject Classification

Information systems → Data structures

Keywords

range mode query
exact range mode query
approximate range mode query

Abstract

We conduct an experimental study of the range mode problem. In the exact version of the problem, we preprocess an array A so that, given a query range [a, b], the most frequent element in A[a, b] can be found efficiently. For this problem, our most important finding is that the strategy of using succinct data structures to encode more precomputed information not only helped Chan et al. (Linear-space data structures for range mode query in arrays, Theory of Computing Systems, 2013) improve previous results in theory, but also helps us achieve the best time/space tradeoff in practice; we go a step further by replacing more components of their solution with succinct data structures, which improves performance further. In the approximate version of the problem, a (1+ε)-approximate range mode query looks for an element whose number of occurrences in A[a, b] is at least F_{a,b}/(1+ε), where F_{a,b} is the frequency of the mode in A[a, b]. We implement all previous solutions to this problem and find that, even when ε = 1/2, the average approximation ratio of these solutions is close to 1 in practice, and they answer queries much faster than the best exact solution. These solutions achieve different useful time-space tradeoffs. Among them, El-Zein et al. (On Approximate Range Mode and Range Selection, 30th International Symposium on Algorithms and Computation, 2019) provide a solution whose space usage is only 35.6% to 93.8% of the cost of storing the input array of 32-bit integers (in most cases the space cost is closer to the lower end, and the average space cost over all datasets is 20.2 bits per symbol). Its non-succinct version also stands out, answering queries at least several times faster than other O(n/ε)-word structures while using only slightly more space in practice.
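To make the two query definitions concrete, the following Python sketch answers them by direct counting over A[a, b]. It is only a brute-force reference for checking answers, not any of the data structures studied in the paper. The helper names (`range_mode`, `occurrences`, `is_approx_mode`, `approx_ratio`), the 0-based inclusive indexing, and the tie-breaking among equally frequent elements are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

# Hypothetical helpers for illustration: brute-force reference answers,
# not one of the data structures evaluated in the paper.
# Queries use 0-based, inclusive ranges A[a..b].


def _check_range(A: Sequence[Hashable], a: int, b: int) -> None:
    if not 0 <= a <= b < len(A):
        raise IndexError("query range out of bounds")


def range_mode(A: Sequence[Hashable], a: int, b: int) -> Tuple[Hashable, int]:
    """Return (mode, F_{a,b}) for A[a..b] by counting every element: O(b - a + 1)."""
    _check_range(A, a, b)
    counts = Counter(A[a:b + 1])
    # most_common(1) yields the highest count; among ties, Counter keeps
    # first-encountered order, so the tie-break here is arbitrary but fixed.
    return counts.most_common(1)[0]


def occurrences(A: Sequence[Hashable], a: int, b: int, x: Hashable) -> int:
    """Number of occurrences of x in A[a..b]."""
    _check_range(A, a, b)
    return sum(1 for v in A[a:b + 1] if v == x)


def is_approx_mode(A: Sequence[Hashable], a: int, b: int, x: Hashable, eps: float) -> bool:
    """True if x occurs at least F_{a,b} / (1 + eps) times in A[a..b]."""
    _, F = range_mode(A, a, b)
    # Multiply instead of dividing: occ >= F / (1 + eps)  <=>  occ * (1 + eps) >= F.
    return occurrences(A, a, b, x) * (1 + eps) >= F


def approx_ratio(A: Sequence[Hashable], a: int, b: int, x: Hashable) -> float:
    """F_{a,b} / occ(x); 1.0 means x is an exact mode of A[a..b]."""
    _, F = range_mode(A, a, b)
    occ = occurrences(A, a, b, x)
    return float("inf") if occ == 0 else F / occ
```

For example, with A = [3, 1, 3, 2, 1, 3, 1, 1] and the query range [0, 7], the exact mode is 1 with F_{0,7} = 4. With ε = 1/2, element 3 (3 occurrences) is a valid approximate answer because 3 ≥ 4/1.5, and its approximation ratio is 4/3; element 2 (1 occurrence) is not a valid answer.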

Cite As

Meng He and Zhen Liu. Exact and Approximate Range Mode Query Data Structures in Practice. In 21st International Symposium on Experimental Algorithms (SEA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 265, pp. 19:1-19:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.SEA.2023.19

Author Details

Meng He

Dalhousie University, Halifax, Canada

Zhen Liu

Dalhousie University, Halifax, Canada

References

Project Gutenberg. (n.d.), retrieved in July 2021. Available from URL: https://www.gutenberg.org/.
Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix multiplication. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 522-539. SIAM, 2021. URL: https://doi.org/10.1137/1.9781611976465.32.
Diego Arroyuelo, Rodrigo Cánovas, Gonzalo Navarro, and Kunihiko Sadakane. Succinct trees in practice. In Guy E. Blelloch and Dan Halperin, editors, Proceedings of the Twelfth Workshop on Algorithm Engineering and Experiments, ALENEX 2010, Austin, Texas, USA, January 16, 2010, pages 84-97. SIAM, 2010. URL: https://doi.org/10.1137/1.9781611972900.9.
Nikhil Bansal and Ryan Williams. Regularity lemmas and combinatorial algorithms. Theory of Computing, 8:69-94, 2012.
Michael A Bender, Martin Farach-Colton, Giridhar Pemmasani, Steven Skiena, and Pavel Sumazin. Lowest common ancestors in trees and directed acyclic graphs. Journal of Algorithms, 57(2):75-94, 2005.
Prosenjit Bose, Evangelos Kranakis, Pat Morin, and Yihui Tang. Approximate range mode and range median queries. In Annual Symposium on Theoretical Aspects of Computer Science, pages 377-388. Springer, 2005.
Timothy M. Chan, Stephane Durocher, Kasper Green Larsen, Jason Morrison, and Bryan T. Wilkinson. Linear-space data structures for range mode query in arrays. Theory of Computing Systems, 55(4):719-741, March 2013.
Francisco Claude, J Ian Munro, and Patrick K Nicholson. Range queries over untangled chains. In International Symposium on String Processing and Information Retrieval, pages 82-93. Springer, 2010.
O'Neil Delpratt, Naila Rahman, and Rajeev Raman. Engineering the LOUDS succinct tree representation. In Carme Àlvarez and Maria J. Serna, editors, Experimental Algorithms, 5th International Workshop, WEA 2006, Cala Galdana, Menorca, Spain, May 24-27, 2006, Proceedings, volume 4007 of Lecture Notes in Computer Science, pages 134-145. Springer, 2006. URL: https://doi.org/10.1007/11764298_12.
Erik D Demaine, Alejandro López-Ortiz, and J Ian Munro. Frequency estimation of internet packet streams with limited space. In European Symposium on Algorithms, pages 348-360. Springer, 2002.
James R Driscoll, Neil Sarnak, Daniel D Sleator, and Robert E Tarjan. Making data structures persistent. Journal of Computer and System Sciences, 38(1):86-124, 1989.
Hicham El-Zein, Meng He, J Ian Munro, Yakov Nekrich, and Bryce Sandlund. On approximate range mode and range selection. In 30th International Symposium on Algorithms and Computation (ISAAC 2019), volume 149, page 57. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
Hicham El-Zein, Meng He, J Ian Munro, and Bryce Sandlund. Improved time and space bounds for dynamic range mode. In 26th Annual European Symposium on Algorithms (ESA 2018), volume 112, page 25. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
Derya Erhan. Boğaziçi University DDoS dataset, 2019. Available from URL: https://dx.doi.org/10.21227/45m9-9p82.
Luca Foschini, Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. When indexing equals compression: Experiments with compressing suffix arrays and applications. ACM Trans. Algorithms, 2(4):611-639, 2006. URL: https://doi.org/10.1145/1198513.1198521.
Michael L Fredman and Dan E Willard. Blasting through the information theoretic barrier with fusion trees. In Proceedings of the twenty-second annual ACM symposium on Theory of Computing, pages 1-7, 1990.
Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. From theory to practice: Plug and play with succinct data structures. In 13th International Symposium on Experimental Algorithms, (SEA 2014), pages 326-337, 2014. URL: https://doi.org/10.1007/978-3-319-07959-2_28.
Mark Greve, Allan Grønlund Jørgensen, Kasper Dalgaard Larsen, and Jakob Truelsen. Cell probe lower bounds and approximations for range mode. In International Colloquium on Automata, Languages, and Programming, pages 605-616. Springer, 2010.
Yuzhou Gu, Adam Polak, Virginia Vassilevska Williams, and Yinzhan Xu. Faster monotone min-plus product, range mode, and single source replacement paths. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021, July 12-16, 2021, Glasgow, Scotland (Virtual Conference), volume 198 of LIPIcs, pages 75:1-75:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.ICALP.2021.75.
Meng He and Serikzhan Kazi. Path query data structures in practice. In 18th International Symposium on Experimental Algorithms, volume 160, pages 27:1-27:16, 2020.
D. Jansens. Persistent Binary Search Trees. URL: https://cglab.ca/~dana/pbst/.
Danny Krizanc, Pat Morin, and Michiel H. M. Smid. Range mode and range median queries on lists and trees. Nordic Journal of Computing, 12(1):1-17, 2005.
Seattle Public Library. Seattle library checkout records, 2017. Available from URL: https://www.kaggle.com/seattle-public-library/seattle-library-checkout-records.
Zhen Liu. Exact and approximate range mode query data structures in practice. Master’s thesis, Dalhousie University, 2023. URL: http://hdl.handle.net/10222/81772.
Jianmo Ni, Jiacheng Li, and Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 188-197, 2019.
City of New York. NYC parking tickets, 2017. Available from URL: https://www.kaggle.com/datasets/new-york-city/nyc-parking-tickets.
Daisuke Okanohara and Kunihiko Sadakane. Practical entropy-compressed rank/select dictionary. In Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments, ALENEX 2007, New Orleans, Louisiana, USA, January 6, 2007. SIAM, 2007. URL: https://doi.org/10.1137/1.9781611972870.6.
Mihai Patrascu. Succincter. In Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 305-313, 2008.
Holger Petersen and Szymon Grabowski. Range mode and range median queries in constant time and sub-quadratic space. Information Processing Letters, 109(4):225-228, 2009.
Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms (TALG), 3(4):43-es, 2007.
Bryce Sandlund and Yinzhan Xu. Faster dynamic range mode. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
Virginia Vassilevska Williams and Yinzhan Xu. Truly subcubic min-plus product for less structured matrices, with applications. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 12-29. SIAM, 2020.