Engineering Predecessor Data Structures for Dynamic Integer Sets

Authors Patrick Dinklage , Johannes Fischer, Alexander Herlez



PDF
Thumbnail PDF

File

LIPIcs.SEA.2021.7.pdf
  • Filesize: 0.92 MB
  • 19 pages

Document Identifiers

Author Details

Patrick Dinklage
  • TU Dortmund University, Germany
Johannes Fischer
  • TU Dortmund University, Germany
Alexander Herlez
  • TU Dortmund University, Germany

Cite AsGet BibTex

Patrick Dinklage, Johannes Fischer, and Alexander Herlez. Engineering Predecessor Data Structures for Dynamic Integer Sets. In 19th International Symposium on Experimental Algorithms (SEA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 190, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.SEA.2021.7

Abstract

We present highly optimized data structures for the dynamic predecessor problem, where the task is to maintain a set S of w-bit numbers under insertions, deletions, and predecessor queries (return the largest element in S no larger than a given key). The problem of finding predecessors can be viewed as a generalized form of the membership problem, or as a simple version of the nearest neighbour problem. It lies at the core of various real-world problems such as internet routing. In this work, we engineer (1) a simple implementation of the idea of universe reduction, similar to van-Emde-Boas trees (2) variants of y-fast tries [Willard, IPL'83], and (3) B-trees with different strategies for organizing the keys contained in the nodes, including an implementation of dynamic fusion nodes [Pǎtraşcu and Thorup, FOCS'14]. We implement our data structures for w = 32,40,64, which covers most typical scenarios. Our data structures finish workloads faster than previous approaches while being significantly more space-efficient, e.g., they clearly outperform standard implementations of the STL by finishing up to four times as fast using less than a third of the memory. Our tests also provide more general insights on data structure design, such as how small sets should be stored and handled and if and when new CPU instructions such as advanced vector extensions pay off.

Subject Classification

ACM Subject Classification
  • Theory of computation → Predecessor queries
Keywords
  • integer data structures
  • dynamic data structures
  • predecessor
  • universe reduction
  • y-fast trie
  • fusion tree
  • B-tree

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Advanced Micro Devices Inc. AMD64 Architecture - Programmer’s Manual Volume 3: General-Purpose and System Instructions, September 2020. URL: https://www.amd.com/system/files/TechDocs/24594.pdf.
  2. Miklós Ajtai, Michael L. Fredman, and János Komlós. Hash functions for priority queues. Inf. Control., 63(3):217-225, 1984. URL: https://doi.org/10.1016/S0019-9958(84)80015-7.
  3. Arm Limited. A64 Instruction Set Reference, 2018. URL: https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference.
  4. Niklas Baumstark, Simon Gog, Tobias Heuer, and Julian Labeit. Practical range minimum queries revisited. In 16th International Symposium on Experimental Algorithms (SEA), pages 12:1-12:16. Dagstuhl, 2017. URL: https://doi.org/10.4230/LIPIcs.SEA.2017.12.
  5. Djamal Belazzougui. Predecessor search, string algorithms and data structures. In Encyclopedia of Algorithms, pages 1605-1611. Springer, 2016. URL: https://doi.org/10.1007/978-1-4939-2864-4_632.
  6. Pedro Celis, Per-Åke Larson, and J. Ian Munro. Robin hood hashing (preliminary report). In 26th Symposium on Foundations of Computer Science (FOCS), pages 281-288. IEEE, 1985. URL: https://doi.org/10.1109/SFCS.1985.48.
  7. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 3rd Edition. MIT Press, 2009. Google Scholar
  8. Mikael Degermark, Andrej Brodnik, Svante Carlsson, and Stephen Pink. Small forwarding tables for fast routing lookups. In ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pages 3-14. ACM, 1997. URL: https://doi.org/10.1145/263105.263133.
  9. Roman Dementiev, Lutz Kettner, Jens Mehnert, and Peter Sanders. Engineering a sorted list data structure for 32 bit key. In 6th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 142-151. SIAM, 2004. URL: https://web.archive.org/web/20201111145353/http://algo2.iti.kit.edu/dementiev/files/veb.pdf.
  10. Patrick Dinklage, Johannes Fischer, Alexander Herlez, Tomasz Kociumaka, and Florian Kurpicz. Practical performance of space efficient data structures for longest common extensions. In 28th European Symposium on Algorithms (ESA), pages 39:1-39:20. Dagstuhl, 2020. URL: https://doi.org/10.4230/LIPIcs.ESA.2020.39.
  11. Héctor Ferrada and Gonzalo Navarro. Improved range minimum queries. J. Discrete Algorithms, 43:72-80, 2017. URL: https://doi.org/10.1016/j.jda.2016.09.002.
  12. E. Fredkin. Trie memory. Commun. ACM, 3:490-499, 1960. URL: https://doi.org/10.1145/367390.367400.
  13. Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci., 47(3):424-436, 1993. URL: https://doi.org/10.1016/0022-0000(93)90040-4.
  14. Steffen Heinz, Justin Zobel, and Hugh E. Williams. Burst tries: a fast, efficient data structure for string keys. ACM Trans. Inf. Syst., 20(2):192-223, 2002. URL: https://doi.org/10.1145/506309.506312.
  15. Lucian Ilie and Liviu Tinta. Practical algorithms for the longest common extension problem. In 16th International Symposium on String Processing and Information Retrieval (SPIRE), pages 302-309. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-03784-9_30.
  16. Intel Corporation. Intel (R) 64 and IA-32 Architectures - Software Developer’s Manual - Volume 2: Instruction Set Reference, A-Z, September 2016. URL: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf.
  17. Maureen Korda and Rajeev Raman. An experimental evaluation of hybrid data structures for searching. In 3rd International Workshop on Algorithm Engineering (WAE), pages 214-228. Springer, 1999. URL: https://doi.org/10.1007/3-540-48318-7_18.
  18. Kurt Maly. Compressed tries. Commun. ACM, 19(7):409-415, 1976. URL: https://doi.org/10.1145/360248.360258.
  19. Nicholas Nash and David Gregg. Comparing integer data structures for 32- and 64-bit keys. ACM J. Exp. Algorithmics, 15, 2010. URL: https://doi.org/10.1145/1671970.1671977.
  20. Gonzalo Navarro and Javiel Rojas-Ledesma. Predecessor search. ACM Comput. Surv., 53(5), 2020. URL: https://doi.org/10.1145/3409371.
  21. Mihai Pǎtraşcu and Mikkel Thorup. Time-space trade-offs for predecessor search. In 31st Annual ACM Symposium on Theory of Computing (STOC), pages 232-240. ACM, 2006. URL: https://doi.org/10.1145/1132516.1132551.
  22. Mihai Pǎtraşcu and Mikkel Thorup. Dynamic integer sets with optimal rank, select, and predecessor search. In 55th Symposium on Foundations of Computer Science (FOCS), pages 166-175. IEEE, 2014. URL: https://doi.org/10.1109/FOCS.2014.26.
  23. Robert Endre Tarjan and Andrew Chi-Chih Yao. Storing a sparse table. Commun. ACM, 22(11):606-611, 1979. URL: https://doi.org/10.1145/359168.359175.
  24. Peter van Emde Boas. Preserving order in a forest in less than logarithmic time. In 16th Symposium on Foundations of Computer Science (FOCS), pages 75-84. IEEE, 1975. URL: https://doi.org/10.1109/SFCS.1975.26.
  25. M. Wenzel. Wörterbücher für ein beschränktes Universum (dictionaries for a bounded universe). Master’s thesis, Saarland University, Germany, 1992. Google Scholar
  26. Dan E. Willard. Log-logarithmic worst-case range queries are possible in space theta(n). Inform. Process. Lett., 17(2):81-84, 1983. URL: https://doi.org/10.1016/0020-0190(83)90075-3.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail