Maximizing the Optimality Streak of Deferred Data Structuring (a.k.a. Database Cracking)

Author Yufei Tao




Author Details

Yufei Tao
  • The Chinese University of Hong Kong, China

Cite As

Yufei Tao. Maximizing the Optimality Streak of Deferred Data Structuring (a.k.a. Database Cracking). In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 10:1-10:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.10

Abstract

This paper studies how to minimize the total cost of answering r queries over n elements in an online manner (i.e., the next query is given only after the previous query’s result is ready) when the value r ≤ n is unknown in advance. Traditional indexing, which first builds a complete index on the n elements before answering queries, may be unsuitable because the index’s construction time, usually Ω(n log n), can become the performance bottleneck. In contrast, for many problems, a lower bound of Ω(n log (1+r)) holds on the total cost of r queries for every r ∈ [1, n]. Matching this lower bound is a primary objective of deferred data structuring (DDS), also known as database cracking in the systems community. For a wide class of problems, we present generic reductions to convert traditional indexes into DDS algorithms that match the lower bound for a wide range of r. For a decomposable problem, if a data structure can be built in O(n log n) time and has Q(n) query search time, our reduction yields an algorithm that runs in O(n log (1+r)) time for all r ≤ (n log n)/Q(n), where the upper bound (n log n)/Q(n) is asymptotically the best possible under mild constraints. In particular, if Q(n) = O(log n), then the O(n log (1+r))-time guarantee extends to all r ≤ n, with which we optimally settle a large variety of DDS problems. Our results can be generalized to a class of "spectrum indexable problems", which subsumes the class of decomposable problems.
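To make the deferred-structuring idea concrete, the following is a minimal sketch of the classic lazy-partitioning approach for membership queries. It is not the reduction presented in the paper. The names `LazyNode` and `contains` are hypothetical, and the random pivot is an assumption made for brevity: it gives expected rather than worst-case bounds, whereas a linear-time median selection would be needed for a worst-case guarantee. Each query refines only the nodes on its own search path, so no full index is ever built unless the queries demand it.

```python
import random


class LazyNode:
    """A node that keeps an unsorted bag of keys until a query forces a split."""

    def __init__(self, keys):
        self.keys = keys      # unsorted keys; set to None once the node is split
        self.pivot = None
        self.left = None
        self.right = None

    def split(self):
        # Assumption: a random pivot stands in for exact median selection,
        # so the balance of the split holds only in expectation.
        pivot = random.choice(self.keys)
        self.left = LazyNode([k for k in self.keys if k < pivot])
        self.right = LazyNode([k for k in self.keys if k > pivot])
        self.pivot = pivot
        self.keys = None


def contains(root, q):
    """Answer a membership query, splitting only the nodes on q's search path."""
    node = root
    while True:
        if node.keys is not None:          # node has not been split yet
            if len(node.keys) == 0:
                return False
            if len(node.keys) == 1:
                return node.keys[0] == q
            node.split()
        if q == node.pivot:
            return True
        node = node.left if q < node.pivot else node.right


# Usage: the structure is refined incrementally as queries arrive.
root = LazyNode([9, 3, 7, 1, 8, 2])
found_seven = contains(root, 7)   # True: 7 is in the input
found_five = contains(root, 5)    # False: 5 is not in the input
```

The first query does work linear in n in expectation, and later queries reuse the partitions left behind by earlier ones, which is the behavior the Ω(n log (1+r)) lower bound is measured against.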

Subject Classification

ACM Subject Classification
  • Theory of computation → Data structures and algorithms for data management
Keywords
  • Deferred Data Structuring
  • Database Cracking
  • Data Structures

