Stab-Forests: Dynamic Data Structures for Efficient Temporal Query Processing

Authors: Jelle Hellings, Yuqing Wu



PDF: LIPIcs.TIME.2020.18.pdf (0.72 MB, 19 pages)


Author Details

Jelle Hellings
  • Exploratory Systems Lab, Department of Computer Science, University of California, Davis, CA, USA
Yuqing Wu
  • Computer Science Department, Pomona College, Claremont, CA, USA

Cite As

Jelle Hellings and Yuqing Wu. Stab-Forests: Dynamic Data Structures for Efficient Temporal Query Processing. In 27th International Symposium on Temporal Representation and Reasoning (TIME 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 178, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.TIME.2020.18

Abstract

Many sources of data have temporal start and end attributes or are created in a time-ordered manner. Hence, it is only natural to consider joining datasets based on these temporal attributes. To do so efficiently, several internal-memory temporal join algorithms have recently been proposed. Unfortunately, these join algorithms are designed to join entire datasets and cannot efficiently join skewed datasets in which only a few events participate in the join result. To support high-performance internal-memory temporal joins of skewed datasets, we propose the skip-join algorithm, which operates on stab-forests. The stab-forest is a novel dynamic data structure for indexing temporal data that allows efficient updates when events are appended in a time-based order. Our stab-forests efficiently support not only traditional temporal stab-queries, but also more general multi-stab-queries. We conducted an experimental evaluation to compare the skip-join algorithm with state-of-the-art techniques using real-world datasets. We observed that the skip-join algorithm outperforms other techniques by an order of magnitude when joining skewed datasets and delivers performance comparable to other techniques on non-skewed datasets.
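The abstract relies on two notions: a stab-query (which events contain a given time point?) and a temporal join (which pairs of events from two datasets overlap in time?). The Python sketch below only spells out the semantics we assume for these notions. It treats each event as a closed interval [start, end] with a label, which is our assumption. The names `stab`, `stab_sorted`, and `overlap_join` are hypothetical. These are naive baselines, not the stab-forest or the skip-join algorithm, whose internals the abstract does not describe.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical event representation: (start, end, label), read as the
# closed interval [start, end]. This encoding is an assumption.
Event = Tuple[int, int, str]


def stab(events: List[Event], t: int) -> List[Event]:
    """Naive stab-query: every event whose interval contains time point t."""
    return [e for e in events if e[0] <= t <= e[1]]


def stab_sorted(events_by_start: List[Event], starts: List[int], t: int) -> List[Event]:
    """Same query on events sorted by start time.

    Only events that start at or before t can contain t, so a binary search
    on the start times prunes the rest; the survivors still need an end check.
    """
    k = bisect_right(starts, t)  # events_by_start[:k] all start at or before t
    return [e for e in events_by_start[:k] if t <= e[1]]


def overlap_join(r: List[Event], s: List[Event]) -> List[Tuple[str, str]]:
    """Naive nested-loop temporal join: label pairs whose intervals overlap."""
    return [(a[2], b[2]) for a in r for b in s if a[0] <= b[1] and b[0] <= a[1]]


if __name__ == "__main__":
    events = [(1, 5, "a"), (3, 8, "b"), (6, 9, "c")]  # already sorted by start
    starts = [e[0] for e in events]
    print(stab(events, 4))                 # events a and b
    print(stab_sorted(events, starts, 6))  # events b and c
    print(overlap_join([(1, 2, "x"), (7, 7, "y")], events))
```

Both stab-query variants still scan every candidate that starts before t, and the nested-loop join always compares every pair, even when only a few events participate in the result. This is exactly the cost on skewed inputs that the abstract says purpose-built structures aim to avoid.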

ACM Subject Classification
  • Information systems → Join algorithms
  • Information systems → Temporal data
Keywords
  • Cache-friendly temporal joins
  • temporal data
  • skewed data
  • stab-queries
  • temporal indices

