Aggregating over Dominated Points by Sorting, Scanning, Zip and Flat Maps

Authors Jacek Sroka , Jerzy Tyszkiewicz



PDF
Thumbnail PDF

File

LIPIcs.ESA.2023.96.pdf
  • Filesize: 0.74 MB
  • 13 pages

Document Identifiers

Author Details

Jacek Sroka
  • University of Warsaw, Poland
Jerzy Tyszkiewicz
  • University of Warsaw, Poland

Acknowledgements

We would like to thank Filip Murlak for encouragement and discussions on the topic. We also would like to thank anonymous referees for comments, which helped us improve the presentation of the paper.

Cite As Get BibTex

Jacek Sroka and Jerzy Tyszkiewicz. Aggregating over Dominated Points by Sorting, Scanning, Zip and Flat Maps. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 96:1-96:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.ESA.2023.96

Abstract

Prefix aggregation operation (also called scan), and its particular case, prefix summation, is an important parallel primitive and enjoys a lot of attention in the research literature. It is also used in many algorithms as one of the steps. 
Aggregation over dominated points in ℝ^m is a multidimensional generalisation of prefix aggregation. It is also intensively researched, both as a parallel primitive and as a practical problem, encountered in computational geometry, spatial databases and data warehouses.
In this paper we show that, for a constant dimension m, aggregation over dominated points in ℝ^m can be computed by O(1) basic operations that include sorting the whole dataset, zipping sorted lists of elements, computing prefix aggregations of lists of elements and flat maps, which expand the data size from initial n to n log^{m-1}n.
Thereby we establish that prefix aggregation suffices to express aggregation over dominated points in more dimensions, even though the latter is a far-reaching generalisation of the former. Many problems known to be expressible by aggregation over dominated points become expressible by prefix aggregation, too. 
We rely on a small set of primitive operations which guarantee an easy transfer to various distributed architectures and some desired properties of the implementation.

Subject Classification

ACM Subject Classification
  • Theory of computation → Massively parallel algorithms
  • Theory of computation → Parallel computing models
  • Theory of computation → Database query processing and optimization (theory)
Keywords
  • Aggregation over dominated points prefix sums sorting flat map range tree parallel algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Pankaj K. Agarwal, Lars Arge, Sathish Govindarajan, Jun Yang, and Ke Yi. Efficient external memory structures for range-aggregate queries. Computational Geometry, 46(3):358-370, 2013. URL: https://doi.org/10.1016/j.comgeo.2012.10.003.
  2. Selim G. Akl and Ivan Stojmenović. Multiple criteria BSR: An implementation and applications to computational geometry problems. In 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences, 1994. Google Scholar
  3. Selim G. Akl and Ivan Stojmenović. Broadcasting with selective reduction: A powerful model of parallel computation. Parallel and distributed computing handbook, pages 192-222, 1996. Google Scholar
  4. Guy E. Blelloch. Scans as primitive parallel operations. IEEE Trans. Computers, 38(11):1526-1538, 1989. URL: https://doi.org/10.1109/12.42122.
  5. Guy E. Blelloch. Prefix sums and their applications. In John H Reif, editor, Synthesis of parallel algorithms. Morgan Kaufmann Publishers Inc., 1993. Google Scholar
  6. Bernard Chazelle. Lower bounds for orthogonal range searching II. the arithmetic model. J. ACM, 37(3):439-463, 1990. URL: https://doi.org/10.1145/79147.79149.
  7. Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, searching, and simulation in the mapreduce framework. In Takao Asano, Shin-Ichi Nakano, Yoshio Okamoto, and Osamu Watanabe, editors, Algorithms and Computation - 22nd International Symposium, ISAAC 2011, Yokohama, Japan, December 5-8, 2011. Proceedings, volume 7074 of Lecture Notes in Computer Science, pages 374-383. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-25591-5_39.
  8. Mark Harris and Michael Garland. Optimizing parallel prefix operations for the Fermi architecture. In Wen mei W. Hwu, editor, GPU Computing Gems Jade Edition, Applications of GPU Computing Series, pages 29-38. Morgan Kaufmann, Boston, 2012. URL: https://doi.org/10.1016/B978-0-12-385963-1.00003-4.
  9. Mark Harris, Shubhabrata Sengupta, and John D Owens. Parallel prefix sum (scan) with CUDA. GPU gems, 3(39):851-876, 2007. Google Scholar
  10. Ralf Hinze. An algebra of scans. In Dexter Kozen and Carron Shankland, editors, Mathematics of Program Construction, 7th International Conference, MPC 2004, Stirling, Scotland, UK, July 12-14, 2004, Proceedings, volume 3125 of Lecture Notes in Computer Science, pages 186-210. Springer, 2004. URL: https://doi.org/10.1007/978-3-540-27764-4_11.
  11. Seokjin Hong, Byoungho Song, and Sukho Lee. Efficient execution of range-aggregate queries in data warehouse environments. In Hideko S. Kunii, Sushil Jajodia, and Arne Sølvberg, editors, Conceptual Modeling - ER 2001, 20th International Conference on Conceptual Modeling, Yokohama, Japan, November 27-30, 2001, Proceedings, volume 2224 of Lecture Notes in Computer Science, pages 299-310. Springer, 2001. URL: https://doi.org/10.1007/3-540-45581-7_23.
  12. Xiao Hu, Ke Yi, and Yufei Tao. Output-optimal massively parallel algorithms for similarity joins. ACM Trans. Database Syst., 44(2):6:1-6:36, 2019. URL: https://doi.org/10.1145/3311967.
  13. Y. Charlie Hu, Shang-Hua Teng, and S. Lennart Johnsson. A data-parallel implementation of the geometric partitioning algorithm. In Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, PPSC 1997. SIAM, 1997. Google Scholar
  14. Hung-Ming Lai and Jenq-Kuen Lee. Efficient support of the scan vector model for RISC-V vector extension. In Workshop Proceedings of the 51st International Conference on Parallel Processing, ICPP Workshops 2022, Bordeaux, France, 29 August 2022 - 1 September 2022, pages 15:1-15:8. ACM, 2022. URL: https://doi.org/10.1145/3547276.3548518.
  15. Nicolas Langrené and Xavier Warin. Fast multivariate empirical cumulative distribution function with connection to kernel density estimation. Computational Statistics & Data Analysis, 162:107267, 2021. URL: https://doi.org/10.1016/j.csda.2021.107267.
  16. Rong Lin, Koji Nakano, Stephan Olariu, Maria Cristina Pinotti, James L. Schwing, and Albert Y. Zomaya. Scalable hardware-algorithms for binary prefix sums. IEEE Trans. Parallel Distributed Syst., 11(8):838-850, 2000. URL: https://doi.org/10.1109/71.877941.
  17. Rong Lin, Koji Nakano, Stephan Olariu, and Albert Y. Zomaya. An efficient parallel prefix sums architecture with domino logic. IEEE Trans. Parallel Distributed Syst., 14(9):922-931, 2003. URL: https://doi.org/10.1109/TPDS.2003.1233714.
  18. Robert A. Melter and Ivan Stojmenovic. Constant time BSR solutions to l_1 metric and digital geometry problems. J. Math. Imaging Vis., 5(2):119-127, 1995. URL: https://doi.org/10.1007/BF01250524.
  19. Vijaya Ramachandran and Elaine Shi. Data oblivious algorithms for multicores. In Kunal Agrawal and Yossi Azar, editors, SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, Virtual Event, USA, 6-8 July, 2021, pages 373-384. ACM, 2021. URL: https://doi.org/10.1145/3409964.3461783.
  20. Mohsen Safari and Marieke Huisman. Formal verification of parallel prefix sum and stream compaction algorithms in CUDA. Theor. Comput. Sci., 912:81-98, 2022. URL: https://doi.org/10.1016/j.tcs.2022.02.027.
  21. Yexuan Shi, Yongxin Tong, Yuxiang Zeng, Zimu Zhou, Bolin Ding, and Lei Chen. Efficient approximate range aggregation over large-scale spatial data federation. IEEE Trans. Knowl. Data Eng., 35(1):418-430, 2023. URL: https://doi.org/10.1109/TKDE.2021.3084141.
  22. Frederick N. Springsteel and Ivan Stojmenovic. Parallel general prefix computations with geometric, algebraic, and other applications. Int. J. Parallel Program., 18(6):485-503, 1989. URL: https://doi.org/10.1007/BF01381719.
  23. Jacek Sroka, Artur Leśniewski, Miroslaw Kowaluk, Krzysztof Stencel, and Jerzy Tyszkiewicz. Towards minimal algorithms for big data analytics with spreadsheets. In Foto N. Afrati and Jacek Sroka, editors, Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, BeyondMR@SIGMOD 2017, Chicago, IL, USA, May 19, 2017, pages 1:1-1:4. ACM, 2017. URL: https://doi.org/10.1145/3070607.3075961.
  24. Yufei Tao. Massively parallel entity matching with linear classification in low dimensional space. In Benny Kimelfeld and Yael Amsterdamer, editors, 21st International Conference on Database Theory, ICDT 2018, March 26-29, 2018, Vienna, Austria, volume 98 of LIPIcs, pages 20:1-20:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ICDT.2018.20.
  25. Yufei Tao, Wenqing Lin, and Xiaokui Xiao. Minimal MapReduce algorithms. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 529-540, New York, NY, USA, 2013. ACM. URL: https://doi.org/10.1145/2463676.2463719.
  26. Yufei Tao and Dimitris Papadias. Range aggregate processing in spatial databases. IEEE Trans. Knowl. Data Eng., 16(12):1555-1570, 2004. URL: https://doi.org/10.1109/TKDE.2004.93.
  27. Uzi Vishkin. Prefix sums and an application thereof, April 1 2003. US Patent 6,542,918. Google Scholar
  28. Limin Xiang and Kazuo Ushijima. ANSV problem on bsrs. Inf. Process. Lett., 65(3):135-138, 1998. URL: https://doi.org/10.1016/S0020-0190(97)00214-7.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail