Deterministic Cache-Oblivious Funnelselect

Authors: Gerth Stølting Brodal, Sebastian Wild





Author Details

Gerth Stølting Brodal
  • Aarhus University, Denmark
Sebastian Wild
  • University of Liverpool, UK

Cite As

Gerth Stølting Brodal and Sebastian Wild. Deterministic Cache-Oblivious Funnelselect. In 19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 294, pp. 17:1-17:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.SWAT.2024.17

Abstract

In the multiple-selection problem one is given an unsorted array S of N elements and an array of q query ranks r_1 < ⋯ < r_q, and the task is to return, in sorted order, the q elements in S of rank r_1, …, r_q, respectively. The asymptotic deterministic comparison complexity of the problem was settled by Dobkin and Munro [JACM 1981]. In the I/O model an optimal I/O complexity was achieved by Hu et al. [SPAA 2014]. Recently [ESA 2023], we presented a cache-oblivious algorithm with matching I/O complexity, named funnelselect, since it heavily borrows ideas from the cache-oblivious sorting algorithm funnelsort from the seminal paper by Frigo, Leiserson, Prokop and Ramachandran [FOCS 1999]. Funnelselect is inherently randomized as it relies on sampling for cheaply finding many good pivots. In this paper we present deterministic funnelselect, achieving the same optimal I/O complexity cache-obliviously without randomization. Our new algorithm essentially replaces a single (in expectation) reversed-funnel computation using random pivots by a recursive algorithm using multiple reversed-funnel computations. To meet the I/O bound, this requires a carefully chosen subproblem size based on the entropy of the sequence of query ranks; deterministic funnelselect thus raises distinct technical challenges not met by randomized funnelselect. The resulting worst-case I/O bound is O(∑_{i = 1}^{q+1} Δ_i/B ⋅ log_{M/B} N/Δ_i + N/B), where B is the external memory block size, M ≥ B^{1+ε} is the internal memory size, for some constant ε > 0, and Δ_i = r_i - r_{i-1} (assuming r_0 = 0 and r_{q+1} = N + 1).
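To make the problem statement and the bound concrete, here is a minimal Python sketch. It is not the paper's algorithm: `multiselect_by_sorting` is a plain sort-based reference solution with no I/O guarantees, and `io_bound_expression` only evaluates the expression inside the O(·) of the stated bound, reading each term as (Δ_i/B) ⋅ log_{M/B}(N/Δ_i). Both function names are hypothetical, and the sketch assumes 1-based, strictly increasing ranks with at least one query.

```python
import math


def multiselect_by_sorting(S, ranks):
    """Reference solution: sort S, then pick the elements of the given 1-based ranks.

    This is only a correctness baseline, not an I/O-efficient method.
    """
    ordered = sorted(S)
    return [ordered[r - 1] for r in ranks]


def io_bound_expression(N, ranks, M, B):
    """Evaluate sum_{i=1}^{q+1} (Δ_i/B) * log_{M/B}(N/Δ_i) + N/B.

    Uses Δ_i = r_i - r_{i-1} with r_0 = 0 and r_{q+1} = N + 1, as in the abstract.
    The hidden constant of the O(.) notation is ignored.
    """
    boundaries = [0] + list(ranks) + [N + 1]
    total = N / B  # the linear scanning term
    for prev, cur in zip(boundaries, boundaries[1:]):
        delta = cur - prev
        total += (delta / B) * math.log(N / delta, M / B)
    return total


if __name__ == "__main__":
    print(multiselect_by_sorting([5, 1, 4, 2, 3], [1, 3, 5]))
    # One query rank versus every rank: the expression grows as the gaps Δ_i shrink.
    print(io_bound_expression(N=16, ranks=[8], M=16, B=4))
    print(io_bound_expression(N=16, ranks=list(range(1, 17)), M=16, B=4))
```

When every rank is queried, all Δ_i equal 1 and the expression approaches the sorting bound (N/B) ⋅ log_{M/B} N; with a single query it stays close to the scanning term N/B.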

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Multiple selection
  • cache-oblivious algorithm
  • entropy bounds

References

  1. Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Commun. ACM, 31(9):1116-1127, 1988. URL: https://doi.org/10.1145/48529.48535.
  2. Jérémy Barbay, Ankur Gupta, Srinivasa Rao Satti, and Jon Sorenson. Near-optimal online multiselection in internal and external memory. Journal of Discrete Algorithms, 36:3-17, January 2016. URL: https://doi.org/10.1016/j.jda.2015.11.001.
  3. Chaya Bleich and Michael L. Overton. A linear-time algorithm for the weighted median problem. Technical Report 75, New York University, Department of Computer Science, April 1983. URL: https://archive.org/details/lineartimealgori00blei/.
  4. Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, and Robert Endre Tarjan. Time bounds for selection. J. Comput. Syst. Sci., 7(4):448-461, 1973. URL: https://doi.org/10.1016/S0022-0000(73)80033-9.
  5. Gerth Stølting Brodal and Rolf Fagerberg. Cache oblivious distribution sweeping. In Peter Widmayer, Francisco Triguero Ruiz, Rafael Morales Bueno, Matthew Hennessy, Stephan J. Eidenbenz, and Ricardo Conejo, editors, Automata, Languages and Programming, 29th International Colloquium, ICALP 2002, Malaga, Spain, July 8-13, 2002, Proceedings, volume 2380 of Lecture Notes in Computer Science, pages 426-438. Springer, 2002. URL: https://doi.org/10.1007/3-540-45465-9_37.
  6. Gerth Stølting Brodal and Rolf Fagerberg. On the limits of cache-obliviousness. In Lawrence L. Larmore and Michel X. Goemans, editors, Proceedings of the 35th Annual ACM Symposium on Theory of Computing, June 9-11, 2003, San Diego, CA, USA, pages 307-315. ACM, 2003. URL: https://doi.org/10.1145/780542.780589.
  7. Gerth Stølting Brodal and Sebastian Wild. Funnelselect: Cache-oblivious multiple selection. In Inge Li Gørtz, Martin Farach-Colton, Simon J. Puglisi, and Grzegorz Herman, editors, 31st Annual European Symposium on Algorithms, ESA 2023, September 4-6, 2023, Amsterdam, The Netherlands, volume 274 of LIPIcs, pages 25:1-25:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. URL: https://doi.org/10.4230/LIPICS.ESA.2023.25.
  8. J. M. Chambers. Partial sorting [M1] (algorithm 410). Commun. ACM, 14(5):357-358, 1971. URL: https://doi.org/10.1145/362588.362602.
  9. David P. Dobkin and J. Ian Munro. Optimal time minimal space selection algorithms. J. ACM, 28(3):454-461, 1981. URL: https://doi.org/10.1145/322261.322264.
  10. Dorit Dor and Uri Zwick. Selecting the median. SIAM Journal on Computing, 28(5):1722-1758, 1999. URL: https://doi.org/10.1137/s0097539795288611.
  11. Robert W. Floyd and Ronald L. Rivest. Expected time bounds for selection. Communications of the ACM, 18(3):165-172, March 1975. URL: https://doi.org/10.1145/360680.360691.
  12. Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In 40th Annual Symposium on Foundations of Computer Science, FOCS '99, 17-18 October, 1999, New York, NY, USA, pages 285-298. IEEE Computer Society, 1999. URL: https://doi.org/10.1109/SFFCS.1999.814600.
  13. Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. ACM Trans. Algorithms, 8(1):4:1-4:22, 2012. URL: https://doi.org/10.1145/2071379.2071383.
  14. C. A. R. Hoare. Algorithm 65: find. Commun. ACM, 4(7):321-322, 1961. URL: https://doi.org/10.1145/366622.366647.
  15. Xiaocheng Hu, Yufei Tao, Yi Yang, and Shuigeng Zhou. Finding approximate partitions and splitters in external memory. In Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, June 2014. URL: https://doi.org/10.1145/2612669.2612691.
  16. Kanela Kaligosi, Kurt Mehlhorn, J. Ian Munro, and Peter Sanders. Towards optimal multiple selection. In Luís Caires, Giuseppe F. Italiano, Luís Monteiro, Catuscia Palamidessi, and Moti Yung, editors, Automata, Languages and Programming, 32nd International Colloquium, ICALP 2005, Lisbon, Portugal, July 11-15, 2005, Proceedings, volume 3580 of Lecture Notes in Computer Science, pages 103-114. Springer, 2005. URL: https://doi.org/10.1007/11523468_9.
  17. Helmut Prodinger. Multiple Quickselect - Hoare’s Find algorithm for several elements. Information Processing Letters, 56(3):123-129, November 1995. URL: https://doi.org/10.1016/0020-0190(95)00150-b.
  18. Arnold Schönhage, Mike Paterson, and Nicholas Pippenger. Finding the median. J. Comput. Syst. Sci., 13(2):184-199, 1976. URL: https://doi.org/10.1016/S0022-0000(76)80029-3.
  19. Michael Ian Shamos. Geometry and statistics: Problems at the interface. In Joseph Frederick Traub, editor, Algorithms and Complexity: New Directions and Recent Results, pages 251-280. Academic Press, 1976. URL: http://euro.ecom.cmu.edu/people/faculty/mshamos/1976Stat.pdf.