Fast Deterministic Selection

Author Andrei Alexandrescu



PDF
Thumbnail PDF

File

LIPIcs.SEA.2017.24.pdf
  • Filesize: 0.53 MB
  • 19 pages

Document Identifiers

Author Details

Andrei Alexandrescu

Cite As Get BibTex

Andrei Alexandrescu. Fast Deterministic Selection. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 24:1-24:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.SEA.2017.24

Abstract

The selection problem, in forms such as finding the median or choosing the k top ranked items in a dataset, is a core task in computing with numerous applications in fields as diverse as statistics, databases, Machine Learning, finance, biology, and graphics. The selection algorithm Median of Medians, although a landmark theoretical achievement, is seldom used in practice because it is slower than simple approaches based on sampling. The main contribution of this paper is a fast linear-time deterministic selection algorithm MedianOfNinthers based on a refined definition of MedianOfMedians. A complementary algorithm MedianOfExtrema is also proposed. These algorithms work together to solve the selection problem in guaranteed linear time, faster than state-of-the-art baselines, and without resorting to randomization, heuristics, or fallback approaches for pathological cases. We demonstrate results on uniformly distributed random numbers, typical low-entropy artificial datasets, and real-world data. Measurements are open-sourced alongside the implementation at https://github.com/andralex/MedianOfNinthers.

Subject Classification

Keywords
  • Selection Problem
  • Quickselect
  • Median of Medians
  • Algorithm Engineering
  • Algorithmic Libraries

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Andrei Alexandrescu. Median of ninthers: code, data, and benchmarks. https://github.com/andralex/MedianOfNinthers, 2017.
  2. Sebastiano Battiato, Domenico Cantone, Dario Catalano, Gianluca Cincotti, and Micha Hofri. An efficient algorithm for the approximate median selection problem. In Algorithms and Complexity, pages 226-238. Springer, 2000. Google Scholar
  3. Jon L Bentley and M Douglas McIlroy. Engineering a sort function. Software: Practice and Experience, 23(11):1249-1265, 1993. Google Scholar
  4. Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan. Time bounds for selection. J. Comput. Syst. Sci., 7(4):448-461, August 1973. URL: http://dx.doi.org/10.1016/S0022-0000(73)80033-9.
  5. Svante Carlsson and Mikael Sundström. Algorithms and Computations: 6th International Symposium, ISAAC'95 Cairns, Australia, December 4-6, 1995 Proceedings, chapter Linear-time in-place selection in less than 3n comparisons, pages 244-253. Springer Berlin Heidelberg, Berlin, Heidelberg, 1995. URL: http://dx.doi.org/10.1007/BFb0015429.
  6. Ke Chen and Adrian Dumitrescu. Select with groups of 3 or 4. In Algorithms and Data Structures: 14th International Symposium, WADS, 2015. Google Scholar
  7. Derrick Coetzee. An efficient implementation of Blum, Floyd, Pratt, Rivest, and Tarjan’s worst-case linear selection algorithm, 2004. URL: http://moonflare.com/code/select/select.pdf.
  8. Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001. Google Scholar
  9. Jean Daligault and Conrado Martínez. On the variance of quickselect. In Proceedings of the Meeting on Analytic Algorithmics and Combinatorics, pages 205-210. Society for Industrial and Applied Mathematics, 2006. Google Scholar
  10. Kevin Dinkel and Andrew Zizzi. Fast median finding on digital images. In AIAA Regional Student Paper Conference, 2012. Google Scholar
  11. Dorit Dor and Uri Zwick. Median selection requires (2+ε)N comparisons. SIAM J. Discret. Math., 14(3):312-325, March 2001. URL: http://dx.doi.org/10.1137/S0895480199353895.
  12. Robert W. Floyd and Ronald L. Rivest. Expected time bounds for selection. Commun. ACM, 18(3):165-172, March 1975. URL: http://dx.doi.org/10.1145/360680.360691.
  13. GNU Team. Implementation of std::nth_element, 2016. [Online; accessed 27-Nov-2016]. URL: https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.4/a01347.html.
  14. Robin Griffin and K. A. Redish. Remark on algorithm 347 [m1]: An efficient algorithm for sorting with minimal storage. Commun. ACM, 13(1):54-, January 1970. URL: http://dx.doi.org/10.1145/361953.361993.
  15. C. A. R. Hoare. Algorithm 63: Partition. Commun. ACM, 4(7):321-, July 1961. URL: http://doi.acm.org/10.1145/366622.366642, URL: http://dx.doi.org/10.1145/366622.366642.
  16. C. A. R. Hoare. Algorithm 64: Quicksort. Commun. ACM, 4(7):321-, July 1961. URL: http://dx.doi.org/10.1145/366622.366644.
  17. C. A. R. Hoare. Algorithm 65: Find. Commun. ACM, 4(7):321-322, July 1961. URL: http://dx.doi.org/10.1145/366622.366647.
  18. Peter Kirschenhofer and Helmut Prodinger. Comparisons in Hoare’s find algorithm. Combinatorics, Probability and Computing, 7:111-120, 3 1998. URL: http://journals.cambridge.org/article_S0963548397003325, URL: http://dx.doi.org/null.
  19. Krzysztof C. Kiwiel. On Floyd and Rivest’s SELECT Algorithm. Theor. Comput. Sci., 347(1-2):214-238, November 2005. URL: http://dx.doi.org/10.1016/j.tcs.2005.06.032.
  20. Donald E. Knuth. The Art of Computer Programming, Volume 3: (2Nd Ed.) Sorting and Searching. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 1998. Google Scholar
  21. Tony W. Lai and Derick Wood. SWAT 88: 1st Scandinavian Workshop on Algorithm Theory Halmstad, Sweden, July 5-8, 1988 Proceedings, chapter Implicit selection, pages 14-23. Springer Berlin Heidelberg, Berlin, Heidelberg, 1988. URL: http://dx.doi.org/10.1007/3-540-19487-8_2.
  22. Conrado Martínez, Daniel Panario, and Alfredo Viola. Mathematics and Computer Science II: Algorithms, Trees, Combinatorics and Probabilities, chapter Analysis of Quickfind with Small Subfiles, pages 329-340. Birkhäuser Basel, Basel, 2002. URL: http://dx.doi.org/10.1007/978-3-0348-8211-8_20.
  23. Conrado Martínez, Daniel Panario, and Alfredo Viola. Adaptive sampling strategies for quickselect. ACM Trans. Algorithms, 6(3):53:1-53:45, July 2010. URL: http://dx.doi.org/10.1145/1798596.1798606.
  24. Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K Gray, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, et al. Quantitative analysis of culture using millions of digitized books. science, 331(6014):176-182, 2011. Google Scholar
  25. David R. Musser. Introspective sorting and selection algorithms. Software - Practice &Experience, 27(8):983-993, 1997. Google Scholar
  26. Himangi Saraogi. Median of medians algorithm, 2013. URL: http://himangi774.blogspot.com/2013/09/median-of-medians.html.
  27. Uwe Schöning. Mastering the master theorem. Bulletin of the EATCS, 71:165-166, 2000. Google Scholar
  28. SciPy.org. Implementation of argpartition, 2017. [Online; accessed Feb 9, 2017]. URL: https://docs.scipy.org/doc/numpy/reference/generated/numpy.argpartition.html.
  29. R. Sedgewick and K. Wayne. Algorithms. Pearson Education, 2011. URL: https://books.google.com/books?id=idUdqdDXqnAC.
  30. Richard C. Singleton. Algorithm 347: An efficient algorithm for sorting with minimal storage [m1]. Commun. ACM, 12(3):185-186, March 1969. URL: http://dx.doi.org/10.1145/362875.362901.
  31. J. W. Tukey. The ninther, a technique for low-effort robust (resistant) location in large samples. Contributions to Survey Sampling and Applied Statistics in Honor of HO Hartley, Academic Press, New York, pages 251-258, 1978. Google Scholar
  32. Wikipedia. Median of medians, 2016. [Online; accessed 25-Feb-2016]. URL: https://en.wikipedia.org/wiki/Median_of_medians.
  33. Andrew C. Yao and F. F. Yao. On the average-case complexity of selecting the k-th best. Technical report, Stanford University, Stanford, CA, USA, 1979. Google Scholar
  34. Chee K. Yap. New upper bounds for selection. Commun. ACM, 19(9):501-508, September 1976. URL: http://dx.doi.org/10.1145/360336.360339.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail