Approximate Maximum Rank Aggregation: Beyond the Worst-Case

Authors: Yan Hong Yao Alvin, Diptarka Chakraborty


Author Details

Yan Hong Yao Alvin
  • National University of Singapore, Singapore
Diptarka Chakraborty
  • National University of Singapore, Singapore

Cite As

Yan Hong Yao Alvin and Diptarka Chakraborty. Approximate Maximum Rank Aggregation: Beyond the Worst-Case. In 43rd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 284, pp. 12:1-12:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.FSTTCS.2023.12

Abstract

The fundamental task of rank aggregation is to combine multiple rankings on a group of candidates into a single ranking to mitigate biases inherent in individual input rankings. This task has a myriad of applications, such as in social choice theory, collaborative filtering, web search, statistics, databases, sports, and admission systems. One popular version of this task, maximum rank aggregation (or the center ranking problem), aims to find a ranking (not necessarily from the input set) that minimizes the maximum distance to the input rankings. However, even for four input rankings, this problem is NP-hard (Dwork et al., WWW'01, and Biedl et al., Discrete Math.'09), and only a (folklore) polynomial-time 2-approximation algorithm is known for finding an optimal aggregate ranking under the commonly used Kendall-tau distance metric. Achieving a better approximation factor in polynomial time, ideally a polynomial-time approximation scheme (PTAS), is one of the major challenges.
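To make the objective concrete, here is a minimal Python sketch of the Kendall-tau distance and of the max-distance objective. It also illustrates why a 2-approximation is easy: by the triangle inequality, any input ranking is within twice the optimal radius of every other input, so returning the best input ranking is a 2-approximation. The function names are hypothetical, and this is an illustration of the folklore argument, not code from the paper.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Number of candidate pairs that rankings a and b order differently."""
    pos = {c: i for i, c in enumerate(b)}
    # combinations(a, 2) yields (x, y) with x ranked before y in a;
    # the pair is discordant when b ranks y before x.
    return sum(1 for x, y in combinations(a, 2) if pos[x] > pos[y])

def max_distance(candidate, rankings):
    """Center objective: the largest distance from candidate to any input."""
    return max(kendall_tau(candidate, r) for r in rankings)

def best_input_ranking(rankings):
    """Return the input ranking with the smallest max-distance.

    Any input ranking is a 2-approximation by the triangle inequality;
    picking the best one can only help.
    """
    return min(rankings, key=lambda r: max_distance(r, rankings))
```

This quadratic-time distance computation is chosen for clarity; a merge-sort based inversion count would compute the same value in O(n log n) time.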
This paper presents significant progress in solving this problem by considering the Mallows model, a classical probabilistic model. Our proposed algorithm outputs a (1+ε)-approximate aggregate ranking for any ε > 0, with high probability, as long as the input rankings come from a Mallows model, even in a streaming fashion. Furthermore, the same approximation guarantee is achieved even in the presence of outliers, presumably a more challenging task.
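For readers unfamiliar with the input model, the following Python sketch draws rankings from a Mallows model using the standard repeated insertion procedure. It assumes the common parameterization in which a ranking π has probability proportional to φ^d(π, π₀), where π₀ is the central ranking, d is the Kendall-tau distance, and φ in (0, 1] is a dispersion parameter; the paper may use a different but equivalent parameterization. The function name and arguments are hypothetical, and this is only a sampler for generating test inputs, not the paper's aggregation algorithm.

```python
import random

def sample_mallows(center, phi, rng=random):
    """Draw one ranking from a Mallows model centered at `center`.

    Assumed parameterization: P(pi) is proportional to
    phi ** kendall_tau(pi, center), with phi in (0, 1].
    """
    ranking = []
    for i, item in enumerate(center):
        # Inserting the i-th item at index j places it ahead of i - j
        # items that precede it in `center`, creating i - j inversions.
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, item)
    return ranking

if __name__ == "__main__":
    rng = random.Random(2023)
    center = list("abcde")
    for _ in range(3):
        print(sample_mallows(center, 0.3, rng))
```

Smaller values of φ concentrate the samples near the central ranking, while φ = 1 yields a uniformly random permutation.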

Subject Classification

ACM Subject Classification
  • Theory of computation → Probabilistic computation
  • Theory of computation → Facility location and clustering
  • Theory of computation → Theory and algorithms for application domains
Keywords
  • Rank Aggregation
  • Center Problem
  • Mallows Model
  • Approximation Algorithms
  • Clustering with Outliers

References

  1. Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):1-27, 2008.
  2. Pranjal Awasthi, Avrim Blum, Or Sheffet, and Aravindan Vijayaraghavan. Learning mixtures of ranking models. Advances in Neural Information Processing Systems, 27, 2014.
  3. Christian Bachmaier, Franz J. Brandenburg, Andreas Gleißner, and Andreas Hofmeier. On the hardness of maximum rank aggregation problems. Journal of Discrete Algorithms, 31:2-13, 2015. 24th International Workshop on Combinatorial Algorithms (IWOCA 2013).
  4. Mihai Bādoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the thirty-fourth annual ACM Symposium on Theory of Computing, pages 250-257, 2002.
  5. Therese Biedl, Franz J Brandenburg, and Xiaotie Deng. On the complexity of crossings in permutations. Discrete Mathematics, 309(7):1813-1823, 2009.
  6. Ralph Allan Bradley and Milton E Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324-345, 1952.
  7. Felix Brandt, Vincent Conitzer, Ulle Endriss, Jérôme Lang, and Ariel D Procaccia. Handbook of computational social choice. Cambridge University Press, 2016.
  8. Mark Braverman and Elchanan Mossel. Sorting from noisy information. arXiv preprint, 2009. URL: https://arxiv.org/abs/0910.1191.
  9. Marc Bury and Chris Schwiegelshohn. On finding the Jaccard center. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.
  10. Ioannis Caragiannis, Ariel D Procaccia, and Nisarg Shah. When do noisy votes reveal the truth? ACM Transactions on Economics and Computation (TEAC), 4(3):1-30, 2016.
  11. Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer. Approximating the median under the Ulam metric. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 761-775. SIAM, 2021.
  12. Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer. Clustering permutations: New techniques with streaming applications. In Yael Tauman Kalai, editor, 14th Innovations in Theoretical Computer Science Conference, ITCS 2023, January 10-13, 2023, MIT, Cambridge, Massachusetts, USA, volume 251 of LIPIcs, pages 31:1-31:24. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
  13. Diptarka Chakraborty, Kshitij Gajjar, and Agastya Vibhuti Jha. Approximating the center ranking under Ulam. In 41st IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2021). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
  14. Flavio Chierichetti, Anirban Dasgupta, Ravi Kumar, and Silvio Lattanzi. On reconstructing a hidden permutation. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2014. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2014.604.
  15. Fabien Collas and Ekhine Irurozki. Concentric mixtures of Mallows models for top-k rankings: sampling and identifiability. In International Conference on Machine Learning, pages 2079-2088. PMLR, 2021.
  16. Anindya De, Ryan O'Donnell, and Rocco Servedio. Learning sparse mixtures of rankings from noisy information. arXiv preprint, 2018. URL: https://arxiv.org/abs/1811.01216.
  17. Persi Diaconis and R. L. Graham. Spearman’s footrule as a measure of disarray. Journal of the Royal Statistical Society. Series B (Methodological), 39(2):262-268, 1977. URL: http://www.jstor.org/stable/2984804.
  18. Jean-Paul Doignon, Aleksandar Pekeč, and Michel Regenwetter. The repeated insertion model for rankings: Missing link between two subset choice models. Psychometrika, 69(1):33-54, 2004.
  19. Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the Tenth International World Wide Web Conference, WWW 10, pages 613-622, 2001. URL: https://doi.org/10.1145/371920.372165.
  20. Ronald Fagin, Ravi Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 9-12, 2003, pages 301-312, 2003.
  21. Moti Frances and Ami Litman. On covering problems of codes. Theory of Computing Systems, 30(2):113-119, 1997.
  22. David F Gleich and Lek-heng Lim. Rank aggregation via nuclear norm minimization. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 60-68, 2011.
  23. Donna Harman. Ranking algorithms. In William B. Frakes and Ricardo A. Baeza-Yates, editors, Information Retrieval: Data Structures & Algorithms, pages 363-392. Prentice-Hall, 1992.
  24. Kenneth Hung and William Fithian. Rank verification for exponential families. The Annals of Statistics, 47(2):758-782, 2019.
  25. David R Hunter. MM algorithms for generalized Bradley-Terry models. The Annals of Statistics, 32(1):384-406, 2004.
  26. John G Kemeny. Mathematics without numbers. Daedalus, 88(4):577-591, 1959.
  27. Maurice G Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81-93, 1938.
  28. Claire Kenyon-Mathieu and Warren Schudy. How to rank with few errors. In Proceedings of the thirty-ninth annual ACM Symposium on Theory of Computing, pages 95-103, 2007.
  29. Ashish Khetan and Sewoong Oh. Data-driven rank breaking for efficient rank aggregation. In International Conference on Machine Learning, pages 89-98. PMLR, 2016.
  30. Ming Li, Bin Ma, and Lusheng Wang. On the closest string and substring problems. Journal of the ACM (JACM), 49(2):157-171, March 2002.
  31. Allen Liu and Ankur Moitra. Efficiently learning mixtures of Mallows models. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 627-638. IEEE, 2018.
  32. Allen Liu and Ankur Moitra. Robust voting rules from algorithmic robust statistics. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3471-3512. SIAM, 2023.
  33. R Duncan Luce. Individual choice behavior, 1959.
  34. Bin Ma and Xiaoming Sun. More efficient algorithms for closest string and substring problems. SIAM Journal on Computing, 39(4):1432-1443, 2010.
  35. Colin L Mallows. Non-null ranking models. I. Biometrika, 44(1/2):114-130, 1957.
  36. Nimrod Megiddo. Linear programming in linear time when the dimension is fixed. Journal of the ACM (JACM), 31(1):114-127, 1984.
  37. François Nicolas and Eric Rivals. Complexities of the centre and median string problems. In Combinatorial Pattern Matching, 14th Annual Symposium, CPM 2003, Morelia, Michoacán, Mexico, June 25-27, 2003, Proceedings, pages 315-327, 2003.
  38. Vasyl Pihur, Susmita Datta, and Somnath Datta. Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. Bioinformatics, 23(13):1607-1615, 2007.
  39. Robin L Plackett. The analysis of permutations. Journal of the Royal Statistical Society: Series C (Applied Statistics), 24(2):193-202, 1975.
  40. V Yu Popov. Multiple genome rearrangement by swaps and by element duplications. Theoretical Computer Science, 385(1-3):115-126, 2007.
  41. Antti-Veikko Rosti, Necip Fazil Ayan, Bing Xiang, Spyros Matsoukas, Richard Schwartz, and Bonnie Dorr. Combining outputs from multiple machine translation systems. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 228-235, 2007.
  42. Warren Schudy. Approximation schemes for inferring rankings and clusterings from pairwise data. Ph.D. diss., Brown University, 2012.
  43. Nihar B Shah and Martin J Wainwright. Simple, robust and optimal ranking from pairwise comparisons. The Journal of Machine Learning Research, 18(1):7246-7283, 2017.
  44. James Joseph Sylvester. A question in the geometry of situation. Quarterly Journal of Pure and Applied Mathematics, 1(1):79-80, 1857.
  45. Wenpin Tang. Mallows ranking models: maximum likelihood estimate and regeneration. In International Conference on Machine Learning, pages 6125-6134. PMLR, 2019.
  46. E Alper Yildirim. Two algorithms for the minimum enclosing ball problem. SIAM Journal on Optimization, 19(3):1368-1391, 2008.
  47. H Peyton Young. Condorcet’s theory of voting. American Political Science Review, 82(4):1231-1244, 1988.
  48. H Peyton Young and Arthur Levenglick. A consistent extension of Condorcet’s election principle. SIAM Journal on Applied Mathematics, 35(2):285-300, 1978.