Finding a Fair Scoring Function for Top-k Selection: From Hardness to Practice

Cai, Guangya

doi:10.4230/LIPIcs.SoCG.2026.26

Abstract

We study the problem of finding a fair linear scoring function over (numerical) attributes for top-k selection, where fairness is enforced through a proportional representation constraint on the protected group. Existing algorithms do not scale efficiently, particularly in higher dimensions. Our hardness analysis shows that in more than two dimensions, no algorithm is likely to scale efficiently with respect to dataset size, and the computational complexity is likely to grow rapidly with dimensionality. However, the hardness results also provide key insights that guide algorithm design, leading to our two-pronged solution: (1) For small k, our analysis reveals a gap in the hardness barrier. By addressing various engineering challenges, including achieving efficient parallelism, we turn this potential for efficiency into an optimized geometry-based algorithm that delivers substantial performance gains. (2) For large k, where the hardness is robust, we employ a practically efficient optimization-based algorithm which, despite having a worse theoretical guarantee, achieves superior real-world performance. Experimental evaluations on real-world datasets then explore scenarios where worst-case behavior does not manifest, identifying areas critical to practical performance. Our solution achieves speedups of up to several orders of magnitude compared to the state of the art, an efficiency made possible through a tight integration of hardness analysis, algorithm design, practical engineering, and empirical evaluation.
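To make the problem statement concrete, the following minimal sketch checks whether a given linear scoring function is fair for top-k selection. It is only an illustration, not the paper's algorithm: the function names, the toy data, the encoding of the proportional representation constraint as lower and upper bounds on the number of protected candidates among the top k, and the tie-breaking by index are all assumptions made here.

```python
from typing import List, Sequence


def top_k_indices(points: Sequence[Sequence[float]],
                  weights: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring points under a linear score."""
    scores = [sum(w * x for w, x in zip(weights, p)) for p in points]
    # Sort by descending score; ties are broken by index (an assumption).
    order = sorted(range(len(points)), key=lambda i: (-scores[i], i))
    return order[:k]


def is_fair(points: Sequence[Sequence[float]], protected: Sequence[bool],
            weights: Sequence[float], k: int, lower: int, upper: int) -> bool:
    """Check an assumed form of the constraint: the number of protected
    candidates among the top k must lie in [lower, upper]."""
    selected = top_k_indices(points, weights, k)
    count = sum(1 for i in selected if protected[i])
    return lower <= count <= upper


# Hypothetical toy data: five candidates with two numerical attributes each.
points = [(9.0, 1.0), (8.0, 3.0), (3.0, 9.0), (1.0, 8.0), (5.0, 5.0)]
protected = [False, False, True, True, False]

# Require exactly one protected candidate among the top 2.
for weights in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    print(weights, is_fair(points, protected, weights, k=2, lower=1, upper=1))
```

Verifying a single weight vector like this is cheap. The difficulty studied in the paper lies in searching the space of weight vectors for one that passes such a check.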

