Approximating the Center Ranking Under Ulam

Chakraborty, Diptarka; Gajjar, Kshitij; Jha, Agastya Vibhuti

doi:10.4230/LIPIcs.FSTTCS.2021.12

Abstract

We study the problem of approximating a center under the Ulam metric. The Ulam metric, defined over a set of permutations over [n], is the minimum number of move operations (deletion plus insertion) to transform one permutation into another. The Ulam metric is a simpler variant of the general edit distance metric. It provides a measure of dissimilarity over a set of rankings/permutations. In the center problem, given a set of permutations, we are asked to find a permutation (not necessarily from the input set) that minimizes the maximum distance to the input permutations. This problem is also referred to as maximum rank aggregation under Ulam. So far, we only know of a folklore 2-approximation algorithm for this NP-hard problem. Even for constantly many permutations, we do not know anything better than an exhaustive search over all n! permutations. In this paper, we achieve a (3/2 - 1/(3m))-approximation of the Ulam center in time n^O(m² ln m), for m input permutations over [n]. We therefore get a polynomial time bound while achieving better than a 3/2-approximation for constantly many permutations. This problem is of special interest even for constantly many permutations because under certain dissimilarity measures over rankings, even for four permutations, the problem is NP-hard. In proving our result, we establish a surprising connection between the approximate Ulam center problem and the closest string with wildcards problem (the center problem over the Hamming metric, allowing wildcards). We further study the closest string with wildcards problem and show that there cannot exist any (2-ε)-approximation algorithm (for any ε > 0) for it unless 𝖯 = NP. This inapproximability result is in sharp contrast with the same problem without wildcards, where we know of a PTAS.

Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5):23:1-23:27, 2008. URL: https://doi.org/10.1145/1411509.1411513.
David Aldous and Persi Diaconis. Longest increasing subsequences: from patience sorting to the Baik-Deift-Johansson theorem. Bulletin of the American Mathematical Society, 36(4):413-432, 1999. URL: https://doi.org/10.1090/S0273-0979-99-00796-X.
Alexandr Andoni and Robert Krauthgamer. The computational hardness of estimating edit distance. SIAM J. Comput., 39(6):2398-2429, 2010.
Alexandr Andoni and Huy L. Nguyen. Near-optimal sublinear time algorithms for Ulam distance. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 76-86, 2010. URL: https://doi.org/10.1137/1.9781611973075.8.
Per Austrin, Venkatesan Guruswami, and Johan Håstad. (2+ε)-sat is NP-hard. SIAM J. Comput., 46(5):1554-1573, 2017.
Christian Bachmaier, Franz J. Brandenburg, Andreas Gleißner, and Andreas Hofmeier. On the hardness of maximum rank aggregation problems. Journal of Discrete Algorithms, 31:2-13, 2015. 24th International Workshop on Combinatorial Algorithms (IWOCA 2013).
Mihai Bādoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the thiry-fourth annual ACM Symposium on Theory of Computing, pages 250-257, 2002.
Therese Biedl, Franz J Brandenburg, and Xiaotie Deng. On the complexity of crossings in permutations. Discrete Mathematics, 309(7):1813-1823, 2009.
Mahdi Boroujeni and Saeed Seddighin. Improved MPC algorithms for edit distance and Ulam distance. In The 31st ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2019, Phoenix, AZ, USA, June 22-24, 2019., pages 31-40, 2019.
Marc Bury and Chris Schwiegelshohn. On finding the jaccard center. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer. Approximating the median under the ulam metric. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 761-775. SIAM, 2021.
Moses Charikar and Robert Krauthgamer. Embedding the Ulam metric into l₁. Theory of Computing, 2(11):207-224, 2006. URL: https://doi.org/10.4086/toc.2006.v002a011.
Flavio Chierichetti, Ravi Kumar, Sandeep Pandey, and Sergei Vassilvitskii. Finding the Jaccard median. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 293-311. SIAM, 2010. URL: https://doi.org/10.1137/1.9781611973075.25.
Raphaël Clifford, Klim Efremenko, Benny Porat, and Ely Porat. A black box for online approximate pattern matching. In Annual Symposium on Combinatorial Pattern Matching, pages 143-151. Springer, 2008.
Michael B Cohen, Yin Tat Lee, Gary Miller, Jakub Pachocki, and Aaron Sidford. Geometric median in nearly linear time. In Proceedings of the forty-eighth annual ACM Symposium on Theory of Computing, pages 9-21, 2016.
Richard Cole and Ramesh Hariharan. Verifying candidate matches in sparse and wildcard matching. In Proceedings of the thiry-fourth annual ACM Symposium on Theory of Computing, pages 592-601, 2002.
Graham Cormode, Shan Muthukrishnan, and Süleyman Cenk Sahinalp. Permutation editing and matching via embeddings. In International Colloquium on Automata, Languages, and Programming, pages 481-492. Springer, 2001. URL: https://doi.org/10.1007/3-540-48224-5_40.
Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the Tenth International World Wide Web Conference, WWW 10, pages 613-622, 2001. URL: https://doi.org/10.1145/371920.372165.
Moti Frances and Ami Litman. On covering problems of codes. Theory of Computing Systems, 30(2):113-119, 1997.
Nick Goldman, Paul Bertone, Siyuan Chen, Christophe Dessimoz, Emily M. LeProust, Botond Sipos, and Ewan Birney. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature, 494(7435):77-80, 2013.
Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997.
Danny Hermelin and Liat Rozenberg. Parameterized complexity analysis for the closest string with wildcards problem. In Combinatorial Pattern Matching, pages 140-149, Cham, 2014. Springer International Publishing.
John G Kemeny. Mathematics without numbers. Daedalus, 88(4):577-591, 1959.
Claire Kenyon-Mathieu and Warren Schudy. How to rank with few errors. In Proceedings of the thirty-ninth annual ACM Symposium on Theory of Computing, pages 95-103, 2007.
Tomohiro Koana, Vincent Froese, and Rolf Niedermeier. Parameterized algorithms for matrix completion with radius constraints. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
Teuvo Kohonen. Median strings. Pattern Recognition Letters, 3(5):309-313, 1985. URL: https://doi.org/10.1016/0167-8655(85)90061-3.
Joseph B Kruskal. An overview of sequence comparison: Time warps, string edits, and macromolecules. SIAM review, 25(2):201-237, 1983. URL: https://doi.org/10.1137/1025045.
Christina Leslie, Rui Kuang, and Kristin Bennett. Fast string kernels using inexact matching for protein sequences. Journal of Machine Learning Research, 5(9), 2004.
Ming Li, Bin Ma, and Lusheng Wang. On the closest string and substring problems. Journal of the ACM (JACM), 49(2):157-171, March 2002.
Bin Ma and Xiaoming Sun. More efficient algorithms for closest string and substring problems. SIAM Journal on Computing, 39(4):1432-1443, 2010.
Carlos D. Martínez-Hinarejos, Alfons Juan, and Francisco Casacuberta. Use of median string for classification. In Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, volume 2, pages 903-906. IEEE, 2000. URL: https://doi.org/10.1109/ICPR.2000.906220.
Daniel Marx. Closest substring problems with small distances. SIAM J. Comput., 38:1382-1410, 2008.
Nimrod Megiddo. Linear programming in linear time when the dimension is fixed. Journal of the ACM (JACM), 31(1):114-127, 1984.
Stanislav Minsker. Geometric median and robust estimation in banach spaces. Bernoulli, 21(4):2308-2335, 2015.
Timothy Naumovitz, Michael Saks, and C. Seshadhri. Accurate and nearly optimal sublinear approximations to Ulam distance. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2012-2031, 2017. URL: https://doi.org/10.1137/1.9781611974782.131.
François Nicolas and Eric Rivals. Complexities of the centre and median string problems. In Combinatorial Pattern Matching, 14th Annual Symposium, CPM 2003, Morelia, Michocán, Mexico, June 25-27, 2003, Proceedings, pages 315-327, 2003.
Pavel Pevzner. Computational molecular biology: an algorithmic approach. MIT press, 2000.
V Yu Popov. Multiple genome rearrangement by swaps and by element duplications. Theoretical computer science, 385(1-3):115-126, 2007.
Cyrus Rashtchian, Konstantin Makarychev, Miklós Z. Rácz, Siena Ang, Djordje Jevdjic, Sergey Yekhanin, Luis Ceze, and Karin Strauss. Clustering billions of reads for DNA data storage. In Advances in Neural Information Processing Systems 30, pages 3360-3371. Curran Associates, Inc., 2017.
David Sankoff. Minimal mutation trees of sequences. SIAM Journal on Applied Mathematics, 28(1):35-42, 1975. URL: https://doi.org/10.1137/0128004.
Warren Schudy. Approximation schemes for inferring rankings and clusterings from pairwise data. Ph.D. Thesis, 2012.
James Joseph Sylvester. A question in the geometry of situation. Quarterly Journal of Pure and Applied Mathematics, 1(1):79-80, 1857.
E Alper Yildirim. Two algorithms for the minimum enclosing ball problem. SIAM Journal on Optimization, 19(3):1368-1391, 2008.
H Peyton Young. Condorcet’s theory of voting. American Political science review, 82(4):1231-1244, 1988.
H Peyton Young and Arthur Levenglick. A consistent extension of condorcet’s election principle. SIAM Journal on Applied Mathematics, 35(2):285-300, 1978.

Approximating the Center Ranking Under Ulam

Authors Diptarka Chakraborty, Kshitij Gajjar, Agastya Vibhuti Jha

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Approximating the Center Ranking Under Ulam

Authors Diptarka Chakraborty, Kshitij Gajjar, Agastya Vibhuti Jha

File

Document Identifiers

Author Details

Funding

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message