Approximating the Center Ranking Under Ulam

Authors Diptarka Chakraborty, Kshitij Gajjar, Agastya Vibhuti Jha



PDF
Thumbnail PDF

File

LIPIcs.FSTTCS.2021.12.pdf
  • Filesize: 0.94 MB
  • 21 pages

Document Identifiers

Author Details

Diptarka Chakraborty
  • National University of Singapore, Singapore
Kshitij Gajjar
  • National University of Singapore, Singapore
Agastya Vibhuti Jha
  • EPFL, Lausanne, Switzerland

Cite AsGet BibTex

Diptarka Chakraborty, Kshitij Gajjar, and Agastya Vibhuti Jha. Approximating the Center Ranking Under Ulam. In 41st IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 213, pp. 12:1-12:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.FSTTCS.2021.12

Abstract

We study the problem of approximating a center under the Ulam metric. The Ulam metric, defined over a set of permutations over [n], is the minimum number of move operations (deletion plus insertion) to transform one permutation into another. The Ulam metric is a simpler variant of the general edit distance metric. It provides a measure of dissimilarity over a set of rankings/permutations. In the center problem, given a set of permutations, we are asked to find a permutation (not necessarily from the input set) that minimizes the maximum distance to the input permutations. This problem is also referred to as maximum rank aggregation under Ulam. So far, we only know of a folklore 2-approximation algorithm for this NP-hard problem. Even for constantly many permutations, we do not know anything better than an exhaustive search over all n! permutations. In this paper, we achieve a (3/2 - 1/(3m))-approximation of the Ulam center in time n^O(m² ln m), for m input permutations over [n]. We therefore get a polynomial time bound while achieving better than a 3/2-approximation for constantly many permutations. This problem is of special interest even for constantly many permutations because under certain dissimilarity measures over rankings, even for four permutations, the problem is NP-hard. In proving our result, we establish a surprising connection between the approximate Ulam center problem and the closest string with wildcards problem (the center problem over the Hamming metric, allowing wildcards). We further study the closest string with wildcards problem and show that there cannot exist any (2-ε)-approximation algorithm (for any ε > 0) for it unless 𝖯 = NP. This inapproximability result is in sharp contrast with the same problem without wildcards, where we know of a PTAS.

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
Keywords
  • Center Problem
  • Ulam Metric
  • Edit Distance
  • Closest String
  • Approximation Algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5):23:1-23:27, 2008. URL: https://doi.org/10.1145/1411509.1411513.
  2. David Aldous and Persi Diaconis. Longest increasing subsequences: from patience sorting to the Baik-Deift-Johansson theorem. Bulletin of the American Mathematical Society, 36(4):413-432, 1999. URL: https://doi.org/10.1090/S0273-0979-99-00796-X.
  3. Alexandr Andoni and Robert Krauthgamer. The computational hardness of estimating edit distance. SIAM J. Comput., 39(6):2398-2429, 2010. Google Scholar
  4. Alexandr Andoni and Huy L. Nguyen. Near-optimal sublinear time algorithms for Ulam distance. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 76-86, 2010. URL: https://doi.org/10.1137/1.9781611973075.8.
  5. Per Austrin, Venkatesan Guruswami, and Johan Håstad. (2+ε)-sat is NP-hard. SIAM J. Comput., 46(5):1554-1573, 2017. Google Scholar
  6. Christian Bachmaier, Franz J. Brandenburg, Andreas Gleißner, and Andreas Hofmeier. On the hardness of maximum rank aggregation problems. Journal of Discrete Algorithms, 31:2-13, 2015. 24th International Workshop on Combinatorial Algorithms (IWOCA 2013). Google Scholar
  7. Mihai Bādoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the thiry-fourth annual ACM Symposium on Theory of Computing, pages 250-257, 2002. Google Scholar
  8. Therese Biedl, Franz J Brandenburg, and Xiaotie Deng. On the complexity of crossings in permutations. Discrete Mathematics, 309(7):1813-1823, 2009. Google Scholar
  9. Mahdi Boroujeni and Saeed Seddighin. Improved MPC algorithms for edit distance and Ulam distance. In The 31st ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2019, Phoenix, AZ, USA, June 22-24, 2019., pages 31-40, 2019. Google Scholar
  10. Marc Bury and Chris Schwiegelshohn. On finding the jaccard center. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017. Google Scholar
  11. Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer. Approximating the median under the ulam metric. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 761-775. SIAM, 2021. Google Scholar
  12. Moses Charikar and Robert Krauthgamer. Embedding the Ulam metric into l₁. Theory of Computing, 2(11):207-224, 2006. URL: https://doi.org/10.4086/toc.2006.v002a011.
  13. Flavio Chierichetti, Ravi Kumar, Sandeep Pandey, and Sergei Vassilvitskii. Finding the Jaccard median. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 293-311. SIAM, 2010. URL: https://doi.org/10.1137/1.9781611973075.25.
  14. Raphaël Clifford, Klim Efremenko, Benny Porat, and Ely Porat. A black box for online approximate pattern matching. In Annual Symposium on Combinatorial Pattern Matching, pages 143-151. Springer, 2008. Google Scholar
  15. Michael B Cohen, Yin Tat Lee, Gary Miller, Jakub Pachocki, and Aaron Sidford. Geometric median in nearly linear time. In Proceedings of the forty-eighth annual ACM Symposium on Theory of Computing, pages 9-21, 2016. Google Scholar
  16. Richard Cole and Ramesh Hariharan. Verifying candidate matches in sparse and wildcard matching. In Proceedings of the thiry-fourth annual ACM Symposium on Theory of Computing, pages 592-601, 2002. Google Scholar
  17. Graham Cormode, Shan Muthukrishnan, and Süleyman Cenk Sahinalp. Permutation editing and matching via embeddings. In International Colloquium on Automata, Languages, and Programming, pages 481-492. Springer, 2001. URL: https://doi.org/10.1007/3-540-48224-5_40.
  18. Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the Tenth International World Wide Web Conference, WWW 10, pages 613-622, 2001. URL: https://doi.org/10.1145/371920.372165.
  19. Moti Frances and Ami Litman. On covering problems of codes. Theory of Computing Systems, 30(2):113-119, 1997. Google Scholar
  20. Nick Goldman, Paul Bertone, Siyuan Chen, Christophe Dessimoz, Emily M. LeProust, Botond Sipos, and Ewan Birney. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature, 494(7435):77-80, 2013. Google Scholar
  21. Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997. Google Scholar
  22. Danny Hermelin and Liat Rozenberg. Parameterized complexity analysis for the closest string with wildcards problem. In Combinatorial Pattern Matching, pages 140-149, Cham, 2014. Springer International Publishing. Google Scholar
  23. John G Kemeny. Mathematics without numbers. Daedalus, 88(4):577-591, 1959. Google Scholar
  24. Claire Kenyon-Mathieu and Warren Schudy. How to rank with few errors. In Proceedings of the thirty-ninth annual ACM Symposium on Theory of Computing, pages 95-103, 2007. Google Scholar
  25. Tomohiro Koana, Vincent Froese, and Rolf Niedermeier. Parameterized algorithms for matrix completion with radius constraints. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020. Google Scholar
  26. Teuvo Kohonen. Median strings. Pattern Recognition Letters, 3(5):309-313, 1985. URL: https://doi.org/10.1016/0167-8655(85)90061-3.
  27. Joseph B Kruskal. An overview of sequence comparison: Time warps, string edits, and macromolecules. SIAM review, 25(2):201-237, 1983. URL: https://doi.org/10.1137/1025045.
  28. Christina Leslie, Rui Kuang, and Kristin Bennett. Fast string kernels using inexact matching for protein sequences. Journal of Machine Learning Research, 5(9), 2004. Google Scholar
  29. Ming Li, Bin Ma, and Lusheng Wang. On the closest string and substring problems. Journal of the ACM (JACM), 49(2):157-171, March 2002. Google Scholar
  30. Bin Ma and Xiaoming Sun. More efficient algorithms for closest string and substring problems. SIAM Journal on Computing, 39(4):1432-1443, 2010. Google Scholar
  31. Carlos D. Martínez-Hinarejos, Alfons Juan, and Francisco Casacuberta. Use of median string for classification. In Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, volume 2, pages 903-906. IEEE, 2000. URL: https://doi.org/10.1109/ICPR.2000.906220.
  32. Daniel Marx. Closest substring problems with small distances. SIAM J. Comput., 38:1382-1410, 2008. Google Scholar
  33. Nimrod Megiddo. Linear programming in linear time when the dimension is fixed. Journal of the ACM (JACM), 31(1):114-127, 1984. Google Scholar
  34. Stanislav Minsker. Geometric median and robust estimation in banach spaces. Bernoulli, 21(4):2308-2335, 2015. Google Scholar
  35. Timothy Naumovitz, Michael Saks, and C. Seshadhri. Accurate and nearly optimal sublinear approximations to Ulam distance. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2012-2031, 2017. URL: https://doi.org/10.1137/1.9781611974782.131.
  36. François Nicolas and Eric Rivals. Complexities of the centre and median string problems. In Combinatorial Pattern Matching, 14th Annual Symposium, CPM 2003, Morelia, Michocán, Mexico, June 25-27, 2003, Proceedings, pages 315-327, 2003. Google Scholar
  37. Pavel Pevzner. Computational molecular biology: an algorithmic approach. MIT press, 2000. Google Scholar
  38. V Yu Popov. Multiple genome rearrangement by swaps and by element duplications. Theoretical computer science, 385(1-3):115-126, 2007. Google Scholar
  39. Cyrus Rashtchian, Konstantin Makarychev, Miklós Z. Rácz, Siena Ang, Djordje Jevdjic, Sergey Yekhanin, Luis Ceze, and Karin Strauss. Clustering billions of reads for DNA data storage. In Advances in Neural Information Processing Systems 30, pages 3360-3371. Curran Associates, Inc., 2017. Google Scholar
  40. David Sankoff. Minimal mutation trees of sequences. SIAM Journal on Applied Mathematics, 28(1):35-42, 1975. URL: https://doi.org/10.1137/0128004.
  41. Warren Schudy. Approximation schemes for inferring rankings and clusterings from pairwise data. Ph.D. Thesis, 2012. Google Scholar
  42. James Joseph Sylvester. A question in the geometry of situation. Quarterly Journal of Pure and Applied Mathematics, 1(1):79-80, 1857. Google Scholar
  43. E Alper Yildirim. Two algorithms for the minimum enclosing ball problem. SIAM Journal on Optimization, 19(3):1368-1391, 2008. Google Scholar
  44. H Peyton Young. Condorcet’s theory of voting. American Political science review, 82(4):1231-1244, 1988. Google Scholar
  45. H Peyton Young and Arthur Levenglick. A consistent extension of condorcet’s election principle. SIAM Journal on Applied Mathematics, 35(2):285-300, 1978. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail