Multi Layer Peeling for Linear Arrangement and Hierarchical Clustering

Authors Yossi Azar, Danny Vainstein




File

LIPIcs.ICALP.2023.13.pdf
  • Filesize: 0.83 MB
  • 18 pages


Author Details

Yossi Azar
  • School of Computer Science, Tel-Aviv University, Israel
Danny Vainstein
  • School of Computer Science, Tel-Aviv University, Israel

Cite As

Yossi Azar and Danny Vainstein. Multi Layer Peeling for Linear Arrangement and Hierarchical Clustering. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.ICALP.2023.13

Abstract

We present a new multi-layer peeling technique to cluster points in a metric space. A well-known non-parametric objective is to embed the metric space into a simpler structured metric space such as a line (i.e., Linear Arrangement) or a binary tree (i.e., Hierarchical Clustering). Points that are close in the metric space should be mapped to close points/leaves in the line/tree; similarly, points that are far apart in the metric space should be far apart on the line or in the tree. In particular, we consider the Maximum Linear Arrangement problem [Refael Hassin and Shlomi Rubinstein, 2001] and the Maximum Hierarchical Clustering problem [Vincent Cohen-Addad et al., 2018] applied to metrics.
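For concreteness, the two objectives can be evaluated as below, assuming their standard formulations: Max Linear Arrangement places the n points at distinct positions and maximizes the sum over pairs of d(i, j)·|π(i) − π(j)|, while the Cohen-Addad et al. objective sums, over pairs, d(i, j) times the number of leaves in the subtree rooted at the lowest common ancestor of i and j. The helper names (`line_metric`, `linear_arrangement_value`, `hc_revenue`, etc.) and the toy line metric are hypothetical, and the brute-force solver is only a reference for tiny inputs; it is not the paper's approximation scheme.

```python
from itertools import permutations

# Toy metric (an assumption for illustration): points on the real line, d(i, j) = |x_i - x_j|.
def line_metric(xs):
    return [[abs(a - b) for b in xs] for a in xs]

def linear_arrangement_value(d, order):
    """Max Linear Arrangement value: sum over pairs of d(i, j) * |pos(i) - pos(j)|."""
    pos = {v: k for k, v in enumerate(order)}
    n = len(order)
    return sum(d[i][j] * abs(pos[i] - pos[j])
               for i in range(n) for j in range(i + 1, n))

def max_linear_arrangement_bruteforce(d):
    """Exact optimum by trying all n! orderings; only feasible for tiny n."""
    best = max(permutations(range(len(d))),
               key=lambda order: linear_arrangement_value(d, order))
    return linear_arrangement_value(d, best), best

def leaves(tree):
    """Leaves of a binary tree given as nested pairs with int leaves."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def hc_revenue(d, tree):
    """Max HC value: each pair (i, j) contributes d(i, j) times the number of
    leaves under the node where i and j are first separated (their LCA)."""
    if isinstance(tree, int):
        return 0
    left, right = tree
    lefts, rights = leaves(left), leaves(right)
    cross = sum(d[i][j] for i in lefts for j in rights)
    return (len(lefts) + len(rights)) * cross + hc_revenue(d, left) + hc_revenue(d, right)

d = line_metric([0, 1, 2, 3])
print(max_linear_arrangement_bruteforce(d))  # (20, (0, 1, 2, 3))
print(hc_revenue(d, ((0, 1), (2, 3))))       # 36
print(hc_revenue(d, (((0, 1), 2), 3)))       # 35
```

On this four-point example the balanced tree scores higher than the caterpillar, since the far pairs are separated at the root, where the subtree is largest.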
We design approximation schemes ((1−ε)-approximations for any constant ε > 0) for these objectives. In particular, this shows that by considering metrics one may significantly improve upon the previously known approximation ratios (0.5 for Max Linear Arrangement and 0.74 for Max Hierarchical Clustering). Our main technique, called multi-layer peeling, consists of recursively peeling off points that are far from the "core" of the metric space. The recursion ends once the core becomes a sufficiently densely weighted metric space (i.e., the average distance is at least a constant times the diameter) or once it becomes negligible with respect to its inner contribution to the objective. Interestingly, the algorithm in the Linear Arrangement case is much more involved than that in the Hierarchical Clustering case, and uses a significantly more delicate peeling.
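The abstract describes the peeling recursion only at a high level, so the sketch below is a hypothetical toy illustration of its shape and not the authors' algorithm. It implements the stated density test (average distance at least c times the diameter) and, as assumptions of our own, takes the core's "center" to be a 1-median, peels every point at distance at least half the core's diameter from it, and uses c = 0.25. The "negligible core" stopping rule is omitted, and only the layer decomposition is produced, not an arrangement or a tree.

```python
# Hypothetical toy version of layered peeling; NOT the paper's algorithm.
def line_metric(xs):
    return [[abs(a - b) for b in xs] for a in xs]

def diameter(d, pts):
    return max((d[i][j] for i in pts for j in pts), default=0)

def average_distance(d, pts):
    pairs = [d[i][j] for k, i in enumerate(pts) for j in pts[k + 1:]]
    return sum(pairs) / len(pairs) if pairs else 0.0

def is_dense(d, pts, c):
    """Stopping test from the abstract: average distance >= c * diameter."""
    return average_distance(d, pts) >= c * diameter(d, pts)

def peel_layers(d, pts, c=0.25):
    """Peel far points off the core, one layer per round, until the core is dense.

    Assumed rule (illustrative only): the center is a 1-median of the core and a
    layer is every point at distance >= diameter / 2 from it. By the triangle
    inequality an endpoint of a diametral pair is always peeled, and the center
    never is, so each round strictly shrinks the core and the loop terminates.
    """
    core = list(pts)
    layers = []
    while not is_dense(d, core, c):
        center = min(core, key=lambda i: sum(d[i][j] for j in core))
        cutoff = diameter(d, core) / 2
        layers.append([i for i in core if d[center][i] >= cutoff])
        core = [i for i in core if d[center][i] < cutoff]
    return layers, core

d = line_metric(list(range(10)) + [100])
layers, core = peel_layers(d, range(11))
print(layers)  # [[10]]
print(core)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Here the single far point at 100 keeps the average distance below a quarter of the diameter, so it is peeled as one layer, after which the remaining cluster passes the density test.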

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
Keywords
  • Hierarchical clustering
  • Linear arrangements
  • Metric embeddings


References

  1. Sara Ahmadian, Vaggos Chatziafratis, Alessandro Epasto, Euiwoong Lee, Mohammad Mahdian, Konstantin Makarychev, and Grigory Yaroslavtsev. Bisect and conquer: Hierarchical clustering via max-uncut bisection. CoRR, abs/1912.06983, 2019. URL: https://arxiv.org/abs/1912.06983.
  2. Noga Alon, Yossi Azar, and Danny Vainstein. Hierarchical clustering: A 0.585 revenue approximation. In Jacob D. Abernethy and Shivani Agarwal, editors, Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pages 153-162. PMLR, 2020. URL: http://proceedings.mlr.press/v125/alon20b.html.
  3. Sanjeev Arora, David R. Karger, and Marek Karpinski. Polynomial time approximation schemes for dense instances of NP-hard problems. J. Comput. Syst. Sci., 58(1):193-210, 1999. URL: https://doi.org/10.1006/jcss.1998.1605.
  4. Sanjeev Arora, Satish Rao, and Umesh V. Vazirani. Expander flows, geometric embeddings and graph partitioning. In László Babai, editor, Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 222-231. ACM, 2004. URL: https://doi.org/10.1145/1007352.1007355.
  5. Kevin Aydin, MohammadHossein Bateni, and Vahab S. Mirrokni. Distributed balanced partitioning via linear embedding. Algorithms, 12(8):162, 2019. URL: https://doi.org/10.3390/a12080162.
  6. MohammadHossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Raimondas Kiveris, Silvio Lattanzi, and Vahab S. Mirrokni. Affinity clustering: Hierarchical clustering at scale. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 6864-6874, 2017. URL: https://proceedings.neurips.cc/paper/2017/hash/2e1b24a664f5e9c18f407b2f9c73e821-Abstract.html.
  7. Moses Charikar and Vaggos Chatziafratis. Approximate hierarchical clustering via sparsest cut and spreading metrics. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 841-854, 2017.
  8. Moses Charikar, Vaggos Chatziafratis, and Rad Niazadeh. Hierarchical clustering better than average-linkage. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 2291-2304, 2019.
  9. Moses Charikar, Vaggos Chatziafratis, Rad Niazadeh, and Grigory Yaroslavtsev. Hierarchical clustering for euclidean data. In The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 16-18 April 2019, Naha, Okinawa, Japan, pages 2721-2730, 2019. URL: http://proceedings.mlr.press/v89/charikar19a.html.
  10. Moses Charikar, Mohammad Taghi Hajiaghayi, Howard J. Karloff, and Satish Rao. ℓ₂² spreading metrics for vertex ordering problems. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pages 1018-1027. ACM Press, 2006. URL: http://dl.acm.org/citation.cfm?id=1109557.1109670.
  11. Gui Citovsky, Giulia DeSalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, and Sanjiv Kumar. Batch active learning at scale. CoRR, abs/2107.14263, 2021. URL: https://arxiv.org/abs/2107.14263.
  12. Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn, and Claire Mathieu. Hierarchical clustering: Objective functions and algorithms. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 378-397, 2018.
  13. Sanjoy Dasgupta. A cost function for similarity-based hierarchical clustering. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 118-127, 2016.
  14. Wenceslas Fernandez de la Vega and Marek Karpinski. Polynomial time approximation of dense weighted instances of MAX-CUT. Electron. Colloquium Comput. Complex., 64, 1998. URL: https://eccc.weizmann.ac.il/eccc-reports/1998/TR98-064/index.html.
  15. Wenceslas Fernandez de la Vega and Claire Kenyon. A randomized approximation scheme for metric MAX-CUT. In 39th Annual Symposium on Foundations of Computer Science, FOCS '98, November 8-11, 1998, Palo Alto, California, USA, pages 468-471. IEEE Computer Society, 1998. URL: https://doi.org/10.1109/SFCS.1998.743497.
  16. Guy Even, Joseph Naor, Satish Rao, and Baruch Schieber. Divide-and-conquer approximation algorithms via spreading metrics (extended abstract). In 36th Annual Symposium on Foundations of Computer Science, Milwaukee, Wisconsin, USA, 23-25 October 1995, pages 62-71. IEEE Computer Society, 1995. URL: https://doi.org/10.1109/SFCS.1995.492463.
  17. Uriel Feige and James R. Lee. An improved approximation ratio for the minimum linear arrangement problem. Inf. Process. Lett., 101(1):26-29, 2007. URL: https://doi.org/10.1016/j.ipl.2006.07.009.
  18. Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. J. ACM, 45(4):653-750, 1998.
  19. Mark D. Hansen. Approximation algorithms for geometric embeddings in the plane with applications to parallel processing problems (extended abstract). In 30th Annual Symposium on Foundations of Computer Science, Research Triangle Park, North Carolina, USA, 30 October - 1 November 1989, pages 604-609. IEEE Computer Society, 1989. URL: https://doi.org/10.1109/SFCS.1989.63542.
  20. Refael Hassin and Shlomi Rubinstein. Approximation algorithms for maximum linear arrangement. Inf. Process. Lett., 80(4):171-177, 2001. URL: https://doi.org/10.1016/S0020-0190(01)00159-4.
  21. Marek Karpinski and Warren Schudy. Linear time approximation schemes for the gale-berlekamp game and related minimization problems. In Michael Mitzenmacher, editor, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 313-322. ACM, 2009. URL: https://doi.org/10.1145/1536414.1536458.
  22. Claire Kenyon-Mathieu and Warren Schudy. How to rank with few errors. In David S. Johnson and Uriel Feige, editors, Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11-13, 2007, pages 95-103. ACM, 2007. URL: https://doi.org/10.1145/1250790.1250806.
  23. Frank Thomson Leighton and Satish Rao. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. J. ACM, 46(6):787-832, 1999. URL: https://doi.org/10.1145/331524.331526.
  24. Benjamin Moseley and Joshua Wang. Approximation bounds for hierarchical clustering: Average linkage, bisecting k-means, and local search. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 3094-3103, 2017.
  25. Stanislav Naumov, Grigory Yaroslavtsev, and Dmitrii Avdiukhin. Objective-based hierarchical clustering of deep embedding vectors. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pages 9055-9063. AAAI Press, 2021. URL: https://ojs.aaai.org/index.php/AAAI/article/view/17094.
  26. Anand Rajagopalan, Fabio Vitale, Danny Vainstein, Gui Citovsky, Cecilia M. Procopiuc, and Claudio Gentile. Hierarchical clustering of data streams: Scalable algorithms and approximation guarantees. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 8799-8809. PMLR, 2021. URL: http://proceedings.mlr.press/v139/rajagopalan21a.html.
  27. Satish Rao and Andréa W. Richa. New approximation techniques for some ordering problems. In Howard J. Karloff, editor, Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 25-27 January 1998, San Francisco, California, USA, pages 211-218. ACM/SIAM, 1998. URL: http://dl.acm.org/citation.cfm?id=314613.314703.
  28. R. Ravi, Ajit Agrawal, and Philip N. Klein. Ordering problems approximated: Single-processor scheduling and interval graph completion. In Javier Leach Albert, Burkhard Monien, and Mario Rodríguez-Artalejo, editors, Automata, Languages and Programming, 18th International Colloquium, ICALP '91, Madrid, Spain, July 8-12, 1991, Proceedings, volume 510 of Lecture Notes in Computer Science, pages 751-762. Springer, 1991. URL: https://doi.org/10.1007/3-540-54233-7_180.
  29. Paul D. Seymour. Packing directed circuits fractionally. Comb., 15(2):281-288, 1995. URL: https://doi.org/10.1007/BF01200760.
  30. Baris Sumengen, Anand Rajagopalan, Gui Citovsky, David Simcha, Olivier Bachem, Pradipta Mitra, Sam Blasiak, Mason Liang, and Sanjiv Kumar. Scaling hierarchical agglomerative clustering to billion-sized datasets. CoRR, abs/2105.11653, 2021. URL: https://arxiv.org/abs/2105.11653.
  31. Danny Vainstein, Vaggos Chatziafratis, Gui Citovsky, Anand Rajagopalan, Mohammad Mahdian, and Yossi Azar. Hierarchical clustering via sketches and hierarchical correlation clustering. In Arindam Banerjee and Kenji Fukumizu, editors, The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, April 13-15, 2021, Virtual Event, volume 130 of Proceedings of Machine Learning Research, pages 559-567. PMLR, 2021. URL: http://proceedings.mlr.press/v130/vainstein21a.html.