On Geometric Prototype and Applications

Authors Hu Ding, Manni Liu



PDF
Thumbnail PDF

File

LIPIcs.ESA.2018.23.pdf
  • Filesize: 0.61 MB
  • 15 pages

Document Identifiers

Author Details

Hu Ding
  • Department of Computer Science and Engineering, Michigan State University , East Lansing, USA
  • and, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
Manni Liu
  • Department of Computer Science and Engineering, Michigan State University , East Lansing, USA

Cite AsGet BibTex

Hu Ding and Manni Liu. On Geometric Prototype and Applications. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 23:1-23:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ESA.2018.23

Abstract

In this paper, we propose to study a new geometric optimization problem called the "geometric prototype" in Euclidean space. Given a set of patterns, where each pattern is represented by a (weighted or unweighted) point set, the geometric prototype can be viewed as the "average pattern" minimizing the total matching cost to them. As a general model, the problem finds many applications in real-world, such as Wasserstein barycenter and ensemble clustering. The dimensionality could be either constant or high, depending on the applications. To our best knowledge, the general geometric prototype problem has yet to be seriously considered by the theory community. To bridge the gap between theory and practice, we first show that a small core-set can be obtained to substantially reduce the data size. Consequently, any existing heuristic or algorithm can run on the core-set to achieve a great improvement on the efficiency. As a new application of core-set, it needs to tackle a couple of challenges particularly in theory. Finally, we test our method on both image and high dimensional clustering datasets; the experimental results remain stable even if we run the algorithms on core-sets much smaller than the original datasets, while the running times are reduced significantly.

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
Keywords
  • prototype
  • core-set
  • Wasserstein barycenter
  • ensemble clustering

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of computer and System Sciences, 66(4):671-687, 2003. Google Scholar
  2. Pankaj K. Agarwal, Kyle Fox, Debmalya Panigrahi, Kasturi R. Varadarajan, and Allen Xiao. Faster algorithms for the geometric transportation problem. In 33rd International Symposium on Computational Geometry, SoCG 2017, July 4-7, 2017, Brisbane, Australia, pages 7:1-7:16, 2017. Google Scholar
  3. Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. Geometric approximation via coresets. Combinatorial and Computational Geometry, 52:1-30, 2005. Google Scholar
  4. Pankaj K. Agarwal and Kasturi R. Varadarajan. A near-linear constant-factor approximation for euclidean bipartite matching? In Proceedings of the 20th ACM Symposium on Computational Geometry, Brooklyn, New York, USA, June 8-11, 2004, pages 247-252, 2004. Google Scholar
  5. Ravindra K Ahuja, Thomas L Magnanti, and James B Orlin. Network flows: theory, algorithms, and applications. Prentice Hall, 1993. Google Scholar
  6. Nir Ailon and Bernard Chazelle. The fast Johnson—Lindenstrauss transform and approximate nearest neighbors. SIAM Journal on computing, 39(1):302-322, 2009. Google Scholar
  7. Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. Parallel algorithms for geometric graph problems. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 574-583, 2014. Google Scholar
  8. Esther M Arkin, José Miguel Díaz-Báñez, Ferran Hurtado, Piyush Kumar, Joseph S.B. Mitchell, Belén Palop, Pablo Pérez-Lantero, Maria Saumell, and Rodrigo I Silveira. Bichromatic 2-center of pairs of points. Computational Geometry, 48(2):94-107, 2015. Google Scholar
  9. Marcus Baum, Peter Willett, and Uwe D. Hanebeck. On wasserstein barycenters and MMOSPA estimation. IEEE Signal Process. Lett., 22(10):1511-1515, 2015. Google Scholar
  10. Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré. Iterative bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111-A1138, 2015. Google Scholar
  11. Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, and Tao Jiang. On the approximation of correlation clustering and consensus clustering. Journal of Computer and System Sciences, 74(5):671-696, 2008. Google Scholar
  12. Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trendsregistered in Machine learning, 3(1):1-122, 2011. Google Scholar
  13. Sergio Cabello, Panos Giannopoulos, Christian Knauer, and Günter Rote. Matching point sets with respect to the earth mover’s distance. Computational Geometry, 39(2):118-133, 2008. Google Scholar
  14. Ke Chen. On coresets for k-median and k-means clustering in metric and euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923-947, 2009. Google Scholar
  15. Michael B Cohen, Yin Tat Lee, Gary Miller, Jakub Pachocki, and Aaron Sidford. Geometric median in nearly linear time. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 9-21. ACM, 2016. Google Scholar
  16. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. The MIT Press, 3rd edition, 2009. Google Scholar
  17. Marco Cuturi and Arnaud Doucet. Fast computation of wasserstein barycenters. In International Conference on Machine Learning, pages 685-693, 2014. Google Scholar
  18. Sanjoy Dasgupta. The hardness of k-means clustering. Technical Report, 2008. Google Scholar
  19. Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures &Algorithms, 22(1):60-65, 2003. Google Scholar
  20. Hu Ding, Ronald Berezney, and Jinhui Xu. k-prototype learning for 3d rigid structures. In Advances in Neural Information Processing Systems, pages 2589-2597, 2013. Google Scholar
  21. Hu Ding and Manni Liu. On geometric prototype and applications. CoRR, abs/1804.09655, 2018. URL: http://arxiv.org/abs/1804.09655.
  22. Hu Ding, Lu Su, and Jinhui Xu. Towards distributed ensemble clustering for networked sensing systems: a novel geometric approach. In Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing, MobiHoc 2016, Paderborn, Germany, July 4-8, 2016, pages 1-10, 2016. Google Scholar
  23. Hu Ding and Jinhui Xu. Solving the chromatic cone clustering problem via minimum spanning sphere. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP), pages 773-784, 2011. Google Scholar
  24. Hu Ding and Jinhui Xu. Finding median point-set using earth mover’s distance. In Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014. Google Scholar
  25. Hu Ding and Jinhui Xu. A unified framework for clustering constrained data without locality property. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1471-1490, 2015. Google Scholar
  26. Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pages 569-578, 2011. Google Scholar
  27. Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, and Jiawei Han. Graph-based consensus maximization among multiple supervised and unsupervised models. In Advances in Neural Information Processing Systems, pages 585-593, 2009. Google Scholar
  28. Joydeep Ghosh and Ayan Acharya. Cluster ensembles: Theory and applications. In Data Clustering: Algorithms and Applications, pages 551-570. CRC, 2013. Google Scholar
  29. Alexandre Gramfort, Gabriel Peyré, and Marco Cuturi. Fast optimal transport averaging of neuroimaging data. In International Conference on Information Processing in Medical Imaging, pages 261-272. Springer, 2015. Google Scholar
  30. Piotr Indyk. A near linear time constant factor approximation for euclidean bichromatic matching (cost). In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 39-42. Society for Industrial and Applied Mathematics, 2007. Google Scholar
  31. Piotr Indyk and Nitin Thaper. Fast color image retrieval via embeddings. In Workshop on Statistical and Computational Theories of Vision (at ICCV), 2003. Google Scholar
  32. Michael Langberg and Leonard J Schulman. Universal ε-approximators for integrals. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 598-607. SIAM, 2010. Google Scholar
  33. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998. Google Scholar
  34. Edo Liberty and Steven W Zucker. The mailman algorithm: A note on matrix-vector multiplication. Information Processing Letters, 109(3):179-182, 2009. Google Scholar
  35. Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. Multi-view clustering via joint nonnegative matrix factorization. In Proc. of SDM, volume 13, pages 252-260, 2013. Google Scholar
  36. Stuart Lloyd. Least squares quantization in pcm. IEEE transactions on information theory, 28(2):129-137, 1982. Google Scholar
  37. Ofir Pele and Michael Werman. Fast and robust earth mover’s distances. In Computer vision, 2009 IEEE 12th international conference on, pages 460-467. IEEE, 2009. Google Scholar
  38. Jeff M. Phillips. Coresets and sketches. Computing Research Repository, 2016. Google Scholar
  39. Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earth mover’s distance as a metric for image retrieval. International journal of computer vision, 40(2):99-121, 2000. Google Scholar
  40. R. Sharathkumar and Pankaj K. Agarwal. Algorithms for the transportation problem in geometric settings. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 306-317, 2012. Google Scholar
  41. R. Sharathkumar and Pankaj K. Agarwal. A near-linear time ε-approximation algorithm for geometric bipartite matching. In Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19 - 22, 2012, pages 385-394, 2012. Google Scholar
  42. Vikas Singh, Lopamudra Mukherjee, Jiming Peng, and Jinhui Xu. Ensemble clustering using semidefinite programming with applications. Machine learning, 79(1-2):177-200, 2010. Google Scholar
  43. Matthew Staib, Sebastian Claici, Justin Solomon, and Stefanie Jegelka. Parallel streaming wasserstein barycenters. arXiv preprint arXiv:1705.07443, 2017. Google Scholar
  44. Alexander Strehl and Joydeep Ghosh. Cluster ensembles-a knowledge reuse framework for combining partitionings. In AAAI/IAAI, pages 93-99, 2002. Google Scholar
  45. Kasturi R. Varadarajan and Xin Xiao. A near-linear algorithm for projective clustering integer points. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 1329-1342, 2012. Google Scholar
  46. Jianbo Ye, Panruo Wu, James Z Wang, and Jia Li. Fast discrete distribution clustering using wasserstein barycenter with sparse support. IEEE Transactions on Signal Processing, 65(9):2317-2332, 2017. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail