A Sub-Linear Time Framework for Geometric Optimization with Outliers in High Dimensions

Author Hu Ding



PDF
Thumbnail PDF

File

LIPIcs.ESA.2020.38.pdf
  • Filesize: 0.75 MB
  • 21 pages

Document Identifiers

Author Details

Hu Ding
  • School of Computer Science and Technology, University of Science and Technology of China, Anhui, China

Acknowledgements

The author wants to thank the anonymous reviewers for their helpful comments and suggestions for improving the paper.

Cite AsGet BibTex

Hu Ding. A Sub-Linear Time Framework for Geometric Optimization with Outliers in High Dimensions. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 38:1-38:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ESA.2020.38

Abstract

Many real-world problems can be formulated as geometric optimization problems in high dimensions, especially in the fields of machine learning and data mining. Moreover, we often need to take into account of outliers when optimizing the objective functions. However, the presence of outliers could make the problems to be much more challenging than their vanilla versions. In this paper, we study the fundamental minimum enclosing ball (MEB) with outliers problem first; partly inspired by the core-set method from Bădoiu and Clarkson, we propose a sub-linear time bi-criteria approximation algorithm based on two novel techniques, the Uniform-Adaptive Sampling method and Sandwich Lemma. To the best of our knowledge, our result is the first sub-linear time algorithm, which has the sample size (i.e., the number of sampled points) independent of both the number of input points n and dimensionality d, for MEB with outliers in high dimensions. Furthermore, we observe that these two techniques can be generalized to deal with a broader range of geometric optimization problems with outliers in high dimensions, including flat fitting, k-center clustering, and SVM with outliers, and therefore achieve the sub-linear time algorithms for these problems respectively.

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
Keywords
  • minimum enclosing ball
  • outliers
  • shape fitting
  • high dimensions
  • sub-linear time

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Pankaj K. Agarwal, Esther Ezra, and Kyle Fox. Geometric optimization revisited. In Bernhard Steffen and Gerhard J. Woeginger, editors, Computing and Software Science - State of the Art and Perspectives, volume 10000 of Lecture Notes in Computer Science, pages 66-84. Springer, 2019. Google Scholar
  2. Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. Geometric approximation via coresets. Combinatorial and Computational Geometry, 52:1-30, 2005. Google Scholar
  3. Pankaj K. Agarwal, Sariel Har-Peled, and Hai Yu. Robust shape fitting via peeling and grating coresets. Discrete & Computational Geometry, 39(1-3):38-58, 2008. Google Scholar
  4. Pankaj K. Agarwal and R. Sharathkumar. Streaming algorithms for extent problems in high dimensions. Algorithmica, 72(1):83-98, 2015. Google Scholar
  5. Alok Aggarwal, Hiroshi Imai, Naoki Katoh, and Subhash Suri. Finding k points with minimum diameter and related problems. Journal of algorithms, 12(1):38-56, 1991. Google Scholar
  6. Zeyuan Allen Zhu, Zhenyu Liao, and Yang Yuan. Optimization algorithms for faster computational geometry. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, pages 53:1-53:6, 2016. Google Scholar
  7. Noga Alon, Seannie Dar, Michal Parnas, and Dana Ron. Testing of clustering. SIAM Journal on Discrete Mathematics, 16(3):393-417, 2003. Google Scholar
  8. Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012, 2012. Google Scholar
  9. Battista Biggio and Fabio Roli. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84:317-331, 2018. Google Scholar
  10. Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7(4):448-461, 1973. Google Scholar
  11. Mihai Bădoiu and Kenneth L. Clarkson. Smaller core-sets for balls. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 801-802, 2003. Google Scholar
  12. Mihai Bădoiu and Kenneth L. Clarkson. Optimal core-sets for balls. Computational Geometry, 40(1):14-22, 2008. Google Scholar
  13. Mihai Bădoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the ACM Symposium on Theory of Computing (STOC), pages 250-257, 2002. Google Scholar
  14. Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. Solving k-center clustering (with outliers) in mapreduce and streaming, almost as accurately as sequentially. PVLDB, 12(7):766-778, 2019. Google Scholar
  15. Deeparnab Chakrabarty, Prachi Goyal, and Ravishankar Krishnaswamy. The non-uniform k-center problem. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, pages 67:1-67:15, 2016. Google Scholar
  16. Timothy M. Chan and Vinayak Pathak. Streaming and dynamic algorithms for minimum enclosing balls in high dimensions. Comput. Geom., 47(2):240-247, 2014. Google Scholar
  17. Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM TIST, 2(3), 2011. Google Scholar
  18. Moses Charikar, Samir Khuller, David M Mount, and Giri Narasimhan. Algorithms for facility location problems with outliers. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 642-651. Society for Industrial and Applied Mathematics, 2001. Google Scholar
  19. Moses Charikar, Liadan O'Callaghan, and Rina Panigrahy. Better streaming algorithms for clustering problems. In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pages 30-39. ACM, 2003. Google Scholar
  20. Kenneth L. Clarkson. Coresets, sparse greedy approximation, and the frank-wolfe algorithm. ACM Transactions on Algorithms, 6(4):63, 2010. Google Scholar
  21. Kenneth L. Clarkson, Elad Hazan, and David P. Woodruff. Sublinear optimization for machine learning. J. ACM, 59(5):23:1-23:49, 2012. Google Scholar
  22. Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20:273, 1995. Google Scholar
  23. David J. Crisp and Christopher J. C. Burges. A geometric interpretation of v-SVM classifiers. In Sara A. Solla, Todd K. Leen, and Klaus-Robert Müller, editors, NIPS, pages 244-250. The MIT Press, 1999. Google Scholar
  24. Artur Czumaj and Christian Sohler. Sublinear-time approximation for clustering via random sampling. In International Colloquium on Automata, Languages, and Programming, pages 396-407. Springer, 2004. Google Scholar
  25. Artur Czumaj and Christian Sohler. Sublinear-time algorithms. In Property Testing - Current Research and Surveys, pages 41-64, 2010. Google Scholar
  26. Hu Ding. Minimum enclosing ball revisited: Stability and sub-linear time algorithms. CoRR, abs/1904.03796, 2019. Google Scholar
  27. Hu Ding. A sub-linear time framework for geometric optimization with outliers in high dimensions. CoRR, abs/2004.10090, 2020. URL: http://arxiv.org/abs/2004.10090.
  28. Hu Ding and Jinhui Xu. Random gradient descent tree: A combinatorial approach for svm with outliers. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 2561-2567, 2015. Google Scholar
  29. Hu Ding, Haikuo Yu, and Zixiu Wang. Greedy strategy works for k-center clustering with outliers and coreset construction. In 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany., pages 40:1-40:16, 2019. Google Scholar
  30. Alon Efrat, Micha Sharir, and Alon Ziv. Computing the smallest k-enclosing circle and related problems. Computational Geometry, 4(3):119-136, 1994. Google Scholar
  31. Dan Feldman, Chongyuan Xiang, Ruihao Zhu, and Daniela Rus. Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks. In Proceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN 2017, Pittsburgh, PA, USA, April 18-21, 2017, pages 3-15, 2017. Google Scholar
  32. Kaspar Fischer, Bernd Gärtner, and Martin Kutz. Fast smallest-enclosing-ball computation in high dimensions. In Algorithms - ESA 2003, 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003, Proceedings, pages 630-641, 2003. Google Scholar
  33. Marguerite Frank and Philip Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95-110, 1956. Google Scholar
  34. Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. J. ACM, 45(4):653-750, 1998. Google Scholar
  35. Teofilo F Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293-306, 1985. Google Scholar
  36. Ian J. Goodfellow, Patrick D. McDaniel, and Nicolas Papernot. Making machine learning robust against adversarial inputs. Commun. ACM, 61(7):56-66, 2018. Google Scholar
  37. Sudipto Guha, Yi Li, and Qin Zhang. Distributed partial clustering. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 143-152. ACM, 2017. Google Scholar
  38. Laszlo Gyongyosi and Sandor Imre. Geometrical analysis of physically allowed quantum cloning transformations for quantum cryptography. Information Sciences, 285:1-23, 2014. Google Scholar
  39. Sariel Har-Peled and Soham Mazumdar. Fast algorithms for computing the smallest k-enclosing circle. Algorithmica, 41(3):147-157, 2005. Google Scholar
  40. Sariel Har-Peled, Dan Roth, and Dav Zimak. Maximum margin coresets for active and noise tolerant learning. In IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-12, 2007, pages 836-841, 2007. Google Scholar
  41. Sariel Har-Peled and Kasturi R. Varadarajan. Approximate shape fitting via linearization. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14-17 October 2001, Las Vegas, Nevada, USA, pages 66-73, 2001. Google Scholar
  42. Sariel Har-Peled and Kasturi R. Varadarajan. High-dimensional shape fitting in linear time. Discret. Comput. Geom., 32(2):269-288, 2004. Google Scholar
  43. Sariel Har-Peled and Yusu Wang. Shape fitting with outliers. SIAM Journal on Computing, 33(2):269-285, 2004. Google Scholar
  44. Elad Hazan, Tomer Koren, and Nati Srebro. Beating SGD: learning svms in sublinear time. In John Shawe-Taylor, Richard S. Zemel, Peter L. Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, Granada, Spain, pages 1233-1241, 2011. Google Scholar
  45. Ling Huang, Anthony D Joseph, Blaine Nelson, Benjamin IP Rubinstein, and J Doug Tygar. Adversarial machine learning. In Proceedings of the 4th ACM workshop on Security and artificial intelligence, pages 43-58, 2011. Google Scholar
  46. Lingxiao Huang, Shaofeng Jiang, Jian Li, and Xuan Wu. Epsilon-coresets for clustering (with outliers) in doubling metrics. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 814-825, 2018. Google Scholar
  47. Piotr Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, May 1-4, 1999, Atlanta, Georgia, USA, pages 428-434, 1999. Google Scholar
  48. Matthew Jagielski, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, and Bo Li. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In 2018 IEEE Symposium on Security and Privacy, SP 2018, Proceedings, 21-23 May 2018, San Francisco, California, USA, pages 19-35, 2018. Google Scholar
  49. Michael Kerber and R. Sharathkumar. Approximate čech complex in low and high dimensions. In Algorithms and Computation - 24th International Symposium, ISAAC 2013, Hong Kong, China, December 16-18, 2013, Proceedings, pages 666-676, 2013. Google Scholar
  50. Piyush Kumar, Joseph S. B. Mitchell, and E. Alper Yildirim. Approximate minimum enclosing balls in high dimensions using core-sets. ACM Journal of Experimental Algorithmics, 8, 2003. Google Scholar
  51. Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint, 2016. URL: http://arxiv.org/abs/1611.01236.
  52. Shi Li and Xiangyu Guo. Distributed k-clustering for data with heavy noise. In Advances in Neural Information Processing Systems, pages 7849-7857, 2018. Google Scholar
  53. Gustavo Malkomes, Matt J Kusner, Wenlin Chen, Kilian Q Weinberger, and Benjamin Moseley. Fast distributed k-center clustering with outliers on massive data. In Advances in Neural Information Processing Systems, pages 1063-1071, 2015. Google Scholar
  54. Jiří Matoušek. On enclosing k points by a circle. Information Processing Letters, 53(4):217-221, 1995. Google Scholar
  55. Richard Matthew McCutchen and Samir Khuller. Streaming algorithms for k-center clustering with outliers and with anonymity. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 165-178. Springer, 2008. Google Scholar
  56. Adam Meyerson, Liadan O'Callaghan, and Serge A. Plotkin. A k-median algorithm with running time independent of data size. Machine Learning, 56(1-3):61-87, 2004. Google Scholar
  57. Nina Mishra, Dan Oblinger, and Leonard Pitt. Sublinear time approximate clustering. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 439-447. Society for Industrial and Applied Mathematics, 2001. Google Scholar
  58. David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. On the least trimmed squares estimator. Algorithmica, 69(1):148-183, 2014. Google Scholar
  59. Kobbi Nissim, Uri Stemmer, and Salil P. Vadhan. Locating a small cluster privately. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 413-427, 2016. Google Scholar
  60. Rina Panigrahy. Minimum enclosing polytope in high dimensions. arXiv preprint, 2004. URL: http://arxiv.org/abs/cs/0407020.
  61. Jeff M. Phillips. Coresets and sketches. Computing Research Repository, 2016. Google Scholar
  62. J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 185-208, Cambridge, MA, 1999. MIT Press. Google Scholar
  63. Ronitt Rubinfeld. Sublinear time algorithms. In Survey. Citeseer, 2006. Google Scholar
  64. Ankan Saha, S. V. N. Vishwanathan, and Xinhua Zhang. New approximation algorithms for minimum enclosing convex shapes. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, San Francisco, California, USA, January 23-25, 2011, pages 1146-1160, 2011. Google Scholar
  65. B. Scholkopf, A. J. Smola, K. R. Muller, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12:1207-1245, 2000. Google Scholar
  66. Shinya Suzumura, Kohei Ogawa, Masashi Sugiyama, and Ichiro Takeuchi. Outlier path: A homotopy algorithm for robust svm. In Tony Jebara and Eric P. Xing, editors, Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1098-1106, 2014. Google Scholar
  67. Pang-Ning Tan, Michael S. Steinbach, and Vipin Kumar. Introduction to Data Mining. Addison-Wesley, 2005. Google Scholar
  68. Ivor W. Tsang, James T. Kwok, and Pak-Ming Cheung. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363-392, 2005. URL: http://jmlr.org/papers/v6/tsang05a.html.
  69. Vladimir N Vapnik and A Ya Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of complexity, pages 11-30. Springer, 2015. Google Scholar
  70. Linli Xu, Koby Crammer, and Dale Schuurmans. Robust support vector machine training via convex outlier ablation. In AAAI, pages 536-542. AAAI Press, 2006. Google Scholar
  71. Hai Yu, Pankaj K. Agarwal, Raghunath Poreddy, and Kasturi R. Varadarajan. Practical methods for shape fitting and kinetic data structures using coresets. Algorithmica, 52(3):378-402, 2008. Google Scholar
  72. Hamid Zarrabi-Zadeh and Asish Mukhopadhyay. Streaming 1-center with outliers in high dimensions. In Proceedings of the Canadian Conference on Computational Geometry (CCCG), pages 83-86, 2009. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail