Probabilistic Smallest Enclosing Ball in High Dimensions via Subgradient Sampling

Authors: Amer Krivošija, Alexander Munteanu




File

LIPIcs.SoCG.2019.47.pdf
  • Filesize: 496 kB
  • 14 pages

Author Details

Amer Krivošija
  • Department of Computer Science, TU Dortmund, Germany
Alexander Munteanu
  • Department of Computer Science, TU Dortmund, Germany

Cite As

Amer Krivošija and Alexander Munteanu. Probabilistic Smallest Enclosing Ball in High Dimensions via Subgradient Sampling. In 35th International Symposium on Computational Geometry (SoCG 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 129, pp. 47:1-47:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.SoCG.2019.47

Abstract

We study a variant of the median problem for a collection of point sets in high dimensions. This generalizes the geometric median as well as the (probabilistic) smallest enclosing ball (pSEB) problems. Our main objective and motivation is to improve the previously best algorithm for the pSEB problem by reducing its exponential dependence on the dimension to linear. This is achieved via a novel combination of sampling techniques for clustering problems in metric spaces with the framework of stochastic subgradient descent. As a result, the algorithm becomes applicable to shape fitting problems in Hilbert spaces of unbounded dimension via kernel functions. We present an exemplary application by extending the support vector data description (SVDD) shape fitting method to the probabilistic case. This is done by simulating the pSEB algorithm implicitly in the feature space induced by the kernel function.
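The combination of sampling with stochastic subgradient descent mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm and carries none of its guarantees. It assumes the objective is the sum, over all input point sets, of the distance from the center to the furthest point of each set, which reduces to the geometric median for singleton sets and to the smallest enclosing ball for a single set. The function names, the uniform sampling of sets, the step size `step_scale / sqrt(t)`, and the iterate averaging are all illustrative assumptions.

```python
import math
import random


def furthest_point(center, point_set):
    """Return the point of point_set that is furthest from center (Euclidean)."""
    return max(point_set, key=lambda p: math.dist(center, p))


def set_median_cost(center, point_sets):
    """Assumed objective: sum over sets of the distance to each set's furthest point."""
    return sum(math.dist(center, furthest_point(center, s)) for s in point_sets)


def subgradient_sampling(point_sets, steps=2000, step_scale=1.0, seed=0):
    """Stochastic subgradient descent with iterate averaging (illustrative only)."""
    rng = random.Random(seed)
    dim = len(point_sets[0][0])
    center = list(point_sets[0][0])  # start at an arbitrary input point
    average = [0.0] * dim
    for t in range(1, steps + 1):
        sampled = rng.choice(point_sets)  # sample one point set uniformly
        far = furthest_point(center, sampled)
        dist = math.dist(center, far)
        if dist > 0.0:
            eta = step_scale / math.sqrt(t)  # diminishing step size
            # (center - far) / dist is a subgradient of the sampled max-distance term.
            center = [c - eta * (c - f) / dist for c, f in zip(center, far)]
        # Running mean of the iterates visited so far.
        average = [a + (c - a) / t for a, c in zip(average, center)]
    return average
```

For example, `subgradient_sampling([[(-1.0, 0.0), (1.0, 0.0)]])` treats the two points as one set and returns a center close to the origin, the center of their smallest enclosing ball. Each step only needs distances between the current center and input points, which is the kind of quantity a kernel function can supply implicitly in a feature space.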

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Theory of computation → Computational geometry
Keywords
  • geometric median
  • convex optimization
  • smallest enclosing ball
  • probabilistic data
  • support vector data description
  • kernel methods
