SVM via Saddle Point Optimization: New Bounds and Distributed Algorithms

Authors Lingxiao Huang, Yifei Jin, Jian Li



PDF
Thumbnail PDF

File

LIPIcs.SWAT.2018.25.pdf
  • Filesize: 0.48 MB
  • 13 pages

Document Identifiers

Author Details

Lingxiao Huang
  • École polytechnique fédérale de Lausanne, CH-1015, Lausanne, Switzerland
Yifei Jin
  • Tsinghua University, Beijing 100084, China
Jian Li
  • Tsinghua University , Beijing 100084, China

Cite As Get BibTex

Lingxiao Huang, Yifei Jin, and Jian Li. SVM via Saddle Point Optimization: New Bounds and Distributed Algorithms. In 16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 101, pp. 25:1-25:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.SWAT.2018.25

Abstract

We study two important SVM variants: hard-margin SVM (for linearly separable cases) and nu-SVM (for linearly non-separable cases). We propose new algorithms from the perspective of saddle point optimization. Our algorithms achieve (1-epsilon)-approximations with running time O~(nd+n sqrt{d / epsilon}) for both variants, where n is the number of points and d is the dimensionality. To the best of our knowledge, the current best algorithm for nu-SVM is based on quadratic programming approach which requires Omega(n^2 d) time in worst case [Joachims, 1998; Platt, 1999]. In the paper, we provide the first nearly linear time algorithm for nu-SVM. The current best algorithm for hard margin SVM achieved by Gilbert algorithm [Gärtner and Jaggi, 2009] requires O(nd / epsilon) time. Our algorithm improves the running time by a factor of sqrt{d}/sqrt{epsilon}. Moreover, our algorithms can be implemented in the distributed settings naturally. We prove that our algorithms require O~(k(d +sqrt{d/epsilon})) communication cost, where k is the number of clients, which almost matches the theoretical lower bound. Numerical experiments support our theory and show that our algorithms converge faster on high dimensional, large and dense data sets, as compared to previous methods.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Continuous optimization
  • Computing methodologies → Support vector machines
Keywords
  • nu-SVM
  • hard-margin SVM
  • saddle point optimization
  • distributed algorithm

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Nir Ailon and Bernard Chazelle. Faster dimension reduction. Communications of the ACM, 53(2):97-104, 2010. Google Scholar
  2. Zeyuan Allen-Zhu. Katyusha: The first direct acceleration of stochastic gradient methods. STOC 16, 2016. Google Scholar
  3. Zeyuan Allen-Zhu, Zhenyu Liao, and Yang Yuan. Optimization algorithms for faster computational geometry. In LIPIcs, volume 55, 2016. Google Scholar
  4. Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121-164, 2012. Google Scholar
  5. Kristin P Bennett and Erin J Bredensteiner. Duality and geometry in svm classifiers. In ICML, pages 57-64, 2000. Google Scholar
  6. Marshall W. Bern and David Eppstein. Optimization over zonotopes and training support vector machines. In WADS, pages 111-121, 2001. Google Scholar
  7. Bernhard E Boser, Isabelle M Guyon, and Vladimir N Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory, pages 144-152. ACM, 1992. Google Scholar
  8. Kenneth L Clarkson, Elad Hazan, and David P Woodruff. Sublinear optimization for machine learning. Journal of the ACM (JACM), 59(5):23, 2012. Google Scholar
  9. Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine learning, 20(3):273-297, 1995. Google Scholar
  10. DJ Crisp and CJC Burges. A geometry interpretation of μ-svm classifiers. NIPS, pages 244-251, 2000. Google Scholar
  11. John Duchi and Yoram Singer. Efficient online and batch learning using forward backward splitting. JMLR, 10(Dec):2899-2934, 2009. Google Scholar
  12. Pedro A Forero, Alfonso Cano, and Georgios B Giannakis. Consensus-based distributed support vector machines. JMLR, 11(May):1663-1707, 2010. Google Scholar
  13. Vojtěch Franc and Soeren Sonnenburg. Optimized cutting plane algorithm for support vector machines. In ICML 08, pages 320-327. ACM, 2008. Google Scholar
  14. Alan Frieze, Ravi Kannan, and Santosh Vempala. Fast monte-carlo algorithms for finding low-rank approximations. Journal of the ACM (JACM), 51(6):1025-1041, 2004. Google Scholar
  15. Bernd Gärtner and Martin Jaggi. Coresets for polytope distance. In SOCG 09, pages 33-42. ACM, 2009. Google Scholar
  16. Elmer G Gilbert. An iterative procedure for computing the minimum of a quadratic form on a convex set. SIAM Journal on Control, 4(1):61-80, 1966. Google Scholar
  17. Hans Peter Graf, Eric Cosatto, Leon Bottou, Igor Durdanovic, and Vladimir Vapnik. Parallel support vector machines: The cascade svm. In NIPS, volume 17, 2004. Google Scholar
  18. Sariel Har-Peled, Dan Roth, and Dav Zimak. Maximum margin coresets for active and noise tolerant learning. In IJCAI, pages 836-841, 2007. Google Scholar
  19. Elad Hazan, Tomer Koren, and Nati Srebro. Beating sgd: Learning svms in sublinear time. In Advances in Neural Information Processing Systems, pages 1233-1241, 2011. Google Scholar
  20. Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S Sathiya Keerthi, and Sellamanickam Sundararajan. A dual coordinate descent method for large-scale linear svm. In ICML 08, pages 408-415. ACM, 2008. Google Scholar
  21. Thorsten Joachims. Making large-scale svm learning practical. Technical report, Technical Report, SFB 475: Komplexitätsreduktion in Multivariaten Datenstrukturen, Universität Dortmund, 1998. Google Scholar
  22. Thorsten Joachims. Training linear svms in linear time. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 217-226. ACM, 2006. Google Scholar
  23. Anatoli Juditsky, Fatma Kılınç Karzan, and Arkadi Nemirovski. Randomized first order algorithms with applications to 𝓁₁-minimization. Mathematical Programming, 142(1-2):269-310, 2013. Google Scholar
  24. Jyrki Kivinen, Alexander J Smola, and Robert C Williamson. Online learning with kernels. IEEE transactions on signal processing, 52(8):2165-2176, 2004. Google Scholar
  25. Eyal Kushilevitz. Communication complexity. Advances in Computers, 44:331-360, 1997. Google Scholar
  26. Yangwei Liu, Hu Ding, Ziyun Huang, and Jinhui Xu. Distributed and robust support vector machine. In LIPIcs, volume 64, 2016. Google Scholar
  27. Jorge López and José R Dorronsoro. Linear convergence rate for the mdm algorithm for the nearest point problem. Pattern Recognition, 48(4):1510-1522, 2015. Google Scholar
  28. Yumao Lu, Vwani Roychowdhury, and Lieven Vandenberghe. Distributed parallel support vector machines in strongly connected networks. IEEE Transactions on Neural Networks, 19(7):1167-1178, 2008. Google Scholar
  29. BF Mitchell, Vladimir Fedorovich Dem'yanov, and VN Malozemov. Finding the point of a polyhedron closest to the origin. SIAM Journal on Control, 12(1):19-26, 1974. Google Scholar
  30. A Navia-Vazquez, D Gutierrez-Gonzalez, Emilio Parrado-Hernández, and JJ Navarro-Abellan. Distributed support vector machines. IEEE Trans. Neural Networks, 17(4):1091-1097, 2006. Google Scholar
  31. Yu Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 16(1):235-249, 2005. Google Scholar
  32. Alon Orlitsky and Abbas El Gamal. Average and randomized communication complexity. IEEE Transactions on Information Theory, 36(1):3-16, 1990. Google Scholar
  33. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in python. JMLR, 12(Oct):2825-2830, 2011. Google Scholar
  34. John C Platt. 12 fast training of support vector machines using sequential minimal optimization. Advances in kernel methods, pages 185-208, 1999. Google Scholar
  35. Bernhard Schölkopf, Alex J Smola, Robert C Williamson, and Peter L Bartlett. New support vector algorithms. Neural computation, 12(5):1207-1245, 2000. Google Scholar
  36. Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter. Pegasos: primal estimated sub-gradient solver for SVM. Math. Program., 127(1):3-30, 2011. Google Scholar
  37. Alexander J Smola, SVN Vishwanathan, Quoc V Le, et al. Bundle methods for machine learning. In NIPS, volume 20, pages 1377-1384, 2007. Google Scholar
  38. Ivor W Tsang, Andras Kocsor, and James T Kwok. Simpler core vector machines with enclosing balls. In ICML 07, pages 911-918. ACM, 2007. Google Scholar
  39. Ivor W Tsang, James T Kwok, and Pak-Ming Cheung. Core vector machines: Fast svm training on very large data sets. JMLR, 6(Apr):363-392, 2005. Google Scholar
  40. Andrew Chi-Chih Yao. Some complexity questions related to distributive computing (preliminary report). In STOC 79, pages 209-213. ACM, 1979. Google Scholar
  41. Caoxie Zhang, Honglak Lee, and Kang G Shin. Efficient distributed linear classification algorithms via the alternating direction method of multipliers. In Artificial Intelligence and Statistics, pages 1398-1406, 2012. Google Scholar
  42. Yuchen Zhang and Xiao Lin. Stochastic primal-dual coordinate method for regularized empirical risk minimization. In ICML, pages 353-361, 2015. Google Scholar
  43. Ji Zhu, Saharon Rosset, Robert Tibshirani, and Trevor J Hastie. 1-norm support vector machines. In NIPS, pages 49-56, 2004. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail