Sparse Regression via Range Counting

Authors Jean Cardinal , Aurélien Ooms



PDF
Thumbnail PDF

File

LIPIcs.SWAT.2020.20.pdf
  • Filesize: 0.71 MB
  • 17 pages

Document Identifiers

Author Details

Jean Cardinal
  • Université libre de Bruxelles (ULB), Brussels, Belgium
Aurélien Ooms
  • BARC, University of Copenhagen, Denmark

Acknowledgements

The authors wish to thank the reviewers of earlier versions of this manuscript, who provided useful comments, as well as John Iacono and Stefan Langerman for insightful discussions.

Cite AsGet BibTex

Jean Cardinal and Aurélien Ooms. Sparse Regression via Range Counting. In 17th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 162, pp. 20:1-20:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.SWAT.2020.20

Abstract

The sparse regression problem, also known as best subset selection problem, can be cast as follows: Given a set S of n points in ℝ^d, a point y∈ ℝ^d, and an integer 2 ≤ k ≤ d, find an affine combination of at most k points of S that is nearest to y. We describe a O(n^{k-1} log^{d-k+2} n)-time randomized (1+ε)-approximation algorithm for this problem with d and ε constant. This is the first algorithm for this problem running in time o(n^k). Its running time is similar to the query time of a data structure recently proposed by Har-Peled, Indyk, and Mahabadi (ICALP'18), while not requiring any preprocessing. Up to polylogarithmic factors, it matches a conditional lower bound relying on a conjecture about affine degeneracy testing. In the special case where k = d = O(1), we provide a simple O_δ(n^{d-1+δ})-time deterministic exact algorithm, for any δ > 0. Finally, we show how to adapt the approximation algorithm for the sparse linear regression and sparse convex regression problems with the same running time, up to polylogarithmic factors.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Computational geometry
  • Information systems → Nearest-neighbor search
Keywords
  • Sparse Linear Regression
  • Orthogonal Range Searching
  • Affine Degeneracy Testing
  • Nearest Neighbors
  • Hyperplane Arrangements

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Pankaj K. Agarwal, Natan Rubin, and Micha Sharir. Approximate nearest neighbor search amid higher-dimensional flats. In 25th Annual European Symposium on Algorithms, ESA 2017, September 4-6, 2017, Vienna, Austria, pages 4:1-4:13, 2017. URL: https://doi.org/10.4230/LIPIcs.ESA.2017.4.
  2. Michal Aharon, Michael Elad, and Alfred Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311-4322, November 2006. URL: https://doi.org/10.1109/TSP.2006.881199.
  3. Nir Ailon and Bernard Chazelle. Lower bounds for linear degeneracy testing. J. ACM, 52(2):157-171, 2005. URL: https://doi.org/10.1145/1059513.1059515.
  4. Rahul Arya, Sunil Arya, Guilherme D. da Fonseca, and David M. Mount. Optimal bound on the combinatorial complexity of approximating polytopes. In SODA, pages 786-805. SIAM, 2020. Google Scholar
  5. Sunil Arya, Guilherme D. da Fonseca, and David M. Mount. On the combinatorial complexity of approximating polytopes, April 2016. URL: http://arxiv.org/abs/1604.01175v4.
  6. Sunil Arya, Guilherme D. da Fonseca, and David M. Mount. Approximate convex intersection detection with applications to width and minkowski sums, July 2018. URL: http://arxiv.org/abs/1807.00484v1.
  7. Sunil Arya, Guilherme Dias da Fonseca, and David M. Mount. Near-optimal epsilon-kernel construction and related problems. In Symposium on Computational Geometry, volume 77 of LIPIcs, pages 10:1-10:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. Google Scholar
  8. Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and Angela Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM, 45(6):891-923, 1998. URL: https://doi.org/10.1145/293347.293348.
  9. Luis Barba, Jean Cardinal, John Iacono, Stefan Langerman, Aurélien Ooms, and Noam Solomon. Subquadratic algorithms for algebraic 3SUM. Discrete & Computational Geometry, 61(4):698-734, 2019. URL: https://doi.org/10.1007/s00454-018-0040-y.
  10. Ronen Basri, Tal Hassner, and Lihi Zelnik-Manor. Approximate nearest subspace search. IEEE Trans. Pattern Anal. Mach. Intell., 33(2):266-278, 2011. URL: https://doi.org/10.1109/TPAMI.2010.110.
  11. Dimitris Bertsimas, Angela King, and Rahul Mazumder. Best subset selection via a modern optimization lens. Ann. Statist., 44(2):813-852, April 2016. URL: https://doi.org/10.1214/15-AOS1388.
  12. Dimitris Bertsimas, Jean Pauphilet, and Bart Van Parys. Sparse Regression: Scalable algorithms and empirical performance. arXiv e-prints, February 2019. URL: http://arxiv.org/abs/1902.06547.
  13. Emmanuel J. Candès, Justin K. Romberg, and Terence Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Information Theory, 52(2):489-509, 2006. URL: https://doi.org/10.1109/TIT.2005.862083.
  14. Bernard Chazelle. A functional approach to data structures and its use in multidimensional searching. SIAM J. Comput., 17(3):427-462, 1988. URL: https://doi.org/10.1137/0217026.
  15. Bernard Chazelle. Cutting hyperplanes for divide-and-conquer. Discrete & Computational Geometry, 9:145-158, 1993. URL: https://doi.org/10.1007/BF02189314.
  16. Bernard Chazelle and Joel Friedman. A deterministic view of random sampling and its use in geometry. Combinatorica, 10(3):229-249, 1990. URL: https://doi.org/10.1007/BF02122778.
  17. G. Davis, S. Mallat, and M. Avellaneda. Adaptive greedy approximations. Constructive Approximation, 13(1):57-98, March 1997. URL: https://doi.org/10.1007/BF02678430.
  18. David L. Donoho. Compressed sensing. IEEE Trans. Information Theory, 52(4):1289-1306, 2006. URL: https://doi.org/10.1109/TIT.2006.871582.
  19. Richard M. Dudley. Metric entropy of some classes of sets with differentiable boundaries. Journal of Approximation Theory, 10(3):227-236, 1974. URL: https://doi.org/10.1016/0021-9045(74)90120-8.
  20. Herbert Edelsbrunner, Raimund Seidel, and Micha Sharir. On the zone theorem for hyperplane arrangements. SIAM J. Comput., 22(2):418-429, 1993. URL: https://doi.org/10.1137/0222031.
  21. Jeff Erickson and Raimund Seidel. Better lower bounds on detecting affine and spherical degeneracies. Discrete & Computational Geometry, 13:41-57, 1995. URL: https://doi.org/10.1007/BF02574027.
  22. Dean P. Foster, Satyen Kale, and Howard J. Karloff. Online sparse linear regression. In Proceedings of the 29th Conference on Learning Theory, COLT 2016, New York, USA, June 23-26, 2016, pages 960-970, 2016. URL: http://proceedings.mlr.press/v49/foster16.html.
  23. Dean P. Foster, Howard J. Karloff, and Justin Thaler. Variable selection is hard. In Proceedings of The 28th Conference on Learning Theory, COLT 2015, Paris, France, July 3-6, 2015, pages 696-709, 2015. URL: http://proceedings.mlr.press/v40/Foster15.html.
  24. Anka Gajentaan and Mark H. Overmars. On a class of o(n²) problems in computational geometry. Comput. Geom., 5:165-185, 1995. URL: https://doi.org/10.1016/0925-7721(95)00022-2.
  25. Sariel Har-Peled, Piotr Indyk, and Sepideh Mahabadi. Approximate sparse linear regression. In 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, pages 77:1-77:14, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.77.
  26. Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman. The elements of statistical learning: data mining, inference, and prediction, 2nd Edition. Springer series in statistics. Springer, 2009. URL: http://www.worldcat.org/oclc/300478243.
  27. Avner Magen. Dimensionality reductions in 𝓁₂ that preserve volumes and distance to affine spaces. Discrete & Computational Geometry, 38(1):139-153, 2007. URL: https://doi.org/10.1007/s00454-007-1329-4.
  28. Sepideh Mahabadi. Approximate nearest line search in high dimensions. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 337-354, 2015. URL: https://doi.org/10.1137/1.9781611973730.25.
  29. Stéphane Mallat and Zhifeng Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Processing, 41(12):3397-3415, 1993. URL: https://doi.org/10.1109/78.258082.
  30. Alan Miller. Subset Selection in Regression. Chapman and Hall/CRC, 2002. Google Scholar
  31. Balas K. Natarajan. Sparse approximate solutions to linear systems. SIAM J. Comput., 24(2):227-234, 1995. URL: https://doi.org/10.1137/S0097539792240406.
  32. Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267-288, 1996. URL: http://www.jstor.org/stable/2346178.
  33. Csaba D. Tóth, Joseph O'Rourke, and Jacob E. Goodman, editors. Handbook of Discrete and Computational Geometry. Chapman and Hall/CRC, 3rd edition, 2017. URL: https://doi.org/10.1201/9781315119601.
  34. Dan E. Willard. New data structures for orthogonal range queries. SIAM J. Comput., 14(1):232-253, 1985. Google Scholar