The VC Dimension of Metric Balls Under Fréchet and Hausdorff Distances
The Vapnik-Chervonenkis dimension provides a notion of complexity for systems of sets. If the VC dimension is known to be small, this can drastically simplify fundamental computational tasks such as classification, range counting, and density estimation through the use of sampling bounds. We analyze set systems where the ground set X is a set of polygonal curves in R^d and the sets R are metric balls defined by curve similarity metrics, such as the Fréchet distance and the Hausdorff distance, as well as their discrete counterparts. We derive upper and lower bounds on the VC dimension that imply useful sampling bounds in the setting where the number of curves is large but the complexity of the individual curves is small. Our upper bounds are either near-quadratic or near-linear in the complexity of the curves that define the ranges, and they are logarithmic in the complexity of the curves that define the ground set.
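To make the objects in this set system concrete, the following is a minimal sketch (not the authors' code) of the two discrete distances and of the shattering condition behind the VC dimension. It assumes a polygonal curve is given by its vertex sequence; the function names `discrete_frechet`, `discrete_hausdorff`, `in_ball` and `shattered_by` are hypothetical. The discrete Fréchet distance uses the standard prefix dynamic program, and the Hausdorff distance is taken between the finite vertex sets. The continuous Fréchet and Hausdorff distances that the paper also treats are not computed here. `shattered_by` checks shattering only against a finite candidate family of balls, so a `True` answer certifies that the ground set is shattered, while `False` may just mean the candidate family is too small.

```python
from math import dist

Point = tuple[float, ...]
Curve = tuple[Point, ...]  # a polygonal curve given by its vertex sequence


def discrete_frechet(p: Curve, q: Curve) -> float:
    """Discrete Frechet distance between the vertex sequences p and q.

    Dynamic program: ca[i][j] is the distance between the prefixes
    p[:i+1] and q[:j+1].
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def discrete_hausdorff(p: Curve, q: Curve) -> float:
    """Hausdorff distance between the finite vertex sets of p and q."""
    def directed(a: Curve, b: Curve) -> float:
        return max(min(dist(x, y) for y in b) for x in a)
    return max(directed(p, q), directed(q, p))


def in_ball(center: Curve, radius: float, curve: Curve, metric) -> bool:
    """Membership test for the closed metric ball of the given radius around center."""
    return metric(center, curve) <= radius


def shattered_by(ground: list[Curve], centers: list[Curve],
                 radii: list[float], metric) -> bool:
    """True if the finite candidate family of balls cuts out every subset of ground."""
    realized = set()
    for c in centers:
        for r in radii:
            realized.add(frozenset(i for i, x in enumerate(ground)
                                   if in_ball(c, r, x, metric)))
    return len(realized) == 2 ** len(ground)


# Two curves in the plane; both distances evaluate to sqrt(2) here.
p = ((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
q = ((0.0, 1.0), (2.0, 1.0))
print(discrete_frechet(p, q), discrete_hausdorff(p, q))

# Single-vertex curves in R^1: the balls are intervals, which shatter two points.
two = [((0.0,),), ((10.0,),)]
print(shattered_by(two, [((0.0,),), ((10.0,),), ((5.0,),)], [0.5, 6.0], discrete_frechet))  # True
```

The brute-force check is only meant to illustrate the definition: on three collinear single-vertex curves in R^1, no interval picks out the two outer curves without the middle one, so no candidate family can shatter them.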
VC dimension
Fréchet distance
Hausdorff distance
Theory of computation → Randomness, geometry and discrete structures
Theory of computation → Computational geometry
28:1-28:16
Regular Paper
A full version of the paper is available at https://arxiv.org/abs/1903.03211.
We thank Peyman Afshani for useful discussions on the topic of this paper. We also thank the organizers of the 2016 NII Shonan Meeting "Theory and Applications of Geometric Optimization" where this research was initiated.
Anne Driemel
University of Bonn, Germany
Anne Driemel thanks the Hausdorff Center for Mathematics for their generous support and the Netherlands Organization for Scientific Research (NWO) for support under Veni Grant 10019853.
Jeff M. Phillips
University of Utah, Salt Lake City, USA
Jeff Phillips acknowledges support from NSF grants CCF-1350888, ACI-1443046, CNS-1514520, CNS-1564287, and IIS-1816149. Part of the work was completed while visiting the Simons Institute for the Theory of Computing.
Ioannis Psarros
National & Kapodistrian University of Athens, Greece
This research is co-financed by Greece and the European Union (European Social Fund, ESF) through the Operational Programme «Human Resources Development, Education and Lifelong Learning» in the context of the project "Strengthening Human Resources Research Potential via Doctorate Research" (MIS-5000432), implemented by the State Scholarships Foundation (IKY).
10.4230/LIPIcs.SoCG.2019.28
© Anne Driemel, Jeff M. Phillips, and Ioannis Psarros; licensed under the Creative Commons Attribution 3.0 Unported license (https://creativecommons.org/licenses/by/3.0/legalcode).