epsilon-Kernel Coresets for Stochastic Points
With the dramatic growth in the number of application domains that generate probabilistic, noisy and uncertain data, there has been increasing interest in designing algorithms for geometric or combinatorial optimization problems over such data. In this paper, we initiate the study of constructing epsilon-kernel coresets for uncertain points. We consider uncertainty in the existential model, where each point's location is fixed but the point only occurs with a certain probability, and in the locational model, where each point has a probability distribution describing its location. An epsilon-kernel coreset approximates the width of a point set in any direction. We consider approximating the expected width (an epsilon-EXP-KERNEL), as well as the probability distribution on the width (an (epsilon, tau)-QUANT-KERNEL), for any direction. We show that there exists a set of O(epsilon^{-(d-1)/2}) deterministic points which approximates the expected width under the existential and locational models, and we provide efficient algorithms for constructing such coresets. We show, however, that it is not always possible to find a subset of the original uncertain points which provides such an approximation. Nevertheless, if the existential probability of each point is lower bounded by a constant, an epsilon-EXP-KERNEL that is a subset of the original points is still possible. We also provide efficient algorithms for constructing an (epsilon, tau)-QUANT-KERNEL coreset in nearly linear time. Our techniques utilize or connect to several important notions in probability and geometry, such as Kolmogorov distances, VC uniform convergence and Tukey depth, and may be useful in other geometric optimization problems in stochastic settings. Finally, by combining our results with known techniques, we show a few applications to approximating the extent of uncertain functions, maintaining extent measures for stochastic moving points, and some shape fitting problems under uncertainty.
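To make the quantity being approximated concrete, the following is a minimal sketch (not the paper's construction) of the directional width of a deterministic point set and of the exact expected directional width in the existential model. The function names, the independence of point occurrences, and the convention that an empty realization has width 0 are assumptions of this sketch.

```python
def directional_width(points, u):
    """Width of a deterministic point set along direction u: max<u,p> - min<u,p>."""
    if not points:
        return 0.0
    proj = [sum(a * b for a, b in zip(p, u)) for p in points]
    return max(proj) - min(proj)


def expected_directional_width(points, probs, u):
    """Exact expected width along u in the existential model.

    Assumptions of this sketch: point i occurs independently with
    probability probs[i], and an empty realization has width 0.
    """
    # Pair each projection <u, p> with the point's existence probability,
    # sorted by increasing projection value.
    proj = sorted(
        (sum(a * b for a, b in zip(p, u)), q) for p, q in zip(points, probs)
    )

    def expected_extreme(ordered):
        # Sum over i of value_i * Pr[i occurs and every earlier point is absent].
        total, none_before = 0.0, 1.0
        for value, q in ordered:
            total += value * q * none_before
            none_before *= 1.0 - q
        return total

    e_max = expected_extreme(reversed(proj))  # scan from the largest projection
    e_min = expected_extreme(proj)            # scan from the smallest projection
    return e_max - e_min
```

For example, with two points at 0 and 1 on a line, each occurring with probability 0.5, the width is 1 only when both occur, so expected_directional_width([(0.0,), (1.0,)], [0.5, 0.5], (1.0,)) evaluates to 0.25.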
epsilon-kernel
coreset
stochastic point
shape fitting
50:1-50:18
Regular Paper
Lingxiao Huang
Jian Li
Jeff M. Phillips
Haitao Wang
10.4230/LIPIcs.ESA.2016.50
A. Abdullah, S. Daruki, and J.M. Phillips. Range counting coresets for uncertain data. In Proceedings of the 29th ACM Symposium on Computational Geometry, pages 223-232, 2013.
Marcel R Ackermann, Johannes Blömer, and Christian Sohler. Clustering for metric and nonmetric distance measures. ACM Transactions on Algorithms (TALG), 6(4):59, 2010.
P. Afshani, P.K. Agarwal, L. Arge, K.G. Larsen, and J.M. Phillips. (Approximate) uncertain skylines. In Proceedings of the 14th International Conference on Database Theory, pages 186-196, 2011.
P.K. Agarwal, S.-W. Cheng, and K. Yi. Range searching on uncertain data. ACM Transactions on Algorithms (TALG), 8(4):43, 2012.
P.K. Agarwal, A. Efrat, S. Sankararaman, and W. Zhang. Nearest-neighbor searching under uncertainty. In Proceedings of the 31st Symposium on Principles of Database Systems, pages 225-236, 2012.
P.K. Agarwal, S. Har-Peled, S. Suri, H. Yıldız, and W. Zhang. Convex hulls under uncertainty. In Proceedings of the 22nd Annual European Symposium on Algorithms, pages 37-48, 2014.
P.K. Agarwal, S. Har-Peled, and K.R. Varadarajan. Approximating extent measures of points. Journal of the ACM, 51(4):606-635, 2004.
P.K. Agarwal, S. Har-Peled, and K.R. Varadarajan. Geometric approximation via coresets. Combinatorial and Computational Geometry, 52:1-30, 2005.
P.K. Agarwal and M. Sharir. Arrangements and their applications. Handbook of Computational Geometry, J. Sack and J. Urrutia (eds.), pages 49-119. Elsevier, Amsterdam, The Netherlands, 2000.
Martin Anthony and Peter L Bartlett. Neural network learning: Theoretical foundations. Cambridge University Press, 2009.
D. Bandyopadhyay and J. Snoeyink. Almost-Delaunay simplices: Nearest neighbor relations for imprecise points. In Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms, pages 410-419, 2004.
Saugata Basu, Richard Pollack, and M Roy. Algorithms in real algebraic geometry. AMC, 10:12, 2011.
T.M. Chan. Faster core-set constructions and data-stream algorithms in fixed dimensions. Computational Geometry: Theory and Applications, 35:20-35, 2006.
K. Chen. On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923-947, 2009.
R. Cheng, J. Chen, and X. Xie. Cleaning uncertain data with quality guarantees. Proceedings of the VLDB Endowment, 1(1):722-735, 2008.
G. Cormode and A. McGregor. Approximation algorithms for clustering uncertain data. In Proceedings of the 27th Symposium on Principles of Database Systems, pages 191-200, 2008.
A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. In Proceedings of the 17th ACM-SIAM Symposium on Discrete Algorithms, pages 1117-1126, 2006.
X. Dong, A.Y. Halevy, and C. Yu. Data integration with uncertainty. In Proceedings of the 33rd International Conference on Very Large Data Bases, pages 687-698, 2007.
A. Driemel, H. Haverkort, M. Löffler, and R.I. Silveira. Flow computations on imprecise terrains. Journal of Computational Geometry, 4:38-78, 2013.
W. Evans and J. Sember. The possible hull of imprecise points. In Proceedings of the 23rd Canadian Conference on Computational Geometry, 2011.
D. Feldman, A. Fiat, H. Kaplan, and K. Nissim. Private coresets. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pages 361-370, 2009.
D. Feldman and M. Langberg. A unified framework for approximating and clustering data. In Proceedings of the 43rd ACM Symposium on Theory of Computing, pages 569-578, 2011.
Dan Feldman and Leonard J Schulman. Data reduction for weighted and outlier-resistant clustering. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 1343-1354. SIAM, 2012.
Martin Fink, John Hershberger, Nirman Kumar, and Subhash Suri. Hyperplane separability and convexity of probabilistic point sets. In Proceedings of the Symposium on Computational Geometry, 2016.
P.K. Ghosh and K.V. Kumar. Support function representation of convex bodies, its application in geometric computing, and some related representations. Computer Vision and Image Understanding, 72(3):379-403, 1998.
S. Guha and K. Munagala. Exceeding expectations and clustering uncertain data. In Proceedings of the 28th Symposium on Principles of Database Systems, pages 269-278, 2009.
L.J. Guibas, D. Salesin, and J. Stolfi. Constructing strongly convex approximate hulls with inaccurate primitives. Algorithmica, 9:534-560, 1993.
S. Har-Peled. On the expected complexity of random convex hulls. arXiv:1111.5340, 2011.
S. Har-Peled and S. Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pages 291-300, 2004.
Sariel Har-Peled and Yusu Wang. Shape fitting with outliers. SIAM Journal on Computing, 33(2):269-285, 2004.
M. Held and J.S.B. Mitchell. Triangulating input-constrained planar point sets. Information Processing Letters, 109(1):54-56, 2008.
Lingxiao Huang and Jian Li. Approximating the expected values for combinatorial optimization problems over stochastic points. In The 42nd International Colloquium on Automata, Languages, and Programming, pages 910-921. Springer, 2015.
A.G. Jørgensen, M. Löffler, and J.M. Phillips. Geometric computation on indecisive points. In Proceedings of the 12th Algorithms and Data Structure Symposium, pages 536-547, 2011.
P. Kamousi, T.M. Chan, and S. Suri. The stochastic closest pair problem and nearest neighbor search. In Proceedings of the 12th Algorithms and Data Structure Symposium, pages 548-559, 2011.
P. Kamousi, T.M. Chan, and S. Suri. Stochastic minimum spanning trees in Euclidean spaces. In Proceedings of the 27th Symposium on Computational Geometry, pages 65-74, 2011.
H. Kruger. Basic measures for imprecise point sets in ℝ^d. Master’s thesis, Utrecht University, 2008.
M. Langberg and L.J. Schulman. Universal ε-approximators for integrals. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, 2010.
J. Li and H. Wang. Range queries on uncertain data. Theoretical Computer Science, 609(1):32-48, 2016.
M. Löffler and J. Phillips. Shape fitting on point sets with probability distributions. In Proceedings of the 17th European Symposium on Algorithms, pages 313-324, 2009.
M. Löffler and J. Snoeyink. Delaunay triangulations of imprecise points in linear time after preprocessing. In Proceedings of the 24th Symposium on Computational Geometry, pages 298-304, 2008. URL: http://dx.doi.org/10.1145/1377676.1377727.
M. Löffler and M. van Kreveld. Approximating largest convex hulls for imprecise points. Journal of Discrete Algorithms, 6:583-594, 2008.
J. Matoušek. Computing the center of planar point sets. Discrete and Computational Geometry, 6:221, 1991.
A. Munteanu, C. Sohler, and D. Feldman. Smallest enclosing ball for probabilistic data. In Proceedings of the 30th Annual Symposium on Computational Geometry, 2014.
T. Nagai and N. Tokura. Tight error bounds of geometric problems on convex objects with imprecise coordinates. In Jap. Conf. on Discrete and Comput. Geom., LNCS 2098, pages 252-263, 2000.
Y. Ostrovsky-Berman and L. Joskowicz. Uncertainty envelopes. In Abstracts of the 21st European Workshop on Comput. Geom., pages 175-178, 2005.
Jeff M. Phillips. Coresets and sketches. In Handbook of Discrete and Computational Geometry. CRC Press, 3rd edition, 2016. Chapter 49.
D. Salesin, J. Stolfi, and L.J. Guibas. Epsilon geometry: building robust algorithms from imprecise computations. In Proceedings of the 5th Symposium on Computational Geometry, pages 208-217, 1989.
R. Schneider. Convex bodies: the Brunn-Minkowski theory, volume 44. Cambridge University Press, 1993.
S. Suri, K. Verbeek, and H. Yıldız. On the most likely convex hull of uncertain points. In Proceedings of the 21st European Symposium on Algorithms, pages 791-802, 2013.
M. van Kreveld and M. Löffler. Largest bounding box, smallest diameter, and related problems on imprecise points. Computational Geometry: Theory and Applications, 43:419-433, 2010. URL: http://dx.doi.org/10.1016/j.comgeo.2009.03.007.
V.N. Vapnik and A.Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2):264-280, 1971.
Jie Xue, Yuan Li, and Ravi Janardan. On the separability of stochastic geometric objects, with applications. In Proceedings of the Symposium on Computational Geometry, 2016.
H. Yu, P.K. Agarwal, R. Poreddy, and K. Varadarajan. Practical methods for shape fitting and kinetic data structures using coresets. Algorithmica, 52:378-402, 2008.
K. Zheng, G. Trajcevski, X. Zhou, and P. Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. In Proceedings of the 14th International Conference on Extending Database Technology, pages 283-294, 2011.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode