epsilon-Kernel Coresets for Stochastic Points

Authors Lingxiao Huang, Jian Li, Jeff M. Phillips, Haitao Wang



File

LIPIcs.ESA.2016.50.pdf
  • Filesize: 0.64 MB
  • 18 pages

Cite As

Lingxiao Huang, Jian Li, Jeff M. Phillips, and Haitao Wang. epsilon-Kernel Coresets for Stochastic Points. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 50:1-50:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ESA.2016.50

Abstract

With the dramatic growth in the number of application domains that generate probabilistic, noisy and uncertain data, there has been increasing interest in designing algorithms for geometric or combinatorial optimization problems over such data. In this paper, we initiate the study of constructing epsilon-kernel coresets for uncertain points. We consider uncertainty in the existential model, where each point's location is fixed but the point only occurs with a certain probability, and in the locational model, where each point has a probability distribution describing its location. An epsilon-kernel coreset approximates the width of a point set in any direction. We consider approximating the expected width (an epsilon-EXP-KERNEL), as well as the probability distribution on the width (an (epsilon, tau)-QUANT-KERNEL), for any direction. We show that there exists a set of O(epsilon^{-(d-1)/2}) deterministic points that approximates the expected width under the existential and locational models, and we provide efficient algorithms for constructing such coresets. We show, however, that it is not always possible to find a subset of the original uncertain points that provides such an approximation. Nevertheless, if the existential probability of each point is lower bounded by a constant, an epsilon-EXP-KERNEL is still possible. We also provide efficient algorithms for constructing an (epsilon, tau)-QUANT-KERNEL coreset in nearly linear time. Our techniques utilize or connect to several important notions in probability and geometry, such as Kolmogorov distances, VC uniform convergence and Tukey depth, and may be useful in other geometric optimization problems in stochastic settings. Finally, combining with known techniques, we show a few applications to approximating the extent of uncertain functions, maintaining extent measures for stochastic moving points, and some shape fitting problems under uncertainty.
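
To make the quantity in the abstract concrete, the following is a minimal sketch (not the paper's construction) of the directional width of a deterministic point set and of the exact expected directional width in the existential model. It assumes points exist independently and that an empty realization has width 0; both are assumptions of this sketch, as are the function names. The check `looks_like_exp_kernel` assumes a two-sided (1 ± epsilon) guarantee and only tests a finite list of directions, so it can refute a candidate set but cannot certify one.

```python
def project(p, u):
    """Inner product <p, u> of a point p with a direction u."""
    return sum(pi * ui for pi, ui in zip(p, u))


def directional_width(points, u):
    """Width of a deterministic point set in direction u: max<p,u> - min<p,u>."""
    proj = [project(p, u) for p in points]
    return max(proj) - min(proj) if proj else 0.0


def expected_width_existential(points, probs, u):
    """Exact expected width in direction u when point i exists independently
    with probability probs[i] (assumption: an empty realization has width 0)."""
    proj = sorted(((project(p, u), q) for p, q in zip(points, probs)),
                  key=lambda t: t[0])

    def expected_extreme(items):
        # items are ordered from most to least extreme; an item is the extreme
        # point exactly when it is present and every earlier item is absent.
        total, none_before = 0.0, 1.0
        for value, q in items:
            total += value * q * none_before
            none_before *= 1.0 - q
        return total

    e_max = expected_extreme(reversed(proj))  # largest projection first
    e_min = expected_extreme(proj)            # smallest projection first
    return e_max - e_min


def looks_like_exp_kernel(candidate, points, probs, eps, directions):
    """Test the assumed (1 +/- eps) expected-width guarantee on sample directions."""
    for u in directions:
        target = expected_width_existential(points, probs, u)
        width = directional_width(candidate, u)
        if not (1 - eps) * target <= width <= (1 + eps) * target:
            return False
    return True
```

For example, with points (0, 0) and (2, 0), each present with probability 0.5, the width in direction (1, 0) is 2 only when both points appear (probability 0.25), so the expected width is 0.5. The deterministic set {(0, 0), (0.5, 0)} matches that value in this direction even though it is not a subset of the input points.
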
Keywords
  • epsilon-kernel
  • coreset
  • stochastic point
  • shape fitting
