Approximating the Distribution of the Median and other Robust Estimators on Uncertain Data

Authors: Kevin Buchin, Jeff M. Phillips, Pingfan Tang



File

LIPIcs.SoCG.2018.16.pdf
  • Filesize: 0.57 MB
  • 14 pages

Cite As

Kevin Buchin, Jeff M. Phillips, and Pingfan Tang. Approximating the Distribution of the Median and other Robust Estimators on Uncertain Data. In 34th International Symposium on Computational Geometry (SoCG 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 99, pp. 16:1-16:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.SoCG.2018.16

Abstract

Robust estimators, like the median of a point set, are important for data analysis in the presence of outliers. We study robust estimators for locationally uncertain points with discrete distributions. That is, each point in a data set has a discrete probability distribution describing its location. The probabilistic nature of uncertain data makes it challenging to compute such estimators, since the true value of the estimator is now described by a distribution rather than a single point. We show how to construct and estimate the distribution of the median of a point set. Building the approximate support of the distribution takes near-linear time, and assigning probability to that support takes quadratic time. We also develop a general approximation technique for distributions of robust estimators with respect to ranges with bounded VC dimension. This includes the geometric median for high dimensions and the Siegel estimator for linear regression.
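
To make the setting concrete, the following is a minimal sketch of what "the distribution of the median" means for locationally uncertain points in one dimension. It is a brute-force enumeration over all realizations, not the near-linear or quadratic-time constructions the abstract describes, and its cost grows as the product of the support sizes. The function name `median_distribution` and the example data are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import product
from statistics import median_low


def median_distribution(uncertain_points):
    """Exact distribution of the median, by enumerating every realization.

    uncertain_points: list of uncertain points; each one is a list of
    (location, probability) pairs whose probabilities sum to 1.
    Returns a dict mapping each possible median value to its probability.
    Illustrative brute force only: the number of realizations is the
    product of the support sizes.
    """
    dist = defaultdict(float)
    for realization in product(*uncertain_points):
        # One realization picks exactly one location per uncertain point.
        locations = [loc for loc, _ in realization]
        prob = 1.0
        for _, p in realization:
            prob *= p  # points are assumed independent
        # median_low always returns one of the input locations.
        dist[median_low(locations)] += prob
    return dict(dist)


# Hypothetical example: three uncertain points on the real line.
points = [
    [(1.0, 0.5), (9.0, 0.5)],
    [(2.0, 0.5), (3.0, 0.5)],
    [(4.0, 1.0)],
]
dist = median_distribution(points)
print(dist)
```

In this example the median is no longer a single value: it is 2.0 or 3.0 with probability 0.25 each and 4.0 with probability 0.5, which is the kind of distribution the paper sets out to approximate efficiently.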

Keywords

  • Uncertain Data
  • Robust Estimators
  • Geometric Median
  • Tukey Median
