Identity Testing Under Label Mismatch

Authors Clément L. Canonne , Karl Wimmer



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2021.55.pdf
  • Filesize: 0.78 MB
  • 17 pages

Document Identifiers

Author Details

Clément L. Canonne
  • The University of Sydney, Australia
Karl Wimmer
  • Duquesne University, Pittsburgh, PA, USA

Cite AsGet BibTex

Clément L. Canonne and Karl Wimmer. Identity Testing Under Label Mismatch. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 55:1-55:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ISAAC.2021.55

Abstract

Testing whether the observed data conforms to a purported model (probability distribution) is a basic and fundamental statistical task, and one that is by now well understood. However, the standard formulation, identity testing, fails to capture many settings of interest; in this work, we focus on one such natural setting, identity testing under promise of permutation. In this setting, the unknown distribution is assumed to be equal to the purported one, up to a relabeling (permutation) of the model: however, due to a systematic error in the reporting of the data, this relabeling may not be the identity. The goal is then to test identity under this assumption: equivalently, whether this systematic labeling error led to a data distribution statistically far from the reference model.

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Mathematics of computing → Hypothesis testing and confidence interval computation
Keywords
  • distribution testing
  • property testing
  • permutations
  • lower bounds

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. Optimal testing for properties of distributions. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 3591-3599, 2015. URL: https://proceedings.neurips.cc/paper/2015/hash/1f36c15d6a3d18d52e8d493bc8187cb9-Abstract.html.
  2. Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. In 41st Annual Symposium on Foundations of Computer Science (Redondo Beach, CA, 2000), pages 259-269. IEEE Comput. Soc. Press, Los Alamitos, CA, 2000. URL: https://doi.org/10.1109/SFCS.2000.892113.
  3. Tuğkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Symposium on Theory of Computing Conference, STOC'04, pages 381-390, New York, NY, USA, 2004. ACM. URL: https://doi.org/10.1145/1007352.1007414.
  4. Eric Blais, Clément L. Canonne, and Tom Gur. Distribution testing lower bounds via reductions from communication complexity. ACM Trans. Comput. Theory, 11(2):Art. 6, 37, 2019. URL: https://doi.org/10.1145/3305270.
  5. Clément L. Canonne. A Survey on Distribution Testing: Your Data is Big. But is it Blue? Number 9 in Graduate Surveys. Theory of Computing Library, 2020. URL: https://doi.org/10.4086/toc.gs.2020.009.
  6. Clément L. Canonne and Karl Wimmer. Testing data binnings. In Jaroslaw Byrka and Raghu Meka, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2020, August 17-19, 2020, Virtual Conference, volume 176 of LIPIcs, pages 24:1-24:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2020.24.
  7. Siu-on Chan, Ilias Diakonikolas, Paul Valiant, and Gregory Valiant. Optimal algorithms for testing closeness of discrete distributions. In Chandra Chekuri, editor, Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1193-1203. SIAM, 2014. URL: https://doi.org/10.1137/1.9781611973402.88.
  8. Constantinos Daskalakis, Ilias Diakonikolas, Rocco A. Servedio, Gregory Valiant, and Paul Valiant. Testing k-modal distributions: Optimal algorithms via reductions. In Proceedings of SODA, pages 1833-1852. Society for Industrial and Applied Mathematics (SIAM), 2013. URL: http://dl.acm.org/citation.cfm?id=2627817.2627948.
  9. Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Sample-optimal identity testing with high probability. In 45th International Colloquium on Automata, Languages, and Programming, volume 107 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 41, 14. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2018. Google Scholar
  10. Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin. Testing identity of structured distributions. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1841-1854. SIAM, Philadelphia, PA, 2015. URL: https://doi.org/10.1137/1.9781611973730.123.
  11. Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Statist., 27:642-669, 1956. URL: https://doi.org/10.1214/aoms/1177728174.
  12. Karl Pearson F.R.S. X. on the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50(302):157-175, 1900. URL: https://doi.org/10.1080/14786440009463897.
  13. Oded Goldreich. The uniform distribution is complete with respect to testing identity to a fixed distribution. In Computational Complexity and Property Testing, volume 12050 of Lecture Notes in Computer Science, pages 152-172. Springer, 2020. Google Scholar
  14. Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653-750, July 1998. Google Scholar
  15. Dayu Huang and Sean Meyn. Generalized error exponents for small sample universal hypothesis testing. IEEE Trans. Inform. Theory, 59(12):8157-8181, 2013. URL: https://doi.org/10.1109/TIT.2013.2283266.
  16. Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax estimation of the L₁ distance. IEEE Trans. Inf. Theory, 64(10):6672-6706, 2018. Google Scholar
  17. Pascal Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab., 18(3):1269-1283, 1990. Google Scholar
  18. Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Trans. Inform. Theory, 54(10):4750-4755, 2008. URL: https://doi.org/10.1109/TIT.2008.928987.
  19. Kazuhiro Suzuki, Dongvu Tonien, Kaoru Kurosawa, and Koji Toyota. Birthday paradox for multi-collisions. In Information security and cryptology - ICISC 2006, volume 4296 of Lecture Notes in Comput. Sci., pages 29-40. Springer, Berlin, 2006. URL: https://doi.org/10.1007/11927587_5.
  20. Gregory Valiant and Paul Valiant. A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC), 17:179, 2010. URL: http://eccc.hpi-web.de/report/2010/179.
  21. Gregory Valiant and Paul Valiant. Estimating the unseen: A sublinear-sample canonical estimator of distributions. Electronic Colloquium on Computational Complexity (ECCC), 17:180, 2010. URL: http://eccc.hpi-web.de/report/2010/180.
  22. Gregory Valiant and Paul Valiant. The power of linear estimators. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science - FOCS 2011, pages 403-412. IEEE Computer Soc., Los Alamitos, CA, 2011. URL: https://doi.org/10.1109/FOCS.2011.81.
  23. Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. SIAM J. Comput., 46(1):429-455, 2017. URL: https://doi.org/10.1137/151002526.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail