Identity Testing Under Label Mismatch

Canonne, Clément L.; Wimmer, Karl

doi:10.4230/LIPIcs.ISAAC.2021.55

File

Author Details

Clément L. Canonne

The University of Sydney, Australia

Karl Wimmer

Duquesne University, Pittsburgh, PA, USA

Cite AsGet BibTex

Clément L. Canonne and Karl Wimmer. Identity Testing Under Label Mismatch. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 55:1-55:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ISAAC.2021.55

Abstract

Testing whether the observed data conforms to a purported model (probability distribution) is a basic and fundamental statistical task, and one that is by now well understood. However, the standard formulation, identity testing, fails to capture many settings of interest; in this work, we focus on one such natural setting, identity testing under promise of permutation. In this setting, the unknown distribution is assumed to be equal to the purported one, up to a relabeling (permutation) of the model: however, due to a systematic error in the reporting of the data, this relabeling may not be the identity. The goal is then to test identity under this assumption: equivalently, whether this systematic labeling error led to a data distribution statistically far from the reference model.

Subject Classification

ACM Subject Classification

Theory of computation → Streaming, sublinear and near linear time algorithms
Mathematics of computing → Hypothesis testing and confidence interval computation

Keywords

distribution testing
property testing
permutations
lower bounds

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. Optimal testing for properties of distributions. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 3591-3599, 2015. URL: https://proceedings.neurips.cc/paper/2015/hash/1f36c15d6a3d18d52e8d493bc8187cb9-Abstract.html.
Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. In 41st Annual Symposium on Foundations of Computer Science (Redondo Beach, CA, 2000), pages 259-269. IEEE Comput. Soc. Press, Los Alamitos, CA, 2000. URL: https://doi.org/10.1109/SFCS.2000.892113.
Tuğkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Symposium on Theory of Computing Conference, STOC'04, pages 381-390, New York, NY, USA, 2004. ACM. URL: https://doi.org/10.1145/1007352.1007414.
Eric Blais, Clément L. Canonne, and Tom Gur. Distribution testing lower bounds via reductions from communication complexity. ACM Trans. Comput. Theory, 11(2):Art. 6, 37, 2019. URL: https://doi.org/10.1145/3305270.
Clément L. Canonne. A Survey on Distribution Testing: Your Data is Big. But is it Blue? Number 9 in Graduate Surveys. Theory of Computing Library, 2020. URL: https://doi.org/10.4086/toc.gs.2020.009.
Clément L. Canonne and Karl Wimmer. Testing data binnings. In Jaroslaw Byrka and Raghu Meka, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2020, August 17-19, 2020, Virtual Conference, volume 176 of LIPIcs, pages 24:1-24:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2020.24.
Siu-on Chan, Ilias Diakonikolas, Paul Valiant, and Gregory Valiant. Optimal algorithms for testing closeness of discrete distributions. In Chandra Chekuri, editor, Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1193-1203. SIAM, 2014. URL: https://doi.org/10.1137/1.9781611973402.88.
Constantinos Daskalakis, Ilias Diakonikolas, Rocco A. Servedio, Gregory Valiant, and Paul Valiant. Testing k-modal distributions: Optimal algorithms via reductions. In Proceedings of SODA, pages 1833-1852. Society for Industrial and Applied Mathematics (SIAM), 2013. URL: http://dl.acm.org/citation.cfm?id=2627817.2627948.
Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Sample-optimal identity testing with high probability. In 45th International Colloquium on Automata, Languages, and Programming, volume 107 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 41, 14. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2018.
Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin. Testing identity of structured distributions. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1841-1854. SIAM, Philadelphia, PA, 2015. URL: https://doi.org/10.1137/1.9781611973730.123.
Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Statist., 27:642-669, 1956. URL: https://doi.org/10.1214/aoms/1177728174.
Karl Pearson F.R.S. X. on the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50(302):157-175, 1900. URL: https://doi.org/10.1080/14786440009463897.
Oded Goldreich. The uniform distribution is complete with respect to testing identity to a fixed distribution. In Computational Complexity and Property Testing, volume 12050 of Lecture Notes in Computer Science, pages 152-172. Springer, 2020.
Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653-750, July 1998.
Dayu Huang and Sean Meyn. Generalized error exponents for small sample universal hypothesis testing. IEEE Trans. Inform. Theory, 59(12):8157-8181, 2013. URL: https://doi.org/10.1109/TIT.2013.2283266.
Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax estimation of the L₁ distance. IEEE Trans. Inf. Theory, 64(10):6672-6706, 2018.
Pascal Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab., 18(3):1269-1283, 1990.
Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Trans. Inform. Theory, 54(10):4750-4755, 2008. URL: https://doi.org/10.1109/TIT.2008.928987.
Kazuhiro Suzuki, Dongvu Tonien, Kaoru Kurosawa, and Koji Toyota. Birthday paradox for multi-collisions. In Information security and cryptology - ICISC 2006, volume 4296 of Lecture Notes in Comput. Sci., pages 29-40. Springer, Berlin, 2006. URL: https://doi.org/10.1007/11927587_5.
Gregory Valiant and Paul Valiant. A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC), 17:179, 2010. URL: http://eccc.hpi-web.de/report/2010/179.
Gregory Valiant and Paul Valiant. Estimating the unseen: A sublinear-sample canonical estimator of distributions. Electronic Colloquium on Computational Complexity (ECCC), 17:180, 2010. URL: http://eccc.hpi-web.de/report/2010/180.
Gregory Valiant and Paul Valiant. The power of linear estimators. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science - FOCS 2011, pages 403-412. IEEE Computer Soc., Los Alamitos, CA, 2011. URL: https://doi.org/10.1109/FOCS.2011.81.
Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. SIAM J. Comput., 46(1):429-455, 2017. URL: https://doi.org/10.1137/151002526.

Identity Testing Under Label Mismatch

Authors Clément L. Canonne , Karl Wimmer

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Identity Testing Under Label Mismatch

Authors Clément L. Canonne , Karl Wimmer

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References

Thanks for your feedback!

Could not send message