Two Party Distribution Testing: Communication and Security

Authors Alexandr Andoni, Tal Malkin, Negev Shekel Nosatzki



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2019.15.pdf
  • Filesize: 0.56 MB
  • 16 pages

Document Identifiers

Author Details

Alexandr Andoni
  • Columbia University, New York City, NY, USA
Tal Malkin
  • Columbia University, New York City, NY, USA
Negev Shekel Nosatzki
  • Columbia University, New York City, NY, USA

Acknowledgements

We thank Devanshi Nishit Vyas for her contribution to some of the initial work which led to this paper. We thank Clement Canonne for invaluable comments on an early draft of the manuscript. We thank Yuval Ishai for helpful discussions. Work supported in part by Simons Foundation (#491119), NSF grants CCF-1617955 and CCF-1740833.

Cite AsGet BibTex

Alexandr Andoni, Tal Malkin, and Negev Shekel Nosatzki. Two Party Distribution Testing: Communication and Security. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 15:1-15:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ICALP.2019.15

Abstract

We study the problem of discrete distribution testing in the two-party setting. For example, in the standard closeness testing problem, Alice and Bob each have t samples from, respectively, distributions a and b over [n], and they need to test whether a=b or a,b are epsilon-far (in the l_1 distance). This is in contrast to the well-studied one-party case, where the tester has unrestricted access to samples of both distributions. Despite being a natural constraint in applications, the two-party setting has previously evaded attention. We address two fundamental aspects of the two-party setting: 1) what is the communication complexity, and 2) can it be accomplished securely, without Alice and Bob learning extra information about each other’s input. Besides closeness testing, we also study the independence testing problem, where Alice and Bob have t samples from distributions a and b respectively, which may be correlated; the question is whether a,b are independent or epsilon-far from being independent. Our contribution is three-fold: 1) We show how to gain communication efficiency given more samples, beyond the information-theoretic bound on t. The gain is polynomially better than what one would obtain via adapting one-party algorithms. 2) We prove tightness of our trade-off for the closeness testing, as well as that the independence testing requires tight Omega(sqrt{m}) communication for unbounded number of samples. These lower bounds are of independent interest as, to the best of our knowledge, these are the first 2-party communication lower bounds for testing problems, where the inputs are a set of i.i.d. samples. 3) We define the concept of secure distribution testing, and provide secure versions of the above protocols with an overhead that is only polynomial in the security parameter.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Hypothesis testing and confidence interval computation
Keywords
  • distribution testing
  • communication complexity
  • security

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jayadev Acharya, Clément L. Canonne, Cody Freitag, and Himanshu Tyagi. Test without Trust: Optimal Locally Private Distribution Testing. CoRR, abs/1808.02174, 2018. Google Scholar
  2. Jayadev Acharya, Clément L. Canonne, and Himanshu Tyagi. Distributed Simulation and Distributed Inference. CoRR, abs/1804.06952, 2018. URL: http://arxiv.org/abs/1804.06952.
  3. Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, and Shengjun Pan. Competitive closeness testing. In Proceedings of the 24th Annual Conference on Learning Theory, pages 47-68, 2011. Google Scholar
  4. Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, Shengjun Pan, and Ananda Suresh. Competitive classification and closeness testing. In Conference on Learning Theory, pages 22-1, 2012. Google Scholar
  5. Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh. Sublinear algorithms for outlier detection and generalized closeness testing. In Information Theory (ISIT), 2014 IEEE International Symposium on, pages 3200-3204. IEEE, 2014. Google Scholar
  6. Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. Differentially Private Testing of Identity and Closeness of Discrete Distributions. CoRR, abs/1707.05128, 2017. URL: http://arxiv.org/abs/1707.05128.
  7. Rudolf Ahlswede and Imre Csiszár. Hypothesis testing with communication constraints. IEEE transactions on information theory, 32(4), 1986. Google Scholar
  8. Maryam Aliakbarpour, Ilias Diakonikolas, and Ronitt Rubinfeld. Differentially Private Identity and Closeness Testing of Discrete Distributions. CoRR, abs/1707.05497, 2017. URL: http://arxiv.org/abs/1707.05497.
  9. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comp. Sys. Sci., 58:137-147, 1999. Previously appeared in STOC'96. Google Scholar
  10. S Amari et al. Statistical inference under multiterminal data compression. IEEE Transactions on Information Theory, 44(6):2300-2324, 1998. Google Scholar
  11. Andrew D Barbour. Stein’s method and Poisson process convergence. Journal of Applied Probability, 25(A):175-184, 1988. Google Scholar
  12. Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D Smith, and Patrick White. Testing closeness of discrete distributions. Journal of the ACM (JACM), 60(1):4, 2013. Google Scholar
  13. Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing Closeness of Discrete Distributions. J. ACM, 60(1):4:1-4:25, February 2013. URL: http://dx.doi.org/10.1145/2432622.2432626.
  14. Amos Beimel, Paz Carmi, Kobbi Nissim, and Enav Weinreb. Private Approximation of Search Problems. SIAM J. Comput., 38(5):1728-1760, 2008. Google Scholar
  15. Amos Beimel, Renen Hallak, and Kobbi Nissim. Private Approximation of Clustering and Vertex Cover. Computational Complexity, 18(3):435-494, 2009. Google Scholar
  16. Bhaswar Bhattacharya and Gregory Valiant. Testing closeness with unequal sized samples. In Advances in Neural Information Processing Systems, pages 2611-2619, 2015. Google Scholar
  17. Elette Boyle, Niv Gilboa, and Yuval Ishai. Breaking the Circuit Size Barrier for Secure Computation Under DDH. In CRYPTO (1), volume 9814 of Lecture Notes in Computer Science, pages 509-539. Springer, 2016. Google Scholar
  18. Elette Boyle, Niv Gilboa, and Yuval Ishai. Group-Based Secure Computation: Optimizing Rounds, Communication, and Computation. In EUROCRYPT (2), volume 10211 of Lecture Notes in Computer Science, pages 163-193, 2017. Google Scholar
  19. Mark Braverman, Ankit Garg, Tengyu Ma, Huy L Nguyen, and David P Woodruff. Communication lower bounds for statistical estimation problems via a distributed data processing inequality. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 1011-1020. ACM, 2016. Google Scholar
  20. Sergey Bravyi, Aram W Harrow, and Avinatan Hassidim. Quantum algorithms for testing properties of distributions. IEEE Transactions on Information Theory, 57(6):3971-3981, 2011. Google Scholar
  21. Bryan Cai, Constantinos Daskalakis, and Gautam Kamath. Priv'IT: Private and Sample Efficient Identity Testing. arXiv preprint, 2017. URL: http://arxiv.org/abs/1703.10127.
  22. Clément L Canonne. A survey on distribution testing: Your data is big. but is it blue? In Electronic Colloquium on Computational Complexity (ECCC), volume 22(63), pages 1-9, 2015. Google Scholar
  23. Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM Journal on Computing, 41(5):1299-1317, 2012. Google Scholar
  24. Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1193-1203. Society for Industrial and Applied Mathematics, 2014. Google Scholar
  25. Thomas M Cover. Hypothesis testing with finite statistics. The Annals of Mathematical Statistics, 40(3):828-835, 1969. Google Scholar
  26. Michael Crouch, Andrew McGregor, Gregory Valiant, and David P Woodruff. Stochastic streams: Sample complexity vs. space complexity. In LIPIcs-Leibniz International Proceedings in Informatics, volume 57. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016. Google Scholar
  27. Yuval Dagan and Ohad Shamir. Detecting Correlations with Little Memory and Communication. CoRR, abs/1803.01420, 2018. URL: http://arxiv.org/abs/1803.01420.
  28. I. Diakonikolas, E. Grigorescu, J. Li, A. Natarajan, K. Onak, and L. Schmidt. Communication-Efficient Distributed Learning of Discrete Distributions. In Advances in Neural Information Processing Systems, 2017. To appear. Google Scholar
  29. Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Collision-based testers are optimal for uniformity and closeness. arXiv preprint, 2016. URL: http://arxiv.org/abs/1611.03579.
  30. Ilias Diakonikolas, Moritz Hardt, and Ludwig Schmidt. Differentially private learning of structured discrete distributions. In Advances in Neural Information Processing Systems, pages 2566-2574, 2015. Google Scholar
  31. Ilias Diakonikolas and Daniel M Kane. A new approach for testing properties of discrete distributions. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 685-694. IEEE, 2016. Google Scholar
  32. Joan Feigenbaum, Yuval Ishai, Tal Malkin, Kobbi Nissim, Martin J. Strauss, and Rebecca N. Wright. Secure Multiparty Computation of Approximations. ACM Trans. Algorithms, 2(3):435-472, July 2006. URL: http://dx.doi.org/10.1145/1159892.1159900.
  33. Sumegha Garg, Ran Raz, and Avishay Tal. Extractor-based time-space lower bounds for learning. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 990-1002. ACM, 2018. Google Scholar
  34. Craig Gentry. Fully homomorphic encryption using ideal lattices. In STOC, pages 169-178. ACM, 2009. Google Scholar
  35. Oded Goldreich. Introduction to Property Testing (working draft), 2017. URL: https://www.wisdom.weizmann.ac.il/~oded/PDF/pt-v3.pdf.
  36. Oded Goldreich and Dana Ron. On Testing Expansion in Bounded-Degree Graphs. ECCC 7(20), 2000. Google Scholar
  37. Michael Gutman. Asymptotically optimal classification for multiple tests with empirically observed statistics. IEEE Transactions on Information Theory, 35(2):401-408, 1989. Google Scholar
  38. U. Hadar, J. Liu, Y. Polyanskiy, and O. Shayevitz. Communication Complexity of Estimating Correlations. In Proceedings of the Symposium on Theory of Computing (STOC), 2019. Google Scholar
  39. Shai Halevi, Robert Krauthgamer, Eyal Kushilevitz, and Kobbi Nissim. Private approximation of NP-hard functions. In STOC, pages 550-559. ACM, 2001. Google Scholar
  40. Te Han. Hypothesis testing with multiterminal data compression. IEEE transactions on information theory, 33(6):759-772, 1987. Google Scholar
  41. Martin E Hellman and Thomas M Cover. Learning with finite memory. The Annals of Mathematical Statistics, pages 765-782, 1970. Google Scholar
  42. Piotr Indyk. Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation. J. ACM, 53(3):307-323, 2006. Previously appeared in FOCS'00. Google Scholar
  43. Piotr Indyk and David Woodruff. Polylogarithmic Private Approximations and Efficient Matching. In Proceedings of the Third Conference on Theory of Cryptography, TCC'06, pages 245-264, Berlin, Heidelberg, 2006. Springer-Verlag. URL: http://dx.doi.org/10.1007/11681878_13.
  44. Yuval Ishai, Tal Malkin, Martin J. Strauss, and Rebecca N. Wright. Private multiparty sampling and approximation of vector combinations. Theor. Comput. Sci., 410(18):1730-1745, 2009. Google Scholar
  45. Joe Kilian, André Madeira, Martin J. Strauss, and Xuan Zheng. Fast Private Norm Estimation and Heavy Hitters. In TCC, volume 4948 of Lecture Notes in Computer Science, pages 176-193. Springer, 2008. Google Scholar
  46. Gillat Kol, Ran Raz, and Avishay Tal. Time-space hardness of learning sparse parities. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1067-1080. ACM, 2017. Google Scholar
  47. Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput., 30(2):457-474, 2000. Preliminary version appeared in STOC'98. Google Scholar
  48. Dana Moshkovitz and Michal Moshkovitz. Mixing implies lower bounds for space bounded learning. In Conference on Learning Theory, pages 1516-1566, 2017. Google Scholar
  49. Moni Naor and Kobbi Nissim. Communication Complexity and Secure Function Evaluation. CoRR, cs.CR/0109011, 2001. URL: http://arxiv.org/abs/cs.CR/0109011.
  50. Ran Raz. Fast learning requires good memory: A time-space lower bound for parity learning. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 266-275. IEEE, 2016. Google Scholar
  51. Ran Raz. A time-space lower bound for a large class of learning problems. In Foundations of Computer Science (FOCS), 2017 IEEE 58th Annual Symposium on, pages 732-742. IEEE, 2017. Google Scholar
  52. Ronitt Rubinfeld. Sublinear time algorithms. In International Congress of Mathematicians, volume 3, pages 1095-1110, 2006. Google Scholar
  53. Ronitt Rubinfeld. Taming big probability distributions. XRDS: Crossroads, The ACM Magazine for Students, 19(1):24-28, 2012. Google Scholar
  54. KR Sahasranand and Himanshu Tyagi. Extra Samples can Reduce the Communication for Independence Testing. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 2316-2320. IEEE, 2018. Google Scholar
  55. Ofer Shayevitz. On Rényi measures and hypothesis testing. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pages 894-898. IEEE, 2011. Google Scholar
  56. Or Sheffet. Locally Private Hypothesis Testing. In ICML, 2018. Google Scholar
  57. Alexander A. Sherstov. The Communication Complexity of Gap Hamming Distance. Theory of Computing, 8(1):197-208, 2012. URL: http://dx.doi.org/10.4086/toc.2012.v008a008.
  58. Jayakrishnan Unnikrishnan. On optimal two sample homogeneity tests for finite alphabets. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 2027-2031. Ieee, 2012. Google Scholar
  59. Paul Valiant. Testing symmetric properties of distributions. SIAM Journal on Computing, 40(6):1927-1968, 2011. Previously in STOC'08. Google Scholar
  60. Elad Verbin and Wei Yu. The streaming complexity of cycle counting, sorting by reversals, and other problems. In Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms, pages 11-25. SIAM, 2011. Google Scholar
  61. Andrew Chi-Chih Yao. Protocols for Secure Computations (Extended Abstract). In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, FOCS '82, pages 160-164, 1982. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail