Differential Privacy for Coverage Analysis of Software Traces

Hao, Yu; Latif, Sufian; Zhang, Hailong; Bassily, Raef; Rountev, Atanas

doi:10.4230/LIPIcs.ECOOP.2021.8

Abstract

This work considers software execution traces, where a trace is a sequence of run-time events. Each user of a software system collects the set of traces covered by her execution of the software, and reports this set to an analysis server. Our goal is to report the local data of each user in a privacy-preserving manner by employing local differential privacy, a powerful theoretical framework for designing privacy-preserving data analysis. A significant advantage of such analysis is that it offers principled "built-in" privacy with clearly-defined and quantifiable privacy protections. In local differential privacy, the data of an individual user is modified using a local randomizer before being sent to the untrusted analysis server. Based on the randomized information from all users, the analysis server computes, for each trace, an estimate of how many users have covered it.
Such analysis requires that the domain of possible traces be defined ahead of time. Unlike in prior related work, here the domain is either infinite or, at best, restricted to many billions of elements. Further, the traces in this domain typically have structure defined by the static properties of the software. To capture these novel aspects, we define the trace domain with the help of context-free grammars. We illustrate this approach with two exemplars: a call chain analysis in which traces are described through a regular language, and an enter/exit trace analysis in which traces are described by a balanced-parentheses context-free language. Randomization over such domains is challenging due to their large size, which makes it impossible to use prior randomization techniques. To solve this problem, we propose to use count sketch, a fixed-size hashing data structure for summarizing frequent items. We develop a version of count sketch for trace analysis and demonstrate its suitability for software execution data. In addition, instead of randomizing separately each contribution to the sketch, we develop a much-faster one-shot randomization of the accumulated sketch data.
One important client of the collected information is the identification of high-frequency ("hot") traces. We develop a novel approach to identify hot traces from the collected randomized sketches. A key insight is that the very large domain of possible traces can be efficiently explored for hot traces by using the frequency estimates of a visited trace and its prefixes and suffixes. Our experimental study of both call chain analysis and enter/exit trace analysis indicates that the frequency estimates, as well as the identification of hot traces, achieve high accuracy and high privacy.

ACM SIGACT/EATCS. Gödel Prize. https://sigact.org/prizes/g%C3%B6del/citation2017.pdf, 2017.
L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 22(6):685-701, 2010.
G. Ammons, T. Ball, and J. Larus. Exploiting hardware performance counters with flow and context sensitive profiling. In PLDI, page 85–96, 1997.
Apple. Learning with privacy at scale. https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html, 2017.
M. Arnold, S.J. Fink, D. Grove, M. Hind, and P.F. Sweeney. A survey of adaptive optimization in virtual machines. Proceedings of the IEEE, 93(2):449-466, 2005.
B. Avent, A. Korolova, D. Zeber, T. Hovden, and B. Livshits. BLENDER: Enabling local search with a hybrid differential privacy model. In USENIX Security, pages 747-764, 2017.
T. Ball and J. Larus. Optimally profiling and tracing programs. TOPLAS, 16(4):1319-1360, 1994.
R. Bassily, K. Nissim, U. Stemmer, and A. Thakurta. Practical locally private heavy hitters. In NIPS, pages 2285-2293, 2017.
A. Budi, D. Lo, L. Jiang, and Lucia. kb-anonymity: A model for anonymized behaviour-preserving test and debugging data. In PLDI, pages 447-457, 2011.
M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In ICALP, pages 693-703, 2002.
S.R. Choudhary, A. Gorla, and A. Orso. Automated test input generation for android: Are we there yet? In ASE, pages 429-440. IEEE, 2015.
J. Clause and A. Orso. A technique for enabling and supporting debugging of field failures. In ICSE, pages 261-270, 2007.
J. Clause and A. Orso. Camouflage: Automated anonymization of field data. In ICSE, pages 21-30, 2011.
A. Dajan, A. Lauger, P. Singer, D. Kifer, J. Reiter, A. Machanavajjhala, S. Garfinkel, S. Dahl, M. Graham, V. Karwa, H. Kim, P. Leclerc, I. Schmutte, W. Sexton, L. Vilhuber, and J. Abowd. The modernization of statistical disclosure limitation at the U.S. Census Bureau. https://www2.census.gov/cac/sac/meetings/2017-09/statistical-disclosure-limitation.pdf, 2017.
M. Diep, M. Cohen, and S. Elbaum. Probe distribution techniques to profile events in deployed software. In ISSRE, pages 331-342, 2006.
C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211-407, 2014.
S. Elbaum and M. Hardojo. An empirical study of profiling strategies for released software and their impact on testing activities. In ISSTA, pages 65-75, 2004.
Ú. Erlingsson, V. Pihur, and A. Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, pages 1054-1067, 2014.
Facebook. Facebook analytics. https://analytics.facebook.com, 2020.
A. Georges, D. Buytaert, and L. Eeckhout. Statistically rigorous Java performance evaluation. In OOPSLA, page 57–76, 2007.
Google. Google Analytics. URL: https://analytics.google.com.
Google. Firebase Analytics. https://firebase.google.com, 2020.
Google. Monkey: UI/Application exerciser for Android. https://developer.android.com/studio/test/monkey, 2020.
M. Grechanik, C. Csallner, C. Fu, and Q. Xie. Is data privacy always good for software testing? In ISSRE, pages 368-377, 2010.
I. Hadar, T. Hasson, O. Ayalon, E. Toch, M. Birnhack, S. Sherman, and A. Balissa. Privacy by designers: Software developers’ privacy mindset. Empirical Software Engineering, 23(1):259-289, 2018.
S. Han, Y. Dang, S. Ge, D. Zhang, and T. Xie. Performance debugging in the large via mining millions of stack traces. In ICSE, pages 145-155, 2012.
M. Haran, A. Karr, A. Orso, A. Porter, and A. Sanil. Applying classification techniques to remotely-collected program execution data. In ESEC/FSE, pages 146-155, 2005.
J. Hsu, M. Gaboardi, A. Haeberlen, S. Khanna, A. Narayan, B. C. Pierce, and A. Roth. Differential privacy: An economic method for choosing epsilon. In CSF, pages 398-410, 2014.
W. Jin and A. Orso. BugRedux: Reproducing field failures for in-house debugging. In ICSE, pages 474-484, 2012.
W. Jin and A. Orso. F3: Fault localization for field failures. In ISSTA, pages 213-223, 2013.
Z. Li, X. Jing, X. Zhu, H. Zhang, B. Xu, and S. Ying. On the multiple sources and privacy preservation issues for heterogeneous defect prediction. IEEE Transaction on Software Engineering, pages 1-21, 2017.
Lucia, D. Lo, L. Jiang, and A. Budi. kbe-anonymity: Test data anonymization for evolving programs. In ASE, pages 262-265, 2012.
Microsoft. New differential privacy platform co-developed with Harvard’s OpenDP unlocks data while safeguarding privacy. https://blogs.microsoft.com/on-the-issues/2020/06/24/differential-privacy-harvard-opendp, 2020.
A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In S&P, pages 111-125, 2008.
A. Narayanan and V. Shmatikov. De-anonymizing social networks. In S&P, pages 173-187, 2009.
J. P. Near, D. Darais, C. Abuah, T. Stevens, P. Gaddamadugu, L. Wang, N. Somani, M. Zhang, N. Sharma, A. Shan, and D. Song. Duet: An expressive higher-order language and linear type system for statically enforcing differential privacy. Proceedings of the ACM on Programming Languages, 3(OOPSLA), 2019.
Oath. Flurry. URL: http://flurry.com.
P. Ohmann, A. Brooks, L. D'Antoni, and B. Liblit. Control-flow recovery from partial failure reports. In PLDI, pages 390-405, 2017.
P. Ohmann, D. B. Brown, N. Neelakandan, J. Linderoth, and B. Liblit. Optimizing customized program coverage. In ASE, pages 27-38, 2016.
OpenDP. https://projects.iq.harvard.edu/opendp, 2020.
A. Orso, T. Apiwattanapong, and M. J. Harrold. Leveraging field data for impact analysis and regression testing. In ESEC/FSE, pages 128-137, 2003.
A. Orso, D. Liang, M. J. Harrold, and R. Lipton. GAMMA system: Continuous evolution of software after deployment. In ISSTA, pages 65-69, 2002.
C. Pavlopoulou and M. Young. Residual test coverage monitoring. In ICSE, pages 277-284, 1999.
F. Peters and T. Menzies. Privacy and utility for defect prediction: Experiments with MORPH. In ICSE, pages 189-199, 2012.
F. Peters, T. Menzies, L. Gong, and H. Zhang. Balancing privacy and utility in cross-company defect prediction. IEEE Transaction on Software Engineering, 39(8):1054-1068, 2013.
T. Reps. Program analysis via graph reachability. IST, 40(11-12):701-726, 1998.
Soot. Soot analysis framework. https://soot-oss.github.io/soot, 2020.
M. Sridharan and R. Bodik. Refinement-based context-sensitive points-to analysis for Java. In PLDI, pages 387-400, 2006.
W. N. Sumner, Y. Zheng, D. Weeratunge, and X. Zhang. Precise calling context encoding. IEEE Transaction on Software Engineering, 38(5):1160-1177, 2012.
K. Taneja, M. Grechanik, R. Ghani, and T. Xie. Testing software in age of data privacy: A balancing act. In ESEC/FSE, pages 201-211, 2011.
Uber. Uber releases project for differential privacy. https://medium.com/uber-security-privacy/differential-privacy-open-source-7892c82c42b6, July 2017.
T. Wang, J. Blocki, N. Li, and S. Jha. Locally differentially private protocols for frequency estimation. In USENIX Security, pages 729-745, 2017.
Y. Wang, Z. Ding, G. Wang, D. Kifer, and D. Zhang. Proving differential privacy with shadow execution. In PLDI, pages 655-669, 2019.
S. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 309(60):63-69, 1965.
A. Wood, M. Altman, A. Bembenek, M. Bun, M. Gaboardi, J. Honaker, K. Nissim, D. OquoterightBrien, T. Steinke, and S. Vadhan. Differential privacy: A primer for a non-technical audience. Vanderbilt Journal of Entertainment and Technology Law, 21(1):209-276, 2018.
D. Zhang and D. Kifer. LightDP: Towards automating differential privacy proofs. In PLDI, pages 888-901, 2017.
H. Zhang, Y. Hao, S. Latif, R. Bassily, and A. Rountev. Differentially-private software frequency profiling under linear constraints. Proceedings of the ACM on Programming Languages, 4(OOPSLA), 2020.
H. Zhang, Y. Hao, S. Latif, R. Bassily, and A. Rountev. A study of event frequency profiling with differential privacy. In CC, page 51–62, 2020.
H. Zhang, S. Latif, R. Bassily, and A. Rountev. Introducing privacy in screen event frequency analysis for Android apps. In SCAM, pages 268-279, 2019.
H. Zhang, S. Latif, R. Bassily, and A. Rountev. Differentially-private control-flow node coverage for software usage analysis. In USENIX Security, pages 1021-1038, 2020.
H. Zhang, E. Roth, A. Haeberlen, B. Pierce, and A. Roth. Testing differential privacy with dual interpreters. Proceedings of the ACM on Programming Languages, 4(OOPSLA), November 2020.
X. Zhuang, M. Serrano, H. W Cain, and J.-D. Choi. Accurate, efficient, and adaptive calling context profiling. In PLDI, pages 263-271, 2006.

Differential Privacy for Coverage Analysis of Software Traces

Authors Yu Hao, Sufian Latif, Hailong Zhang, Raef Bassily, Atanas Rountev

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Acknowledgements

References

Thanks for your feedback!

Could not send message

Differential Privacy for Coverage Analysis of Software Traces

Authors Yu Hao, Sufian Latif, Hailong Zhang, Raef Bassily, Atanas Rountev

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

Acknowledgements

Supplementary Materials

References

Thanks for your feedback!

Could not send message