Scaling Interprocedural Static Data-Flow Analysis to Large C/C++ Applications: An Experience Report

Authors: Fabian Schiebel, Florian Sattler, Philipp Dominik Schubert, Sven Apel, Eric Bodden




File

LIPIcs.ECOOP.2024.36.pdf
  • Filesize: 1.52 MB
  • 28 pages

Author Details

Fabian Schiebel
  • Fraunhofer Institute for Mechatronic Systems Design IEM, Paderborn, Germany
Florian Sattler
  • Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Philipp Dominik Schubert
  • Heinz Nixdorf Institute, Paderborn, Germany
Sven Apel
  • Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Eric Bodden
  • Paderborn University, Department of Computer Science, Heinz Nixdorf Institute, Germany
  • Fraunhofer IEM, Paderborn, Germany

Cite As

Fabian Schiebel, Florian Sattler, Philipp Dominik Schubert, Sven Apel, and Eric Bodden. Scaling Interprocedural Static Data-Flow Analysis to Large C/C++ Applications: An Experience Report. In 38th European Conference on Object-Oriented Programming (ECOOP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 313, pp. 36:1-36:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ECOOP.2024.36

Abstract

Interprocedural data-flow analysis is important for computing precise information on whole programs. In theory, the popular algorithmic framework interprocedural distributive environments (IDE) provides a tool to solve distributive interprocedural data-flow problems efficiently. Unfortunately, available state-of-the-art implementations of the IDE framework start to run into scalability issues for programs with several thousand lines of code, depending on the static analysis domain. Since the IDE framework is a basic building block for many static program analyses, this presents a serious limitation. In this paper, we report on our experience with making the IDE algorithm scale to C/C++ applications with up to 500 000 lines of code. We analyze the IDE algorithm and its state-of-the-art implementations to identify their weaknesses related to scalability at both a conceptual and implementation level. Based on this analysis, we propose several optimizations to overcome these weaknesses, aiming at a sweet spot between reducing running time and memory consumption. As a result, we provide an improved IDE solver that implements our optimizations within the PhASAR static analysis framework. Our evaluation on real-world C/C++ applications shows that applying the optimizations speeds up the analysis on average by up to 7×, while also reducing memory consumption by 7× on average. For the first time, these optimizations allow us to analyze programs with several hundreds of thousands of lines of LLVM-IR code in reasonable time and space.
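
As a rough illustration of what "distributive" means in the abstract, the following C++ sketch defines a flow function for a single assignment `y = x` one data-flow fact at a time and lifts it to sets of facts by union. It is not taken from the paper or from PhASAR: the names `Fact`, `FactSet`, `flowAssignYFromX`, `applyToSet`, `unite`, and `distributesOver` are hypothetical, and the string facts stand in for tainted variable names. Because the set-level function is built from per-fact results, applying it to a union of sets gives the same result as uniting the individual results, which is the property that IDE-style solvers rely on.

```cpp
#include <set>
#include <string>

// Hypothetical fact type: the name of a variable that holds tainted data.
using Fact = std::string;
using FactSet = std::set<Fact>;

// Per-fact flow function for the statement `y = x` (illustrative only).
// - `y` is overwritten, so an incoming fact `y` is killed.
// - `x` survives and additionally generates `y`.
// - Every other fact passes through unchanged.
FactSet flowAssignYFromX(const Fact &In) {
  if (In == "y") {
    return {};
  }
  if (In == "x") {
    return {"x", "y"};
  }
  return {In};
}

// Lift the per-fact function to sets by uniting the per-fact results.
FactSet applyToSet(const FactSet &In) {
  FactSet Out;
  for (const Fact &F : In) {
    FactSet Part = flowAssignYFromX(F);
    Out.insert(Part.begin(), Part.end());
  }
  return Out;
}

// Set union helper.
FactSet unite(const FactSet &A, const FactSet &B) {
  FactSet Result = A;
  Result.insert(B.begin(), B.end());
  return Result;
}

// Checks f(A ∪ B) == f(A) ∪ f(B) for the lifted flow function.
bool distributesOver(const FactSet &A, const FactSet &B) {
  return applyToSet(unite(A, B)) == unite(applyToSet(A), applyToSet(B));
}
```

For example, calling `distributesOver({"x"}, {"y", "z"})` compares the result of the flow function on the merged set with the union of its results on each set separately. A real solver works on a program's interprocedural control-flow graph and, in the IDE case, also attaches value-computing edge functions to such per-fact edges; this sketch leaves both out.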

Subject Classification

ACM Subject Classification
  • Theory of computation → Program analysis
Keywords
  • Interprocedural data-flow analysis
  • IDE
  • LLVM
  • C/C++
