Scaling Interprocedural Static Data-Flow Analysis to Large C/C++ Applications: An Experience Report

Authors: Fabian Schiebel, Florian Sattler, Philipp Dominik Schubert, Sven Apel, Eric Bodden

Fabian Schiebel
  • Fraunhofer Institute for Mechatronic Systems Design IEM, Paderborn, Germany
Florian Sattler
  • Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Philipp Dominik Schubert
  • Heinz Nixdorf Institute, Paderborn, Germany
Sven Apel
  • Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Eric Bodden
  • Paderborn University, Department of Computer Science, Heinz Nixdorf Institute, Germany
  • Fraunhofer IEM, Paderborn, Germany

Fabian Schiebel, Florian Sattler, Philipp Dominik Schubert, Sven Apel, and Eric Bodden. Scaling Interprocedural Static Data-Flow Analysis to Large C/C++ Applications: An Experience Report. In 38th European Conference on Object-Oriented Programming (ECOOP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 313, pp. 36:1-36:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Interprocedural data-flow analysis is important for computing precise information on whole programs. In theory, the popular algorithmic framework of interprocedural distributive environments (IDE) provides the means to solve distributive interprocedural data-flow problems efficiently. Unfortunately, available state-of-the-art implementations of the IDE framework start to run into scalability issues for programs with several thousand lines of code, depending on the static analysis domain. Since the IDE framework is a basic building block for many static program analyses, this presents a serious limitation. In this paper, we report on our experience with making the IDE algorithm scale to C/C++ applications with up to 500 000 lines of code. We analyze the IDE algorithm and its state-of-the-art implementations to identify their weaknesses related to scalability at both a conceptual and an implementation level. Based on this analysis, we propose several optimizations to overcome these weaknesses, aiming for a sweet spot between reducing running time and reducing memory consumption. As a result, we provide an improved IDE solver that implements our optimizations within the PhASAR static analysis framework. Our evaluation on real-world C/C++ applications shows that applying the optimizations speeds up the analysis by up to 7× on average, while also reducing memory consumption by 7× on average. For the first time, these optimizations allow us to analyze programs with several hundred thousand lines of LLVM-IR code in reasonable time and space.

  • Theory of computation → Program analysis
  • Interprocedural data-flow analysis
  • IDE
  • LLVM
  • C/C++
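To make the IDE framework named in the abstract more concrete, the following is a minimal C++ sketch under stated assumptions; it is not PhASAR's API and not the solver described in this paper. In IDE, each edge of the exploded supergraph carries an edge function over the analysis's value domain, and the solver composes these functions along paths and joins them where paths merge. The sketch models the classic linear-constant-propagation edge functions of the form λx. a·x + b. The names `LinearEdgeFn`, `compose`, `kIdentity`, and `kPathSummary` are hypothetical, and the top/bottom elements and the join operator that a real solver needs are deliberately omitted.

```cpp
#include <cstdint>

// Hypothetical IDE edge function for linear constant propagation:
// represents the value transformer x -> a * x + b.
// A real IDE solver also needs top/bottom edge functions and a join
// operator; they are omitted from this sketch.
struct LinearEdgeFn {
  std::int64_t a;  // multiplicative coefficient
  std::int64_t b;  // additive constant

  // Evaluates the edge function on a concrete value.
  constexpr std::int64_t apply(std::int64_t x) const { return a * x + b; }
};

// Structural equality; a solver can use such a check to detect
// that a summarized (jump) function has stopped changing.
constexpr bool operator==(LinearEdgeFn lhs, LinearEdgeFn rhs) {
  return lhs.a == rhs.a && lhs.b == rhs.b;
}

// Returns the edge function for "first, then second", i.e. x -> second(first(x)):
// second.a * (first.a * x + first.b) + second.b
constexpr LinearEdgeFn compose(LinearEdgeFn second, LinearEdgeFn first) {
  return LinearEdgeFn{second.a * first.a, second.a * first.b + second.b};
}

// Identity edge function x -> x, for edges that leave the value unchanged.
constexpr LinearEdgeFn kIdentity{1, 0};

// Example path: y flows through x = 2*y + 1, then through z = 3*x - 4.
constexpr LinearEdgeFn kPathSummary =
    compose(LinearEdgeFn{3, -4}, LinearEdgeFn{2, 1});  // z = 6*y - 1
```

Because composing two linear functions yields another linear function, a path of any length can be summarized by a constant-size representation (ignoring integer overflow), which is the kind of closure property that lets IDE summarize procedures efficiently. In the example, composing `x = 2*y + 1` with `z = 3*x - 4` gives `z = 6*y - 1`.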


