Abstract 1 Introduction 2 Syntax of the Considered Functional Language 3 Compositional and Relational Analysis at First-Order 4 Abstracting Recursive Algebraic Data Types 5 Higher-Order, Disjunctive Relational Summaries 6 Combining both Domains into an Analysis 7 Experimental Evaluation 8 Related Work 9 Conclusion and Future Work References

Compositional Static Value Analysis for Higher-Order Numerical Programs

Milla Valnet ORCID LIP6, Sorbonne Université, F-75005, Paris, France Raphaël Monat ORCID Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France Antoine Miné ORCID LIP6, Sorbonne Université, F-75005, Paris, France
Abstract

Static analyzers have been successfully developed to detect runtime errors in many languages. However, the automatic analysis of functional languages remains a challenge due to their recursive functions, recursive algebraic data types, and higher-order functions. Classic type systems provide compositional methods that are in general not precise enough to prove the absence of runtime errors such as assertion failures. At the other end of the spectrum, deductive methods are more expressive but may require user guidance to prove invariants.

Our work describes a static value analysis by abstract interpretation for a higher-order pure functional language. This analysis provides a sound and automatic approach to discover invariants and prevent assertion and match failures. We have designed a compositional analysis: functions are analyzed only once, at their definition site, generating a summary of their behavior. The summaries can be viewed as input-output relations expressed with relational abstract domains. We present two new abstract domains. A first abstract domain summarizes recursive algebraic data types. A second abstract domain lifts existing disjunctive relational summaries to higher-order by formalizing them as domains able to abstract higher-order functions. Both abstractions are parameterized by the abstractions of basic types (strings, integers, …). Thanks to this parametric nature, both domains can be combined, allowing the analysis of higher-order functions manipulating algebraic data types and, conversely, algebraic data types using functions as first-class values.

We have implemented this analysis in the open-source MOPSA platform. Preliminary evaluation confirms the precision of our approach on a set of 40 handwritten toy programs as well as 20 programs from the state-of-the-art Salto analyzer benchmark.

Keywords and phrases:
Static Value Analysis, Functional Programming, Abstract Interpretation
Copyright and License:
[Uncaptioned image] © Milla Valnet, Raphaël Monat, and Antoine Miné; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Software and its engineering Automated static analysis
; Theory of computation Program analysis ; Software and its engineering Functional languages
Related Version:
Full Version: https://hal.science/hal-05047369
Supplementary Material:
Software  (ECOOP 2025 Artifact Evaluation approved artifact): https://doi.org/10.4230/DARTS.11.2.5
Editors:
Jonathan Aldrich and Alexandra Silva

1 Introduction

Thanks to strong static typing, functional programming languages such as OCaml are safer than traditional imperative languages, as they “reduce the number of runtime errors […] found in Java and C#” [29]. Still, well-typed functional programs can occasionally go wrong upon encountering runtime errors, such as out-of-bounds array accesses, arithmetic overflows, non-exhaustive pattern matching or uncaught exceptions. In this work, we aim at statically detecting such errors by developing a compositional static value analysis and targeting higher-order pure functional languages. Our approach is rooted in the abstract interpretation framework, and leverages relational abstract domains to yield a high expressivity. In particular, we explore a new compositional value analysis that can yield better precision than classic type systems. The analysis is fully automated, contrary to more expressive approaches traditionally based on deductive verification.

Value analyses based on abstract interpretation have been mostly developed to target imperative programming languages [6, 9]. An ongoing research direction focuses on developing compositional (or function-modular) analyses. These analyses consist in analyzing functions once, at their definition site, to infer a summary; the summary is then applied to analyze each function call. These methods significantly improve scalability when a function is called multiple times, as summary applications are computationally cheaper than reanalyzing a function at every calling context. One difficulty faced by compositional analyses is precision loss: analyses need to produce a unique summary, correct for every possible input of the analyzed function. A key ingredient to achieve interesting precision levels is to rely on relational abstract domains which are able to express relationships between variables. Relational domains have been used extensively in previous works to create input-output relational summaries [26, 14, 17, 25]. However, relational domains are not always sufficient to make up for precision loss, for example when very distinct function behaviors are approximated as a single input-output relation. A second ingredient is the addition of partitioning [30, 7] to create a set of separated summaries for each function, each summary expressing different results on different inputs. Boutonnet and Halbwachs [8] developed a method to compute disjunctive relational summaries that express precise properties of first-order, numeric functions.

The static analysis of functional programs raises new challenges: analyzers need to handle features such as user-defined algebraic data types (ADTs), higher-order functions, partial application, recursive functions and polymorphism. This work presents a compositional analysis which leverages relational abstract domains, exploring a new precision-scalability tradeoff. The analysis supports a pure, monomorphic subset of OCaml, including recursive ADTs, as well as both recursive and higher-order functions.

We present in Figure 1 a motivating example. This program defines a custom algebraic data type, with one constructor containing functions. to_fun is a higher-order function, as its result is a function. The definitions of values f1 and f2 are performed through partial application. In the case of the program in Figure 1, our approach analyzes function to_fun once, at its definition site, and deduces a summary for this function. Under the hood, our compositional analysis relies on two new, key abstract domains: one abstracts user-defined algebraic data types, while the other abstracts higher-order functions. We rely on a cooperation between these two abstract domains to infer a precise, disjunctive [8], summary for to_fun. Handling partial applications is seamless, and our analysis is able to infer the following summaries for functions f1 and f2: f1:x5,f2:xx+4. The analysis can then perform further applications to conclude that r1=5,r2=9 fully automatically. Note that both abstract domains can leverage arbitrary abstractions for leaf data types (integers, strings, …). In particular, the summary of f2 is expressed precisely thanks to the relational polyhedra domain [15].

Figure 1: Motivating OCaml program with key features from functional programming paradigm.

In previous work, Lermusiaux and Montagu [27] implemented a value analysis for a large subset of OCaml, which includes recursive ADTs and higher-order constructs. However, this analysis is neither compositional nor relational. This simplifies the analysis of higher-order functions, which can be triggered by analyzing functions only when their parameters are known, at call site. In our example, a non-compositional analysis would analyze to_fun twice, when computing the abstract values of r1 and r2. Bautista et al. [5] defined a compositional, relational analysis for a toy imperative language extended with non-recursive ADTs. In particular, their approach does not support recursive functions, nor higher-order ones.

Contributions.

This paper defines the following contributions:

  • We introduce a domain summarizing recursive algebraic data types (ADTs). This domain is fully parametric in the abstraction of its base types (integers, strings, …). Our domain supports relational abstractions.

  • Independently, we define an abstract domain for higher-order functions, lifting previous work on disjunctive relational summaries [8]. This allows the creation of a compositional analysis for higher-order functions.

  • We illustrate how the two abstract domains can mutually benefit from each other on an example, and provide a soundness statement about the overall analysis.

  • We have implemented these domains in the MOPSA platform [22]. Thanks to this implementation, a preliminary evaluation confirms the precision of our approach on a set of 60 toy OCaml programs (handwritten or from previous work [27]).

Outline.

Section 2 presents the syntax of a functional language with standard semantics. Section 3 provides background on compositional, relational analyses for the first-order case. Section 4 proposes an abstract domain for recursive algebraic data types, and Section 5 presents our method to analyze higher-order functions. Section 6 highlights the cooperation achieved by combining both abstract domains thanks to their parametricity. Section 7 presents our implementation, benchmarks, and experimental results. Section 8 presents related work and Section 9 concludes.

2 Syntax of the Considered Functional Language

::= x𝕍
|n|e1e2|e0e1en|fun x1xne|let x=e1 in e2|let rec x=e1 in e2|if e1 then e2 else e3|assert e|C(e1,,en)|match e0 with p1e1||pmem
Π::= _|x𝕍|n|p1whene1|C(p1,,pn)
where ij,fv(pi)fv(pj)=
Figure 2: Syntax of the functional language.
𝒱::= {C(v1,,vn)|C,vi𝒱}{ω}n[𝒱n𝒱]
Σ=𝕍 𝒱
𝔼: ×Σ𝒱
Figure 3: Semantic domain of the functional language.

In this section, we present the syntax of a generic functional language, on which we will define our analysis. Its key features are recursion, algebraic data types, and higher-order. We limit ourselves to a monomorphic higher-order language without side-effects. This language is described in Figure 3. 𝕍 is the set of variables, any operator of the language (+, , ×, /, =, etc.). is the set of expressions. fv(e) is the set of free variables of expression e. Π, the set of patterns, contains the wildcard _, variables, integers, when clauses, and type constructors. Free variables from the same pattern are required to be disjoint, as is the case in OCaml: the same variable cannot be bound twice. Figure 3 describes the semantic domain of the language. is the set of all possible constructors. Semantic values of the language 𝒱 are integers, type constructors containing values, error ω and continuous functions from values to values, to which we add (for non-termination). Finally, Σ is the set of concrete environments, associating a value to each variable. The concrete semantics of an expression e is 𝔼eΣ𝒱. It associates a semantic value to an expression in an environment. This semantics follows the standard for eagerly evaluated functional languages (such as OCaml). It is standard, and not detailed here.

We work under the hypothesis that programs are well-typed. The type inference works as usual on functional languages [20, 31]. This way, some situations where an expression evaluates to the error ω are statically detected. However, well-typedness is not enough to prevent all possible kinds of errors, such as assertion failures, or match failures in the presence of when clauses. Our analysis focuses on those. During the analysis, we use type information, as well as user-defined type definitions, to select relevant abstract domains.

To simplify presentation, int type represents mathematical integers: working on machine integers and statically detecting overflows would only require adding assertions when performing arithmetic operations. Similarly, even though the language is pure, we can still handle array bound checks with a simple translation of programs with arrays to array-less programs: array accesses are replaced with assertions that indices are within the array range, abstracting away the content of the array.

3 Compositional and Relational Analysis at First-Order

Since the concrete semantics is uncomputable, we propose an abstract semantics for our language. This section provides background on relational analyses, and compositional interprocedural analyses for first-order programs with integers and pairs. We start by introducing an abstract semantics compatible with numerical relational analyses through the introduction of symbolic expressions (Section 3.1), and then describe how this abstract semantics can be lifted to perform compositional analysis of recursive functions (Section 3.2).

The analyses we describe feature two notable strengths: their definition is domain-modular – i.e. fully parametric in the underlying abstract domains – and supports precise relational inference – allowing us to express relationships between variables. We believe that this approach, originally formulated by the authors of the MOPSA framework [22], provides greater expressivity than traditional static analysis definitions. However, this generic and domain-modular approach requires additional formalization efforts to define domains.

3.1 Relational Analysis

We define a computable abstract semantics 𝔼e of an expression e, based on abstract domains. It delegates the abstraction of an object of type τ (e.g. integers, pairs, and after Section 4, ADTs) to the abstract domain 𝒟τ. We denote the union of those domains as 𝒟. Usually, such a semantics takes as parameter an expression e and an abstract environment (abstracting the possible values of all the variables), and returns an abstract value representing the set of possible values of the expression (in the abstract domain corresponding to the type of the expression). However, this formulation cannot easily express a relationship between the value of the expression and those of the variables. Indeed, let us consider the program in Figure 5. A numerical domain such as the interval domain can abstract x and y independently, as x[1,10] and y[2,11], but cannot infer the relation y = x + 1. To overcome this limitation, we use an abstract numerical domain able to store relations between variables – e.g. the polyhedra domain [15]. However, expressing relations between numeric variables from the program, that is, x and y, is not sufficient to leverage those relational information to pair, since we have no way to express a relation between the second field of pair and y.

Following the works of Chevalier and Feret [10], Journault et al. [22], we introduce intermediate variables (also named ghost variables), in addition to the initial program variables. Here, we introduce two intermediate variables p1 and p2, which respectively represent the first and second fields of pair. In practice, we keep separate the set of program variables 𝕍 and the set 𝕍 of intermediate variables on which a (possibly relational) numerical domain will express semantic constraints. We then keep a map from program variables in 𝕍 to their abstract values in 𝒱, which can either be an element of an abstract domain, or an intermediate variable. In our example, 𝕍 is {x,y,p1,p2} whereas 𝕍={x,y,pair}. The map is then m=𝚡x,𝚢y,𝚙𝚊𝚒𝚛(p1,p2). The possible values of variables in 𝕍 are then represented as an abstract element from a (possibly relational) abstract domain 𝒟. If the chosen numerical domain 𝒟 is the polyhedra domain – which can express conjunctions of linear constraints – we can infer: d=1x10y=x+1p1=y1p2y𝒟. Note that the relation between the second field of pair and y was not expressible without introducing p2. Such intermediate variables will be especially useful when abstracting algebraic data types in Section 4.

An abstract environment in Σ is then the pair of a map m𝕍𝒱 and an element of the possibly relational numerical domain d𝒟. Thus, σ=(m,d)Σ.

As a last step, we define abstract expressions , a lifting of abstract values where numerical intermediate variables can instead be numeric expressions , that is, symbolic operations between intermediate variables and integers. The abstract semantics of an expression e is then: 𝔼e:Σ×Σ. It evaluates an expression e in an environment and returns an abstract expression into a new environment. Let p be the program in Figure 5, and σ0 a starting environment with no information on program variables. The analysis of p is:

𝔼pσ0=(p1,p2),( 𝚡x,𝚢y,𝚙𝚊𝚒𝚛(p1,p2),
1x10y=x+1p1=y1p2y)

The abstract expression (p1,p2) is a symbolic pair of intermediate variables p1,p2𝕍, therefore (p1,p2). Figure 5 summarizes all abstract sets.

Figure 4: Example program motivating relational analysis.
Set of program variables 𝕍
Set of intermediate variables 𝕍
Union of non-numerical domains 𝒟
Numerical domain 𝒟
Abstract values 𝒱=𝒟𝕍
Abstract environments Σ=(𝕍𝒱)×𝒟
Concrete numerical environments Σ=𝕍
Numeric expressions ::=n|v𝕍|e1e2,where e1,e2
Abstract expressions =𝒟
Abstract semantics of expr. e 𝔼e:Σ×Σ
Figure 5: Formalization of the abstract semantics.
Assignments.

Our abstract environments are made of two components (m,d)Σ: a map m from variables to abstract values and an element d of the relational domain. To define assignments in such an environment, we write:

  • m[𝚡v] the map m(𝕍𝒱) where variable 𝚡𝕍 is now bound to the value v𝒱;

  • 𝔼xvd𝒟 the result of the assignment in an element d𝒟 of variable x𝕍 to a value or a numeric expression v.

Definition 3.1 (Abstract assignment).

The abstract assignment of a program variable 𝚡𝕍 with abstract value v𝒱 in environment (m,d)Σ is then:

(m,d)[𝚡v]={(m[𝚡x],𝔼xvd)if x:int(m[𝚡v],d)otherwise

Intuitively, the assignment is delegated to the numerical domain for integer variables, and it is a simple map update otherwise.

Join.

We denote the join between two environments as , which lifts the join operators from underlying components. The join is defined pointwise between two maps in 𝕍𝒱: each variable maps to a join of its abstract values from both maps in the relevant domain. We rely on the join of the numerical abstraction for the second component of the abstract environment. Note that this join can be heterogeneous [24], if both elements are not defined on the same set of variables (this will be further developed in Section 4). The meet is defined similarly.

Concretization.

The concretization of an abstract environment is based on the concretization γ of the numerical domain, and on the concretizations γτ of the domains 𝒟τ abstracting objects of type τ. Those domains can be relational, so the concretization of a value can depend on other variables’ concretization, following a construction defined by Monat [32]. Consequently, γτ takes as parameter an element of the domain 𝒟τ together with a concrete numerical environment σΣ.

Definition 3.2 (Environment concretization).

Given σ=(m,d)Σ, its concretization is:

γΣ(m,d)={σΣ|σγ(d),x𝕍,x:τ,σ(x)γτ(m(x),σ)}

This concretization thus depends on the concretizations for ground types (such as integers, strings, etc.), but also on the concretizations for functions and algebraic data types. Those will be defined in Section 4 and Section 5.

 Remark 3.3 (Implicit concretization signature).

Even though the concretization γτ takes as parameter a concrete environment, we can lift this definition so that it takes as parameter an abstract environment by writing:

γτ(e,(m,d))=σγ(d)γτ(e,σ)

We simply join concretizations for every concrete numerical environment abstracted by d. Our abstract semantics takes as input an abstract environment and returns an abstract value in an abstract environment. With this notation, given e:τ and σΣ, we can write the concretization of the abstract semantics 𝔼eσ𝒟τ×Σ, and we can write γτ(𝔼eσ). In particular, this will be useful when stating the soundness theorem (Section 6.2).

Error propagation.

To simplify the presentation, errors are implicitly propagated: if a sub-expression evaluates to the error ω, then so does the whole expression.

3.2 Compositional Function Analysis

Since an important contribution of this article is to design a compositional analysis, we first introduce how we analyze first-order functions in a compositional fashion, before extending our analysis in Section 5 to higher-order.

Compositional analysis aims at over-approximating all the possible behaviors of a function at definition site. The function is analyzed once and for all, and we then store the result of the analysis, its summary. Since the language is pure, the function does not modify the environment (no side-effects). Its behavior can then be represented by a relation between its inputs and its output, using a relational domain.

Let τ be the set of functions of type τ=τ1τn+1. Since we are restricted to first-order in this section, τi are base types such as integers or pairs. Given a set V of variables for inputs and output, we denote (V) as the abstract domain chosen to represent relations between those. It abstracts Σ|V, i.e. concrete environments restricted to V. It is provided with its concretization γr:(V)Σ|V, with a lifting λr:(V)Σ and with a projection operator projr such that projr(σ)(V) only keeps information on variables in V.

We denote the domain chosen for functions of type τ as 𝒟τ=𝕍n×𝕍×(V). This way, a function is abstracted as: the names x1,,xn of its formal inputs, a result variable, and a relation between those variables.

Definition 3.4 (Functions concretization).

The concretization γτ:𝒟τ×Στ is:

γτ((x1,,xn,r,p),_)={f:τ| (a1,,an):τ1××τn,
[x1=a1][xn=an][r=f(a1,,an)]γr(p)}

A function f is abstracted as (x1,,xn,r,p) if the relation between its inputs and its output can be described by p. _Σ is not used by the concretization, as the summary relies on its own relational abstract element p.

We now describe the transfer functions used by the compositional analysis.

Function definition.

Intuitively, the analysis of functions needs to generate a summary valid for all inputs. Thus, the body is analyzed with the most general inputs: the inputs are initialized at , the domain maximum (meaning we have no information on their values). We project the result on input and output variables. This way, we get the most general input-output relation, usable at any call site. Note that relationality is critical there, enabling us to discover relevant information on the function behavior despite making no hypothesis on the values of the inputs.

𝔼fun x1xnbodyσ= let e,σ0=𝔼bodyσ[x1,,xn] in
((x1,,xn),r,projr(σ0[re])),σ
Recursive functions.

For recursive functions, the concrete semantics has to compute a least fixpoint of the function concrete semantics F:

𝔼let rec f=e1 in e2σ= let F(v)=𝔼e1σ[fv] in 𝔼e2σ[flfp(F)]

We use a standard technique from abstract interpretation [12] to obtain a computable abstract semantics, in three steps. First, thanks to Kleene’s theorem, we rewrite the least fixpoint as the union of iterations of the analysis of the function, starting from f, i.e. lfp(F)=kFk(). Second, we move to the abstract semantics by using an iteration from fτ, but evaluate the function body in the abstract domain with F. Third, we use a widening operator τ instead of a join to enforce the convergence in finite time. The widening operator is a component-wise lift, similar to the definition of the join operator on abstract environments in Section 3.1.

𝔼let rec f=e1 in e2σ= let τ=((x1,,xn),r,[r][xi]in) in
let F:𝒟τ𝒟τ=vfst(𝔼e1σ[fv]) in
let s=τkFk(τ) in
𝔼e2σ[fs]

This reasoning extends to mutually recursive functions by iterating on a vector of function abstractions.

Function application.

When applying a function e0 to inputs of abstract values ei, we intuitively combine the current environment and the summary relation p. Before the combination, both abstract environments need to be defined on the same set of variables. We rely on two auxiliary functions: add_vars:Σ×𝒫(𝕍)Σ which extends the definition domain of an abstract environment with a set of variables, and dom:Σ𝒫(𝕍) which provides the definition domain of an abstract environment. The combination also requires additional equality constraints [xi=ei] between formal arguments xi and values ei. These equality constraints are delegated to the filter function111The filter function models the effect of a test or a guard in the domain, by selecting environments satisfying a conditional – here, [xi=ei] of the relevant abstract domain.

𝔼e0enσ0= let ((x1,,xn),r,p),σ0=𝔼e0σ in
let i0,n1ei+1,σi+1=𝔼ei+1σi in
let σn=add_vars(σn,dom(p)) in
let p=add_vars(λr(p),dom(σn)) in
r,(σnp)[x1=e1,,xn=en]
Compositionality.

A key observation on our abstract semantics is that the function is analyzed at definition site. Consequently, we analyze it only once. At any call site, we only instantiate the resulting summary with input values. At first-order, Boutonnet and Halbwachs [8] proved that such a compositional analysis leads to a more efficient analysis for a function used at many call sites. Therefore, it can help the analysis to scale up.

Note that the purity hypothesis is necessary here: abstracting functions as input-output relations is correct only because the function does not modify the memory state. To handle side-effects, we would need to modify this abstraction to track mutability of the memory state, e.g., using extra input-output arguments to functions to model the mutable part of the state. Keeping compositionality while overcoming this limitation is left for future work.

In conclusion, this section defined a compositional analysis for first-order functions, both relational and fully parametric in the choice of abstract domains used to represent ground types. These features will be at the core of our abstractions for ADTs and higher-order functions.

4 Abstracting Recursive Algebraic Data Types

Algebraic objects are basic blocks of functional languages. Lists, trees, abstract syntax trees, etc. are defined using algebraic data types. To analyze them, we need a specific domain for objects of user-defined type:

type τ=C1 of τ1,1τ1,m1||Cn of τn,1τn,mn

This section defines an abstraction for user-defined algebraic data types which is fully parametric in the abstraction of base types (integers, strings, etc). We illustrate our construction on integer base types, as these are very common, and many numerical abstractions already exist to serve as parameter. It also allows us to show that our analysis can be relational when parameterized with relational base domains. This relationality is a crucial factor to make compositional analyses precise, as exemplified in Section 4.3.

4.1 Parametric Relational Domain with Symbolic Variables

4.1.1 Domain

Let n be the number of constructors and mi the number of fields of constructor Ci. Let τ={Ci| 1𝚒n} be the set of constructors of type τ. Let 𝕋τ be the set of finite concrete objects of type τ. For recursive types, 𝕋τ can be infinite. To provide a computable abstract semantics, we chose to summarize (or “smash”) together all the elements of a given field that are nested in an abstract data type. For instance, when abstracting a list of integers, a single abstract integer variable will be used to represent all the possible values of all the elements of the list. More generally, infinite sets of unbounded data-structures can be abstracted using a finite number of dimensions – here, one per field.

We construct systematically an abstraction for values of type τ from existing abstractions for values of types τi,j used in the definition of τ. However, in order to avoid cyclic definitions in the case of recursive types, we start from given domains 𝒟τi,j defined only when τi,jτ. The case of recursive types is well-founded, as is the case of user-defined types using fields of previously defined user-defined types. For simplicity, this definition omits the case of mutually recursive types, which does not pose additional theoretical issues.

Definition 4.1 (ADTs domain).

We derive domains 𝒟i,j used to abstract the value of the j-th field of the i-th constructor as follows:

𝒟i,j={𝒫(τ)if τi,j=τ𝕍if τi,j=int𝒟τi,j otherwise

Finally, values of type τ are abstracted in 𝒟τ, defined as follows:

𝒟τ=1in 1jmi(𝒟i,j)×𝒫(τ)

Intuitively, 𝒟τ abstracts an object x of type τ as ((gi,j)i,j,𝒞) such that:

  • gi,j(𝒟i,j) abstracts every j-th field of Ci nested in x;

  • 𝒞𝒫(τ) is the set of possible constructors Ci for x.

𝒟τ delegates the abstraction of field (i,j) to 𝒟i,j. If this field is recursive, 𝒟i,j=𝒫(τ), meaning that we only keep track of its possible start constructors. If this is an integer field, 𝒟i,j=𝕍, meaning that we create a symbolic variable x.i.j𝕍, to represent every field (i,j) nested in x. The numerical domain can then infer relations between them. When the field is neither recursive nor an integer, we delegate the abstraction to the domain chosen to abstract objects of type τi,j. In this case, 𝒟i,j=𝒟τi,j.

By smashing every constructor field in one summary variable, we use an abstraction of fixed size to represent unbounded objects of recursive type. This approach shares similarities with the approach of Gopan et al. [18]. They use a fixed number of dimensions to over-approximate numeric values in unbounded collections, to analyze e.g. dynamic arrays or heap-allocated structures. Summarization variables consequently abstract more than one object. Note that smashing is done at the variable allocation site: each variable of type τ defines a set of numerical variables – the abstraction is called object-sensitive [39].

Example 4.2.

Let us illustrate this abstraction on lists of integers, defined as:

type list = Cons of int * list | Nil

We use the interval domain (i.e., 𝒟=𝕀) to abstract integers. The set of possible constructors is 𝚕𝚒𝚜𝚝={𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}. The constructor 𝙲𝚘𝚗𝚜 has two fields. The first one, containing integers, is represented by 𝒟1,1=𝕍, that is, a numeric variable, abstracted in σ as an interval. The second, recursive, one is represented by 𝒟1,2=𝒫(𝚕𝚒𝚜𝚝). The constructor 𝙽𝚒𝚕 has no field to be represented. The global abstraction for lists of integers is then:

𝒟𝚕𝚒𝚜𝚝=((𝒟1,1)×(𝒟1,2))×𝒫(𝚕𝚒𝚜𝚝)=(𝕍×𝒫(𝚕𝚒𝚜𝚝))×𝒫(𝚕𝚒𝚜𝚝)

4.1.2 Concretization

Let γτi,j be the concretization for 𝒟τi,j when τi,jτ. It is provided by the domain 𝒟τi,j.

Definition 4.3 (ADTs concretization).

We define γτ:𝒟τ×Σ𝒫(𝕋τ). For (g,𝒞)𝒟τ, σΣ:

γτ((g,𝒞),σ)={x:τ|i, x=Ci(xi,1,,xi,mi)Ci𝒞
j,xi,j{γτ((g,gi,j),σ)if τi,j=τγτi,j(gi,j,σ)otherwise}

Since type τ is finite, recursion is well-founded (xi,j contains strictly less constructors than x). An object x:τ is abstracted as (g,𝒞) if:

  • it starts with Ci𝒞,

  • and every j-th field of Ci accessible in x

    • either starts with constructor in gi,jτ, in the recursive case (τi,j=τ),

    • or can be abstracted as gi,j otherwise.

gi,j represents every j-th field from Ci nested in x, while Σ keeps track of numeric variables.

Example 4.4.

Following Example 4.2, we consider the integer list abstract value (g,𝒞)=((r,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜}) with numerical environment σ=[r[1,10]]. Its concretization γ𝚕𝚒𝚜𝚝((g,𝒞),σ) is the set of non-empty lists containing integers between 1 and 10:

{x:𝚒𝚗𝚝𝚕𝚒𝚜𝚝|x=𝙲𝚘𝚗𝚜(h,q)h[1,10]qγ𝚕𝚒𝚜𝚝(((r,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),σ)}

Let us consider l=𝙲𝚘𝚗𝚜(1,𝙲𝚘𝚗𝚜(4,𝙲𝚘𝚗𝚜(10,𝙽𝚒𝚕))). We have lγ𝚕𝚒𝚜𝚝((g,𝒞),σ), meaning that (g,𝒞) is a correct over-approximation of l.

4.1.3 Lattice Operators

We now define lattice operators. Since our domain is relational, operators are defined on environments. As seen in Section 3, environments are pairs, composed of a map from program variables to objects from their abstract domains, together with an element from the numerical relational domain. We denote lattice operators for the numerical relational domain as (r,r,r,r). We denote operators for 𝒟i,j as (𝒟i,j,𝒟i,j,𝒟i,j,𝒟i,j). We define an intermediate inclusion between two ADT abstractions, interpreted outside their abstract environment. We simply lift inclusions 𝒟i,j.

Definition 4.5 (Environment-free inclusion).

For (g1,𝒞1), (g2,𝒞2)𝒟τ abstract elements:

(g1,𝒞1)τ(g2,𝒞2)𝒞1𝒞2i,j,gi,j1𝒟i,jgi,j2

(g1,𝒞1)τ(g2,𝒞2) then means that the non-numerical fields of (g1,𝒞1) are more precise than those of (g2,𝒞2). To compare numerical fields, we need to check inclusion on environments:

Definition 4.6 (Inclusion).

We define :ΣΣ:

(m1,d1)(m2,d2)v:τ,m1(v)τm2(v)d1rd2

An environment (m1,d1) is more precise than (m2,d2) if for every program variable, the abstraction of this variable is more precise in m1 than in m2, and its abstraction for numerical variables, d1, is more precise than d2.

Example 4.7.

Following Example 4.2 on integer lists, we have for instance:

((x.1.1,{𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜})list((x.1.1,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜})

So we can deduce:

(x((x.1.1,{𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜}),{3x.1.16})
(x((x.1.1,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜}),{2x.1.17})

To define the meet, join and widening operators, we also start by defining those operators on the environment-free ADT abstractions, that is, ignoring their numerical part. Similarly, we lift operators from domains 𝒟i,j. Those operators are defined between objects giving the same name to their summarization variables.

Definition 4.8 (Environment-free operators).

For {,,}, for (g1,𝒞1),(g2,𝒞2)𝒟τ:

(g1,𝒞1)τ(g2,𝒞2)=(g,𝒞1𝒞2) where g is defined by i,j,gi,j=gi,j1𝒟i,jgi,j2

(g1,𝒞1)τ(g2,𝒞2) performs a join (resp. meet or widening) between non-numerical fields by delegating the operation to the relevant domains. Afterwards, to perform the operation on numerical fields too, we need to perform it on environments:

Definition 4.9 (Operators).

For {,,},we build :ΣΣ:

(m1,d1)(m2,d2)=(v:τm1(v)τm2(v),d1rd2)

Performing an operation between two environments consists in performing it on each variable abstraction in both maps, pointwise, and between both numerical abstract elements.

Note that the inclusion test, meet, join, and widening for the numerical relational domain operate on heterogeneous environments, that is, environments that may be defined on different sets of variables. We rely on the technique from Journault et al. [24] to lift classic relational domains (such as polyhedra) to the case of heterogeneous environments.. The soundness proof of our operators is a direct consequence of their soundness proofs [21].

4.1.4 Constructor Transfer Function

We then define the transfer function of constructors. Given an abstract environment σ0 and a concrete expression Ck(e1,,en), we compute a sound abstraction (g,𝒞) in a new abstract environment:

𝔼Ck(e1,,en)σ0= let i1,nvi,σi=𝔼eiσi1 in (1)
let (g0,_),σ=𝐟𝐨𝐥𝐝((g0,)(vj)τk,j=τ)σn in (2)
let g0,σ=j,τk,jτ(𝐟𝐨𝐥𝐝(g0k,jvj))σ in (3)
let g=({gk,j0snd(vj)if i=kτk,j=τgi,j0otherwise)i,j in (4)
(g,{Ck}),σ

The abstract semantics of Ck(e1,,en) is an element of 𝒟τ. The only possible start constructor is Ck. We abstract its fields as g, which is computed as follows. First, we recursively compute abstractions for ei in vi (1). We then define g0=(gi,j0)i,j. We use the fold operator, defined by Gopan et al. [18] for summarization variables of arrays and lifted there for ADTs. It folds inside every gi,j0 all abstractions for the j-th fields of Ci nested inside every recursive field abstraction vj (2). After the first folding, gi,j0 abstracts possible values in the j-th field of Ci nested at a depth of at least 1 in x. Then, we need to update it with every possible values in j-th fields of Ci nested at any depth in x. First, we add non-recursive information at depth 0, that is, updating gk,j0 with j-th field of Ck at depth 0, vj (3). To do so, we compose (), for every non-recursive field (k,j), the fold operator to add vj as a possible abstract values for field gk,j0. Finally, the possible start constructors of ej, that is, snd(vj), are added as possible start constructors for the j-th field of Ck (4).

Example 4.10.

Following previous examples on integer lists, given σΣ:

𝔼𝙽𝚒𝚕σ=((r.1.1,),{𝙽𝚒𝚕}),σ[r.1.1]
𝔼𝙲𝚘𝚗𝚜(10,𝙽𝚒𝚕)σ=((r.1.1,{𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜}),σ[r.1.1𝕀[10,10]]
𝔼𝙲𝚘𝚗𝚜(1,𝙲𝚘𝚗𝚜(10,𝙽𝚒𝚕))σ=((r.1.1,{𝙽𝚒𝚕,𝙲𝚘𝚗𝚜}),{𝙲𝚘𝚗𝚜}),σ[r.1.1[1,10]]

Note that we rely on type information to delegate abstraction of a given field to the relevant domain. We discuss related implementation details in Section 7.1.

4.1.5 Abstraction Precision

Example 4.11 (Abstraction of trees).

We start with σ0 an empty environment. As an illustration for this abstract domain, consider:

Here, 𝔼𝙽𝚘𝚍𝚎(𝙽𝚘𝚍𝚎,(𝙻𝚎𝚊𝚏(250),100,𝙻𝚎𝚊𝚏(251)),1,𝙻𝚎𝚊𝚏(252))σ0=((g,𝒞),σ) with:

(g,𝒞) =(({𝙽𝚘𝚍𝚎,𝙻𝚎𝚊𝚏},x.1.2,{𝙻𝚎𝚊𝚏},x.2.1),{𝙽𝚘𝚍𝚎})
σ =([x.1.2vx.1.2],[x.2.1vx.2.1],[vx.1.2[1,100]][vx.2.1[250,252]])
γ𝚝𝚛𝚎𝚎((g,𝒞),σ)={x| x=𝙽𝚘𝚍𝚎(t1,n,t2)n[1,100]
t1γ𝚝𝚛𝚎𝚎((g,{𝙽𝚘𝚍𝚎,𝙻𝚎𝚊𝚏}))t2γ𝚝𝚛𝚎𝚎(g,{𝙻𝚎𝚊𝚏})}

It abstracts the set of trees starting with 𝙽𝚘𝚍𝚎, only growing to the left, where the content of 𝙻𝚎𝚊𝚏 fields are between 250 and 252, and the content of 𝙽𝚘𝚍𝚎 between 1 and 100.

Note that our summarization abstraction can quickly lose precision, even when the structure size is known statically. To limit this, we could keep track of x’s exact content and summarize only when needed for convergence (recursive call), or for a certain depth of x. Such features are not yet implemented. Note that though of identical type, 𝙽𝚘𝚍𝚎 and 𝙻𝚎𝚊𝚏’s integer fields are summed up separately.

Example 4.12 (Relationality).

Consider this program manipulating integer variables a and y of unknown value:

With the polyhedra domain of Cousot and Halbwachs [15], the analysis is able to infer relational properties:

z ((z.1.1,{𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜}),[z.1.1=a]
x (((x.1.1,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜}),[x.1.1=z.1.1y][y2a+1][z.1.1=a])list
(((x.1.1,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜}),[x.1.1=z.1.12a+1][y>2a+1][z.1.1=a])
((x.1.1,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜}),[x.1.1<2a+1][z.1.1=a]

We can then prove: x1,12a+1, that is, every element of the list is smaller than 2a+1, without assuming any bound on a. This relational aspect is crucial to infer input-output function summaries for generic inputs.

4.2 Pattern-Matching Abstract Semantics

Pattern-matching is a key construct to manipulate ADTs. We define their abstract semantics. We use as intermediate function 𝚖𝚊𝚝𝚌𝚑:𝒱×Σ×ΠΣ×Σ. Given an abstract value v, an abstract environment σ and a pattern p, it returns two abstract environments. The first is an over-approximation of environments from σ, in which p and v can match. The second is an over-approximation of environments in which they cannot. We rely on a filter function from the numerical domain. It restricts an abstract environment to keep only those where a given boolean predicate evaluates to true. It is useful to model the when guards appearing in patterns:

𝚖𝚊𝚝𝚌𝚑(v,σ,pwhene1)= let σ0,σ¬,0=𝚖𝚊𝚝𝚌𝚑(σ,v,p) in
let e1n,σ1=𝔼e1σ0 in
(e1n0σ1),(σ¬,0e1n=0σ1)

Thus, v matches p when e1 in environments where v and p match, and when e1n is non-zero. The other cases are simpler.

𝔼match e0 with p1e1 ||pmemσ=
let σ1,σ¬,1=𝚖𝚊𝚝𝚌𝚑(𝔼e0σ,p1) in
let e1=match e0 with p2e2||pmem in
{𝔼e1σ1𝔼e1σ¬,1if m1{ω},σif m=0σ

The abstract semantics of pattern matching is the abstract union of all interpretations of branches ei, in an environment in which e0 matches the pattern pi but not the previous patterns pj for j<i. We add {ω} (match failure) if some part of the environment did not match any pattern. Our analysis can thus target match failure, i.e. when no pattern caught the value of e0. This is the case if σ when no pattern is left. To the best of our knowledge, this is the first value analysis handling when clauses in pattern-matching exploiting value information to precisely detect non-exhaustive pattern-matching. The OCaml compiler does warn on non-exhaustive pattern-matching but, since its analysis is not value-sensitive, it is more conservative. For the following program, it will issue a “non-exhaustive patternmatching” warning, whereas our analysis, knowing that l starts with Cons, can prove that no case is left unmatched:
let l = Cons(1, Nil) in let x = match l with Cons(h,Nil)-> h.

4.3 Example

Before defining a domain for higher-order functions, we show with an example how our compositional function analysis works together with our ADT summarization domain and the abstract semantics of pattern-matching. We consider the program in Figure 6. It defines a function 𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎. This function of type τ=𝚒𝚗𝚝𝚕𝚒𝚜𝚝𝚕𝚒𝚜𝚝 takes as inputs an integer 𝑖𝑛𝑓 and a list l and keeps only the elements of l that are lower than 𝑖𝑛𝑓.

Figure 6: A recursive function manipulating recursive algebraic objects.

Let body be the body of the function 𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎. We analyze it with unknown inputs, in σ=[l][𝑖𝑛𝑓]. Its abstract semantics is F:vfst(𝔼bodyσ[fv]). To get the abstract semantics of the recursive function, we compute the abstract fixpoint of the body semantics (Section 3.2). Starting with 𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎 at τ=((𝑖𝑛𝑓,l),r,σ[r]), we compute the fixpoint nFn(τ):

F(τ) =𝔼fun l𝑖𝑛𝑓bodyσ[𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎τ]
=((𝑖𝑛𝑓,l),((r.1.1,),{𝙲𝚘𝚗𝚜},σ[r.1.1]))τ
=(τ,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕},σ[r.1.1])
F2(τ) =F(((𝑖𝑛𝑓,l),(r.1.1,),{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕},σ[r.1.1]))
=(((𝑖𝑛𝑓,l),(r.1.1,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),σ[r.1.1𝑖𝑛𝑓])
F3(τ) =F2(τ)
nFn(τ) =(((𝑖𝑛𝑓,l),(r.1.1,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),σ[r.1.1𝑖𝑛𝑓])

By iteratively computing the summary of the function, we are finally able to infer that the elements in the output list (r.1.1) are lower than input 𝑖𝑛𝑓. We denote this result as V1, assign it to 𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎 and continue the analysis:

𝔼let rec 𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎=body in eσ= let σ1=σ[𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎V1] in 𝔼eσ1

We abstract the assignment into x:

𝔼𝙲𝚘𝚗𝚜(0,𝙲𝚘𝚗𝚜(5,𝙲𝚘𝚗𝚜(11,𝙽𝚒𝚕)))σ1= let σ2=σ1[x.1.10511] in ((x.1.1,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜}),σ2=V2,σ2
𝔼let x=𝙲𝚘𝚗𝚜(0,𝙲𝚘𝚗𝚜(5,𝙲𝚘𝚗𝚜(11,𝙽𝚒𝚕))) in e2σ3= let σ3=σ2[xV2] in 𝔼e2σ3

We then abstract the assignment into y by applying 𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎.

𝔼𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎 4xσ3=((r.1.1,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),(σ[r.1.1𝑖𝑛𝑓])[𝑖𝑛𝑓=4][l=σ3(x)]((r.1.1,{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),{𝙲𝚘𝚗𝚜,𝙽𝚒𝚕}),σ[r.1.14]=V3,σ4
𝔼let y=𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎 4x in e2σ4= let σ5=σ4[x.1.10511][yV3] in
𝔼e2σ5

We then evaluate e1= match y with |𝙲𝚘𝚗𝚜(h,q)h|𝙽𝚒𝚕0. We assign the result to hd.

𝔼e1σ1 =h,σ5[hd4]

Since hd4, the assertion is proven. More generally, the summary for filter_le proves that for any list l and integer 𝑖𝑛𝑓, 𝚏𝚒𝚕𝚝𝚎𝚛_𝚕𝚎𝑖𝑛𝑓l is a list whose content is lower than 𝑖𝑛𝑓. The analysis also proves that the pattern-matching inside the function body is exhaustive.

5 Higher-Order, Disjunctive Relational Summaries

In a higher-order setting, functions can be inputs or output of other functions. This is a key concept of functional programming, making programs harder to analyze. We consider the program from Figure 1 and highlight the benefits of compositional analyses on this example.

Non-compositional analysis.

Different choices were made in the literature to analyze higher-order functions. A method giving precise results [27] is simply to defer the analysis of to_fun’s body until we know its parameters, or an abstraction of them. Consequently, the body analysis is performed at every call site. Thus, the definitions of f1 and f2 only trigger a store of their definition code. When we compute r1 and r2, all inputs are known, yielding: r1=5,r2=9. This method is able to infer precise results for r1 and r2, with the drawback that the function body is reanalyzed at each call site. This is costly, and hinders scalability.

Compositional analysis.

To circumvent this cost, we can perform a compositional analysis, where a function is analyzed only once, at its definition site. This is how filter_le was analyzed in Section 4.3, at first-order. Since functions are abstracted as points in the polyhedra space, intuitively, manipulating them as first class values for higher order does not add further complexity. The function domain is simply based on the polyhedra one. We can analyze to_fun once and for all, supposing that we do not know anything about f’s behavior. We get r1=,r2=. This loss of information is due to a join of both possible cases from to_fun’s behavior. However, this is not inherent to compositional analysis, since the specification of to_fun is:

to_fun(c,x)={nif c=𝙲𝚜𝚝(n)f(x)if c=𝙵𝚞𝚗(f)

It is thus possible to describe to_fun’s behavior without a priori information on possible values for c, as long as disjunctions on those possible values are possible. The precision loss mentioned above was indeed due to the impossibility to express disjunctions when abstracting higher-order functions. This is why we focus in this section on lifting partitioning methods to higher-order.

5.1 Partitioning Function Summaries

Even at first-order, a similar precision loss occurs on very simple programs:

By conducting a compositional non-relational analysis with the interval domain, we are only able to infer that the output is in [10,10], a quite weak result. This comes from the fact that binary has two very distinct behaviors depending on the input value. Joining them induces a precision loss. The issue can be overcome by partitioning on the input [7], then keeping separate those behaviors: λx.(x>0res=10)(x0res=10). Note that relationality cannot, on its own, overcome this cause of imprecision. The polyhedra domain gives the same result as intervals on binary. However, Boutonnet and Halbwachs [8] successfully use relationality and partitioning together. Consider the following example:

With the interval domain, even with partitioning, we cannot express anything interesting about max, whereas the polyhedra domain alone can only infer: λxy.xresyres. But with disjunctive relational summaries, we can separate disjunct behaviors, then summarizing the maximum function with: λxy.(x<yres=y)(xyres=x). Maintaining multiple partitions is costly, so a key point of partitioning is to decide when to partition and when to merge different behaviors. Various heuristics were developed on that topic. Those methods are compatible with recursion. Indeed, Bourdoncle [7] and Boutonnet and Halbwachs [8] are able to give precise summaries for McCarthy 91 function.

We want to benefit from this precision improvement at higher-order. In a higher-order setting however, functions are first class values: they can be input or output of other functions, etc. As first class objects, they should be represented as abstract values. Compared to previous works at first-order, we then need to define the abstract domain of functions. To the best of our knowledge, this is the first presentation of disjunctive summaries as a domain abstracting functions complete with all its lattice operators, including join and widening.

5.2 Disjunctive Relational Summaries as a Domain on Functions

As in Section 3.2, we choose ((V),r,r,r,r,γr,projr,λr) a relational domain on V. This way, the formalization is independent of the chosen domain.

Definition 5.1 (Function domain).

Given a function type τ=τ1τnτr, we define τ the set of elements abstracting functions of type τ. Let ni be the number of variables necessary to abstract an object of type τi.

τ={(V,r,(pi)1,m)| V=(x1,,xn),i,xi𝕍ni,i,j,ijxixj,r𝕍,
i1,m,pi(V{r})}

A function is thus abstracted as a triplet of: a n-tuple of input variables, an output variable, and a collection of m relations abstracting different cases in the function’s behavior. We denote (x1,,xn),r,(p1,,pm) as λx1xn.i=1mpi(x1,,xn,r). For example, let add x y = x + y is abstracted as ((x,y),r,(r=x+y)), written λxy.r=x+y.

Definition 5.2 (Function concretization).

For τ=τ1τnτr, f=((xi)i1,n,r,(pi)i1,m)τ, the concretization is:

γτ(f,_)={f:τ| (a1,,an):τ1××τn,
[x1=a1][xn=an][r=f(a1,,an)]i=1mγr(pi)}

This concretization is simply an update of Definition 3.4, with a disjunction of relations instead of a unique one.

Example 5.3.

For f=λx1x2.(x1x2r>0)(x1>x2r<0), τ=𝚒𝚗𝚝𝚒𝚗𝚝𝚒𝚗𝚝:

γτ(f,_)={f:| x1,x2,
x1x2f(x1,x2)>0x1>x2f(x1,x2)<0}

That is, f represents all functions whose result is lower than 0 when the first input is the smallest, and greater than 0 otherwise.

We need to define comparison, join, meet, and widening operators on functions. Since we consider only well-typed programs, those operators will be defined for functions of identical type: in particular, they will have the same number of parameters. These definitions build upon relational domain operators.

Note that renaming the inputs or the output of the function does not change its concretization. Without loss of generality, we assume when comparing two functions that they use the same names for inputs or outputs (this can always be ensured by renaming). We then define abstract inclusion, meet, join, and widening operators on the functional domain:

Definition 5.4 (Inclusion).

For f=V,r,(fi)1,n1, g=V,r,(gi)1,n2 , the inclusion on 𝒫n is: fgi1,n1,j1,n2,firgj

f is included in g if every disjunct of f is included in a disjunct of g. This order is the classic order for disjunctive completion domains [1].

Example 5.5.

With this definition, given two functions of type int -> int:

x,r,{x>1r=10x=0r=10x<0r=x,r,{x1r=10x<1r=
Definition 5.6 (Meet).

Given f=V,r,(fi)i1,m1 and g=V,r,(gi)i1,m2, the abstract intersection of functions is defined as f:

ffg=V,r,(j=1m2(firgj))i1,n1

That is, given a disjunct fi, we compute its meet with every disjunct gj and then join them together, keeping n1 partitions. Indeed, for performance reasons, we do not want to maintain all (quadratic) combinations of pairwise intersections. This method ensures that we keep the number of partitions used in the right argument. This abstraction of the meet is then not commutative. An alternative definition would partition with regard to the abstraction with the highest number of disjuncts, which would likely be more precise. An experimental study, left for future work, could be led to determine the best version.

Example 5.7.

Given two functions of type int -> int with different input partitioning:

n,r,{n01r3n<03r1fn,r,{n12r4n=0r=3n<04r2
=n,r,{(n12r3)p(n=0,r=3)ppppppn<03r2
=n,r,{n02r3n<03r2
Definition 5.8 (Join).

Given f=V,r,(fi)i1,n1 and g=V,r, (gi)i1,n2, the abstract union of functions is defined as f. We write Gi={gj| 1jn2projr(gj)projr(fi)} the set of gj whose constraints on the inputs are included in the constraints of fi, and G¬={gj| 1jn21in1,gjGi} the set of gj not included in any Gi.

ffg=V,r,(figjGigj)i1,n1(gj)j1,n2gjG¬

That is, we keep every disjunct fi, and join them with disjuncts of g respecting their conditions on the input. We add disjuncts gj that are compatible with no fi. Similarly to the meet operator, we do not want to simply keep every disjunct, so as to limit their number.

Example 5.9.

Given two functions of type int -> int:

n,r,{n01r3n<03r1fn,r,{n12r4n=0r=3n<04r2
=n,r,{(n01r3)p(n12r4)p(n=0r=3)(n<03r1)p(n<04r2)
=n,r,{n01r4n<04r1

Widening is defined the same way as join, replacing p with p. However, to ensure convergence, we also limit the number of disjuncts to a user-controlable constant. When performing widening, we join disjuncts when necessary to respect this limit, before delegating widening to the underlying domain.

In conclusion, we defined the set τ alongside its concretization, inclusion, join, meet, and widening operators, as an abstract domain for functions of type τ. We now explain how to apply a computed summary in τ at a call site, to evaluate the effect of a function call.

5.3 Function Application with Partitioning

Function summaries being now disjunctive, we need to update the semantics from Section 3.2:

𝔼e0ekσ= let (x1,,xn,r,(pi)i1,m),σ0=𝔼e0σ in (5)
let i0,k1ei+1,σi+1=𝔼ei+1σi in (6)
let i1,mσk,i=add_vars(σk,dom(pi)) in (7)
let i1,mpi=add_vars(λr(pi),dom(σn)) in (8)
let i1,mpi=(piσk,i)[xj=ej]j1,k in (9)
let X={xk+1,,xn} in (10)
{(X,r,(projr(pi))i1,mpi),σif k<nr,(pi)i1,mpiotherwise (11)

First, we compute the semantics of the function e0 (5). Then we compute sequentially the semantics of its argument expressions e1, …, ek (6). As in Section 3.2, we extend pi and σk to the same definition domain (7,8). We combine them and add to each function disjunct pi the equality constraint between formal argument xj and value ej (9). X is the set of free formal arguments (not bound to an actual argument) in case of partial application (10).

Partial application comes naturally (case k<n in (11)). When partially applying a function, we project the result on the set of remaining variables. The result is then a set of relations consisting in a disjunctive summary. It abstracts the function resulting from the partial application. When the application is total (otherwise in (11)), there is no remaining parameters: we simply change the format of the result. In both cases, we remove empty disjuncts pi=, which correspond to unsatisfiable constraints on the inputs.

Example 5.10.

The summary of the function max is (x,y),r,(x>yr=x)(xyr=y). Then partially applying it with one argument a, we get the abstraction:

𝔼𝚖𝚊𝚡a{a5}=y,r,{(a>yr=aa5)(ayr=ya5)}

This way, we deduce that the function returns a result which is always greater than 5, and is y if it is greater than a and a otherwise.

5.4 Function Analysis with Partitioning

We update function analysis semantics from Section 3.2 to allow a disjunctive result:

𝔼fun x1xnbodyσ= let c1,,cp=get_preconditions(body) in (12)
let i1,p,ei,σi=𝔼body(σci)[xj] (13)
and si=projr(σi[rei]) in (14)
((x1,,xn),r,(si)i1,p),σ (15)

The get_preconditions helper function scans the body to get a set of predicates (c1,,cp) suitable for partitioning (12). Then, we analyze the body of the function with each precondition ci (13) and get a set of possible behaviors si (14). The function is then abstracted as the disjunction of those behaviors for the function (15). Note that we do not keep partitioned states inside the analysis of a recursive function, to avoid the risk of diverging sets of partitions. Consequently, when applying a function inside the body of the recursive function, we immediately merge cases. also note that this definition is independent of the chosen get_preconditions partitioning heuristics.

We chose to use a heuristics, so called local reachability, introduced by Boutonnet and Halbwachs [8]. We try to partition regarding conditions controlling reachability to certain key points – branches in conditional branching or pattern matching. Note that finding those preconditions is a pre-analysis, performed just before analyzing the function itself. Consequently, when a function body we are either analyzing or pre-analyzing performs a function call, the abstract summary of the called function is already available (or it is set to bottom if this is the first iteration of a recursive call).

To sum up the method, the body of the function is pre-analyzed to decide a partitioning on the inputs. It starts with a precondition I, which can be , and follows those steps:

  1. 1.

    Analyze the function body with an interprocedural analysis and note ri the result at each reachable control point;

  2. 2.

    Choose a control point for which ri;

  3. 3.

    Choose an abstract value s which is complementable, i.e. its complementary can be expressed in the domain (note that xy as well as x>y tests are always complementable for polyhedra on integers) such that projr(ri)s. We get disjuncts Is and Is¯.

We may iterate this algorithm to get more disjuncts and gain more precision.

6 Combining both Domains into an Analysis

Sections 4 and 5 presented, respectively, independent extensions of a first-order numeric analysis to algebraic data types and to higher-order functions. Thanks to their parametricity, we can combine these two extensions to analyze our target language from Section 2:

  • Algebraic data types can contain higher-order values (i.e., functions). Indeed, the parametricity of the ADT abstraction supports functions as leaf data types, themselves represented by the higher-order part of our work.

  • Function summaries from the higher-order section can express precise properties on ADTs they manipulate (including the case of recursive functions manipulating recursive ADT). Indeed, the abstract domain representing functions is parametric in the domain chosen to express relations between inputs and outputs (e.g., polyhedra).

We show how the resulting analysis performs on our example function to_fun and state its soundness theorem.

6.1 Analysis Cooperation Example

We choose as relational domain the polyhedra domain, combined with a domain stating the equality of functions. We analyze our example from the introduction. We recall its code:

Pre-analysis by local reachability.

We analyze the function with a classic relational analysis. We denote the control point after the first branch of the match as p1 and the one after the second branch of the match as p2. We denote possible constructors of a as ac.

The analysis in p1 gives us r1=(x,r,r=n),[ac={𝙲𝚜𝚝}a.1.1=n]. We see that proj({ac,a.1.1,a.2.1},r1)=[ac={𝙲𝚜𝚝}] is complementable, of complementary [ac={𝙵𝚞𝚗}]. Consequently, we get two disjuncts: ac={𝙲𝚜𝚝} and ac={𝙵𝚞𝚗}.

Full analysis.

Let body be the body of to_fun.

𝔼fun abodyσ =(ac,a.1.1,a.2.1),r,{𝔼bodyσ[ac={𝙲𝚜𝚝}]𝔼bodyσ[ac={𝙵𝚞𝚗}]

We analyze the first case. We have 𝚖𝚊𝚝𝚌𝚑(σ[ac={𝙲𝚜𝚝}],a,p1)=σ[ac={𝙲𝚜𝚝}], so:

𝔼bodyσ[ac={𝙲𝚜𝚝}] =𝔼fun xnσ[ac={𝙲𝚜𝚝}]
=(x,r,r=n),[ac={𝙲𝚜𝚝}a.1.1=n]

Similarly, for the second case:

𝔼bodyσ[ac={𝙵𝚞𝚗}] =𝔼fun xnσ[ac={𝙵𝚞𝚗}]
=f,[ac={𝙵𝚞𝚗}a.2.1=f]

In the end, the semantics of to_fun is:

𝔼fun abodyσ =(ac,a.1.1,a.2.1),r,{[ac={𝙲𝚜𝚝}r=(x,r,r=a.1.1)][ac={𝙵𝚞𝚗}a.2.1=f]

We denote this summary as s and bind to_fun to s in the environment. Note that this summary is as precise as in the concrete. We can evaluate f_1 = to_fun Cst(5) and f_2 = to_fun Fun(fun xn):

𝔼to_fun 𝙲𝚜𝚝(5)σ=r,σ[ac={𝙲𝚜𝚝} r=(x,r,r=a.1.1)]
[ac={𝙲𝚜𝚝} a.1.1=5]
=r,σ[ac={𝙲𝚜𝚝} a.1.1=5
r=(x,r,r=a.1.1)]
𝔼to_fun 𝙵𝚞𝚗(fun xn)σ=r,σ[ac={𝙵𝚞𝚗} a.2.1=f]
[ac={𝙵𝚞𝚗} a.2.1=(x,r,r=x+1)]
=r,σ[ac={𝙵𝚞𝚗} a.2.1=(x,r,r=x+1)
r=a.2.1]

In both cases, one of the summary disjuncts has an empty intersection with the environment, so it does not appear in the final state. Applying the summaries, we are able to infer r1:5,r2:9. In the end, we are able to recover the same precision as a non-compositional method (Section 5) while staying compositional. This way, we do not have to re-analyze to_fun’s body when computing r1 and r2, instead using the summaries inferred at call site. Additionally, we get a precise contract for to_fun, which is valid for every input.

6.2 Analysis Soundness

The analysis defined in this article is sound, i.e. the analysis of a program P over-approximates the reachable states of P. Section 3.1 defined the concretization of abstract environments (Definition 3.2) and the concretization of the abstract semantics of an expression (Remark 3.3).

Theorem 6.1.

The abstract semantics 𝔼 is sound, i.e.:

σΣ,σΣ,e:τ,σγΣ(σ)𝔼eσγτ(𝔼eσ)
Proof.

The proof is by induction over the syntax of expression e.

7 Experimental Evaluation

7.1 Implementation

The methods described in the article were implemented in MOPSA [22], an open source and multi-language platform to ease the development of abstract analyzers. Similarly to other static analyzers by abstract interpretation, such as Astrée [6], Frama-C [9], Infer [16], or Julia [40], MOPSA is based on a composition of several abstract domains. Compared to them, it goes further in terms of modularity, enforcing a finer level of granularity of abstractions. It encourages relational abstractions for all types and features powerful domain communication mechanisms (such as reduced products, cartesian products, expression rewriting) and supports the reuse of abstractions in the analysis of widely different languages (such as C [37, 23] and Python [33, 34, 35]). The core of the platform is composed of 22 000 LoC of OCaml.

We added 3 000 LoC to support OCaml analysis and to implement the algebraic data types and higher-order domains described in the article. Our analysis is fully automatic: it leverages typing information from the frontend – handled by the OCaml compiler – to automatically associate abstract domains to values, based on their types. The user sets a global precision level by selecting more or less expressive domains for base types (e.g., polyhedra or intervals for integers). Choosing the best precision/performance trade-off is a long-standing problem for the analysis of any language. It is not addressed by this paper.

In MOPSA, a configuration file describes how domains are combined to define an analysis. A configuration for our analysis is shown in Figure 8. Sequences delegate the analysis of an expression to the right-hand side domain when relevant. Reduced products perform reductions between both domain results. Cartesian products implement collaborations between domains. ML.program is the frontend, handling the import from the OCaml compiler. ML.intraproc handles conditionals and assertions. ML.let_rec computes an abstract fixpoint for recursive definitions of variables. Domains ML.functions (defined in Section 5.2) and ML.adts (defined in Section 4.1) depend on each other because of their parametricity. They share underlying domains for ground types, here strings and integers. MOPSA includes a Universal language, implementing ready-to-use abstractions for basic constructs. Here, integers are abstracted by the reduced product of the polyhedra domain U.polyhedra and the congruence domain U.congruences. U.string abstracts strings. The configuration could be defined using other domains for ground types as well. Our implementation is parametric in those domains.

On a technical note, our ADT domain required support for heterogeneous operations, that is, operations on environments defined on different sets of variables [24], which is the case when joining two match cases. Our functions’ domain required adapting the already existing trace partitioning [30] for C-like languages on the platform. Sound operations on summarization variables from Gopan et al. [18] were already part of the platform and were leveraged when handling summarization variables of recursive algebraic data types.

7.2 Experiments

We tested our implementation on the examples from this article as well as 40 handwritten programs highlighting the precise handling of recursive functions manipulating algebraic data types, functions partitioning on algebraic constructors, partial application, algebraic data types containing functions, etc. In particular, it inferred correct summaries for filter_le and to_fun functions. Additionally, we selected 20 programs from Salto’s benchmarks [27] that are compatible with the current limitations of our approach, i.e. they do not contain modules, nor imperative features, nor polymorphism. Figure 8 displays the analysis time for an extract of our 60 programs. The first five programs are from Salto’s benchmarks. The programs we consider are small: they consist in around a dozen lines each.

Figure 7: OCaml configuration in MOPSA.
Program name Analysis (ms)
numeric_loop3b.ml 12
binomial.ml 400
mc91.ml 30
tak.ml 923
abs.ml 15
xor.ml 36
rec_add.ml 44
non_terminate.ml 6
Program name Analysis (ms)
filter_le.ml 131
make_list.ml 131
match_when.ml 9
is_exhaustive.ml 10
partial_app.ml 15
embed_fun.ml 9
to_fun.ml 11
f_from_g.ml 11
Figure 8: Benchmarks for the OCaml analysis.
Precision.

For most numerical recursive functions (e.g. binomial, ackerman, mc91), our relational compositional analysis gives results at least as precise as Salto’s non-compositional and non-relational approach, but with only one analysis of the body. This illustrates how relationality and disjunctions can recover precision lost by compositionality, thus enhancing performance when the analyzed function is analyzed multiple times. Besides, for some functions, no precise invariant can be discovered without relationality – e.g. rec_add, or numeric_loop3b from Salto’s benchmark. Our analysis was therefore of the same precision or better than Salto on 14 out of those 16 examples. Finally, as explained in Section 4.2 our analysis is the first to support when clauses, and to detect non-exhaustive pattern-matching.

Performance.

All our tests were performed on a Intel(R) Core(TM) i7-8565U CPU 1.80GHz with 16 GB of RAM. The analysis for those small programs is lower than a second. The slowest analysis was for the tak function, making three recursive calls in its body. Those runtimes are comparable to the analysis performed by Salto – less than a second. The runtime difference on the (small) benchmarks available is not significant enough to deduce that one analysis is faster, and to evaluate which part of our method results in a speed-up and which part in a slowdown. Moreover, Salto and MOPSA feature different abstractions beside compositionality that may impact the analysis time as well. A more thorough evaluation of runtime trade-offs is thus left for future work. Previous work by Boutonnet and Halbwachs [8] at first-order concluded that summary construction time is often negligible with regard to total analysis time. Besides, in our implementation, maintaining a summary is equivalent to storing it (they are immutable, as functions are currently pure). We optimistically hope that future work would draw a similar conclusion on higher-order programs.

Note that our method’s interest goes beyond performance: it automatically generates contracts for higher-order functions. Therefore, it plays a role in proving not only the absence of runtime errors, but also refined properties on the program behavior, being able to automatically discover functions specifications.

8 Related Work

Type systems.

To prove properties on functional programs, type systems are widely used. The simplest ones already prevent some errors, such as adding a function to an integer. More expressive type systems, used for program verification, include dependent types, and in particular refinement types. They have enjoyed a steady popularity over the years [44, 43], relying under-the-hood on SMT solvers to reason on properties, mainly numeric ones. Although formulated within the framework of abstract interpretation, we believe our work shares similarities, as it provides a compositional analysis and infers numeric invariants.

Deductive methods.

Deductive methods, such as Cameleer [38] or F* [41], can prove precise properties, such as correctness with regard to a specification, but need often both user annotations and SMT solvers. It differs from our goal, a fully-automated and solver-free method, to infer semantic properties.

Non-value abstract interpretation analysis.

Whereas many works approach static analysis of functional languages, they often focus on control flow analysis, which, as precised in Liang and Might [28], does not suffice to keep track of values through flow. Cousot and Cousot [13] define higher-order functions abstractions, e.g. as relations or as sets. They also formulate refinements, e.g. disjunctive completion. They are however mainly interested in comportment analysis such as strictness or termination. We use similar abstractions and refinements, but to define a value analysis. Besides, we support algebraic data types with an abstraction fully parametric in possibly relational domains. Montagu and Jensen [36] develop methods to infer a form of frame condition for pure functional higher-order languages, i.e. identifying equality relations between parts of algebraic values. They suffer from the same lack of information when applying an input function as we do. They are however limited to non-recursive types, and cannot handle complex numeric relations.

Abstract interpretation-based value analysis.

In our experimental evaluation, we compared to Salto [27], a static value analyzer verifying OCaml programs. It supports a wider subset of the language, such as side-effects and modules. Their method differs from ours, being non-relational and non-compositional. Our compositional approach could be more scalable. Journault et al. [24] propose a relational abstract domain for trees but are limited to this specific data structure. Bautista et al. [3], Bautista et al. [4] describe a relational abstraction for numeric algebraic values and infer structural equalities between non-numeric fields, but are limited to non-recursive objects. Valnet et al. [42] support recursion, but not higher-order.

Compositional analysis.

Input-output relational analysis has been long studied, first applied to while programs [26]. Our method can be seen as an extension of symbolic relational separate analysis from Cousot and Cousot [14], extended to support ADTs and higher-order. This is a long-known method [14] to improve analysis scalability. Compositionality has been used in multiple settings: Farzan and Kincaid [17] use them to analyze independently decomposed parts of a program and Kincaid et al. [25] for inter-procedural analysis, analyzing procedures independently of their calling context, once and for all. Codish et al. [11] develop a compositional analysis for logic programs, therefore proving it useful in a declarative context. Bautista et al. [2, 5] define an ADT domain in an input-output analysis for non-recursive imperative programs. In the MOPSA platform [22], a prototype modular analysis was implemented for C programs manipulating strings [23]. To improve precision, Bourdoncle [7] expresses non-relational disjunctive summaries. Boutonnet and Halbwachs [8], on which we built upon, permit relationality. All those methods are however limited to first-order.

9 Conclusion and Future Work

This article presents two abstract domains, one tailored to handle recursive ADTs, and the other one to represent functions used as first-class values, as well as higher-order functions. Thanks to the parametricity of both domains, each domain can leverage the other, e.g. to abstract functions manipulating ADTs (and conversely). The combination of these two domains yields a compositional and relational analysis for a pure functional programming language. This analysis has been implemented into the MOPSA framework to analyze a pure subset of the OCaml language. Our preliminary evaluation shows the precision of our approach on 60 programs, including 20 benchmarks from Salto [27].

We now review the limitations of our approach and discuss future work. The language we have studied here is pure; extending our analysis to support references will be challenging. Our analysis focuses on a monomorphic functional programming language; we could rely, in future work, on the polymorphic equality domain from Montagu and Jensen [36] to handle polymorphism. Our current analysis does not detect arithmetic overflows: we believe this is an important, yet orthogonal concern. Our analysis is technically able to infer ranges of numerical variables and can thus detect overflows. However, our OCaml analysis does not support the wraparound semantics of overflows for now. To the best of our knowledge, there are no compositional value analysis currently tackling this issue, even in the case of first-order imperative languages. Previous work on compositional static analysis for first-order languages (e.g. Boutonnet and Halbwachs [8]) proved that compositionality is useful for scaling up. We postulate that similar speed-ups could be achieved in a higher-order setting with compositional analyses. While this article provides a contribution towards this goal by proposing a precise and compositional analysis, we have not evaluated its scalability in our preliminary experiments. We believe this work is a first step towards automatically proving functional properties (e.g., sorting [19]) in a functional setting with higher-order functions.

References

  • [1] Samson Abramsky and Achim Jung. Domain theory. Oxford University Press, 1994.
  • [2] Santiago Bautista. Static Analysis of Algebraic Data Types and Arrays. PhD thesis, ENS Rennes, 2023.
  • [3] Santiago Bautista, Thomas Jensen, and Benoît Montagu. Numeric domains meet algebraic data types. In Proceedings of the 9th ACM SIGPLAN International Workshop on Numerical and Symbolic Abstract Domains, pages 12–16, 2020. doi:10.1145/3427762.3430178.
  • [4] Santiago Bautista, Thomas Jensen, and Benoît Montagu. Lifting numeric relational domains to algebraic data types. In International Static Analysis Symposium, pages 104–134. Springer, 2022. doi:10.1007/978-3-031-22308-2_6.
  • [5] Santiago Bautista, Thomas Jensen, and Benoît Montagu. An input–output relational domain for algebraic data types and functional arrays. Formal Methods in System Design, pages 1–74, 2024.
  • [6] Julien Bertrane, Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, Xavier Rival, et al. Static analysis and verification of aerospace software by abstract interpretation. Foundations and Trends® in Programming Languages, 2(2-3):71–190, 2015. doi:10.1561/2500000002.
  • [7] François Bourdoncle. Abstract interpretation by dynamic partitioning. Journal of Functional Programming, 2(4):407–435, 1992.
  • [8] Rémy Boutonnet and Nicolas Halbwachs. Disjunctive relational abstract interpretation for interprocedural program analysis. In Verification, Model Checking, and Abstract Interpretation: 20th International Conference, VMCAI 2019, Cascais, Portugal, January 13–15, 2019, Proceedings 20, pages 136–159. Springer, 2019. doi:10.1007/978-3-030-11245-5_7.
  • [9] David Bühler. Structuring an abstract interpreter through value and state abstractions: eva, an evolved value analysis for Frama-C. PhD thesis, Université de Rennes 1, 2017.
  • [10] Marc Chevalier and Jérôme Feret. Sharing ghost variables in a collection of abstract domains. In VMCAI, volume 11990 of Lecture Notes in Computer Science, pages 158–179. Springer, 2020. doi:10.1007/978-3-030-39322-9_8.
  • [11] Michael Codish, Saumya K Debray, and Roberto Giacobazzi. Compositional analysis of modular logic programs. In Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 451–464, 1993. doi:10.1145/158511.158703.
  • [12] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 238–252, 1977. doi:10.1145/512950.512973.
  • [13] Patrick Cousot and Radhia Cousot. Higher-order abstract interpretation (and application to comportment analysis generalizing strictness, termination, projection and PER analysis of functional languages). In Proceedings of 1994 IEEE International Conference on Computer Languages (ICCL’94), pages 95–112. IEEE, 1994.
  • [14] Patrick Cousot and Radhia Cousot. Modular static program analysis. In International Conference on Compiler Construction, pages 159–179. Springer, 2002. doi:10.1007/3-540-45937-5_13.
  • [15] Patrick Cousot and Nicolas Halbwachs. Automatic discovery of linear restraints among variables of a program. In Proceedings of the 5th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 84–96, 1978. doi:10.1145/512760.512770.
  • [16] Dino Distefano, Manuel Fähndrich, Francesco Logozzo, and Peter W O’Hearn. Scaling static analyses at Facebook. Communications of the ACM, 62(8):62–70, 2019. doi:10.1145/3338112.
  • [17] Azadeh Farzan and Zachary Kincaid. Compositional recurrence analysis. In 2015 Formal Methods in Computer-Aided Design (FMCAD), pages 57–64. IEEE, 2015. doi:10.1109/FMCAD.2015.7542253.
  • [18] Denis Gopan, Frank DiMaio, Nurit Dor, Thomas Reps, and Mooly Sagiv. Numeric domains with summarized dimensions. In Tools and Algorithms for the Construction and Analysis of Systems: 10th International Conference, TACAS 2004, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2004, Barcelona, Spain, March 29-April 2, 2004. Proceedings 10, pages 512–529. Springer, 2004. doi:10.1007/978-3-540-24730-2_38.
  • [19] Nicolas Halbwachs and Mathias Péron. Discovering properties about arrays in simple programs. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’08, pages 339–348. ACM, 2008. doi:10.1145/1375581.1375623.
  • [20] Roger Hindley. The principal type-scheme of an object in combinatory logic. Transactions of the american mathematical society, 146:29–60, 1969.
  • [21] Matthieu Journault. Precise and modular static analysis by abstract interpretation for the automatic proof of program soundness and contracts inference. PhD thesis, Sorbonne Université, 2019.
  • [22] Matthieu Journault, Antoine Miné, Raphaël Monat, and Abdelraouf Ouadjaout. Combinations of reusable abstract domains for a multilingual static analyzer. In Verified Software. Theories, Tools, and Experiments: 11th International Conference, VSTTE 2019, New York City, NY, USA, July 13–14, 2019, Revised Selected Papers 11, pages 1–18. Springer, 2020. doi:10.1007/978-3-030-41600-3_1.
  • [23] Matthieu Journault, Antoine Miné, and Abdelraouf Ouadjaout. Modular static analysis of string manipulations in C programs. In SAS, volume 11002 of Lecture Notes in Computer Science, pages 243–262. Springer, 2018. doi:10.1007/978-3-319-99725-4_16.
  • [24] Matthieu Journault, Antoine Miné, and Abdelraouf Ouadjaout. An abstract domain for trees with numeric relations. In European Symposium on Programming, pages 724–751. Springer, 2019. doi:10.1007/978-3-030-17184-1_26.
  • [25] Zachary Kincaid, Jason Breck, Ashkan Forouhi Boroujeni, and Thomas Reps. Compositional recurrence analysis revisited. ACM SIGPLAN Notices, 52(6):248–262, 2017. doi:10.1145/3062341.3062373.
  • [26] Dexter Kozen. Kleene algebra with tests. ACM Transactions on Programming Languages and Systems (TOPLAS), 19(3):427–443, 1997. doi:10.1145/256167.256195.
  • [27] Pierre Lermusiaux and Benoît Montagu. Detection of uncaught exceptions in functional programs by abstract interpretation. In Programming Languages and Systems - 33rd European Symposium on Programming, ESOP 2024, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024, Luxembourg City, Luxembourg, April 6-11, 2024, Proceedings, Part II, volume 14577 of LNCS, pages 391–420. Springer, 2024. doi:10.1007/978-3-031-57267-8_15.
  • [28] Shuying Liang and Matthew Might. Entangled abstract domains for higher-order programs. In Proceedings of the 2013 Workshop on Scheme and Functional Programming, Washington, DC, 2013.
  • [29] Anil Madhavapeddy and Yaron Minsky. Real World OCaml: Functional Programming for the Masses. Cambridge University Press, 2 edition, 2022.
  • [30] Laurent Mauborgne and Xavier Rival. Trace partitioning in abstract interpretation based static analyzers. In European Symposium on Programming, pages 5–20. Springer, 2005. doi:10.1007/978-3-540-31987-0_2.
  • [31] Robin Milner. A theory of type polymorphism in programming. Journal of computer and system sciences, 17(3):348–375, 1978. doi:10.1016/0022-0000(78)90014-4.
  • [32] Raphaël Monat. Static type and value analysis by abstract interpretation of Python programs with native C libraries. PhD thesis, Sorbonne Université, 2021.
  • [33] Raphaël Monat, Abdelraouf Ouadjaout, and Antoine Miné. Static type analysis by abstract interpretation of Python programs. In 34th European Conference on Object-Oriented Programming (ECOOP 2020), pages 17–1. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020.
  • [34] Raphaël Monat, Abdelraouf Ouadjaout, and Antoine Miné. Value and allocation sensitivity in static python analyses. In SOAP@PLDI, pages 8–13. ACM, 2020. doi:10.1145/3394451.3397205.
  • [35] Raphaël Monat, Abdelraouf Ouadjaout, and Antoine Miné. A multilanguage static analysis of python programs with native C extensions. In SAS, volume 12913 of Lecture Notes in Computer Science, pages 323–345. Springer, 2021. doi:10.1007/978-3-030-88806-0_16.
  • [36] Benoît Montagu and Thomas Jensen. Stable relations and abstract interpretation of higher-order programs. Proceedings of the ACM on Programming Languages, 4(ICFP):1–30, 2020. doi:10.1145/3409001.
  • [37] Abdelraouf Ouadjaout and Antoine Miné. A library modeling language for the static analysis of C programs. In SAS, volume 12389 of Lecture Notes in Computer Science, pages 223–247. Springer, 2020. doi:10.1007/978-3-030-65474-0_11.
  • [38] Mário Pereira and António Ravara. Cameleer: A deductive verification tool for OCaml. In International Conference on Computer Aided Verification, pages 677–689. Springer, 2021. doi:10.1007/978-3-030-81688-9_31.
  • [39] Yannis Smaragdakis, Martin Bravenboer, and Ondrej Lhoták. Pick your contexts well: understanding object-sensitivity. In Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 17–30, 2011. doi:10.1145/1926385.1926390.
  • [40] Fausto Spoto. The Julia static analyzer for Java. In Static Analysis: 23rd International Symposium, SAS 2016, Edinburgh, UK, September 8-10, 2016, Proceedings 23, pages 39–57. Springer, 2016. doi:10.1007/978-3-662-53413-7_3.
  • [41] Nikhil Swamy, Cătălin Hriţcu, Chantal Keller, Aseem Rastogi, Antoine Delignat-Lavaud, Simon Forest, Karthikeyan Bhargavan, Cédric Fournet, Pierre-Yves Strub, Markulf Kohlweiss, et al. Dependent types and multi-monadic effects in f. In Proceedings of the 43rd annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 256–270, 2016.
  • [42] Milla Valnet, Raphaël Monat, and Antoine Miné. Analyse statique de valeurs par interprétation abstraite de programmes fonctionnels manipulant des types algébriques récursifs. In JFLA 2023-34èmes Journées Francophones des Langages Applicatifs, pages 211–242, 2023.
  • [43] Niki Vazou, Eric L. Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon L. Peyton Jones. Refinement types for Haskell. In ICFP, pages 269–282. ACM, 2014. doi:10.1145/2628136.2628161.
  • [44] Hongwei Xi and Frank Pfenning. Eliminating array bound checking through dependent types. In PLDI, pages 249–257. ACM, 1998. doi:10.1145/277650.277732.