Tracing compilation by abstract interpretation

Tracing just-in-time compilation is a popular compilation schema for the efficient implementation of dynamic languages, commonly used for JavaScript, Python, and PHP. It relies on two key ideas. First, it monitors the execution of the program to detect so-called hot paths, i.e., the most frequently executed paths. Then, it uses store information available at runtime to optimize hot paths. The result is a residual program where the optimized hot paths are guarded by sufficient conditions ensuring the equivalence of the optimized path and the original program. The residual program is persistently mutated during its execution, e.g., to add new optimized paths or to merge existing paths. Tracing compilation is thus fundamentally different from traditional static compilation. Nevertheless, despite the remarkable practical success of tracing compilation, very little is known about its theoretical foundations. We formalize tracing compilation of programs using abstract interpretation. The monitoring (viz., hot path detection) phase corresponds to an abstraction of the trace semantics that captures the most frequent occurrences of sequences of program points together with an abstraction of their corresponding stores, e.g., a type environment. The optimization (viz., residual program generation) phase corresponds to a transform of the original program that preserves its trace semantics up to a given observation as modeled by some abstraction. We provide a generic framework to express dynamic optimizations and to prove them correct. We instantiate it to prove the correctness of dynamic type specialization. We show that our framework is more general than a recent model of tracing compilation introduced at POPL 2011 by Guo and Palsberg (based on operational bisimulations). In our model we can naturally express hot path reentrance and common optimizations like dead-store elimination, which are either excluded or unsound in Guo and Palsberg's framework.


Introduction
Efficient traditional static compilation of popular dynamic languages like JavaScript, Python, and PHP is very hard, if not impossible. In fact, these languages have so many dynamic features that the traditional static analyses used for program optimization become very imprecise. Therefore, practical implementations of dynamic languages must rely on dynamic information in order to produce an optimized version of the program. In particular, tracing just-in-time compilation (TJITC) [1,3-6,15,16,24] has emerged as a valuable implementation and optimization technique for dynamic languages. For instance, the Facebook HipHop virtual machine for PHP and the V8 JavaScript engine of Google Chrome use some form of tracing compilation [19,20]. The Mozilla Firefox JavaScript engine used to have a tracing engine, TraceMonkey, which was later replaced by whole-method just-in-time compilation engines (initially JägerMonkey and then IonMonkey) [13,14].

The Problem
Tracing JIT compilers leverage runtime profiling of programs to detect and record often-executed paths, called hot paths, and then they optimize and compile only these paths at runtime. A path is a linear sequence of instructions through the program. Profiling may also collect information about the values that the program variables may assume during the execution of that path, which is then used to specialize/optimize the code. Of course, this information is not guaranteed to hold for all the subsequent executions of the hot path. Since optimizations rely on that information, the hot path is augmented with guards that check the profiled conditions, such as, for example, variable types. When a guard fails, execution jumps back to the old, non-optimized code. The main hypotheses of tracing compilers, confirmed in practice, are: (i) loop bodies are the only interesting code to optimize, so only paths inside program loops are considered; and (ii) optimizing straight-line code is easier than performing a whole-method analysis (involving loops, gotos, etc.).
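The guard/bail-out mechanism described above can be sketched concretely. The following Python fragment is an illustrative toy, not code from any real engine (the names `generic_add` and `specialized_hot_path` are ours): a loop is "compiled" under the profiled assumption that all elements are integers, every iteration is protected by a type guard, and a guard failure bails out to the generic untyped code.

```python
def generic_add(a, b):
    # Generic untyped addition: dispatch on the runtime types.
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    return str(a) + str(b)

def specialized_hot_path(xs):
    # Optimized path compiled under the profiled assumption "all ints".
    total = 0
    for i, x in enumerate(xs):
        if not isinstance(x, int):      # guard on the profiled condition
            # Guard failure: bail out to the generic code for the rest.
            acc = str(total)
            for y in xs[i:]:
                acc = generic_add(acc, y)
            return acc
        total = total + x               # type-specialized integer add
    return total
```

The guard costs one check per iteration, but the specialized path avoids the per-operation type dispatch of `generic_add`.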
Hence, tracing compilers look quite different from traditional compilers. These differences raise some natural questions about tracing compilation: (i) what is a viable formal model, generic yet realistic enough to capture the behavior of real optimizers? (ii) which optimizations are sound? (iii) how can one prove their soundness? In this paper we answer these questions.
Our formal model is based on program trace semantics [9] and abstract interpretation [10,12]. Hot path detection is modeled simply as an abstraction of the trace semantics of the program, which only retains: (i) the sequences of program points which are repeated more than some threshold; and (ii) an abstraction of the possible program stores, e.g., the types of the variables instead of their concrete values. As a consequence, a hot path contains neither loops nor join points. Furthermore, in the hot path, all the correctness conditions (i.e., guards) are explicit: for instance, before performing an integer addition, we should check that the operands are integers. If the guard condition is not satisfied then the execution leaves the hot path, reverting to the non-optimized code. Guards are essentially elements of some abstract domain, which is left as a parameter of our framework. The hot path is then optimized using standard compilation techniques; we only require the optimization to be sound.
We define the correctness of the residual (or extracted) program in terms of an abstraction of the trace semantics: the residual program is correct if it is indistinguishable, up to some abstraction of the trace semantics, from the original program. Examples of such abstractions are the program store at the exit of a method, or the stores at loop entry and loop exit points.

Main Contributions
This paper puts forward a formal model of TJITC whose key features are as follows:
- We provide the first model of tracing compilation based on abstract interpretation of the trace semantics of programs.
- We provide a more general and realistic framework than a recent model of TJITC by Guo and Palsberg [17] based on program bisimulations: we employ a less restrictive correctness criterion that enables the correctness proof of actually implemented optimizations; hot paths can be annotated with runtime information on the stores, notably type information; and optimized hot loops can be re-entered.
- We formalize and prove the correctness of type specialization of hot paths.
Our model focuses on source-to-source program transformations and optimizations of a low-level imperative language with untyped global variables, which may play the role of the intermediate language of some virtual machine. Our starting point is that program optimizations can be seen as transformations that lose some information about the original program, so that optimizations can be viewed as approximations and, in turn, can be formalized by abstract interpretation. More precisely, we rely on the insight by Cousot and Cousot [12] that a program source can be seen as an abstraction of its trace semantics, i.e., the set of all its possible execution sequences, so that a source-to-source optimization can be viewed as an abstraction of a transform of the program trace semantics. In our model, soundness of program optimizations is defined as program equivalence w.r.t. an observational abstract interpretation of the program trace semantics. Here, an observational abstraction induces a correctness criterion by describing what is observable about program executions, so that program equivalence means that two programs are indistinguishable by looking only at their observable behaviors.
A crucial part of tracing compilation is the selection of the hot path(s) to optimize. Of course, this choice is made at runtime based on program executions, so it can once again be seen as an abstraction of the trace semantics. Here, a simple trace abstraction selects cyclic instruction sequences, i.e., loop paths, that appear at least N times within a single execution trace. These instruction sequences are recorded together with some property of the values assumed by the program variables at that point, which is represented as a value of a suitable store abstraction that, in general, depends on the subsequent optimization.
A program optimization can be seen as an abstraction of a semantic transformation of program execution traces, as described by the Cousots in [12]. The advantage of this approach is that optimization properties, such as their soundness, are easier to prove at a semantic level. The optimization itself can be defined on the whole program or, as in real tracing JIT compilers, can be restricted to the hot path. This latter restriction is achieved by transforming the original program so that the hot path is extracted, i.e., made explicit: the hot path is added to the program as a path with no join points that jumps back to the original code when execution leaves it. A guard is placed before each command in this hot path to check whether the necessary conditions, as selected by the store abstraction, are satisfied. A program optimization can then be confined to the hot path only, making it linear, by ignoring the parts of the program outside it. The guards added to the hot path allow us to retain precision.
We apply our TJITC model to type specialization. Type specialization is the key optimization for dynamic languages such as JavaScript [15], as they provide generic operations whose execution depends on the runtime types of the values of their operands.

Related Work
A formal model for tracing JIT compilation was put forward at POPL 2011 by Guo and Palsberg [17]. It is based on operational bisimulation [23] to describe the equivalence between source and optimized programs. In Section 11 we show how this model can be expressed within our framework through the following steps: Guo and Palsberg's language is compiled into ours; we then exhibit an observational abstraction which is equivalent to Guo and Palsberg's correctness criterion; finally, after some minor changes that address a few differences in path selection, the transformations performed on the source program turn out to be the same. Our framework overcomes some significant limitations of Guo and Palsberg's model. The bisimulation equivalence used in [17] implies that the optimized program has to match every change to the store made by the original program, whereas in practice we only need this match to hold at certain program points and for some variables, such as in output instructions. This limits the number of real optimizations that can be modeled in the theoretical framework. For instance, dead store elimination is proven unsound in [17], while it is implemented in actual tracing compilers [15, Section 5.1]. Furthermore, their formalization fails to model some important features of actual TJITC implementations: (i) traces are simple linear paths of instructions, i.e., they cannot be annotated with store properties; (ii) hot path selection is completely non-deterministic, since no selection criterion is modeled; and (iii) once execution leaves an optimized hot path, the program is not able to re-enter it.
It is also worth noting that abstract interpretation of program trace semantics is rooted in the foundational work by Cousot [8,9] and has been widely used as a successful technique for defining a range of static program analyses [2,7,18,22,26-28]. Abstract interpretation has also been used to describe static compilation and optimizations. In particular, Rival [25] characterizes various optimizations by the trace abstractions they preserve. In Cousot and Cousot's terminology [12], Rival's approach corresponds to offline transformations, whereas tracing compilation is an online transformation.

Syntax
Following [12], we consider a basic low-level language with untyped global variables, a kind of elementary dynamic language. Program commands range in C and consist of a labeled action which specifies a next label (Ł is the undefined label, where the execution becomes stuck). For any command C ≡ L : A → L′, we use the following notation: lbl(C) ≜ L, act(C) ≜ A and suc(C) ≜ L′. Commands L : B → L′ whose action B is a Boolean expression are called conditionals. A program P ∈ ℘(C) is a (possibly infinite, at least in theory) set of commands, with a distinct initial label Lin from which execution starts, so that Pin denotes the commands in P labeled by Lin (Pin consists of two commands when the initial command is a conditional). If a program P includes a conditional C ≡ L : B → L′ then P must also include a unique complement conditional L : ¬B → L″, which is denoted by cmpl(C) or C^c, where ¬¬B is taken to be equal to B, so that cmpl(cmpl(C)) = C. For simplicity, we consider deterministic programs, i.e., we require that for any C1, C2 ∈ P such that lbl(C1) = lbl(C2), either C1 = C2 or C1 and C2 are complement conditionals. The set of well-formed programs is denoted by Program.
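The syntax above can be rendered executably as follows; this is our own illustrative encoding (commands as triples, Boolean actions as strings, and the helpers `negate` and `is_deterministic`), not part of the paper's formal development.

```python
from collections import namedtuple

# A command L : A -> L' as a triple; fields lbl, act, suc match the text.
Command = namedtuple("Command", ["lbl", "act", "suc"])

def negate(b):
    # ¬B on string-encoded Boolean actions, with ¬¬B taken to be B.
    return b[4:] if b.startswith("not ") else "not " + b

def is_deterministic(program):
    # Determinism requirement: two distinct commands may share a label
    # only if they are complement conditionals (complementary actions).
    by_lbl = {}
    for c in program:
        by_lbl.setdefault(c.lbl, []).append(c)
    for cmds in by_lbl.values():
        if len(cmds) > 2:
            return False
        if len(cmds) == 2 and cmds[0].act != negate(cmds[1].act):
            return False
    return True
```

For instance, a program containing both `L1 : x < 0 -> L2` and its complement `L1 : not x < 0 -> L3` is deterministic, while one with two unrelated commands labeled `L1` is not.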

Transition Semantics
The language semantics relies on the following type values, where Char is a nonempty set of characters and undef represents a generic error.

Int ≜ Z    Bool ≜ {true, false}    String ≜ Char*    Undef ≜ {undef}

In turn, Value ≜ Int ∪ String ∪ Undef, while the set of type names Types is defined as Types ≜ {Int, String, Undef, Any, ∅}, and type : Value → Types provides the type of any value. Here, the type name Any plays the role of the top type, which is the supertype of (i.e., contains) all types, while ∅ is the bottom type, which is a subtype of all types.
Let Store ≜ Var → Value denote the set of possible program stores, ranged over by ρ. The semantics of expressions and program actions is standard and is defined in Fig. 1. Let us remark that: the binary function +Int denotes integer addition; · is string concatenation; logical negation and conjunction are extended to handle undef values, i.e., ¬undef = undef and undef ∧ b = undef = b ∧ undef. With a slight abuse of notation, we also consider the so-called collecting versions of the semantic functions in Fig. 1. Program states are pairs of stores and commands: State ≜ Store × C. If P is a program then State_P ≜ Store × P. We extend lbl, act and suc to states, meaning that they are applied to the command component of a state; also, store(s) returns the store of a state s. Given P ∈ Program, the program transition relation S⟦P⟧ : State_P → ℘(State_P) maps a state ⟨ρ, C⟩ to the set of states ⟨ρ′, C′⟩ such that ρ′ ∈ A⟦act(C)⟧ρ and C′ ∈ P is labeled by suc(C). It is worth remarking that, according to this definition, if C ≡ L : A → L′, C1 ≡ L′ : B → L″ and C^c_1 ≡ L′ : ¬B → L‴ are all commands of P and ρ′ ∈ A⟦A⟧ρ, then S⟦P⟧⟨ρ, C⟩ = {⟨ρ′, C1⟩, ⟨ρ′, C^c_1⟩}.

Trace Semantics
A partial (forward) trace is a finite sequence of program states which are consecutively related by the transition relation S⟦P⟧. If P is a program then Trace_P denotes its set of partial traces. A trace σ ∈ Trace_P is maximal if for any state s ∈ State_P, σs ∉ Trace_P. Let us note that, according to the above definitions, if a trace σ ∈ Trace_P has a last state σ_{|σ|−1} = ⟨ρ, L : B → L′⟩ with a conditional command such that B⟦B⟧ρ = false, then σ is maximal. Also, if a trace σ ∈ Trace_P has a last state σ_{|σ|−1} = ⟨ρ, L : A → Ł⟩ whose next label is the undefined label, then σ is maximal as well.
The trace semantics T⟦P⟧ is the set of all the partial (including maximal) traces of the program P. This set is defined as the least fixed point of a monotone operator F⟦P⟧ : ℘(Trace_P) → ℘(Trace_P), called the trace transition operator, which extends the traces in its argument by one transition step: F⟦P⟧(X) ≜ State_P ∪ {σss′ | σs ∈ X, s′ ∈ S⟦P⟧(s)}. The function F⟦P⟧ is trivially monotone on the complete lattice ⟨℘(Trace_P), ⊆⟩, so that its least fixpoint T⟦P⟧ is well defined.
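The fixpoint construction can be illustrated on a tiny program with a terminating loop; this Python sketch is ours (states are (store, label) pairs for a single variable x, and, for brevity, the iteration only seeds the initial state rather than every state as the full F⟦P⟧ does).

```python
# Tiny program:
#   L1 : x < 2 -> L2        L1 : not(x < 2) -> L3
#   L2 : x := x + 1 -> L1   (L3 has the undefined successor: stuck)

def step(state):
    # Plays the role of the transition relation S[P].
    x, lbl = state
    if lbl == "L1":
        return {(x, "L2")} if x < 2 else {(x, "L3")}
    if lbl == "L2":
        return {(x + 1, "L1")}
    return set()                         # stuck state: no successors

def trace_semantics(initial_states):
    # Kleene iteration of the trace transition operator, restricted to
    # traces starting in `initial_states`.
    traces = {(s,) for s in initial_states}
    while True:
        new = {t + (s2,) for t in traces for s2 in step(t[-1])}
        if new <= traces:
            return traces                # least fixpoint reached
        traces |= new

T = trace_semantics({(0, "L1")})
```

Starting from x = 0, the longest trace alternates L1/L2 until x reaches 2 and then gets stuck at L3; all its prefixes are partial traces in T.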
Example 2.1. Let us consider the program below, written in some while-language. Its translation into our language is given below, where, with a little abuse, we assume that the syntax of arithmetic and Boolean expressions is extended to allow expressions like x%3 = 0.
The trace semantics T⟦P⟧ includes the following partial traces, where ? stands for any integer value and stores are denoted within square brackets.

Abstract Interpretation Background
In standard abstract interpretation [10,11], abstract domains (or abstractions) are specified by Galois connections/insertions (GCs/GIs for short) or, equivalently, adjunctions. Concrete and abstract domains, ⟨C, ≤_C⟩ and ⟨A, ≤_A⟩, are assumed to be complete lattices which are related by abstraction and concretization maps α : C → A and γ : A → C that give rise to an adjunction (α, C, A, γ), that is, for all a and c, α(c) ≤_A a ⇔ c ≤_C γ(a). A GC is a GI when α ∘ γ = λx.x. It is well known that a join-preserving α uniquely determines γ by γ(a) = ∨{c ∈ C | α(c) ≤_A a} and, conversely, a meet-preserving γ uniquely determines α by α(c) = ∧{a ∈ A | c ≤_C γ(a)}. Let f : C → C be some concrete monotone function (for simplicity, we consider unary functions) and let f♯ : A → A be a corresponding monotone abstract function defined on some abstraction A related to C by a GC. Then, f♯ is a correct abstract interpretation of f on A when α ∘ f ⊑ f♯ ∘ α holds, where ⊑ denotes the pointwise ordering. Moreover, the abstract function f^A ≜ α ∘ f ∘ γ is itself a correct abstract interpretation of f and is contained in every correct f♯. Hence, for any abstraction A, f^A plays the role of the best possible approximation of f on the abstract domain A.
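As a worked finite instance of these definitions (our choice of example, not one used in the paper), consider the parity abstraction of ℘(Z): the abstract domain is the four-element lattice BOT ≤ EVEN, ODD ≤ TOP, the concrete function f is pointwise increment, and the abstract increment simply flips parity. The check below verifies the soundness condition α ∘ f ⊑ f♯ ∘ α on sample concrete sets.

```python
BOT, EVEN, ODD, TOP = "BOT", "EVEN", "ODD", "TOP"
# The partial order of the parity lattice as a set of pairs (s, t), s <= t.
ORDER = {(a, a) for a in (BOT, EVEN, ODD, TOP)} | \
        {(BOT, EVEN), (BOT, ODD), (BOT, TOP), (EVEN, TOP), (ODD, TOP)}

def alpha(s):
    # Abstraction map: best parity of a set of integers.
    if not s:
        return BOT
    parities = {n % 2 for n in s}
    return EVEN if parities == {0} else ODD if parities == {1} else TOP

def f(s):
    # Concrete function: pointwise increment of a set of integers.
    return {n + 1 for n in s}

def f_sharp(a):
    # Abstract increment: flips parity; best correct approximation of f.
    return {BOT: BOT, EVEN: ODD, ODD: EVEN, TOP: TOP}[a]

def is_correct(samples):
    # Soundness: alpha(f(c)) <= f_sharp(alpha(c)) for all sampled c.
    return all((alpha(f(c)), f_sharp(alpha(c))) in ORDER for c in samples)
```

For example, f({2, 4}) = {3, 5} abstracts to ODD, which is exactly f_sharp(EVEN).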

Store Abstractions
As usual in abstract interpretation, a store property is modeled by some abstraction Store♯ that we assume to be related to ℘(Store) through a Galois connection (α_store, ⟨℘(Store), ⊆⟩, ⟨Store♯, ≤⟩, γ_store). For instance, as we will see later, the static types of program variables give rise to a simple store abstraction.
Given a program P, a store abstraction Store♯ also induces a corresponding state abstraction State♯_P ≜ Store♯ × P and, in turn, a trace abstraction Trace♯_P ≜ (State♯_P)*.

Nonrelational Abstractions
Nonrelational store abstractions can be easily designed by a standard pointwise lifting of some value abstraction. Let Value♯ be a value abstraction as formalized by a Galois connection (α_Value, ⟨℘(Value), ⊆⟩, ⟨Value♯, ≤_Value♯⟩, γ_Value). The abstract domain Value♯ induces a nonrelational store abstraction Store♯_value ≜ Var → Value♯, ordered by the pointwise lifting of ≤_Value♯. Here, the bottom and top abstract stores are, respectively, λx.⊥_Value♯ and λx.⊤_Value♯. The abstraction map α_value : ℘(Store) → Store♯_value is thus defined pointwise, α_value(S) ≜ λx. ∨{α_Value({ρ(x)}) | ρ ∈ S}, while the corresponding concretization map γ_value : Store♯_value → ℘(Store) is defined by adjunction from α_value as recalled in Section 3, and it is easy to check that γ_value(ρ♯) = {ρ ∈ Store | ∀x ∈ Var. ρ(x) ∈ γ_Value(ρ♯(x))}.
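A minimal sketch of this pointwise lifting, assuming a parity value abstraction (the value domain, its string encoding, and the helper names are ours): each variable is abstracted independently across all concrete stores, which is exactly why nonrelational abstractions lose correlations between variables.

```python
def alpha_val(vals):
    # Assumed value abstraction: parity of a set of integers.
    if not vals:
        return "bot"
    parities = {v % 2 for v in vals}
    return "even" if parities == {0} else "odd" if parities == {1} else "top"

def alpha_store(stores, varnames):
    # Pointwise lifting: abstract the set of values of each variable
    # independently across all concrete stores (dicts).
    return {x: alpha_val({rho[x] for rho in stores}) for x in varnames}
```

Note that `alpha_store([{"x": 0, "y": 0}, {"x": 1, "y": 1}], ["x", "y"])` maps both variables to "top": the relational fact x = y is lost.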

Hot Path Selection
A loop path is a sequence of program commands which is repeated in some execution of a program loop, together with a store property which is valid at the entry of each command in the path. A loop path becomes hot when, during the execution, it is repeated at least a fixed number N of times. In a TJITC, hot path selection is performed by a loop path monitor that also records store properties (see, e.g., [15]). Here, hot path selection is not operationally defined; it is instead modeled as an abstraction map over program traces, i.e., program executions. We first define a mapping loop : Trace_P → ℘(Trace_P) that returns all the loop paths in some execution trace of a program P. Formally, a loop path is a substring (i.e., a segment) τ of a trace σ such that: (1) the successor command in σ of the last state in τ exists and coincides with the command (or its complement, when this is the last loop iteration) of the first state in τ; (2) there is no other such command within τ (otherwise the sequence τ would contain multiple iterations); (3) the last state of τ performs a backward jump in the program P. To recognize backward jumps, we consider a topological order ≤ on the control flow graph of the commands in P. Let us remark that a loop path ⟨ρ_i, C_i⟩ ··· ⟨ρ_j, C_j⟩ ∈ loop(⟨ρ_0, C_0⟩ ··· ⟨ρ_n, C_n⟩) may contain some sub-loop path, namely it may happen that loop(⟨ρ_i, C_i⟩ ··· ⟨ρ_j, C_j⟩) ≠ ∅, so that some commands C_k, with k ∈ [i, j], occur more than once in ⟨ρ_i, C_i⟩ ··· ⟨ρ_j, C_j⟩. We abuse notation by using α_store to denote a map α_store : Trace_P → Trace♯_P which "abstracts" program traces in Trace♯_P by abstracting the store component of each state. Given a static parameter N > 0, we define a function hot_N which returns the set of Store♯-abstracted loop paths appearing at least N times in some program trace.
To count the number of times a loop path appears within a trace we use an auxiliary function count : Trace♯_P × Trace♯_P → N such that count(σ♯, τ♯) yields the number of times the abstract path τ♯ occurs in the abstract trace σ♯. Hence, hot_N can be defined as follows: hot_N(σ) ≜ {⟨a_i, C_i⟩ ··· ⟨a_j, C_j⟩ | ⟨ρ_i, C_i⟩ ··· ⟨ρ_j, C_j⟩ ∈ loop(σ), i ≤ j, α_store(⟨ρ_i, C_i⟩ ··· ⟨ρ_j, C_j⟩) = ⟨a_i, C_i⟩ ··· ⟨a_j, C_j⟩, count(α_store(σ), ⟨a_i, C_i⟩ ··· ⟨a_j, C_j⟩) ≥ N}.
Finally, an abstraction map α^N_hot : ℘(Trace_P) → ℘(Trace♯_P) collects the results of applying hot_N to a set of traces: α^N_hot(T) ≜ ∪{hot_N(σ) | σ ∈ T}. A hot path hp ∈ α^N_hot(T⟦P⟧) is also called an N-hot path and is compactly denoted as hp = ⟨a_0, C_0, ..., a_n, C_n⟩. Let us observe that if the hot path is the body of some while loop then its first command C_0 is a conditional, namely C_0 is the Boolean guard of the while loop. We define the following successor function for indices in hot paths: next ≜ λi. (i = n ? 0 : i + 1). For an N-hot path ⟨a_0, C_0, ..., a_n, C_n⟩ ∈ α^N_hot(T⟦P⟧) and any i ∈ [0, n], if C_i is a conditional command L_i : B_i → L_next(i), then throughout the paper its complement C^c_i = cmpl(C_i) will also be denoted by L_i : ¬B_i → L^c_next(i).
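The counting side of hot path selection can be sketched as follows, on traces simplified to sequences of command labels, ignoring the store abstraction and the backward-jump condition (so this is a loose illustration of `loop`, `count` and hot_N, not a faithful implementation).

```python
def count(sigma, tau):
    # Number of contiguous occurrences of the segment tau inside sigma.
    n, k = len(sigma), len(tau)
    return sum(1 for i in range(n - k + 1) if sigma[i:i + k] == tau)

def hot_paths(sigma, N):
    # Candidate loop paths: segments sigma[i:j] whose successor command
    # repeats their first command, with no other occurrence of that
    # first command strictly inside the segment (single iteration).
    cands = set()
    for i in range(len(sigma)):
        for j in range(i + 1, len(sigma)):
            if sigma[j] == sigma[i] and sigma[i] not in sigma[i + 1:j]:
                cands.add(tuple(sigma[i:j]))
    # A candidate becomes hot when it occurs at least N times.
    return {t for t in cands if count(sigma, list(t)) >= N}
```

On the label trace L1 L2 L1 L2 L1 L3, the segment (L1, L2) occurs twice and is therefore 2-hot, but not 3-hot.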
Example 5.1. Let us consider the program P in Example 2.1. We consider a trivial one-point store abstraction Store♯ = {⊤}, where all the stores are abstracted to the same abstract store ⊤, i.e., α_store = λS.⊤. Here, P has two 2-hot paths.

Trace Extraction
For any abstract store a ∈ Store♯, a corresponding Boolean expression guard E_a ∈ BExp is defined (where the notation E_a should hint at an expression which is induced by the abstract store a), whose semantics is as follows: for any ρ ∈ Store, B⟦guard E_a⟧ρ = true if ρ ∈ γ_store(a), and B⟦guard E_a⟧ρ = false otherwise. Thus, in turn, we also have program actions guard E_a such that A⟦guard E_a⟧ρ = ρ if ρ ∈ γ_store(a), and is undefined otherwise. Let P be a program and hp = ⟨a_0, C_0, ..., a_n, C_n⟩ ∈ α^N_hot(T⟦P⟧) be a hot path on some store abstraction Store♯. We define a syntactic transform of P where the hot path hp is explicitly extracted from P. This is implemented by a suitable relabeling of each command C_i in hp, which is in turn preceded by the conditional guard E_{a_i} induced by the store property a_i. To this aim, we consider three injective relabeling functions whose ranges L1, L2 and L are pairwise disjoint sets of fresh labels, so that labels(P) ∩ (L1 ∪ L2 ∪ L) = ∅. The transformed program extr_hp(P) for the hot path hp is defined as follows, and a graphical example of this transform is depicted in Fig. 2.
Definition 6.1 (Trace extraction transform). The trace extraction transform of P for the hot path hp adds to P the stitch of hp into P, namely the relabeled and guarded copy of hp described above. The new command L_0 : guard E_{a_0} → l_0 is the entry conditional of the stitched hot path stitch_P(hp), while any command C ∈ stitch_P(hp) such that suc(C) ∈ labels(P) ∪ L is a potential exit (or bail-out) command of stitch_P(hp). Lemma 6.2. If P is well-formed then, for any hot path hp, extr_hp(P) is well-formed.
Let us remark that the stitch of the hot path hp into P is always a linear sequence of distinct commands, namely, stitch_P(hp) contains neither loops nor join points. Furthermore, this holds even if the hot path hp contains some inner sub-loop. Technically, this is a consequence of the fact that the above relabeling functions are required to be injective. Hence, even if some command C occurs more than once inside hp, these multiple occurrences of C in hp are transformed into differently labeled commands in stitch_P(hp). In the setting of Example 5.1, for any store ρ ∈ Store, B⟦guard E_⊤⟧ρ = true, since the one-point abstraction carries no information. The trace extraction transform of P w.r.t. hp is therefore as follows:
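The stitching transform can be sketched on a toy representation (the tuple encoding, the fresh-label scheme g0…gn/l0…ln, and the explicit bail-out field are our own simplifications): each hot path command is relabeled with a fresh label and preceded by a guard on its recorded store property; a failing guard bails out to the command's original label, and the last command jumps back to the entry guard.

```python
def stitch(hot_path):
    # hot_path: list of (orig_label, action, abstract_store) triples.
    n = len(hot_path)
    fresh = lambda i, kind: f"{kind}{i % n}"     # fresh labels g0..g(n-1), l0..l(n-1)
    out = []
    for i, (L, act, a) in enumerate(hot_path):
        # Guard gi: if the store property a holds, continue on the hot
        # path at li; otherwise bail out to the original command at L.
        out.append((fresh(i, "g"), f"guard {a}", fresh(i, "l"), L))
        # Relabeled command li, jumping to the next guard (cyclically,
        # so the last command jumps back to the entry guard g0).
        out.append((fresh(i, "l"), act, fresh(i + 1, "g"), None))
    return out
```

The result is a linear, join-free sequence of freshly labeled commands, matching the remark above: even repeated hot path commands get distinct labels.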

Correctness
As advocated by Cousot and Cousot [12, par. 3.8], correctness of dynamic program transformations and optimizations should be defined with respect to some observational abstraction of the program trace semantics: a program transform is correct when, at some level of abstraction, the observation of the execution of the subject program is equivalent to the observation of the execution of the transformed program. The approach by Guo and Palsberg [17] basically relies on a notion of correctness that requires the same store changes in both the transformed/optimized program and the original program. This can be easily encoded by an observational abstraction α_sc : ℘(Trace_P) → ℘(Store*) of trace semantics that observes store changes in execution traces of a program P, mapping each trace to the sequence of its stores with consecutive repetitions collapsed. Since α_sc obviously preserves arbitrary set unions, it admits a right adjoint γ_sc : ℘(Store*) → ℘(Trace_P) defined as γ_sc(S) ≜ ∪{T ∈ ℘(Trace_P) | α_sc(T) ⊆ S}, which gives rise to a GC (α_sc, ⟨℘(Trace_P), ⊆⟩, ⟨℘(Store*), ⊆⟩, γ_sc).
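The store changes abstraction can be sketched as follows, assuming traces are represented as sequences of (store, command) pairs with stores as dicts (our encoding): α_sc keeps the sequence of stores with consecutive repetitions collapsed, so traces that differ only in store-preserving control steps are identified.

```python
def alpha_sc_trace(trace):
    # trace: sequence of (store, command) pairs; returns the sequence
    # of stores where consecutive duplicates are collapsed, i.e., only
    # the store *changes* along the trace are observed.
    changes = []
    for store, _cmd in trace:
        if not changes or changes[-1] != store:
            changes.append(dict(store))
    return changes
```

Two traces performing the same store changes through different commands are thus indistinguishable under this abstraction.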
However, the store changes abstraction α_sc may be too strong in practice. This condition can thus be relaxed and generalized to an observational abstraction that requires the same stores (possibly just for some subset of variables) only at some specific program points. These program points may depend, for example, on the language. In a language without output primitives and functions, like the one considered in [17], we could be interested just in the final store of the program (when it terminates), or in the entry and exit stores of any loop containing an extracted hot path. If a more general language includes a sort of primitive "put X" that "outputs" the values of the program variables ranging in some set X, then we may want stores with the same values for the variables in X at each output point. Moreover, the same sequence of outputs should be preserved, i.e., optimizations must not modify the order of output instructions.
We therefore consider an additional sort of action: put X ∈ A, where X ⊆ Var is a set of program variables. The semantics of put X obviously does not affect program stores, i.e., A⟦put X⟧ρ ≜ ρ. Correspondingly, an observational abstraction α_o : ℘(Trace_P) → ℘(Store*) of trace semantics observes program stores at output program points only (we use ρ|_X to denote the restriction of the store ρ to the variables in X). Similarly to α_sc, here again we have a GC (α_o, ⟨℘(Trace_P), ⊆⟩, ⟨℘(Store*), ⊆⟩, γ_o). This approach is clearly more general because the store changes abstraction α_sc is more precise than α_o, i.e., for any set of traces T, γ_sc(α_sc(T)) ⊆ γ_o(α_o(T)) or, equivalently, α_sc(T1) = α_sc(T2) ⇒ α_o(T1) = α_o(T2). As an example, consider a hot path hp containing a store that is dead within the loop, and perform dead store elimination by optimizing hp to hp′ = ⟨x ≤ 0, x := x+1, z := 1⟩. As observed by Guo and Palsberg [17, Section 4.3], this is clearly unsound in bisimulation-based correctness because this hot path optimization does not output bisimilar code. By contrast, this optimization can be made sound in our framework by choosing an observational abstraction that records store changes at the beginning and at the exit of loops containing extracted hot paths.

Correctness Proof
It turns out that observational correctness of the hot path extraction transform can be proved w.r.t. the more precise observational abstraction α_sc, namely α_sc(T⟦extr_hp(P)⟧) = α_sc(T⟦P⟧) (Theorem 7.2).
In the rest of this section we outline a proof sketch of this result. Let us fix a hot path hp = ⟨a_0, C_0, ..., a_n, C_n⟩ ∈ α^N_hot(T⟦P⟧) and let P_hp ≜ extr_hp(P). The proof relies on a mapping of traces of the program P into corresponding traces of P_hp that unfolds the hot path hp (or any of its initial fragments) according to the hot path extraction strategy given by Definition 6.1.
We define two functions tr^in_hp, tr^out_hp : Trace_P → Trace_{P_hp} in Fig. 3. The first function, tr^out_hp(sσ), begins to unfold in P_hp the hot path hp when: (i) s = ⟨ρ, C_0⟩, where C_0 is the first command of hp; and (ii) the condition guard E_{a_0} is satisfied in the store ρ. If this unfolding for the trace sσ is actually started by applying tr^out_hp(sσ), then it is carried on by applying tr^in_hp(σ), i.e., in in-modality. The second function application, tr^in_hp(sσ), carries on the unfolding of hp in P_hp when: (i) s = ⟨ρ, C_i⟩, where i ∈ [1, n−1], namely the command C_i in hp is different from C_0 and C_n; and (ii) the condition guard E_{a_i} holds for the store ρ. If this is not the case then tr^in_hp(⟨ρ, C_i⟩σ), after a suitable unfolding step for ⟨ρ, C_i⟩, jumps back to the out-modality by proceeding with tr^out_hp(σ). It turns out that these two functions are well defined and that tr^out_hp does not alter store change sequences (Lemma 7.3).
(A) tr^out_hp(T⟦P⟧) ⊆ T⟦P_hp⟧: this shows that for any execution trace σ of P, tr^out_hp(σ) is an execution trace of P_hp; this is not hard to prove.
(B) α_sc(T⟦P_hp⟧) ⊆ α_sc(T⟦P⟧): this is proved through the following statement: σ ∈ T⟦P_hp⟧ ∖ tr^out_hp(T⟦P⟧) ⇒ F⟦P_hp⟧{σ} ⊆ tr^out_hp(T⟦P⟧). The proof relies on the fact that one such trace σ is necessarily of the following shape: σ = σ′⟨ρ, C⟩, where σ′ ∈ tr^out_hp(T⟦P⟧) and act(C) ∈ {guard E_{a_i}, ¬guard E_{a_i}}; then, it is not hard to prove that F⟦P_hp⟧{σ′⟨ρ, C⟩} ⊆ tr^out_hp(T⟦P⟧). In words, one such trace σ of P_hp can be extended through an execution step in P_hp to a trace in tr^out_hp(T⟦P⟧).
We therefore obtain α_sc(T⟦P_hp⟧) = α_sc(T⟦P⟧), and this closes the proof.

Type Specialization
One key optimization for dynamic languages like JavaScript and PHP is type specialization, that is, using type-specific primitives in place of generic untyped operations whose runtime execution can be very costly. As a paradigmatic example, a generic addition operation could be defined on more than one type, so that the execution environment must check the types of its operands and execute a different operation depending on these types: this is the case for the addition operation in JavaScript (see its semantics in the ECMA-262 standard [21, Section 11.6]) and for the semantics of + in our language as given in Section 2.2. Of course, type specialization avoids the overhead of dynamic type checking and dispatch of generic untyped operations. When a type is associated with each variable before the execution of a command in some hot path, this type environment can be used to replace generic operations with type-specific primitives.

Type Abstraction
Let us recall that the set of type names is Types = {Int, String, Undef, Any, ∅}.
Type names can therefore be viewed as the finite lattice ⟨Types, ⊆⟩, where ∅ is the bottom element, Any is the top element, and Int, String and Undef are pairwise incomparable. The abstraction map α_type : ℘(Value) → Types gives the best type of a set of values; given a value v ∈ Value, α_type({v}) thus coincides with type(v).
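The type lattice and the induced abstraction map can be rendered directly; in this sketch of ours we write EMPTY for the bottom type ∅, and `leq`, `join` and `alpha_type` are our names.

```python
EMPTY, INT, STRING, UNDEF, ANY = "EMPTY", "Int", "String", "Undef", "Any"

def leq(s, t):
    # The order of the type lattice: EMPTY below everything, ANY above
    # everything, and the three base types pairwise incomparable.
    return s == EMPTY or t == ANY or s == t

def join(s, t):
    # Least upper bound of two type names.
    if leq(s, t):
        return t
    if leq(t, s):
        return s
    return ANY          # two distinct base types only meet at the top

def alpha_type(values):
    # Best type of a set of values: join of their individual types.
    def ty(v):
        return INT if isinstance(v, int) else STRING if isinstance(v, str) else UNDEF
    t = EMPTY
    for v in values:
        t = join(t, ty(v))
    return t
```

For instance, a set of integers abstracts to Int, while a mixed set of integers and strings abstracts to the top type Any.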
Here, the concretization function γ_type : Types → ℘(Value) is simply the identity map on Int, String and Undef (with γ_type(Any) = Value and γ_type(∅) = ∅). Following the general approach described in Section 4.1, we consider a simple nonrelational store abstraction for types:

Store_t ≜ ⟨Var → Types, ⊆̇⟩, where ⊆̇ is the usual pointwise lifting of the ordering ⊆ for Types, so that λx.∅ and λx.Any are, respectively, the bottom and top abstract stores in Store_t. The abstraction and concretization maps α_store : ℘(Store) → Store_t and γ_store : Store_t → ℘(Store) are defined as a straight instantiation of the definitions in Section 4.1.
The abstract type semantics E_t⟦·⟧ : Exp → Store_t → Types of expressions is defined as the best correct approximation of the corresponding concrete semantics E⟦·⟧ on the type abstractions Store_t and Types, i.e., E_t⟦E⟧ρ_t ≜ α_type(E⟦E⟧γ_store(ρ_t)). This definition leads, for instance, to the equality E_t⟦E1 + E2⟧ρ_t = Int whenever E_t⟦E1⟧ρ_t = E_t⟦E2⟧ρ_t = Int, and E_t⟦E1 + E2⟧ρ_t = String whenever both operands have type String. According to Section 6, for any abstract type store [x_i/T_i | x_i ∈ Var] we consider a corresponding Boolean guard action guard x_0 : T_0 ··· x_n : T_n ∈ BExp whose program action has the following semantics, automatically induced (as defined in Section 6) by the Galois connection (α_store, ℘(Store), Store_t, γ_store): for any ρ ∈ Store, A⟦guard x_0 : T_0 ··· x_n : T_n⟧ρ = ρ if ∀i. type(ρ(x_i)) ⊆ T_i, and is undefined (⊥) if ∃i. type(ρ(x_i)) ⊄ T_i.
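The abstract addition induced on the type lattice can be tabulated as below. This is a sketch consistent with the concrete semantics of + in Section 2.2 (integer addition on Int/Int, concatenation on String/String, and, as we assume here, the error value undef on mixed operands); the function name and the EMPTY encoding of ∅ are ours.

```python
def abstract_add(t1, t2):
    # Best approximation of generic + on the type lattice.
    if t1 == "EMPTY" or t2 == "EMPTY":
        return "EMPTY"                   # no possible operand value
    if (t1, t2) == ("Int", "Int"):
        return "Int"                     # can specialize to +Int
    if (t1, t2) == ("String", "String"):
        return "String"                  # can specialize to +String
    if "Any" in (t1, t2):
        return "Any"                     # any result is possible
    return "Undef"                       # mixed/undef operands: error
```

The two exact cases (Int/Int and String/String) are precisely the ones that license type specialization of the addition in the next section.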

Type Specialization of Hot Paths
Let us consider some hot path hp = ⟨ρ^t_0, C_0, ..., ρ^t_n, C_n⟩ ∈ α^N_hot(T⟦P⟧) on the type abstraction ⟨Store_t, ⊆̇⟩, where each ρ^t_i is therefore a type map. The trace extraction transform extr_hp(P) of P for hp gives rise to the set stitch_P(hp) of commands that stitches the hot path hp into P. Hence, for any i ∈ [0, n], stitch_P(hp) contains a typed guard that we simply denote by guard ρ^t_i. Typed guards allow us to define type specialization of commands in the stitched hot path: this is a program transform that instantiates the most specific typed addition operations in place of generic untyped additions by exploiting the type information dynamically recorded by the typed guards in stitch_P(hp).
Note that if C ∈ stitch_P(hp) and act(C) ≡ x := E_1 + E_2 then C ≡ l_i : x := E_1 + E_2 → L, for some i ∈ [0, n], where L ∈ {l_{i+1}, L_0}. Let C^t denote the set of commands that permits the type-specific additions +_Int and +_String and, in turn, let Program^t denote the set of possible type-specialized programs over C^t. The function ts_hp : stitch_P(hp) → C^t is defined as follows: Hence, hot path type specialization TS is defined by: The correctness of this program transform is quite straightforward. Let Trace^t be the set of traces for type-specialized programs in Program^t and let tt : Trace^t → Trace be defined as follows: Theorem 8.1 (Correctness of type specialization). For any typed hp ∈ α^N_hot(T⟦P⟧), we have that tt(T⟦TS(stitch_P(hp))⟧) = T⟦stitch_P(hp)⟧.
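To make the specialization step concrete, here is a hedged Python sketch of what ts_hp does for additions; the tuple encoding of commands and the operation names `add_int`/`add_string` are our own illustrative assumptions, not the paper's syntax:

```python
# Sketch of ts_hp on additions: using the type information carried by the
# typed guard preceding a command, a generic addition x := E1 + E2 is
# rewritten into a type-specific addition (+Int or +String) when the guard
# determines a unique type for both operands.

def specialize_add(cmd, guard_types):
    """cmd = (label, ('add', x, e1, e2), next_label); operands e1, e2 are
    variable names whose types are read off the preceding typed guard."""
    label, (op, x, e1, e2), nxt = cmd
    if op != "add":
        return cmd
    t1, t2 = guard_types.get(e1), guard_types.get(e2)
    if t1 == t2 == {"Int"}:
        return (label, ("add_int", x, e1, e2), nxt)
    if t1 == t2 == {"String"}:
        return (label, ("add_string", x, e1, e2), nxt)
    return cmd  # no unique type: keep the generic untyped addition

c = specialize_add((3, ("add", "k", "k", "i"), 4), {"k": {"Int"}, "i": {"Int"}})
```

Note how the transform is purely local: it never needs to re-analyze the program, since the typed guards already carry the required store information.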
Typed trace extraction extr^t_hp(P) consists in extracting and simultaneously type specializing a typed hot path hp in a program P, i.e., it can be defined as follows: extr^t_hp(P) ≜ (extr_hp(P) ∖ stitch_P(hp)) ∪ TS(stitch_P(hp)). Correctness of typed trace extraction extr^t_hp is a straight consequence of Theorems 7.2 and 8.1.
Corollary 8.2 (Correctness of typed trace extraction). For any typed hp ∈ α^N_hot(T⟦P⟧), we have that α_sc(T⟦extr^t_hp(P)⟧) = α_sc(T⟦P⟧). Example 8.3. Let us consider the following sieve of Eratosthenes in a JavaScript-like language (taken from the running example in [15]), where primes is initialized to an array of 100 true values. With a slight abuse, we assume that our language is extended with Boolean values and arrays. The semantics of array loads and stores is as usual: first the index expression is checked to be in bounds, then the value is read from or stored into the array. If the index is out of bounds, we assume the program is aborted.
This program is encoded in our language as follows: Let us consider the type environment ρ^t defined as follows, where primes[n]/Bool is a shorthand for primes[0]/Bool, ..., primes[99]/Bool. Then the first traced 2-hot path on the type abstraction Store^t is: As a consequence, the typed trace extraction of hp_1 yields:
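Since the displayed encoding is not reproduced here, the following Python rendition of the sieve may help fix intuitions. It assumes, as stated above, that primes starts as an array of 100 true values; the inner while loop is the one from which the hot path is traced:

```python
# An illustrative rendition of the sieve of Eratosthenes from the example:
# primes starts as an array of 100 true values, and the inner loop that
# clears the multiples of each prime is the traced hot loop.

def sieve(n=100):
    primes = [True] * n
    for i in range(2, n):
        if primes[i]:
            k = i + i
            while k < n:            # traced inner loop: k := k + i
                primes[k] = False   # array store, index checked in bounds
                k += i
    return [i for i in range(2, n) if primes[i]]
```

In this loop, the typed guards record that i and k are always Int and that every primes[n] is Bool, which is exactly the type environment ρ^t used to specialize the additions.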

A General Correctness Criterion
Abstract interpretation allows us to view the type specialization of Section 8 as just one particular correct hot path optimization, which can be easily generalized. Guarded hot paths are a key feature of our tracing compilation model, where guards are dynamically recorded by the hot path monitor and range over abstract values in some store abstraction. An abstract guard for a command C in some hot path hp thus encodes a store property which is modeled in some abstract domain of stores and is guaranteed to hold at the entry of C. This store information encapsulated by abstract guards can then be used to transform and optimize hp, i.e., all the commands in the stitched hot path stitch_P(hp).
This provides a modular approach to proving the correctness of a hot path optimization O. In fact, since correctness has to be proved w.r.t. some observational abstraction α_o of the trace semantics, and Theorem 7.2 ensures that this correctness holds for the store changes abstraction α_sc of the unoptimized trace extraction transform, we just need to prove the correctness of the optimization O on the whole stitched hot path stitch_P(hp), which thus includes the abstract guards of the hot path hp. Hence, fixing a program P, a hot path optimization O is modeled as a program transform whose target set of programs may permit new expressions and/or actions, as in the case of the type-specific addition operations of type specialization. O is required to be correct according to the following definition.
As an example, it would be quite simple to formalize the variable folding optimization of hot paths considered by Guo and Palsberg [17] and to prove it correct in our framework w.r.t. the store changes abstraction α_sc.

Nested Hot Paths
Once a first hot path hp_1 has been extracted by transforming P to P_1 ≜ extr_hp1(P), it may well happen that a new hot path hp_2 in P_1 contains hp_1 as a nested sub-path. Following TraceMonkey's trace recording strategy [15], we attempt to nest an inner hot path inside the current trace: during trace recording, an inner hot path is called like a subroutine; it executes its loop to successful completion and then returns to the trace recorder, which may therefore register the inner hot path as part of a new hot path.
To this aim, let us reshape the definitions in Section 5. Let P be the original program and P′ be some hot path transform of P, so that P′ ∖ P contains all the commands (guards included) in the hot path. We define a function hotcut : Trace_{P′} → (State_P)* that cuts from a trace in P′ all the states whose commands appear in (some previous) hot path hp, except the entry and exit states of hp. In turn, we define outerhot_N : Trace_{P′} → ℘((State_{P′})*) as follows:

outerhot_N(σ) ≜ {⟨a_i, C_i⟩ ⋯ ⟨a_j, C_j⟩ | α_store(⟨ρ_i, C_i⟩ ⋯ ⟨ρ_j, C_j⟩) = ⟨a_i, C_i⟩ ⋯ ⟨a_j, C_j⟩, count(α_store(hotcut(σ)), ⟨a_i, C_i⟩ ⋯ ⟨a_j, C_j⟩) ≥ N}.
Clearly, when P′ = P we have that hotcut = λσ.σ, so that outerhot_N = hot_N. Finally, we define the collecting version α^N_outerhot ≜ λT. ∪_{σ∈T} outerhot_N(σ), so that hp_2 = ⟨·, H_0⟩, ⟨·, H^c_5⟩, ⟨·, C_4⟩ ∈ α^2_outerhot(T⟦P_1⟧). Hence, hp_2 contains a nested hot path, which is called at the beginning of hp_2 and whose entry and exit commands are, respectively, H_0 and H^c_5. Let hp = ⟨a_0, C_0⟩, ..., ⟨a_n, C_n⟩ ∈ α^N_outerhot(T⟦P′⟧) be an N-hot path in P′, where, for all i ∈ [0, n], C_i ≡ L_i : A_i → L_{next(i)}. Let us note that:
- If C_i ∈ P for all i ∈ [0, n], then hp actually is a hot path in P, i.e., hp ∈ α^N_hot(T⟦P⟧).
- Otherwise, there exists some C_k ∉ P. If C_i ∈ P and C_{i+1} ∉ P then C_{i+1} is the entry command of some inner hot path; on the other hand, if C_i ∉ P and C_{i+1} ∈ P then C_i is the exit command of some inner hot path.
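The cutting-and-counting mechanism can be illustrated with a small Python sketch; this is an informal approximation of hotcut and of the N-hot counting, with states encoded as (abstract store, command) pairs, all of which are our own simplifying assumptions:

```python
# Sketch of hot path selection over a cut trace: hotcut drops the states
# of a previously extracted hot path, keeping its entry and exit states,
# and a candidate path is N-hot when it occurs at least N times.

def hotcut(trace, inner_cmds, entry, exit_):
    """Keep only states whose command is outside the previous hot path,
    together with the hot path's entry and exit states."""
    return [s for s in trace
            if s[1] not in inner_cmds or s[1] in (entry, exit_)]

def count(trace, path):
    """Occurrences of the contiguous state sequence `path` inside `trace`."""
    k = len(path)
    return sum(trace[i:i + k] == path for i in range(len(trace) - k + 1))

def outerhot(trace, path, N):
    return count(trace, path) >= N
```

For instance, on a cut trace where the pair of commands C1, C2 repeats twice, the path C1 C2 is 2-hot.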
The transform of P for extracting hp is a generalization of Definition 6.1.
Definition 10.2 (Nested trace extraction transform). The nested trace extraction transform of P for the hot path hp is: Let us observe that:
- Clauses (1)-(6) are the same as the clauses of Definition 6.1, with the additional constraints that C_i and cmpl(C_i) are all commands in P, conditions which are trivially satisfied in Definition 6.1.
- Clause (7), where C_i ∈ P and C_{i+1} ∉ P, namely next(C_i) is the call program point of a nested hot path nhp and C_{i+1} is the entry command of nhp, performs a relabeling that allows us to correctly nest nhp in hp.
- Clauses (8)-(9), where C_i ∉ P and C_{i+1} ∈ P, i.e., C_i is the exit command of a nested hot path nhp that returns to the program point lbl(C_{i+1}), perform the relabeling of suc(C_i) in C_i in order to return from nhp to hp.
We have thus obtained the same three trace extraction steps as described by Gal et al. [15, Section 2]. In particular, in P_1 we specialized the typed addition operation k := k +_Int i, in P_2 we specialized k := i +_Int i and i := i +_Int 1, while in P_3 we specialized once again i := i +_Int 1 in a different hot path. Thus, in P_3 all the addition operations have been type specialized.

Comparison with Guo and Palsberg's Framework
A formal model for tracing JIT compilation has been put forward in POPL 2011 by Guo and Palsberg [17]. Its main distinctive feature is the use of bisimulation [23] to describe operational equivalence between source and optimized programs. In this section we show how this model can be expressed within our framework.

Language and Semantics
Guo and Palsberg [17] employ a simple imperative language with while loops and a so-called bail construct.

Language Compilation
Programs in Stm can be easily compiled into Program by resorting to an injective labeling function ℓ : Stm → L that assigns different labels to different statements. Correctness for the compilation function C means that for any S ∈ Stm: (i) C(S) ∈ Program and (ii) the program traces of S and C(S) have the same store sequences. If st : Trace_GP ∪ Trace → Store* returns the store sequence of a trace, i.e., st(ε) ≜ ε and st(⟨ρ, S⟩σ) ≜ ρ · st(σ), and, for a set X of traces, α_st(X) ≜ {st(σ) | σ ∈ X}, then correctness goes as follows: Theorem 11.3 (Correctness of language compilation). For any S ∈ Stm, C(S) ∈ Program and α_st(T_GP⟦S⟧) = α_st(T⟦C(S)⟧).
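The store-sequence abstraction used in this correctness statement is straightforward to render in Python; traces are modeled here as lists of (store, command) pairs, which is an illustration rather than the formal definition:

```python
# Sketch of the store-sequence abstraction: st extracts the sequence of
# stores from a trace, and alpha_st lifts it to sets of traces, so that
# compilation correctness asserts that the two abstractions coincide.

def st(trace):
    """Project a trace of (store, command) states onto its store sequence."""
    return tuple(rho for (rho, _cmd) in trace)

def alpha_st(traces):
    """Collecting lift of st to a set of traces."""
    return {st(sigma) for sigma in traces}
```

Two programs are then equivalent for this observation exactly when their trace semantics have the same image under alpha_st.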

Hot Paths and Trace Extraction
In Guo and Palsberg's model [17]: (i) hot paths always begin with an entry while-loop conditional, which is however not included in the hot path; (ii) the store of a hot path is recorded at the end of the first loop iteration and is a concrete store; and (iii) hot paths actually are 1-hot paths according to our definition. Guo and Palsberg's hot loops can be modeled in our framework by relying on a loop selection map loop_GP : Trace → ℘(C* × Store) defined as follows:

loop_GP(⟨ρ_0, C_0⟩ ⋯ ⟨ρ_n, C_n⟩) ≜ {⟨C_i C_{i+1} ⋯ C_j, ρ_{j+1}⟩ | 0 ≤ i ≤ j < n, lbl(C_{j+1}) = lbl(C_i), ∀k ∈ (i, j]. C_k ∉ {C_i, cmpl(C_i)}}.
Notice that, for simplicity, the above definition includes the entry loop conditional in the hot path. The map α^GP_hot : ℘(Trace) → ℘(C* × Store) then lifts loop_GP to sets of traces: α^GP_hot(T) ≜ ∪_{σ∈T} loop_GP(σ).
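A Python sketch of the loop selection map may clarify the definition; traces are lists of (store, command) pairs and the labeling function lbl is passed explicitly, both simplifying assumptions of ours:

```python
# Sketch of loop_GP: scan a trace for segments C_i ... C_j such that the
# command following C_j jumps back to the label of C_i and C_i's label is
# not revisited in between; the store recorded with the loop is the one
# reached at the end of the first iteration (rho_{j+1}).

def loop_gp(trace, lbl):
    """trace: list of (store, command) pairs; lbl: command -> label."""
    loops = []
    n = len(trace)
    for i in range(n - 1):
        ci = trace[i][1]
        for j in range(i, n - 1):
            if j > i and lbl(trace[j][1]) == lbl(ci):
                break  # C_i's label reoccurs before a back edge: give up
            if lbl(trace[j + 1][1]) == lbl(ci):
                cmds = [c for (_, c) in trace[i:j + 1]]
                loops.append((cmds, trace[j + 1][0]))  # pair with rho_{j+1}
                break
    return loops
```

On a trace that iterates a two-command loop, this returns the loop body paired with the store observed after the first iteration, matching the GP convention that the recorded store is concrete.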
Let us thus consider a hot path hp = ⟨C_0 C_1 ⋯ C_n, ρ⟩ ∈ α^GP_hot(T⟦P⟧), for some P ∈ Program (where P may coincide with a compiled C(S) for some S ∈ Stm), and let us follow the same notation used in Section 6. Guo and Palsberg's [17] trace extraction scheme is defined as follows, where the hot path hp cannot be reentered once execution leaves hp.
Definition 11.6 (GP trace extraction transform). The GP trace extraction transform of P for the hot path hp is: Clearly, extr^GP_hp(P) remains a well-formed program. The correctness of this GP trace extraction transform, which is stated and proved in [17, Lemma 3.6], goes as follows.
Theorem 11.7 (Correctness of GP trace extraction). For any P ∈ Program and hp = ⟨C_0 ⋯ C_n, ρ⟩ ∈ α^GP_hot(T⟦P⟧), we have that α^ρ_stc(T⟦extr^GP_hp(P)⟧) = α^ρ_stc(T⟦P⟧). Example 11.8. Let us consider the program P in Example 2.1 and the GP-hot path hp = ⟨C_1 C_2 C^c_3, [x/1]⟩ ∈ α^GP_hot(T⟦P⟧). A corresponding 2-hot path hp_1 with the same sequence of commands has been selected in Example 5.1 and extracted in Example 6.3. Here, the GP trace extraction of hp provides the following program transform:

Further Work
We have put forward a formal model of tracing compilation and of the correctness of hot path optimizations, based on program trace semantics and abstract interpretation. We see a number of interesting avenues for further work on this topic. We aim at using this framework to study and relate the foundational differences between traditional static compilation and dynamic tracing compilation. We then expect to formalize and prove the correctness of the most beneficial optimizations employed by tracing compilers of practical dynamic languages like JavaScript, PHP and Python. For example, we plan to cast in our model the allocation removal optimization for Python described in [5], in order to formally prove its correctness. Finally, we plan to adapt our framework to provide a model of whole-method just-in-time compilation, as used, e.g., by IonMonkey [14], the JIT compilation scheme currently used in the Firefox JavaScript engine.
Acknowledgements. This work was partially supported by a Microsoft Research Software Engineering Innovation Foundation 2013 Award (SEIF 2013) and by the University of Padova under the project BECOM.