The Cost of Skeletal Call-By-Need, Smoothly
Abstract
Skeletal call-by-need is an optimization of call-by-need evaluation also known as “fully lazy sharing”: when the duplication of a value has to take place, the value is first split into a “skeleton”, which is then duplicated, and “flesh”, which is instead kept shared.
Here, we provide two cost analyses of skeletal call-by-need. Firstly, we provide a family of terms showing that skeletal call-by-need can be asymptotically exponentially faster than call-by-need in both time and space; it is the first such evidence, to our knowledge.
Secondly, we prove that skeletal call-by-need can be implemented efficiently, that is, with bi-linear overhead. This result is obtained by providing a new smooth presentation of ideas by Shivers and Wand for the reconstruction of skeletons, which is then smoothly plugged into the study of an abstract machine following the distillation technique by Accattoli et al.
Keywords and phrases:
λ-calculus, abstract machines, call-by-need, cost models
Funding:
Claudio Sacerdoti Coen: Supported by the INdAM-GNCS project “Modelli Composizionali per l’Analisi di Sistemi Reversibili Distribuiti” and by the Cost Action CA20111 EuroProofNet.
Copyright and License:
2012 ACM Subject Classification:
Theory of computation → Lambda calculus
Supplementary Material:
Software (Source Code): https://github.com/Franciman/fullylazymad, archived at swh:1:dir:50ff71d3e33d6febc5f53b8a6887aac6ef0be90f
Editors:
Maribel Fernández
Series and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Call-by-need evaluation of the λ-calculus was introduced by Wadsworth in 1971 as an optimization of the untyped λ-calculus [41]. It combines the advantages of both call-by-name and call-by-value: it avoids diverging on unused arguments, as in call-by-name, and when arguments are evaluated, they are evaluated only once, as in call-by-value. Such a combination is trickier to specify than call-by-name or call-by-value. In particular, Wadsworth’s original presentation via graph reduction was not easy to manage.
In the 1990s, call-by-need was adopted by the then-new Haskell language, and more manageable term-based calculi were developed by Launchbury [35] and Ariola et al. [21], and extended with non-determinism by Kutzner and Schmidt-Schauß [34]. Additionally, Sestoft studied abstract machines for call-by-need [38]. In 2014, Accattoli et al. introduced the CbNeed linear substitution calculus (CbNeed LSC) [4], a simplification of Ariola et al.’s calculus resting on the at a distance approach to explicit substitutions by Accattoli and Kesner [14]. It became a reference presentation, used in many recent studies [31, 24, 5, 33, 26, 13, 32, 25, 17, 16]. As far as this paper is concerned, the study in [4] is particularly relevant: beyond introducing the CbNeed LSC that we shall use, it also relates it with abstract machines for CbNeed, building a neat bisimulation between the two, dubbed distillation.
Skeletal Call-by-Need.
CbNeed was introduced together with a further optimization usually called fully lazy sharing and here rather referred to as Skeletal CbNeed (because fully lazy sharing is not really descriptive). The idea is to refine the duplication of a value so as to duplicate only its skeleton while keeping its flesh shared, thus ending up sharing more than in ordinary call-by-need. As an example, if then the skeleton of is and its flesh is the context . That is, the flesh collects and removes the maximal (non-variable) sub-terms of that are free, i.e. such that none of their free variables is captured in .
Skeletal CbNeed is less studied than CbNeed because it is even trickier to define, as it requires computing the skeletal decomposition of a value (that is, the split into skeleton and flesh). Early works by Turner [40], Hughes [30], and Peyton Jones [36] focus on implementative aspects. In the last ten years or so, various works have provided operational insights. Balabonski developed an impressive theoretical analysis of Skeletal CbNeed, showing the equivalence of various presentations [22] and studying the relationship with Lévy’s optimality [23]. Soon after that, Gundersen et al. showed that the atomic λ-calculus, a λ-calculus with sharing issued from deep inference technology, naturally specifies the duplication of the skeleton [29]. Next, Kesner et al. [32] merged the insights of the CbNeed LSC and the atomic λ-calculus, obtaining a Skeletal CbNeed LSC, that is, a presentation at a distance of Skeletal CbNeed.
This Paper.
The present work extends the operational research line of Balabonski, Gundersen et al., and Kesner et al. with cost analyses of Kesner et al.’s Skeletal CbNeed LSC, aiming at closing the gap with some of the earlier implementative studies at the same time. Generally speaking, we aim at adding Skeletal CbNeed to the list of concepts covered by the theory of cost-based analyses of abstract machines and sharing mechanisms by Accattoli and co-authors [4, 10, 19, 5, 7, 27, 12, 11, 8, 15]. We provide two analyses.
First Analysis: Exponential Time and Space Speed-Ups.
The first analysis shows that in some cases Skeletal CbNeed is considerably more efficient than CbNeed. For that, we exhibit a family of λ-terms such that evaluates in time and space in CbNeed, while requiring only space and time in Skeletal CbNeed. The literature only shows examples of single λ-terms where Skeletal CbNeed takes a few steps fewer than CbNeed, but never proves any asymptotic speed-up nor deals with space. Lastly, proving that our family evaluates within the given bounds requires a fine, non-trivial analysis of evaluation sequences.
Second Analysis: Bi-Linear Overhead.
The second analysis shows that Skeletal CbNeed can be implemented efficiently, with an overhead that is bi-linear, i.e. linear in both the number of β-steps and the size of the initial term. This time, the same is true for CbNeed. The insight is that computing skeletons does require additional implementative effort, but it comes at no asymptotic price. This work was indeed triggered by the aim of showing that the operational reconstruction of the skeleton at work in Gundersen et al. [29] and Kesner et al. [32] can be implemented in linear time. Their specifications rest on side conditions about variables in sub-terms that lead to a non-linear overhead, if implemented literally. In fact, the linear-time reconstruction of skeletons by Shivers and Wand [39], from 2010, avoids the side conditions and pre-dates the recent works. The insightful study in [39], however, is technical and not well-known. Here we simplify it, recasting it in the context of abstract machines.
Efficient Skeletal Decompositions, or Shivers and Wand Reloaded.
Shivers and Wand present their ideas using graph reduction for λ-terms. They reconstruct the skeletal decomposition via a visit of the value to be skeletonized, seen as a graph, based on two key points:
1. Bi-directional edges: all the edges of their graphs can be traversed in both directions. When term graphs are seen as mathematical objects, edges are often directed, but the theoretical study can also look at edges in reverse. At the implementative level, however, things are different. For a parsimonious use of space, most graph-based implementations (e.g. the proof-nets-inspired ones in [5, 8, 20]) indeed restrict some edges to be traversed in one direction only (namely, parent to children), thus sparing some pointers. Therefore, bi-directional edges are an unusual approach, usually considered redundant.
2. Abstractions, occurrences, and upward reconstruction: in most graphical representations, there are edges between abstractions and the occurrences of their bound variables, usually directed only from the occurrences to the abstraction. Shivers and Wand’s insight is that, with bi-directional edges, one can move from the abstraction back to the occurrences of the variable and then reconstruct the skeleton by visiting upward from the occurrences.
Shivers and Wand’s study is graph-based and low-level, interleaved with code snippets. Here, we provide a new neat presentation of their algorithm as a simple high-level rewriting system over terms, circumventing graphs and low-level details. The reformulation of their study is the main technical innovation of this paper. While the key concepts are due to Shivers and Wand, we believe that our new presentation is a notable contribution in that it allows their concepts to blossom, simplifying their study and making them more widely accessible.
Distillation.
We then smoothly lift the distillation-based study of CbNeed by Accattoli et al. [4] to Skeletal CbNeed, providing an abstract machine that uses the reloaded algorithm for skeletal decompositions as a black box. The new abstract machine is then shown to be implementable within a bi-linear overhead, completing the second analysis.
The overall insight is that the apparently linear space inefficiency of adding pointers for bi-directional traversals of terms/graphs enables optimizations – namely, skeletal duplications – that can bring, in some cases, exponential speed-ups for both time and space.
Implementation.
We provide an OCaml implementation of the abstract machine, and in particular of the reloaded skeleton reconstruction algorithm it rests upon. The implementation is there to validate the complexity analyses, as well as to integrate the low-level aspect of Shivers and Wand’s study. The implementation is discussed in Appendix A.
Proofs.
A long version with proofs in the Appendix is on arXiv [18].
2 Background: Call-by-Need
In this section, we recall CbNeed, presented as a strategy of the CbNeed linear substitution calculus (shortened to LSC). For the sake of simplicity, we do not distinguish here between the strategy and the calculus, and use CbNeed LSC to refer to both.
Accattoli and Kesner’s LSC [1, 6] is a micro-step λ-calculus with explicit substitutions. Micro-step means that substitutions act on one variable occurrence at a time; in contrast, small-step substitutions act on all occurrences at the same time. The CbNeed LSC was introduced by Accattoli et al. [4] as a simplified variant of Ariola et al.’s presentation of CbNeed [21]. The simplification is the use of rewriting rules at a distance – that is, the adoption of contexts in the root rules – which allows one to get rid of the commuting rewriting rules of Ariola et al.
Fig. 1: root rules.
Terms and Contexts.
The set of terms of the λ-calculus is noted . The CbNeed LSC is defined in Fig. 1. The set of terms of the CbNeed LSC is noted . They add explicit substitutions (shortened to ESs) to λ-terms, that is, a more compact notation for , but where the order of evaluation between and is a priori not fixed; evaluation contexts shall fix it. The set of free variables is defined as expected; in particular, . Both and bind in , and terms are identified up to α-renaming. A term is closed if , open otherwise. Meta-level capture-avoiding substitution is noted . The size of is the number of its constructs.
Note that values are only abstractions; this choice is standard in the literature on CbNeed. Note, moreover, that their bodies are λ-terms without ESs. This is in order to avoid having to calculate the skeleton of ESs in the next section. It is a harmless choice because evaluation (which creates ESs) never enters inside abstractions.
Contexts are terms with exactly one occurrence of the hole , an additional constant, standing for a removed sub-term. We shall heavily use two notions of contexts: substitution contexts , that are lists of ESs, and evaluation contexts .
The main operation on contexts is plugging , where the hole in context is replaced by . Plugging, as usual with contexts, can capture variables; for instance . We write when we want to stress that the context does not capture the free variables of .
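As a concrete (and entirely unofficial) reference for the grammars above, the following OCaml sketch represents LSC terms with ESs, substitution contexts as lists of ESs (outermost first), and the plugging and size operations; the representation and all names (`plug`, `size`) are ours, not the paper's artifact:

```ocaml
(* Terms of the CbNeed LSC: variables, abstractions, applications, and
   explicit substitutions t[x <- u]. Variables are plain strings; a real
   implementation would rather use unique identifiers or de Bruijn indices. *)
type term =
  | Var of string
  | Lam of string * term            (* \x. t *)
  | App of term * term
  | ES  of term * string * term     (* t[x <- u] *)

(* A substitution context S = [x1<-u1]...[xn<-un], outermost ES first. *)
type sub_ctx = (string * term) list

(* Plugging S<t>: wrap t in the listed ESs, innermost applied first. *)
let plug (s : sub_ctx) (t : term) : term =
  List.fold_right (fun (x, u) acc -> ES (acc, x, u)) s t

(* The size |t| is the number of constructs of t. *)
let rec size = function
  | Var _ -> 1
  | Lam (_, t) -> 1 + size t
  | App (t, u) -> 1 + size t + size u
  | ES (t, _, u) -> 1 + size t + size u
```

Note that plugging a substitution context, unlike general plugging, cannot capture variables of the plugged term's free variables unless the bound names overlap; the sketch does not check for capture.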
Rewriting Rules.
The reduction rules of the CbNeed LSC are slightly unusual as they use contexts both to allow one to reduce redexes located in sub-terms (via evaluation contexts ), which is standard, and to define the redexes themselves (via both substitution and evaluation contexts ), which is less standard. This approach is called at a distance and is related to cut-elimination on proof nets; see Accattoli [2, 3] for the link with proof nets. The notion of evaluation context used for CbNeed is exactly the one by Ariola et al. [21], which extends call-by-name evaluation contexts with the production , which – by entering ESs – is what enables the memoization aspect of CbNeed.
The distant beta rule is essentially the β-rule, except that the argument goes into a new ES, rather than being immediately substituted, and that there can be a substitution context in between the abstraction and the argument. Example: . One with on-the-fly α-renaming is .
The linear substitution rule by need replaces a single variable occurrence by a copy of the value in the ES, commuting the substitution context around the value (if any) out of the ES. Example: .
The use of substitution contexts in the root rules is what allows one to avoid the two commuting rules of Ariola et al. [21], namely (rephrasing their -expressions as ESs) and .
The two root rules and are then closed by evaluation contexts . In Fig. 1, we use the compact notation to denote that is defined as if , and similarly for the other rule.
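To illustrate how substitution contexts make the root rules act "at a distance", here is a hedged OCaml sketch, entirely our own, of the distant beta root rule S⟨λx.t⟩ u → S⟨t[x←u]⟩ on a first-order term representation; the helper names (`unwrap`, `rewrap`, `db_root`) are ours, and capture-avoidance is ignored for simplicity:

```ocaml
type term =
  | Var of string
  | Lam of string * term
  | App of term * term
  | ES  of term * string * term    (* t[x <- u] *)

(* Peel off the substitution context S around a term: S<t> is returned
   as the pair of t and the list of its ESs, outermost first. *)
let rec unwrap = function
  | ES (t, x, u) -> let t', s = unwrap t in (t', (x, u) :: s)
  | t -> (t, [])

(* Re-plug a substitution context around a term (inverse of unwrap). *)
let rewrap s t = List.fold_right (fun (x, u) acc -> ES (acc, x, u)) s t

(* Distant beta at the root: S<\x.t> u  ->dB  S<t[x <- u]>.
   Returns None when the rule does not apply at the root. *)
let db_root = function
  | App (v, u) ->
      (match unwrap v with
       | Lam (x, t), s -> Some (rewrap s (ES (t, x, u)))
       | _ -> None)
  | _ -> None
```

The point of the sketch is that the abstraction is found by skipping an arbitrary stack of ESs, which is exactly what removes the need for Ariola et al.'s commuting rules.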
3 Skeletal Call-by-Need
In this section, we present our version of Skeletal CbNeed, which is a very minor variant of the one by Kesner et al. [32] (the difference is discussed at the end of the section), itself defined by tweaking the CbNeed LSC via the notion of skeleton.
The basic idea of skeletal CbNeed is that every value can be split into two parts: the skeleton , which is a sort of sub-term of , and the flesh , which is a substitution context collecting the maximal sub-terms of that do not depend on . The flesh is extracted from before duplicating it, so as to avoid duplicating the code (and possibly the redexes) of the flesh, thus increasing sharing. Some definitions are in order.
Free Sub-Terms.
Extractable sub-terms are called free sub-terms, defined next. We actually need a parametrized notion of free sub-term, which is relative to a set of variables that are supposed not to occur in the sub-term.
Definition 1 (Maximally free sub-terms).
Let and be a set of variables. A sub-term of is (the term part of) a decomposition for some context ; additionally, is -free in if no variable in is captured by and if , and maximally -free if it is not a strict sub-term of any other -free sub-term in .
Skeleton.
The skeletal decomposition of a term is the splitting of into its maximal free sub-terms on one side, collected as a substitution context called the flesh, and what is left of after the removal, that is, its skeleton. As for free sub-terms, we rather define parametrized notions of skeletal decomposition and skeleton. Then, we specialize the definitions for values.
Definition 2 ((Relative) Skeleton).
Let be a set of variables. The skeletal decomposition of relative to a set of variables is the pair of an ordinary -term and a substitution context defined as with fresh, if and is not a variable, otherwise:
The skeleton, the flesh, and the skeletal decomposition of are defined respectively as , , and , where .
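To make Definition 2 concrete, here is a small OCaml sketch of our own (not the paper's artifact; the names `decompose` and `skeletonize` and the fresh-name scheme `_z1, _z2, …` are ours) computing the skeletal decomposition of a pure λ-term relative to a set of variables, keeping single variable occurrences in the skeleton as per our convention:

```ocaml
module StrSet = Set.Make (String)

type term =
  | Var of string
  | Lam of string * term
  | App of term * term

let rec free_vars = function
  | Var x -> StrSet.singleton x
  | Lam (x, t) -> StrSet.remove x (free_vars t)
  | App (t, u) -> StrSet.union (free_vars t) (free_vars u)

(* Fresh-name supply for the variables naming extracted flesh. *)
let counter = ref 0
let fresh () = incr counter; Printf.sprintf "_z%d" !counter

(* Returns the skeleton plus the flesh as a list of pairs (z, sub-term).
   A non-variable sub-term whose free variables avoid vs is maximally
   free: it is replaced by a fresh variable and moved to the flesh. *)
let rec decompose vs t =
  match t with
  | Var _ -> (t, [])                       (* variables stay in place *)
  | _ when StrSet.is_empty (StrSet.inter (free_vars t) vs) ->
      let z = fresh () in (Var z, [ (z, t) ])
  | Lam (x, body) ->
      let body', flesh = decompose (StrSet.add x vs) body in
      (Lam (x, body'), flesh)
  | App (t1, t2) ->
      let t1', f1 = decompose vs t1 in
      let t2', f2 = decompose vs t2 in
      (App (t1', t2'), f1 @ f2)

(* Skeletal decomposition of a value \x.t: decompose relative to {x}. *)
let skeletonize = function
  | Lam (x, body) ->
      let body', flesh = decompose (StrSet.singleton x) body in
      (Lam (x, body'), flesh)
  | t -> (t, [])
```

For example, on λx. x (y y) the sketch keeps x (and its occurrence) in the skeleton and moves the free sub-term y y to the flesh. Recomputing `free_vars` at every node makes this quadratic; the efficient algorithm of Section 5 is precisely what avoids such recomputations.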
Fig. 2: root rules.
Skeletal CbNeed.
We adopt the presentation of skeletal CbNeed by Kesner et al. [32] (referred to as fully lazy CbNeed by them), which adds a new skeletal (explicit) substitution containing skeletal values, that is, values that are equal to their skeleton. The language of terms and the strategy are defined in Fig. 2. The new set of terms is noted .
Notation: when we need to treat explicit and skeletal substitutions together, we use the notation , which stands for either or (with a slight abuse) .
The CbNeed substitution rule for a redex is refined via two rules. The first skeletonization rule does three further mini-tasks at once: firstly, it decomposes into its skeleton and its flesh (thus obtaining ); secondly, it re-organizes the term, commuting and out of the ES (obtaining ); thirdly, it turns the ES into a skeletal substitution, finally producing .
The second skeletal substitution rule simply replaces with .
Note that evaluation contexts are adapted to include skeletal substitutions, but that they do not enter inside them, since their content is always a value. The distant rule for Skeletal CbNeed is noted to distinguish it from the one for CbNeed. The Skeletal CbNeed reduction is the union of , , and .
The next proposition is an easy adaptation of the one for CbNeed in Accattoli et al. [4].
Proposition 3.
The reduction is deterministic.
Structural Equivalence.
To study the relationship with abstract machines, we shall need the concept of structural equivalence, defined in Fig. 3. The concept is standard for λ-calculi with ESs at a distance, and it is also standard to use it in relation with abstract machines, as systematically done by Accattoli et al. [4]. Intuitively, structural equivalence allows one to move explicit/skeletal substitutions around, because the move does not change the behaviour of the term. This fact is captured by the following strong bisimulation property, proved by an immediate adaptation of the similar property in [4].
Fig. 3: root structural equalities and closure rules (ref, sym, tr, ev).
Proposition 4 ( is a strong bisimulation).
Let . If and then there exists such that and .
Difference with Kesner et al.’s Skeletal CbNeed [32].
There is a small difference between our definition of Skeletal CbNeed and Kesner et al.’s, namely in the definition of skeletons. For us, the skeletal decomposition of, say, is , while for them it is ; that is, they also flesh out single variable occurrences while we do not, thus obtaining a slightly more parsimonious decomposition.
This difference, however, has essentially no impact on the theory, nor on the asymptotic costs. In particular, the operational equivalence of Skeletal CbNeed with respect to CbNeed and CbN is preserved. The proof of Kesner et al., indeed, works for all decompositions such that substituting the flesh into the skeleton recovers the original value; our decomposition has this property. Therefore, we can import the following result.
Theorem 5 (Operational equivalence [32]).
Two λ-terms and are CbN observationally equivalent if and only if they are Skeletal CbNeed observationally equivalent.
4 Skeletal CbNeed Can Bring an Exponential Asymptotic Speed-Up
In this section, we show that Skeletal CbNeed can be both exponentially faster and use exponentially less space than CbNeed, taking as reference cost models (following the terminology of Accattoli et al. [15]):
- Abstract time, that is, the number of (distant) β-steps;
- Ink space, that is, the maximum size of the involved terms.
By smoothly adapting results in the literature [37, 9, 28], these can be proved to be reasonable cost models for time for CbNeed and for (super-)linear space for both CbNeed and Skeletal CbNeed. In fact, the relevant part of proving that abstract time is reasonable for Skeletal CbNeed shall be our own Theorem 31 at the end of the paper.
We prove the result by defining a family of terms and showing that evaluates in an exponential number of steps and producing terms of exponential size in CbNeed, while it uses only a linear number of steps and terms of linear size in Skeletal CbNeed.
The Family.
We shall use the abbreviations and . Next, we define an auxiliary family as follows: and . Lastly, the actual family is defined as .
Our result is that the term evaluates in -steps (namely steps) with skeletal CbNeed and -steps (namely steps) with CbNeed. This is shown via quite technical statements. Firstly, note that ; it is actually for that we shall prove bounds. These bounds are expressed stating that there are families of evaluation contexts and (one for CbNeed and one for Skeletal CbNeed) such that reduces to and (which is a normal form in both cases) in and steps respectively. The results about space are obtained by showing that has size exponential in , while is linear in .
Evaluation in Skeletal CbNeed.
The skeletal case is, perhaps surprisingly, the easier one: it admits an inductive invariant leading to the bound.
Lemma 6.
Let and . Then there is a substitution context and an evaluation such that and .
Evaluation in CbNeed.
For the non-skeletal case, the bound is formulated via two families of substitution contexts. For , the first family is given by and with . The second family is defined in the proof of the lemma, itself relying on an auxiliary lemma in the technical report [18]. The statement is parametrized by an evaluation context for the induction to go through, but the relevant case is the one with empty.
Lemma 7.
Let and be an evaluation context. Then there exist a substitution context and an evaluation such that and .
The next statement sums up the results, and is based on the result at the end of the paper stating that Skeletal CbNeed can be implemented in bi-linear time, that is, linear in the size of the initial term and in the number of steps (Theorem 31).
Theorem 8 (Instance of exponential skeletal speed-up for both time and space).
There is a family of -terms such that normalizes in abstract time and ink space with Skeletal CbNeed and abstract time and ink space with CbNeed.
Proof.
- For Skeletal CbNeed, by Lemma 6 there is an evaluation sequence:
taking -steps and such that . Note that the calculus has no garbage collection rule, so the size of terms diminishes only by steps, and by a constant amount (a step removes two constructs and adds one). Therefore, the maximum size of terms is bounded by the size of the final term plus the number of steps. That is, the space cost is .
5 A Rewriting-Based Algorithm for Computing the Skeleton
In this section, we first define marked terms and a marked reformulation of skeletal decompositions. Then, we use the marked setting to give a graphically-inspired rewriting system computing the skeletal decomposition of a value, giving a neat new presentation of ideas first developed by Shivers and Wand [39]. The rewriting system formalizes an algorithm; we present it as a rewriting system so as to smoothly manage it as an operational semantics notion.
Marked Skeleton.
Marked terms and contexts are defined in Fig. 4. The idea is to incrementally mark with a tag the constructs belonging to the skeleton during a visit of the term that is allowed to jump from an abstraction to the occurrences of its bound variable, since these are usually connected in a graphical representation.
Constructs of the calculus are then either the usual ones (referred to as unmarked) or marked with , which means that the constructor is part of the skeleton. Additionally, there is an extra up construct that shall be used to indicate the frontier of the visit of the term.
In the marked setting, we can recast skeletal decompositions by marking as white the constructs belonging to the skeleton and leaving unmarked all the constructs of the flesh.
Definition 9 (Marked skeleton).
Let be a set of variables. The marked skeleton of relative to is defined as if , otherwise:
The marked skeleton of is defined as .
Example: for , one has .
From the Marked Skeleton to the Skeletal Decomposition.
Next, we turn marked skeletons into skeletal decompositions, by extracting unmarked sub-terms and turning them into the flesh, while removing the marks from the skeleton. We shall do this via a splitting function, defined below, which operates on (parametrized) marked skeletons. For that, we give a lemma guaranteeing that the unmarked sub-terms of marked skeletons can be recognized from their top constructor, that is, in O(1), thus removing the need to inspect them.
Lemma 10 (Being unmarked is downward closed in marked skeletons).
Let with starting unmarked. Then has no marks.
The lemma implies the correctness of the first clause of the following definition, as it ensures that in that case has no marks.
Definition 11 (Splitting).
Let for some and . The following splitting function separates the skeleton from the flesh and removes all marks:
Example: .
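As an illustration of the splitting function of Definition 11, here is an OCaml sketch of our own (constructor names with a `W` prefix play the role of the white marks; the `T`-prefixed type and the fresh-name scheme are ours). Thanks to Lemma 10, an unmarked sub-term can be moved whole to the flesh without inspecting its interior:

```ocaml
(* Marked terms: plain constructors are unmarked, W-constructors are
   marked white, i.e. they belong to the skeleton. *)
type mterm =
  | Var of string | Lam of string * mterm | App of mterm * mterm
  | WVar of string | WLam of string * mterm | WApp of mterm * mterm

(* Ordinary (unmarked) terms, the output of splitting. *)
type term =
  | TVar of string | TLam of string * term | TApp of term * term

let counter = ref 0
let fresh () = incr counter; Printf.sprintf "_z%d" !counter

(* By Lemma 10, unmarked sub-terms of marked skeletons contain no marks,
   so they can be converted by a plain recursive copy. *)
let rec strip = function
  | Var x | WVar x -> TVar x
  | Lam (x, t) | WLam (x, t) -> TLam (x, strip t)
  | App (t, u) | WApp (t, u) -> TApp (strip t, strip u)

(* Splitting: white constructs stay in the skeleton; an unmarked
   non-variable sub-term is replaced by a fresh variable and extracted. *)
let rec split = function
  | WVar x | Var x -> (TVar x, [])
  | WLam (x, t) -> let t', f = split t in (TLam (x, t'), f)
  | WApp (t, u) ->
      let t', f = split t and u', g = split u in (TApp (t', u'), f @ g)
  | (Lam _ | App _) as t ->
      let z = fresh () in (TVar z, [ (z, strip t) ])
```

The sketch makes the cost analysis of Section 5.1 visible: `split` recurses only through white constructs, and each extracted flesh entry is copied once by `strip`.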
We can now prove that the marked notions allow one to retrieve the standard ones.
Proposition 12 (Soundness of the marked skeleton).
Let and be a set of variables.
-
1.
Auxiliary parametrized statement: ;
-
2.
Soundness: .
Computing the Marked Skeleton.
We shall now define a parallel algorithm for computing the marked skeleton via a rewriting system on marked terms. The definition is in Fig. 5. The algorithm shall be applied to abstractions, say to . The algorithm is described by inserting the up symbols , denoting where the algorithm is operating, and propagating them through the term. At the beginning, there is exactly one , as the algorithm starts on .
The algorithm has two sets of rules. Propagation rules turn the encountered unmarked constructs into marked ones. The interesting rule is the one for abstractions, which is also the very first one to be applied on the initial term . Its effect is that is marked as and all the occurrences of in are also marked, obtaining ; this is how we recast the role played by Shivers and Wand’s bi-directional edges connecting abstractions and variables. Essentially, propagation rules move upwards, but they also add at the bottom of the syntax tree, on the variable occurrences associated with marked abstractions.
Absorption rules, instead, remove the symbol when it encounters a marked construct: this means that the concerned thread of the algorithm is passing where another thread has already passed, so that thread has become redundant and can be terminated.
Intuitively, the set of -sub-terms is the frontier of where the parallel algorithm is operating. When the algorithm is over, no internal node is marked with .
The algorithm is parallel in that the initial substitution and rule might introduce many occurrences of , which can be propagated independently.
Fig. 5: initialization and contextual closures.
As an example, consider , for which one has, for instance:
Proposition 13.
Reduction is strongly normalizing and diamond (thus confluent).
Proof.
1. Strong normalization. Let be a marked term. As terminating measure, consider where and are the numbers of unmarked and constructs in . Note that the propagation rules decrease while the absorption rules decrease and leave unchanged. Then is strongly normalizing.
2. Diamond. We prove the diamond property, which implies confluence. Clearly, redexes cannot duplicate or erase each other. There are only two critical pairs, namely:
To state the correctness of the algorithm and give a bound on the number of its steps, we need two definitions.
Definition 14.
Let be a marked term.
- Starts with /unmarked/: starts with (resp. unmarked, resp. with ) if it has shape , , or (resp. , , or , resp. ).
- White size: the white size of is the number of constructs in .
The following lemma is the key step in the proof of correctness of the algorithm. It is a technical lemma, the statement of which is more easily understood by looking at the proof of the following theorem.
Lemma 15 (Crucial).
Let be a term, a set of variables, , , and . Then:
Theorem 16 (Soundness of the algorithm).
Let be a term. Then with .
Proof.
By definition , while . We consider two cases.
- Case . Then and , so . Thus, . Moreover, , since the only white construct of is the root abstraction.
5.1 Complexity Analysis
We proved that the number of rewriting steps of the algorithm for computing the marked skeleton is exactly its white size (Theorem 16), which is the size of the skeleton (Prop. 12). The complexity of the algorithm is indeed linear, but this requires some discussion, since steps are not constant-time operations. Moreover, we need to take into account the cost of splitting.
Complexity Analysis 1: Propagation + Absorption.
For computing from , we assume that abstractions have pointers to the occurrences of their bound variable, so that the substitution in does not require going through the whole of . This is Shivers and Wand’s key idea, and it is also how our implementation works (see Appendix A). A global argument gives the cost: the algorithm turns unmarked constructs into -constructs exactly once, and it generates at most -constructs, which are never duplicated and are absorbed exactly once; thus it globally requires time.
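The following OCaml fragment is our own low-level sketch of this idea, not the paper's implementation: nodes carry a mutable mark, a parent pointer, and, for abstraction nodes, the list of the occurrence nodes of their bound variable; marking proceeds upward from occurrences and is absorbed at already-marked nodes:

```ocaml
(* A node of the term seen as a graph with bi-directional edges:
   the parent pointer is the "reverse" direction, and occurrences are
   the abstraction-to-variable back edges of Shivers and Wand. *)
type node = {
  mutable marked : bool;
  mutable parent : node option;
  mutable occurrences : node list;  (* non-empty only on abstractions *)
}

let fresh_node () = { marked = false; parent = None; occurrences = [] }

(* Mark n; if n is an abstraction, fire its occurrences (propagation);
   then continue upward. A marked node stops the walk (absorption), so
   every node is marked at most once and total work is linear in the
   number of marked nodes. *)
let rec mark_up (n : node) : unit =
  if not n.marked then begin
    n.marked <- true;
    List.iter mark_up n.occurrences;
    match n.parent with
    | Some p -> mark_up p
    | None -> ()
  end
```

Running `mark_up` on the root abstraction node marks exactly the nodes on the paths from the occurrences up to the root, i.e. the skeleton, leaving free sub-terms (which contain no such occurrence) unmarked.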
Lemma 17 (Cost of propagation + absorption).
Computing from requires time .
Complexity Analysis 2: Splitting.
For the cost of splitting, Lemma 10 guarantees that the unmarked sub-terms of marked skeletons can be recognized in O(1) by only inspecting their topmost constructor, so that implementing the splitting function takes time proportional to the number of marked constructors only.
Lemma 18 (Cost of splitting).
Computing from requires time .
The next theorem sums it all up, recasting the obtained costs in the unmarked setting via the soundness of the algorithm (Prop. 12).
Theorem 19 (The algorithm is linear in the skeleton).
Let be a value. Then the skeletal decomposition of can be computed in .
6 Preliminaries about Abstract Machines
In this section, we introduce terminology and basic concepts about abstract machines, that shall be the topic of the following sections.
Abstract Machines Glossary.
Abstract machines manipulate pre-terms, that is, terms without implicit -renaming. In this paper, an abstract machine is a quadruple the components of which are as follows.
- States. A state is a quadruple composed of the active term , plus three data structures, namely the chain , the (applicative) stack , and the (global) environment . Terms in states are actually pre-terms.
- Transitions. The pair is a transition system with transitions partitioned into principal transitions, whose union is noted and which are meant to correspond to steps on the calculus, and search transitions, whose union is noted , which take care of searching for (principal) redexes.
- Initialization. The component is the initialization relation associating closed terms without ESs to initial states. It is a relation and not a function because it maps a closed λ-term (considered modulo α) to a state having a pre-term representative of (which is not modulo α) as active term. Intuitively, any two states and such that and are α-equivalent. A state is reachable if it can be reached starting from an initial state, that is, if where for some and , shortened as .
- Read-back. The read-back function turns reachable states into terms (possibly with ESs, that is, or ) and satisfies the initialization constraint: if then .
A state is final if no transitions apply. A run is a possibly empty finite sequence of transitions, the length of which is noted ; note that the first and the last states of a run are not necessarily initial and final. If and are transition labels (that is, and ) then and is the number of transitions in , , and similarly for more than two labels.
For the machines at work in this paper, the pre-terms in initial states shall be well-bound, that is, they have pairwise distinct bound names; for instance is well-bound while is not. We shall also write in a state for a fresh well-bound renaming of , i.e. is α-equivalent to , well-bound, and its bound variables are fresh with respect to those in and in the other components of .
Mechanical Bisimulations.
Machines are usually shown to be correct with respect to a strategy via some form of bisimulation relating terms and machine states. The notion that we adopt here is dubbed mechanical bisimulation, and it should not be confused with the strong bisimulation property of structural equivalence (which actually plays a role in mechanical bisimulations). The definition, tuned towards complexity analyses, requires a perfect match between the steps of the evaluation sequence and the principal transitions of the machine run. Terminology: a structural strategy is a strategy plus a relation that is a strong bisimulation with respect to (see Prop. 4).
Definition 20 (Mechanical bisimulation).
A machine and a structural strategy on -terms are mechanically bisimilar when, given an initial state :
-
1.
Runs to evaluations: for any run there exists an evaluation ;
-
2.
Evaluations to runs: for every evaluation there exists a run such that ;
-
3.
Principal matching: for every principal transition of label of , in both previous points the number of -transitions in is exactly the number of -steps in the evaluation , i.e. .
The proof that a machine and a strategy are in a mechanical bisimulation follows from some basic properties, grouped under the notion of distillery, following Accattoli et al. [4].
Definition 21 (Distillery).
A machine and a structural strategy are a distillery if the following conditions hold:
-
1.
Principal projection: implies for every principal transition of label ;
-
2.
Search transparency: implies ;
-
3.
Search transitions terminate: terminates;
-
4.
Determinism: is deterministic;
-
5.
Halt: final states decode to -normal terms.
Theorem 22 (Sufficient condition for implementations).
Let a machine and a structural strategy be a distillery. Then, they are mechanically bisimilar.
7 The MAD and the Skeletal MAD
In this section, we study an abstract machine for skeletal CbNeed, obtained as a minor modification of a known machine for CbNeed, which is recalled here. Since the two machines share most data structures and transitions, they are presented together in Fig. 6 (slightly abusing notation: the two machines define environments differently, so the notation of the common transitions is overloaded).
The MAD.
The Milner Abstract machine by neeD (MAD) of Fig. 6 implements the CbNeed strategy . It is a CbNeed variant of the Milner Abstract Machine (MAM), itself a simplification of the Krivine abstract machine (KAM). The MAM is a CbN abstract machine using a single global environment (sometimes called heap), a stack for arguments, and no closures – it is omitted here (the KAM instead uses closures and many local environments). Both the MAM and the MAD are introduced by Accattoli et al. in [4].
The MAD modifies the MAM by adding a mechanism realizing the memoization characteristic of CbNeed. Namely, when execution encounters a variable occurrence associated with an environment entry such that is not a value, then the MAM duplicates while the MAD does not. Instead, part of the current state is saved on the new chain data structure , and the MAD starts evaluating inside the environment entry itself; this is transition . When such an evaluation ends with a value , the machine resumes the state saved on the chain, and the environment is updated replacing with – this is transition – so that future encounters of will not need to re-evaluate . Both the MAM and the MAD rely on -renaming as a black-box operation , at work in transition .
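The memoization mechanism can be illustrated with a small self-contained sketch (ours, not the MAD itself): environment entries are mutable cells that are overwritten with their value after the first evaluation, so that every later lookup reuses the shared result instead of re-evaluating.

```ocaml
(* An environment entry is either an unevaluated thunk or a value.
   We use int as a stand-in for values for simplicity. *)
type entry = Todo of (unit -> int) | Done of int

(* Looking up a variable evaluates its entry at most once:
   the first lookup runs the thunk and overwrites the cell,
   mirroring the update performed by the MAD's chain mechanism. *)
let lookup (env : (string, entry ref) Hashtbl.t) (x : string) : int =
  let cell = Hashtbl.find env x in
  match !cell with
  | Done v -> v                 (* already evaluated: result is shared *)
  | Todo thunk ->
      let v = thunk () in       (* evaluate once... *)
      cell := Done v;           (* ...and update the environment entry *)
      v
```

For example, binding "x" to a thunk that increments a counter and returns 42, two lookups both return 42 but run the thunk only once.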
The Skeletal MAD.
The Skeletal MAD, also in Fig. 6, modifies the MAD exactly as the Skeletal CbNeed LSC modifies the CbNeed LSC, that is, by adding skeletal entries to the global environment and splitting the substitution transition into two: the skeletonizing transition that separates the skeleton from the flesh of the value , and the skeletal substitution transition that substitutes the (-renamed) skeleton .
Notation. For specifying transition , we use the following notation for the flesh: a substitution context , where each can be or , induces a global environment .
The union of all the transition rules of the Skeletal MAD (namely, , , , , , and ) is noted .
The read-back of (Skeletal) MAD states to terms (possibly with ESs) is defined in Fig. 6. The definition uses auxiliary notions of read-back for chains, stacks, and environments that turn them into CbNeed evaluation contexts. For stacks and environments, their read-back is clearly an evaluation context. For chains, it shall be proved as an invariant of the machine.
Invariants.
To prove properties of the Skeletal MAD we need to isolate some of its invariants in Lemma 24 below. The closure one ensures that the closure of the initial term extends, in an appropriate sense, to all reachable states. The well-bound invariant, similarly, ensures that binders in reachable states are pairwise distinct, as in the initial term. To compactly formulate the closure invariant we need the notion of terms of a state.
Definition 23 (Terms of a state).
Let be a Skeletal MAD state where is a possibly empty chain for some . The terms of are , every term in and , and every term in an entry of or , for .
Lemma 24 (Qualitative invariants).
Let be a Skeletal MAD run.
-
1.
Closure: for every term of and for any variable there is an environment entry , , or on the right of in .
-
2.
Well-bound: if occurs in and has any other occurrence in then it is as a free variable of , and for any substitution , , or in the name can occur (in any form) only on the left of that entry in .
-
3.
Contextual chain read-back: , , and are CbNeed evaluation contexts.
Mechanical Bisimulation.
The invariants allow us to prove that the Skeletal MAD and the strategy modulo form a distillery, from which the mechanical bisimulation follows (by Theorem 22). The closure invariant is used for Point 4, the other two for Point 1.
Theorem 25 (Skeletal distillery).
The Skeletal MAD and the structural strategy form a distillery. Namely, let be a reachable Skeletal MAD state.
-
1.
Principal projection:
-
(a)
if then .
-
(b)
if then .
-
(c)
if then .
-
2.
Search transparency: if then .
-
3.
Search terminates: is terminating.
-
4.
Halt: if is final then it has shape and is -normal.
Corollary 26 (Skeletal mechanical bisimulation).
The Skeletal MAD and the structural strategy are in a mechanical bisimulation.
8 Complexity Analysis
In this section, we show that the Skeletal MAD can be concretely implemented within the same complexity as the MAD, that is, with bi-linear overhead.
Sub-Term Property.
As is standard, the complexity analysis crucially relies on the sub-term property, which bounds the size of manipulated (and thus duplicated) terms using the size of the initial term. The proof requires a similar property about skeletons.
Lemma 27 (Skeleton and flesh size).
-
1.
Auxiliary parametrized statement: let , be a set of variables, and . Then and for any ES in ;
-
2.
Main: let . Then and for any ES in .
Lemma 28 (Quantitative sub-term property).
Let be a Skeletal MAD run. Then for any term of .
Next, some basic observations about the transitions, together with the sub-term property, allow us to bound their number using the two key parameters (that is, number of -steps/transitions and size of the initial term).
Proposition 29 (Number of transitions).
Let be a Skeletal MAD run.
-
1.
, and
-
2.
.
Proof.
-
1.
-
(a)
, because every transition consumes a non-value entry from the environment, and such entries are created only by transitions.
-
(b)
, because every transition consumes one entry from the chain; such entries are created only by transitions, which in turn are bounded by transitions.
-
(c)
, because after a transition there can be either a transition, a transition, or no transition (if the transition is the last one of the run ). Therefore, .
-
(d)
, because every transition is followed by a transition.
-
2.
Note that transitions decrease the size of the code, which is increased only by and transitions. By the sub-term property (Lemma 28), the code increase is bounded by the size of the initial term. By Point 1, . Then .
Lastly, we need some assumptions on how the Skeletal MAD can be concretely implemented. Our OCaml implementation (see Appendix A) represents variables as memory locations and variable occurrences as pointers to those locations, obtaining random access to environment entries in . As already mentioned in Sect. 5, all constructors have pointers to their parents in the syntactic tree, and in particular variables have pointers to their occurrences, so as to implement efficiently the computation of the skeleton. Moreover, environments are implemented as doubly linked lists, to split/concatenate them in in transitions and . The cost of transitions and is bounded by the size of the value to skeletonize/copy, which is bounded by the sub-term property.
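The constant-time splitting and concatenation of environments relies only on standard doubly linked list surgery. A minimal sketch, with names of our choosing (the repository's actual code may differ):

```ocaml
(* A node of a doubly linked environment: a declaration with
   mutable links to its neighbours. *)
type 'a node = {
  data : 'a;
  mutable prev : 'a node option;
  mutable next : 'a node option;
}

(* Link [a] directly before [b], in O(1). *)
let link (a : 'a node) (b : 'a node) : unit =
  a.next <- Some b;
  b.prev <- Some a

(* Remove a node from its list in O(1), given direct access to it --
   which the machine has, since variable occurrences are pointers
   to their declarations. *)
let unlink (n : 'a node) : unit =
  (match n.prev with Some p -> p.next <- n.next | None -> ());
  (match n.next with Some s -> s.prev <- n.prev | None -> ());
  n.prev <- None;
  n.next <- None
```

Splicing a whole segment out of (or into) an environment amounts to two such pointer redirections at its endpoints, independently of the segment's length.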
Lemma 30 (Cost of single transitions).
Let be a run. Implementing any transition of costs , except for and transitions, which cost each.
Putting all together, we obtain a bilinear bound.
Theorem 31 (The Skeletal MAD is bi-linear).
Let be a Skeletal MAD run. Then can be implemented on random access machines in .
9 Conclusions
We show that skeletal call-by-need provides, in some cases, exponentially shorter evaluation sequences involving exponentially smaller terms than call-by-need. We also show that the required skeleton reconstruction can be implemented in linear time and space, by giving a new, simpler, and graph-free presentation of ideas by Shivers and Wand. Lastly, we smoothly plug this result into the existing distillation technique for abstract machines, obtaining an implementation of skeletal call-by-need with bi-linear overhead. In particular, the bi-linear overhead allows us to establish that the shrinking of evaluation sequences is a real time speed-up: the smart specification of skeletal call-by-need does not hide an expensive overhead.
References
- [1] Beniamino Accattoli. An abstract factorization theorem for explicit substitutions. In Ashish Tiwari, editor, 23rd International Conference on Rewriting Techniques and Applications (RTA’12) , RTA 2012, May 28 - June 2, 2012, Nagoya, Japan, volume 15 of LIPIcs, pages 6–21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2012. doi:10.4230/LIPICS.RTA.2012.6.
- [2] Beniamino Accattoli. Proof nets and the call-by-value -calculus. Theor. Comput. Sci., 606:2–24, 2015. doi:10.1016/J.TCS.2015.08.006.
- [3] Beniamino Accattoli. Proof nets and the linear substitution calculus. In Bernd Fischer and Tarmo Uustalu, editors, Theoretical Aspects of Computing - ICTAC 2018 - 15th International Colloquium, Stellenbosch, South Africa, October 16-19, 2018, Proceedings, volume 11187 of Lecture Notes in Computer Science, pages 37–61. Springer, 2018. doi:10.1007/978-3-030-02508-3_3.
- [4] Beniamino Accattoli, Pablo Barenbaum, and Damiano Mazza. Distilling abstract machines. In Johan Jeuring and Manuel M. T. Chakravarty, editors, Proceedings of the 19th ACM SIGPLAN international conference on Functional programming, Gothenburg, Sweden, September 1-3, 2014, pages 363–376. ACM, 2014. doi:10.1145/2628136.2628154.
- [5] Beniamino Accattoli and Bruno Barras. Environments and the complexity of abstract machines. In Wim Vanhoof and Brigitte Pientka, editors, Proceedings of the 19th International Symposium on Principles and Practice of Declarative Programming, Namur, Belgium, October 09 - 11, 2017, pages 4–16. ACM, 2017. doi:10.1145/3131851.3131855.
- [6] Beniamino Accattoli, Eduardo Bonelli, Delia Kesner, and Carlos Lombardi. A nonstandard standardization theorem. In Suresh Jagannathan and Peter Sewell, editors, The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’14, San Diego, CA, USA, January 20-21, 2014, pages 659–670. ACM, 2014. doi:10.1145/2535838.2535886.
- [7] Beniamino Accattoli, Andrea Condoluci, Giulio Guerrieri, and Claudio Sacerdoti Coen. Crumbling abstract machines. In Ekaterina Komendantskaya, editor, Proceedings of the 21st International Symposium on Principles and Practice of Programming Languages, PPDP 2019, Porto, Portugal, October 7-9, 2019, pages 4:1–4:15. ACM, 2019. doi:10.1145/3354166.3354169.
- [8] Beniamino Accattoli, Andrea Condoluci, and Claudio Sacerdoti Coen. Strong call-by-value is reasonable, implosively. In 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2021, Rome, Italy, June 29 - July 2, 2021, pages 1–14. IEEE, 2021. doi:10.1109/LICS52264.2021.9470630.
- [9] Beniamino Accattoli and Ugo Dal Lago. On the invariance of the unitary cost model for head reduction. In Ashish Tiwari, editor, 23rd International Conference on Rewriting Techniques and Applications (RTA’12) , RTA 2012, May 28 - June 2, 2012, Nagoya, Japan, volume 15 of LIPIcs, pages 22–37. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2012. doi:10.4230/LIPICS.RTA.2012.22.
- [10] Beniamino Accattoli and Ugo Dal Lago. (leftmost-outermost) beta reduction is invariant, indeed. Log. Methods Comput. Sci., 12(1), 2016. doi:10.2168/LMCS-12(1:4)2016.
- [11] Beniamino Accattoli, Ugo Dal Lago, and Gabriele Vanoni. The (in)efficiency of interaction. Proc. ACM Program. Lang., 5(POPL):1–33, 2021. doi:10.1145/3434332.
- [12] Beniamino Accattoli and Giulio Guerrieri. Abstract machines for open call-by-value. Sci. Comput. Program., 184, 2019. doi:10.1016/J.SCICO.2019.03.002.
- [13] Beniamino Accattoli, Giulio Guerrieri, and Maico Leberle. Types by need. In Luís Caires, editor, Programming Languages and Systems - 28th European Symposium on Programming, ESOP 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, volume 11423 of Lecture Notes in Computer Science, pages 410–439. Springer, 2019. doi:10.1007/978-3-030-17184-1_15.
- [14] Beniamino Accattoli and Delia Kesner. The structural -calculus. In Anuj Dawar and Helmut Veith, editors, Computer Science Logic, 24th International Workshop , CSL 2010, 19th Annual Conference of the EACSL, Brno, Czech Republic, August 23-27, 2010. Proceedings, volume 6247 of Lecture Notes in Computer Science, pages 381–395. Springer, 2010. doi:10.1007/978-3-642-15205-4_30.
- [15] Beniamino Accattoli, Ugo Dal Lago, and Gabriele Vanoni. Reasonable space for the -calculus, logarithmically. Log. Methods Comput. Sci., 20(4), 2024. doi:10.46298/LMCS-20(4:15)2024.
- [16] Beniamino Accattoli and Adrienne Lancelot. Mirroring call-by-need, or values acting silly. In Jakob Rehof, editor, 9th International Conference on Formal Structures for Computation and Deduction, FSCD 2024, July 10-13, 2024, Tallinn, Estonia, volume 299 of LIPIcs, pages 23:1–23:24. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.FSCD.2024.23.
- [17] Beniamino Accattoli and Maico Leberle. Useful open call-by-need. In Florin Manea and Alex Simpson, editors, 30th EACSL Annual Conference on Computer Science Logic, CSL 2022, February 14-19, 2022, Göttingen, Germany, volume 216 of LIPIcs, pages 4:1–4:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.CSL.2022.4.
- [18] Beniamino Accattoli, Francesco Magliocca, Loïc Peyrot, and Claudio Sacerdoti Coen. The cost of skeletal call-by-need, smoothly, 2025. arXiv:2505.09242.
- [19] Beniamino Accattoli and Claudio Sacerdoti Coen. On the value of variables. Inf. Comput., 255:224–242, 2017. doi:10.1016/J.IC.2017.01.003.
- [20] Beniamino Accattoli and Claudio Sacerdoti Coen. IMELL cut elimination with linear overhead. In Jakob Rehof, editor, 9th International Conference on Formal Structures for Computation and Deduction, FSCD 2024, July 10-13, 2024, Tallinn, Estonia, volume 299 of LIPIcs, pages 24:1–24:24. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.FSCD.2024.24.
- [21] Zena M. Ariola, Matthias Felleisen, John Maraist, Martin Odersky, and Philip Wadler. The call-by-need lambda calculus. In Ron K. Cytron and Peter Lee, editors, Conference Record of POPL’95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, California, USA, January 23-25, 1995, pages 233–246. ACM Press, 1995. doi:10.1145/199448.199507.
- [22] Thibaut Balabonski. A unified approach to fully lazy sharing. In John Field and Michael Hicks, editors, Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2012, Philadelphia, Pennsylvania, USA, January 22-28, 2012, pages 469–480. ACM, 2012. doi:10.1145/2103656.2103713.
- [23] Thibaut Balabonski. Weak optimality, and the meaning of sharing. In Greg Morrisett and Tarmo Uustalu, editors, ACM SIGPLAN International Conference on Functional Programming, ICFP’13, Boston, MA, USA - September 25 - 27, 2013, pages 263–274. ACM, 2013. doi:10.1145/2500365.2500606.
- [24] Thibaut Balabonski, Pablo Barenbaum, Eduardo Bonelli, and Delia Kesner. Foundations of strong call by need. PACMPL, 1(ICFP):20:1–20:29, 2017. doi:10.1145/3110264.
- [25] Thibaut Balabonski, Antoine Lanco, and Guillaume Melquiond. A strong call-by-need calculus. In Naoki Kobayashi, editor, 6th International Conference on Formal Structures for Computation and Deduction, FSCD 2021, July 17-24, 2021, Buenos Aires, Argentina (Virtual Conference), volume 195 of LIPIcs, pages 9:1–9:22. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.FSCD.2021.9.
- [26] Pablo Barenbaum, Eduardo Bonelli, and Kareem Mohamed. Pattern matching and fixed points: Resource types and strong call-by-need: Extended abstract. In David Sabel and Peter Thiemann, editors, Proceedings of the 20th International Symposium on Principles and Practice of Declarative Programming, PPDP 2018, Frankfurt am Main, Germany, September 03-05, 2018, pages 6:1–6:12. ACM, 2018. doi:10.1145/3236950.3236972.
- [27] Andrea Condoluci, Beniamino Accattoli, and Claudio Sacerdoti Coen. Sharing equality is linear. In Ekaterina Komendantskaya, editor, Proceedings of the 21st International Symposium on Principles and Practice of Programming Languages, PPDP 2019, Porto, Portugal, October 7-9, 2019, pages 9:1–9:14. ACM, 2019. doi:10.1145/3354166.3354174.
- [28] Yannick Forster, Fabian Kunze, and Marc Roth. The weak call-by-value -calculus is reasonable for both time and space. Proc. ACM Program. Lang., 4(POPL):27:1–27:23, 2020. doi:10.1145/3371095.
- [29] Tom Gundersen, Willem Heijltjes, and Michel Parigot. Atomic lambda calculus: A typed lambda-calculus with explicit sharing. In 28th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2013, New Orleans, LA, USA, June 25-28, 2013, pages 311–320. IEEE Computer Society, 2013. doi:10.1109/LICS.2013.37.
- [30] John Hughes. The Design and Implementation of Programming Languages. PhD thesis, Programming Research Group, Oxford University, July 1983.
- [31] Delia Kesner. Reasoning about call-by-need by means of types. In Bart Jacobs and Christof Löding, editors, Foundations of Software Science and Computation Structures - 19th International Conference, FOSSACS 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2-8, 2016, Proceedings, volume 9634 of Lecture Notes in Computer Science, pages 424–441. Springer, 2016. doi:10.1007/978-3-662-49630-5_25.
- [32] Delia Kesner, Loïc Peyrot, and Daniel Ventura. Node replication: Theory and practice. Log. Methods Comput. Sci., 20(1), 2024. doi:10.46298/LMCS-20(1:5)2024.
- [33] Delia Kesner, Alejandro Ríos, and Andrés Viso. Call-by-need, neededness and all that. In Christel Baier and Ugo Dal Lago, editors, Foundations of Software Science and Computation Structures - 21st International Conference, FOSSACS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, volume 10803 of Lecture Notes in Computer Science, pages 241–257. Springer, 2018. doi:10.1007/978-3-319-89366-2_13.
- [34] Arne Kutzner and Manfred Schmidt-Schauß. A non-deterministic call-by-need lambda calculus. In Matthias Felleisen, Paul Hudak, and Christian Queinnec, editors, Proceedings of the third ACM SIGPLAN International Conference on Functional Programming (ICFP ’98), Baltimore, USA, September 27-29, 1998, pages 324–335. ACM, 1998. doi:10.1145/289423.289462.
- [35] John Launchbury. A natural semantics for lazy evaluation. In Mary S. Van Deusen and Bernard Lang, editors, Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Charleston, South Carolina, USA, January 1993, pages 144–154. ACM Press, 1993. doi:10.1145/158511.158618.
- [36] Simon L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice-Hall, 1987.
- [37] David Sands, Jörgen Gustavsson, and Andrew Moran. Lambda calculi and linear speedups. In Torben Æ. Mogensen, David A. Schmidt, and Ivan Hal Sudborough, editors, The Essence of Computation, Complexity, Analysis, Transformation. Essays Dedicated to Neil D. Jones [on occasion of his 60th birthday], volume 2566 of Lecture Notes in Computer Science, pages 60–84. Springer, 2002. doi:10.1007/3-540-36377-7_4.
- [38] Peter Sestoft. Deriving a lazy abstract machine. J. Funct. Program., 7(3):231–264, 1997. doi:10.1017/S0956796897002712.
- [39] Olin Shivers and Mitchell Wand. Bottom-up beta-reduction: Uplinks and lambda-dags. Fundam. Informaticae, 103(1-4):247–287, 2010. doi:10.3233/FI-2010-328.
- [40] D. A. Turner. A new implementation technique for applicative languages. Softw. Pract. Exp., 9(1):31–49, 1979. doi:10.1002/SPE.4380090105.
- [41] Christopher P. Wadsworth. Semantics and pragmatics of the lambda-calculus. PhD Thesis, Oxford, 1971.
Appendix A OCaml Implementation
We provide at https://github.com/Franciman/fullylazymad a reference implementation in OCaml of the Skeletal MAD, meant to guarantee the soundness of the assumptions that underlie the complexity analyses. The implementation takes as input a user-provided term and shows the full run, printing all transitions and all intermediate machine states. We highlight here a few details of the implementation.
Data structures.
Pre-terms are represented by the following algebraic data type (ADT):
whose constructors App (for applications), Abs (for abstractions) and Var (for variable occurrences) hold mutable pointers to their subterms, a mutable optional pointer to the parent, and a mutable boolean taken for marking terms during skeleton extraction.
Variable occurrences point to a shared, unique variable declaration/substitution in memory of type var_ref, which holds the variable name, a substitution description field of type sub and two optional pointers (prev and next) to insert the variable in an environment, that is, a doubly linked list of variable declarations. A substitution description is an ADT whose constructors are Sub for the explicit substitution , SubSkel for the skeletal substitutions , NoSub for bound variables, Hole to represent in chain items and Copy (parents,) to point to another variable declaration together with the list of its parents. The latter form is used temporarily to preserve sharing of variable declarations during the implementation of : the first time a variable is to be copied its substitution is made to point to the new copy, and each time an occurrence of the variable is to be copied, the same variable declaration is reused and a new parent is added for it.
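The data structures just described could be sketched as the following mutually recursive OCaml declarations; constructor and field names follow the text, but the details (e.g. the representation of names, or the exact arguments of Copy) are our guesses, not the repository's exact code:

```ocaml
(* Pre-terms: every constructor holds mutable pointers to its subterms,
   an optional parent pointer, and a marking flag for skeleton extraction. *)
type term =
  | App of cell * cell
  | Abs of var_ref * cell
  | Var of var_ref
and cell = {
  mutable node : term;
  mutable parent : cell option;
  mutable taken : bool;          (* marking flag for skeleton extraction *)
}
(* A shared variable declaration: name, substitution description, and
   prev/next links for the doubly linked environment. *)
and var_ref = {
  name : string;
  mutable sub : sub;
  mutable prev : var_ref option;
  mutable next : var_ref option;
}
(* Substitution descriptions. *)
and sub =
  | Sub of cell                  (* explicit substitution *)
  | SubSkel of cell              (* skeletal substitution *)
  | NoSub                        (* bound variable *)
  | Hole                         (* chain item *)
  | Copy of cell list * var_ref  (* parents + copy, used during renaming *)
```

As a sanity check, building the identity \x.x wires a Var occurrence to its var_ref declaration and sets the occurrence's parent to the abstraction node.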
Abstractions also point to the list of occurrences of the bound variable, to implement rule in . This is a minor memory-saving divergence from the paper, where it was suggested to store the backpointers to the occurrences in the bound variable declaration node, of type var_ref: once an abstraction is consumed, the backpointers become useless, and thus avoiding them in the implementation saves some space.
Machine states are implemented straightforwardly using lists for stacks, chains, and the set of roots (which is then actually represented as a list, not as a set), and doubly linked lists of variable declarations for environments:
Skeleton Reconstruction.
The rewriting system for marking terms to extract the skeleton and flesh is implemented sequentially as a recursive function. Each activation record for a call to mark_skeleton in the OCaml stack corresponds in the paper to an occurrence of in the rewritten term.
Parallel implementations of mark_skeleton are possible, but because of the necessity of synchronization over application nodes, and because the size of skeletonized terms is expected to be small, it is unclear whether actual parallelization would provide any speed-up.
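As an illustration of what skeleton extraction computes (not of mark_skeleton itself, which works bottom-up on the pointer-based representation), here is a naive, purely functional sketch: for a value \x.t, the maximal subterms of the body containing no bound variable in scope are the flesh and get replaced by fresh variables, while the remaining structure is the skeleton. Unlike the marking algorithm of the paper, this sketch recomputes occurrence checks and is therefore not linear-time; all names are ours.

```ocaml
type term =
  | Var of string
  | Abs of string * term
  | App of term * term

let counter = ref 0
let fresh () = incr counter; Printf.sprintf "_f%d" !counter

(* Does x occur free in t? *)
let rec occurs x = function
  | Var y -> x = y
  | Abs (y, u) -> y <> x && occurs x u
  | App (u, v) -> occurs x u || occurs x v

(* Split t w.r.t. the binders in [bound]: maximal subterms containing
   none of them are flesh, replaced by fresh variables; the rest is
   the skeleton. Returns the skeleton and the flesh as an assoc list. *)
let rec split bound t =
  if not (List.exists (fun x -> occurs x t) bound) then
    let f = fresh () in
    (Var f, [ (f, t) ])
  else
    match t with
    | Var y -> (Var y, [])
    | Abs (y, u) ->
        let s, fl = split (y :: bound) u in
        (Abs (y, s), fl)
    | App (u, v) ->
        let su, fu = split bound u in
        let sv, fv = split bound v in
        (App (su, sv), fu @ fv)

(* Skeleton of a value \x.t: split its body w.r.t. the binder x. *)
let skeletonize = function
  | Abs (x, t) ->
      let s, fl = split [ x ] t in
      (Abs (x, s), fl)
  | t -> (t, [])
```

For example, skeletonizing \x.(x (y z)) keeps \x.(x _f1) as skeleton and extracts the free subterm (y z) as flesh bound to the fresh variable _f1, which remains shared.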
Remaining Code.
The code to extract the skeleton and flesh from marked terms, and the code for machine transitions follow straightforwardly the rules in the paper, up to noisy code to keep the doubly linked structure of terms and environments. Here is an example of the code for one transition in the Skeletal MAD:
The code that implements also follows the standard algorithm to copy a graph in linear time preserving the sharing, described by Accattoli and Barras in [5]. It uses the Copy constructor to temporarily associate each variable to its copy.
Appendix B Example of Skeletal MAD Run
We show here an example of invocation of our implementation. Suppose that the file family_3.lambda contains the following input that encodes the pure term that yields the third term of the family we studied, where backslashes are ASCII art equivalents of .
The command
yields the following output (the option fully-lazy-functional-linked-env selects the Skeletal MAD implementation described in the paper, which stays as close as possible to the abstract machine description; the repository also contains code for alternative implementations of the Skeletal MAD, including versions that keep the machine state garbage-free):
Number of betas: 24
Interpreter fully-lazy-functional-linked-env, final result:
where in each machine state the chain is red, the code is underlined, the stack has no special marking, and the environment is green.
