
The Cost of Skeletal Call-By-Need, Smoothly

Beniamino Accattoli (Inria & LIX, École Polytechnique, Palaiseau, France), Francesco Magliocca (Università degli Studi di Napoli Federico II, Italy), Loïc Peyrot (IMDEA Software Institute, Madrid, Spain), Claudio Sacerdoti Coen (Alma Mater Studiorum – Università di Bologna, Italy)
Abstract

Skeletal call-by-need is an optimization of call-by-need evaluation, also known as “fully lazy sharing”: when the duplication of a value has to take place, the value is first split into a “skeleton”, which is then duplicated, and a “flesh”, which is instead kept shared.

Here, we provide two cost analyses of skeletal call-by-need. Firstly, we provide a family of terms showing that skeletal call-by-need can be asymptotically exponentially faster than call-by-need in both time and space; it is the first such evidence, to our knowledge.

Secondly, we prove that skeletal call-by-need can be implemented efficiently, that is, with bi-linear overhead. This result is obtained by providing a new smooth presentation of ideas by Shivers and Wand for the reconstruction of skeletons, which is then smoothly plugged into the study of an abstract machine following the distillation technique by Accattoli et al.

Keywords and phrases:
λ-calculus, abstract machines, call-by-need, cost models
Funding:
Claudio Sacerdoti Coen: Supported by the INdAM-GNCS project “Modelli Composizionali per l’Analisi di Sistemi Reversibili Distribuiti” and by the COST Action CA20111 EuroProofNet.
Copyright and License:
© Beniamino Accattoli, Francesco Magliocca, Loïc Peyrot, and Claudio Sacerdoti Coen; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Lambda calculus
Related Version:
Full Version: https://arxiv.org/abs/2505.09242 [18]
Supplementary Material:
Software (Source Code): https://github.com/Franciman/fullylazymad
  archived at Software Heritage: swh:1:dir:50ff71d3e33d6febc5f53b8a6887aac6ef0be90f
Editors:
Maribel Fernández

1 Introduction

Call-by-need evaluation of the λ-calculus was introduced by Wadsworth in 1971 as an optimization of the untyped λ-calculus [41]. It combines the advantages of both call-by-name and call-by-value: it avoids diverging on unused arguments, as in call-by-name, and it evaluates used arguments at most once, as in call-by-value. Such a combination is trickier to specify than either call-by-name or call-by-value. In particular, Wadsworth’s original presentation via graph reduction was not easy to manage.

In the 1990s, call-by-need was adopted by the then-new Haskell language, and more manageable term-based calculi were developed by Launchbury [35] and Ariola et al. [21], and extended with non-determinism by Kutzner and Schmidt-Schauß [34]. Additionally, Sestoft studied abstract machines for call-by-need [38]. In 2014, Accattoli et al. introduced the CbNeed linear substitution calculus (CbNeed LSC) [4], a simplification of Ariola et al.’s calculus resting on the at a distance approach to explicit substitutions by Accattoli and Kesner [14]. It became a reference presentation, used in many recent studies [31, 24, 5, 33, 26, 13, 32, 25, 17, 16]. As far as this paper is concerned, the study in [4] is particularly relevant: beyond introducing the CbNeed LSC that we shall use, it also relates it to abstract machines for CbNeed, building a neat bisimulation between the two, dubbed distillation.

Skeletal Call-by-Need.

CbNeed was introduced together with a further optimization usually called fully lazy sharing and here rather referred to as Skeletal CbNeed (because fully lazy sharing is not really descriptive). The idea is to refine the duplication of a value v so as to duplicate only the skeleton of v while keeping its flesh shared, thus ending up sharing more than in ordinary call-by-need. As an example, if v = λx.λy.(zz)x(yz) then the skeleton of v is 𝚜𝚔𝚎𝚕(v) := λx.λy.wx(yz) and its flesh is the context 𝚏𝚕𝚎𝚜𝚑(v) := 𝗅𝖾𝗍 w = zz 𝗂𝗇 ⟨·⟩. That is, the flesh collects and removes the maximal (non-variable) sub-terms of v that are free, i.e. such that none of their free variables is captured in v.

Skeletal CbNeed is less studied than CbNeed because it is even trickier to define, as it requires to compute the skeletal decomposition of a value (that is, the split into skeleton and flesh). Early works by Turner [40], Hughes [30], and Peyton Jones [36] focus on implementative aspects. In the last ten years or so, various works provided operational insights. Balabonski developed an impressive theoretical analysis of Skeletal CbNeed, showing the equivalence of various presentations [22] and studying the relationship with Lévy’s optimality [23]. Soon after that, Gundersen et al. showed that the atomic λ-calculus, a λ-calculus with sharing issued from deep inference technology, naturally specifies the duplication of the skeleton [29]. Next, Kesner et al. [32] merged the insights of the CbNeed LSC and the atomic λ-calculus obtaining a Skeletal CbNeed LSC, that is, a presentation at a distance of Skeletal CbNeed.

This Paper.

The present work extends the operational research line of Balabonski, Gundersen et al., and Kesner et al. with cost analyses of Kesner et al.’s Skeletal CbNeed LSC, aiming at closing the gap with some of the earlier implementative studies at the same time. Generally speaking, we aim at adding Skeletal CbNeed to the list of concepts covered by the theory of cost-based analyses of abstract machines and sharing mechanisms by Accattoli and co-authors [4, 10, 19, 5, 7, 27, 12, 11, 8, 15]. We provide two analyses.

First Analysis: Exponential Time and Space Speed-Ups.

The first analysis shows that in some cases Skeletal CbNeed is considerably more efficient than CbNeed. For that, we exhibit a family of λ-terms {tn}n such that tn evaluates in Ω(2n) time and space in CbNeed, while requiring only 𝒪(n) space and time in Skeletal CbNeed. The literature only shows examples of single λ-terms where Skeletal CbNeed takes a few steps less than CbNeed, but never proves any asymptotic speed-up nor deals with space. Lastly, proving that our family evaluates within the given bounds requires a fine, non-trivial analysis of evaluation sequences.

Second Analysis: Bi-Linear Overhead.

The second analysis shows that Skeletal CbNeed can be implemented efficiently, with an overhead that is bi-linear, i.e. linear in both the number of β-steps and the size of the initial term. This time, the same is true for CbNeed. The insight is that computing skeletons does require additional implementative effort, but it comes at no asymptotic price. This work was indeed triggered by the goal of showing that the operational reconstruction of the skeleton at work in Gundersen et al. [29] and Kesner et al. [32] can be implemented in linear time. Their specifications rest on side conditions about variables in sub-terms that lead to non-linear overhead, if implemented literally. In fact, the linear-time reconstruction of skeletons by Shivers and Wand [39], from 2010, avoids the side conditions and pre-dates the recent works. The insightful study in [39], however, is technical and not well known. Here we simplify it, recasting it in the context of abstract machines.

Efficient Skeletal Decompositions, or Shivers and Wand Reloaded.

Shivers and Wand present their ideas using graph reduction for λ-terms. They reconstruct the skeletal decomposition via a visit of the value to be skeletonized, seen as a graph, based on two key points:

  1.

    Bi-directional edges: all the edges of their graphs can be traversed in both directions. When term graphs are seen as mathematical objects, edges are often directed but the theoretical study can also look at edges in reverse. At the implementative level, however, things are different. For a parsimonious use of space, most graph-based implementations (e.g. the proof nets inspired ones in [5, 8, 20]) indeed restrict some edges to be traversed in one direction only (namely, parent to children), thus sparing some pointers. Therefore, bi-directional edges are an unusual approach, usually considered redundant.

  2.

    Abstractions, occurrences, and upward reconstruction: in most graphical representations, there are edges between abstractions and the occurrences of their bound variables, usually directed only from the occurrences to the abstraction. Shivers and Wand’s insight is that, with bi-directional edges, one can move from the abstraction back to the occurrences of the variable and then reconstruct the skeleton by visiting upward from the occurrences.

Shivers and Wand’s study is graph-based and low-level, interleaved with code snippets. Here, we provide a new neat presentation of their algorithm as a simple high-level rewriting system over terms, circumventing graphs and low-level details. The reformulation of their study is the main technical innovation of this paper. While the key concepts are due to Shivers and Wand, we believe that our new presentation is a notable contribution in that it allows their concepts to blossom, simplifying their study and making them more widely accessible.

Distillation.

We then smoothly lift the distillation-based study of CbNeed by Accattoli et al. [4] to Skeletal CbNeed, providing an abstract machine that uses the reloaded algorithm for skeletal decompositions as a black-box. The new abstract machine is then shown to be implementable within a bi-linear overhead, completing the second analysis.

The overall insight is that the apparently linear space inefficiency of adding pointers for bi-directional traversals of terms/graphs enables optimizations – namely, skeletal duplications – that can bring, in some cases, exponential speed-ups for both time and space.

Implementation.

We provide an OCaml implementation of the abstract machine, and in particular of the reloaded skeleton reconstruction algorithm it rests upon. The implementation is there to validate the complexity analyses, as well as to integrate the low-level aspect of Shivers and Wand’s study. The implementation is discussed in Appendix A.

Proofs.

A long version with proofs in the Appendix is on arXiv [18].

2 Background: Call-by-Need

In this section, we recall CbNeed, presented as a strategy of the CbNeed linear substitution calculus (shortened to LSC). For the sake of simplicity, we do not distinguish here between the strategy and the calculus, and use CbNeed LSC to refer to both.

Accattoli and Kesner’s LSC [1, 6] is a micro-step λ-calculus with explicit substitutions. Micro-step means that substitutions act on one variable occurrence at a time, in contrast to small-step, where they act on all occurrences at the same time. The CbNeed LSC was introduced by Accattoli et al. [4] as a simplified variant of Ariola et al.’s presentation of CbNeed [21]. The simplification is the use of rewriting rules at a distance – that is, adopting contexts in the root rules – that allow one to get rid of the commuting rewriting rules of Ariola et al.

Terms                   Λ𝚎𝚜 ∋ t, u ::= x | v | tu | t[x←u]
Values                  v, v′ ::= λx.t with t ∈ Λ
Substitution ctxs       S, S′ ::= ⟨·⟩ | S[x←u]
CbNeed evaluation ctxs  E, E′ ::= ⟨·⟩ | Eu | E[x←u] | E⟨x⟩[x←E′]

Root rules
  Distant β              S⟨λx.t⟩u ↦𝚍𝙱 S⟨t[x←u]⟩
  CbNeed linear subst.   E⟨x⟩[x←S⟨v⟩] ↦𝚗𝚍𝚕𝚜 S⟨E⟨v⟩[x←v]⟩

Rewriting rules   →𝚍𝙱 := E⟨↦𝚍𝙱⟩    →𝚗𝚍𝚕𝚜 := E⟨↦𝚗𝚍𝚕𝚜⟩
Notation          →𝚗𝚎𝚎𝚍 := →𝚍𝙱 ∪ →𝚗𝚍𝚕𝚜

Figure 1: The Call-by-Need Linear Substitution Calculus (CbNeed LSC).

Terms and Contexts.

The set of terms of the λ-calculus is noted Λ. The CbNeed LSC is defined in Fig. 1. The set of terms of the CbNeed LSC is noted Λ𝚎𝚜. They add explicit substitutions t[x←u] (shortened to ESs) to λ-terms; an ES is a more compact notation for 𝗅𝖾𝗍 x = u 𝗂𝗇 t, but where the order of evaluation between t and u is a priori not fixed; evaluation contexts shall fix it. The set 𝚏𝚟(t) of free variables is defined as expected; in particular, 𝚏𝚟(t[x←u]) := (𝚏𝚟(t) ∖ {x}) ∪ 𝚏𝚟(u). Both λx.t and t[x←u] bind x in t, and terms are identified up to α-renaming. A term t is closed if 𝚏𝚟(t) = ∅, open otherwise. Meta-level capture-avoiding substitution is noted t{x←u}. The size |t| of t is the number of its constructs.
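To fix intuitions, the grammar of Λ𝚎𝚜 and the definitions of 𝚏𝚟 and size can be transcribed directly in OCaml. This is a sketch with our own constructor names, not the representation of the paper's implementation in Appendix A:

```ocaml
(* Terms of the CbNeed LSC: variables, values (abstractions), applications,
   and explicit substitutions t[x<-u], written Es (t, x, u). *)
type term =
  | Var of string
  | Lam of string * term
  | App of term * term
  | Es of term * string * term

module S = Set.Make (String)

(* Free variables, with the clause fv(t[x<-u]) = (fv t \ {x}) ∪ fv u. *)
let rec fv = function
  | Var x -> S.singleton x
  | Lam (x, t) -> S.remove x (fv t)
  | App (t, u) -> S.union (fv t) (fv u)
  | Es (t, x, u) -> S.union (S.remove x (fv t)) (fv u)

(* The size |t| of t is the number of its constructs. *)
let rec size = function
  | Var _ -> 1
  | Lam (_, t) -> 1 + size t
  | App (t, u) -> 1 + size t + size u
  | Es (t, _, u) -> 1 + size t + size u
```

Note how the ES clause of `fv` mirrors the binding behaviour of λx.t: x is bound in the body but free occurrences in the content u are kept.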

Note that values are only abstractions; this choice is standard in the literature on CbNeed. Note that, moreover, their bodies are λ-terms without ESs. This is in order to avoid dealing with calculating the skeleton of ESs in the next section. It is a harmless choice because evaluation (that creates ESs) never enters inside abstractions.

Contexts are terms with exactly one occurrence of the hole ⟨·⟩, an additional constant standing for a removed sub-term. We shall heavily use two notions of contexts: substitution contexts S, which are lists of ESs, and evaluation contexts E.

The main operation on contexts is plugging E⟨t⟩, where the hole of the context E is replaced by t. Plugging, as usual with contexts, can capture variables; for instance ((⟨·⟩t)[x←u])⟨x⟩ = (xt)[x←u]. We write E⟪t⟫ when we want to stress that the context E does not capture the free variables of t.

Rewriting Rules.

The reduction rules of the CbNeed LSC are slightly unusual, as they use contexts both to allow one to reduce redexes located in sub-terms (via evaluation contexts E), which is standard, and to define the redexes themselves (via both substitution contexts S and evaluation contexts E), which is less standard. This approach is called at a distance and is related to cut-elimination on proof nets; see Accattoli [2, 3] for the link with proof nets. The notion of evaluation context E used for CbNeed is exactly the one by Ariola et al. [21], which extends call-by-name evaluation contexts with the production E⟨x⟩[x←E′], which – by entering ESs – is what enables the memoization aspect of CbNeed.

The distant beta rule ↦𝚍𝙱 is essentially the β-rule, except that the argument goes into a new ES, rather than being immediately substituted, and that there can be a substitution context S in between the abstraction and the argument. Example: (λx.y)[y←t]u ↦𝚍𝙱 y[x←u][y←t]. One with on-the-fly α-renaming is (λx.y)[y←t]y ↦𝚍𝙱 z[x←y][z←t].

The linear substitution rule by need ↦𝚗𝚍𝚕𝚜 replaces a single variable occurrence by a copy of the value in the ES, commuting the substitution context around the value (if any) out of the ES. Example: (xx)[x←𝙸[y←t]] →𝚗𝚍𝚕𝚜 (𝙸x)[x←𝙸][y←t].

The use of substitution contexts S in the root rules is what allows one to avoid the two commuting rules of Ariola et al. [21], namely (rephrasing their 𝗅𝖾𝗍-expressions as ESs) t[x←s]u ↦let-C (tu)[x←s] and t[y←u[x←s]] ↦let-A t[y←u][x←s].

The two root rules ↦𝚍𝙱 and ↦𝚗𝚍𝚕𝚜 are then closed by evaluation contexts E. In Fig. 1, we use the compact notation →𝚍𝙱 := E⟨↦𝚍𝙱⟩ to denote that →𝚍𝙱 is defined as E⟨t⟩ →𝚍𝙱 E⟨u⟩ if t ↦𝚍𝙱 u, and similarly for the other rule.

3 Skeletal Call-by-Need

In this section, we present our version of Skeletal CbNeed, which is a very minor variant of the one by Kesner et al. [32] (the difference is discussed at the end of the section), itself defined by tweaking the CbNeed LSC via the notion of skeleton.

The basic idea of Skeletal CbNeed is that every value v := λx.t can be split into two parts: the skeleton λx.u, which is a sort of sub-term of v, and the flesh S, which is a substitution context collecting the maximal sub-terms of v that do not depend on x. The flesh is extracted from v before duplicating it, so as to avoid duplicating the code (and possibly the redexes) of the flesh, thus increasing sharing. Some definitions are in order.

Free Sub-Terms.

Extractable sub-terms are called free sub-terms, defined next. We actually need a parametrized notion of free sub-term, which is relative to a set of variables 𝚅 that are supposed not to occur in the sub-term.

Definition 1 (Maximally free sub-terms).

Let t ∈ Λ and 𝚅 be a set of variables. A sub-term u of t is (the term part of) a decomposition t = C⟨u⟩ for some context C; additionally, u is 𝚅-free in t if no variable in 𝚏𝚟(u) is captured by C and if 𝚏𝚟(u) ∩ 𝚅 = ∅, and maximally 𝚅-free if it is not a strict sub-term of any other 𝚅-free sub-term of t.

Skeleton.

The skeletal decomposition of a term t is the splitting of t into its maximal free sub-terms on one side, collected as a substitution context S called the flesh, and what is left of t after the removal, that is, its skeleton. As for free sub-terms, we rather define parametrized notions of skeletal decomposition and skeleton. Then, we specialize the definitions for values.

Definition 2 ((Relative) Skeleton).

Let 𝚅 be a set of variables. The skeletal decomposition of t relative to 𝚅 is the pair (u, S) of an ordinary λ-term u and a substitution context S, defined as 𝚜𝚔𝚍𝚎𝚌_𝚅(t) := (x, [x←t]) with x fresh, if 𝚏𝚟(t) ∩ 𝚅 = ∅ and t is not a variable; otherwise:

𝚜𝚔𝚍𝚎𝚌_𝚅(x) := (x, ⟨·⟩)
𝚜𝚔𝚍𝚎𝚌_𝚅(λx.u) := (λx.s, S)   where 𝚜𝚔𝚍𝚎𝚌_{𝚅∪{x}}(u) = (s, S)
𝚜𝚔𝚍𝚎𝚌_𝚅(us) := (rp, SS′)   where 𝚜𝚔𝚍𝚎𝚌_𝚅(u) = (r, S) and 𝚜𝚔𝚍𝚎𝚌_𝚅(s) = (p, S′)

The skeleton, the flesh, and the skeletal decomposition of λx.t are defined respectively as 𝚜𝚔𝚎𝚕(λx.t) := λx.u, 𝚏𝚕𝚎𝚜𝚑(λx.t) := S, and 𝚜𝚔𝚍𝚎𝚌(λx.t) := (𝚜𝚔𝚎𝚕(λx.t), 𝚏𝚕𝚎𝚜𝚑(λx.t)), where 𝚜𝚔𝚍𝚎𝚌_{x}(t) = (u, S).
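Definition 2 admits a direct, non-linear OCaml transcription. This is a sketch under our own representation: the flesh is an association list of bindings [(w1, t1); …] standing for the substitution context [w1←t1]…[wn←tn], and the fresh names w1, w2, … are ours:

```ocaml
type term = Var of string | Lam of string * term | App of term * term

module S = Set.Make (String)

let rec fv = function
  | Var x -> S.singleton x
  | Lam (x, t) -> S.remove x (fv t)
  | App (t, u) -> S.union (fv t) (fv u)

let fresh =
  let c = ref 0 in
  fun () -> incr c; Printf.sprintf "w%d" !c

(* skdec v t: a non-variable sub-term whose free variables avoid v is
   maximally free, and is fleshed out as a fresh binding. *)
let rec skdec v t =
  match t with
  | _ when S.is_empty (S.inter (fv t) v)
           && (match t with Var _ -> false | _ -> true) ->
      let z = fresh () in
      (Var z, [ (z, t) ])
  | Var x -> (Var x, [])
  | Lam (x, u) ->
      let s, fl = skdec (S.add x v) u in
      (Lam (x, s), fl)
  | App (u, s) ->
      let r, fl1 = skdec v u in
      let p, fl2 = skdec v s in
      (App (r, p), fl1 @ fl2)

(* Skeleton and flesh of a value λx.t. *)
let skdec_value x t =
  let s, fl = skdec (S.singleton x) t in
  (Lam (x, s), fl)
```

On the running example v = λx.λy.(zz)x(yz), this returns the skeleton λx.λy.w1x(yz) and the flesh [w1←zz]. Recomputing 𝚏𝚟 at every node makes this version non-linear; the point of Section 5 is precisely to avoid such variable-based side conditions.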

Terms                Λ𝚜𝚔 ∋ t, u ::= x | v | tu | t[x←u] | t⟨x←v⟩ with v = 𝚜𝚔𝚎𝚕(v)
Values               v, v′ ::= λx.t with t ∈ Λ
Substitution ctxs    S, S′ ::= ⟨·⟩ | S[x←u] | S⟨x←v⟩
Skeletal eval. ctxs  E, E′ ::= ⟨·⟩ | Eu | E[x←u] | E⟨x←v⟩ | E⟨x⟩[x←E′]

Root rules
  Distant β         S⟨λx.t⟩u ↦𝚍𝙱 S⟨t[x←u]⟩
  Skeletonization   E⟨x⟩[x←S⟨v⟩] ↦𝚜𝚔 S⟨S′⟨E⟨x⟩⟨x←v′⟩⟩⟩ with 𝚜𝚔𝚍𝚎𝚌(v) = (v′, S′)
  Skeletal subst.   E⟨x⟩⟨x←v⟩ ↦𝚜𝚜 E⟨v⟩⟨x←v⟩

Rewriting rules   →𝚜𝚔𝚍𝙱 := E⟨↦𝚍𝙱⟩    →𝚜𝚔 := E⟨↦𝚜𝚔⟩    →𝚜𝚜 := E⟨↦𝚜𝚜⟩
Notations         →𝚜𝚔𝚗𝚎𝚎𝚍 := →𝚜𝚔𝚍𝙱 ∪ →𝚜𝚔 ∪ →𝚜𝚜    [[x←t]] stands for either [x←t] or ⟨x←v⟩

Figure 2: The skeletal CbNeed strategy 𝚜𝚔𝚗𝚎𝚎𝚍.

Skeletal CbNeed.

We adopt the presentation of Skeletal CbNeed by Kesner et al. [32] (referred to as fully lazy CbNeed by them), which adds a new skeletal (explicit) substitution t⟨x←v⟩ containing skeletal values, that is, values that are equal to their skeleton. The language of terms and the strategy are defined in Fig. 2. The new set of terms is noted Λ𝚜𝚔.

Notation: when we need to treat explicit and skeletal substitutions together, we use the notation [[x←t]], which stands for either [x←t] or (with a slight abuse) ⟨x←v⟩.

The CbNeed substitution rule →𝚗𝚍𝚕𝚜 for a redex E⟨x⟩[x←S⟨v⟩] is refined via two rules. The first one, the skeletonization rule →𝚜𝚔, does three further mini tasks at once: firstly, it decomposes v into its skeleton v′ and its flesh S′ (thus obtaining E⟨x⟩[x←S⟨S′⟨v′⟩⟩]); secondly, it re-organizes the term, commuting S and S′ out of the ES (obtaining S⟨S′⟨E⟨x⟩[x←v′]⟩⟩); thirdly, it turns the ES into a skeletal substitution, finally producing S⟨S′⟨E⟨x⟩⟨x←v′⟩⟩⟩.

The second skeletal substitution rule 𝚜𝚜 simply replaces x with v.

Note that evaluation contexts E are adapted to include skeletal substitutions, but that they do not enter inside them, since their content is always a value. The distant β rule for Skeletal CbNeed is noted 𝚜𝚔𝚍𝙱 to distinguish it from the one for CbNeed. The Skeletal CbNeed reduction 𝚜𝚔𝚗𝚎𝚎𝚍 is the union of 𝚜𝚔𝚍𝙱, 𝚜𝚔, and 𝚜𝚜.

The next proposition is an easy adaptation of the one for CbNeed in Accattoli et al. [4].

Proposition 3.

The reduction 𝚜𝚔𝚗𝚎𝚎𝚍 is deterministic.

Structural Equivalence.

To study the relationship with abstract machines, we shall need the concept of structural equivalence, defined in Fig. 3. The concept is standard for λ-calculi with ESs at a distance, and it is also standard to use it in relationship with abstract machines, as done systematically by Accattoli et al. [4]. Intuitively, structural equivalence allows one to move explicit/skeletal substitutions around because the move does not change the behaviour of the term. This fact is captured by the following strong bisimulation property, proved by an immediate adaptation of the similar property in [4].

Root structural equalities
  Expl. subst.   E⟨t⟩[x←s] ≡1 E⟨t[x←s]⟩   if 𝚍𝚘𝚖(E) ∩ 𝚏𝚟(t[x←s]) = ∅ and x ∉ 𝚏𝚟(E)
  Skel. subst.   E⟨t⟩⟨x←v⟩ ≡2 E⟨t⟨x←v⟩⟩   if 𝚍𝚘𝚖(E) ∩ 𝚏𝚟(t⟨x←v⟩) = ∅ and x ∉ 𝚏𝚟(E)

Closure rules   ≡ is the least equivalence relation (reflexive, symmetric, transitive) containing ≡1 and ≡2 and closed under evaluation contexts: if t ≡ u then E⟨t⟩ ≡ E⟨u⟩.

Figure 3: Structural equivalence .
Proposition 4 ( is a strong bisimulation).

Let a ∈ {𝚍𝙱, 𝚜𝚔, 𝚜𝚜}. If t ≡ u and t →a t′, then there exists u′ such that u →a u′ and t′ ≡ u′.

Difference with Kesner et al.’s Skeletal CbNeed [32].

There is a small difference between our definition of Skeletal CbNeed and Kesner et al.’s, namely in the definition of skeletons. For us, the skeletal decomposition of, say, v := λx.yx(zz) is (λx.yxw, [w←zz]), while for them it is (λx.y′xw, [y′←y][w←zz]); that is, they flesh out also single variable occurrences while we do not, obtaining a slightly more parsimonious decomposition.

This difference, however, has essentially no impact on the theory, nor on the asymptotic costs. In particular, the operational equivalence of Skeletal CbNeed with respect to CbNeed and CbN is preserved. The proof of Kesner et al., indeed, works for all decompositions such that substituting the flesh into the skeleton recovers the original value; our decomposition has this property. Therefore, we can import the following result.

Theorem 5 (Operational equivalence [32]).

Two λ-terms t and u are CbN observationally equivalent if and only if they are Skeletal CbNeed observationally equivalent.

4 Skeletal CbNeed Can Bring an Exponential Asymptotic Speed-Up

In this section, we show that Skeletal CbNeed can be both exponentially faster and use exponentially less space than CbNeed, taking as reference cost models (following the terminology of Accattoli et al. [15]):

  • Abstract time, that is the number of (distant) β-steps;

  • Ink space, that is, the maximum size of the involved terms.

By smoothly adapting results in the literature [37, 9, 28], these can be proved to be reasonable cost models for time for CbNeed and for (super-)linear space for both CbNeed and Skeletal CbNeed. In fact, the relevant part of proving that abstract time is reasonable for Skeletal CbNeed shall be our own Theorem 31 at the end of the paper.

We prove the result by defining a family of terms {t_n}_{n∈ℕ} and showing that t_n evaluates in an exponential number of 𝚍𝙱 steps, producing terms of exponential size, in CbNeed, while it uses only a linear number of 𝚜𝚔𝚍𝙱 steps and terms of linear size in Skeletal CbNeed.

The Family.

We shall use the abbreviations 𝙸 := λx.x and γ := λy.λz.y𝙸(y𝙸)z. Next, we define an auxiliary family {u_n}_{n∈ℕ} as follows: u_0 := 𝙸 and u_{n+1} := γu_n. Lastly, the actual family is defined as t_n := (λx.x𝙸(x𝙸))u_n.
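For concreteness, the family can be generated with a few lines of OCaml (a sketch with our own constructors), which also makes it plain that the input size |t_n| grows only linearly in n:

```ocaml
(* The term family of this section:
   I := λx.x,  γ := λy.λz. yI(yI)z,
   u_0 := I,  u_{n+1} := γ u_n,  t_n := (λx. xI(xI)) u_n. *)
type term = Var of string | Lam of string * term | App of term * term

let i = Lam ("x", Var "x")

let gamma =
  Lam ("y", Lam ("z",
    App (App (App (Var "y", i), App (Var "y", i)), Var "z")))

let rec u n = if n = 0 then i else App (gamma, u (n - 1))

let t n =
  App (Lam ("x", App (App (Var "x", i), App (Var "x", i))), u n)

(* Each step from u_n to u_{n+1} adds one application node and one copy of γ,
   so |t_n| is linear in n, while CbNeed evaluation of t_n is exponential. *)
let rec size = function
  | Var _ -> 1
  | Lam (_, b) -> 1 + size b
  | App (a, b) -> 1 + size a + size b
```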

Our result is that the term t_n evaluates in 𝒪(n) 𝚜𝚔𝚍𝙱-steps (namely 6n+4 steps) with Skeletal CbNeed and Ω(2^n) 𝚍𝙱-steps (namely 8·2^n + n − 4 steps) with CbNeed. This is shown via quite technical statements. Firstly, note that t_n →𝚍𝙱 (x𝙸(x𝙸))[x←u_n]; it is actually for (x𝙸(x𝙸))[x←u_n] that we shall prove bounds. These bounds are expressed by stating that there are families of evaluation contexts S^𝚗𝚍_n and S^𝚜𝚔_n (one for CbNeed and one for Skeletal CbNeed) such that (x𝙸(x𝙸))[x←u_n] reduces to S^𝚗𝚍_n⟨𝙸⟩ and S^𝚜𝚔_n⟨𝙸⟩ (which are normal forms in both cases) in 8·2^n + n − 3 and 6n+3 steps respectively. The results about space are obtained by showing that S^𝚗𝚍_n has size exponential in n, while S^𝚜𝚔_n has size linear in n.

Evaluation in Skeletal CbNeed.

The skeletal case is, perhaps surprisingly, the easiest case for which one can find an inductive invariant leading to the bound.

Lemma 6.

Let n ∈ ℕ and s_n := (x𝙸(x𝙸))[x←u_n]. Then there are a substitution context S^𝚜𝚔_n and an evaluation 𝚎_n : s_n →𝚜𝚔𝚗𝚎𝚎𝚍* S^𝚜𝚔_n⟨𝙸⟩ such that |𝚎_n|𝚜𝚔𝚍𝙱 = 6n+3 and |S^𝚜𝚔_n| = 𝒪(n).

Evaluation in CbNeed.

For the non-skeletal case, the bound is formulated via two families of substitution contexts. For n ∈ ℕ, the first family is given by S_0 := [x_0←𝙸] and S_{n+1} := S_n[x_{n+1}←s_{n+1}] with s_{n+1} := λy_n.x_n𝙸(x_n𝙸)y_n. The second family S^𝚗𝚍_n is defined in the proof of the lemma, itself relying on an auxiliary lemma in the technical report [18]. The statement is parametrized by an evaluation context E for the induction to go through, but the relevant case is the one with E empty.

Lemma 7.

Let n ∈ ℕ and E be an evaluation context. Then there exist a substitution context S^𝚗𝚍_n and an evaluation 𝚎_n : E⟨x_n𝙸(x_n𝙸)⟩[x_n←u_n] →𝚗𝚎𝚎𝚍* S_n⟨E⟨S^𝚗𝚍_n⟨𝙸⟩⟩⟩ such that |𝚎_n|𝚍𝙱 = 8·2^n + n − 5 and |S^𝚗𝚍_n| = Ω(2^n).

The next statement sums up the results, and is based on the result at the end of the paper stating that Skeletal CbNeed can be implemented with bi-linear overhead, that is, linear in the size |t_n| of the initial term and in the number of 𝚜𝚔𝚍𝙱 steps (Theorem 31).

Theorem 8 (Instance of exponential skeletal speed-up for both time and space).

There is a family {t_n}_{n∈ℕ} of λ-terms such that t_n normalizes in 𝒪(n) abstract time and 𝒪(n) ink space with Skeletal CbNeed, and in Ω(2^n) abstract time and Ω(2^n) ink space with CbNeed.

Proof.

  • For CbNeed, by Lemma 7 there is an evaluation sequence:

    t_n = (λx.x𝙸(x𝙸))u_n →𝚍𝙱 (x𝙸(x𝙸))[x←u_n] →𝚗𝚎𝚎𝚍* S_n⟨S^𝚗𝚍_n⟨𝙸⟩⟩

    taking k_n := 8·2^n + n − 4 = Ω(2^n) 𝚍𝙱-steps and such that |S^𝚗𝚍_n| = Ω(2^n).

  • For Skeletal CbNeed, by Lemma 6 there is an evaluation sequence:

    t_n = (λx.x𝙸(x𝙸))u_n →𝚜𝚔𝚍𝙱 (x𝙸(x𝙸))[x←u_n] →𝚜𝚔𝚗𝚎𝚎𝚍* S^𝚜𝚔_n⟨𝙸⟩

    taking k_n := 6n+4 = 𝒪(n) 𝚜𝚔𝚍𝙱-steps and such that |S^𝚜𝚔_n| = 𝒪(n). Note that the calculus has no garbage collection rule, so the size of terms diminishes only via 𝚜𝚔𝚍𝙱 steps, and by a constant amount (a 𝚜𝚔𝚍𝙱 step removes two constructs and adds one). Therefore, the maximum size of terms is bounded by the size of the final term plus the number of steps. That is, the space cost is 𝒪(n).

5 A Rewriting-Based Algorithm for Computing the Skeleton

In this section, we first define marked terms and a marked reformulation of skeletal decompositions. Then, we use the marked setting to give a graphically-inspired rewriting system computing the skeletal decomposition of a value, giving a neat new presentation of ideas first developed by Shivers and Wand [39]. The rewriting system formalizes an algorithm; it is presented as a rewriting system so as to smoothly manage it as an operational-semantics notion.

Marked terms      Λ° ∋ m, n ::= x | x° | λx.m | λ°x.m | mn | m@°n | m¯
Marked contexts   M ::= ⟨·⟩ | λx.M | λ°x.M | Mn | mM | M@°n | m@°M | M¯
(x°, λ°x.m, and m@°n denote marked constructs, m¯ the up construct)

Figure 4: Marked terms and contexts.

Marked Skeleton.

Marked terms and contexts are defined in Fig. 4. The idea is to incrementally mark with a ° tag the constructs belonging to the skeleton during a visit of the term that is allowed to jump from an abstraction to the occurrences of its bound variable, since these are usually connected in a graphical representation.

Constructs of the λ-calculus are then either the usual ones (referred to as unmarked) or marked with °, which means that the construct is part of the skeleton. Additionally, there is an extra up construct m¯ that shall be used to indicate the frontier of the visit of the term.

In the marked setting, we can recast skeletal decompositions by marking with ° (white) the constructs belonging to the skeleton and leaving unmarked all the constructs of the flesh.

Definition 9 (Marked skeleton).

Let 𝚅 be a set of variables. The marked skeleton of t relative to 𝚅 is defined as 𝚖𝚜𝚔𝚎𝚕_𝚅(t) := t if 𝚏𝚟(t) ∩ 𝚅 = ∅, otherwise:

𝚖𝚜𝚔𝚎𝚕_𝚅(x) := x°      𝚖𝚜𝚔𝚎𝚕_𝚅(tu) := 𝚖𝚜𝚔𝚎𝚕_𝚅(t) @° 𝚖𝚜𝚔𝚎𝚕_𝚅(u)      𝚖𝚜𝚔𝚎𝚕_𝚅(λx.t) := λ°x.𝚖𝚜𝚔𝚎𝚕_{𝚅∪{x}}(t)

The marked skeleton of λx.t is defined as 𝚖𝚜𝚔𝚎𝚕(λx.t) := λ°x.𝚖𝚜𝚔𝚎𝚕_{x}(t).

Example: for v := λx.λy.(zz)x(yz), one has 𝚖𝚜𝚔𝚎𝚕(v) = λ°x.λ°y.((zz)@°x°)@°(y°@°z).
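Definition 9 can be transcribed in OCaml as a sketch, with marks rendered as a White/Unmarked tag on each constructor (types and names are ours):

```ocaml
(* Marked terms: each construct carries a mark; White means "in the skeleton". *)
type mark = White | Unmarked
type mterm =
  | Var of mark * string
  | Lam of mark * string * mterm
  | App of mark * mterm * mterm

module S = Set.Make (String)

let rec fv = function
  | Var (_, x) -> S.singleton x
  | Lam (_, x, t) -> S.remove x (fv t)
  | App (_, t, u) -> S.union (fv t) (fv u)

(* mskel v t: a sub-term disjoint from v stays unmarked; otherwise its top
   constructor is marked White and the visit continues. *)
let rec mskel v t =
  if S.is_empty (S.inter (fv t) v) then t
  else match t with
    | Var (_, x) -> Var (White, x)
    | Lam (_, x, u) -> Lam (White, x, mskel (S.add x v) u)
    | App (_, t1, t2) -> App (White, mskel v t1, mskel v t2)

(* Marked skeleton of a value λx.t: mark the root and recurse with V = {x}. *)
let mskel_value x t = Lam (White, x, mskel (S.singleton x) t)
```

On the running example, only the sub-term zz and the free occurrences of z stay unmarked, matching the example above.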

From the Marked Skeleton to the Skeletal Decomposition.

Next, we turn marked skeletons into skeletal decompositions, by extracting unmarked sub-terms and turning them into the flesh, while removing the marks from the skeleton. We shall do this via a splitting function, defined below, which operates on (parametrized) marked skeletons. For that, we give a lemma guaranteeing that the unmarked sub-terms of marked skeletons can be recognized from their top constructor, that is in 𝒪(1), thus removing the need to inspect them.

Lemma 10 (Being unmarked is downward closed in marked skeletons).

Let 𝚖𝚜𝚔𝚎𝚕_𝚅(t) = M⟨m⟩ with m starting unmarked. Then m has no marks.

The lemma implies the correctness of the first clause of the following definition, as it ensures that in that case m has no marks.

Definition 11 (Splitting).

Let m=𝚖𝚜𝚔𝚎𝚕𝚅(t) for some t and 𝚅. The following splitting function separates the skeleton from the flesh and removes all marks:

𝚜𝚙𝚕𝚒𝚝(m) := (z, [z←m])   if m starts unmarked, is not a variable, and z is fresh;
𝚜𝚙𝚕𝚒𝚝(x) := (x, ⟨·⟩);
𝚜𝚙𝚕𝚒𝚝(x°) := (x, ⟨·⟩);
𝚜𝚙𝚕𝚒𝚝(λ°x.n) := (λx.u, S)   with 𝚜𝚙𝚕𝚒𝚝(n) = (u, S);
𝚜𝚙𝚕𝚒𝚝(m1 @° m2) := (t1t2, SS′)   with 𝚜𝚙𝚕𝚒𝚝(m1) = (t1, S) and 𝚜𝚙𝚕𝚒𝚝(m2) = (t2, S′).

Example: 𝚜𝚙𝚕𝚒𝚝(λ°x.λ°y.((zz)@°x°)@°(y°@°z)) = (λx.λy.wx(yz), [w←zz]).
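The splitting function, too, admits a direct OCaml sketch (types and fresh names w1, w2, … are ours). Thanks to Lemma 10, an unmarked top constructor implies a fully unmarked sub-term, so it is fleshed out without being inspected:

```ocaml
type term = TVar of string | TLam of string * term | TApp of term * term
type mark = White | Unmarked
type mterm =
  | Var of mark * string
  | Lam of mark * string * mterm
  | App of mark * mterm * mterm

(* By Lemma 10 an unmarked sub-term has no marks: just forget them. *)
let rec strip = function
  | Var (_, x) -> TVar x
  | Lam (_, x, t) -> TLam (x, strip t)
  | App (_, t, u) -> TApp (strip t, strip u)

let fresh =
  let c = ref 0 in
  fun () -> incr c; Printf.sprintf "w%d" !c

(* split turns a marked skeleton into (unmarked skeleton, flesh), where the
   flesh is an association list standing for a substitution context. *)
let rec split = function
  | Var (_, x) -> (TVar x, [])                 (* both x and x° *)
  | (Lam (Unmarked, _, _) | App (Unmarked, _, _)) as m ->
      let z = fresh () in
      (TVar z, [ (z, strip m) ])               (* maximal unmarked sub-term *)
  | Lam (White, x, n) ->
      let u, s = split n in
      (TLam (x, u), s)
  | App (White, m1, m2) ->
      let t1, s1 = split m1 in
      let t2, s2 = split m2 in
      (TApp (t1, t2), s1 @ s2)
```

On the example above it produces the skeleton λx.λy.w1x(yz) and the flesh [w1←zz].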

We can now prove that the marked notions allow one to retrieve the standard ones.

Proposition 12 (Soundness of the marked skeleton).

Let tΛ and 𝚅 be a set of variables.

  1. Auxiliary parametrized statement: 𝚜𝚙𝚕𝚒𝚝(𝚖𝚜𝚔𝚎𝚕_𝚅(t)) = 𝚜𝚔𝚍𝚎𝚌_𝚅(t);

  2. Soundness: 𝚜𝚔𝚍𝚎𝚌(λx.t) = 𝚜𝚙𝚕𝚒𝚝(𝚖𝚜𝚔𝚎𝚕(λx.t)).

Computing the Marked Skeleton.

We shall now define a parallel algorithm for computing the marked skeleton via a rewriting system on marked terms. The definition is in Fig. 5. The algorithm shall be applied to abstractions, say to λx.t. The algorithm is described by inserting the up symbols ¯, denoting where the algorithm is operating, and propagating them through the term. At the beginning, there is exactly one up, as the algorithm starts on λx.(t¯).

The algorithm has two sets of rules. Propagation rules turn the encountered unmarked constructs into marked ones. The interesting rule is 𝚙𝚛3 for abstractions, which is also the very first one to be applied on the initial term λx.(t¯). Its effect is that λx is marked as λ°x and all the occurrences of x in t are marked and topped with an up, obtaining (λ°x.t{x←(x°)¯})¯; this is how we recast the role played by Shivers and Wand’s bi-directional edges connecting abstractions and variables. Essentially, propagation rules move upwards, but they also add ups at the bottom of the syntax tree, on the variable occurrences associated with marked abstractions.

Absorption rules, instead, remove an up symbol ¯ when it encounters a marked construct: this means that the concerned thread of the algorithm is passing where another thread already passed before, so the thread becomes redundant and can be terminated.

Intuitively, the set of ¯-sub-terms is the frontier of where the parallel algorithm is operating. When the algorithm is over, no internal node carries a ¯.

The algorithm is parallel in that the initial substitution t{x←(x°)¯} and rule 𝚙𝚛3 might introduce many occurrences of ¯, which can be propagated independently.

Propagating rules
  (m¯)n ↦𝚙𝚛1 (m@°n)¯      m(n¯) ↦𝚙𝚛2 (m@°n)¯      λx.(m¯) ↦𝚙𝚛3 (λ°x.m{x←(x°)¯})¯

Absorbing rules
  (m¯)@°n ↦𝚊𝚋1 m@°n      m@°(n¯) ↦𝚊𝚋2 m@°n      λ°x.(m¯) ↦𝚊𝚋3 λ°x.m

Initialization        𝗂𝗇𝗂𝗍(λx.t) := λx.(t¯)
Contextual closures   →a := M⟨↦a⟩ for a ∈ {𝚙𝚛1, 𝚙𝚛2, 𝚙𝚛3, 𝚊𝚋1, 𝚊𝚋2, 𝚊𝚋3}
Algorithm             →𝚊𝚕𝚐 := →𝚙𝚛1 ∪ →𝚙𝚛2 ∪ →𝚙𝚛3 ∪ →𝚊𝚋1 ∪ →𝚊𝚋2 ∪ →𝚊𝚋3

Figure 5: Rewriting rules for marked terms.

As an example, consider v:=λx.λy.zzx(yz), for which one has, for instance:

𝗂𝗇𝗂𝗍(v) = λx.((λy.((zz)x)(yz))¯)
→𝚙𝚛3 (λ°x.λy.((zz)((x°)¯))(yz))¯
→𝚙𝚛2 (λ°x.λy.(((zz)@°x°)¯)(yz))¯
→𝚙𝚛1 (λ°x.λy.((((zz)@°x°)@°(yz))¯))¯
→𝚙𝚛3 (λ°x.((λ°y.((zz)@°x°)@°(((y°)¯)z))¯))¯
→𝚙𝚛1 (λ°x.((λ°y.((zz)@°x°)@°((y°@°z)¯))¯))¯
→𝚊𝚋2 →𝚊𝚋3 (λ°x.λ°y.((zz)@°x°)@°(y°@°z))¯ = 𝚖𝚜𝚔𝚎𝚕(v)¯
Proposition 13.

Reduction 𝚊𝚕𝚐 is strongly normalizing and diamond (thus confluent).

Proof.

  1. Strong normalization. Let m be a marked term. As terminating measure, consider (|m|_¬, |m|_¯), where |m|_¬ and |m|_¯ are the numbers of unmarked constructs and of up constructs in m. The propagation rules decrease |m|_¬, while the absorption rules decrease |m|_¯ and leave |m|_¬ unchanged. Then →𝚊𝚕𝚐 is strongly normalizing.

  2. Diamond. We prove the diamond property, which implies confluence. Clearly, redexes cannot duplicate or erase each other. There are only two critical pairs, namely the one between 𝚙𝚛1 and 𝚙𝚛2 (on an unmarked application both of whose sub-terms carry an up) and the one between 𝚊𝚋1 and 𝚊𝚋2 (on the analogous marked application); both close in one step.

To state the correctness of the algorithm and give a bound on the number of its steps, we need two definitions.

Definition 14.

Let m be a marked term.

  • Starts with °/unmarked/¯: m starts with ° (resp. unmarked, resp. with ¯) if it has shape x°, λ°x.n, or n@°o (resp. x, λx.n, or no, resp. n¯).

  • White size: the white size |m|° of m is the number of °-constructs in m.

The following lemma is the key step in the proof of correctness of the algorithm. It is a technical lemma, the statement of which is more easily understood by looking at the proof of the following theorem.

Lemma 15 (Crucial).

Let t be a term, 𝚅 a set of variables, x ∈ 𝚏𝚟(t) ∖ 𝚅, m := 𝚖𝚜𝚔𝚎𝚕_𝚅(t), and n := 𝚖𝚜𝚔𝚎𝚕_{𝚅∪{x}}(t). Then:

  m{x←(x°)¯} →𝚊𝚕𝚐^k n¯   with k = |n|° − |m|° − 1, if m starts unmarked;
  m{x←(x°)¯} →𝚊𝚕𝚐^k n    with k = |n|° − |m|°, if m starts with °.
Theorem 16 (Soundness of the algorithm).

Let t be a term. Then 𝗂𝗇𝗂𝗍(λx.t) →𝚊𝚕𝚐^n 𝚖𝚜𝚔𝚎𝚕(λx.t)¯ with n = |𝚖𝚜𝚔𝚎𝚕(λx.t)|.

Proof.

By definition, 𝗂𝗇𝗂𝗍(λx.t) = λx.t¯ →𝚙𝚛3 λx.t{x←x¯}¯ = λx.𝚖𝚜𝚔𝚎𝚕(t){x←x¯}¯, while 𝚖𝚜𝚔𝚎𝚕(λx.t) = λx.𝚖𝚜𝚔𝚎𝚕{x}(t). We consider two cases.

  • Case x ∉ 𝚏𝚟(t). Then 𝚖𝚜𝚔𝚎𝚕{x}(t) = t and t{x←x¯} = t, so λx.t{x←x¯} = λx.t = λx.𝚖𝚜𝚔𝚎𝚕{x}(t) = 𝚖𝚜𝚔𝚎𝚕(λx.t). Thus, 𝗂𝗇𝗂𝗍(λx.t) →𝚙𝚛3 𝚖𝚜𝚔𝚎𝚕(λx.t)¯. Moreover, n = 1 = |𝚖𝚜𝚔𝚎𝚕(λx.t)|, since the only white construct of 𝚖𝚜𝚔𝚎𝚕(λx.t) is the root abstraction.

  • Case x ∈ 𝚏𝚟(t). By Lemma 15, λx.t{x←x¯} →𝚊𝚕𝚐^k λx.𝚖𝚜𝚔𝚎𝚕{x}(t)¯, where k = |𝚖𝚜𝚔𝚎𝚕{x}(t)| − |t| − 1 = |𝚖𝚜𝚔𝚎𝚕{x}(t)| − 1. Therefore:

    𝗂𝗇𝗂𝗍(λx.t) →𝚙𝚛3 λx.t{x←x¯}¯ →𝚊𝚕𝚐^k λx.𝚖𝚜𝚔𝚎𝚕{x}(t)¯¯ →𝚊𝚋3 λx.𝚖𝚜𝚔𝚎𝚕{x}(t)¯ = 𝚖𝚜𝚔𝚎𝚕(λx.t)¯.

    Now, n = k + 2 = |𝚖𝚜𝚔𝚎𝚕{x}(t)| − 1 + 2 = |𝚖𝚜𝚔𝚎𝚕{x}(t)| + 1 = |𝚖𝚜𝚔𝚎𝚕(λx.t)|.

5.1 Complexity Analysis

We proved that the number of rewriting steps of the algorithm computing the marked skeleton 𝚖𝚜𝚔𝚎𝚕(v) is exactly its white size |𝚖𝚜𝚔𝚎𝚕(v)| (Theorem 16), which is the size of the skeleton (Prop. 12). The complexity of the algorithm is indeed linear, but this requires some discussion, since 𝚙𝚛3 steps are not constant-time operations. Moreover, we need to take into account the cost of splitting.

Complexity Analysis 1: Propagation + Absorption.

For computing 𝚖𝚜𝚔𝚎𝚕(v) from v, we assume that abstractions have pointers to the occurrences of their bound variable, so that the substitution m{x←x¯} in 𝚙𝚛3 does not require traversing the whole of m. This is Shivers and Wand's key idea, and also how our implementation works (see Appendix A). A global argument gives the cost: the algorithm turns |𝚖𝚜𝚔𝚎𝚕(v)| unmarked constructs into white constructs exactly once, and it generates at most |𝚖𝚜𝚔𝚎𝚕(v)| marks ¯, which are never duplicated and are absorbed exactly once; thus it globally requires 𝒪(|𝚖𝚜𝚔𝚎𝚕(v)|) time.
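Shivers and Wand's occurrence-pointer idea can be sketched in a few lines of OCaml, on a deliberately simplified node type (the names node, mark_up, and mark_binder are ours, not those of the implementation in Appendix A): marking climbs parent pointers and stops at already-marked nodes, so each construct is marked at most once.

```ocaml
(* Simplified sketch: each construct has a [taken] flag and a parent
   backpointer; binders additionally know their variable occurrences. *)
type node = {
  mutable taken : bool;             (* already in the marked skeleton? *)
  mutable parent : node option;     (* backpointer in the syntax tree *)
  mutable occurrences : node list;  (* for binders: the occurrences of the bound variable *)
}

let make ?parent () = { taken = false; parent; occurrences = [] }

(* Propagate a mark upwards, stopping at taken nodes: over the whole
   algorithm every node is marked at most once, hence linear time. *)
let rec mark_up (n : node) : unit =
  if not n.taken then begin
    n.taken <- true;
    match n.parent with
    | Some p -> mark_up p
    | None -> ()
  end

(* Rule pr3, in spirit: marking an abstraction marks all the occurrences
   of its bound variable, each of which climbs towards the root. *)
let mark_binder (lam : node) : unit =
  lam.taken <- true;
  List.iter mark_up lam.occurrences
```

The cost argument of the lemma below is visible here: mark_up touches a node only when flipping its flag from false to true.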

Lemma 17 (Cost of propagation + absorption).

Computing 𝚖𝚜𝚔𝚎𝚕(v) from v requires time 𝒪(|𝚖𝚜𝚔𝚎𝚕(v)|).

Complexity Analysis 2: Splitting.

For the cost of splitting, Lemma 10 guarantees that the unmarked sub-terms of marked skeletons can be recognized in 𝒪(1) by inspecting only their topmost constructor, so that the splitting function can be implemented in time proportional to the number of marked constructors only.

Lemma 18 (Cost of splitting).

Computing 𝚜𝚙𝚕𝚒𝚝(𝚖𝚜𝚔𝚎𝚕(v)) from 𝚖𝚜𝚔𝚎𝚕(v) requires time 𝒪(|𝚖𝚜𝚔𝚎𝚕(v)|).
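The consequence of Lemma 10 can be made concrete with a small OCaml sketch (toy marked syntax; the names split and fresh are ours): an unmarked topmost constructor guarantees an entirely unmarked subtree, so splitting traverses marked constructors only, cutting off each unmarked subtree as flesh in 𝒪(1).

```ocaml
(* Toy marked syntax: abstractions and applications carry a boolean mark
   telling whether they belong to the skeleton. *)
type t =
  | V of string              (* variable occurrence *)
  | L of bool * string * t   (* abstraction, with its mark *)
  | A of bool * t * t        (* application, with its mark *)

let counter = ref 0
let fresh () = incr counter; Printf.sprintf "_y%d" !counter

(* [split flesh m] returns the skeleton of [m]; the cut-off unmarked
   subtrees are accumulated in [flesh] as entries (variable, subterm).
   Only marked constructors are traversed. *)
let rec split (flesh : (string * t) list ref) (m : t) : t =
  match m with
  | V _ -> m
  | L (true, x, body) -> L (true, x, split flesh body)
  | A (true, u, w) -> A (true, split flesh u, split flesh w)
  | L (false, _, _) | A (false, _, _) ->
      let y = fresh () in       (* replace the whole unmarked subtree... *)
      flesh := (y, m) :: !flesh; (* ...by a fresh variable, shared as flesh *)
      V y
```

For instance, on a marked λx. x (y z) whose subterm y z is unmarked, the skeleton is λx. x _y1 and the flesh contains the single entry (_y1, y z).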

The next theorem sums it all up, recasting the obtained costs in the unmarked setting via the soundness of the algorithm (Prop. 12).

Theorem 19 (The algorithm is linear in the skeleton).

Let v ∈ Λ be a value. Then the skeletal decomposition (𝚜𝚔𝚎𝚕(v), 𝚏𝚕𝚎𝚜𝚑(v)) of v can be computed in time 𝒪(|𝚜𝚔𝚎𝚕(v)|).

6 Preliminaries about Abstract Machines

In this section, we introduce terminology and basic concepts about abstract machines, which shall be used in the following sections.

Abstract Machines Glossary.

Abstract machines manipulate pre-terms, that is, terms without implicit α-renaming. In this paper, an abstract machine is a quadruple 𝙼 = (𝚂𝚝𝚊𝚝𝚎𝚜, →, ⇝, ·¯) the components of which are as follows.

  • States. A state 𝚀 ∈ 𝚂𝚝𝚊𝚝𝚎𝚜 is a quadruple (𝙲, t, 𝚂, 𝙴) composed of the active term t plus three data structures, namely the chain 𝙲, the (applicative) stack 𝚂, and the (global) environment 𝙴. Terms in states are actually pre-terms.

  • Transitions. The pair (𝚂𝚝𝚊𝚝𝚎𝚜, →) is a transition system with transitions partitioned into principal transitions, whose union is noted →𝚙𝚛 and which are meant to correspond to steps on the calculus, and search transitions, whose union is noted →𝚜𝚎𝚊, which take care of searching for (principal) redexes.

  • Initialization. The component ⇝ ⊆ Λ×𝚂𝚝𝚊𝚝𝚎𝚜 is the initialization relation, associating closed terms without ESs to initial states. It is a relation and not a function because t ⇝ 𝚀 maps a closed λ-term t (considered modulo α) to a state 𝚀 having a pre-term representative of t (which is not modulo α) as active term. Intuitively, any two states 𝚀 and 𝚀′ such that t ⇝ 𝚀 and t ⇝ 𝚀′ are α-equivalent. A state 𝚀 is reachable if it can be reached starting from an initial state, that is, if 𝚀′ →* 𝚀 where t ⇝ 𝚀′ for some t and 𝚀′, shortened as t ⇝ 𝚀′ →* 𝚀.

  • Read-back. The read-back function ·¯ : 𝚂𝚝𝚊𝚝𝚎𝚜 → 𝚃𝚎𝚛𝚖𝚜 turns reachable states into terms (possibly with ESs, that is, 𝚃𝚎𝚛𝚖𝚜 = Λ𝚎𝚜 or 𝚃𝚎𝚛𝚖𝚜 = Λ𝚜𝚔) and satisfies the initialization constraint: if t ⇝ 𝚀 then 𝚀¯ =α t.

A state is final if no transitions apply. A run r : 𝚀 →* 𝚀′ is a possibly empty finite sequence of transitions, the length of which is noted |r|; note that the first and the last states of a run are not necessarily initial and final. If 𝚊 and 𝚋 are transition labels (that is, →𝚊 ⊆ → and →𝚋 ⊆ →) then →𝚊,𝚋 := →𝚊 ∪ →𝚋 and |r|𝚊 is the number of 𝚊-transitions in r, with |r|𝚊,𝚋 := |r|𝚊 + |r|𝚋, and similarly for more than two labels.
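The bookkeeping on labels is elementary; as a sanity check, it can be phrased in a few lines of OCaml (our own toy types, with runs represented as lists of labels):

```ocaml
(* Runs as lists of transition labels: |r|_a is a count, and |r|_{a,b}
   is the count over a set of labels. Toy representation, names ours. *)
type label = Sea1 | Sea2 | Sea3 | Beta | Sk | Ss

(* [count ls r] is |r|_{ls}: the number of transitions of run [r] whose
   label belongs to [ls]. *)
let count (ls : label list) (r : label list) : int =
  List.length (List.filter (fun l -> List.mem l ls) r)
```

So that, for instance, count [Beta] r plays the role of |r|β, and count [Sk; Ss] r that of |r|𝚜𝚔,𝚜𝚜.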

For the machines at work in this paper, the pre-terms in initial states shall be well-bound, that is, they have pairwise distinct bound names; for instance (λx.x)λy.y is well-bound while (λx.x)λx.x is not. We shall also write tα in a state 𝚀 for a fresh well-bound renaming of t, i.e. tα is α-equivalent to t, well-bound, and its bound variables are fresh with respect to those in t and in the other components of 𝚀.

Mechanical Bisimulations.

Machines are usually shown to be correct with respect to a strategy via some form of bisimulation relating terms and machine states. The notion that we adopt is here dubbed mechanical bisimulation, and it should not be confused with the strong bisimulation property of structural equivalence (which actually plays a role in mechanical bisimulations). The definition, tuned towards complexity analyses, requires a perfect match between the steps of the evaluation sequence and the principal transitions of the machine run. Terminology: a structural strategy (→𝚜𝚝𝚛, ≡) is a strategy →𝚜𝚝𝚛 plus a relation ≡ that is a strong bisimulation with respect to →𝚜𝚝𝚛 (see Prop. 4).

Definition 20 (Mechanical bisimulation).

A machine 𝙼 = (𝚂𝚝𝚊𝚝𝚎𝚜, →, ⇝, ·¯) and a structural strategy (→𝚜𝚝𝚛, ≡) on Λ𝚜𝚔-terms are mechanically bisimilar when, given an initial state t ⇝ 𝚀:

  1.

    Runs to evaluations: for any run r : t ⇝ 𝚀 →* 𝚀′ there exists an evaluation 𝚎 : t →𝚜𝚝𝚛* u with u ≡ 𝚀′¯;

  2.

    Evaluations to runs: for every evaluation 𝚎 : t →𝚜𝚝𝚛* u there exists a run r : t ⇝ 𝚀 →* 𝚀′ such that 𝚀′¯ ≡ u;

  3.

    Principal matching: for every principal transition →𝚊 of label 𝚊 of 𝙼, in both previous points the number |r|𝚊 of 𝚊-transitions in r is exactly the number |𝚎|𝚊 of 𝚊-steps in the evaluation 𝚎, i.e. |𝚎|𝚊 = |r|𝚊.

The proof that a machine and a strategy are in a mechanical bisimulation follows from some basic properties, grouped under the notion of distillery, following Accattoli et al. [4].

Definition 21 (Distillery).

A machine 𝙼 = (𝚂𝚝𝚊𝚝𝚎𝚜, →, ⇝, ·¯) and a structural strategy (→𝚜𝚝𝚛, ≡) are a distillery if the following conditions hold:

  1.

    Principal projection: 𝚀 →𝚊 𝚀′ implies 𝚀¯ →𝚊 𝚀′¯ for every principal transition of label 𝚊;

  2.

    Search transparency: 𝚀 →𝚜𝚎𝚊 𝚀′ implies 𝚀¯ = 𝚀′¯;

  3.

    Search transitions terminate: →𝚜𝚎𝚊 terminates;

  4.

    Determinism: →𝚜𝚝𝚛 is deterministic;

  5.

    Halt: final states of 𝙼 decode to 𝚜𝚝𝚛-normal terms.

Theorem 22 (Sufficient condition for implementations).

Let a machine 𝙼 and a structural strategy (→𝚜𝚝𝚛, ≡) be a distillery. Then they are mechanically bisimilar.

7 The MAD and the Skeletal MAD

In this section, we study an abstract machine for skeletal CbNeed, obtained as a minor modification of a known machine for CbNeed, that is recalled here. Since the two machines share most data structures and transitions, they are presented together in Fig. 6 (slightly abusing notations: the two machines define environments 𝙴 differently, so the notation of the common transitions is overloaded).

Stacks          𝚂 ::= ϵ | t:𝚂                 Chains   𝙲 ::= ϵ | 𝙲:(x,𝚂,𝙴:[x])
MAD envs        𝙴 ::= ϵ | [x←t]:𝙴             States   𝚀 := (𝙲,t,𝚂,𝙴)
Skel. MAD envs  𝙴 ::= ϵ | [x←t]:𝙴 | x⇐v:𝙴     Init.    t ⇝ (ϵ, tα, ϵ, ϵ)   (#)

Common transitions
  (𝙲, tu, 𝚂, 𝙴)               →𝚜𝚎𝚊1   (𝙲, t, u:𝚂, 𝙴)
  (𝙲, λx.t, u:𝚂, 𝙴)           →β      (𝙲, t, 𝚂, [x←u]:𝙴)
  (𝙲, x, 𝚂, 𝙴′:[x←t]:𝙴)       →𝚜𝚎𝚊2   (𝙲:(x,𝚂,𝙴′:[x]), t, ϵ, 𝙴)    (*)
  (𝙲:(x,𝚂,𝙴′:[x]), v, ϵ, 𝙴)   →𝚜𝚎𝚊3   (𝙲, x, 𝚂, 𝙴′:[x←v]:𝙴)

MAD specific transition
  (𝙲, x, 𝚂, 𝙴′:[x←v]:𝙴)       →𝚜𝚞𝚋    (𝙲, vα, 𝚂, 𝙴′:[x←v]:𝙴)       (#)

Skeletal MAD specific transitions
  (𝙲, x, 𝚂, 𝙴′:[x←v]:𝙴)       →𝚜𝚔     (𝙲, x, 𝚂, 𝙴′:x⇐v′:𝙴S:𝙴)      (%)
  (𝙲, x, 𝚂, 𝙴′:x⇐v:𝙴)         →𝚜𝚜     (𝙲, vα, 𝚂, 𝙴′:x⇐v:𝙴)         (#)

(#) tα is any well-bound code α-equivalent to t such that its bound names are fresh
    with respect to those in the rest of the state.
(*) If t is not a value.
(%) Where 𝚜𝚔𝚍𝚎𝚌(v) = (v′, S) is the skeletal decomposition of v (computed as
    𝚜𝚙𝚕𝚒𝚝(𝚖𝚜𝚔𝚎𝚕(v))), and 𝙴S is S seen as an environment.

Read-back
  Empty    ϵ¯ := ⟨·⟩
  Envs     ([x←t]:𝙴)¯ := 𝙴¯⟨⟨·⟩[x←t]⟩          (x⇐t:𝙴)¯ := 𝙴¯⟨⟨·⟩x⇐t⟩
  Stacks   (t:𝚂)¯ := 𝚂¯⟨⟨·⟩t⟩
  Chains   (𝙲:(x,𝚂,𝙴:[x]))¯ := 𝙴¯⟨𝙲¯⟨𝚂¯⟨x⟩⟩[x←⟨·⟩]⟩
  States   (𝙲,t,𝚂,𝙴)¯ := 𝙴¯⟨𝙲¯⟨𝚂¯⟨t⟩⟩⟩

Figure 6: The Milner Abstract machine by Need (MAD) and the Skeletal Milner Abstract machine by Need (Skeletal MAD), presented as different extensions of a common set of transitions (but be careful: the common transitions use different notions of environment).

The MAD.

The Milner Abstract machine by neeD (MAD) of Fig. 6 implements the CbNeed strategy 𝚗𝚎𝚎𝚍. It is a CbNeed variant of the Milner Abstract Machine (MAM), itself a simplification of the Krivine abstract machine (KAM). The MAM is a CbN abstract machine using a single global environment (sometimes called heap), a stack for arguments, and no closures – it is omitted here (the KAM instead uses closures and many local environments). Both the MAM and the MAD are introduced by Accattoli et al. in [4].

The MAD modifies the MAM by adding a mechanism realizing the memoization characteristic of CbNeed. Namely, when execution encounters a variable occurrence x associated to an environment entry [x←u] such that u is not a value, the MAM duplicates u while the MAD does not. Instead, part of the current state is saved on the new chain data structure 𝙲, and the MAD starts evaluating u inside the environment entry itself; this is transition 𝚜𝚎𝚊2. When such an evaluation ends with a value v, the machine resumes the state saved on the chain, and the environment is updated by replacing u with v – this is transition 𝚜𝚎𝚊3 – so that future encounters of x will not need to re-evaluate u. Both the MAM and the MAD rely on α-renaming as a black-box operation (·)α, at work in transition 𝚜𝚞𝚋.
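The memoization mechanism can be illustrated with a deliberately naive OCaml sketch (association-list environments; the names sea2 and sea3 are ours, and we ignore the side-condition that the entry's term is not a value — the actual machine is in Fig. 6):

```ocaml
(* Toy terms and toy environments, as plain association lists. *)
type tm = Var of string | Lam of string * tm | App of tm * tm

type entry = Sub of tm | Hole          (* [x<-t], or the hole [x] *)
type env = (string * entry) list

(* sea2, in spirit: on variable x bound to t, save the environment prefix
   with a hole at x, and return t to be evaluated in the suffix. *)
let sea2 (x : string) (env : env) : env * tm * env =
  let rec go prefix = function
    | (y, Sub t) :: suffix when y = x ->
        (List.rev ((x, Hole) :: prefix), t, suffix)
    | e :: suffix -> go (e :: prefix) suffix
    | [] -> failwith "unbound variable"
  in
  go [] env

(* sea3, in spirit: evaluation inside the entry ended on value v; plug v
   in the hole, memoizing it for all future uses of x. *)
let sea3 (x : string) (v : tm) (prefix : env) (suffix : env) : env =
  List.map (fun (y, e) -> if y = x then (y, Sub v) else (y, e)) prefix
  @ suffix
```

For instance, with x bound to (λz.z)(λw.w), sea2 detaches the entry's term for evaluation; once its value λw.w is obtained, sea3 rebuilds the environment with the memoized value, so x is never re-evaluated.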

The presentation in Fig. 6 differs from the MAD in [4] in a very minor point. In [4], transitions 𝚜𝚎𝚊3 and 𝚜𝚞𝚋 are concatenated into a single transition. Splitting them helps in stating the invariants of the machines.

The Skeletal MAD.

The Skeletal MAD, also in Fig. 6, modifies the MAD exactly as the Skeletal CbNeed LSC modifies the CbNeed LSC, that is, by adding skeletal entries x⇐v to the global environment and by splitting the substitution transition 𝚜𝚞𝚋 in two: the skeletonizing transition 𝚜𝚔, which separates the skeleton of the value v from its flesh, and the skeletal substitution transition 𝚜𝚜, which substitutes the (α-renamed) skeleton.

Notation. For specifying transition 𝚜𝚔, we use the following notation for the flesh: a substitution context S = [[x1←t1]]…[[xk←tk]], where each [[xi←ti]] can be an ES [xi←ti] or a skeletal entry xi⇐ti, induces a global environment 𝙴S := [[x1←t1]]:⋯:[[xk←tk]]:ϵ.

The union of all the transition rules of the Skeletal MAD (namely, 𝚜𝚎𝚊1, 𝚜𝚎𝚊2, 𝚜𝚎𝚊3, β, 𝚜𝚔, and 𝚜𝚜) is noted →𝚂𝙼𝙰𝙳.

The read-back of (Skeletal) MAD states to terms (possibly with ESs) is defined in Fig. 6. The definition uses auxiliary notions of read-back for chains, stacks, and environments that turn them into CbNeed evaluation contexts. For stacks and environments, their read-back is clearly an evaluation context. For chains, it shall be proved as an invariant of the machine.

Invariants.

To prove properties of the Skeletal MAD we need to isolate some of its invariants in Lemma 24 below. The closure one ensures that the closure of the initial term extends, in an appropriate sense, to all reachable states. The well-bound invariant, similarly, ensures that binders in reachable states are pairwise distinct, as in the initial term. To compactly formulate the closure invariant we need the notion of terms of a state.

Definition 23 (Terms of a state).

Let 𝚀 = (𝙲, u, 𝚂, 𝙴) be a Skeletal MAD state where 𝙲 is a possibly empty chain ϵ:(xk,𝚂k,𝙴k:[xk]):⋯:(x1,𝚂1,𝙴1:[x1]) for some k ≥ 0. The terms of 𝚀 are u, every term in 𝚂 and in the 𝚂i, and every term in an entry of 𝙴 or of the 𝙴i, for 1 ≤ i ≤ k.

Lemma 24 (Qualitative invariants).

Let t ⇝ 𝚀′ →* 𝚀 = (𝙲, u, 𝚂, 𝙴) be a Skeletal MAD run.

  1.

    Closure: for every term s of 𝚀 and for any variable x ∈ 𝚏𝚟(s), there is an environment entry [x←t], x⇐v, or [x] to the right of s in 𝚀.

  2.

    Well-bound: if λx.s occurs in 𝚀 and x has any other occurrence in 𝚀, then that occurrence is as a free variable of s; moreover, for any entry [y←t], y⇐v, or [y] in 𝚀, the name y can occur (in any form) only to the left of that entry in 𝚀.

  3.

    Contextual chain read-back: 𝙲¯, 𝙲¯⟨𝚂¯⟩, 𝙴¯⟨𝙲¯⟩, and 𝙴¯⟨𝙲¯⟨𝚂¯⟩⟩ are CbNeed evaluation contexts.

Mechanical Bisimulation.

The invariants allow us to prove that the Skeletal MAD and the strategy →𝚜𝚔𝚗𝚎𝚎𝚍 modulo ≡ form a distillery, from which the mechanical bisimulation follows (by Theorem 22). The closure invariant is used for Point 4, the other two for Point 1.

Theorem 25 (Skeletal distillery).

The Skeletal MAD and the structural strategy (→𝚜𝚔𝚗𝚎𝚎𝚍, ≡) form a distillery. Namely, let 𝚀 be a reachable Skeletal MAD state.

  1.

    Principal projection:

    (a)

      if 𝚀 →β 𝚀′ then 𝚀¯ →𝚜𝚔𝚍𝙱 𝚀′¯.

    (b)

      if 𝚀 →𝚜𝚔 𝚀′ then 𝚀¯ →𝚜𝚔 𝚀′¯.

    (c)

      if 𝚀 →𝚜𝚜 𝚀′ then 𝚀¯ →𝚜𝚜 𝚀′¯.

  2.

    Search transparency: if 𝚀 →𝚜𝚎𝚊1,𝚜𝚎𝚊2,𝚜𝚎𝚊3 𝚀′ then 𝚀¯ = 𝚀′¯.

  3.

    Search terminates: →𝚜𝚎𝚊1,𝚜𝚎𝚊2,𝚜𝚎𝚊3 is terminating.

  4.

    Halt: if 𝚀 is final then it has shape (ϵ,v,ϵ,𝙴) and 𝚀¯ is 𝚜𝚔𝚗𝚎𝚎𝚍-normal.

Corollary 26 (Skeletal mechanical bisimulation).

The Skeletal MAD and the structural strategy (→𝚜𝚔𝚗𝚎𝚎𝚍, ≡) are in a mechanical bisimulation.

8 Complexity Analysis

In this section, we show that the Skeletal MAD can be concretely implemented with the same complexity as the MAD, that is, with bi-linear overhead.

Sub-Term Property.

As is standard, the complexity analysis crucially relies on the sub-term property, which bounds the size of the manipulated (and thus duplicated) terms by the size of the initial term. The proof requires a similar property about skeletons.

Lemma 27 (Skeleton and flesh size).
  1.

    Auxiliary parametrized statement: let t ∈ Λ, 𝚅 be a set of variables, and 𝚜𝚔𝚍𝚎𝚌𝚅(t) := (u, S). Then |u| ≤ |t| and |s| ≤ |t| for any ES [x←s] in S;

  2.

    Main: let v ∈ Λ. Then |𝚜𝚔𝚎𝚕(v)| ≤ |v| and |s| ≤ |v| for any ES [x←s] in 𝚏𝚕𝚎𝚜𝚑(v).

Lemma 28 (Quantitative sub-term property).

Let t ⇝ 𝚀′ →𝚂𝙼𝙰𝙳* 𝚀 be a Skeletal MAD run. Then |u| ≤ |t| for any term u of 𝚀.

Next, some basic observations about the transitions, together with the sub-term property, allow us to bound their number using the two key parameters (that is, the number of β-steps/transitions and the size of the initial term).

Proposition 29 (Number of transitions).

Let r : t ⇝ 𝚀 →𝚂𝙼𝙰𝙳* 𝚀′ be a Skeletal MAD run.

  1.

    |r|𝚜𝚔,𝚜𝚜,𝚜𝚎𝚊2,𝚜𝚎𝚊3 ∈ 𝒪(|r|β), and

  2.

    |r|𝚜𝚎𝚊1 ∈ 𝒪(|t|·(|r|β+1)).

Proof.

  1.
    (a)

      |r|𝚜𝚎𝚊2 ≤ |r|β, because every 𝚜𝚎𝚊2 transition consumes a non-value entry from the environment, and such entries are created only by β transitions.

    (b)

      |r|𝚜𝚎𝚊3 ≤ |r|β, because every 𝚜𝚎𝚊3 transition consumes one entry from the chain, and such entries are created only by 𝚜𝚎𝚊2 transitions, which are in turn bounded by β transitions.

    (c)

      |r|𝚜𝚜 ∈ 𝒪(|r|β), because after a 𝚜𝚜 transition there can only be a 𝚜𝚎𝚊3 transition, a β transition, or no transition (if the 𝚜𝚜 transition is the last one of the run r). Therefore, |r|𝚜𝚜 ≤ |r|𝚜𝚎𝚊3 + |r|β + 1 ≤ 2|r|β + 1.

    (d)

      |r|𝚜𝚔 ∈ 𝒪(|r|β), because every 𝚜𝚔 transition is followed by a 𝚜𝚜 transition.

  2.

    Note that 𝚜𝚎𝚊1 transitions decrease the size of the code, which is increased only by 𝚜𝚜 and 𝚜𝚎𝚊2 transitions. By the sub-term property (Lemma 28), each such increase is bounded by the size |t| of the initial term. By Point 1, |r|𝚜𝚜,𝚜𝚎𝚊2 ∈ 𝒪(|r|β). Then |r|𝚜𝚎𝚊1 ∈ 𝒪(|t|·(|r|𝚜𝚜,𝚜𝚎𝚊2+1)) = 𝒪(|t|·(|r|β+1)).

Lastly, we need some assumptions on how the Skeletal MAD can be concretely implemented. Our OCaml implementation (see Appendix A) represents variables as memory locations and variable occurrences as pointers to those locations, obtaining random access to environment entries in 𝒪(1). As already mentioned in Sect. 5, all constructors have pointers to their parents in the syntactic tree, and in particular variables have pointers to their occurrences, so as to implement efficiently the computation of the skeleton. Moreover, environments are implemented as doubly linked lists, so as to split/concatenate them in 𝒪(1) in transitions 𝚜𝚎𝚊2 and 𝚜𝚎𝚊3. The cost of transitions 𝚜𝚔 and 𝚜𝚜 is bounded by the size of the value to skeletonize/copy, which is in turn bounded via the sub-term property.
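The doubly linked representation of environments can be sketched generically as follows (our own minimal types, not the var_ref machinery of Appendix A); both pushing an entry and severing an environment at a given node are 𝒪(1) pointer manipulations, which is what transitions 𝚜𝚎𝚊2 and 𝚜𝚎𝚊3 need:

```ocaml
(* Minimal doubly linked list: a node knows its neighbours, so the list
   can be cut or extended at a node in O(1), without traversals. *)
type 'a dnode = {
  value : 'a;
  mutable prev : 'a dnode option;
  mutable next : 'a dnode option;
}

let node v = { value = v; prev = None; next = None }

(* O(1): push node [n] in front of the (optional) head [h]. *)
let push (n : 'a dnode) (h : 'a dnode option) : 'a dnode option =
  n.next <- h;
  (match h with Some m -> m.prev <- Some n | None -> ());
  Some n

(* O(1): sever the list right after [n], returning the detached suffix. *)
let cut_after (n : 'a dnode) : 'a dnode option =
  let suffix = n.next in
  n.next <- None;
  (match suffix with Some m -> m.prev <- None | None -> ());
  suffix
```

With assoc-list environments, the same operations would cost time linear in the length of the prefix, which would break the bi-linear bound below.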

Lemma 30 (Cost of single transitions).

Let t ⇝ 𝚀 →𝚂𝙼𝙰𝙳* 𝚀′ be a run. Implementing any transition from 𝚀′ costs 𝒪(1), except for 𝚜𝚔 and 𝚜𝚜 transitions, which cost 𝒪(|t|) each.

Putting it all together, we obtain a bi-linear bound.

Theorem 31 (The Skeletal MAD is bi-linear).

Let r : t ⇝ 𝚀 →𝚂𝙼𝙰𝙳* 𝚀′ be a Skeletal MAD run. Then r can be implemented on random access machines in time 𝒪(|t|·(|r|β+1)).

9 Conclusions

We show that skeletal call-by-need provides, in some cases, exponentially shorter evaluation sequences involving exponentially smaller terms than call-by-need. We also show that the required skeleton reconstruction can be implemented in linear time and space, by giving a new, simpler, and graph-free presentation of ideas by Shivers and Wand. Lastly, we smoothly plug this result into the existing distillation technique for abstract machines, obtaining an implementation of skeletal call-by-need with bi-linear overhead. In particular, the bi-linear overhead allows us to establish that the shortening of evaluation sequences is a genuine speed-up in time: the smart specification of skeletal call-by-need does not hide an expensive overhead.

References

  • [1] Beniamino Accattoli. An abstract factorization theorem for explicit substitutions. In Ashish Tiwari, editor, 23rd International Conference on Rewriting Techniques and Applications (RTA’12) , RTA 2012, May 28 - June 2, 2012, Nagoya, Japan, volume 15 of LIPIcs, pages 6–21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2012. doi:10.4230/LIPICS.RTA.2012.6.
  • [2] Beniamino Accattoli. Proof nets and the call-by-value λ-calculus. Theor. Comput. Sci., 606:2–24, 2015. doi:10.1016/J.TCS.2015.08.006.
  • [3] Beniamino Accattoli. Proof nets and the linear substitution calculus. In Bernd Fischer and Tarmo Uustalu, editors, Theoretical Aspects of Computing - ICTAC 2018 - 15th International Colloquium, Stellenbosch, South Africa, October 16-19, 2018, Proceedings, volume 11187 of Lecture Notes in Computer Science, pages 37–61. Springer, 2018. doi:10.1007/978-3-030-02508-3_3.
  • [4] Beniamino Accattoli, Pablo Barenbaum, and Damiano Mazza. Distilling abstract machines. In Johan Jeuring and Manuel M. T. Chakravarty, editors, Proceedings of the 19th ACM SIGPLAN international conference on Functional programming, Gothenburg, Sweden, September 1-3, 2014, pages 363–376. ACM, 2014. doi:10.1145/2628136.2628154.
  • [5] Beniamino Accattoli and Bruno Barras. Environments and the complexity of abstract machines. In Wim Vanhoof and Brigitte Pientka, editors, Proceedings of the 19th International Symposium on Principles and Practice of Declarative Programming, Namur, Belgium, October 09 - 11, 2017, pages 4–16. ACM, 2017. doi:10.1145/3131851.3131855.
  • [6] Beniamino Accattoli, Eduardo Bonelli, Delia Kesner, and Carlos Lombardi. A nonstandard standardization theorem. In Suresh Jagannathan and Peter Sewell, editors, The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’14, San Diego, CA, USA, January 20-21, 2014, pages 659–670. ACM, 2014. doi:10.1145/2535838.2535886.
  • [7] Beniamino Accattoli, Andrea Condoluci, Giulio Guerrieri, and Claudio Sacerdoti Coen. Crumbling abstract machines. In Ekaterina Komendantskaya, editor, Proceedings of the 21st International Symposium on Principles and Practice of Programming Languages, PPDP 2019, Porto, Portugal, October 7-9, 2019, pages 4:1–4:15. ACM, 2019. doi:10.1145/3354166.3354169.
  • [8] Beniamino Accattoli, Andrea Condoluci, and Claudio Sacerdoti Coen. Strong call-by-value is reasonable, implosively. In 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2021, Rome, Italy, June 29 - July 2, 2021, pages 1–14. IEEE, 2021. doi:10.1109/LICS52264.2021.9470630.
  • [9] Beniamino Accattoli and Ugo Dal Lago. On the invariance of the unitary cost model for head reduction. In Ashish Tiwari, editor, 23rd International Conference on Rewriting Techniques and Applications (RTA’12) , RTA 2012, May 28 - June 2, 2012, Nagoya, Japan, volume 15 of LIPIcs, pages 22–37. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2012. doi:10.4230/LIPICS.RTA.2012.22.
  • [10] Beniamino Accattoli and Ugo Dal Lago. (leftmost-outermost) beta reduction is invariant, indeed. Log. Methods Comput. Sci., 12(1), 2016. doi:10.2168/LMCS-12(1:4)2016.
  • [11] Beniamino Accattoli, Ugo Dal Lago, and Gabriele Vanoni. The (in)efficiency of interaction. Proc. ACM Program. Lang., 5(POPL):1–33, 2021. doi:10.1145/3434332.
  • [12] Beniamino Accattoli and Giulio Guerrieri. Abstract machines for open call-by-value. Sci. Comput. Program., 184, 2019. doi:10.1016/J.SCICO.2019.03.002.
  • [13] Beniamino Accattoli, Giulio Guerrieri, and Maico Leberle. Types by need. In Luís Caires, editor, Programming Languages and Systems - 28th European Symposium on Programming, ESOP 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, volume 11423 of Lecture Notes in Computer Science, pages 410–439. Springer, 2019. doi:10.1007/978-3-030-17184-1_15.
  • [14] Beniamino Accattoli and Delia Kesner. The structural λ-calculus. In Anuj Dawar and Helmut Veith, editors, Computer Science Logic, 24th International Workshop , CSL 2010, 19th Annual Conference of the EACSL, Brno, Czech Republic, August 23-27, 2010. Proceedings, volume 6247 of Lecture Notes in Computer Science, pages 381–395. Springer, 2010. doi:10.1007/978-3-642-15205-4_30.
  • [15] Beniamino Accattoli, Ugo Dal Lago, and Gabriele Vanoni. Reasonable space for the λ-calculus, logarithmically. Log. Methods Comput. Sci., 20(4), 2024. doi:10.46298/LMCS-20(4:15)2024.
  • [16] Beniamino Accattoli and Adrienne Lancelot. Mirroring call-by-need, or values acting silly. In Jakob Rehof, editor, 9th International Conference on Formal Structures for Computation and Deduction, FSCD 2024, July 10-13, 2024, Tallinn, Estonia, volume 299 of LIPIcs, pages 23:1–23:24. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.FSCD.2024.23.
  • [17] Beniamino Accattoli and Maico Leberle. Useful open call-by-need. In Florin Manea and Alex Simpson, editors, 30th EACSL Annual Conference on Computer Science Logic, CSL 2022, February 14-19, 2022, Göttingen, Germany, volume 216 of LIPIcs, pages 4:1–4:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.CSL.2022.4.
  • [18] Beniamino Accattoli, Francesco Magliocca, Loïc Peyrot, and Claudio Sacerdoti Coen. The cost of skeletal call-by-need, smoothly, 2025. arXiv:2505.09242.
  • [19] Beniamino Accattoli and Claudio Sacerdoti Coen. On the value of variables. Inf. Comput., 255:224–242, 2017. doi:10.1016/J.IC.2017.01.003.
  • [20] Beniamino Accattoli and Claudio Sacerdoti Coen. IMELL cut elimination with linear overhead. In Jakob Rehof, editor, 9th International Conference on Formal Structures for Computation and Deduction, FSCD 2024, July 10-13, 2024, Tallinn, Estonia, volume 299 of LIPIcs, pages 24:1–24:24. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.FSCD.2024.24.
  • [21] Zena M. Ariola, Matthias Felleisen, John Maraist, Martin Odersky, and Philip Wadler. The call-by-need lambda calculus. In Ron K. Cytron and Peter Lee, editors, Conference Record of POPL’95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, California, USA, January 23-25, 1995, pages 233–246. ACM Press, 1995. doi:10.1145/199448.199507.
  • [22] Thibaut Balabonski. A unified approach to fully lazy sharing. In John Field and Michael Hicks, editors, Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2012, Philadelphia, Pennsylvania, USA, January 22-28, 2012, pages 469–480. ACM, 2012. doi:10.1145/2103656.2103713.
  • [23] Thibaut Balabonski. Weak optimality, and the meaning of sharing. In Greg Morrisett and Tarmo Uustalu, editors, ACM SIGPLAN International Conference on Functional Programming, ICFP’13, Boston, MA, USA - September 25 - 27, 2013, pages 263–274. ACM, 2013. doi:10.1145/2500365.2500606.
  • [24] Thibaut Balabonski, Pablo Barenbaum, Eduardo Bonelli, and Delia Kesner. Foundations of strong call by need. PACMPL, 1(ICFP):20:1–20:29, 2017. doi:10.1145/3110264.
  • [25] Thibaut Balabonski, Antoine Lanco, and Guillaume Melquiond. A strong call-by-need calculus. In Naoki Kobayashi, editor, 6th International Conference on Formal Structures for Computation and Deduction, FSCD 2021, July 17-24, 2021, Buenos Aires, Argentina (Virtual Conference), volume 195 of LIPIcs, pages 9:1–9:22. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.FSCD.2021.9.
  • [26] Pablo Barenbaum, Eduardo Bonelli, and Kareem Mohamed. Pattern matching and fixed points: Resource types and strong call-by-need: Extended abstract. In David Sabel and Peter Thiemann, editors, Proceedings of the 20th International Symposium on Principles and Practice of Declarative Programming, PPDP 2018, Frankfurt am Main, Germany, September 03-05, 2018, pages 6:1–6:12. ACM, 2018. doi:10.1145/3236950.3236972.
  • [27] Andrea Condoluci, Beniamino Accattoli, and Claudio Sacerdoti Coen. Sharing equality is linear. In Ekaterina Komendantskaya, editor, Proceedings of the 21st International Symposium on Principles and Practice of Programming Languages, PPDP 2019, Porto, Portugal, October 7-9, 2019, pages 9:1–9:14. ACM, 2019. doi:10.1145/3354166.3354174.
  • [28] Yannick Forster, Fabian Kunze, and Marc Roth. The weak call-by-value λ-calculus is reasonable for both time and space. Proc. ACM Program. Lang., 4(POPL):27:1–27:23, 2020. doi:10.1145/3371095.
  • [29] Tom Gundersen, Willem Heijltjes, and Michel Parigot. Atomic lambda calculus: A typed lambda-calculus with explicit sharing. In 28th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2013, New Orleans, LA, USA, June 25-28, 2013, pages 311–320. IEEE Computer Society, 2013. doi:10.1109/LICS.2013.37.
  • [30] John Hughes. The Design and Implementation of Programming Languages. PhD thesis, Programming Research Group, Oxford University, July 1983.
  • [31] Delia Kesner. Reasoning about call-by-need by means of types. In Bart Jacobs and Christof Löding, editors, Foundations of Software Science and Computation Structures - 19th International Conference, FOSSACS 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2-8, 2016, Proceedings, volume 9634 of Lecture Notes in Computer Science, pages 424–441. Springer, 2016. doi:10.1007/978-3-662-49630-5_25.
  • [32] Delia Kesner, Loïc Peyrot, and Daniel Ventura. Node replication: Theory and practice. Log. Methods Comput. Sci., 20(1), 2024. doi:10.46298/LMCS-20(1:5)2024.
  • [33] Delia Kesner, Alejandro Ríos, and Andrés Viso. Call-by-need, neededness and all that. In Christel Baier and Ugo Dal Lago, editors, Foundations of Software Science and Computation Structures - 21st International Conference, FOSSACS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, volume 10803 of Lecture Notes in Computer Science, pages 241–257. Springer, 2018. doi:10.1007/978-3-319-89366-2_13.
  • [34] Arne Kutzner and Manfred Schmidt-Schauß. A non-deterministic call-by-need lambda calculus. In Matthias Felleisen, Paul Hudak, and Christian Queinnec, editors, Proceedings of the third ACM SIGPLAN International Conference on Functional Programming (ICFP ’98), Baltimore, USA, September 27-29, 1998, pages 324–335. ACM, 1998. doi:10.1145/289423.289462.
  • [35] John Launchbury. A natural semantics for lazy evaluation. In Mary S. Van Deusen and Bernard Lang, editors, Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Charleston, South Carolina, USA, January 1993, pages 144–154. ACM Press, 1993. doi:10.1145/158511.158618.
  • [36] Simon L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice-Hall, 1987.
  • [37] David Sands, Jörgen Gustavsson, and Andrew Moran. Lambda calculi and linear speedups. In Torben Æ. Mogensen, David A. Schmidt, and Ivan Hal Sudborough, editors, The Essence of Computation, Complexity, Analysis, Transformation. Essays Dedicated to Neil D. Jones [on occasion of his 60th birthday], volume 2566 of Lecture Notes in Computer Science, pages 60–84. Springer, 2002. doi:10.1007/3-540-36377-7_4.
  • [38] Peter Sestoft. Deriving a lazy abstract machine. J. Funct. Program., 7(3):231–264, 1997. doi:10.1017/S0956796897002712.
  • [39] Olin Shivers and Mitchell Wand. Bottom-up beta-reduction: Uplinks and lambda-dags. Fundam. Informaticae, 103(1-4):247–287, 2010. doi:10.3233/FI-2010-328.
  • [40] D. A. Turner. A new implementation technique for applicative languages. Softw. Pract. Exp., 9(1):31–49, 1979. doi:10.1002/SPE.4380090105.
  • [41] Christopher P. Wadsworth. Semantics and pragmatics of the lambda-calculus. PhD Thesis, Oxford, 1971.

Appendix A OCaml Implementation

We provide at https://github.com/Franciman/fullylazymad a reference implementation in OCaml of the Skeletal MAD, meant to guarantee the soundness of the assumptions that underlie the complexity analyses. The implementation takes as input a user-provided term and shows the full run, printing all transitions and all intermediate machine states. We highlight here a few details of the implementation.

Data structures.

Pre-terms are represented by the following algebraic data type (ADT):

type term =
  | Var of { v: var_ref; mutable taken: bool; mutable parent: term option }
  | Abs of { v: var_ref; mutable body: term; mutable occurrences: term list;
             mutable taken: bool; mutable parent: term option }
  | App of { mutable head: term; mutable arg: term;
             mutable taken: bool; mutable parent: term option }

and var_ref =
  { mutable prev : var_ref option;
    orig_name : string; (* to help generating new names *)
    name : string;
    mutable sub : sub;
    mutable next : var_ref option }

and sub =
  | NoSub | Sub of term | SubSkel of term | Hole
  | Copy of (term list ref * var_ref)

whose constructors App for applications, Abs for abstractions, and Var for variable occurrences hold mutable pointers to their subterms, a mutable optional pointer to the parent, and a mutable boolean taken for marking terms during skeleton extraction.

Variable occurrences point to a shared, unique variable declaration/substitution in memory of type var_ref, which holds the variable name, a substitution description field of type sub, and two optional pointers (prev and next) to insert the variable in an environment, that is, a doubly linked list of variable declarations. A substitution description is an ADT whose constructors are Sub t for the explicit substitution [x←t], SubSkel t for the skeletal substitution x⇐t, NoSub for bound variables, Hole to represent [x] in chain items, and Copy (parents, x) to point to another variable declaration x together with the list of its parents. The latter form is used temporarily to preserve the sharing of variable declarations during the implementation of (·)α: the first time a variable is to be copied, its substitution is made to point to the new copy; each subsequent time an occurrence of the variable is to be copied, the same variable declaration is reused and a new parent is added for it.
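The role of Copy can be illustrated on a toy syntax where the memo table is an explicit Hashtbl rather than a field stored in place in the variable declaration (the names rename and fresh are ours, not the implementation's):

```ocaml
(* Toy syntax with integer variable names. *)
type tm = Var of int | Lam of int * tm | App of tm * tm

let counter = ref 1000
let fresh () = incr counter; !counter

(* Linear-time alpha-renaming preserving the sharing of declarations:
   the memo table guarantees that each binder is renamed exactly once
   and that all its occurrences see the same fresh name, which is what
   the [Copy] constructor achieves in place in the implementation. *)
let rename (t : tm) : tm =
  let memo : (int, int) Hashtbl.t = Hashtbl.create 16 in
  let rec go = function
    | Var x -> Var (try Hashtbl.find memo x with Not_found -> x)
    | Lam (x, body) ->
        let x' = fresh () in
        Hashtbl.add memo x x';  (* copy the declaration once... *)
        Lam (x', go body)       (* ...and share it among the occurrences *)
    | App (u, w) -> App (go u, go w)
  in
  go t
```

For instance, rename (Lam (1, App (Var 1, Var 2))) yields Lam (1001, App (Var 1001, Var 2)): the bound occurrence follows its (single) renamed declaration, while the free one is untouched.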

Abstractions also point to the list of occurrences of their bound variable, to implement rule 𝚙𝚛3 in 𝒪(1) per occurrence. This is a minor memory-saving divergence from the paper, where it was suggested to store the backpointers to the occurrences in the bound variable declaration node, of type var_ref: once an abstraction is consumed, the backpointers become useless, and avoiding them in the implementation thus saves some space.

Machine states are implemented straightforwardly using lists for stacks, chains, and the set of roots (which is then actually represented as a list, not as a set), and doubly linked lists of variable declarations for environments:

type env = var_ref option
(* Some head of the doubly-linked list, or None for empty list *)
type stack = term list
type chain = (var_ref * stack * env) list
type state = chain * term * stack * env
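For illustration, here is a minimal self-contained sketch of how pushing on such a doubly linked environment works in 𝒪(1). The types and names below are simplified stand-ins of our own, not the repository's var_ref:

```ocaml
(* Simplified stand-in for var_ref: only the doubly linked structure. *)
type decl = {
  name : string;
  mutable prev : decl option;
  mutable next : decl option;
}

type env = decl option  (* pointer to the head of the list, None if empty *)

(* O(1) push of a fresh declaration on top of the environment. *)
let push (d : decl) (e : env) : env =
  d.next <- e;
  (match e with Some hd -> hd.prev <- Some d | None -> ());
  Some d

(* Collect the names in the environment, for inspection. *)
let rec names (e : env) : string list =
  match e with None -> [] | Some d -> d.name :: names d.next

let mk n = { name = n; prev = None; next = None }
let e = push (mk "x") (push (mk "y") None)
```

The prev pointers are what later allow removing a declaration from the middle of the environment in 𝒪(1), without searching for its predecessor.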

Skeleton Reconstruction.

The rewriting system for marking terms to extract the skeleton and flesh is implemented sequentially as a recursive function. Each activation record for a call to mark_skeleton in the OCaml stack corresponds in the paper to an occurrence of ¯ in the rewritten term.

let rec mark_skeleton = function
  | None -> ()
  | Some (Var v) -> v.taken <- true; mark_skeleton v.parent
  | Some (Abs a) ->
      if not a.taken then
        (a.taken <- true;
         List.iter (fun v -> mark_skeleton (Some v)) a.occurrences;
         mark_skeleton a.parent)
  | Some (App a) ->
      if not a.taken then
        (a.taken <- true; mark_skeleton a.parent)

Parallel implementations of mark_skeleton are possible, but because of the need for synchronization over application nodes, and because skeletonized terms are expected to be small, it is unclear whether actual parallelization would provide any speed-up.
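To make the traversal concrete, here is a self-contained runnable miniature of the upward marking, using a simplified node type of our own: a single node kind with a taken flag and a parent pointer, collapsing the Var/Abs/App cases of the actual implementation:

```ocaml
(* Simplified node: only a mark and a parent pointer, collapsing the
   Var/Abs/App cases of the actual implementation into one. *)
type node = { mutable taken : bool; mutable parent : node option }

(* Walk upwards from a node, marking unmarked ancestors; stop as soon
   as an already-marked node is met, so each node is marked at most once. *)
let rec mark (n : node option) : unit =
  match n with
  | None -> ()
  | Some x ->
    if not x.taken then begin
      x.taken <- true;
      mark x.parent
    end

(* A small tree: two leaves under mid, mid under root. *)
let root = { taken = false; parent = None }
let mid = { taken = false; parent = Some root }
let leaf1 = { taken = false; parent = Some mid }
let leaf2 = { taken = false; parent = Some mid }
let () = mark (Some leaf1)  (* marks leaf1, mid and root, but not leaf2 *)
```

The early stop on already-marked nodes is what keeps the whole marking phase linear in the number of nodes, even when many paths share ancestors.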

Remaining Code.

The code to extract the skeleton and flesh from marked terms, and the code for machine transitions follow straightforwardly the rules in the paper, up to noisy code to keep the doubly linked structure of terms and environments. Here is an example of the code for one transition in the Skeletal MAD:

let step : state -> string * state = function
  ...
  | chain, Abs { v; body; _ }, arg :: args, env ->
      set_parent body None;  (* unlinks body from its parent (double link severed) *)
      v.sub <- Sub arg;      (* turns v into an explicit substitution *)
      push v env;            (* pushes v on top of the env *)
      "β", (chain, body, args, Some v)
  ...

The code implementing α follows the standard algorithm, described by Accattoli and Barras [5], for copying a graph in linear time while preserving sharing. It uses the Copy constructor to temporarily associate each variable with its copy.
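As an illustration of the idea, with a simplified node type of our own rather than the actual term and Copy types, sharing-preserving copy can be sketched as:

```ocaml
(* A tiny DAG; copy_of plays the role of the Copy constructor: it
   temporarily caches the copy of a node while copying is in progress. *)
type node = {
  label : string;
  mutable children : node list;
  mutable copy_of : node option;
}

(* Linear-time copy preserving sharing: each node is copied exactly once,
   later visits reuse the cached copy. The actual implementation also
   clears the Copy markers afterwards; we skip that step here. *)
let rec copy (n : node) : node =
  match n.copy_of with
  | Some c -> c
  | None ->
    let c = { label = n.label; children = []; copy_of = None } in
    n.copy_of <- Some c;  (* cache the copy before recursing *)
    c.children <- List.map copy n.children;
    c

(* Both children of top point to the same shared node; the copy
   preserves that sharing instead of duplicating the shared node. *)
let shared = { label = "s"; children = []; copy_of = None }
let top = { label = "t"; children = [shared; shared]; copy_of = None }
let top' = copy top
```

Caching the copy before recursing into the children is the key step: it is what makes the two occurrences of the shared node map to a single fresh node.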

Appendix B Example of Skeletal MAD Run

We show here an example invocation of our implementation. Suppose that the file family_3.lambda contains the following input, which encodes the pure term yielding the third term of the family we studied, with backslashes as ASCII equivalents of λ.

(\i. (\g. (\z. (z i) (z i)) (g (g (g i)))) (\x.\y. (x i) (x i) y)) (\w. w)

The command

dune exec bin/main.exe fully-lazy-functional-linked-env family_3.lambda

yields the following output:¹

¹ The option fully-lazy-functional-linked-env selects the Skeletal MAD implementation described in the paper, which stays as close as possible to the abstract machine description. The repository also contains code for alternative implementations of the Skeletal MAD, including versions that keep the machine state garbage-free.

(each state lists chain | code | stack | environment; ε denotes an empty component)

ε | (λi.(λg.(λz.zi(zi))(g(g(gi))))(λx.λy.xi(xi)y))(λw.w) | ε | ε
→sea1
ε | λi.(λg.(λz.zi(zi))(g(g(gi))))(λx.λy.xi(xi)y) | (λw.w) | ε
→β
ε | (λg.(λz.zi(zi))(g(g(gi))))(λx.λy.xi(xi)y) | ε | [i←λw.w]
→sea1
ε | λg.(λz.zi(zi))(g(g(gi))) | (λx.λy.xi(xi)y) | [i←λw.w]
→β
ε | (λz.zi(zi))(g(g(gi))) | ε | [g←λx.λy.xi(xi)y]:[i←λw.w]
→sea1
ε | λz.zi(zi) | g(g(gi)) | [g←λx.λy.xi(xi)y]:[i←λw.w]
→β
ε | zi(zi) | ε | [z←g(g(gi))]:[g←λx.λy.xi(xi)y]:[i←λw.w]
→sea1
ε | zi | zi | [z←g(g(gi))]:[g←λx.λy.xi(xi)y]:[i←λw.w]
→sea1
ε | z | i:zi | [z←g(g(gi))]:[g←λx.λy.xi(xi)y]:[i←λw.w]
→sea2
(z, i:zi, [z←·]) | g(g(gi)) | ε | [g←λx.λy.xi(xi)y]:[i←λw.w]
→sea1
(z, i:zi, [z←·]) | g | g(gi) | [g←λx.λy.xi(xi)y]:[i←λw.w]
→sk
(z, i:zi, [z←·]) | g | g(gi) | g⇐λx.λy.xi(xi)y:[i←λw.w]
→ss
(z, i:zi, [z←·]) | λx1.λy2.x1i(x1i)y2 | g(gi) | g⇐λx.λy.xi(xi)y:[i←λw.w]
→β
(z, i:zi, [z←·]) | λy2.x1i(x1i)y2 | ε | [x1←g(gi)]:g⇐λx.λy.xi(xi)y:[i←λw.w]
→sea3
ε | z | i:zi | [z←λy2.x1i(x1i)y2]:[x1←g(gi)]:g⇐λx.λy.xi(xi)y:[i←λw.w]
→sk
ε | z | i:zi | z⇐λy2.p3y2:[p3←x1i(x1i)]:[x1←g(gi)]:g⇐λx.λy.xi(xi)y:[i←λw.w]
→ss
ε | λy4.p3y4 | i:zi | z⇐λy2.p3y2:[p3←x1i(x1i)]:[x1←g(gi)]:g⇐λx.λy.xi(xi)y:[i←λw.w]
→β
ε | p3y4 | zi | [y4←i]:z⇐λy2.p3y2:[p3←x1i(x1i)]:[x1←g(gi)]:g⇐λx.λy.xi(xi)y:[i←λw.w]
→sea1
ε | p3 | y4:zi | [y4←i]:z⇐λy2.p3y2:[p3←x1i(x1i)]:[x1←g(gi)]:g⇐λx.λy.xi(xi)y:[i←λw.w]
→sea2
(p3, y4:zi, [y4←i]:z⇐λy2.p3y2:[p3←·]) | x1i(x1i) | ε | [x1←g(gi)]:g⇐λx.λy.xi(xi)y:[i←λw.w]
→sea1
(p3, y4:zi, [y4←i]:z⇐λy2.p3y2:[p3←·]) | x1i | x1i | [x1←g(gi)]:g⇐λx.λy.xi(xi)y:[i←λw.w]
→sea1
(p3, y4:zi, [y4←i]:z⇐λy2.p3y2:[p3←·]) | x1 | i:x1i | [x1←g(gi)]:g⇐λx.λy.xi(xi)y:[i←λw.w]
→sea2
(p3, y4:zi, [y4←i]:z⇐λy2.p3y2:[p3←·]):(x1, i:x1i, [x1←·]) | g(gi) | ε | g⇐λx.λy.xi(xi)y:[i←λw.w]
→sea1
(p3, y4:zi, [y4←i]:z⇐λy2.p3y2:[p3←·]):(x1, i:x1i, [x1←·]) | g | gi | g⇐λx.λy.xi(xi)y:[i←λw.w]
→ss
(p3, y4:zi, [y4←i]:z⇐λy2.p3y2:[p3←·]):(x1, i:x1i, [x1←·]) | λx5.λy6.x5i(x5i)y6 | gi | g⇐λx.λy.xi(xi)y:[i←λw.w]
→β
(p3, y4:zi, [y4←i]:z⇐λy2.p3y2:[p3←·]):(x1, i:x1i, [x1←·]) | λy6.x5i(x5i)y6 | ε | [x5←gi]:g⇐λx.λy.xi(xi)y:[i←λw.w]
⋮

Number of betas: 24
Interpreter fully-lazy-functional-linked-env, final result: λw.w

where, in the actual terminal output, in each machine state the chain is printed in red, the code is underlined, the stack has no special marking, and the environment is printed in green.