Abstract 1 Introduction 2 Preliminaries 3 Differentiation on the coslice 4 Integration from a functorial point of view 5 Conclusion References

Functorial Models of Differential Linear Logic

Marie Kerjean ORCID CNRS, Université Sorbonne Paris Nord, Laboratoire d’Informatique de Paris Nord, LIPN, F-93430 Villetaneuse, France Valentin Maestracci ORCID Aix-Marseille Université, CNRS, I2M, France Morgan Rogers ORCID Université Sorbonne Paris Nord, CNRS, Laboratoire d’Informatique de Paris Nord, LIPN, F-93430 Villetaneuse, France
Abstract

Differentiation in logic has several sources of inspiration. The most recent is differentiable programming, models of which demand functoriality and good typing properties. More historical is reverse denotational semantics, taking inspiration from models of Linear Logic to differentiate proofs and λ-terms. In this paper, we take advantage of the rich structure of categorical models of Linear Logic to give a new functorial presentation of differentiation. We define differentiation as a functor from a coslice of the category of smooth maps to the category of linear maps. Extending linear–non-linear adjunction models of Linear Logic, this produces models of Differential Linear Logic. We use these functorial presentations to shed new light on integration in differential categories.

Keywords and phrases:
Categorical semantics, Differential Programming, Linear Logic
Funding:
Marie Kerjean: Work partially funded by the project ANR-24-CE48-1914.
Copyright and License:
[Uncaptioned image] © Marie Kerjean, Valentin Maestracci, and Morgan Rogers; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Categorical semantics
; Theory of computation Denotational semantics ; Theory of computation Linear logic
Acknowledgements:
We are grateful to Jean-Simon Lemay, Zeinab Galal, and Jad Koleilat for enriching discussions on this paper. We also thank the reviewers for the many constructive comments, helping us making this paper a better version of itself
Editors:
Maribel Fernández

1 Introduction

Differentiation is the bread and butter of mathematical analysis and approximation theory, providing the closest linear approximation of a function. The differential of a function f:AB at a point aA is the linear map Daf:vADafvB. For scalar functions f:, this becomes Daf:tf(a)t, where __ denotes multiplication. Models of Linear Logic (LL henceforth) can either be axiomatized as strong monoidal adjunctions between a category whose morphisms are “linear maps” and a category whose morphisms are “non-linear maps”, or more accurately “general maps” (smooth maps in our case), or as Seely Categories, a setting in which one considers directly the comonad ! that can be deduced from the previous adjunction.

Linear Logic has been extended to Differential Linear Logic (DiLL henceforth) [12], in which non-linear proofs can be turned into linear ones. Categorical presentations of differentiation originate from investigation of denotational models of this system, resulting in a variety of interrelated categorical definitions [1]. Differentiation in such settings might be axiomatized as an external operator acting on morphisms of a category [3], or as a hard-coded natural transformation 𝖽¯:id! on a Seely category [14].

The differential of composed functions is famously not equal to the composite of their differentials: the latter may not even typecheck, as the points at which differentials are computed matter. This is expressed by the chain rule111We write composite morphisms within a category in diagrammatic order throughout, so f;g means f followed by g, but we compose functors contravariantly to avoid confusion when writing, say, 𝒰(!X).:

Da(f;g)=Daf;Df(a)gDaf;Dag (1)

Functorial differentiation has been brought up to date by differentiable programming: this is a much more recent domain emerging from an effort to express machine learning algorithms in a functional and principled way [5, 13]. In particular, differentiation in differentiable programming is traditionally axiomatized as a transformation of programs acting on pairs (Equation 2). This makes D a functor on a suitable category of programs, with axioms enforcing linearity of differentiation.

𝒟f:(a,v)A×A(f(a),Dafv) (2)

Our work takes a different approach, and exploits the linear–non-linear adjunction at the heart of Linear Logic.

Here 𝒰 represents the forgetful functor from linear to general maps; see Definition 8 for the full hypotheses. We consider differentiation as a functor from the coslice of the category of differentiable maps under the terminal object to the category of linear maps. The coslice category (I𝒞) can be thought of as the category of pointed objects, meaning each object A comes equipped with a point a:IA and morphisms respect these. We re-express models of DiLL in terms of models of LL equipped with a functor

𝒟:(I𝒞)

with the intuition that 𝒟 maps a function f:(a,A)(f(a),B) to Da(f), its differential at the point a. The only constraint we impose is that 𝒟 leaves linear maps unchanged, (11). This simplifies both the pairwise presentation of differentiable programming (equation (2)) and the non-functorial transformation 𝖽¯ of differential categories. We will show that this definition is equivalent to that coming from differential categories up to a pointed version of the chain rule; the converse is straightforward (Proposition 27).

This approach also yields insights into how related constructions from analysis, most notably integration, can be realised in models of LL. We demonstrate this by using the slice category (I𝒞) to present the fundamental theorem of analysis as an adjunction.

Related work.

Work on differential categories [2], cartesian differential categories [3] and tangent categories [8] have been successful in categorifying differentiation from a mathematical perspective, but place models of DiLL as the strictest in the hierarchy of categorical definitions. Our use of coslice categories in the context of DiLL is novel, to our knowledge, although the dual construction of slice categories appears in various places: in reverse cartesian differential categories, partial differentials are defined on slice objects [9], and slice categories are extensively used in modeling dependent type theory. We hope our work encourages research into the relationship between differentiation and dependent types.

Outline and contributions.

We provide background material on differential categories in Section 2. The core results of our study are presented in Section 3, where we define 𝒟 as a functor on the coslice (I𝒞), and demonstrate that with a few assumptions, it induces the structure of a differential category on . We formulate categorical models of DiLL with a functorial definition of differentiation (Definition 15), which we prove to be equivalent to differential storage categories under some hypotheses (Proposition 26 and Proposition 27).

In Section 4, we first review the existing presentations of integration and fundamental theorems of analysis in differential categories, then show that the coslice construction allows for a new presentation as an adjunction between (I𝒞) and (Proposition 34). Finally, in Section 5, we discuss potential future work.

2 Preliminaries

In this section, we give a quick introduction to the syntax and semantics of DiLL. For readers familiar with DiLL, we note that Definition 4 is the only one technically necessary to the rest of the paper; the rest of this section serves as motivation and historical introduction.

Linear Logic [15] (LL) is the result of a decomposition of Intuitionistic Logic via an involutive linear negation. Along with multiplicative and additive versions of conjunction and disjunction, it introduces the exponential connective ! and its dual ?. Some important intuition for working in LL coming from the Curry-Howard correspondence is that a proof of a sequent of the form AB corresponds to the construction of a linear function from A to B, while a proof of !AB constructs a non-linear function from A to B. When modelling LL in vector spaces, these intuitions are literally valid, in that a proof of AB is interpreted as a linear map and a proof of !AB is interpreted as (the transpose across the linear–non-linear adjunction of) a not-necessarily-linear map.

DiLL [12] adds to LL new rules on the connective !, expressing the fact that one is able to transform a non-linear proof into a linear one. While LL features a dereliction rule (𝖽), DiLL introduces a codereliction rule (𝖽¯) which acts like differentiation at 0 into the logic:

This creates a duality between linear and non-linear maps in DiLL. We recall also the exponential rules of LL called respectively weakening (𝗐), contraction (𝖼) and promotion (𝗉). DiLL adds to these dual rules of coweakening (𝗐¯) and cocontraction (𝖼¯).

Since the focus of this paper is on categorical models rather than the proof theory of these logics, we will not give a full presentation of the syntax of DiLL; we omit the cut elimination rules and the corresponding commutative diagrams between their interpretation as these are direct translations of differential calculus rules. We refer the reader to the survey of Ehrhard [11] for a complete introduction. Let us now turn to the categorical models.

Definition 1.

A Seely category [19] is a monoidal closed category (,,1) with finite products and a comonad (!,𝖽,ν) such that ! is a (strong) monoidal functor

!:(,×,)(,,1)

where denotes a terminal object. Monoidality of ! means we have “Seely isomorphisms”:

χA,B:!A!B!(A×B) and χ0:1!.

Following the LL convention, we will denote linear maps by :AB and non-linear or smooth maps in the coKleisli category ! by f:AB (note the difference in arrow head). To model classical LL, one requires that LL be a -autonomous category, with an involutive duality (_)(_,1).

The image of the diagonal map ΔE:EE×E and terminal map nE:E for the product in under ! give each object of the form !E a canonical coalgebra structure:

That is, Seely categories are categorical models of intuitionistic Linear Logic [17], with the above maps interpreting contraction and weakening, while dereliction is interpreted by the counit of the comonad (as suggested by the notation) and promotion is interpreted by coKleisli extension. Moreover, we recover an instance of the linear–non-linear adjunction discussed in the Introduction by constructing the coKleisli category for the comonad !.

The usual definition of categorical models of DiLL refines the axiomatization of Seely categories. For instance, to interpret 𝗐¯ and 𝖼¯ in this case one only needs to replace products with biproducts. This is a natural choice: to reflect the addition of rules to a logic, one adds more constructors to the categories providing the logic’s semantics.

Definition 2.

A semi-additive category is a category with finite products and coproducts such that the initial object is terminal and the binary product coincides with the binary coproduct. In the present paper, we denote the resulting biproduct monoidal structure by , with unit the zero object, denoted 0. We also give the following names to the canonical morphisms associated to this structure:

uE:0EnE:E0E:EEEΔE:EEE

Note that for any object, (E,uE,E) is a commutative monoid and (E,nE,ΔE) is a commutative comonoid. The hom-sets of a semi-additive category canonically carry the structure of commutative monoids: given f,g:AB, we may define f+g to be the composite,

(3)

and this operation has the commutativity and associativity properties one would expect.

Example 3.

The category of real vector spaces and linear maps is semi-additive, with the direct sum and unit the zero vector space. The commutative monoid structure on morphisms is just the pointwise sum.

There is a whole hierarchy of axiomatizations of “differential categories” [1], but we will make use of the strongest notion.

Definition 4.

A differential category, called a “differential storage category” in [2] is a semi-additive Seely category (,,1) (with biproducts denoted as in Definition 2) in which the comonad (!,𝖽,ν) is equipped with a natural transformation 𝖽¯:id! satisfying:

  • invariance of linear maps under differentiation,

    𝖽¯;𝖽=id (4)
  • and the generalized chain rule, requiring the following diagram to commute:

    (5)

where 𝖼¯ and 𝗐¯ are constructed from the algebra structure on each object as follows:

These endow !E with coalgebra structure dualizing the algebra structure constructed earlier.222The definition given in [2] includes more equations; this presentation is equivalent by [1, Corollary 6.3].

The generalized chain rule (5) captures the usual formula for composed functions,

Da(f;g)=Da(f);Df(a)(g).

For the reader unfamiliar with differential categories, we now get a bit more into the categorical transposition of chain rule. We start with the simpler case of differentiation at 0, the chain rule without a context, called the “alternative chain rule” [dC.4] in [1]:

(6)

where jE:=EE1 is the inverse of the right unit map. The relationship becomes more apparent if we post-compose this diagram with !f;g, for f:!EF and g:!FG. The comultiplication νE:!E!!E expresses composition of transposed non-linear maps: if f and g are respectively the transposes of non-linear maps f#,g#𝒞, then νF;!f;g:!EG transpose to f#;g#. As such, 𝖽¯E;νE;!f;g:EG represents the differential D0((f;g)#). Meanwhile, by naturality of 𝖼¯ and 𝖽¯, the alternative composite can be rearranged to 𝖽¯E;f;𝖽¯F;jE;(!uE;νE;!f)idF;𝖼¯;g; the first two terms yield D0(f#), while the rest translates to D0(g#(f#(0)+)), which by a change of variables is equal to Df#(0)(g#). The generalized chain rule (5) adds to this set-up a parameter along which functions are not differentiated, so captures the chain rule for partial differentiation.

Proposition 5 ([1, Lemma 6.5]).

The generalized chain rule 5 can be recovered from the alternative chain rule 6 if one asks for the following strength condition on !, where m:!E!F!(E×F) represents the monoidality of ! within (,,1) and can be reconstructed from the Seely isomorphisms χE,F.

(7)

In terms of the Seely isomorphisms, this becomes a more complex condition:

id!E𝖽¯E;χE,E1;𝖽¯!(EE);!χ;!(dEdE)=id!E𝖽¯E;χ1;νEE;!χE,E;!(dEdE)

The transformations 𝖼¯, 𝗐¯ and 𝖽¯ complement the components of LL interpreted in a general Seely category to produce an interpretation of DiLL. In fact, the differential categories axiomatized above are denotational models of intuitionistic DiLL, and of classical DiLL when we impose the extra condition of -autonomy on . The exploration of less strict axiomatizations has been fruitful in incorporating various mathematical examples from functional analysis and differential geometry into this logical framework. However, the definition above crucially does not reflect the symmetry of exponential rules in DiLL, and does not express differentiation using the traditional objects of computer science. Notably, the chain rule is at first sight quite intricate. In this paper, we attempt to simplify this by showing that it is a consequence of functoriality in a new presentation.

Rather than merely introducing new operations such as 𝖽¯, our approach requires us to introduce new categories on top of the LL setup, expanding Definition 1 to an adjunction involving the coKleisli category of ! mentioned above. Nonetheless, the extension of LL to DiLL described in Definition 4 can be recovered from our construction.

Notation.

Given a differentiable function f:AB between two vector spaces, we denote by Daf:AB the linear map corresponding to the differential of f at a point aA.

Example 6 (The distribution model).

The category of convenient spaces [4] provides an illustrative example of a differential category. Formulas are interpreted by real topological vector spaces endowed with a bornology, which is a well-behaving collection of bounded sets; we shall not attempt to give detailed definitions of these structures here. The exponential !A is interpreted as the completion of the vector space generated by the Dirac distributions,

!A:=δxxA¯.

The Dirac distribution δx, defined by f𝒞(E,)f(x), is formally an object of the dual 𝒞(E,) of the space of smooth functions, otherwise known as the space of distributions of compact support over E [18]. As the Dirac distributions form a linearly independent family of distributions, to define linear morphisms on !E it is enough to specify their value on Diracs and then extend linearly and continuously to the completion. As such, the components of DiLL are interpreted as follows:

𝖽 :δx!ExE 𝖼 :δx!Eδxδx!E!E 𝗐 :δx!E1
𝖽¯ :xED0(_)(x)!E 𝖼¯ :δxδy!E!Eδx+y!E 𝗐¯ :1δ0!E
ν :δx!Eδδx!!E

Convenient spaces do not form a -autonomous category, and constructing -autonomous smooth models of DiLL with a non-trivial duality is tricky, often requiring compromise on the interpretation of the other components of DiLL, such as [10] which features an ad-hoc construction to allow differentiation. An alternative work-around for this issue is to construct polarized models of DiLL [16].

Example 7.

We denote by 𝖱𝖤𝖫 the category of sets and relations. The interpretation of LL in 𝖱𝖤𝖫 produces the well-known, historical “relational model”. The tensor product is given by the cartesian product of sets, XY=X×Y and the product (which coincides with the coproduct) is given by the disjoint union of sets XY. The exponential ! is interpreted by finite multisets: !X=f(X), the set of all finite multisets of X. The dereliction 𝖽X!X×X and codereliction 𝖽¯XX×!X respectively put in relation elements with singleton multisets containing them, and reciprocally. The contraction 𝖼X!X×(!X×!X) and cocontraction 𝖼¯X(!X×!X)×!X relate pairs of finite multisets to their disjoint union, while the weakening 𝗐X!X×{} and coweakening 𝗐¯X{}×!X relate to the empty multiset. The co-multiplication of the co-monad ! is νX!X×!!X, relates a finite multiset to each finite multiset of finite multisets of which it is the disjoint union.

3 Differentiation on the coslice

This section represents the heart of this paper: here we will start from a linear–non-linear adjunction and show how defining a functor from a coslice of the non-linear category 𝒞 to the linear category makes into a differential category, up to a well-pointed assumption.

Definition 8.

For the purposes of the present paper, a linear–non-linear adjunction is an adjunction:

(8)

in which:

  • is both monoidal closed and semi-additive;

  • 𝒞 has cartesian monoidal structure;

  • is a strong monoidal functor, so (A×B)(A)(B) and 1(I).

This adds hypothesis to the usual notion of linear-non-linear adjunction, in which the semi-additivity is not required and the adjunction is only lax monoidal [17, Definition 21].

Notation.

We denote the endofunctor on from the adjunction 8 by !:=𝒰, the unit by η:id𝒞𝒰 and the counit by 𝖽:!id. We moreover write ν=η𝒰 for the comultiplication of the induced comonad (!,𝖽,ν) on . We denote f𝒞(A,B) by f:AB and (E,F) by :EF, which is consistent with our notation for Seely categories in the special case where 𝒞 is the coKleisli category of !. Note that 𝒰 preserves products, being a right adjoint. For notational convenience, we work as if 𝒰 and are strictly monoidal, in the sense that we suppress the isomorphisms 𝒰(AB)𝒰(A)×𝒰(B) and (A×B)(A)(B). With this convention, the projection map π1:ABA satisfies 𝒰(π1)=π1:𝒰(A)×𝒰(B)𝒰(A) and moreover (considering the case B=0) 𝒰 preserves the left and right unit maps for 0 on the nose.

 Remark 9.

We will not actually use the fact that the monoidal structure on is closed, but we wished to ensure the hypotheses in the various definitions lined up.

Example 10.

The distribution model of Example 3 can also be understood from the adjunction point of view instead of the comonadic point of view. Let 𝒞 be the category whose objects are convenient vector spaces and whose morphisms are smooth functions and the category of convenient vector spaces and linear maps. In adjunction (8), is interpreted via (E):=δxxE¯, that is the space generated by Dirac distributions on E, and 𝒰 is the forgetful functor. 𝒞 is a cartesian closed category, is monoidal closed and (U) is a strong monoidal adjunction.

Lemma 11 ([17]).

Given a linear–non-linear adjunction as in Definition 8, acquires the structure of a Seely category.

As such, we can interpret the ingredients of Differential Linear Logic, 𝖼,𝗐,𝖼¯,𝗐¯ and 𝖽, as we did in Section 2. In order to interpret differentiation, we need to expose further structure. To begin with, we observe that the semi-additive structure on is partially preserved by 𝒰.

Proposition 12.

Suppose that we are given a linear–non-linear adjunction as in Definition 8. For objects E and A𝒞, 𝒞(A,𝒰(E)) has a distinguished element,

0A,E:AnAI𝒰(uE)𝒰(E),

where nA is the unique morphism. This is the identity element of a commutative monoid structure on 𝒞(A,𝒰(E)).

Proof.

For f,g:A𝒰(E), we define their sum using the codiagonal map:

fg:Af,g𝒰(EE)𝒰(E)𝒰(E). (9)

To show that 0A,E is the identity element, consider the transpose of f0A,E:

(f,0A,E));(𝒰(E));𝖽E=(f)𝗐¯E;𝖼¯E;𝖽E

by monoidality of . The bialgebra laws over !E, as described for example in [1, Section 7] give us idE𝗐¯E;𝖼¯E=idE and as such (f)𝗐¯E;𝖼¯E;𝖽E=(f);𝖽E, the transpose of f, so f=f0A,E) as required.

Next we formalize the category of pointed objects of 𝒞.

Definition 13.

Suppose that is a category with a zero object 0, that 𝒞 is a category with terminal object I and that we are given a functor 𝒰:𝒞 sending 0 to I.

Recall that the coslice category (I𝒞) is defined as follows. The objects of (I𝒞) are pairs (A,a) with A𝒞 and a𝒞(I,A), and morphisms (A,a)(B,b) are morphisms g:AB such that a;g=b.

We may similarly construct the coslice category (I𝒰) where objects are instead pairs (E,a) with E and a𝒞(I,𝒰(E)), and morphisms (E,a)(F,b) consist of (E,F) with a;𝒰()=b.

Notation.

In order to more explicitly keep track of the points involved in a morphism of (I𝒞), we denote a morphism g:(A,a)(B,b) by (ag).

The coslice categories (I𝒞) and (I𝒰) come equipped with projection functors to 𝒞 and which respectively forget the points; we denote both by Π. Moreover, 𝒰 induces a functor (I𝒰)(I𝒞) which we denote by 𝒰, mapping (E,a) to (𝒰(E),a). Putting these together, we get:

(10)
Example 14.

Let’s see how differentiation can be interpreted in Example 10 using the coslice construction. In this case I is the zero vector space {0} and (I𝒞) is the category of pairs (A,a), where a is a vector in A. Differentiation defines a functor 𝒟:(I𝒞) as follows: an object a:IA is mapped to A and an arrow (ag) is sent to the linear map Daf:AB. The chain rule exactly translates to functoriality of 𝒟. Indeed, morphisms (ag) and (bh) compose if and only if b=g(a) and then we have

𝒟(ag;h)=Da(g;h)=Dag;Dg(a)h=𝒟(ag);𝒟(g(a)h).

Example 14 contains the essential idea of the present section. As we shall shortly see, there will be no need to independently construct a codereliction operator once we equip our adjunction with a differentiation functor, since codereliction can be recovered from 𝒟 as differentiation at 0.

Definition 15.

A functorial model of differentiation consists a linear–non-linear adjunction (𝒰) between and 𝒞 equipped with a functor 𝒟:(I𝒞) such that

𝒟𝒰=Π, (11)

which is to say that the following triangle (ignoring the dashed functors) commutes:

 Remark 16.

The choice of name in Definition 15 is not intended to seem grandiose: as we shall see, these categories do not quite constitute models of DiLL in general. As discussed in the introduction, the idea that the chain rule is compositional on pairs of objects is well-established [13]. Here we explore another presentation of differentiation as a functor compatible with linear–non-linear adjunctions as models of Linear Logic.

More explicitly, (11) says that for any object E of and point a:I𝒰(E) we have 𝒟(a𝒰(E))=E, and that for any arrow :EF in we have 𝒟(a𝒰())=. This identity also implies that 𝒟 preserves products of objects in 𝒰(), in the following sense:

Lemma 17.

The functor 𝒟 in Definition 15 preserves limits of objects in the image of 𝒰. In particular, for h1:A𝒰(E1) and h2:A𝒰(E2) in 𝒞 and a:IA, we have:

𝒟(ah1,h2)=𝒟(ah1),𝒟(ah2). (12)

Proof.

A short proof is that Π creates limits and 𝒰 preserves them (because 𝒰 does), whence the identity (11) forces 𝒟 to preserve them.

Proposition 18.

Given morphisms in 𝒞, I𝑎A1g1A2g2A3, we have:

𝒟(ag1;g2) =𝒟(ag1);𝒟(a;g1g2). (13)

Similarly, given:

we have,

𝒟(ah1h2)=𝒟(ah1)+𝒟(ah2). (14)

Proof.

The first identity is an immediate consequence of functoriality of 𝒟. For the second, the definition of h1h2 produces the following diagram:

Applying 𝒟, the upper square is mapped to 𝒟(ah1),𝒟(ah2) by Lemma 17 and the lower square is mapped to E2 by (11), yielding the desired sum, as required. As such, 𝒟 satisfies the chain rule and is linear with respect to the additive structure on morphisms. It follows that 𝒟 interacts well with products of morphisms too:

Corollary 19.

Let g1:𝒰(E1)𝒰(E) and g2:𝒰(E2)𝒰(E) and a=a1,a2:I𝒰(E1)×𝒰(E2). Then we have:

𝒟(ag1×g2)=𝒟(a1g1)𝒟(a2h2). (15)

Proof.

By definition, g1×g2=𝒰(π1);g1,𝒰(π2);g2. Thus by Lemma 17,

𝒟(ag1×g2)=𝒟(a𝒰(π1);g1),𝒟(a𝒰(π2);g2).

We have: 𝒟(a𝒰(π1);g1)=𝒟(a𝒰(π1));𝒟(a;𝒰(π1)g1)=π1;𝒟(a1g1), by (11) and similarly for 𝒟(a𝒰(π2);g2), whence we recover the claimed product of morphisms.

Comparing with differential categories, the only thing missing from to make it a differential category is a codereliction 𝖽¯:id! satisfying the required identities.

Definition 20.

For E an object of , we define: 𝖽¯E:=𝒟(𝒰(uE)η𝒰(E)):E!E.

Lemma 21.

The transformation 𝖽¯:id! is natural.

Proof.

We have a functor R:(I𝒞) sending E to 𝒰(uE):I𝒰(E) and :EF to (𝒰(uE)𝒰()); this is a section of 𝒟. By inspection, η𝒰() lifts to a natural transformation RR!, and 𝖽¯ is the result of applying 𝒟 to this lifting, so is natural as required.

We denote by #:𝒰(E)𝒰(F) the transpose of a morphism :!EF in across the adjunction (𝒰). Since #=ηU(E);𝒰(), using Proposition 18, the definition of 𝖽¯E and (11) we have:

𝒟(𝒰(uE)#)=𝖽¯;, (16)

In particular, the linear rule 𝖽¯E;𝖽E=idE holds for any object E of , since the counit morphism 𝖽E transposes to the identity in (16). Following [1, Proposition 6.2], by making use of the semi-additive structure we can deduce that the product rule is satisfied:

𝖽¯;𝖼=𝖽¯𝗐+𝗐𝖽¯. (17)

The following uses functoriality to capture differentiation of translated functions: intuitively, Dag=D0(xg(a+x)).

Lemma 22.

The functor 𝒟 internalizes translations, meaning for any point a:I𝒰(E) and morphism (ag) in (I𝒞), we have:

𝒟(ag)=𝒟(𝒰(uE)𝒰(iE);(id𝒰(E)×a);𝒰(E);g),

where iE:EE0 is the inverse of the right unit morphism (or equivalently, the coproduct injection morphism).

Proof.

By compositionality, we may without loss of generality consider the case g=id𝒰(E). By Proposition 12, we have a=a𝒰(uE), so we can construct the following sequence of arrows in (I𝒞):

Observing 𝒟(idIa)=uE (since its domain is 0), the image under 𝒟 of the composite is:

where (11) gives left- and right-hand morphisms, while Corollary 19 gives the middle morphism. By the left unit law for i and , the composite is the identity, as required.

We can use Lemma 22 to strengthen (16): we now consider the differential of a function at any point a and express it with respect to 𝖽¯.

Lemma 23.

For any a:I𝒰(E) and :!EF, we have:

𝒟(a#)=jE;𝖽¯E(a);𝖼¯E; (18)

where again jE:=EE1 is the inverse of the right unit map.

Proof.

Consider first the case =id!E, so #=η𝒰(E). By Lemma 22,

𝒟(fη𝒰(E))=𝒟(𝒰(uE)𝒰(iE);(id𝒰(E)×f);𝒰();η𝒰(E));

By naturality of η and monoidality of , the morphism part of the right-hand side is equal to

(!iE;id!E(a);!E)#,

whence we can apply (16) to conclude,

𝒟(aη𝒰(E)) =𝖽¯E;!iE;id!E(a);𝖼¯E
=jE;𝖽¯E(a);𝖼¯E (19)

by naturality of jE in E, observing that !iE=j!E. The general case follows by the argument used to derive (16).

We now begin working toward the chain rule, beginning with the context-free case, (6). This turns out to be a consequence of naturality of η.

Lemma 24.

For any morphism a:I𝒰(E), we have:

𝒟(aνE#)=𝒟(aη𝒰(E));𝒟(a;η𝒰(E)η𝒰(!E)) (20)

Proof.

Recalling that νE=η𝒰(E), we have a naturality square for η that makes the arrows involved equal in (𝒰𝒞), illustrated in the following diagram, to which we apply 𝒟:

Corollary 25.

The codereliction 𝖽¯A satisfies the chain rule without a context, (6):

𝖽¯E;νE=jE;(𝖽¯E!uE);(𝖽¯!EνE);𝖼¯!E (21)

Proof.

Let a=𝒰(uE) in (20). We can expand the left-hand side with (16); the first term on the right-hand side is the definition of 𝖽¯E and the second term can be expanded using (19). Observing that (𝒰(uE);η𝒰(E))=!uA;νA, the resulting identity is:

𝖽¯E;νE=𝖽¯E;j!E;(𝖽¯!E(!uE;νE));𝖼¯!E.

The right-hand side can be manipulated into the expression in (21) by exploiting naturality of j and rearranging parallel tensored terms.

However, to fully interpret DiLL, one also needs to interpret the chain rule within a context (5). This is the translation of a cut rule with a co-dereliction on the context of a promotion in DiLL with other elements not being cut.

Interpreting this rule in the context of Example 14, the proof takes a two-variable function f:A×BC (with transpose f#:!A!BC) and promotes it:

𝗉(f):!A!B !C;
δaδb ((g:C)g(f(a,b))).

Cutting with the codereliction rule means computing, for each vector vA, the function δb(gD(0,b)(gf)(v,0)), which expands to, δb(gDf(0,b)g(D(0,b)f)(v,0)) via the chain rule. Abstracting this reasoning leads us to a “pointed” version of the generalized chain rule.

Proposition 26.

The codereliction 𝖽¯E satisfies a pointed version of the generalized chain rule (13). Namely, for each a:I𝒰(!E), we have:

idE(a);𝖽¯Eid!E;𝖼¯E;νE=idE(a);𝖽¯EcE;𝖼¯Eid!E;𝖽¯!EνE;𝖼¯!E.

Proof.

We deduce the equation from Lemma 24 as follows. By Lemma 23, the left-hand side of (20) expands as:

𝒟(aνE#) =jE;𝖽¯E(a);𝖼¯E;νE.
=jE;idE(a);𝖽¯Eid!E;𝖼¯E;νE.

Meanwhile, the right-hand side expands via two applications of (19) to give:

𝒟(aν#) =jE;𝖽¯E(a);𝖼¯E;j!E;𝖽¯!E((a);νE);𝖼¯!E
=jE;idEc1;𝖽¯E(a)(a);𝖼¯EidE;𝖽¯!EνE;𝖼¯!E
=jE;idE(a);𝖽¯E𝖼E;𝖼¯EidE;𝖽¯!EνE;𝖼¯!E

Here we pass from the first to the second line by naturality of j and the fact that jE;jE1=jE;idE𝖼0, and from the second to the third line by naturality of 𝖼. Since jE is an isomorphism, the desired identity follows. These developments would be of questionable value if they diverged too far from the established formalism of differential categories, so the following result is important.

Proposition 27.

A differential category is a functorial model of differentiation (Def.15).

Proof.

Let be a differential category, with notation as in Definition 4. Let us denote by 𝒞=! the coKleisli category and by × the product on ! inherited from . Then define a functor 𝒟:(0𝒞) by:

𝒟{(a:!0E)E(ag:!EE)jE;(ν0;!a)𝖽¯E;𝖼¯E;g

The effect of 𝒟 on morphisms is the exact categorical translation of the effect of Dag in Example 14. To show that 𝒟 is functorial, consider:

0𝑎E𝑔EgE′′.

Expanding the definition, we first apply naturality of j and ν, then the argument in the proof of Proposition 26 and naturality of 𝖼¯, then naturality of 𝖼, and finally the chain rule of Definition 4:

𝒟(ag) ;𝒟(ν0;!a;gg)=jE;(ν0;!a)𝖽¯E;𝖼¯E;g;jE;(ν0;!ν0;!!a;!g)𝖽¯E;𝖼¯E;g
=jE;id!0(jE;(ν0;!a)𝖽¯E;𝖼¯E);(ν0;ν!0;!!a;!g)(𝖽¯E;!g);𝖼¯!E;g
=jE;𝖼0idE;id!0((ν0;!a)𝖽¯E;𝖼¯E);(ν0;!a;νE)𝖽¯E;𝖼¯!E;!g;g
=jE;(ν0;!a)idE;𝖼E𝖽¯E;id!E𝖼¯E;νE𝖽¯E;𝖼¯!E;!g;g
=jE;(ν0;!a)idE;id!E𝖽¯E;𝖼¯E;νE;!g;g
=𝒟(aνE;!g;g),

as required. The fact that 𝒟(a𝖽;)= follows directly from 𝖽¯;𝖽=id, as

𝒟(a𝖽;)=!a𝖽¯;𝖼¯;𝖽;=(!a;𝖽)idA;𝖼¯;=!uAidA;𝖼¯;.

 Remark 28.

The separation in axiomatic strength between differential categories and our model thus comes down to the separation between the pointed chain rule of Proposition 26 and the generalized version (5). If morphisms of the form idE(a) are jointly epimorphic (a sort of well-pointedness condition), then the latter follows directly from the former. However, this condition is not satisfied by the category of sets and relations of Example 7 (see [6, section 4]) so this condition seems undesirably strong.

4 Integration from a functorial point of view

We now explain how presenting differentiation as a functor on the coslice allows for a new understanding of integration. We begin with a review of integration in Seely categories, based on work by Lemay, Cockett and Ehrhard [7, 11].

4.1 Integration and antiderivatives

The fundamental theorems of calculus relate differentiation and integration for functions of a single variable. The first (FTC1) states that differentiating the integral of a function returns the original function. For a function f: and a, one has:

f(a)=Dx(x0xf(t)𝑑t)(a).

The second (FTC2) states that integrating a function’s derivative returns the function itself:

f=xf(a)+axf(t)𝑑t.

These are valid for real-valued functions only. For higher-order functions, one replaces expressions f(t)dt by Daf(dt). This means that integration is an operation on functions with a linear and a non-linear argument:

0a:(F:!AAB)((0aF):B)

In Section 3, we considered differentiation at 0, as encoded by 𝖽¯. More generally, we can present differentiation at any point in terms of a so-called deriving transformation.

Definition 29.

In a differential category , we denote by ¯:!id! the natural transformation:

¯E:=id!E𝖽¯E;𝖼¯E.

In the context of Example 14, this represents the differential combinator δav(fDaf(v)). Dually, as introduced by Ehrhard [11], we write :!!id for the natural transformation: E=𝖼E;id!E𝖽E corresponding to the map δvδvv.

Definition 30 ([11]).

A differential category is said to have antiderivatives if the following morphism is invertible on every object E:

JE=id!E+(E;¯E):!E!E

In the distribution model JE would be interpreted as δv(ff(v)+Dv(f)v). We refer to [11, section 3.2] for detailed computations using this intuition, which involves integration by parts of the function x01f(tx)𝑑t.

Lemay and Cockett [7] gave a more general axiomatization of integration independent from differentiation. Integration is represented by a natural transformation s:!A!AA. The basic intuition is that post-composing a function F by s results in the integral from zero, 0_F(t,dt) where dt stands for any linear argument and not specifically for a measure. Cockett and Lemay then formulate (FTC2) categorically as follows:

s;¯+!0=id, (22)

where !0=𝗐;𝗐¯. Theorem (FTC1) involves an extra commutativity of variables condition (Poincaré’s lemma), see [11, Prop. 13] and [7, Sec. 5.2]. We leave its study in our context for future work.

4.2 Derivatives and Anti-Derivatives in an adjunction

Consider a model of Differential Linear Logic, notated as in Definition 4. For the sake of simplicity, we consider 𝒞 as the co-Kleisli category over for the co-monad !. We denote f#:AB the transpose in 𝒞 of the map f:!AB . We still write 𝒰() for the embedding of a linear map in 𝒞, meaning 𝒰()=(𝖽;)#.

Note that all statements and proofs, suitably modified, work in the more general context of a linear–non-linear adjunction 𝒰, with a functor 𝒟:(I𝒞), provided that the generalized chain rule (5) holds: this is used in Proposition 32.

We would like to study a bijection of the following form:

Θ:(I𝒞)((A,a),(B,𝒰(uB)))(!AA,B):Θ1 (23)

The core intuition of our development is that Θ maps a function f#:AB to its derivative ¯;f. Meanwhile, Θ1 should map a function F:!AAB to its integral s;F:=xAaxF(t,dt). Then (FTC1) will translate to Θ1;Θ=id. We now develop that statement categorically.

Definition 31.

We define the functor 𝔇:(I𝒞) and the mapping Θ such that:

𝔇:{(a:IA)!AA(_f#)𝖼AidA;(ν;!f)(¯A;f)
Θ:{(I𝒞)((A,a),(B,𝒰(uB))(𝔇(A,a),B)(_f#)¯A;f

With the notation of Section 3 and of Example 6, one has in particular 𝔇(af#)(δxv)=(δf#(x))𝒟(xf#), and Θ mapping (af#) to (δxv)𝒟(xf#)(v).

Proposition 32.

Θ is natural in A and in B. If Θ is moreover an isomorphism, then it defines an adjunction between 𝔇 and the functor R:B(B,uB) described as a section of 𝒟 in the proof of Lemma 21.

Proof.

Consider morphisms f#:(A,a)(A,a) in (I𝒞) and :BB. To show naturality, one has to prove that (I𝒞)(f#,𝒰());Θ=Θ;(𝔇(f)),). Consider then g#(I𝒞)((A,a),(B,𝒰(uB)). By definition of Θ, 𝔇 and functoriality of 𝒟 one has:

Θ(af#;g#;𝒰()) =¯;ν;!ν;!!f;!g;𝖽;=id𝖽¯;ν;!ν;!!f;!g;𝖽;
=id𝖽¯;ν;!f;g; by naturality and comonad laws
=𝖼A𝖽¯A;id𝖼¯A;νA𝖽¯!A;𝖼¯!A;!f;g; from Equation (5)
(Θ;(𝔇(f#)),))(g#) =𝔇(f#);Θ(ag#);=𝖼AidA;(ν;!f)(¯A;f);id𝖽¯;𝖼¯A;g;
=𝖼AidA;(ν;!f)(id𝖽¯A;𝖼¯A;f);id𝖽¯;𝖼¯A;g;
=𝖼A𝖽¯A;(ν;!f)(𝖼¯A;f);id𝖽¯;𝖼¯A;g;
=𝖼A𝖽¯A;ν𝖼¯A;id𝖽¯;𝖼¯!A;!f;g; by naturality of 𝖽¯ and 𝖼¯

Note that Equation 23 describe (FTC1) only for functions which are null at some point a. Hence, we will need an abelian structure on homsets. Following Proposition 12, 𝒞(A,gitE) has an additive structure for any A𝒞 and E.

Lemma 33.

For morphisms ,:!FE in , one has (##);𝖽=+. In particular, when 𝒞(A,E) is an abelian group, transposition across the adjunction preserves subtraction.

Assuming enrichment in abelian groups, we may now apply Θ to the translation of ηA=id!A# which equals 0 at 0, namely ηA𝗐A;𝗐¯A;ηA. Since 𝖽¯;𝗐=0, we have:

¯=Θ(𝒰(uA)ηA𝗐A;𝗐¯A;ηA) (24)

Define SA:=Θ1(id!AA):(A,𝒰(uA))(!AA,𝒰(u!AA)). Intuitively, SA maps a function F:!AAA to x0xF(t,dt). Projecting SA in 𝒞 one gets Π2(SA):A!AA and as such a morphism of :

s:=!A!AA
Proposition 34.

Consider a functorial model of differentiation, enriched over abelian groups, such that 𝔇 and R form adjunction via (23). Then (FTC2) as formalised in (22) holds.

Proof.

The proof amounts to a straightforward computation:

s;¯+!0 =Θ1(id!AA);¯+𝗐A;𝗐¯A=Θ1(id!AA;¯)+𝗐A;𝗐¯A by naturality of Θ1
=Θ1(Θ(𝒰(uA)ηA𝗐A;bwA;ηA))+𝗐A;𝗐¯A by (24).
=id!A𝗐A;𝗐¯A+𝗐A;𝗐¯A=id!A by Lemma 33

5 Conclusion

In this paper, we showed that differentiation as defined in Differential Linear Logic can be described as an extension of the formalism of linear–non-linear adjunction by a functor from a co-slice of the non-linear category to the linear category, 𝒟:(I𝒞), and that this recovers differential categories up to a well-pointedness condition. We then extended 𝒟 to an adjoint functor in order to describe antiderivatives. We believe that our presentation provides a more accessible axiomatization of differentiation that offers simpler intuitions.

The following avenues present natural next steps in this research.

Polarity and Differentiation.

Once this work stabilizes, we would like to investigate what polarized models of DiLL should be, and refine diagram (8) in the case where is -autonomous, or more generally decomposed into a polarized chirality. One could also use chiralities to relax the coincidence of products and coproducts we relied on in the present paper (constraining us to semi-additive categories) to the mere existence of these structures.

Integration.

Section 4 studies integration and (FTC2) only as an inverse to differentiation. A further step in this line of work would be to integrate the formulation of (FTC1) into this presentation. To complement our work on differentiation, we would like to understand how the axioms of integration can be lifted to a functorial presentation on their own to reach a more complete picture of the interactions between integration and differentiation. An interesting question would be whether any adjunction of the form 𝔇R results in antiderivatives, in the sense that the natural family of bijections is of the form proposed for Θ and its inverse.

Well-pointedness.

We would like to find an extension of our set-up witnessing the generalized chain rule without recourse to the compromises of Remark 28. An avenue suggested by Zeinab Galal is to extend the domain of definition of 𝒟 to a comma category such as (𝒰𝒞) to provide “generalized points” at which to differentiate, possibly abstracting the pairwise functorial differentiation of differentiable programming, 𝒟(f):δxvδf(x)Dxf(v). This was the origin of the functor 𝔇 used to describe antiderivatives in Section 4.

Dependent types.

Finally, this setting gives a dependent flavor to differentiation, and it feels natural to investigate a possible link with dependent types.

References

  • [1] R. F. Blute, J. R. B. Cockett, J.-S. P. Lemay, and R. A. G. Seely. Differential categories revisited. Applied Categorical Structures, 2020. doi:10.1007/s10485-019-09572-y.
  • [2] R. F. Blute, J. R. B. Cockett, and R. A. G. Seely. Differential categories. Mathematical Structures in Computer Science, 2006. doi:10.1017/S0960129506005676.
  • [3] R. F. Blute, J. R. B. Cockett, and R. A. G. Seely. Cartesian differential categories. Theory and Applications of Categories, 2009.
  • [4] Richard Blute, Thomas Ehrhard, and Christine Tasson. A convenient differential category. Cahiers de topologie et géométrie différentielle catégoriques, 53(3):211–232, 2012. arXiv:1006.3140.
  • [5] A. Brunel, D. Mazza, and M. Pagani. Backpropagation in the Simply Typed Lambda-calculus with Linear Negation. POPL, 2020. doi:10.1145/3371132.
  • [6] Antonio Bucciarelli, Thomas Ehrhard, and Giulio Manzonetto. Not enough points is enough. In Computer Science Logic, 2007. doi:10.1007/978-3-540-74915-8_24.
  • [7] J. R. B. Cockett and J.-S. Lemay. Integral categories and calculus categories. Mathematical Structures in Computer Science, 29:243–308, 2019. doi:10.1017/S0960129518000014.
  • [8] J. Robin B. Cockett and Geoff S. H. Cruttwell. Differential Structure, Tangent Structure, and SDG. Appl. Categorical Struct., 22(2):331–417, 2014. doi:10.1007/s10485-013-9312-0.
  • [9] J. Robin B. Cockett, Geoff S. H. Cruttwell, Jonathan Gallagher, Jean-Simon Pacaud Lemay, Benjamin MacAdam, Gordon D. Plotkin, and Dorette Pronk. Reverse derivative categories. In Maribel Fernández and Anca Muscholl, editors, 28th EACSL Annual Conference on Computer Science Logic, CSL 2020, January 13-16, 2020, Barcelona, Spain, volume 152 of LIPIcs, pages 18:1–18:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.CSL.2020.18.
  • [10] Y. Dabrowski and M. Kerjean. Models of Linear Logic based on the Schwartz ϵ product. Theory and Applications of Categories, 2020. doi:10.48550/arXiv.1712.07344.
  • [11] T. Ehrhard. An introduction to differential linear logic: proof-nets, models and antiderivatives. Math. Struct. Comput. Sci., 2017. doi:10.1017/S0960129516000372.
  • [12] T. Ehrhard and L. Regnier. Differential interaction nets. Theoretical Computer Science, 364(2), 2006. doi:10.1016/j.tcs.2006.08.003.
  • [13] Conal Elliott. The simple essence of automatic differentiation. In Proceedings of the ACM on Programming Languages (ICFP), 2018. doi:10.1145/3236765.
  • [14] M. P. Fiore. Differential structure in models of multiplicative biadditive intuitionistic linear logic. In TLCA, 2007. doi:10.1007/978-3-540-73228-0_13.
  • [15] J.-Y. Girard. Linear logic. Theoret. Comput. Sci., 50(1), 1987. doi:10.1016/0304-3975(87)90045-4.
  • [16] Marie Kerjean. A logical account for linear partial differential equations. In Anuj Dawar and Erich Grädel, editors, Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018, Oxford, UK, July 09-12, 2018, pages 589–598. ACM, 2018. doi:10.1145/3209108.3209192.
  • [17] Paul-André Mellies. Categorical semantics of linear logic. Société Mathématique de France, 2008.
  • [18] L. Schwartz. Théorie des distributions. Hermann, Paris, 1966.
  • [19] R. A. G. Seely. Linear Logic, *-Autonomous Categories and Cofree Coalgebras. In In Categories in Computer Science and Logic. American Mathematical Society, 1989.