A Theory of Sub-Barcodes

Chubet, Oliver A.; Gardner, Kirk P.; Sheehy, Donald R.

doi:10.4230/LIPIcs.SoCG.2025.35

Oliver A. Chubet

Department of Computer Science, North Carolina State University, Raleigh, NC, USA

Kirk P. Gardner

Department of Computer Science, North Carolina State University, Raleigh, NC, USA

Donald R. Sheehy

Department of Computer Science, North Carolina State University, Raleigh, NC, USA

Abstract

The primary tool in topological data analysis (TDA) is persistent homology, which involves computing a barcode – often from point-cloud or scalar field data – that serves as a topological signature for the underlying function. In this work, we introduce sub-barcodes and show how they arise naturally from factorizations of persistence module homomorphisms. We show that, as a partial order induced by factorizations, the relation of being a sub-barcode is strictly stronger than the rank invariant, and we apply sub-barcode theory to the problem of inferring information about the barcode of an unknown Lipschitz function from samples. The advantage of this approach is that it permits strong guarantees – with no noise – while requiring no sampling assumptions, and the resulting barcode is guaranteed to be a sub-barcode of every Lipschitz function that agrees with the data. We also present an algorithmic theory that allows for the efficient approximation of sub-barcodes using filtered Delaunay triangulations for Euclidean inputs.

Keywords and phrases:

Topology, Topological Data Analysis, Persistent Homology, Persistence Modules, Barcodes, Sub-barcodes, Factorizations, Lipschitz Extensions

2012 ACM Subject Classification:

Mathematics of computing $\rightarrow$ Algebraic topology; Mathematics of computing $\rightarrow$ Geometric topology; Theory of computation $\rightarrow$ Computational geometry; Computing methodologies $\rightarrow$ Algebraic algorithms

Related Version:

Full Version: https://arxiv.org/abs/2206.10504 [10]

Funding:

This work was partially funded by the NSF under grant CCF-2017980.

DOI:

10.4230/LIPIcs.SoCG.2025.35

Event:

41st International Symposium on Computational Geometry (SoCG 2025)

Editors:

Oswin Aichholzer and Haitao Wang

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction

Topological data analysis (TDA) seeks to compute topological properties of unknown functions from finite samples. The most widely-used tool for TDA is persistent homology (PH), and the most widely-studied functions are distance functions. Thus, the algorithmic problems arising in this field tend to be either algebraic (PH computation), geometric (distance function representation), or both.

A PH pipeline starts with a real-valued function as input, and then considers the sublevel sets of this function. The homology of the sublevels at each threshold value is tracked and aggregated into a barcode (see Figure 1). The goal is to compute or approximate this barcode, which often requires substantial assumptions on the quality of the sample.

In this paper, we relax this goal to instead compute whatever sub-barcode is supported by the data we have. We form a sub-barcode by taking a sub-interval of each bar in the barcode. The formal definition of a sub-barcode and how it arises naturally in PH is covered in Section 3. The main idea is that, given some of the data, we compute some of the underlying structure.

Figure 1: (Left) A contour plot of a Lipschitz function $f:X\to\mathbb{R}$. (Right) The barcode $\mathsf{Bar}(f)$ of $f$ (bottom) and a sub-barcode (top). All colors correspond to the function values indicated on the axis (bottom-right).

In practice, analyzing the topology of a finite sample entails two main challenges:

$\blacksquare$ Extension: If $S$ is a finite sample of the underlying space $X$, then we need to extrapolate the values for a function $f$ from the known values at $S$ to the unknown values at the rest of $X$. We won’t get the exact answer, but we would like some guarantees on the extension.

$\blacksquare$ Discretization: Having a good approximation to the function is insufficient: we also need a reliable discrete representation of the underlying space. In practice, these representations usually arise in the form of a simplicial complex or a simplicial set.

In the classic setting of distance functions in Euclidean space, strong sampling assumptions guarantee that the distance to a sample approximates the distance to the underlying set. The Delaunay triangulation provides a perfect discretization, capturing the exact homology of the distance sublevels. This was the original setting for PH [13].

Lipschitz functions generalize distance functions and allow us to represent a much wider class of inputs. Several previous papers have shown that, given a sufficiently dense sample, a good approximation to the barcode of a Lipschitz function can be computed [6, 8, 11]. The guarantee from such work is that the output barcode is close to the true barcode in a standard metric called the bottleneck distance. Our approach eliminates all sampling assumptions, but gives guarantees in terms of a partial order on barcodes; that is, it is guaranteed to produce no noise or spurious bars, but finds only those topological features (bars) that are sufficiently supported by the sample.

Contributions and Outline

The background and definitions appear in Sections 2 and 3. The algebraic theory of sub-barcodes is presented in Section 4. First, we show how sub-barcodes are related to the widely-studied rank invariant. Surprisingly, despite being a complete invariant for persistence, the natural extension of the rank invariant to a partial order is strictly less discriminating than sub-barcodes. This would be useless if it were not the case that sub-barcodes can be computed, a fact we show in the main sub-barcode theorem (Theorem 4), which states that sub-barcodes arise whenever we have one persistence module homomorphism factoring through another.

In Section 5, we lay out the extension theory. According to the algebraic theory of the previous section, the key to producing sub-barcodes is to find suitable upper and lower bounds on the unknown function. We show that maximum and minimum Lipschitz extensions serve this purpose (Figure 2), as well as how to give good guarantees with only an approximation to the upper and lower bounds (Corollary 9). This section also contains a much more general result to encompass other classes of functions beyond Lipschitz functions (Theorem 11).

Figure 2: On the left, two functions are depicted: one is an upper bound and the other is a lower bound on an unknown function $f:\mathbb{R}\to\mathbb{R}$. There is a corresponding barcode associated with the pair that matches minima in the upper bound to maxima in the lower bound. On the right is a candidate function $f$ that lies between the upper and lower bounds and its barcode $\mathsf{Bar}(f)$. The barcode of the inclusion of the upper and lower bounds is a sub-barcode of $\mathsf{Bar}(f)$.

The discretization theory is presented in Section 6. Here, we show how to approximate the Lipschitz extension sub-barcode of the previous section using Delaunay triangulations. This leads to a theory of semi-supervised TDA in which only a subset of function values are known (Theorem 13). We also give a tighter approximation using the barycentric decomposition of the Delaunay triangulation (Theorem 14), and show that this method also applies to a wider class of functions (Example 15).

The categorical theory of sub-barcodes is presented in Section 7. The TDA and PH pipeline depend on category theory to express the way relationships between inputs impact relationships between outputs, i.e., barcodes. We relate sub-barcodes to the classic approach of smoothing barcodes and show how this arises naturally from categorical considerations. We also give a new perspective on barcodes in general for which sub-barcodes are shown to be true subobjects in the categorical sense, and discuss how this construction is related to the theory of fuzzy sets [15, 1, 21].

2 Background

Functions and Filtrations

Let $(X,\mathrm{d})$ be a metric space. When the metric is Euclidean, we denote the distance as a norm, i.e., $\mathrm{d}(x,y):=\|x-y\|$ . The distance from a point $x$ to a (finite) set $U$ is denoted $\mathrm{d}(x,U):=\min_{u\in U}\mathrm{d}(x,u)$ .

The real-valued functions $f:X\to\mathbb{R}$ form a vector space where $(f+\alpha\,g)(x):=f(x)+\alpha\,g(x)$ , for all $\alpha\in\mathbb{R}$ . We also have a poset of functions, where $f\geq g$ iff $f(x)\geq g(x)$ for all $x\in X$ . A function $f:X\to\mathbb{R}$ is $\lambda$ -Lipschitz if, for all $x,y\in X$ , we have

f(y)-\lambda\;\mathrm{d}(x,y)\leq f(x)\leq f(y)+\lambda\;\mathrm{d}(x,y).

If $\lambda$ is not specified, then it is assumed that $\lambda=1$ . This is perhaps not the most common way to express the Lipschitz condition, but it is closest to how it is used in this work.

The $t$ -sublevel set $\mathrm{Sub}_{f}(t)$ of $f$ is the set of points $x\in X$ with $f(x)\leq t$ :

\mathrm{Sub}_{f}(t):=f^{-1}((-\infty,t])=\{x\in X\mid f(x)\leq t\}.

This notation is designed to indicate that $\mathrm{Sub}_{f}$ is a filtration: for $s\leq t$ we have $\mathrm{Sub}_{f}(s)\subseteq\mathrm{Sub}_{f}(t)$ . There is a partial order on sublevel filtrations where $\mathrm{Sub}_{f}\subseteq\mathrm{Sub}_{g}$ iff $\mathrm{Sub}_{f}(t)\subseteq\mathrm{Sub}_{g}(t)$ for all $t\in\mathbb{R}$ . It is a simple exercise to check that, if $f\geq g$ , then $\mathrm{Sub}_{f}\subseteq\mathrm{Sub}_{g}$ ; i.e., larger functions have smaller sublevel sets.
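The two order properties above can be checked directly on a small example. The following is a minimal sketch, assuming a finite domain encoded as a Python dictionary; the helper name `sublevel` and the sample values are hypothetical and chosen only for illustration.

```python
def sublevel(f, t):
    """Return the t-sublevel set {x : f(x) <= t} of a function stored as a dict."""
    return {x for x, value in f.items() if value <= t}

# Hypothetical finite domain X = {0, ..., 4} with f >= g pointwise.
g = {0: 0.0, 1: 1.0, 2: 0.5, 3: 2.0, 4: 1.5}
f = {x: value + 0.5 for x, value in g.items()}  # f = g + 0.5

thresholds = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]

# Filtration property: Sub_f(s) is contained in Sub_f(t) whenever s <= t.
nested = all(
    sublevel(f, s) <= sublevel(f, t)
    for s in thresholds
    for t in thresholds
    if s <= t
)

# Order reversal: f >= g implies Sub_f(t) is contained in Sub_g(t) for every t.
order_reversed = all(sublevel(f, t) <= sublevel(g, t) for t in thresholds)

print(nested, order_reversed)
```

Here `<=` on Python sets is the subset test, so both flags record exactly the containments stated in the text, checked only at the listed thresholds.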

$\blacktriangleright$ Remark 1.

The mapping from a function to its sublevel filtration is a contravariant functor from the poset of functions to the poset of filtrations. The filtrations themselves are functors $\mathbb{R}\to\mathbf{Top}$ from the totally ordered set $\mathbb{R}$ of real numbers to $\mathbf{Top}$ , the category of topological spaces. Throughout, we use remarks like this one to highlight the categorical interpretations. The reader who is unfamiliar with these terms may still follow all definitions and algorithms.

Persistence Modules

A pointwise finite dimensional (p.f.d.) persistence module $M:\mathbb{R}\to\mathbf{vec}$ assigns a (finite-dimensional) vector space $M(t)$ to all $t\in\mathbb{R}$ and a linear map $M_{s\leq t}:M(s)\to M(t)$ for all $s\leq t$ . The map $M_{s\leq s}$ is the identity on $M(s)$ , and for all $s\leq t\leq u$ , $M_{s\leq u}=M_{t\leq u}\circ M_{s\leq t}$ . For persistence modules $M$ and $N$ , a homomorphism $\phi:M\to N$ assigns linear maps $\phi_{t}:M(t)\to N(t)$ for all $t\in\mathbb{R}$ so that for all $s\leq t$ , we have $\phi_{t}\circ M_{s\leq t}=N_{s\leq t}\circ\phi_{s}$ .

$\blacktriangleright$ Remark 2.

A persistence module $M$ is a functor $M:\mathbb{R}\to\mathbf{vec}$ , and a persistence module homomorphism $M\to N$ is a natural transformation $\phi:M\Rightarrow N$ between functors $\mathbb{R}\to\mathbf{vec}$ , so that the category of persistence modules is precisely the category $\mathbf{vec}^{\mathbb{R}}=[\mathbb{R},\mathbf{vec}]$ of functors $\mathbb{R}\to\mathbf{vec}$ .

A factorization of homomorphisms $F:M\to N$ to $G:M^{\prime}\to N^{\prime}$ is given by a pair of homomorphisms $\varphi=\big{(}\varphi_{1}:M^{\prime}\to M,\ \varphi_{2}:N\to N^{\prime}\big{)}$ such that $G=\varphi_{2}\circ F\circ\varphi_{1}$ (Diagram 1).

\begin{matrix}M&\xrightarrow{\;F\;}&N\\ \big\uparrow{\scriptstyle\varphi_{1}}&&\big\downarrow{\scriptstyle\varphi_{2}}\\ M^{\prime}&\xrightarrow{\;G\;}&N^{\prime}\end{matrix}

(1)

Let $\mathbf{Fac}(\mathbf{vec}^{\mathbb{R}})$ denote the category whose objects are homomorphisms of persistence modules and whose arrows $\varphi:F\rightarrow G$ are factorizations of $G$ through $F$.

Example 3.

Letting $M(\epsilon)$ denote a persistence module $M$ shifted by a constant $\epsilon\in\mathbb{R}$ , a $2\epsilon$ -interleaving is traditionally given by a pair of commuting diagrams

[Commutative triangle: $M\varepsilon:M(-\epsilon)\to M(\epsilon)$ factors through $N$ via the maps induced by $\Phi$ and $\Psi$.]

(2)

[Commutative triangle: $N\varepsilon:N(-\epsilon)\to N(\epsilon)$ factors through $M$ via the maps induced by $\Psi$ and $\Phi$.]

(3)

depicting factorizations of homomorphisms $M\varepsilon:M(-\epsilon)\to M(\epsilon)$ and $N\varepsilon:N(-\epsilon)\to N(\epsilon)$ by a pair $\big(\Phi:M\to N(\epsilon),\ \Psi:N\to M(\epsilon)\big)$. In this section, we show that the commutativity of Diagrams (2) and (3) implies the sub-barcode relations $\mathsf{Bar}(M\varepsilon)\sqsubseteq\mathsf{Bar}(N)$ and $\mathsf{Bar}(M)\sqsupseteq\mathsf{Bar}(N\varepsilon)$.

Homology

Homology is a topological invariant that is particularly useful because it is both informative and efficiently computable for triangulable topological spaces. A homology functor $\mathrm{H}:\mathbf{Top}\to\mathbf{vec}$ associates a vector space $\mathrm{H}(X)$ to a topological space $X$, and takes continuous maps between topological spaces to linear maps between the corresponding vector spaces.

Persistence modules naturally arise in TDA by taking the homology of a sublevel filtration of a function. This assembles the functorial TDA pipeline from functions to filtrations to persistence modules. It is useful to see these as functors and not just functions because the additional structure of functoriality tells us that, if $f\geq g$ , then there is a persistence module homomorphism $\mathrm{H}\;\mathrm{Sub}_{f}\to\mathrm{H}\;\mathrm{Sub}_{g}$ . As we will see, these homomorphisms can be combined into factorizations (leading to sub-barcodes) and interleavings (leading to approximations).

3 Barcodes and Sub-Barcodes

Let $\mathcal{I}$ denote the poset of intervals on $\mathbb{R}$ ordered by inclusion. A barcode $B$ is a function $B:\underline{B}\to\mathcal{I}$ where $\underline{B}$ is a set whose elements $\beta\in\underline{B}$ are called bars. The barcode assigns an interval to each bar. Note that $B$ is not necessarily injective. When there is no confusion, we identify the bars in a barcode with their corresponding intervals. A sub-barcode mapping between barcodes $A$ and $B$ is a function $\phi:\underline{A}\to\underline{B}$ such that, for all $\alpha\in\underline{A}$ , we have $A(\alpha)\subseteq B(\phi(\alpha))$ . If $\phi$ is injective, then we call it a sub-barcode matching and say $A$ is a sub-barcode of $B$ , denoted $A\sqsubseteq B$ . That is, $A$ is a sub-barcode of $B$ if $A$ can be formed from $B$ by taking a subinterval of each bar in a subset of $\underline{B}$ (Figure 4).
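Deciding whether $A\sqsubseteq B$ amounts to finding an injective assignment of bars that respects interval containment, which is a bipartite matching problem. The following is a minimal sketch, assuming every bar is a half-open interval $[b,d)$ stored as a pair `(b, d)`; the function name `is_sub_barcode` is hypothetical, and the augmenting-path search is a standard matching routine rather than a method taken from this paper.

```python
def is_sub_barcode(A, B):
    """Return True if barcode A is a sub-barcode of barcode B.

    Bars are (birth, death) pairs standing for half-open intervals
    [birth, death). We search for an injective map sending each bar of A
    to a bar of B that contains it, using augmenting paths.
    """
    def contained(a, b):
        # [a0, a1) is a subset of [b0, b1) iff b0 <= a0 and a1 <= b1.
        return b[0] <= a[0] and a[1] <= b[1]

    match_of_b = {}  # index of a bar in B -> index of the bar of A matched to it

    def try_assign(i, visited):
        for j, b in enumerate(B):
            if j in visited or not contained(A[i], b):
                continue
            visited.add(j)
            # Use bar j if it is free, or if its current partner can move elsewhere.
            if j not in match_of_b or try_assign(match_of_b[j], visited):
                match_of_b[j] = i
                return True
        return False

    # Every bar of A must be matched for the map to be a sub-barcode matching.
    return all(try_assign(i, set()) for i in range(len(A)))


print(is_sub_barcode([(1, 2), (4, 5)], [(0, 3), (3.5, 6), (7, 8)]))
print(is_sub_barcode([(0, 1), (2, 3)], [(0, 3)]))
```

The second call fails because both bars of the first barcode would have to be sent to the same bar, which violates injectivity.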

There is a barcode $\mathsf{Bar}(M)$ corresponding to any persistence module $M$. Note that the image of a persistence module homomorphism $\phi:M\to N$ is also a persistence module. We overload notation and write $\mathsf{Bar}(\phi)$ to indicate $\mathsf{Bar}(\mathrm{im}\,\phi)$.

Figure 4: Two barcodes $A$ and $B$ and a sub-barcode matching $M:A\to B$.

From the PH pipeline, we have persistence modules $\mathrm{H}\;\mathrm{Sub}_{f}$ associated with each continuous function $f:X\to\mathbb{R}$ . We again simplify our notation and write $\mathsf{Bar}(f)$ to denote $\mathsf{Bar}(\mathrm{H}\;\mathrm{Sub}_{f})$ and $\mathsf{Bar}(f\geq g)$ to denote $\mathsf{Bar}(\mathrm{im}\,\mathrm{H}(\mathrm{Sub}_{f}\subseteq\mathrm{Sub}_{g}))$ . The object giving us a barcode is always clear from context.

We now have enough definitions to state the main theorem of sub-barcodes. The proof is postponed until Section 4 when we will have developed some necessary tools.

Theorem 4 (The Sub-Barcode Theorem).

If there exists a factorization $\varphi:F\rightarrow G$ of persistence module homomorphisms (i.e., $G=\varphi_{2}F\varphi_{1}$ ), then $\mathsf{Bar}(G)\sqsubseteq\mathsf{Bar}(F)$ .

4 The Algebraic Theory

4.1 Sub-barcodes vs. Ranks

Let $\mathrm{rk}\,m$ denote the rank of a linear map $m$ defined as the dimension of its image, $\mathrm{im}\,m$. The rank invariant $\mathrm{rank}\,_{M}$ of a persistence module $M:\mathbb{R}\to\mathbf{vec}$ is a function from ordered pairs in $\mathbb{R}$ to the natural numbers defined as $\mathrm{rank}\,_{M}(s\leq t):=\mathrm{rk}\,M_{s\leq t}$. It is straightforward to extend this invariant to homomorphisms $F:M\to N$ by letting $\mathrm{rank}\,_{F}(s\leq t):=\mathrm{rk}\,(F_{t}\circ M_{s\leq t})$. In the following, we consider the rank invariant for a persistence module $M$ to be the rank invariant of its identity homomorphism, i.e., $\mathrm{rank}\,_{M}:=\mathrm{rank}\,_{\mathsf{1}_{M}}$.

A basic fact from linear algebra is that, if a linear map $g$ factors through a second linear map $f$ , then $\mathrm{rk}\,(g)\leq\mathrm{rk}\,(f)$ . It follows immediately that, if there is a factorization $F\rightarrow G$ of persistence module homomorphisms, then for all $s\leq t$ ,

\mathrm{rank}\,_{G}(s\leq t)\leq\mathrm{rank}\,_{F}(s\leq t).

This puts a partial order on rank invariants that we call the sub-rank relation and denote by $\mathrm{rank}\,_{G}\leq\mathrm{rank}\,_{F}$ .
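The step from linear maps to persistence module homomorphisms can be spelled out as follows. This is a sketch in the notation of Diagram (1), with $G=\varphi_{2}F\varphi_{1}$ for $F:M\to N$ and $G:M^{\prime}\to N^{\prime}$; the componentwise labels $\varphi_{1,t}$ and $\varphi_{2,t}$ are our own shorthand for the linear maps at index $t$.

```latex
% A linear map g = \varphi_2 \circ f \circ \varphi_1 has rank at most rk f:
\mathrm{im}\,g\subseteq\varphi_{2}(\mathrm{im}\,f)
\quad\Longrightarrow\quad
\mathrm{rk}\,g\leq\dim\varphi_{2}(\mathrm{im}\,f)\leq\mathrm{rk}\,f.

% Naturality of \varphi_1 : M' \to M gives \varphi_{1,t} \circ M'_{s \leq t} = M_{s \leq t} \circ \varphi_{1,s}, hence
G_{t}\circ M^{\prime}_{s\leq t}
=\varphi_{2,t}\circ F_{t}\circ\varphi_{1,t}\circ M^{\prime}_{s\leq t}
=\varphi_{2,t}\circ\big(F_{t}\circ M_{s\leq t}\big)\circ\varphi_{1,s}.
```

Thus the map defining $\mathrm{rank}\,_{G}(s\leq t)$ factors through the map defining $\mathrm{rank}\,_{F}(s\leq t)$, and the first line gives the inequality.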

$\blacktriangleright$ Remark 5.

$\mathrm{rk}\,$ is a contravariant functor $\mathbf{Fac}(\mathbf{\mathbf{vec}})\to\mathbb{N}$ and $\mathrm{rank}\,$ is a contravariant functor $\mathbf{Fac}(\mathbf{\mathbf{vec}^{\mathbb{R}}})\to\mathbb{N}$ , i.e., from factorizations of vector spaces (resp. persistence modules) to the poset of natural numbers. An ordering of two rank invariants is a natural transformation between these functors.

In light of the sub-barcode theorem, the rank invariant is the same type of invariant as sub-barcodes. It puts a partial order on morphisms induced by factorizations. However, although the barcode can be constructed from the rank invariant, the natural ordering of rank invariants is a weaker invariant than sub-barcodes as expressed in the following.

Theorem 6.

The sub-barcode order is a more discriminating invariant than the sub-rank order in the sense that, for all persistence module homomorphisms $F$ and $G$ , $\mathsf{Bar}(G)\sqsubseteq\mathsf{Bar}(F)$ implies that $\mathrm{rank}\,_{G}\leq\mathrm{rank}\,_{F}$ , but not vice-versa.

Proof.

The rank invariant is easily extracted from a barcode: $\mathrm{rank}\,_{F}(s\leq t)$ is the number of bars in $\mathsf{Bar}(F)$ that contain the closed interval $[s,t]$. It follows immediately from the injectivity of the sub-barcode matching (and the pigeonhole principle) that $\mathsf{Bar}(G)\sqsubseteq\mathsf{Bar}(F)$ implies that $\mathrm{rank}\,_{G}\leq\mathrm{rank}\,_{F}$.

For the other side of the proof, it suffices to give an example of persistence modules $M$ and $N$ such that $\mathrm{rank}\,_{M}\leq\mathrm{rank}\,_{N}$ while $\mathsf{Bar}(M)\not\sqsubseteq\mathsf{Bar}(N)$. (As before, $\mathrm{rank}\,_{M}$ is shorthand for $\mathrm{rank}\,_{\mathsf{1}_{M}}$.) Let $M$ be a persistence module where $\mathsf{Bar}(M)$ has two bars corresponding to intervals $[0,1)$ and $[2,3)$, and let $N$ be a persistence module such that $\mathsf{Bar}(N)$ has only one bar corresponding to the interval $[0,3)$. Clearly, $\mathrm{rank}\,_{M}\leq\mathrm{rank}\,_{N}$, but it is not possible for a barcode with two bars to be a sub-barcode of one with only one bar. Thus $\mathsf{Bar}(M)\not\sqsubseteq\mathsf{Bar}(N)$. $\hfill\blacktriangleleft$
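The counterexample in the proof can be checked numerically. The following is a minimal sketch, assuming bars are half-open intervals stored as `(birth, death)` pairs; the function name `rank` and the sampling grid are hypothetical, and a finite grid illustrates the inequality without proving it.

```python
def rank(bars, s, t):
    """Count the bars [birth, death) that contain the closed interval [s, t]."""
    return sum(1 for birth, death in bars if birth <= s and t < death)

bars_M = [(0, 1), (2, 3)]  # Bar(M): two short bars
bars_N = [(0, 3)]          # Bar(N): one long bar

# Sample pairs s <= t on a grid from -1 to 4 in steps of 0.25.
grid = [i / 4 for i in range(-4, 17)]
pairs = [(s, t) for s in grid for t in grid if s <= t]

# Sub-rank relation rank_M <= rank_N at every sampled pair.
sub_rank_holds = all(rank(bars_M, s, t) <= rank(bars_N, s, t) for s, t in pairs)

# No injective map from two bars into one bar exists, so Bar(M) is not a sub-barcode.
injection_possible = len(bars_M) <= len(bars_N)

print(sub_rank_holds, injection_possible)
```

The pair $(s,t)=(0.5,2.5)$ shows where the two invariants diverge: no bar of $M$ spans it, while the single bar of $N$ does.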

4.2 Induced Matching Theory Proof of the Sub-Barcode Theorem

Although not expressed in these terms, Bauer and Lesnick’s theory of induced matchings [3] shows that sub-barcode matchings arise from certain homomorphisms. The induced matchings in that work are matchings between the bars that do not (necessarily) satisfy the inclusion requirement of a sub-barcode matching. They showed that for any homomorphism $F:M\to N$ , there is a partial bijective function (a matching) between the bars with the property that every bar of $\mathsf{Bar}(F)$ is the intersection of the intervals of the matched bars from $\mathsf{Bar}(M)$ to $\mathsf{Bar}(N)$ . These are constructed from the pair of canonical (functorial) matchings $\mathsf{Bar}(M)\to\mathsf{Bar}(F)\to\mathsf{Bar}(N)$ induced by the epi-mono factorization of $F$ through its image. Translated into the vocabulary of sub-barcodes, this result implies that $\mathsf{Bar}(F)\sqsubseteq\mathsf{Bar}(M)$ and $\mathsf{Bar}(F)\sqsubseteq\mathsf{Bar}(N)$ .

More generally, if $F$ is a monomorphism (all maps $F_{t}$ are injective), then $\mathsf{Bar}(M)\sqsubseteq\mathsf{Bar}(N)$ . This is because the induced matching of a monomorphism matches every bar, in addition to the property that, if $[a,b]$ is matched to $[c,d]$ , then $a\geq c$ and $b=d$ ; thus, a submodule relation between persistence modules implies a sub-barcode relation between the barcodes.

If $F$ is an epimorphism (all maps $F_{t}$ are surjective), then the induced matching is surjective, and if $[a,b]$ is matched to $[c,d]$ , then $a=c$ and $d\leq b$ . So, reversing this induced matching, we get $\mathsf{Bar}(N)\sqsubseteq\mathsf{Bar}(M)$ . We use these matchings in the proof of the Sub-Barcode Theorem below.

Proof of The Sub-Barcode Theorem (Theorem 4).

The factorization $\varphi=(\varphi_{1},\varphi_{2}):F\rightarrow G$ entails $G=\varphi_{2}F\varphi_{1}$. Let $m:\mathrm{im}\,G\hookrightarrow\mathrm{im}\,\varphi_{2}F$ be the unique monomorphism given by the universal property of images, and let $e:\mathrm{im}\,F\twoheadrightarrow\mathrm{im}\,\varphi_{2}F$ be the epimorphism given by restricting $\varphi_{2}$ to $\mathrm{im}\,F$. Using the induced matching theory, $m$ gives a sub-barcode matching $\mathsf{Bar}(G)\sqsubseteq\mathsf{Bar}(\varphi_{2}F)$. Then, the reverse of the matching induced by $e$ gives a sub-barcode matching $\mathsf{Bar}(\varphi_{2}F)\sqsubseteq\mathsf{Bar}(F)$. Putting these together, we get

\mathsf{Bar}(G)\sqsubseteq\mathsf{Bar}(\varphi_{2}F)\sqsubseteq\mathsf{Bar}(F).

$\hfill\blacktriangleleft$

The following corollary for ordered sequences of functions forms the basis for all of our applications of sub-barcodes in TDA.

Corollary 7.

If $h\geq g\geq f\geq e$ are functions $X\to\mathbb{R}$ , then $\mathsf{Bar}(h\geq e)\sqsubseteq\mathsf{Bar}(g\geq f)$ .

Proof.

It suffices to observe that the ordering on the functions gives a factorization $(g\geq f)\to(h\geq e)$ of $h\geq e$ through $g\geq f$ . By the functoriality of the sublevel functor and homology, we get the corresponding factorization of persistence modules. The claim then follows from Theorem 4. $\hfill\blacktriangleleft$
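The factorization used in this proof can be written out explicitly. This is a sketch in the notation of Section 3; the underbrace labels identify the three maps with $\varphi_{2}$, $F$, and $\varphi_{1}$ from Theorem 4.

```latex
h\geq g\geq f\geq e
\quad\Longrightarrow\quad
\mathrm{Sub}_{h}\subseteq\mathrm{Sub}_{g}\subseteq\mathrm{Sub}_{f}\subseteq\mathrm{Sub}_{e},

\underbrace{\mathrm{H}(\mathrm{Sub}_{h}\subseteq\mathrm{Sub}_{e})}_{G}
=\underbrace{\mathrm{H}(\mathrm{Sub}_{f}\subseteq\mathrm{Sub}_{e})}_{\varphi_{2}}
\circ\underbrace{\mathrm{H}(\mathrm{Sub}_{g}\subseteq\mathrm{Sub}_{f})}_{F}
\circ\underbrace{\mathrm{H}(\mathrm{Sub}_{h}\subseteq\mathrm{Sub}_{g})}_{\varphi_{1}}.
```

The outer pair $(h,e)$ therefore yields the smaller barcode, and the inner pair $(g,f)$ the larger one.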

5 The Extension Theory

Given an unknown Lipschitz function $f:X\to\mathbb{R}$ and a sample $S\subset X$ , let $f_{S}$ denote the function $f$ restricted to the points of $S$ . The pair $(S,f_{S})$ denotes input to a TDA problem where the goal is to infer as much as possible about $f$ . Any function $g:X\to\mathbb{R}$ that agrees with $f_{S}$ at the points of $S$ is called an extension of $f_{S}$ . If $g$ is also Lipschitz, then it is called a Lipschitz extension. By the definition of Lipschitz continuity, each point $s\in S$ puts an upper bound and a lower bound on the unknown $f$ so that

f(s)-\mathrm{d}(x,s)\leq f(x)\leq f(s)+\mathrm{d}(x,s).

The combination of these upper and lower bounds gives the maximum and minimum Lipschitz extensions, defined respectively for each $x\in X$ as

\check{f}_{S}(x):=\min_{s\in S}f(s)+\mathrm{d}(x,s)\quad\text{and}\quad\hat{f}_{S}(x):=\max_{s\in S}f(s)-\mathrm{d}(x,s).

Note that the max extension is the minimum of the upper bounds and vice versa (see also [17]).
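Both extensions are simple to evaluate pointwise. The following is a minimal sketch, assuming Euclidean points stored as coordinate tuples and a sample stored as a dictionary from points to values; the function names, the optional constant `lam`, and the sample itself are hypothetical.

```python
import math

def max_extension(sample, x, lam=1.0):
    """Max lam-Lipschitz extension at x: the minimum of the upper bounds."""
    return min(value + lam * math.dist(x, s) for s, value in sample.items())

def min_extension(sample, x, lam=1.0):
    """Min lam-Lipschitz extension at x: the maximum of the lower bounds."""
    return max(value - lam * math.dist(x, s) for s, value in sample.items())

# Hypothetical sample of a 1-Lipschitz function on three points in the plane.
sample = {(0.0, 0.0): 0.0, (1.0, 0.0): 0.5, (0.0, 2.0): 1.0}

# Any 1-Lipschitz extension f satisfies lower <= f(x) <= upper at the query point.
x = (1.0, 1.0)
upper = max_extension(sample, x)
lower = min_extension(sample, x)
print(lower, upper)
```

Because the sample values are themselves 1-Lipschitz, both functions return the sampled value at each sample point, so they are extensions in the sense defined above.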

The following theorem states that, for every Lipschitz function $f$ that agrees with the data, the max and min Lipschitz extensions yield a barcode that is contained in $\mathsf{Bar}(f)$ .

Theorem 8.

Given $S\subseteq X$ and $f_{S}:S\to\mathbb{R}$ , for all Lipschitz extensions $f$ of $f_{S}$ , we have $\mathsf{Bar}(\check{f}_{S}\geq\hat{f}_{S})\sqsubseteq\mathsf{Bar}(f)$ .

Proof.

It suffices to observe that for all Lipschitz extensions $f$ of $f_{S}$, we have $\check{f}_{S}\geq f\geq\hat{f}_{S}$. Because $\mathsf{Bar}(f)=\mathsf{Bar}(f\geq f)$, the result follows from Corollary 7. $\hfill\blacktriangleleft$

The Choice of Lipschitz Constants

It is natural to ask whether (and how) the choice of the Lipschitz constant affects this result. The previous statements have all assumed $1$ -Lipschitz functions. More generally, tuning the Lipschitz constant naturally leads to a filtration of the barcode itself as shown in the following.

Corollary 9.

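For any $t\in\mathbb{R}$, let $\check{f}_{S}^{(t)}$ and $\hat{f}_{S}^{(t)}$ denote the (resp.) max and min $t$-Lipschitz extensions. If $f_{S}$ is $t_{0}$-Lipschitz and $t_{0}\leq t_{1}\leq t_{2}$, then $\mathsf{Bar}(\check{f}_{S}^{(t_{2})}\geq\hat{f}_{S}^{(t_{2})})\sqsubseteq\mathsf{Bar}(\check{f}_{S}^{(t_{1})}\geq\hat{f}_{S}^{(t_{1})})$.

This follows from Corollary 7, since a larger Lipschitz constant loosens both bounds: $\check{f}_{S}^{(t_{2})}\geq\check{f}_{S}^{(t_{1})}\geq\hat{f}_{S}^{(t_{1})}\geq\hat{f}_{S}^{(t_{2})}$.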

A Sub-Barcode that is Close in Bottleneck Distance

A fundamental challenge in computing sub-barcodes from Lipschitz extensions is that it is not clear how to construct a simplicial complex that can represent the sublevel sets of both the max and min extension at the same time. So, in order to still derive guarantees, it will suffice to have approximations to these extensions. The following theorem shows that a bottleneck-close sub-barcode is possible if we have close approximations to the upper and lower bounds.

Theorem 10.

Let $u^{\prime}\geq u\geq f\geq\ell\geq\ell^{\prime}$ be an ordered sequence of functions $X\to\mathbb{R}$ such that $\|\ell-\ell^{\prime}\|_{\infty}\leq\varepsilon$ and $\|u-u^{\prime}\|_{\infty}\leq\varepsilon$. Then, $\mathrm{d}_{B}\big(\mathsf{Bar}(u\geq\ell),\mathsf{Bar}(u^{\prime}\geq\ell^{\prime})\big)\leq\varepsilon$.

Proof.

The proof is a straightforward application of the general stability theorem for image persistence. The bounds $\|\ell-\ell^{\prime}\|_{\infty}\leq\varepsilon$ and $\|u-u^{\prime}\|_{\infty}\leq\varepsilon$ imply that $\ell^{\prime}+\varepsilon\geq\ell\geq\ell^{\prime}$ and $u^{\prime}\geq u\geq u^{\prime}-\varepsilon$. As all maps from ordered functions induce inclusions of sublevel sets, all required maps commute to form an interleaving in the arrow category of filtrations. This gives an interleaving in the arrow category of persistence modules which leads to an $\varepsilon$-interleaving of the image persistence modules. $\hfill\blacktriangleleft$

Beyond Lipschitz Functions

The careful reader may note that there is nothing particularly special about Lipschitz functions in the treatment above. One could consider other classes of functions as well. More generally, it suffices to have an upper and lower bound on the unknown function. In the Lipschitz case, the Lipschitz extensions form a lattice of hypotheses that we are considering. Recall that a poset is a (complete) lattice if every subset has a join (least upper bound) and a meet (greatest lower bound). The symbol $\vee$ is used for joins and $\wedge$ is used for meets. This is the origin of our notation $\check{f}_{S}$ and $\hat{f}_{S}$.

For a set $\mathcal{F}$ of functions, let $\bigvee\mathcal{F}:=\bigvee_{f\in\mathcal{F}}f$ and $\bigwedge\mathcal{F}:=\bigwedge_{f\in\mathcal{F}}f$ . In particular, if $\mathcal{F}$ denotes the set of Lipschitz extensions of $f_{S}$ , then $\check{f}_{S}=\bigvee\mathcal{F}$ and $\hat{f}_{S}=\bigwedge\mathcal{F}$ . For other lattices of hypotheses, the same proof as for the Lipschitz case (Theorem 8) implies the following theorem.

Theorem 11.

Given a lattice $\mathcal{F}$ of functions $X\to\mathbb{R}$ , for all $f\in\mathcal{F}$ , we have

\mathsf{Bar}\big(\bigvee\mathcal{F}\geq\bigwedge\mathcal{F}\big)\sqsubseteq\mathsf{Bar}(f).

6 The Discretization Theory

The preceding sections laid out the theory of sub-barcodes for persistence modules, as well as the useful setting of Lipschitz functions. The following section shows how to approximate these barcodes.

The fundamental challenge is to provide a single discretization of space that approximates (or at least bounds) both Lipschitz extensions at the same time. We assume throughout this section that $X$ is a convex polytope in $\mathbb{R}^{d}$. (The convexity of $X$ can be relaxed in many settings; assuming $X$ is convex simplifies our use of the Nerve theorem.) For a finite set $S\subset X$, the Voronoi diagram assigns a convex polytope called the Voronoi cell to each point $s\in S$ defined as

\mathrm{Vor}_{S}(s):=\{x\in X\mid\mathrm{d}(x,s)=\mathrm{d}(x,S)\}.

The radius of $\mathrm{Vor}_{S}(s)$ is the maximum distance from $s$ to any point in the cell: $r_{s}:=\max_{x\in\mathrm{Vor}_{S}(s)}\mathrm{d}(x,s)$ . More generally, for $\sigma\subseteq S$ , we have a Voronoi cell

\mathrm{Vor}_{S}(\sigma):=\bigcap_{s\in\sigma}\mathrm{Vor}_{S}(s).

Dually, the Delaunay triangulation $\mathrm{Del}_{S}$ associated with $S$ is the simplicial complex with vertex set $S$ and simplices $\mathrm{Del}_{S}=\{\sigma\subseteq S\mid\mathrm{Vor}_{S}(\sigma)\neq\emptyset\}$ . The Delaunay triangulation is widely used for persistent homology on Euclidean inputs.

6.1 Simple Voronoi Extensions and a Delaunay Filtration

The most direct way to use a Delaunay triangulation to estimate the persistence diagram of a set of points is to extend the function from the vertex set to the simplices, setting the value at a simplex to be the maximum value at its vertices. This corresponds to the piecewise linear extension of the function to the geometric realization of the complex (see Section 2.5, Morozov [20]). Unfortunately, this can fail badly in the absence of strong sampling guarantees; Figure 5 shows an example where the piecewise linear extension introduces spurious features. Thus, the problem with some samples is not only that we can miss features, but that we may hallucinate them as well.

The key to using a discretization is to use an image filtration (see Cohen-Steiner et al. [12], Bauer [4]) capturing both an upper and lower bound. The following theorem states that we can compute a sub-barcode for any Lipschitz function by filtering the Delaunay triangulation by upper and lower bounds derived from the input values and the radii of the Voronoi cells.

Theorem 12.

Let $S$ be a finite subset of a polytope $X\subset\mathbb{R}^{d}$ . Let $f_{S}:S\to\mathbb{R}$ be a sample of an unknown Lipschitz function $f:X\to\mathbb{R}$ . For each $s\in S$ , let $r_{s}$ denote the radius of the Voronoi cell $\mathrm{Vor}_{S}(s)$ . Let $u$ and $\ell$ be functions $\mathrm{Del}_{S}\to\mathbb{R}$ defined on Delaunay simplices $\sigma\in\mathrm{Del}_{S}$ as

u(\sigma):=\max_{s\in\sigma}f_{S}(s)+r_{s}\quad\text{and}\quad\ell(\sigma):=\max_{s\in\sigma}f_{S}(s)-r_{s}.

Then, $\mathsf{Bar}(u\geq\ell)\sqsubseteq\mathsf{Bar}(f)$.

Proof.

To relate the barcodes, we will first construct functions on $X$ that have the same barcode as the simplicial functions $u$ and $\ell$ . Second, we will relate these functions to the Lipschitz extensions $\check{f}_{S}$ and $\hat{f}_{S}$ .

We define piecewise constant functions on $X$ using the Voronoi cells and $f_{S}$ as

u^{*}(x):=\max_{s\mid x\in\mathrm{Vor}_{S}(s)}f_{S}(s)+r_{s}\quad\text{and}\quad\ell^{*}(x):=\max_{s\mid x\in\mathrm{Vor}_{S}(s)}f_{S}(s)-r_{s}.

By the Persistent Nerve Lemma (Chazal et al. [9], Lemma 3.4), there is a homotopy equivalence between the sublevels of $u$ and $u^{*}$ that commutes with inclusions (see also [2]). The same holds for $\ell$ and $\ell^{*}$ . It follows that $\mathsf{Bar}(u\geq\ell)\cong\mathsf{Bar}(u^{*}\geq\ell^{*})$ .

For any $x\in X$ , there is an $s\in S$ such that $x\in\mathrm{Vor}_{S}(s)$ and $u^{*}(x)=f_{S}(s)+r_{s}$ , so

u^{*}(x)=f_{S}(s)+r_{s}\geq f_{S}(s)+\mathrm{d}(x,s)\geq\min_{s_{0}\in S}f_{S}(s_{0})+\mathrm{d}(x,s_{0})=\check{f}_{S}(x).

Similarly, there is an $s\in S$ such that $x\in\mathrm{Vor}_{S}(s)$ and $\ell^{*}(x)=f_{S}(s)-r_{s}$ , so

\ell^{*}(x)=f_{S}(s)-r_{s}\leq f_{S}(s)-\mathrm{d}(x,s)\leq\max_{s_{0}\in S}f_{S}(s_{0})-\mathrm{d}(x,s_{0})=\hat{f}_{S}(x),

thus $u^{*}\geq\check{f}_{S}\geq\hat{f}_{S}\geq\ell^{*}$ , and Corollary 7 therefore implies

\mathsf{Bar}(u\geq\ell)\cong\mathsf{Bar}(u^{*}\geq\ell^{*})\sqsubseteq\mathsf{Bar}(\check{f}_{S}\geq\hat{f}_{S})\sqsubseteq\mathsf{Bar}(f).

$\hfill\blacktriangleleft$
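To make the definitions of $u$ and $\ell$ in Theorem 12 concrete, the following Python sketch treats the one-dimensional case only. It assumes that $X=[a,b]$ is an interval, that the sample is given in sorted order, and that the Lipschitz constant is normalized to 1 as in the proof above. Under these assumptions the Delaunay complex consists of the sample points and the edges between consecutive points, and the radius of a Voronoi cell is the larger of the distances from $s$ to the two endpoints of its cell. The names `voronoi_radii_1d` and `delaunay_bounds_1d` are illustrative and do not appear in the paper.

```python
from typing import Dict, List, Tuple

Simplex = Tuple[int, ...]


def voronoi_radii_1d(points: List[float], a: float, b: float) -> List[float]:
    """Radius of each Voronoi cell of the sorted `points`, clipped to X = [a, b]."""
    n = len(points)
    radii = []
    for i, s in enumerate(points):
        # Cell boundaries are midpoints to the neighbors, or the ends of X.
        left = a if i == 0 else (points[i - 1] + s) / 2
        right = b if i == n - 1 else (s + points[i + 1]) / 2
        radii.append(max(s - left, right - s))
    return radii


def delaunay_bounds_1d(
    points: List[float], values: List[float], a: float, b: float
) -> Tuple[Dict[Simplex, float], Dict[Simplex, float]]:
    """Upper and lower filtration values on the 1D Delaunay simplices.

    Simplices are index tuples: (i,) for a vertex, (i, i + 1) for an edge.
    """
    r = voronoi_radii_1d(points, a, b)
    n = len(points)
    simplices = [(i,) for i in range(n)] + [(i, i + 1) for i in range(n - 1)]
    u = {sig: max(values[i] + r[i] for i in sig) for sig in simplices}
    low = {sig: max(values[i] - r[i] for i in sig) for sig in simplices}
    return u, low


u, low = delaunay_bounds_1d([0.0, 1.0, 3.0], [0.0, 2.0, 1.0], 0.0, 4.0)
```

The sketch only produces the two simplexwise functions. Computing the image-persistence barcode $\mathsf{Bar}(u\geq\ell)$ from them requires an image persistence algorithm such as those of [12] or [4], which is not shown here.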

6.2 Semi-Supervised TDA

We have assumed thus far that we have access to function values at all points in the sample. This resembles a supervised learning problem. If instead we only have function values at a subset $P\subset S$ , we can still use the points of $S$ to improve our approximation. We call this semi-supervised TDA.

The most common guarantees in persistent homology are derived from interleavings, and thus yield bottleneck distance bounds on the resulting barcodes. In this section, we combine those results with the Lipschitz extension sub-barcode and the Delaunay filtration above to get a guaranteed approximation. Throughout, we assume that $S$ is a sample of $X$, and that we only have function values $f_{P}:P\to\mathbb{R}$ for $P\subset S$. The algorithm extends $\check{f}_{P}$ and $\hat{f}_{P}$ to the points of $S$ and then uses the Delaunay filtration from the previous subsection.

In the following, we say that $S$ is an $\varepsilon$ -sample of $X$ if every point of $X$ is within $\varepsilon$ of a point in $S$ . Equivalently, the radius of every Voronoi cell $\mathrm{Vor}_{S}(s)$ is at most $\varepsilon$ .

Theorem 13.

Let $X\subset\mathbb{R}^{d}$ be a convex polytope, and suppose $S\subset X$ is an $\varepsilon$ -sample. Let $f_{P}:P\to\mathbb{R}$ be a sample of an unknown Lipschitz function $f:X\to\mathbb{R}$ defined on a subset $P\subset S$ , and let $u$ and $\ell$ be functions $\mathrm{Del}_{S}\to\mathbb{R}$ defined on each Delaunay simplex $\sigma\subseteq S$ as

u(\sigma)=\max_{s\in\sigma}\check{f}_{P}(s)+\varepsilon\quad\text{and}\quad\ell(\sigma)=\max_{s\in\sigma}\hat{f}_{P}(s)-\varepsilon.

Then, $\mathsf{Bar}(u\geq\ell)\sqsubseteq\mathsf{Bar}(f)$, and $\mathrm{d}_{B}\big(\mathsf{Bar}(u\geq\ell),\mathsf{Bar}(\check{f}_{P}\geq\hat{f}_{P})\big)\leq\varepsilon$.

Proof.

The sub-barcode relation follows from Theorem 12 and the observation that the radius $r_{s}\leq\varepsilon$ for all $s$ in an $\varepsilon$-sample. The bottleneck bound follows from a more general theory of interleaving of image persistence modules. It depends only on the fact that the extensions $u^{*}$ and $\ell^{*}$ of $u$ and $\ell$ to piecewise constant functions on Voronoi cells (as in the proof of Theorem 12) satisfy $\|u^{*}-\check{f}_{P}\|_{\infty}\leq\varepsilon$ and $\|\ell^{*}-\hat{f}_{P}\|_{\infty}\leq\varepsilon$, respectively. These bounds yield an $\varepsilon$-interleaving of the sublevels, and thus give an interleaving of the images. (A more general theorem for stability of image persistence can be found in the full version of this paper [10].) $\hfill\blacktriangleleft$
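The vertex values used in Theorem 13 can be sketched as follows. The code evaluates the Lipschitz extensions $\check{f}_{P}(s)=\min_{p\in P}f_{P}(p)+\mathrm{d}(s,p)$ and $\hat{f}_{P}(s)=\max_{p\in P}f_{P}(p)-\mathrm{d}(s,p)$ at every sample point and then shifts them by $\varepsilon$. It assumes Euclidean distance and a Lipschitz constant normalized to 1. The function names and the dictionary representation of $f_{P}$ are choices made for this illustration, not part of the paper.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, ...]


def lipschitz_extensions(labeled: Dict[Point, float], x: Point) -> Tuple[float, float]:
    """Return (upper, lower) Lipschitz extension values of the labeled data at x."""
    upper = min(v + math.dist(x, p) for p, v in labeled.items())
    lower = max(v - math.dist(x, p) for p, v in labeled.items())
    return upper, lower


def semi_supervised_vertex_bounds(
    labeled: Dict[Point, float], sample: Iterable[Point], eps: float
) -> Dict[Point, Tuple[float, float]]:
    """Per-vertex values (upper + eps, lower - eps) for every point of the sample.

    The value of a Delaunay simplex is then the maximum over its vertices.
    """
    bounds = {}
    for s in sample:
        upper, lower = lipschitz_extensions(labeled, s)
        bounds[s] = (upper + eps, lower - eps)
    return bounds


labeled = {(0.0, 0.0): 1.0, (3.0, 4.0): 2.0}
sample = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
bounds = semi_supervised_vertex_bounds(labeled, sample, eps=0.5)
```

At a labeled point the two extensions coincide with the given value, so the gap between the upper and lower vertex values there is exactly $2\varepsilon$. At unlabeled points the gap also grows with the distance to the labeled points.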

6.3 Finer Approximation with Barycentric Subdivision

In general, we would like to have tight approximations to the sub-barcode of Lipschitz extensions without requiring any sampling assumptions. One way to get a better bound is to manually add points to reduce the radii of the cells. Another approach would be to consider the barycentric subdivision. This makes sense because it can be understood as assigning function values to Delaunay simplices individually rather than inducing the values from the function on the vertices. In this section, we show how to define such constructions in a way that applies to a much more general class of functions.

For each sample $s\in S$, knowing the value of $f(s)$ implies an upper bound and a lower bound on $f$. Thus far, these bounds came from the assumption that $f$ is Lipschitz. Similar upper and lower bounds can be defined for other classes of functions. We only assume that these bounds are distance monotone. That is, the upper bound $\check{f}_{s}$ for a point $s$ satisfies the condition that if $\mathrm{d}(x,s)\leq\mathrm{d}(y,s)$, then $\check{f}_{s}(x)\leq\check{f}_{s}(y)$. For the lower bound $\hat{f}_{s}$ at $s$, values decrease with distance, i.e., if $\mathrm{d}(x,s)\leq\mathrm{d}(y,s)$, then $\hat{f}_{s}(x)\geq\hat{f}_{s}(y)$. We call $\{\check{f}_{s}\}$ and $\{\hat{f}_{s}\}$ a family of pointwise distance monotone bounds. As before, we let $\check{f}_{S}(x):=\min_{s\in S}\check{f}_{s}(x)$ and $\hat{f}_{S}(x):=\max_{s\in S}\hat{f}_{s}(x)$.

Although the term barycentric subdivision implies that subdivisions happen at the barycenters of the cells, we define this operation instead as a combinatorial operation on simplicial complexes. For any simplicial complex $K$, we define $\mathsf{bary}K$ to be the simplicial complex with a vertex for each simplex in $K$ and a simplex for each ordered subset of simplices (the ordering is by inclusion). For any function $g:K\to\mathbb{R}$, we can extend it to a filtration $G$ on $\mathsf{bary}K$, as $\sigma_{0}\hookrightarrow\cdots\hookrightarrow\sigma_{k}$ is in $G(t)$ if and only if $g(\sigma_{i})\leq t$ for all $i\in\{0,\ldots,k\}$. A proof of the following theorem may be found in the full version [10].

Theorem 14.

Let $S$ be a finite sample of a convex polytope $X$ , and let $\{\check{f}_{s}:\mathrm{Vor}_{S}(s)\to\mathbb{R}\mid s\in S\}$ and $\{\hat{f}_{s}:\mathrm{Vor}_{S}(s)\to\mathbb{R}\mid s\in S\}$ be families of pointwise distance-monotone bounds on some unknown function $f:X\to\mathbb{R}$ . Aggregate these bounds as $\check{f}_{S}:=\bigwedge_{s\in S}\check{f}_{s}$ and $\hat{f}_{S}:=\bigvee_{s\in S}\hat{f}_{s}$ . For $\sigma\in\mathrm{Del}_{S}$ , let

u(\sigma):=\min_{x\in\mathrm{Vor}_{S}(\sigma)}\check{f}_{S}(x)\quad\text{and}\quad\ell(\sigma):=\min_{x\in\mathrm{Vor}_{S}(\sigma)}\hat{f}_{S}(x).

Extend $u$ and $\ell$ to $\mathsf{bary}\mathrm{Del}_{S}$. Then, $\mathsf{Bar}(u\geq\ell)=\mathsf{Bar}(\check{f}_{S}\geq\hat{f}_{S})\sqsubseteq\mathsf{Bar}(f)$.
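The combinatorial barycentric subdivision and the extension of a simplexwise function $g$ to it can be sketched in a few lines of Python. Simplices of $K$ are represented as frozensets of vertex labels, a simplex of $\mathsf{bary}K$ is a tuple of simplices of $K$ that is strictly increasing under inclusion, and the entry time of such a chain is $\max_{i}g(\sigma_{i})$, which is the smallest $t$ with $g(\sigma_{i})\leq t$ for all $i$. The names `chains` and `bary_filtration` are hypothetical, and the enumeration is exponential in the dimension, so this is only meant for small examples.

```python
from typing import Dict, FrozenSet, List, Tuple

Simplex = FrozenSet[int]
Chain = Tuple[Simplex, ...]


def chains(simplices: List[Simplex]) -> List[Chain]:
    """All nonempty chains sigma_0 < ... < sigma_k under strict inclusion."""
    ordered = sorted(simplices, key=len)
    result: List[Chain] = []

    def extend(chain: Chain) -> None:
        result.append(chain)
        for tau in ordered:
            if chain[-1] < tau:  # proper-subset test on frozensets
                extend(chain + (tau,))

    for sigma in ordered:
        extend((sigma,))
    return result


def bary_filtration(g: Dict[Simplex, float]) -> Dict[Chain, float]:
    """Entry time of each simplex of bary K: the largest g-value along its chain."""
    return {c: max(g[s] for s in c) for c in chains(list(g))}


# A single edge {0, 1} with its two vertices.
v0, v1, e = frozenset({0}), frozenset({1}), frozenset({0, 1})
G = bary_filtration({v0: 1.0, v1: 2.0, e: 3.0})
```

For the single edge above, $\mathsf{bary}K$ has three vertices (one per simplex of $K$) and two edges (the chains from each vertex of $K$ to the edge of $K$).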

Example 15.

When this theorem is applied to the special case of Lipschitz functions, the resulting filtration captures the usual Lipschitz upper and lower bounds restricted to the Voronoi cells. That is,

\check{f}_{s}:\mathrm{Vor}_{S}(s)\to\mathbb{R}:x\mapsto f_{S}(s)+\mathrm{d}(x,s)\quad\text{and}\quad\hat{f}_{s}:\mathrm{Vor}_{S}(s)\to\mathbb{R}:x\mapsto f_{S}(s)-\mathrm{d}(x,s).

Let $\alpha(\sigma):=\min_{x\in\mathrm{Vor}_{S}(\sigma)}\mathrm{d}(x,\sigma)$ be the birth time of the simplex $\sigma$ in the standard Delaunay filtration (also known as the $\alpha$-complex filtration). Then, the corresponding filtration on $\mathsf{bary}\mathrm{Del}_{S}$ is induced by

u(\sigma):=\max_{s\in\sigma}f_{S}(s)+\alpha(\sigma),\quad\text{and}\quad\ell(\sigma):=\min_{s\in\sigma}f_{S}(s)-\alpha(\sigma).

7 Categorical Constructions

The following section describes how the concepts presented above fit neatly into a category-theoretic framework. It contains some technical concepts that may be unfamiliar to computational geometers.

7.1 Smoothing Persistence Modules

Given a barcode $B$, an $\varepsilon$-smoothing is a new barcode defined by eliminating bars of length at most $2\varepsilon$, and replacing all other bars $[b,d]$ with $[b+\varepsilon,d-\varepsilon]$; that is, we trim $\varepsilon$ from the beginning and the end of each bar. The idea of a smoothed barcode naturally arises in the literature on the stability of persistence modules [7, 5, 18]. It is clear from the definition that the smoothed barcode is a sub-barcode of the original. This relationship can be expressed clearly as arising from a factorization as follows.
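The definition of $\varepsilon$-smoothing translates directly into code. The sketch below assumes a barcode is stored as a dictionary from bar names to closed intervals `(b, d)`, in keeping with the view of a barcode as a function from a set to the intervals. The name `smooth_barcode` and this representation are choices made for illustration.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]


def smooth_barcode(bars: Dict[str, Interval], eps: float) -> Dict[str, Interval]:
    """Drop bars of length at most 2*eps and trim eps from both ends of the rest."""
    return {
        name: (b + eps, d - eps)
        for name, (b, d) in bars.items()
        if d - b > 2 * eps
    }


bars = {"a": (0.0, 5.0), "b": (1.0, 2.0)}
smoothed = smooth_barcode(bars, eps=0.5)
# Bar "b" has length exactly 2*eps and is eliminated; "a" becomes (0.5, 4.5).
```

Because the surviving bars keep their names and each trimmed interval lies inside the original one, the identity on names is a sub-barcode matching from the smoothed barcode into the original.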

Let $\delta_{-}$ and $\delta_{+}$ be order-preserving maps $\mathbb{R}\to\mathbb{R}$ with the property that, for all $t\in\mathbb{R}$ , we have $\delta_{-}(t)\leq t\leq\delta_{+}(t)$ . Order preserving maps between posets are functors, so the pointwise order relations on $\delta_{-}$ and $\delta_{+}$ correspond to natural transformations

\varepsilon_{0}:\delta_{-}\Rightarrow\mathsf{1}_{\mathbb{R}}\quad\text{and}\quad\varepsilon_{1}:\mathsf{1}_{\mathbb{R}}\Rightarrow\delta_{+}.

So, for any persistence module $M:\mathbb{R}\to\mathbf{vec}$ , composition with $M$ yields the following factorization.

M\delta_{-}\xrightarrow{\;M\varepsilon_{0}\;}M\xrightarrow{\;M\varepsilon_{1}\;}M\delta_{+}

(4)

Letting $\varepsilon=\varepsilon_{1}\circ\varepsilon_{0}$ , we can view $M\varepsilon$ (or equivalently, its image) as the smoothed persistence module. The factorization induces the sub-barcode matching in the obvious way:

\mathsf{Bar}(M\varepsilon)\sqsubseteq\mathsf{Bar}(M).

From this perspective, an interleaving of persistence modules may be viewed as a pair of compatible factorizations rather than a pair of compatible (shifted) homomorphisms. In fact, the pair $(\delta_{-},\delta_{+})$ corresponds to an adjunction with counit $\varepsilon_{0}$ and unit $\varepsilon_{1}$ (see [14]).

7.2 Sub-Barcodes are Subobjects

Within the literature, there are several different categorifications of persistence barcodes (or equivalently persistence diagrams). Here, we identify a natural way to define a category of barcodes that plays nicely with our treatment of sub-barcodes.

First, recall that unlike much of the prior work, we do not regard barcodes as multisets. Instead, we define a barcode to be a function from a set to the intervals. In computer programming terms, we have an “is a” versus “has a” distinction; in the multiset view, each bar is an interval. In our view, each bar has an interval.

A standard categorical approach to construct a category $\int_{\mathbf{C}}F$ from a (pseudo)functor $\mathbf{C}\to\mathbf{Cat}$ is the Grothendieck construction. Let $F:\mathbf{C}^{\mathsf{op}}\to\mathbf{Cat}$ be a functor from a category $\mathbf{C}$ to the category $\mathbf{Cat}$ of (small) categories. For the purposes of this work, the (covariant) Grothendieck construction yields a category $\int_{\mathbf{C}}F$ with objects given by pairs $(C,x)$ for each $C\in\mathbf{C}$ and $x\in F(C)$ , and arrows $(f,h):(C,x)\to(C^{\prime},x^{\prime})$ for each pair of arrows $f:C\to C^{\prime}$ in $\mathbf{C}$ and $h:x\to F\left[f\right](x^{\prime})$ in $F(C)$ .

Let $[-,\mathcal{I}]:\mathbf{Set}\to\mathbf{Cat}$ be the (pseudo)functor that takes a set $X$ and returns the category $\mathcal{I}^{X}=[X,\mathcal{I}]$ of functors $X\to\mathcal{I}$ . Then the category $\mathbf{Bar}$ of barcodes described in Section 3 is a contravariant Grothendieck construction

\mathbf{Bar}=\int_{\mathbf{Set}}[-,\mathcal{I}].

The objects of $\mathbf{Bar}$ are pairs $(\underline{B},B:\underline{B}\to\mathcal{I})$ – i.e. barcodes – and a morphism of $\mathbf{Bar}$ $(\underline{A},A)\to(\underline{B},B)$ is a function $\phi:\underline{A}\to\underline{B}$ such that for all $\alpha\in\underline{A}$ , we have that $A(\alpha)\subseteq B(\phi(\alpha))$ . To see this as a Grothendieck construction, it suffices to observe that the barwise inclusion ordering is a natural transformation $A\Rightarrow B\phi$ in $[\underline{A},\mathcal{I}]$ .

The condition on morphisms ensures that a bar $\alpha$ can only map to a bar $\beta$ when $A(\alpha)\subseteq B(\beta)$. Thus, an injective morphism is exactly a sub-barcode matching. These injective morphisms are the monomorphisms in this category and thus sub-barcodes are subobjects in this category.
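The morphism condition and the sub-barcode matching condition are easy to check mechanically. The following sketch assumes barcodes are dictionaries from bar names to closed intervals `(b, d)` and that $\phi$ is given as a dictionary on bar names. The function names are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Interval = Tuple[float, float]
Barcode = Dict[Hashable, Interval]


def is_barcode_morphism(A: Barcode, B: Barcode, phi: Dict[Hashable, Hashable]) -> bool:
    """Check that phi is defined on every bar of A and A(a) is contained in B(phi(a))."""
    for a, (b, d) in A.items():
        if a not in phi or phi[a] not in B:
            return False
        b2, d2 = B[phi[a]]
        if not (b2 <= b and d <= d2):
            return False
    return True


def is_sub_barcode_matching(A: Barcode, B: Barcode, phi: Dict[Hashable, Hashable]) -> bool:
    """A sub-barcode matching is an injective morphism of barcodes."""
    if not is_barcode_morphism(A, B, phi):
        return False
    images = [phi[a] for a in A]
    return len(set(images)) == len(images)


A = {"x": (1.0, 2.0), "y": (1.5, 3.0)}
B = {"p": (0.0, 4.0), "q": (1.0, 3.0)}
ok = is_sub_barcode_matching(A, B, {"x": "p", "y": "q"})      # injective, barwise contained
not_ok = is_sub_barcode_matching(A, B, {"x": "p", "y": "p"})  # a morphism, but not injective
```

The second map is still a morphism in $\mathbf{Bar}$, since both bars of `A` are contained in bar `p`. It fails only the injectivity requirement.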

$\blacktriangleright$ Remark 16.

The category $\mathbf{Bar}$ defined above is also known as the category of $\mathcal{I}$-fuzzy sets [15], and the construction of a category of fuzzy sets using the Grothendieck construction was also noted by Jardine [16]. It is also possible to replace $\mathbf{Set}$ with $\mathbf{Mch}$, the category of sets with partial injective maps (i.e., matchings), to construct barcodes as $\int_{\mathbf{Mch}}[-,\mathcal{I}]$, or more generally, to replace $\mathbf{Set}$ with the category $\mathbf{Set}_{+}\supset\mathbf{Mch}$ of sets and partial functions. In both cases, the monomorphisms are injective sub-barcode matchings that yield sub-barcode relations.

7.3 Ranks via Presheaves

The rank functor was previously constructed directly from a persistence module, or more generally, a factorization of persistence module homomorphisms. In this section, we show how the rank functor can also be constructed from a barcode. This is already well-known; the novelty here is that we show how the rank functor can be factored through the category of barcodes. This gives a more abstract proof that sub-barcodes are more discriminating than ranks.

For any category $\mathbf{C}$ , there is a functor $L:\int_{\mathbf{Set}}[-,\mathbf{C}]\to[\mathbf{C}^{\mathsf{op}},\mathbf{Set}]$ given for $A:\underline{A}\to\mathbf{C}$ by

LA(c)=\{a\in\underline{A}\mid\exists c\to A(a)\}.

The morphisms $L(A)(c\to c^{\prime})$ are inclusions $L(A)(c^{\prime})\subseteq L(A)(c)$ . For $(f,\varepsilon):A\to B$ in $\int_{\mathbf{Set}}[-,\mathbf{C}]$ , it is easy to check that $f$ gives a natural transformation $LA\Rightarrow LB$ .

For the special case where $\mathbf{C}=\mathcal{I}$, this is a functor from barcodes to presheaves of intervals. The intuitive meaning is that, for a barcode $B$, the presheaf $LB$ maps an interval $J$ to the set of bars in $B$ that contain all of $J$. All morphisms in the image of $L$ are inclusions, and so, for $J\supseteq I$, we have $LB(J)\subseteq LB(I)$ and thus $|LB(J)|\leq|LB(I)|$. Here the vertical bars indicate set cardinality, which is a functor $\mathsf{card}$ from the poset of finite sets with inclusions to the poset $\mathbb{N}$. Thus, we have a functor $\mathsf{card}\circ L:\mathbf{Bar}\to\mathbb{N}^{\mathcal{I}^{\mathsf{op}}}$.
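On a finite barcode, the presheaf $LB$ and its cardinality can be evaluated by a direct scan. The sketch below again assumes the dictionary representation of a barcode with closed intervals `(b, d)`. The names `bars_containing` and `rank_from_barcode` are illustrative.

```python
from typing import Dict, Hashable, Set, Tuple

Interval = Tuple[float, float]


def bars_containing(B: Dict[Hashable, Interval], J: Interval) -> Set[Hashable]:
    """LB(J): the set of bars of B whose interval contains all of J."""
    lo, hi = J
    return {name for name, (b, d) in B.items() if b <= lo and hi <= d}


def rank_from_barcode(B: Dict[Hashable, Interval], J: Interval) -> int:
    """card(LB(J)): the number of bars of B that contain J."""
    return len(bars_containing(B, J))


B = {"p": (0.0, 4.0), "q": (1.0, 3.0)}
small = rank_from_barcode(B, (1.0, 2.0))  # both bars contain [1, 2]
large = rank_from_barcode(B, (0.0, 2.0))  # only "p" contains [0, 2]
```

Enlarging the interval can only shrink the set of bars that contain it, which is the inequality $|LB(J)|\leq|LB(I)|$ for $J\supseteq I$ stated above.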

It is now an easy exercise to show that $\mathrm{rank}\,=\mathsf{card}\circ L\circ\mathsf{Bar}$. All morphisms in the functor category $[\mathcal{I}^{\mathsf{op}},\mathbb{N}]=\mathbb{N}^{\mathcal{I}^{\mathsf{op}}}$ are monomorphisms, so clearly a sub-barcode relation (a monomorphism of barcodes) yields a subrank relation (a monomorphism of rank invariants). The other direction does not hold. The most natural way to define a barcode from the rank invariant does not give a functor to barcodes. The example is given in the proof of Theorem 6.

$\blacktriangleright$ Remark 17.

Barr [1] showed that, when the indexing poset $L$ is a Heyting algebra, the category of $L$ -fuzzy sets is equivalent to a category of sheaves of monomorphisms. Following this result, many of the recent applications of fuzzy sets [21, 19] have defined fuzzy sets directly as functors $L\to\mathbf{Set}$ . Unfortunately, because the poset of intervals $\mathcal{I}$ is not a Heyting algebra, this equivalence cannot be applied directly to the category $\mathbf{Bar}$ . On the other hand, the category of intervals with the product or Egli-Milner ordering is a Heyting algebra. Extending Barr’s equivalence to a category of barcodes and partial functions is the subject of future work.

References

[1] Michael Barr. Fuzzy set theory and topos theory. Canadian Mathematical Bulletin, 29(4):501–508, 1986.
[2] Ulrich Bauer, Michael Kerber, Fabian Roll, and Alexander Rolle. A unified view on the functorial nerve theorem and its variations. Expositiones Mathematicae, 41(4):125503, 2023. doi:10.1016/j.exmath.2023.04.005.
[3] Ulrich Bauer and Michael Lesnick. Induced matchings and the algebraic stability of persistence barcodes. Journal of Computational Geometry, 6(2), 2015. Special issue of selected papers from SoCG 2014. doi:10.20382/JOCG.V6I2A9.
[4] Ulrich Bauer and Maximilian Schmahl. Efficient computation of image persistence. arXiv preprint, 2022. arXiv:2201.04170.
[5] Peter Bubenik, Vin De Silva, and Jonathan Scott. Metrics for generalized persistence modules. Foundations of Computational Mathematics, 15(6):1501–1531, 2015. doi:10.1007/S10208-014-9229-5.
[6] Mickaël Buchet, Frédéric Chazal, Tamal K. Dey, Fengtao Fan, Steve Y. Oudot, and Yusu Wang. Topological analysis of scalar fields with outliers. In 31st International Symposium on Computational Geometry (SoCG 2015), volume 34 of Leibniz International Proceedings in Informatics (LIPIcs), pages 827–841. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2015. doi:10.4230/LIPICS.SOCG.2015.827.
[7] Frédéric Chazal, David Cohen-Steiner, Marc Glisse, Leonidas J. Guibas, and Steve Y. Oudot. Proximity of persistence modules and their diagrams. In Proceedings of the Twenty-fifth Annual Symposium on Computational Geometry, SCG ’09, pages 237–246, New York, NY, USA, 2009. ACM. doi:10.1145/1542362.1542407.
[8] Frédéric Chazal, Leonidas J. Guibas, Steve Y. Oudot, and Primoz Skraba. Scalar field analysis over point cloud data. Discrete & Computational Geometry, 46(4):743–775, 2011. doi:10.1007/S00454-011-9360-X.
[9] Frédéric Chazal and Steve Yann Oudot. Towards persistence-based reconstruction in euclidean spaces. In Proceedings of the Twenty-fourth Annual Symposium on Computational Geometry, SCG ’08, pages 232–241, New York, NY, USA, 2008. ACM. doi:10.1145/1377676.1377719.
[10] Oliver A. Chubet, Kirk P. Gardner, and Donald R. Sheehy. A theory of sub-barcodes, 2022. doi:10.48550/arXiv.2206.10504.
[11] David Cohen-Steiner, Herbert Edelsbrunner, John Harer, and Yuriy Mileyko. Lipschitz functions have $L_{p}$-stable persistence. Foundations of Computational Mathematics, 10(2):127–139, 2010. doi:10.1007/S10208-010-9060-6.
[12] David Cohen-Steiner, Herbert Edelsbrunner, John Harer, and Dmitriy Morozov. Persistent homology for kernels, images, and cokernels. In SODA: ACM-SIAM Symposium on Discrete Algorithms, 2009.
[13] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. Discrete & Computational Geometry, 28(4):511–533, November 2002. doi:10.1007/s00454-002-2885-2.
[14] Kirk Patrick Gardner. Verified Topological Data Analysis and a Theory of Sub-Barcodes. PhD thesis, North Carolina State University, 2022.
[15] Joseph A Goguen. L-fuzzy sets. Journal of mathematical analysis and applications, 18(1):145–174, 1967.
[16] John F. Jardine. Fuzzy sets and presheaves. Compositionality, 1:3, December 2019. doi:10.32408/compositionality-1-3.
[17] F William Lawvere. Metric spaces, generalized logic, and closed categories. Rendiconti del seminario matématico e fisico di Milano, 43:135–166, 1973.
[18] Michael Lesnick. The theory of the interleaving distance on multidimensional persistence modules. Foundations of Computational Mathematics, 15(3):613–650, 2015. doi:10.1007/S10208-015-9255-Y.
[19] Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint, 2018. arXiv:1802.03426.
[20] Dmitriy Morozov. Homological Illusions of Persistence and Stability. PhD thesis, Duke University, 2008.
[21] David I Spivak. Metric realization of fuzzy simplicial sets. Self-published Notes.

