
Sparsification of the Generalized Persistence Diagrams for Scalability Through Gradient Descent

Mathieu Carrière, Centre Inria d'Université Côte d'Azur, Sophia Antipolis, France; Seunghyun Kim, Department of Mathematical Sciences, KAIST, Daejeon, South Korea; Woojin Kim, Department of Mathematical Sciences, KAIST, Daejeon, South Korea
Abstract

The generalized persistence diagram (GPD) is a natural extension of the classical persistence barcode to the setting of multi-parameter persistence and beyond. The GPD is defined as an integer-valued function whose domain is the set of intervals in the indexing poset of a persistence module, and is known to be able to capture richer topological information than its single-parameter counterpart. However, computing the GPD is computationally prohibitive due to the sheer size of the interval set. Restricting the GPD to a subset of intervals provides a way to manage this complexity, compromising discriminating power to some extent. However, identifying and computing an effective restriction of the domain that minimizes the loss of discriminating power remains an open challenge.

In this work, we introduce a novel method for optimizing the domain of the GPD through gradient descent. To achieve this, we introduce a loss function tailored to optimize the selection of intervals, balancing computational efficiency and discriminative accuracy. The design of the loss function is based on the known erosion stability property of the GPD. We showcase the efficiency of our sparsification method for dataset classification in supervised machine learning. Experimental results demonstrate that our sparsification method significantly reduces the time required for computing the GPDs associated with several datasets, while maintaining classification accuracies comparable to those achieved using full GPDs. Our method thus opens the way for applying GPD-based methods at an unprecedented scale.

Keywords and phrases:
Multi-parameter persistent homology, Generalized persistence diagram, Generalized rank invariant, Non-convex optimization, Gradient descent
Funding:
Mathieu Carrière: Partially supported by ANR grant “TopModel”, ANR-23-CE23-0014, and supported by the French government, through the 3IA Côte d’Azur Investments in the project managed by the National Research Agency (ANR) with the reference number ANR-23-IACL-0001.
Woojin Kim: Partially supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government(MSIT) (RS-2025-00515946).
Copyright and License:
© Mathieu Carrière, Seunghyun Kim, and Woojin Kim; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Algebraic topology; Theory of computation → Nonconvex optimization
Related Version:
Full Version: https://arxiv.org/abs/2412.05900 [11]
Supplementary Material:
Software  (Source Code): https://github.com/L-ebesgue/sparse_GPDs
  archived at Software Heritage: swh:1:dir:2147b0ae5a14e4c082811cb721798cb8974fb0b9
Acknowledgements:
This work stemmed from a conversation between M.C. and W.K. during the Dagstuhl Seminar on “Applied and Combinatorial Topology” (24092), held from February 26 to March 1, 2024. M.C. and W.K. thank the organizers of the Dagstuhl Seminar and appreciate the hospitality of Schloss Dagstuhl – Leibniz Center for Informatics. The authors are also grateful to the OPAL infrastructure from Université Côte d’Azur for providing resources and support.
Editors:
Oswin Aichholzer and Haitao Wang

1 Introduction

Persistent homology, a central tool in topological data analysis (TDA), enables the study of topological features in datasets through algebraic invariants. In the classical one-parameter setting, the persistence barcode (or equivalently, the persistence diagram) serves as a complete, discrete, and computationally tractable invariant of a persistence module. Persistent homology can be extended to multi-parameter persistent homology, which provides tools for capturing the topological features of datasets using multiple filtrations instead of just one. However, the transition to multi-parameter persistent homology introduces significant complexity into the algebraic structure of the associated persistence modules [3, 6, 9].

Nevertheless, the generalized persistence diagram (GPD) extends the notion of persistence diagram from the one-parameter to the multi-parameter setting in a natural way [20, 33]. Although the GPD has been extensively studied in terms of stability, discriminating power, computation, and generalizations (see, e.g., [1, 13, 16, 17, 20, 22, 23, 24]), the computational complexity of GPDs remains a major obstacle [19]. The primary challenge arises from the size of their domain: the domain of the complete GPD is either Int(ℝ^d), the set of all intervals in ℝ^d, or any appropriately chosen, finite subset of Int(ℝ^d); in the latter case, the domain still tends to be enormous if one wants to avoid sacrificing the discriminating power of the GPD [13]. Nevertheless, GPDs are flexible, in the sense that they remain well-defined over any finite subdomain ℐ ⊆ Int(ℝ^d), even if ℐ is small, since the GPD over ℐ is simply defined as the Möbius inversion of the generalized rank invariant (GRI) over ℐ (see Definition 1). This makes it possible to control the aforementioned complexity of GPD computation by picking a small, or sparse, subdomain ℐ. Moreover, the topological information loss due to using such sparse subdomains can be mitigated by looking for relevant intervals, i.e., intervals that are rich in topological content; indeed, even with a substantially small subdomain ℐ ⊆ Int(ℝ^d), the GPD over ℐ can still be finer than other traditional invariants of multi-parameter persistence modules, such as the rank invariant (RI) [9]; see [13] for details. However, how to design such subdomains in the “best” way, so as to reduce computational cost while maintaining the discriminating power of the GPD, is a question that has not been much explored so far.

Sparsification of the GPD via gradient descent.

Motivated by the computational challenges of the GPD and the flexibility of this invariant upon selecting subsets of intervals as its domain, we propose a gradient-descent-based method for automatically sparsifying GPDs computed from ℝ²-persistence modules. Namely, we consider the following scenario.

Suppose we aim at classifying instances of a given dataset based on their topological features, and we have already computed a set of corresponding persistence modules {M_i : ℝ² → vect_k}_{1≤i≤t} from the dataset, where each persistence module corresponds to an individual data point. We consider the set {dgm^ℐ_{M_i}}_{1≤i≤t} of GPDs of the M_i over a large set ℐ of intervals in ℝ² (cf. Definition 1). We refer to these as the full GPDs.

Let n ≥ 1 be the cardinality of ℐ, and let m ∈ ℕ = {1, 2, …} be a sparsification parameter, typically significantly smaller than n. Consider the set of m-subsets of Int(ℝ²), i.e., {𝒥 ⊆ Int(ℝ²) : |𝒥| = m}. In order to identify an m-subset 𝒥 of intervals in ℝ² over which the new GPDs are computed (and subsequently used to classify the persistence modules {M_i}_{1≤i≤t}), we proceed as follows. Firstly, we identify a loss function defined on the set of m-subsets of Int(ℝ²):

ℒ_{d, m, {M_i}_{1≤i≤t}}(𝒥) := Σ_{i=1}^{t} d(dgm^𝒥_{M_i}, dgm^ℐ_{M_i}),   (1)

where d is an appropriate dissimilarity function. Secondly, we search for a minimizer of the loss function. The goal of this search is to identify a subset 𝒥 of m intervals such that the sparse GPDs {dgm^𝒥_{M_i}}_{1≤i≤t}, i.e., the GPDs of the M_i over this sparse subset, best approximate their full counterparts overall. One natural way of searching for a (local) minimizer is through gradient descent, starting from a randomly chosen m-subset 𝒥_init of ℐ. To achieve this, the following requirements are either mandatory or highly desirable:

(I) (Distance) A suitable distance or dissimilarity function d, used in constructing the loss function above, ideally satisfying certain stability guarantees w.r.t. the interleaving distance between persistence modules [12, 25];

(II) (Vectorization) A representation of the loss function ℒ_{d, m, {M_i}_{1≤i≤t}} as a map defined on some subset 𝒟 of Euclidean space;

(III) (Convexity) Convexity of the subset 𝒟;

(IV) (Loss regularity) Lipschitz stability and differentiability of ℒ_{d, m, {M_i}_{1≤i≤t}}; and

(V) (Feasibility) Computational feasibility of ℒ_{d, m, {M_i}_{1≤i≤t}} for practical implementation.

Our contributions can be listed according to Items (I)-(V) mentioned above.

Summary of contributions.

  • A natural choice for d in Item (I) would be the erosion distance d_E [13, 33], a standard metric between GPDs. (Note that d_E is also referred to as a metric between generalized rank invariants (GRIs), e.g., in [13, 21]. Since the GPD and the GRI determine one another (cf. Remark 2 (i) and (iii)), d_E can thus also be viewed as a metric between GPDs, as in [33, 36].) This metric is known to be stable under perturbations of the input persistence modules w.r.t. the interleaving distance. However, the erosion distance requires the two GPDs to be defined on the same set of intervals (Definition 3), and that set must be closed under thickening, which implies the domain must be infinite. All of this makes it difficult to directly utilize d_E. Hence, we introduce the sparse erosion distance d̂_E between GPDs relative to (possibly) different interval sets, as an adaptation of d_E (Definition 4, Proposition 6 (ii), and Corollary 7). (Another possibility would be to use bottleneck and Wasserstein distances; however, we show in Remark 17 in the full version [11] that they fail to ensure continuity of the loss function.)

  • Regarding Item (V), we restrict our focus to intervals in ℝ² with a small number of minimal and maximal points (Remark 10). Then, if the sparse erosion distance d̂_E is computed between GPDs of the same persistence module M, we prove that d̂_E depends solely on the domains ℐ and 𝒥, and not on the input persistence module, i.e., d̂_E((dgm_M, ℐ), (dgm_M, 𝒥)) = d̂(ℐ, 𝒥) (Proposition 6 (iii)). This fact significantly enhances the tractability of gradient descent, as it allows us to avoid recomputing the GPDs at every iteration. Moreover, we derive a closed-form formula for the computation of d̂(ℐ, 𝒥) (Theorem 9 (i)).

    Figure 1: Any (2,1)-interval I of ℝ², as depicted above, is represented by 𝐯_I = (x, y, a, b, c, d) ∈ ℝ⁶. Any (1,1)-interval can also be represented by (x, y, a, b, c, d) ∈ ℝ⁶ with b = c = 0.
    Figure 2: Our pipeline for sparsifying GPDs in the context of time series classification. The sizes of the GPD points in ℝ⁶ are proportional to their multiplicities.
  • For achieving Items (II), (III), and (IV), we only consider intervals with at most two minimal points and one maximal point. This ensures the existence of natural embeddings of these intervals into the Euclidean space ℝ⁶, obtained by stacking the coordinates of the interval middle point (x, y) together with the lengths a, b, c, d ≥ 0 needed to define the interval boundaries (Figure 1). The main advantages of this vectorization method (w.r.t. the other natural ones) are that (1) it allows for a simple formulation of the loss function (Theorem 9 (ii)), and (2) its variables are all independent from each other, which makes the implementation of gradient descent easier (Remarks 10 and 11). (Alternatively, we could consider intervals with one minimal point and at most two maximal points. Either choice ensures that our restricted GPDs are not a weaker invariant than the rank invariant or the signed barcode [7]; see [13, Section 4].) Using this vectorization method, we also prove that 𝒟 not only forms a convex subset of the Euclidean space ℝ^{6m} (Proposition 13), but also ensures the Lipschitz stability and almost everywhere differentiability of the loss function ℒ_{d, m, {M_i}_{1≤i≤t}} (Theorem 14 and Proposition 16). In fact, we prove that the loss function can actually be realized as

    ℒ_{d̂_E, m} : 𝒟 (⊆ ℝ^{6m}) → ℝ, 𝕧_𝒥 ↦ t · d̂(ℐ, 𝒥),   (2)

    where 𝕧_𝒥 is our 6m-dimensional embedding of 𝒥. Since t (the size of the dataset) is a constant, the (local) minimizers of ℒ_{d̂_E, m} coincide with those of the map t^{−1} · ℒ_{d̂_E, m} : 𝕧_𝒥 ↦ d̂(ℐ, 𝒥). Hence, searching for a (local) minimizer of ℒ_{d̂_E, m} is essentially searching for the (locally) best m intervals that represent the domain ℐ of the full GPDs.

  • Finally, regarding Item (V) again, and in order to showcase the efficiency of our proposed sparsification method, we provide numerical experiments on topology-based time series classification with supervised machine learning (Section 5 and Figure 2), in which we show that the sparse GPDs {dgm^𝒥_{M_i}}_{1≤i≤t} can be computed much faster than the full GPDs, while maintaining similar or better classification performances with random forest models. Our code is fully available at https://github.com/L-ebesgue/sparse_GPDs.

Comparison with other works.

While many existing works utilize multi-parameter persistent homology to enhance the performance of machine learning models [10, 14, 27, 29, 32, 34, 35, 37], our work takes a different approach, as it focuses on methods to mitigate the computational overhead associated with multi-parameter persistence descriptors.

The works most closely related to ours are [29] and [37], which use the RI or GRI of multi-parameter persistence modules in a machine learning context. Firstly, [29] is based on an equivalent representation of the GPDs computed from rectangle intervals (these types of GPDs are often called signed barcodes) as signed measures [7], which allows comparing GPDs with optimal transport distances, as well as deploying known vectorization techniques intended for general measures. Secondly, the approach proposed in [37] involves vectorizing the GRIs of 2-parameter persistence modules by evaluating them on intervals with specific shapes called worms. Our goal is different: we rather aim at vectorizing and sparsifying the domains of the GPDs in order to achieve good performance scores. In fact, the sparsification process that we propose can be used complementarily to both [29] and [37], by first sparsifying the set of rectangles or worms before applying their vectorization methods. Note that differentiability properties of both of these approaches w.r.t. the multi-parameter filtrations were recently established [32, 34]; in contrast, our work deals with the differentiability of a GPD-based loss function w.r.t. the interval domains (while keeping the multi-parameter filtrations fixed).

Organization.

Section 2 reviews basic properties of the GPD and GRI. Section 3 introduces the sparse erosion distance, clarifies its relation to the erosion distance given in [13], and presents a closed-form formula for it that is specialized to, and useful in, our setting. Section 4 establishes the Lipschitz stability and differentiability of our loss function. Section 5 presents our numerical experiments. Finally, Section 6 discusses future research directions.

2 Preliminaries

In this article, P = (P, ≤) stands for a poset, regarded as the category whose objects are the elements of P, and in which, for any pair p, q ∈ P, there exists a unique morphism p → q if and only if p ≤ q. All vector spaces in this article are over a fixed field k. Let vect_k denote the category of finite-dimensional vector spaces and linear maps over k. A functor P → vect_k will be referred to as a (P-)persistence module. The direct sum of any two P-persistence modules is defined pointwise. A module M : P → vect_k is trivial if M(x) = 0 for all x ∈ P. If a nontrivial M is not isomorphic to a direct sum of any two nontrivial persistence modules, M is indecomposable. Every persistence module decomposes into a direct sum of indecomposable modules, uniquely determined up to isomorphism [2, 4].

An interval I of P is a subset I ⊆ P such that:

(i) I is nonempty.

(ii) If p, q ∈ I and p ≤ r ≤ q, then r ∈ I.

(iii) I is connected, i.e., for any p, q ∈ I, there is a sequence p = p_0, p_1, …, p_ℓ = q of elements of I with p_i and p_{i+1} comparable for 0 ≤ i ≤ ℓ − 1.

By Int(P) we denote the set of all intervals of P.

Given any I ∈ Int(P), the interval module k_I is the P-persistence module with

(k_I)(p) := k if p ∈ I, and 0 otherwise;   k_I(p ≤ q) := id_k if p, q ∈ I, and 0 otherwise.

Every interval module is indecomposable [5, Proposition 2.2]. A P-persistence module M is interval-decomposable if it is isomorphic to a direct sum ⊕_{j∈J} k_{I_j} of interval modules. In this case, the barcode of M is defined as the multiset barc(M) := {k_{I_j} : j ∈ J}.

For p ∈ P, let p↑ denote the set of points q ∈ P such that p ≤ q. Clearly, p↑ belongs to Int(P). A P-persistence module is finitely presentable if it is isomorphic to the cokernel of a morphism ⊕_{a∈A} k_{a↑} → ⊕_{b∈B} k_{b↑}, where A and B are finite multisets of elements of P.

When P is a connected poset, the (generalized) rank of M, denoted by rank(M), is defined as the rank of the canonical linear map from the limit of M to the colimit of M, which is a nonnegative integer [20]. This isomorphism invariant of P-persistence modules, which takes a single integer value, can be refined into an integer-valued function as follows. Let ℐ be any nonempty subset of Int(P). The generalized rank invariant (GRI) of M over ℐ is the map rk^ℐ_M : ℐ → ℤ_{≥0} given by I ↦ rank(M|_I), where M|_I is the restriction of M to I [20]. When ℐ = Int(P), we denote rk^ℐ_M simply as rk_M.

The generalized persistence diagram (GPD) of M over ℐ captures the changes of the GRI values as I varies over ℐ. Its formal definition follows.

Definition 1 ([13]).

The generalized persistence diagram (GPD) of M over ℐ is defined as the function dgm^ℐ_M : ℐ → ℤ that satisfies (the condition in Equation (3) is a generalization of the fundamental lemma of persistent homology [18]):

for all I ∈ ℐ,   rk^ℐ_M(I) = Σ_{J ∈ ℐ, J ⊇ I} dgm^ℐ_M(J).   (3)
 Remark 2 ([13, Sections 2 and 3] and [7]).

(i) If ℐ is finite, then dgm^ℐ_M exists.

(ii) If dgm^ℐ_M exists, then it is unique.

(iii) If dgm^ℐ_M exists, then dgm^ℐ_M and rk^ℐ_M determine one another.

(iv) (Monotonicity) rk^ℐ_M(I) ≥ rk^ℐ_M(J) for any pair I ⊆ J in ℐ.

(v) (The GPD generalizes the barcode) Let ℐ = Int(P). If M is interval decomposable, then dgm^ℐ_M exists. In this case, for any I ∈ Int(P), dgm^ℐ_M(I) coincides with the multiplicity of I in barc(M). Also, dgm^ℐ_M often exists even when M is not interval decomposable.

(vi) If M is a finitely presentable ℝ^d-persistence module, then the GPD over Int(ℝ^d) exists [13, Theorem C(iii)].

In the rest of the article, every ℝ^d-persistence module M is assumed to be finitely presentable; thus its GPD over Int(ℝ^d) exists, and is denoted by dgm_M.
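To make Equation (3) concrete, the following minimal Python sketch recovers the GPD from the GRI over a finite interval set by Möbius inversion, processing intervals from the largest to the smallest. The representation of intervals as frozensets of poset elements and all names are purely illustrative, not taken from the released code.

```python
def gpd_from_gri(intervals, rk):
    """Recover dgm^I_M from rk^I_M over a finite interval set (Equation (3)).

    `intervals` is a list of intervals of a finite poset, each given as a
    frozenset of poset elements; `rk` maps each interval to its generalized
    rank. Existence of the GPD is guaranteed since the set is finite
    (Remark 2 (i)).
    """
    # Process intervals from largest to smallest, so that dgm is already known
    # for every strict superset J of I when I is processed.
    dgm = {}
    for I in sorted(intervals, key=len, reverse=True):
        dgm[I] = rk[I] - sum(dgm[J] for J in dgm if I < J)  # I < J: strict containment
    return dgm

# Toy example on the poset {0 < 1 < 2}, for a module isomorphic to
# k_{{0,1}} (+) k_{{0,1,2}}; its GRI over the three intervals below is:
I1, I2, I3 = frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 1, 2})
rk = {I1: 2, I2: 1, I3: 1}
print(gpd_from_gri([I1, I2, I3], rk))  # multiplicities: I1 -> 1, I2 -> 0, I3 -> 1
```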

3 Sparse erosion distance between GPDs relative to sampled intervals

We adapt the notion of erosion distance to define a distance between GPDs relative to (possibly different) sampled intervals. When comparing the same GPD with two different sampled intervals, this distance simplifies to a distance between sampled intervals. This is relevant in our case, as we compare the full and sparse GPDs of the same persistence module.

3.1 Sparse erosion distance

In this section, we review the definition of the erosion distance and adapt it to define the sparse erosion distance between GPDs relative to sampled intervals.

For ϵ ∈ ℝ, the vector ϵ·(1, …, 1) ∈ ℝ^d will be simply denoted by ϵ whenever there is no risk of confusion. For I ∈ Int(ℝ^d) and ϵ ≥ 0, we consider the ϵ-thickening I^ϵ := ∪_{p∈I} B_ϵ(p) of I, where B_ϵ(p) stands for the closed ϵ-ball around p w.r.t. the supremum distance, i.e., B_ϵ(p) = {q ∈ ℝ^d : p − ϵ ≤ q ≤ p + ϵ}. See Figure 3. A subset ℐ ⊆ Int(ℝ^d) is said to be closed under thickening if for all I ∈ ℐ and for all ϵ ≥ 0, the interval I^ϵ belongs to ℐ.

Figure 3: The ϵ-thickening I^ϵ of an interval I.
Definition 3 ([13, Definition 5.2]).

Let M and N be ℝ^d-persistence modules. Let ℐ be any subset of Int(ℝ^d) that is closed under thickening. The erosion distance between dgm^ℐ_M and dgm^ℐ_N (and, equivalently, between rk^ℐ_M and rk^ℐ_N, by Remark 2 (iii)) is

d_E(dgm^ℐ_M, dgm^ℐ_N) := inf{ ϵ > 0 : for all I ∈ ℐ, rk^ℐ_N(I^ϵ) ≤ rk^ℐ_M(I) and rk^ℐ_M(I^ϵ) ≤ rk^ℐ_N(I) }.

A correspondence between nonempty sets A and B is a subset R ⊆ A × B satisfying the following: (1) for each a ∈ A, there exists b ∈ B such that (a, b) ∈ R, and (2) for each b ∈ B, there exists a ∈ A such that (a, b) ∈ R. For ϵ ≥ 0, an ϵ-correspondence between nonempty ℐ, 𝒥 ⊆ Int(ℝ^d) is a correspondence R ⊆ ℐ × 𝒥 such that for all (I, J) ∈ R, J ⊆ I^ϵ and I ⊆ J^ϵ. Blending the ideas of the Hausdorff and erosion distances, we obtain our new distance between GPDs relative to sampled intervals:

Definition 4 (Sparse erosion distance between GPDs relative to sampled intervals).

For any M, N : ℝ^d → vect_k and any nonempty ℐ, 𝒥 ⊆ Int(ℝ^d), the sparse erosion distance between the pairs (dgm_M, ℐ) and (dgm_N, 𝒥) is

d̂_E((dgm_M, ℐ), (dgm_N, 𝒥)) := inf{ ϵ > 0 : there exists an ϵ-correspondence R ⊆ ℐ × 𝒥 s.t. for all (I, J) ∈ R and all δ ≥ 0, rk_N(J^{ϵ+δ}) ≤ rk_M(I^δ) and rk_M(I^{ϵ+δ}) ≤ rk_N(J^δ) }.

We remark that d̂_E((dgm_M, ℐ), (dgm_N, 𝒥)) captures not only (1) the algebraic difference between [M with respect to ℐ] and [N with respect to 𝒥], but also (2) the geometric difference between the domains ℐ and 𝒥. To see (1), consider, for example, the ℝ-persistence modules M = k_{[0,1]} and N = k_{[0,2]}, and let both ℐ and 𝒥 be the singleton set {[3,4]}. Then, one can see that d̂_E((dgm_M, ℐ), (dgm_N, 𝒥)) = 0 from the fact that rk^ℐ_M = rk^𝒥_N = 0 and the monotonicity of rk_M and rk_N. To see (2), let M and N be any isomorphic ℝ-persistence modules, and set ℐ := {[0, 1/2]} and 𝒥 := {[0, 1]}. Then, we obtain d̂_E((dgm_M, ℐ), (dgm_N, 𝒥)) = 1/2, solely due to the difference between ℐ and 𝒥.

Proposition 5.

d̂_E is an extended pseudometric. (See the full version [11] for the proof.)

For any nonempty ℐ, 𝒥 ⊆ Int(ℝ^d), let

d̂(ℐ, 𝒥) := inf{ ϵ > 0 : there exists an ϵ-correspondence between ℐ and 𝒥 }.   (4)

We clarify the relationship among d_E, d̂_E, and d̂:

Proposition 6.

Let M, N : ℝ^d → vect_k and let ℐ, 𝒥 ⊆ Int(ℝ^d) be nonempty. We have:

(i) d̂(ℐ, 𝒥) ≤ d̂_E((dgm_M, ℐ), (dgm_N, 𝒥)).

(ii) If ℐ = 𝒥 is closed under thickening, then

d̂_E((dgm_M, ℐ), (dgm_N, 𝒥)) ≤ d_E(dgm^ℐ_M, dgm^ℐ_N).   (5)

(iii) If M ≅ N or dgm_M = dgm_N, then

d̂_E((dgm_M, ℐ), (dgm_N, 𝒥)) = d̂(ℐ, 𝒥).   (6)

(See the full version [11] for the proof.)

We remark that the inequality given in Item (i) can be strict. For instance, let d = 1, ℐ = 𝒥 = Int(ℝ), M = 0, and N = k_{[0,1)}. Then, 0 = d̂(ℐ, 𝒥) < d̂_E((dgm_M, ℐ), (dgm_N, 𝒥)).

By Proposition 6 (ii) and the prior stability result of dE [13, Theorem H], we have:

Corollary 7.

For any M, N : ℝ^d → vect_k and for any ℐ ⊆ Int(ℝ^d), we have

d̂_E((dgm_M, ℐ), (dgm_N, ℐ)) ≤ d_I(M, N),

where the right-hand side is the interleaving distance between M and N.

3.2 Closed-form formula for the sparse erosion distance

In this section, we derive a closed-form formula for the right-hand side of Equation (6), i.e., for d̂(ℐ, 𝒥), when each interval in ℐ and 𝒥 has only finitely many minimal and maximal points (Theorem 9). This result is essential for studying our loss function in Section 4 and for optimizing the domains of the GPDs in Section 5.

For integers p, q ≥ 1, an interval I ∈ Int(ℝ²) is a (p, q)-interval if I has exactly p minimal points and exactly q maximal points. For any interval I ∈ Int(ℝ²), let min(I) (resp. max(I)) be the set of all minimal (resp. maximal) elements of I. For (x, y) ∈ ℝ², define

δ(x, y) := 1 if x ≤ y, and 0 if x > y.   (9)
Lemma 8.

For p_r, q_r, p_s, q_s ∈ ℕ, let I_r, J_s ∈ Int(ℝ²) be a (p_r, q_r)-interval and a (p_s, q_s)-interval, respectively. Then, for any ϵ ≥ 0, J_s ⊆ I_r^ϵ and I_r ⊆ J_s^ϵ if and only if

ϵ ≥ max( max_k a_k, max_l b_l, max_i a_i, max_j b_j ),   where

  • i, j, k, l range from 1 to p_r, q_r, p_s, q_s, respectively; a_k and b_l are defined as in (10) and (11), respectively; a_i and b_j are obtained by interchanging the roles of i and k in (10), and those of j and l in (11), respectively;

  • min(I_r) =: {(x_i^r, y_i^r) : 1 ≤ i ≤ p_r}, min(J_s) =: {(x_k^s, y_k^s) : 1 ≤ k ≤ p_s}, max(I_r) =: {(X_j^r, Y_j^r) : 1 ≤ j ≤ q_r}, max(J_s) =: {(X_l^s, Y_l^s) : 1 ≤ l ≤ q_s}.

a_k := min_i ( max( (1 − δ(x_i^r, x_k^s)) · |x_k^s − x_i^r|, (1 − δ(y_i^r, y_k^s)) · |y_k^s − y_i^r| ) )   (10)

b_l := min_j ( max( δ(X_j^r, X_l^s) · |X_l^s − X_j^r|, δ(Y_j^r, Y_l^s) · |Y_l^s − Y_j^r| ) )   (11)

(See the full version [11] for the proof.) Let ℐ := {I_r}_{r=1}^{n} and 𝒥 := {J_s}_{s=1}^{m} be sets of intervals of ℝ² with only finitely many minimal and maximal points. Consider the (n × m)-matrix (ϵ_rs), where ϵ_rs is the RHS of the inequality given in Lemma 8, i.e.,

ϵ_rs = max( max_k a_k, max_l b_l, max_i a_i, max_j b_j )   (12)

with i, j, k, l, a_k, b_l, a_i, b_j as defined in Lemma 8. Next, we show that d̂(ℐ, 𝒥), as defined in Equation (4), equals the largest of the smallest elements across all rows and columns of this matrix. In particular, the following variables and functions are useful for describing the closed-form formula for d̂(ℐ, 𝒥) when each of ℐ and 𝒥 consists solely of (1,1)- or (2,1)-intervals.

When I_r and J_s are (2,1)-intervals, as depicted in Figure 4, let

x_1 := x_2^r,  y_1 := y_1^r,  a := Y_1^r − y_1^r,  b := x_2^r − x_1^r,
c := y_1^r − y_2^r,  d := X_1^r − x_2^r,  x_2 := x_2^s,  y_2 := y_1^s,
e := Y_1^s − y_1^s,  f := x_2^s − x_1^s,  g := y_1^s − y_2^s,  h := X_1^s − x_2^s.

Figure 4: The parametrization of I_r and J_s.

When I_r (resp. J_s) is a (1,1)-interval, set x_1^r = x_2^r and y_1^r = y_2^r, and thus b = c = 0 (resp. x_1^s = x_2^s and y_1^s = y_2^s, and thus f = g = 0). Also, let

F(w_1, w_2, w_3, w_4) := max( δ(w_1, w_2) · |w_2 − w_1|, δ(w_3, w_4) · |w_4 − w_3| ),
G(m_1, m_2, m_3, m_4, m_5) := min( F(m_1, m_2, m_3, m_4), m_5 ),
H(o_1, o_2, o_3, o_4) := max( min(o_1, o_2), min(o_3, o_4) ),

where all input variables of F,G and H are real numbers.

Theorem 9.

Let ℐ := {I_r}_{r=1}^{n} and 𝒥 := {J_s}_{s=1}^{m} be sets of intervals of ℝ² with only finitely many minimal and maximal points.

(i) We have d̂(ℐ, 𝒥) = max( max_r (min_s ϵ_rs), max_s (min_r ϵ_rs) ), where ϵ_rs is as defined in Equation (12) and I_r (resp. J_s) is a (p_r, q_r)-interval (resp. (p_s, q_s)-interval).

(ii) If ℐ and 𝒥 consist solely of (1,1)- or (2,1)-intervals, then ϵ_rs is the maximum of (13)-(16):

H( F(x_2 − f, x_1 − b, y_2, y_1), F(x_2 − f, x_1, y_2, y_1 − c), F(x_2, x_1 − b, y_2 − g, y_1), F(x_2, x_1, y_2 − g, y_1 − c) )   (13)

F( x_1 + d, x_2 + h, y_1 + a, y_2 + e )   (14)

H( F(x_1 − b, x_2 − f, y_1, y_2), F(x_1 − b, x_2, y_1, y_2 − g), F(x_1, x_2 − f, y_1 − c, y_2), F(x_1, x_2, y_1 − c, y_2 − g) )   (15)

F( x_2 + h, x_1 + d, y_2 + e, y_1 + a )   (16)

(See the full version [11] for the proof.)
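The closed-form formula above can be transcribed directly into code. The following NumPy sketch (all names are illustrative; intervals are assumed to be given by their vectors 𝐯_I = (x, y, a, b, c, d) of Figure 1) computes ϵ_rs via Theorem 9 (ii) and d̂(ℐ, 𝒥) via Theorem 9 (i).

```python
import numpy as np

def delta(x, y):
    """Indicator of x <= y, cf. Equation (9)."""
    return 1.0 if x <= y else 0.0

def F(w1, w2, w3, w4):
    return max(delta(w1, w2) * abs(w2 - w1), delta(w3, w4) * abs(w4 - w3))

def H(o1, o2, o3, o4):
    return max(min(o1, o2), min(o3, o4))

def eps_rs(vI, vJ):
    """epsilon_rs of Theorem 9 (ii) for two (1,1)- or (2,1)-intervals.

    vI = (x1, y1, a, b, c, d) and vJ = (x2, y2, e, f, g, h) as in Figure 1;
    a (1,1)-interval has b = c = 0 (resp. f = g = 0).
    """
    x1, y1, a, b, c, d = vI
    x2, y2, e, f, g, h = vJ
    t13 = H(F(x2 - f, x1 - b, y2, y1), F(x2 - f, x1, y2, y1 - c),
            F(x2, x1 - b, y2 - g, y1), F(x2, x1, y2 - g, y1 - c))
    t14 = F(x1 + d, x2 + h, y1 + a, y2 + e)
    t15 = H(F(x1 - b, x2 - f, y1, y2), F(x1 - b, x2, y1, y2 - g),
            F(x1, x2 - f, y1 - c, y2), F(x1, x2, y1 - c, y2 - g))
    t16 = F(x2 + h, x1 + d, y2 + e, y1 + a)
    return max(t13, t14, t15, t16)

def d_hat(Is, Js):
    """d_hat(I, J) via Theorem 9 (i): largest of the smallest entries over rows and columns."""
    E = np.array([[eps_rs(vI, vJ) for vJ in Js] for vI in Is])
    return max(E.min(axis=1).max(), E.min(axis=0).max())
```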

 Remark 10.

In Theorem 9 (i), as p_r, q_r, p_s, and q_s increase, the complexity of computing ϵ_rs increases. This is because the ranges over which the maxima and minima on the RHS of Equation (12) are taken grow, thereby increasing the computational complexity of d̂(ℐ, 𝒥) as well.

4 Lipschitz stability and differentiability of the loss function

The goal of this section is to establish the Lipschitz continuity/stability and (almost everywhere) differentiability of our loss function (cf. Equation (2)). From Figure 1, recall that we denote the embedding of a (1,1)- or (2,1)-interval I of ℝ² into ℝ⁶ by 𝐯_I.

 Remark 11 (Independence of variables).

Although there are multiple ways to embed (p, q)-intervals of ℝ² into Euclidean space, for any p, q ≥ 1, some vectorization methods are more efficient than others in our context. For instance, suppose that we represent the (2,1)-interval I in Figure 1 by simply concatenating the coordinates of its minimal and maximal points, namely by the vector (x_1, y_1, x_2, y_2, X, Y) ∈ ℝ⁶, where x_1 = x − b, y_1 = y, x_2 = x, y_2 = y − c, X = x + d, Y = y + a. A difficulty with this embedding in the context of gradient descent is that the variables are not independent: one must have (x_1, y_1) ≤ (X, Y), (x_2, y_2) ≤ (X, Y), as well as x_1 ≤ x_2 and y_2 ≤ y_1. Ensuring such relations between variables is not trivial in non-convex optimization (without using refined techniques such as projected gradient descent). On the other hand, using our proposed embedding is more practical, as all variables are independent, and our only requirement is that a, b, c, d must be nonnegative, which can easily be imposed using, e.g., exponential or ReLU functions.
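For illustration, the following sketch (with hypothetical names) converts the minimal and maximal points of a (2,1)-interval into the vector 𝐯_I = (x, y, a, b, c, d) of Figure 1; the point (x, y) is the join of the two minimal points.

```python
import numpy as np

def embed_21_interval(min_pts, max_pt):
    """Embed a (2,1)-interval of R^2 into R^6 as v_I = (x, y, a, b, c, d) (Figure 1).

    min_pts contains the two minimal points (incomparable, or equal for a
    (1,1)-interval) and max_pt is the unique maximal point.
    """
    (xl, yl), (xr, yr) = sorted(min_pts)   # xl <= xr, hence yl >= yr for a valid interval
    x, y = xr, yl                          # join (least upper bound) of the minimal points
    b, c = xr - xl, yl - yr                # nonnegative offsets to the minimal points
    d, a = max_pt[0] - x, max_pt[1] - y    # nonnegative offsets to the maximal point
    return np.array([x, y, a, b, c, d])

# Minimal points (0, 2) and (1, 0), maximal point (3, 4)  ->  v_I = (1, 2, 2, 1, 2, 2).
print(embed_21_interval([(0.0, 2.0), (1.0, 0.0)], (3.0, 4.0)))
```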

Let ℐ := {I_r}_{r=1}^{n} and 𝒥 := {J_s}_{s=1}^{m} consist solely of (1,1)- or (2,1)-intervals of ℝ².

Lemma 12.

For I_r ∈ ℐ and J_s ∈ 𝒥, consider 𝐯_{I_r} = (x_1, y_1, a, b, c, d) and 𝐯_{J_s} = (x_2, y_2, e, f, g, h) defined as in Figure 1. If ‖𝐯_{I_r} − 𝐯_{J_s}‖_∞ ≤ ϵ, then ϵ_rs ≤ 2ϵ, where ϵ_rs is described as in Theorem 9 (ii). (See the full version [11] for the proof.)

We define 𝕧_ℐ as the concatenation of the 𝐯_{I_r} for all I_r ∈ ℐ, i.e., 𝕧_ℐ := (𝐯_{I_1} | ⋯ | 𝐯_{I_n}) ∈ ℝ^{6n}. Consider the collection of all ordered n-sets of (1,1)- or (2,1)-intervals of ℝ², and define the function V from this collection to ℝ^{6n} by ℐ ↦ 𝕧_ℐ.

Proposition 13.

The image of the function V is a convex subset of ℝ^{6n}.

Proof.

Let S be the collection of all (1,1)- and all (2,1)-intervals of ℝ². Consider the map from S to ℝ × ℝ × ℝ_{>0} × ℝ_{≥0} × ℝ_{≥0} × ℝ_{>0} depicted in Figure 1. Clearly, this map is a surjection. It follows that the image of S via the map I ↦ (x, y, a, b, c, d) depicted in Figure 1 is a convex subset of ℝ⁶. Now, observe that the image of the function V is equal to the n-fold Cartesian product of the image of S, which is a subset of ℝ^{6n}. The fact that any Cartesian product of convex sets is convex implies our claim.

Theorem 14 (Lipschitz stability of loss function).

Let 𝒦 be any finite set of (1,1)- or (2,1)-intervals, and let 𝒥_1, 𝒥_2 be any two n-sets of (1,1)- or (2,1)-intervals for some n ∈ ℕ. Then, we have |d̂(𝒦, 𝒥_1) − d̂(𝒦, 𝒥_2)| ≤ 2 · min_π ‖𝕧_{π(𝒥_1)} − 𝕧_{𝒥_2}‖_∞, where the minimum is taken over all permutations π of 𝒥_1.

Proof.

By the triangle inequality, we have |d̂(𝒦, 𝒥_1) − d̂(𝒦, 𝒥_2)| ≤ d̂(𝒥_1, 𝒥_2), and thus it suffices to show that d̂(𝒥_1, 𝒥_2) ≤ 2 · min_π ‖𝕧_{π(𝒥_1)} − 𝕧_{𝒥_2}‖_∞. Let π_0 be a permutation of 𝒥_1 that attains the minimum, and set ϵ := ‖𝕧_{π_0(𝒥_1)} − 𝕧_{𝒥_2}‖_∞, ℐ := π_0(𝒥_1), and 𝒥 := 𝒥_2. By Lemma 12, ϵ_rr ≤ 2ϵ (resp. ϵ_ss ≤ 2ϵ) for all r, s ∈ {1, …, n}, and hence min_s ϵ_rs ≤ 2ϵ (resp. min_r ϵ_rs ≤ 2ϵ) for each r (resp. s). Thus, we have max_r (min_s ϵ_rs) ≤ 2ϵ and max_s (min_r ϵ_rs) ≤ 2ϵ. By Theorem 9 (i), we are done.

 Remark 15.

The opposite direction of Theorem 14 does not hold, i.e., there does not exist c > 0 such that |d̂(𝒦, 𝒥_1) − d̂(𝒦, 𝒥_2)| ≥ c · min_π ‖𝕧_{π(𝒥_1)} − 𝕧_{𝒥_2}‖_∞. However, the non-existence of such a c > 0 is not an issue in our work, as what we require is the Lipschitz stability and almost everywhere differentiability of the loss function (which we establish in Theorem 14 and in the next proposition, respectively), in order to prevent erratic and oscillating gradient descent iterations.

Proposition 16.

The loss function given in Equation (2) is differentiable almost everywhere.

Proof.

The three maps ℝ² → ℝ given by (x, y) ↦ δ(x, y) · |y − x| (cf. Equation (9)), (x, y) ↦ max{x, y}, and (x, y) ↦ min{x, y} are finitely segmented piecewise linear. Therefore, the functions given in Equations (13)-(16) are all finitely segmented piecewise linear on Euclidean spaces, and thus so are the ϵ_rs given in Theorem 9 (ii). Hence, trivially, these functions are differentiable almost everywhere. Now, Proposition 6 (iii) implies our claim.

5 Numerical Experiments on Sparsification

In this section, we make use of the results proved above to provide a method for sparsifying GPDs. Indeed, computing a single GPD dgm^ℐ_M of a persistence module M coming from a simplicial complex S requires computing zigzag persistence modules along all intervals of ℐ, as described in [16], and then computing the corresponding Möbius inversion, which has time complexity O(n³), where n is the number of intervals in ℐ: the Möbius function value must be computed for all possible pairs of intervals, each computation requiring O(n) operations to iterate over the intervals. Thus, the complexity of computing a single GPD is O(n · N_s^{2.376} + n³), where N_s is the number of simplices [30]. Hence, computing all GPDs from a dataset of persistence modules becomes intractable when n is large.

Therefore, as described in the introduction, our goal in this section is to design a sparse subset 𝒥 of (2,1)-intervals of size m ≪ n that minimizes the loss function in Equation (2) with gradient descent, by treating every interval in 𝒥 as a parameter to optimize.

Time-series datasets.

The datasets we consider are taken from the UCR repository [15], and correspond to classification tasks that have already been studied with persistent homology before [29, Section 4.2]. More precisely, instances in these datasets take the form of labelled time series, which we pre-process with time-delay embedding in order to turn them into point clouds. Specifically, each labelled time series T = (f(t_1), …, f(t_n)) of length n is transformed into a point cloud X_T ⊆ ℝ³ of cardinality n − 2 with

X_T := { (f(t_1), f(t_2), f(t_3))^T, …, (f(t_{n−2}), f(t_{n−1}), f(t_n))^T }.

Then, we compute both the Vietoris-Rips filtration and the sublevel set filtration induced by a Gaussian kernel density estimator (with bandwidth σ = 0.1 · diam(X_T), where diam(X_T) := max_{x,y ∈ X_T} ‖x − y‖_2 is the diameter of the point cloud) using the PointCloud2FilteredComplex function of the multipers library [28] (a tutorial is available in [26]). Both filtrations are then normalized so that their ranges become equal to the unit interval [0, 1].
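For concreteness, here is a minimal NumPy sketch of the time-delay embedding step and of the bandwidth choice σ = 0.1 · diam(X_T); the function name is illustrative, and the subsequent filtration construction with multipers is omitted (see the tutorial [26]).

```python
import numpy as np

def time_delay_embedding(f, dim=3, delay=1):
    """Turn a time series (f(t_1), ..., f(t_n)) into a point cloud in R^dim.

    With dim=3 and delay=1, this yields the n-2 points
    (f(t_i), f(t_{i+1}), f(t_{i+2})) used above.
    """
    f = np.asarray(f)
    n_pts = len(f) - (dim - 1) * delay
    return np.stack([f[i * delay : i * delay + n_pts] for i in range(dim)], axis=1)

T = np.sin(np.linspace(0, 4 * np.pi, 100))        # a toy time series of length 100
X_T = time_delay_embedding(T)                     # point cloud of cardinality 98 in R^3
sigma = 0.1 * np.max(np.linalg.norm(X_T[:, None] - X_T[None, :], axis=-1))  # 0.1 * diam(X_T)
```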

Loss function.

In order to compute GPDs out of these 2-parameter filtrations and modules, one needs a subset of intervals. As explained in the introduction, we then minimize

ℒ_{d̂_E, m} : 𝕧_𝒥 ↦ d̂(ℐ, 𝒥),

where the full domain ℐ (resp. the sparse domain 𝒥) comprises n = 1,600 (resp. m = 400) (2,1)-intervals obtained from a grid in ℝ⁶ computed by evenly sampling 10 (resp. 5) values for x and y and 2 values for a, b, c, d within their corresponding filtration ranges. (We also tried random uniform sampling for the initial distributions of x, y, a, b, c, d, but this resulted in slower convergence and lower performances.) Note that while the (2,1)-intervals in 𝒥 are treated as parameters to optimize, the (2,1)-intervals in ℐ are fixed throughout the optimization process. Moreover, the formula that we provided in Theorem 9 (ii) can be readily implemented in any library that uses auto-differentiation, such as pytorch (a schematic sketch is given below). In particular, we run stochastic gradient descent with momentum 0.9 on this loss for 750 epochs, using learning rate η = 0.001 with exponential decay of factor 0.99 to achieve convergence, and obtain our sparse subset 𝒥. See Figure 5 for a visualization of the loss decrease. Note how the Lipschitz stability proved in Theorem 14 translates into a smooth decrease with small oscillations.

Figure 5: Loss decrease across gradient descent iterations. Note that the loss value stays on a plateau for the first 300 iterations; this is due to the fact that during these first iterations, the parameters in 𝒥 that are updated with gradient descent are not yet the ones achieving the maxima and minima in the closed-form formula provided in Theorem 9 (ii). Moreover, in this example, gradient descent converges to a local minimum of the loss that is only 0.01 lower than the initial value, partially because we are using the small learning rate 0.001.
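The following schematic PyTorch sketch illustrates how the closed form of Theorem 9 (ii) can be optimized with auto-differentiation. It is not the exact training script of the reported experiments: all names are illustrative, the grid initialization described above is replaced by random initialization, the gradient step is full-batch, and the nonnegativity of the interval lengths is imposed via ReLU as suggested in Remark 11.

```python
import torch

def F(w1, w2, w3, w4):
    # F of Theorem 9: delta(w1,w2)*|w2-w1| equals relu(w2-w1), so F is a max of two ReLUs.
    return torch.maximum((w2 - w1).clamp(min=0), (w4 - w3).clamp(min=0))

def H(o1, o2, o3, o4):
    return torch.maximum(torch.minimum(o1, o2), torch.minimum(o3, o4))

def eps_matrix(VI, VJ):
    """All pairwise epsilon_rs values (Theorem 9 (ii)) between the rows of VI and VJ."""
    x1, y1, a, b, c, d = [VI[:, k][:, None] for k in range(6)]   # shape (n, 1)
    x2, y2, e, f, g, h = [VJ[:, k][None, :] for k in range(6)]   # shape (1, m)
    t13 = H(F(x2 - f, x1 - b, y2, y1), F(x2 - f, x1, y2, y1 - c),
            F(x2, x1 - b, y2 - g, y1), F(x2, x1, y2 - g, y1 - c))
    t14 = F(x1 + d, x2 + h, y1 + a, y2 + e)
    t15 = H(F(x1 - b, x2 - f, y1, y2), F(x1 - b, x2, y1, y2 - g),
            F(x1, x2 - f, y1 - c, y2), F(x1, x2, y1 - c, y2 - g))
    t16 = F(x2 + h, x1 + d, y2 + e, y1 + a)
    return torch.stack([t13, t14, t15, t16]).amax(dim=0)          # shape (n, m)

def d_hat(VI, VJ):
    """d_hat(I, J) of Theorem 9 (i), as a differentiable torch scalar."""
    E = eps_matrix(VI, VJ)
    return torch.maximum(E.amin(dim=1).amax(), E.amin(dim=0).amax())

n, m = 1600, 400
VI = torch.rand(n, 6)                                # fixed full domain (a grid in practice)
params = torch.rand(m, 6, requires_grad=True)        # parameters encoding the sparse domain

opt = torch.optim.SGD([params], lr=1e-3, momentum=0.9)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.99)
for _ in range(750):
    VJ = torch.cat([params[:, :2], torch.relu(params[:, 2:])], dim=1)  # keep a,b,c,d >= 0
    loss = d_hat(VI, VJ)
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
```

Since the loss depends only on ℐ and 𝒥 (Proposition 6 (iii)), no GPDs need to be computed inside this loop.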

Accuracy scores.

In order to quantify the running time improvements, as well as the information loss (if any), when switching from the full domain ℐ to the optimized sparse one 𝒥, we computed the full and sparse GPDs in homology dimensions 0 and 1 with the zigzag persistence diagram implementation in dionysus [31], and then we trained random forest classifiers on these GPDs to predict the time series labels. (Recall that optimizing the sparse domain can be performed without computing GPDs, so the optimization process described in the previous paragraph is done once and for all, before computing the GPDs.) In order to achieve this, we turned both the full and optimized GPDs into Euclidean vectors by first binning every GPD (seen as a point cloud in ℝ⁶) with a fixed 6-dimensional histogram, and then convolving this histogram with a 6-dimensional Gaussian kernel in order to smooth its values, with a procedure similar to the one described in [29, Section 3.2.1]. Both the histogram bins and the kernel bandwidths were found with 3-fold cross-validation on the training set (see the provided code for hyperparameter values). Then, random forest classifiers were trained on these smoothed histograms to predict labels; in Table 1 we report the accuracy scores of these classifiers for the initial (before optimization) sparse domain 𝒥_init, the optimized sparse domain 𝒥, and the full domain ℐ. Moreover, in Table 2, we report the running time needed to compute all GPDs using either 𝒥_init, 𝒥, or ℐ, as well as the improvement when switching from the full domain ℐ to the optimized sparse domain 𝒥. Note that our goal is not to improve on the state-of-the-art for time series classification, but rather to assess whether optimizing the loss ℒ_{d̂_E, m} based on our upper bound is indeed beneficial for improving topology-based models. See Figure 2 for a schematic overview of our full pipeline.
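As an illustration of this vectorization step, the following sketch bins a GPD, given as an array of points in ℝ⁶ with (possibly signed) multiplicities, into a Gaussian-smoothed 6-dimensional histogram; the bin count and bandwidth below are placeholders, whereas in practice they are chosen by cross-validation as described above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def vectorize_gpd(points, multiplicities, bins=4, bandwidth=1.0, value_range=(0.0, 1.0)):
    """Bin a GPD, seen as a weighted point cloud in R^6, into a smoothed histogram.

    points: (k, 6) array of interval vectors v_I; multiplicities: (k,) values dgm(I).
    The histogram is convolved with a Gaussian kernel and flattened into a
    feature vector for the random forest classifier.
    """
    hist, _ = np.histogramdd(points, bins=bins, range=[value_range] * 6,
                             weights=multiplicities)
    return gaussian_filter(hist, sigma=bandwidth).ravel()

rng = np.random.default_rng(0)
pts = rng.random((50, 6))                              # toy GPD support in [0, 1]^6
mult = rng.integers(-2, 3, size=50).astype(float)      # toy multiplicities
features = vectorize_gpd(pts, mult)                    # feature vector of length 4**6
```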

Discussion on results.

As one can see from Table 1, after optimizing 𝒥 there is either a clear improvement or a comparable performance in accuracy scores. Indeed, by minimizing ℒ_{d̂_E, m}, one forces the sparse domain 𝒥 to be as close as possible to the full domain ℐ, and thus to retain as much topological information as possible. Hence, either the full GPDs are more efficient than the initial sparse GPDs, in which case the optimized GPDs perform much better than the initial sparse ones, or the full GPDs are less efficient than the initial sparse GPDs (recall that the accuracy score is only an indirect measure: while the full GPDs are always richer in topological information, we hypothesize that their scores might still be lower due to a large number of intervals on which the full GPDs vanish, a question that we leave for further study), in which case the optimized GPDs have comparable or slightly worse performances than the initial sparse ones, thanks to the small sizes of their domains (except for the PC and IPD datasets, for which the optimized GPDs still perform interestingly better). In all cases, we emphasize that optimized GPDs provide the best solution: they maintain scores at levels that are either comparable to or better than the best of the initial sparse and full GPDs, while not forcing users to choose interval domains by hand (as their domains are obtained automatically with gradient descent). Note in particular that whenever optimized GPDs do not achieve the best scores, their performances remain of the same order as those of their competitors (see, e.g., DPC or SC), whereas when optimized GPDs are the best, the performances of the other two competitors are sometimes far behind (see, e.g., GPA).

As for running times, we observe a slight increase over the running times associated with the initial sparse GPDs (except for the IPD dataset), which we hypothesize is due to the optimized intervals intersecting more with each other, and thus inducing more poset relations when computing Möbius inversions, and a strong improvement over the running times associated with the full GPDs, with ratios ranging between 5 and 15 times faster. Our optimized GPDs thus achieve the best of both worlds: they are strikingly fast to compute while keeping high accuracy scores. Our code was run on a 2x Xeon SP Gold 5115 @ 2.40GHz, and is fully available at https://github.com/L-ebesgue/sparse_GPDs.

Table 1: Accuracy scores (%) of random forest classifiers trained on several UCR datasets. Underline indicates best score between the initial and optimized sparse domains, while bold font indicates best score overall. See the full version [11] for the dataset full names.
C DPA DPC DPT PPA PPC PPT ECG IPD
Init. 0.750 0.705 0.746 0.561 0.790 0.718 0.693 0.790 0.677
Optim. 0.786 0.691 0.721 0.561 0.785 0.742 0.737 0.790 0.690
Full 0.857 0.669 0.743 0.554 0.780 0.729 0.707 0.740 0.651
MI P SL GP GPA GPM GPO PC SC
Init. 0.534 0.924 0.565 0.900 0.835 0.946 0.987 0.789 0.447
Optim. 0.588 0.886 0.546 0.893 0.959 0.959 0.956 0.800 0.487
Full 0.545 0.838 0.531 0.847 0.864 0.946 0.990 0.778 0.503
Table 2: Running times (seconds) needed for computing all GPDs on several UCR datasets. Underline indicates best running time between the initial and optimized sparse domains, while bold font indicates best running time overall. See the full version [11] for the dataset full names.
C DPA DPC DPT PPA PPC PPT ECG IPD
Init. 350 1076 1902 1178 1182 1660 1147 556 1262
Optim. 421 1172 2054 1265 1257 1749 1232 621 1328
Full 2304 11672 20163 13007 12745 18427 12844 5098 19860
Improv. 5.47x 9.95x 9.82x 10.28x 10.13x 10.53x 10.42x 8.20x 15.74x
MI P SL GP GPA GPM GPO PC SC
Init. 2989 792 3478 703 1576 1658 1573 1339 1240
Optim. 3399 909 3919 883 1955 2061 1925 1550 1324
Full 29074 6280 31285 5773 12913 13523 12849 10730 13089
Improv. 8.55x 6.91x 7.98x 6.54x 6.60x 6.56x 6.67x 6.92x 9.88x

6 Conclusion

Our sparsification method demonstrates the practicality of approximating full GPDs with sparse ones: while significantly reducing computational costs, our appropriate loss function also ensures that their discriminative power is not too compromised. We thus believe that our work paves the way for efficiently deploying multi-parameter topological data analysis to large-scale applications, that are currently out of reach for most multi-parameter topological invariants from the literature. In what follows, we outline several potential future directions.

Optimization for finer GPDs.

In our experiments, we optimized the GPDs over intervals with at most two minimal points and exactly one maximal point. Allowing more complex intervals, as well as dataset-dependent terms in the loss function, could improve performance, but finding suitable embeddings of such intervals into Euclidean space remains a challenge (cf. Remark 11) and would increase the computational cost (cf. Remark 10). It would also be interesting to quantify the discriminating power of GPDs with measures that are more direct than the score of a machine learning classifier (or at least to further study the dependencies between GPD sparsification and classifier scores), and to investigate heuristics (based on drops in the loss values) for deciding whether sparse GPDs are sufficiently good so that the optimization can be stopped.

Experimental validation for other GRI-based descriptors.

While we only focused on sparsifying GPDs in this work, it would be interesting to measure the extent to which our interval domain optimization adapts to other descriptors based on the GRI from the TDA literature; of particular interest are the GPDs/signed barcodes coming from rank exact decompositions, which have recently proved to be stable [8] (see also Section 8.2 in [7] which discusses the influence of the choice of the interval domains on the resulting invariants), as well as GRIL, which focuses on specific intervals called worms [37].

References

  • [1] Hideto Asashiba, Etienne Gauthier, and Enhao Liu. Interval replacements of persistence modules. arXiv preprint, 2024. arXiv:2403.08308.
  • [2] Gorô Azumaya. Corrections and supplementaries to my paper concerning Krull-Remak-Schmidt’s theorem. Nagoya Mathematical Journal, 1:117–124, 1950.
  • [3] Ulrich Bauer, Magnus Botnan, Steffen Oppermann, and Johan Steen. Cotorsion torsion triples and the representation theory of filtered hierarchical clustering. Advances in Mathematics, 369:107171, 2020.
  • [4] Magnus Botnan and William Crawley-Boevey. Decomposition of persistence modules. Proceedings of the American Mathematical Society, 148(11):4581–4596, 2020.
  • [5] Magnus Botnan and Michael Lesnick. Algebraic stability of zigzag persistence modules. Algebraic & Geometric topology, 18(6):3133–3204, 2018.
  • [6] Magnus Botnan and Michael Lesnick. An introduction to multiparameter persistence. In Representations of Algebras and Related Structures, pages 77–150. CoRR, 2023.
  • [7] Magnus Botnan, Steffen Oppermann, and Steve Oudot. Signed barcodes for multi-parameter persistence via rank decompositions and rank-exact resolutions. Foundations of Computational Mathematics, pages 1–60, 2024.
  • [8] Magnus Botnan, Steffen Oppermann, Steve Oudot, and Luis Scoccola. On the bottleneck stability of rank decompositions of multi-parameter persistence modules. Advances in Mathematics, 451:109780, 2024.
  • [9] Gunnar Carlsson and Afra Zomorodian. The theory of multidimensional persistence. Discrete & Computational Geometry, 42(1):71–93, 2009. doi:10.1007/S00454-009-9176-0.
  • [10] Mathieu Carrière and Andrew Blumberg. Multiparameter persistence image for topological machine learning. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), pages 22432–22444. Curran Associates, Inc., 2020.
  • [11] Mathieu Carriere, Seunghyun Kim, and Woojin Kim. Sparsification of the generalized persistence diagrams for scalability through gradient descent. arXiv preprint arXiv:2412.05900, 2024. doi:10.48550/arXiv.2412.05900.
  • [12] Frédéric Chazal, David Cohen-Steiner, Marc Glisse, Leonidas J Guibas, and Steve Y Oudot. Proximity of persistence modules and their diagrams. In Proceedings of the twenty-fifth annual symposium on Computational geometry, pages 237–246, 2009. doi:10.1145/1542362.1542407.
  • [13] Nate Clause, Woojin Kim, and Facundo Memoli. The generalized rank invariant: Möbius invertibility, discriminating power, and connection to other invariants. arXiv preprint, 2024. arXiv:2207.11591v5.
  • [14] René Corbet, Ulderico Fugacci, Michael Kerber, Claudia Landi, and Bei Wang. A kernel for multi-parameter persistent homology. Computers & Graphics: X, 2:100005, 2019. doi:10.1016/J.CAGX.2019.100005.
  • [15] Hoang-Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ratanamahatana, and Eamonn Keogh. The UCR time series archive. arXiv, 2018. arXiv:1810.07758.
  • [16] Tamal Dey, Woojin Kim, and Facundo Mémoli. Computing generalized rank invariant for 2-parameter persistence modules via zigzag persistence and its applications. In 38th International Symposium on Computational Geometry (SoCG 2022). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.SoCG.2022.34.
  • [17] Tamal Dey, Aman Timalsina, and Cheng Xin. Computing generalized ranks of persistence modules via unfolding to zigzag modules. arXiv preprint, 2024. doi:10.48550/arXiv.2403.08110.
  • [18] Herbert Edelsbrunner and John L Harer. Computational topology: an introduction. American Mathematical Society, 2008.
  • [19] Donghan Kim, Woojin Kim, and Wonjun Lee. Super-polynomial growth of the generalized persistence diagram, 2024. doi:10.48550/arXiv.2412.04889.
  • [20] Woojin Kim and Facundo Mémoli. Generalized persistence diagrams for persistence modules over posets. Journal of Applied and Computational Topology, 5(4):533–581, 2021. doi:10.1007/S41468-021-00075-1.
  • [21] Woojin Kim and Facundo Mémoli. Spatiotemporal persistent homology for dynamic metric spaces. Discrete & Computational Geometry, 66:831–875, 2021. doi:10.1007/S00454-019-00168-W.
  • [22] Woojin Kim and Facundo Mémoli. Persistence over posets. Notices of the American Mathematical Society, 70(08), 2023.
  • [23] Woojin Kim and Facundo Mémoli. Extracting persistent clusters in dynamic data via möbius inversion. Discrete & Computational Geometry, 71(4):1276–1342, 2024. doi:10.1007/S00454-023-00590-1.
  • [24] Woojin Kim and Samantha Moore. Bigraded Betti numbers and generalized persistence diagrams. Journal of Applied and Computational Topology, 2024. doi:10.1007/s41468-024-00180-x.
  • [25] Michael Lesnick. The theory of the interleaving distance on multidimensional persistence modules. Foundations of Computational Mathematics, 15(3):613–650, 2015. doi:10.1007/S10208-015-9255-Y.
  • [26] David Loiseaux. Time series classification. Accessed: 2025-03-24. URL: https://davidlapous.github.io/multipers/notebooks/time_series_classification.html.
  • [27] David Loiseaux, Mathieu Carrière, and Andrew Blumberg. A framework for fast and stable representations of multiparameter persistent homology decompositions. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023.
  • [28] David Loiseaux and Hannah Schreiber. multipers: Multiparameter persistence for machine learning. Journal of Open Source Software, 9(103):6773, 2024. doi:10.21105/JOSS.06773.
  • [29] David Loiseaux, Luis Scoccola, Mathieu Carrière, Magnus Botnan, and Steve Oudot. Stable vectorization of multiparameter persistent homology using signed barcodes as measures. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023.
  • [30] Nikola Milosavljević, Dmitriy Morozov, and Primoz Skraba. Zigzag persistent homology in matrix multiplication time. In Proceedings of the twenty-seventh Annual Symposium on Computational Geometry, pages 216–225, 2011. doi:10.1145/1998196.1998229.
  • [31] Dmitriy Morozov. Dionysus 2. Accessed: 2025-03-24. URL: https://mrzv.org/software/dionysus2/.
  • [32] Soham Mukherjee, Shreyas Samaga, Cheng Xin, Steve Oudot, and Tamal Dey. D-gril: End-to-end topological learning with 2-parameter persistence. arXiv, 2024. arXiv:2406.07100.
  • [33] Amit Patel. Generalized persistence diagrams. Journal of Applied and Computational Topology, 1(3):397–419, 2018. doi:10.1007/S41468-018-0012-6.
  • [34] Luis Scoccola, Siddharth Setlur, David Loiseaux, Mathieu Carrière, and Steve Oudot. Differentiability and optimization of multiparameter persistent homology. In 41st International Conference on Machine Learning (ICML 2024), volume 235, pages 43986–44011. PMLR, 2024.
  • [35] Oliver Vipond. Multiparameter persistence landscapes. Journal of Machine Learning Research, 21(61):1–38, 2020. URL: https://jmlr.org/papers/v21/19-054.html.
  • [36] Lu Xian, Henry Adams, Chad M Topaz, and Lori Ziegelmeier. Capturing dynamics of time-varying data via topology. Foundations of Data Science, 4(1):1–36, 2022.
  • [37] Cheng Xin, Soham Mukherjee, Shreyas Samaga, and Tamal Dey. GRIL: A 2-parameter persistence based vectorization for machine learning. In 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning. OpenReviews.net, 2023.