Sparsification of the Generalized Persistence Diagrams for Scalability Through Gradient Descent
Abstract
The generalized persistence diagram (GPD) is a natural extension of the classical persistence barcode to the setting of multi-parameter persistence and beyond. The GPD is defined as an integer-valued function whose domain is the set of intervals in the indexing poset of a persistence module, and is known to be able to capture richer topological information than its single-parameter counterpart. However, computing the GPD is computationally prohibitive due to the sheer size of the interval set. Restricting the GPD to a subset of intervals provides a way to manage this complexity, compromising discriminating power to some extent. However, identifying and computing an effective restriction of the domain that minimizes the loss of discriminating power remains an open challenge.
In this work, we introduce a novel method for optimizing the domain of the GPD through gradient descent optimization. To achieve this, we introduce a loss function tailored to optimize the selection of intervals, balancing computational efficiency and discriminative accuracy. The design of the loss function is based on the known erosion stability property of the GPD. We showcase the efficiency of our sparsification method for dataset classification in supervised machine learning. Experimental results demonstrate that our sparsification method significantly reduces the time required for computing the GPDs associated with several datasets, while maintaining classification accuracies comparable to those achieved using full GPDs. Our method thus opens the way to using GPD-based methods in applications at an unprecedented scale.
Keywords and phrases:
Multi-parameter persistent homology, Generalized persistence diagram, Generalized rank invariant, Non-convex optimization, Gradient descent
Funding:
Mathieu Carrière: Partially supported by ANR grant “TopModel”, ANR-23-CE23-0014, and supported by the French government, through the 3IA Côte d’Azur Investments in the project managed by the National Research Agency (ANR) with the reference number ANR-23-IACL-0001.
2012 ACM Subject Classification:
Mathematics of computing → Algebraic topology; Theory of computation → Nonconvex optimization
Supplementary Material:
Software (Source Code): https://github.com/L-ebesgue/sparse_GPDs

Acknowledgements:
This work stemmed from a conversation between M.C. and W.K. during the Dagstuhl Seminar on “Applied and Combinatorial Topology” (24092), held from February 26 to March 1, 2024. M.C. and W.K. thank the organizers of the Dagstuhl Seminar and appreciate the hospitality of Schloss Dagstuhl – Leibniz Center for Informatics. The authors are also grateful to the OPAL infrastructure from Université Côte d’Azur for providing resources and support.
Editors:
Oswin Aichholzer and Haitao Wang

1 Introduction
Persistent homology, a central tool in topological data analysis (TDA), enables the study of topological features in datasets through algebraic invariants. In the classical one-parameter setting, the persistence barcode (or equivalently, the persistence diagram) serves as a complete, discrete, and computationally tractable invariant of a persistence module. Persistent homology can be extended to multi-parameter persistent homology, which provides tools for capturing the topological features of datasets using multiple filtrations instead of just one. However, the transition to multi-parameter persistent homology introduces significant complexity into the algebraic structure of the associated persistence modules [3, 6, 9].
Nevertheless, the generalized persistence diagram (GPD) extends the notion of persistence diagram from the one-parameter to the multi-parameter setting in a natural way [20, 33]. Although the GPD has been extensively studied in terms of stability, discriminating power, computation, and generalizations (see, e.g., [1, 13, 16, 17, 20, 22, 23, 24]), the computational complexity of GPDs remains a major obstacle [19]. The primary challenge arises from the size of their domain: the domain of the complete GPD is either the set of all intervals in the indexing poset, or an appropriately chosen finite subset of it; in the latter case, domains still tend to be enormous, so as not to sacrifice the discriminating power of the GPD [13]. Nevertheless, GPDs are flexible, in the sense that they remain well-defined on any finite subdomain, even one of small size, since the GPD over a subdomain is simply defined as the Möbius inversion of the generalized rank invariant (GRI) over that subdomain (see Definition 1). This allows one to control the aforementioned complexity of GPD computation by picking a small, or sparse, subdomain. Moreover, the topological information loss incurred by such sparse subdomains can be mitigated by looking for relevant intervals, i.e., intervals that are rich in topological content; indeed, even with a substantially small subdomain, the GPD can still be finer than other traditional invariants of multi-parameter persistence modules, such as the rank invariant (RI) [9]; see [13] for details. However, how to design these subdomains in the “best” way, so as to reduce computational cost while maintaining the discriminating power of the GPD, is a question that has not been much explored so far.
Sparsification of the GPD via gradient descent.
Motivated by the computational challenges of the GPD and the flexibility of this invariant upon selecting subsets of intervals as its domain, we propose a gradient-descent-based method for automatically sparsifying GPDs computed from 2-parameter persistence modules. Namely, we consider the following scenario.
Suppose we aim at classifying instances of a given dataset based on their topological features, and that we have already computed a set of corresponding persistence modules M_1, …, M_N from the dataset, where each persistence module corresponds to an individual data point. We consider the set of GPDs of M_1, …, M_N over a large set ℐ of intervals (cf. Definition 1). We refer to these as the full GPDs.
Let n be a sparsification parameter, which is typically significantly smaller than the cardinality of ℐ. In order to identify an n-set 𝒥 of intervals over which new GPDs are computed (and subsequently used to classify the persistence modules M_1, …, M_N), we proceed as follows. Firstly, we identify a loss function defined on the collection of n-sets of intervals:
ℒ(𝒥) := Σ_{i=1,…,N} d(dgm^ℐ(M_i), dgm^𝒥(M_i)),   (1)
where d is an appropriate dissimilarity function between GPDs. Secondly, we search for a minimizer of the loss function. The goal of this search is to identify a set 𝒥 of intervals such that the GPDs of the M_i over this sparse set 𝒥, the sparse GPDs, best approximate their corresponding full counterparts overall. One natural way of searching for a (local) minimizer is through gradient descent, starting from a randomly chosen n-set of intervals. To achieve this, the following requirements must either be met or are highly desirable:
- (I) (Dissimilarity) An appropriate choice of the dissimilarity function d,
- (II) (Vectorization) A certain representation of the loss function as a map defined on some subset of Euclidean space,
- (III) (Convexity) Convexity of this subset,
- (IV) (Loss regularity) Lipschitz stability and differentiability of the loss function, and
- (V) (Feasibility) Computational feasibility of the loss function for practical implementation.
Summary of contributions.
- A natural choice for d in Item (I) would be to use the erosion distance d_E [13, 33], a standard metric between GPDs that is known to be stable under perturbations of the input persistence modules w.r.t. the interleaving distance. (Note that d_E is also referred to as a metric between generalized rank invariants (GRIs), e.g., in [13, 21]. Since the GPD and the GRI determine one another (cf. Remark 2 (i) and (iii)), d_E can thus also be viewed as a metric between GPDs, as in [33, 36].) However, the use of the erosion distance requires the GPDs to be defined on the same set of intervals (Definition 3), which must moreover be closed under thickening, and hence infinite. All of this makes it difficult to directly utilize d_E. Hence, we introduce the sparse erosion distance between GPDs relative to (possibly) different interval sets, as an adaptation of d_E (Definition 4, Proposition 6 (ii), and Corollary 7). (Another possibility would be to use bottleneck and Wasserstein distances; however, we show in Remark 17 in the full version [11] that they fail to ensure continuity of the loss function.)
- Regarding Item (V), we restrict our focus to intervals of ℝ² with a small number of minimal and maximal points (Remark 10). Then, if the sparse erosion distance is computed between GPDs of the same persistence module, we prove that it depends solely on the domains ℐ and 𝒥, and not on the input persistence module (Proposition 6 (iii)). This fact significantly enhances the tractability of gradient descent, as it allows us to avoid recomputing the GPDs at every iteration. Moreover, we derive a closed-form formula for its computation (Theorem 9 (i)).
Figure 1: Any (2,1)-interval of ℝ², as depicted above, is represented by a vector in Euclidean space obtained from the coordinates of its middle points and the lengths defining its lower boundary; any (1,1)-interval can also be represented in this way. Figure 2: Our pipeline for sparsifying GPDs in the context of time series classification. The sizes of the GPD points are proportional to their multiplicities.
- For achieving Items (II), (III), and (IV), we only consider intervals with at most two minimal points and one maximal point. This ensures the existence of natural embeddings of these intervals into Euclidean space, obtained by stacking the coordinates of the interval middle points together with the lengths needed to define the interval lower boundaries (Figure 1). The main advantages of this vectorization method (w.r.t. the other natural ones) are that (1) it allows for a simple formulation of the loss function (Theorem 9 (ii)), and (2) its variables are all independent of each other, which makes the implementation of gradient descent easier (Remarks 10 and 11). (Alternatively, we could consider intervals with one minimal point and at most two maximal points. Either choice ensures that our restricted GPDs are not a weaker invariant than the rank invariant or the signed barcode [7]; see [13, Section 4].) Using this vectorization method, we also prove that the image of the embedding forms a convex subset of Euclidean space (Proposition 13), and that the embedding ensures the Lipschitz stability and almost-everywhere differentiability of the loss function (Theorem 14 and Proposition 16). In fact, we prove that the loss function can actually be realized as
ℒ(𝒥) = N · d̂(ℐ, 𝒥),   (2)
where d̂ is a distance between interval sets that depends only on the domains ℐ and 𝒥 (Proposition 6 (iii)) and admits a closed-form expression in terms of our Euclidean embedding of the intervals (Theorem 9 (ii)). Since N (the size of the dataset) is a constant, the (local) minimizers of ℒ coincide with those of the map 𝒥 ↦ d̂(ℐ, 𝒥). Hence, searching for a (local) minimizer of ℒ is essentially searching for the (locally) best n intervals that represent the domain of the full GPDs.
- Finally, regarding Item (V) again, and in order to showcase the efficiency of our proposed sparsification method, we provide numerical experiments on topology-based time series classification with supervised machine learning (Section 5 and Figure 2), in which we show that the sparse GPDs can be computed much faster than the full GPDs, while maintaining similar or better classification performance with random forest models. Our code is fully available at https://github.com/L-ebesgue/sparse_GPDs.
Comparison with other works.
While many existing works utilize multi-parameter persistent homology to enhance the performance of machine learning models [10, 14, 27, 29, 32, 34, 35, 37], our work takes a different approach, as it focuses on mitigating the computational overhead associated with multi-parameter persistence descriptors.
The works most closely related to ours are [29] and [37], which use the RI or GRI of multi-parameter persistence modules in a machine learning context. Firstly, [29] is based on an equivalent representation of the GPDs computed from rectangle intervals (often called signed barcodes) as signed measures [7], which allows one to compare GPDs with optimal transport distances, as well as to deploy known vectorization techniques intended for general measures. Secondly, the approach proposed in [37] involves vectorizing the GRIs of 2-parameter persistence modules by evaluating them on intervals with specific shapes called worms. Our goal is different: we rather aim at vectorizing and sparsifying the domains of the GPDs in order to achieve good performance scores. In fact, the sparsification process that we propose can be used complementarily to both [29] and [37], by first sparsifying the set of rectangles or worms before applying their vectorization methods. Note that differentiability properties of both of these approaches w.r.t. the multi-parameter filtrations were recently established [32, 34]; in contrast, our work deals with the differentiability of a GPD-based loss function w.r.t. the interval domains (while keeping the multi-parameter filtrations fixed).
Organization.
Section 2 reviews basic properties of the GPD and GRI. Section 3 introduces the sparse erosion distance, clarifies its relation to the erosion distance given in [13], and presents a closed-form formula for it, specialized to and useful in our setting. Section 4 establishes the Lipschitz stability and differentiability of our loss function. Section 5 presents our numerical experiments. Finally, Section 6 discusses future research directions.
2 Preliminaries
In this article, P stands for a poset, regarded as the category whose objects are the elements of P and in which, for any pair p, q ∈ P, there exists a unique morphism p → q if and only if p ≤ q. All vector spaces in this article are over a fixed field k. Let vec denote the category of finite-dimensional vector spaces and linear maps over k. A functor M : P → vec will be referred to as a (P-)persistence module. The direct sum of any two P-persistence modules is defined pointwise. A module M is trivial if M(p) = 0 for all p ∈ P. If a nontrivial M is not isomorphic to a direct sum of two nontrivial persistence modules, M is indecomposable. Every persistence module decomposes into a direct sum of indecomposable modules, uniquely determined up to isomorphism [2, 4].
An interval of P is a subset I ⊆ P such that:
- (i) I is nonempty.
- (ii) If p, r ∈ I and p ≤ q ≤ r, then q ∈ I.
- (iii) I is connected, i.e., for any p, q ∈ I, there is a sequence p = r_0, r_1, …, r_m = q of elements of I with r_j and r_{j+1} comparable for each 0 ≤ j < m.
By Int(P) we denote the set of all intervals of P.
Given any I ∈ Int(P), the interval module k_I is the P-persistence module with k_I(p) = k for p ∈ I and k_I(p) = 0 otherwise, and whose internal morphisms are the identity between any two points of I and zero otherwise.
Every interval module is indecomposable [5, Proposition 2.2]. A P-persistence module M is interval-decomposable if it is isomorphic to a direct sum of interval modules. In this case, the barcode of M, denoted by barc(M), is defined as the multiset of intervals appearing in such a decomposition.
For p ∈ P, let p^↑ denote the set of points q ∈ P such that p ≤ q. Clearly, p^↑ belongs to Int(P). A P-persistence module M is finitely presentable if it is isomorphic to the cokernel of a morphism ⊕_{q ∈ B} k_{q^↑} → ⊕_{p ∈ A} k_{p^↑}, where A and B are finite multisets of elements of P.
When P is a connected poset, the (generalized) rank of M, denoted by rank(M), is defined as the rank of the canonical linear map from the limit of M to the colimit of M, which is a nonnegative integer [20]. This isomorphism invariant of P-persistence modules, which takes a single integer value, can be refined into an integer-valued function as follows. Let ℐ be any nonempty subset of Int(P). The generalized rank invariant (GRI) of M over ℐ is the map rk^ℐ(M) : ℐ → ℤ_{≥0} given by I ↦ rank(M|_I), where M|_I is the restriction of M to I [20]. When ℐ = Int(P), we denote rk^ℐ(M) simply as rk(M).
The generalized persistence diagram (GPD) of M over ℐ captures the changes of the GRI values as the interval varies over ℐ. Its formal definition follows.
Definition 1 ([13]).
The generalized persistence diagram (GPD) of M over ℐ is the function dgm^ℐ(M) : ℐ → ℤ satisfying, for every I ∈ ℐ,
rk^ℐ(M)(I) = Σ_{J ∈ ℐ, J ⊇ I} dgm^ℐ(M)(J),
with only finitely many nonzero terms in the sum, whenever such a function exists (i.e., dgm^ℐ(M) is the Möbius inversion of rk^ℐ(M) over ℐ). When ℐ = Int(P), we denote dgm^ℐ(M) simply as dgm(M).
Remark 2 ([13, Sections 2 and 3] and [7]).
- (i) If ℐ is finite, then dgm^ℐ(M) exists.
- (ii) If dgm^ℐ(M) exists, then it is unique.
- (iii) If dgm^ℐ(M) exists, then dgm^ℐ(M) and rk^ℐ(M) determine one another.
- (iv) (Monotonicity) rk^ℐ(M)(I) ≥ rk^ℐ(M)(J) for any pair I ⊆ J in ℐ.
- (v) (The GPD generalizes the barcode) Let ℐ = Int(P). If M is interval-decomposable, then dgm(M) exists. In this case, for any I ∈ Int(P), dgm(M)(I) coincides with the multiplicity of I in barc(M). Also, dgm(M) often exists even when M is not interval-decomposable.
- (vi) If M is a finitely presentable ℝ²-persistence module, then the GPD of M over Int(ℝ²) exists [13, Theorem C(iii)].
In the rest of the article, every ℝ²-persistence module M is assumed to be finitely presentable; thus its GPD over Int(ℝ²) exists, and is denoted by dgm(M).
3 Sparse erosion distance between GPDs relative to sampled intervals
We adapt the notion of erosion distance to define a distance between GPDs relative to (possibly different) sets of sampled intervals. When comparing the GPDs of the same persistence module over two different sets of sampled intervals, this distance reduces to a distance between the interval sets themselves. This is relevant in our case, as we compare the full and sparse GPDs of the same persistence module.
3.1 Sparse erosion distance
In this section, we review the definition of the erosion distance and adapt it to define the sparse erosion distance between GPDs relative to sampled intervals.
For ε ≥ 0, the vector (ε, ε) ∈ ℝ² will simply be denoted by ε whenever there is no risk of confusion. For I ∈ Int(ℝ²) and ε ≥ 0, we consider the ε-thickening I^ε of I, obtained as the union of the closed ε-balls, w.r.t. the supremum distance, around the points of I. See Figure 3. A subset of Int(ℝ²) is said to be closed under thickening if, for every interval I in the subset and for every ε ≥ 0, the interval I^ε also belongs to the subset.
Definition 3 ([13, Definition 5.2]).
Let ℐ ⊆ Int(ℝ²) be nonempty and closed under thickening, and let M and N be ℝ²-persistence modules. The erosion distance between the GPDs of M and N over ℐ is
d_E(dgm^ℐ(M), dgm^ℐ(N)) := inf{ ε ≥ 0 : rk^ℐ(M)(I^ε) ≤ rk^ℐ(N)(I) and rk^ℐ(N)(I^ε) ≤ rk^ℐ(M)(I) for all I ∈ ℐ }.
A correspondence between nonempty sets A and B is a subset C ⊆ A × B satisfying the following: (1) for each a ∈ A, there exists b ∈ B such that (a, b) ∈ C, and (2) for each b ∈ B, there exists a ∈ A such that (a, b) ∈ C. For ε ≥ 0, an ε-correspondence between nonempty ℐ, 𝒥 ⊆ Int(ℝ²) is a correspondence C ⊆ ℐ × 𝒥 such that, for all (I, J) ∈ C, each of I and J is contained in the ε-thickening of the other. Blending the ideas of the Hausdorff and erosion distances, we obtain our new distance between GPDs relative to sampled intervals:
Definition 4 (Sparse erosion distance between GPDs relative to sampled intervals).
For any and any nonempty , the sparse erosion distance between pairs and is
We remark that captures not only (1) the algebraic difference between [ with respect to ] and [ with respect to ], but also (2) the geometric difference between the domains and . To see (1), consider, for example, the -persistence modules and , and let both and be the singleton set . Then, one can see that from the fact that and the monotonicity of and . To see (2), let and be any isomorphic -persistence modules, and set and . Then, we obtain , solely due to the difference between and .
Proposition 5.
The sparse erosion distance is an extended pseudometric. (See the full version [11] for the proof.)
For any nonempty , let
(4)
We clarify the relationship among the erosion distance, the sparse erosion distance, and the quantity defined in Equation (4):
Proposition 6.
Let M and N be ℝ²-persistence modules, and let ℐ, 𝒥 ⊆ Int(ℝ²) be nonempty. We have:
-
(i)
.
-
(ii)
If are closed under thickening, then
(5) -
(iii)
If or , then
(6)
(See the full version [11] for the proof.)
We remark that the inequality given in Item (i) can be strict. For instance, let , , , and . Then, .
By Proposition 6 (ii) and the prior stability result of [13, Theorem H], we have:
Corollary 7.
For any and for any , we have
where the right-hand side is the interleaving distance between and .
3.2 Closed-form formula for the sparse erosion distance
In this section, we find a closed-form formula for the right-hand side of Equation (6) when each interval in ℐ and 𝒥 has only finitely many minimal and maximal points (Theorem 9). This result is essential for studying our loss function in Section 4 and for optimizing the domains of the GPDs in Section 5.
For p, q ≥ 1, an interval I is a (p, q)-interval if I has exactly p minimal points and exactly q maximal points. For any interval I, let min(I) (resp. max(I)) be the set of all minimal (resp. maximal) elements of I. For , define
(9)
Lemma 8.
(See the full version [11] for the proof.) Let ℐ and 𝒥 be sets of intervals of ℝ² with only finitely many minimal and maximal points. Consider the matrix, with rows indexed by ℐ and columns indexed by 𝒥, whose entries are given by the RHS of the inequality in Lemma 8, i.e.
(12)
with the quantities involved as defined in Lemma 8. Next, we show that the quantity defined in Equation (4) equals the largest of the smallest elements across all rows and columns of this matrix. In particular, the following variables and functions are useful for describing the closed-form formula when each of ℐ and 𝒥 consists solely of (1,1)- or (2,1)-intervals.
When and are -intervals, as depicted in Figure 4, let
When (resp. ) is a -interval, set , and thus (resp. , and thus ). Also, let
where all input variables of and are real numbers.
Theorem 9.
Let ℐ and 𝒥 be sets of intervals of ℝ² with only finitely many minimal and maximal points.
-
(i)
We have where is as defined in Equation 12 and (resp. ) is a -interval (resp. -interval).
- (ii)
(13)
(14)
(15)
(16)
(See the full version [11] for the proof.)
Remark 10.
In Theorem 9 (i), as the numbers of minimal and maximal points of the intervals in ℐ and 𝒥 increase, the complexity of computing the distance increases. This is because the ranges over which the maxima on the RHS of Equation (12) are taken grow, thereby increasing the computational complexity of the formula as well.
4 Lipschitz stability and differentiability of the loss function
The goal of this section is to establish the Lipschitz continuity/stability and (almost everywhere) differentiability of our loss function (cf. Equation (2)). Recall from Figure 1 the embedding of (1,1)- and (2,1)-intervals of ℝ² into Euclidean space.
Remark 11 (Independence of variables).
Although there are multiple ways to embed (p, q)-intervals of ℝ² into Euclidean space, some vectorization methods are more efficient than others in our context. For instance, suppose that we represent the (2,1)-interval in Figure 1 by simply concatenating the coordinates of its minimal and maximal points. A difficulty with this embedding in the context of gradient descent is that the resulting variables are not independent: the coordinates of each minimal point must be dominated by those of the maximal point, and the two minimal points must remain incomparable. Ensuring such relations between variables is not trivial in non-convex optimization (without resorting to refined techniques such as projected gradient descent). On the other hand, our proposed embedding is more practical, as all of its variables are independent, and our only requirement is that the length variables be nonnegative, which can easily be imposed using, e.g., exponential or ReLU functions.
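For illustration, here is a minimal PyTorch sketch of this reparameterization; the split of the embedding into two unconstrained “middle point” coordinates and three nonnegative “length” coordinates is hypothetical and only meant to illustrate the mechanism.

```python
import torch

n = 16                                           # number of sparse intervals (illustrative)
centers = torch.randn(n, 2, requires_grad=True)  # hypothetical "middle point" coordinates
lengths = torch.randn(n, 3, requires_grad=True)  # hypothetical "length" coordinates

def interval_vectors():
    # ReLU keeps the length coordinates nonnegative without any projection
    # step, so plain gradient descent can be applied to all parameters.
    return torch.cat([centers, torch.relu(lengths)], dim=1)
```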
Let ℐ and 𝒥 consist solely of (1,1)- or (2,1)-intervals of ℝ².
Lemma 12.
We define the vector E(𝒥) as the concatenation of the vectors representing the intervals of 𝒥 (Figure 1). Let 𝒳 be the collection of all ordered n-sets of (1,1)- or (2,1)-intervals of ℝ². Then, define the function E on 𝒳 by 𝒥 ↦ E(𝒥).
Proposition 13.
The image of 𝒳 via the function E is a convex subset of Euclidean space.
Proof.
Let 𝒞 be the collection of all (1,1)- and all (2,1)-intervals of ℝ². Consider the map from 𝒞 to Euclidean space depicted in Figure 1. Clearly, this map is a surjection onto a product of intervals of ℝ (its coordinates being unconstrained middle-point coordinates and nonnegative lengths), so the image of 𝒞 via this map is a convex subset of Euclidean space. Now, observe that the image of 𝒳 via the map E is equal to the n-fold Cartesian product of the image of 𝒞. The fact that any Cartesian product of convex sets is convex implies our claim.
Theorem 14 (Lipschitz stability of loss function).
Let ℐ be any finite set of (1,1)- or (2,1)-intervals, and let 𝒥 and 𝒥′ be any two n-sets of (1,1)- or (2,1)-intervals for some n. Then, we have where the minimum is taken over all permutations on {1, …, n}.
Proof.
By the triangle inequality, we have and thus it suffices to show that . Let be a permutation on that attains the minimum. Let and . By Lemma 12, (resp. ) for all and hence (resp. ) for each (resp. ). Thus, we have By Theorem 9 (i), we are done.
Remark 15.
The opposite direction of Theorem 14 does not hold, i.e., no constant can be chosen so that the reverse inequality holds in general. However, this is not an issue in our work, as what we require is the Lipschitz stability and almost-everywhere differentiability of the loss function (which we establish in the next proposition), in order to prevent erratic and oscillating gradient descent iterations.
Proposition 16.
The loss function given in Equation (2) is differentiable almost everywhere.
Proof.
The map given in Equation (9) and the two functions defined before Theorem 9 are finitely segmented piecewise linear. Therefore, the functions given in Equations (13)–(16) are all finitely segmented piecewise linear on Euclidean spaces, and thus so are the expressions given in Theorem 9 (ii). Hence, these functions are differentiable almost everywhere. Now, Proposition 6 (iii) implies our claim.
5 Numerical Experiments on Sparsification
In this section, we make use of the results proved above to provide a method for sparsifying GPDs. Indeed, computing a single GPD on a persistence module coming from a simplicial complex requires computing zigzag persistence modules over all intervals of ℐ, as described in [16], and then computing the corresponding Möbius inversion. The cost of the inversion grows quickly with the number of intervals in ℐ, due to the need to compute a Möbius function value for every pair of intervals, each such computation requiring an iteration over the intervals; the cost of each zigzag computation is, in turn, polynomial in the number of simplices [30]. Hence, computing all GPDs from a dataset of persistence modules becomes intractable when ℐ is large.
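To make the Möbius-inversion step concrete, here is a minimal sketch, assuming the intervals are represented as frozensets of grid points ordered by inclusion and that their generalized ranks have already been computed (e.g., via zigzag persistence as in [16]); the quadratic number of containment tests illustrates why the cost grows quickly with the number of intervals.

```python
from typing import List

def mobius_inversion(intervals: List[frozenset], rank: List[int]) -> List[int]:
    """Recover GPD values from GRI values over a finite set of intervals.

    Sketch under the stated assumptions: rank[i] is the generalized rank of
    the module over intervals[i], and the GPD is defined by the relation
    rank(I) = sum of dgm(J) over all J in the set containing I.
    """
    n = len(intervals)
    # strict supersets of each interval (quadratic number of containment tests)
    supersets = [[j for j in range(n) if i != j and intervals[i] < intervals[j]]
                 for i in range(n)]
    # process intervals from largest to smallest, so that dgm is already known
    # on all strict supersets of the current interval
    order = sorted(range(n), key=lambda i: len(intervals[i]), reverse=True)
    dgm = [0] * n
    for i in order:
        dgm[i] = rank[i] - sum(dgm[j] for j in supersets[i])
    return dgm
```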
Therefore, as described in the introduction, our goal in this section is to design a sparse set 𝒥 of (2,1)-intervals of size n that minimizes the loss function in Equation (2) with gradient descent, by treating every interval in 𝒥 as a parameter to optimize.
Time-series datasets.
The datasets we consider are taken from the UCR repository [15], and correspond to classification tasks that have already been studied with persistent homology before [29, Section 4.2]. More precisely, instances in these datasets take the form of labelled time series, which we pre-process with time-delay embeddings in order to turn them into point clouds: each labelled time series is transformed into a point cloud whose points are vectors of delayed values of the series.
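For illustration, here is a minimal sketch of this pre-processing step; the embedding dimension and delay below are placeholder values, not the ones used in our experiments.

```python
import numpy as np

def time_delay_embedding(series, dim=3, delay=1):
    """Turn a 1-D time series into a point cloud in R^dim by sliding a window
    of `dim` values spaced `delay` steps apart (placeholder parameters)."""
    series = np.asarray(series, dtype=float)
    n_points = len(series) - (dim - 1) * delay
    return np.stack([series[i : i + (dim - 1) * delay + 1 : delay]
                     for i in range(n_points)])
```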
Then, we compute both the Vietoris-Rips filtration and the sublevel set filtration induced by a Gaussian kernel density estimator (with bandwidth proportional to the diameter of the point cloud) using the PointCloud2FilteredComplex function of the multipers library [28] (a tutorial is available in [26]). Both filtrations are then normalized so that their ranges become equal to the unit interval [0, 1].
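For illustration, the following sketch shows one way to compute the normalized density values used as the second filtration parameter; the bandwidth ratio is a placeholder, and the actual bifiltration is assembled with PointCloud2FilteredComplex as described above.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.neighbors import KernelDensity

def density_filtration_values(points, bandwidth_ratio=0.1):
    """Gaussian kernel density estimate with bandwidth set to a fraction of
    the point cloud diameter (the ratio 0.1 is a placeholder), rescaled to
    the unit interval to serve as the density filtration values."""
    diameter = pdist(points).max()
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth_ratio * diameter)
    density = np.exp(kde.fit(points).score_samples(points))
    return (density - density.min()) / (density.max() - density.min())
```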
Loss function.
In order to compute GPDs out of these 2-parameter filtrations and the corresponding persistence modules, one needs a set of intervals. As explained in the introduction, we then minimize the loss function of Equation (2),
where the full domain ℐ (resp. the sparse domain 𝒥) is comprised of (2,1)-intervals obtained from a grid computed by evenly sampling values of the interval coordinates within their corresponding filtration ranges. (We also tried random uniform sampling for the initial distribution of 𝒥, but this resulted in slower convergence and lower performances.) Note that while the (2,1)-intervals in 𝒥 are treated as parameters to optimize, the (2,1)-intervals in ℐ are fixed throughout the optimization process. Moreover, the formula that we provide in Theorem 9 (ii) can be readily implemented in any library that uses auto-differentiation, such as pytorch. In particular, we run stochastic gradient descent with momentum for a fixed number of epochs, using a learning rate with exponential decay, to achieve convergence, and obtain our sparse set 𝒥. See Figure 5 for a visualization of the loss decrease. Note how the Lipschitz stability proved in Theorem 14 translates into a smooth decrease with small oscillations.
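For illustration, here is a minimal PyTorch sketch of such an optimization loop. The function hausdorff_like_loss below is only a simplified, differentiable stand-in for the closed-form formula of Theorem 9 (ii) (a Hausdorff-type distance between the embedded interval sets under the sup norm), and all hyperparameter values are placeholders rather than the ones used in our experiments.

```python
import torch

def hausdorff_like_loss(full_vecs, sparse_vecs):
    """Stand-in for the closed-form loss of Theorem 9 (ii): a Hausdorff-type
    distance, under the sup norm, between the Euclidean embeddings of the
    full and sparse interval sets (one row per interval)."""
    d = (full_vecs[:, None, :] - sparse_vecs[None, :, :]).abs().amax(dim=2)
    return torch.max(d.min(dim=1).values.max(), d.min(dim=0).values.max())

def optimize_sparse_domain(full_vecs, init_sparse_vecs, n_epochs=1000, lr=0.1):
    """Gradient-descent sketch: the sparse interval vectors are the
    optimization variables; hyperparameters are placeholders."""
    sparse = init_sparse_vecs.clone().requires_grad_(True)
    opt = torch.optim.SGD([sparse], lr=lr, momentum=0.9)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.999)
    for _ in range(n_epochs):
        opt.zero_grad()
        loss = hausdorff_like_loss(full_vecs, sparse)
        loss.backward()
        opt.step()
        sched.step()
    return sparse.detach()
```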

Accuracy scores.
In order to quantify the running time improvements as well as the information loss (if any) when switching from the full domain ℐ to the optimized sparse one 𝒥, we computed the full and sparse GPDs in two homology dimensions with the zigzag persistence diagram implementation in dionysus [31], and then we trained random forest classifiers on these GPDs to predict the time series labels. (Recall that optimizing the sparse domain can be performed without computing GPDs, so the optimization process described in the previous paragraph is done once and for all, before computing the GPDs.) To this end, we first turned both the full and optimized GPDs into Euclidean vectors by binning every GPD (seen as a weighted point cloud) into a fixed histogram, and then convolving this histogram with a Gaussian kernel in order to smooth its values, with a procedure similar to the one described in [29, Section 3.2.1]. Both the histogram bins and the kernel bandwidths were found with cross-validation on the training set (see the provided code for hyperparameter values). Then, random forest classifiers were trained on these smoothed histograms to predict labels; in Table 1 we report the accuracy scores of these classifiers for the initial (before optimization) sparse domain, the optimized sparse domain 𝒥, and the full domain ℐ. Moreover, in Table 2, we report the running time needed to compute all GPDs using each of these three domains, as well as the improvement when switching from the full to the optimized domain. Note that our goal is not to improve on the state of the art for time series classification, but rather to assess whether optimizing the loss based on our upper bound is indeed beneficial for improving topology-based models. See Figure 2 for a schematic overview of our full pipeline.
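For illustration, here is a minimal sketch of this vectorization step, where the GPD is binned over two of its coordinates only (an illustrative simplification); in our experiments, the histogram bins and kernel bandwidths were selected by cross-validation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.ensemble import RandomForestClassifier

def vectorize_gpd(points, multiplicities, bins=20, sigma=1.0):
    """Bin a GPD, seen as a weighted point cloud over two of its coordinates,
    into a fixed 2-D histogram over [0,1]^2, then smooth it with a Gaussian
    kernel (bin count and bandwidth are placeholders)."""
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins,
                                range=[[0.0, 1.0], [0.0, 1.0]],
                                weights=multiplicities)
    return gaussian_filter(hist, sigma=sigma).ravel()

# Hypothetical usage: `gpds` is a list of (points, multiplicities) pairs.
# features = np.stack([vectorize_gpd(p, m) for p, m in gpds])
# RandomForestClassifier(n_estimators=100).fit(features, labels)
```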
Discussion on results.
As one can see from Table 1, there is either a clear improvement or a comparable performance in accuracy scores after optimizing the sparse domain. Indeed, by minimizing the loss, one forces the sparse domain 𝒥 to be as close as possible to the full domain ℐ, and thus to retain as much topological information as possible. Hence, either the full GPDs are more efficient than the initial sparse GPDs, in which case the optimized GPDs perform much better than the initial sparse ones, or the full GPDs are less efficient than the initial sparse GPDs, in which case the optimized GPDs have comparable or slightly worse performances than the initial sparse ones, thanks to the small sizes of their domains (except for the PC and IPD datasets, for which the optimized GPDs still perform noticeably better). (Recall that the accuracy score is only an indirect measure: while the full GPDs are always richer in topological information, we hypothesize that their scores might still be lower due to a large number of intervals on which the full GPDs vanish, a question that we leave for further study.) In all cases, we emphasize that the optimized GPDs provide the best solution: they maintain scores at levels that are either comparable to or better than the best of the initial sparse and full GPDs, while sparing users the need to choose interval domains by hand (as their domains are obtained automatically with gradient descent). Note in particular that whenever the optimized GPDs do not achieve the best scores, their performances remain of the same order as those of their competitors (see, e.g., DPC or SC), whereas when the optimized GPDs are the best, the performances of the other two competitors are sometimes far behind (see, e.g., GPA).
As for running times, we observe a slight increase over the running times associated with the initial sparse GPDs (except for the IPD dataset), which we hypothesize is due to the optimized intervals intersecting more with each other, and thus inducing more poset relations when computing Möbius inversions, and a strong improvement over the running times associated with the full GPDs, with ratios ranging between roughly 5 and 16 times faster. Our optimized GPDs thus achieve the best of both worlds: they are strikingly fast to compute while keeping high accuracy scores. Our code was run on a 2x Xeon SP Gold 5115 @ 2.40GHz, and is fully available at https://github.com/L-ebesgue/sparse_GPDs.
Table 1: Accuracy scores of random forest classifiers trained on GPDs computed over the initial sparse, optimized sparse, and full domains.
C | DPA | DPC | DPT | PPA | PPC | PPT | ECG | IPD |
Init. | 0.750 | 0.705 | 0.746 | 0.561 | 0.790 | 0.718 | 0.693 | 0.790 | 0.677 |
Optim. | 0.786 | 0.691 | 0.721 | 0.561 | 0.785 | 0.742 | 0.737 | 0.790 | 0.690 |
Full | 0.857 | 0.669 | 0.743 | 0.554 | 0.780 | 0.729 | 0.707 | 0.740 | 0.651 |
MI | P | SL | GP | GPA | GPM | GPO | PC | SC | |
Init. | 0.534 | 0.924 | 0.565 | 0.900 | 0.835 | 0.946 | 0.987 | 0.789 | 0.447 |
Optim. | 0.588 | 0.886 | 0.546 | 0.893 | 0.959 | 0.959 | 0.956 | 0.800 | 0.487 |
Full | 0.545 | 0.838 | 0.531 | 0.847 | 0.864 | 0.946 | 0.990 | 0.778 | 0.503 |
Table 2: Running times for computing all GPDs over the initial sparse, optimized sparse, and full domains, together with the improvement factor when switching from the full to the optimized domain.
C | DPA | DPC | DPT | PPA | PPC | PPT | ECG | IPD |
Init. | 350 | 1076 | 1902 | 1178 | 1182 | 1660 | 1147 | 556 | 1262 |
Optim. | 421 | 1172 | 2054 | 1265 | 1257 | 1749 | 1232 | 621 | 1328 |
Full | 2304 | 11672 | 20163 | 13007 | 12745 | 18427 | 12844 | 5098 | 19860 |
Improv. | 5.47x | 9.95x | 9.82x | 10.28x | 10.13x | 10.53x | 10.42x | 8.20x | 15.74x |
MI | P | SL | GP | GPA | GPM | GPO | PC | SC | |
Init. | 2989 | 792 | 3478 | 703 | 1576 | 1658 | 1573 | 1339 | 1240 |
Optim. | 3399 | 909 | 3919 | 883 | 1955 | 2061 | 1925 | 1550 | 1324 |
Full | 29074 | 6280 | 31285 | 5773 | 12913 | 13523 | 12849 | 10730 | 13089 |
Improv. | 8.55x | 6.91x | 7.98x | 6.54x | 6.60x | 6.56x | 6.67x | 6.92x | 9.88x |
6 Conclusion
Our sparsification method demonstrates the practicality of approximating full GPDs with sparse ones: while significantly reducing computational costs, the loss function we introduce also ensures that their discriminative power is not overly compromised. We thus believe that our work paves the way for efficiently deploying multi-parameter topological data analysis in large-scale applications that are currently out of reach for most multi-parameter topological invariants from the literature. In what follows, we outline several potential future directions.
Optimization for finer GPDs.
In our experiments, we optimized the GPDs over intervals with at most two minimal points and exactly one maximal point. Allowing more complex intervals, as well as dataset-dependent terms in the loss function, could improve performance, but finding suitable embeddings of such intervals into Euclidean space remains a challenge (cf. Remark 11) and would increase the computational cost (cf. Remark 10). It would also be interesting to quantify the discriminating power of GPDs with measures that are more direct than the score of a machine learning classifier (or at least to further study the dependencies between GPD sparsification and classifier scores), and to investigate heuristics (based on drops in the loss values) for deciding whether sparse GPDs are sufficiently good so that optimization can be stopped.
Experimental validation for other GRI-based descriptors.
While we only focused on sparsifying GPDs in this work, it would be interesting to measure the extent to which our interval domain optimization adapts to other descriptors based on the GRI from the TDA literature; of particular interest are the GPDs/signed barcodes coming from rank-exact decompositions, which have recently been shown to be stable [8] (see also Section 8.2 in [7], which discusses the influence of the choice of the interval domains on the resulting invariants), as well as GRIL, which focuses on specific intervals called worms [37].
References
- [1] Hideto Asashiba, Etienne Gauthier, and Enhao Liu. Interval replacements of persistence modules. arXiv preprint, 2024. arXiv:2403.08308.
- [2] Gorô Azumaya. Corrections and supplementaries to my paper concerning Krull-Remak-Schmidt’s theorem. Nagoya Mathematical Journal, 1:117–124, 1950.
- [3] Ulrich Bauer, Magnus Botnan, Steffen Oppermann, and Johan Steen. Cotorsion torsion triples and the representation theory of filtered hierarchical clustering. Advances in Mathematics, 369:107171, 2020.
- [4] Magnus Botnan and William Crawley-Boevey. Decomposition of persistence modules. Proceedings of the American Mathematical Society, 148(11):4581–4596, 2020.
- [5] Magnus Botnan and Michael Lesnick. Algebraic stability of zigzag persistence modules. Algebraic & Geometric topology, 18(6):3133–3204, 2018.
- [6] Magnus Botnan and Michael Lesnick. An introduction to multiparameter persistence. In Representations of Algebras and Related Structures, pages 77–150. CoRR, 2023.
- [7] Magnus Botnan, Steffen Oppermann, and Steve Oudot. Signed barcodes for multi-parameter persistence via rank decompositions and rank-exact resolutions. Foundations of Computational Mathematics, pages 1–60, 2024.
- [8] Magnus Botnan, Steffen Oppermann, Steve Oudot, and Luis Scoccola. On the bottleneck stability of rank decompositions of multi-parameter persistence modules. Advances in Mathematics, 451:109780, 2024.
- [9] Gunnar Carlsson and Afra Zomorodian. The theory of multidimensional persistence. Discrete & Computational Geometry, 42(1):71–93, 2009. doi:10.1007/S00454-009-9176-0.
- [10] Mathieu Carrière and Andrew Blumberg. Multiparameter persistence image for topological machine learning. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), pages 22432–22444. Curran Associates, Inc., 2020.
- [11] Mathieu Carriere, Seunghyun Kim, and Woojin Kim. Sparsification of the generalized persistence diagrams for scalability through gradient descent. arXiv preprint arXiv:2412.05900, 2024. doi:10.48550/arXiv.2412.05900.
- [12] Frédéric Chazal, David Cohen-Steiner, Marc Glisse, Leonidas J Guibas, and Steve Y Oudot. Proximity of persistence modules and their diagrams. In Proceedings of the twenty-fifth annual symposium on Computational geometry, pages 237–246, 2009. doi:10.1145/1542362.1542407.
- [13] Nate Clause, Woojin Kim, and Facundo Memoli. The generalized rank invariant: Möbius invertibility, discriminating power, and connection to other invariants. arXiv preprint, 2024. arXiv:2207.11591v5.
- [14] René Corbet, Ulderico Fugacci, Michael Kerber, Claudia Landi, and Bei Wang. A kernel for multi-parameter persistent homology. Computers & Graphics: X, 2:100005, 2019. doi:10.1016/J.CAGX.2019.100005.
- [15] Hoang-Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ratanamahatana, and Eamonn Keogh. The UCR time series archive. arXiv, 2018. arXiv:1810.07758.
- [16] Tamal Dey, Woojin Kim, and Facundo Mémoli. Computing generalized rank invariant for 2-parameter persistence modules via zigzag persistence and its applications. In 38th International Symposium on Computational Geometry (SoCG 2022). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.SoCG.2022.34.
- [17] Tamal Dey, Aman Timalsina, and Cheng Xin. Computing generalized ranks of persistence modules via unfolding to zigzag modules. arXiv preprint, 2024. doi:10.48550/arXiv.2403.08110.
- [18] Herbert Edelsbrunner and John L Harer. Computational topology: an introduction. American Mathematical Society, 2008.
- [19] Donghan Kim, Woojin Kim, and Wonjun Lee. Super-polynomial growth of the generalized persistence diagram, 2024. doi:10.48550/arXiv.2412.04889.
- [20] Woojin Kim and Facundo Mémoli. Generalized persistence diagrams for persistence modules over posets. Journal of Applied and Computational Topology, 5(4):533–581, 2021. doi:10.1007/S41468-021-00075-1.
- [21] Woojin Kim and Facundo Mémoli. Spatiotemporal persistent homology for dynamic metric spaces. Discrete & Computational Geometry, 66:831–875, 2021. doi:10.1007/S00454-019-00168-W.
- [22] Woojin Kim and Facundo Mémoli. Persistence over posets. Notices of the American Mathematical Society, 70(08), 2023.
- [23] Woojin Kim and Facundo Mémoli. Extracting persistent clusters in dynamic data via möbius inversion. Discrete & Computational Geometry, 71(4):1276–1342, 2024. doi:10.1007/S00454-023-00590-1.
- [24] Woojin Kim and Samantha Moore. Bigraded Betti numbers and generalized persistence diagrams. Journal of Applied and Computational Topology, 2024. doi:10.1007/s41468-024-00180-x.
- [25] Michael Lesnick. The theory of the interleaving distance on multidimensional persistence modules. Foundations of Computational Mathematics, 15(3):613–650, 2015. doi:10.1007/S10208-015-9255-Y.
- [26] David Loiseaux. Time series classification. Accessed: 2025-03-24. URL: https://davidlapous.github.io/multipers/notebooks/time_series_classification.html.
- [27] David Loiseaux, Mathieu Carrière, and Andrew Blumberg. A framework for fast and stable representations of multiparameter persistent homology decompositions. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023.
- [28] David Loiseaux and Hannah Schreiber. multipers: Multiparameter persistence for machine learning. Journal of Open Source Software, 9(103):6773, 2024. doi:10.21105/JOSS.06773.
- [29] David Loiseaux, Luis Scoccola, Mathieu Carrière, Magnus Botnan, and Steve Oudot. Stable vectorization of multiparameter persistent homology using signed barcodes as measures. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023.
- [30] Nikola Milosavljević, Dmitriy Morozov, and Primoz Skraba. Zigzag persistent homology in matrix multiplication time. In Proceedings of the twenty-seventh Annual Symposium on Computational Geometry, pages 216–225, 2011. doi:10.1145/1998196.1998229.
- [31] Dmitriy Morozov. Dionysus 2. Accessed: 2025-03-24. URL: https://mrzv.org/software/dionysus2/.
- [32] Soham Mukherjee, Shreyas Samaga, Cheng Xin, Steve Oudot, and Tamal Dey. D-gril: End-to-end topological learning with 2-parameter persistence. arXiv, 2024. arXiv:2406.07100.
- [33] Amit Patel. Generalized persistence diagrams. Journal of Applied and Computational Topology, 1(3):397–419, 2018. doi:10.1007/S41468-018-0012-6.
- [34] Luis Scoccola, Siddharth Setlur, David Loiseaux, Mathieu Carrière, and Steve Oudot. Differentiability and optimization of multiparameter persistent homology. In 41st International Conference on Machine Learning (ICML 2024), volume 235, pages 43986–44011. PMLR, 2024.
- [35] Oliver Vipond. Multiparameter persistence landscapes. Journal of Machine Learning Research, 21(61):1–38, 2020. URL: https://jmlr.org/papers/v21/19-054.html.
- [36] Lu Xian, Henry Adams, Chad M Topaz, and Lori Ziegelmeier. Capturing dynamics of time-varying data via topology. Foundations of Data Science, 4(1):1–36, 2022.
- [37] Cheng Xin, Soham Mukherjee, Shreyas Samaga, and Tamal Dey. GRIL: A 2-parameter persistence based vectorization for machine learning. In 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning. OpenReviews.net, 2023.