
Color Distance Oracles and Snippets: Separation Between Exact and Approximate Solutions

Noam Horowicz (Bar-Ilan University, Ramat Gan, Israel) and Tsvi Kopelowitz (Bar-Ilan University, Ramat Gan, Israel)
Abstract

In the snippets problem, the goal is to preprocess a text $T$ so that given two pattern queries, $P_1$ and $P_2$, one can quickly locate the occurrences of the two patterns in $T$ that are closest to each other, or report the distance between these occurrences. Kopelowitz and Krauthgamer [CPM 2016] showed upper bound tradeoffs and conditional lower bound tradeoffs for the snippets problem by utilizing connections between the snippets problem and the problem of constructing a color distance oracle (CDO), which is a data structure that preprocesses a set of points with associated colors so that given two colors $c$ and $c'$ one can quickly find the (distance between the) closest pair of points where one has color $c$ and the other has color $c'$. However, the existing upper bound and lower bound curves are not tight.

Inspired by recent advances by Kopelowitz and Vassilevska-Williams [ICALP 2020] regarding tradeoff curves for Set-disjointness data structures, in this paper we introduce new conditionally optimal algorithms for a $(1+\varepsilon)$ approximation version of the snippets problem and a $(1+\varepsilon)$ approximation version of the CDO problem, by applying fast matrix multiplication. For example, for CDO on $n$ points in an array, if the preprocessing time is $\tilde O(n^a)$ and the query time is $\tilde O(n^b)$ then, assuming that $\omega=2$ (where $\omega$ is the exponent of $n$ in the runtime of the fastest matrix multiplication algorithm on two square matrices of size $n\times n$), we show that approximate CDO can be solved with the following tradeoff:

$$\begin{cases} a+2b=2 & \text{if } 0\le b\le \frac{1}{3}\\ 2a+b=3 & \text{if } \frac{1}{3}\le b\le 1. \end{cases}$$

Moreover, we prove that for exact CDO on points in an array, the algorithm of Kopelowitz and Krauthgamer [CPM 2016], which obtains a tradeoff of $a+b=2$, is essentially optimal assuming that the strong all-pairs shortest paths hypothesis holds for randomized algorithms. Thus, we demonstrate that the exact version of CDO is strictly harder than the approximate version. Furthermore, this separation carries over to the snippets problem.

Keywords and phrases:
data structures, fast matrix multiplication, fine-grained complexity, pattern matching, distance oracles
Copyright and License:
© Noam Horowicz and Tsvi Kopelowitz; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms
Related Version:
Full Version: https://arxiv.org/abs/2507.04578 [12]
Funding:
Supported by a BSF grant 2018364, and by an ERC grant MPM under the EU’s Horizon 2020 Research and Innovation Programme (grant no. 683064).
Acknowledgements:
The authors thank an anonymous reviewer who suggested some ideas for tightening the lower bounds proved in Theorems 8 and 9.
Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz Herman

1 Introduction

In the snippets problem, introduced by Kopelowitz and Krauthgamer [17], the goal is to preprocess a text $T$ so that given two pattern queries, $P_1$ and $P_2$, one can quickly locate the occurrences of the two patterns in $T$ that are closest to each other, or report the distance between these occurrences. The snippets problem is motivated by many common text indexing applications, such as searching a corpus of documents for two query keywords. Often, the relevance of a document to the query is measured by the proximity of the two keywords within the document. The term snippet derives from search engines typically accompanying each result with a snippet of text from the document in which the two keywords appear close to each other.

Kopelowitz and Krauthgamer [17] designed a tradeoff algorithm for the snippets problem based on a connection to a problem on colored points in metric spaces, which we define next.

Colored points and color distances

Let $\mathbf{M}$ be a metric space with distance function $d(\cdot,\cdot)$. Let $S\subseteq\mathbf{M}$ be a set of points where $|S|=n$. Let $\mathcal{C}$ be a set of colors. For sake of convenience we assume that $\mathcal{C}=[|\mathcal{C}|]$. Each point $p\in S$ has an associated color $c_p\in\mathcal{C}$. For a color $c\in\mathcal{C}$, let $P_S(c)=\{p\in S\mid c_p=c\}$. When $S$ is clear from context we abuse notation and denote $P(c)=P_S(c)$.

For sets of points $A,B\subseteq\mathbf{M}$ and a point $p\in\mathbf{M}$, let $d(p,A)=\min_{p'\in A}\{d(p,p')\}$ and let $d(A,B)=\min_{p\in A}\{d(p,B)\}=\min_{p\in A,\hat p\in B}\{d(p,\hat p)\}$. We extend the definition of $d$ to inputs of colors as follows. For colors $c,c'\in\mathcal{C}$ and a point $p\in\mathbf{M}$, let $d(p,c)=d(p,P(c))$ and let $\delta_{c,c'}=d(c,c')=d(P(c),P(c'))$. We say that $\delta_{c,c'}$ is the color distance between $c$ and $c'$. A natural problem is to construct a color distance oracle (CDO), which is defined as follows.

Problem 1 (The Color Distance Oracle (CDO) problem [17]).

Given a set $S\subseteq\mathbf{M}$ of $n$ colored points with colors from $\mathcal{C}$, preprocess $S$ to support the following query: given $c,c'\in\mathcal{C}$, return $\delta_{c,c'}$.

In this paper we consider metrics defined by locations in an array of size $s$, and so the metric is the set $[s]$ (throughout this paper, for a positive integer $k$ we denote $[k]=\{1,2,\ldots,k\}$), and the distance function of two points $p$ and $q$ is $d(p,q)=|p-q|$.
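To make the definitions concrete, here is a minimal brute-force reference implementation of a CDO for the array metric (the function name and input format are ours, for illustration only; it sits at the extreme point $a=2$, $b=0$ of the tradeoff curve):

```python
from itertools import product

def build_cdo(points):
    """points: list of (position, color) pairs in an array metric.
    Precomputes the exact color distance for every ordered color pair."""
    by_color = {}
    for pos, color in points:
        by_color.setdefault(color, []).append(pos)
    oracle = {}
    for c1, c2 in product(by_color, repeat=2):
        # delta_{c1,c2} = min over all point pairs of the two colors
        oracle[(c1, c2)] = min(abs(p - q)
                               for p in by_color[c1]
                               for q in by_color[c2])
    return oracle

oracle = build_cdo([(1, 'a'), (4, 'b'), (9, 'a'), (10, 'c')])
oracle[('a', 'b')]  # → 3 (realized by the points at positions 1 and 4)
```

After the quadratic preprocessing, each query is a single dictionary lookup; the data structures studied in the paper exist precisely to avoid this all-pairs work.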

Multi-colored points and color hierarchies

A natural generalization of a colored set is to allow each $p\in S$ to be associated with a nonempty set of colors $c(p)\subseteq\mathcal{C}$, and in such a case $S$ is said to be multi-colored. This version of the CDO problem is called the multi-color distance oracle (MCDO) problem. In general, representing $c(p)$ costs $\Theta(|c(p)|)$ space by explicitly listing all colors in $c(p)$. Thus, the input size is $n=\sum_{p\in S}|c(p)|$.

Nevertheless, there are interesting cases in which the lists of colors for each point are not required to be given explicitly. One such example, which we focus on, is when for every two colors $c,c'\in\mathcal{C}$, either one of the sets $P(c)$ and $P(c')$ contains the other, or the two sets are disjoint. In such a case, we say that $\mathcal{C}$ has a color hierarchy with respect to $S$ (a formal terminology is that $\{P(c)\}_{c\in\mathcal{C}}$ is a laminar family), represented by a rooted forest (i.e., each tree has a root) $T_S$ of size $O(|\mathcal{C}|)$. Each color $c$ is associated with a vertex $u_c$ in $T_S$, such that the descendants of $u_c$ are exactly all the vertices $u_{c'}$ whose color $c'$ satisfies $P(c')\subseteq P(c)$. We convert the forest $T_S$ to a rooted tree by adding a dummy root vertex and making the dummy root the parent of all of the roots of the trees in the forest.

With the aid of $T_S$, a multi-colored set $S$ that admits a color hierarchy can be represented using only $O(n+|\mathcal{C}|)$ machine words, because it suffices to store $T_S$ and associate with each point $p$ just one color $c$ (the color with the lowest corresponding vertex in $T_S$); the other colors of $p$ are implicit from $T_S$ (the colors on the path from $u_c$ to the root of $T_S$). Notice that $|\mathcal{C}|>n$ implies that there are at least two colors with the same point set. Thus, we make the simplifying assumption that all colors have different point sets (if we happen to have several colors with the same point set, we ignore all but one of them during preprocessing, and use a straightforward mapping from the original coloring to the reduced coloring at query time), and so the input size is $O(n)$.
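As a small illustration of this representation (names are ours), each point stores only its lowest color, and the rest of its color set is recovered by walking up $T_S$:

```python
def point_colors(leaf_color, parent):
    """Recover the full (implicit) color set of a point.

    leaf_color: the single stored color of the point (lowest vertex in T_S).
    parent: dict mapping each color to its parent color in T_S
            (children of the dummy root map to None).
    """
    colors, c = [], leaf_color
    while c is not None:
        colors.append(c)
        c = parent[c]
    return colors

# A laminar family: P('all') contains both P('vowel') and P('consonant')
parent = {'all': None, 'vowel': 'all', 'consonant': 'all'}
point_colors('vowel', parent)  # → ['vowel', 'all']
```

Each recovery walk costs time proportional to the depth of $u_c$, which is why only one explicit color per point is needed.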

Thus, a natural extension of Problem 1 is the following.

Problem 2 (The Multi-Color Distance Oracle with a Color Hierarchy (MCDOCH) problem [17]).

Given a set $S\subseteq\mathbf{M}$ of $n$ multi-colored points with colors from $\mathcal{C}$ which form a color hierarchy with respect to $S$, preprocess $S$ to support the following query: given $c,c'\in\mathcal{C}$, return $\delta_{c,c'}$.

Let $a$ and $b$ be real numbers such that the preprocessing and query times of an MCDOCH algorithm are $\tilde O(n^a)$ and $\tilde O(n^b)$, respectively (throughout this paper we use the notation $\tilde O(\cdot)$ to suppress sub-polynomial factors). A tradeoff algorithm for the MCDOCH problem was presented in [17], where given a parameter $1\le\tau\le n$, the query time is $\tilde O(n^b)=\tilde O(\tau)$ and the preprocessing time is $\tilde O(n^a)=\tilde O(n^2/\tau)=\tilde O(n^{2-b})$. When ignoring sub-polynomial factors, this upper bound tradeoff is given by $a+b=2$ where $0\le b\le1$ and $1\le a\le2$. Notice that the same tradeoff applies to the CDO problem.

In addition to the tradeoff algorithms, [17] proved a 3SUM based conditional lower bound (CLB) tradeoff of $a+2b\ge2$ for the CDO problem (and hence for the MCDOCH problem), even when $\mathbf{M}$ is defined by locations in an array (this CLB tradeoff is straightforward to derive from the statement of Theorem 4 in [17]). Their CLB holds also for approximate versions of the CDO problem (see Theorem 8 in [17]), where for a fixed $\alpha>1$, the answer to a color distance query between $c$ and $c'$ is required to be between $\delta_{c,c'}$ and $\alpha\delta_{c,c'}$.

Solving the Snippets problem

By designing a reduction from the snippets problem to the MCDOCH problem, [17] were able to design an algorithm for the snippets problem with essentially the same tradeoff: for a chosen parameter $1\le\tau\le|T|$ their snippets algorithm uses $\tilde O(|T|^2/\tau)$ preprocessing time and answers queries in $\tilde O(|P_1|+|P_2|+\tau)$ time. The main idea in the reduction is to construct the suffix tree of $T$, assign a color to each vertex in the suffix tree, and for each leaf $\ell$ in the suffix tree, the corresponding location in $T$ is assigned all of the colors on the path from $\ell$ to the suffix tree root. It is straightforward to see that the colors define a hierarchy and that $T_S$ is exactly the suffix tree.

A complexity gap

The upper bound curve of $a+b=2$ and the CLB tradeoff of $a+2b\ge2$ form a complexity gap for the snippets and (approximate) CDO problems, which was left open by [17]. Moreover, based on the combinatorial Boolean Matrix Multiplication (BMM) hypothesis, [17, Theorem 8] implies that any algorithm that beats the $a+b=2$ curve, even for approximate versions, must use non-combinatorial techniques, such as fast matrix multiplication (FMM) algorithms.

We remark that the CLB tradeoff given in [17] is based on a reduction from a SetDisjointness problem to the (approximate) CDO problem, and at the time [17] was published, the SetDisjointness problem was known to exhibit the same upper bound tradeoff and 3SUM based CLB tradeoff [18]. However, recently Kopelowitz and Vassilevska-Williams [19] closed this complexity gap by, among other ideas, using FMM techniques. Thus, a natural question to ask is whether FMM can assist in obtaining improved algorithms for CDO problems.

Our results

We close the complexity gap for the (1+ε) approximation versions of CDO and MCDOCH, where 𝐌 is defined by locations in an array. Formally, the problems are defined as follows.

Problem 3 (The (1+ε)-Approximate Color Distance Oracle (ACDO) problem on an array).

Let $\mathbf{M}$ be a metric defined by locations in an array of size $n$. Given a set $S\subseteq\mathbf{M}$ of $n$ colored points with colors from $\mathcal{C}$ and a fixed real $0<\varepsilon\le1$, preprocess $S$ to support the following query: given two colors $c,c'\in\mathcal{C}$, return a value $\hat\delta$ such that $\delta_{c,c'}\le\hat\delta\le(1+\varepsilon)\delta_{c,c'}$.

Problem 4 (The (1+ε)-Approximate Multi-Color Distance Oracle problem with a Color Hierarchy (AMCDOCH) on an array).

Let $\mathbf{M}$ be a metric defined by locations in an array of size $\Theta(n)$. Given a set $S\subseteq\mathbf{M}$ of $n$ multi-colored points with colors from $\mathcal{C}$ which form a color hierarchy with respect to $S$, and a fixed real $0<\varepsilon\le1$, preprocess $S$ to support the following query: given two colors $c,c'\in\mathcal{C}$, return a value $\hat\delta$ such that $\delta_{c,c'}\le\hat\delta\le(1+\varepsilon)\delta_{c,c'}$.

Our first main result is a new tradeoff algorithm for ACDO, which is stated by the following theorem. Notice that the time complexities presented depend on $\omega\ge2$, which is the exponent of $n$ in the runtime of the fastest FMM algorithm on two square matrices of size $n\times n$. The currently best upper bound on $\omega$ is $\omega<2.371339$, given by [2]. However, for clarity, we choose to express our tradeoffs in terms of $\omega$. We remark that ACDO is a special case of AMCDOCH, and so Theorem 6 implies Theorem 5. However, Theorem 5 is used in the proof of Theorem 6, so we explicitly state both theorems.

Theorem 5.

For any real $0\le b\le1$, there exists an ACDO algorithm with preprocessing time $\tilde O(n^a)$ and query time $\tilde O(n^b)$ where

$$\begin{cases} a+\frac{2}{\omega-1}\,b=2 & \text{if } 0\le b\le \frac{\omega-1}{\omega+1}\\ \frac{2}{\omega-1}\,a+b=\frac{\omega+1}{\omega-1} & \text{if } \frac{\omega-1}{\omega+1}\le b\le 1. \end{cases}$$
Theorem 6.

For any real $0\le b\le1$, there exists an AMCDOCH algorithm with preprocessing time $\tilde O(n^a)$ and query time $\tilde O(n^b)$ where

$$\begin{cases} a+\frac{2}{\omega-1}\,b=2 & \text{if } 0\le b\le \frac{\omega-1}{\omega+1}\\ \frac{2}{\omega-1}\,a+b=\frac{\omega+1}{\omega-1} & \text{if } \frac{\omega-1}{\omega+1}\le b\le 1. \end{cases}$$

Combining Theorem 6 with the reduction of the snippets problem to the MCDOCH problem given in [17], we obtain the following tradeoff for a $(1+\varepsilon)$ approximation version of the snippets problem.

Theorem 7.

For any fixed $0<\varepsilon\le1$ and $1\le a\le2$, there exists an algorithm that preprocesses a text $T$ in $\tilde O(|T|^a)$ time such that given two pattern strings $P_1$ and $P_2$ where the distance between the closest occurrences of the two patterns in $T$ is $\delta$, the algorithm returns a value $\hat\delta$ such that $\delta\le\hat\delta\le(1+\varepsilon)\delta$ in $\tilde O(|P_1|+|P_2|+|T|^b)$ time, where

$$\begin{cases} a+\frac{2}{\omega-1}\,b=2 & \text{if } 0\le b\le \frac{\omega-1}{\omega+1}\\ \frac{2}{\omega-1}\,a+b=\frac{\omega+1}{\omega-1} & \text{if } \frac{\omega-1}{\omega+1}\le b\le 1. \end{cases}$$

We remark that it is straightforward to adapt our ACDO and AMCDOCH algorithms (and hence our new approximate snippets algorithm) to return the two points (one of each color) that define the distance being returned (or the locations of the two patterns in the approximate snippets problem). For details, see the full version [12].

A complexity separation between exact and approximate solutions

We remark that by combining the reduction given in [17, Theorem 8] from SetDisjointness to the CDO problem with the CLBs for SetDisjointness given in [19, Theorem 6] for the case of $\omega=2$ (which some researchers believe should be obtainable), Theorems 5, 6 and 7 are all conditionally optimal, up to subpolynomial factors. (For Theorems 5 and 6, the optimality follows directly from applying [17, Theorem 8] and [19, Theorem 6]. For Theorem 7, the optimality follows since an approximate solution for the snippets problem can be used to solve ACDO on an array as follows: treat each color as a character, so that the array becomes a string. Preprocess the string using the snippets algorithm, and given an ACDO query on two colors $c,\hat c$, query the snippets algorithm with patterns $P=c$ and $\hat P=\hat c$. Thus, the ACDO lower bound applies to approximate snippets.) A natural question to ask is whether one can design exact algorithms with the same tradeoff bounds. We provide evidence that this is not possible, based on a strong version of the popular all-pairs shortest paths (APSP) conjecture [7], thereby demonstrating a separation between CDO and ACDO.

The Strong-APSP hypothesis, introduced by Chan, Vassilevska-Williams and Xu [7], states that for a graph with $\hat n$ vertices (we use $\hat n$ to differentiate from $n=|S|$ in the various CDO problems), even if all of the edge weights are in $[\hat n]$ (actually, in [7] the weights are bounded by $\hat n^{3-\omega}$, but since $\omega\ge2$ we can bound the weights by $\hat n$), APSP still does not have a truly subcubic algorithm. A closely related problem to APSP is the $(\min,+)$-matrix product problem, where the input is two $\hat n\times\hat n$ matrices $A=\{a_{i,j}\}$ and $B=\{b_{i,j}\}$, and the output is the matrix $D=\{d_{i,j}\}$ where $d_{i,j}=\min_{1\le k\le\hat n}(a_{i,k}+b_{k,j})$. Shoshan and Zwick [24] showed that solving APSP with weights in $[M]$ is equivalent (in the sense of having the same runtime) to solving $(\min,+)$-matrix product with entries in $[O(M)]$. Thus, the Strong-APSP hypothesis implies that there is no truly subcubic time $(\min,+)$-matrix product algorithm even when all of the entries are in $[\hat n]$.
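For reference, the $(\min,+)$-product itself is the following cubic computation (a naive sketch of ours; the hypothesis asserts that no algorithm is truly subcubic even for entries in $[\hat n]$):

```python
def min_plus_product(A, B):
    """Naive O(n^3) (min,+)-product of square matrices:
    D[i][j] = min over k of A[i][k] + B[k][j]."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]

A = [[0, 3], [2, 0]]
B = [[0, 5], [1, 0]]
min_plus_product(A, B)  # → [[0, 3], [1, 0]]
```

Replacing $(\min,+)$ by $(+,\times)$ gives ordinary matrix multiplication, which is exactly what the Shoshan-Zwick equivalence exploits for bounded weights.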

By reducing $(\min,+)$-matrix product with entries in $[\hat n]$ to MCDO, we are able to prove the following CLB in Section 6, which matches the upper bound given in [17].

Theorem 8.

Assuming the Strong-APSP conjecture, any algorithm for MCDO on $n$ points in an array of size $O(n)$ with $\tilde O(n^a)$ preprocessing time and $\tilde O(n^b)$ query time must obey $a+b\ge2$.

Moreover, we are able to leverage the ideas used in the proof of Theorem 8 to reduce $(\min,+)$-matrix product with entries in $[\hat n]$ to CDO via a randomized reduction, thereby obtaining a CLB under the assumption that the Strong-APSP hypothesis holds even for randomized algorithms (either in expectation or with high probability).

Theorem 9.

Assuming the Strong-APSP hypothesis for randomized algorithms, any algorithm for CDO on $n$ points in an array of size $O(n)$ with $\tilde O(n^a)$ preprocessing time and $\tilde O(n^b)$ query time must obey $a+b\ge2$.

The proof of Theorem 9 is based on the proof of Theorem 8, combined with probabilistic techniques in order to create instances that are not multi-colored. Due to space considerations, the proof is given in the full version [12].

Theorem 9 combined with Theorem 5 implies that the exact version of CDO is strictly harder than the approximate version. Moreover, we remark that the CLBs for CDO apply also to the snippets problem, since the special case of 𝐌 defined by locations in an array is captured by the snippets problem when P1 and P2 are each a single character. Thus, the CLB of Theorem 9 applies also to the snippets problem, and so the exact version of the snippets problem is also strictly harder than the approximate version.

Additional Related Work

A problem closely related to CDO is the (approximate) vertex-labeled distance oracles for graphs (VLDO) problem, where the goal is to preprocess a colored graph $G$ with $n$ vertices, so that given a vertex query $v$ and a color $c$, one can quickly return (an approximation of) $d(v,c)$. Hermelin, Levy, Weimann, and Yuster [11] introduced the VLDO problem and designed a solution using $O(kn^{1+1/k})$ expected space, with stretch factor $4k-5$ and $O(k)$ query time. They also showed how to reduce the space usage to $O(kN^{1/k})$, but with a stretch factor exponential in $k$. Chechik [8] later showed how to lower the stretch back to $4k-5$. For planar graphs, Evald, Fredslund-Hansen, and Wulff-Nilsen [10] designed a near-optimal exact tradeoff using $n^{1+o(1)}$ space with $\tilde O(1)$ query time, or $\tilde O(n)$ space with $n^{o(1)}$ query time. Li, Ma, and Ning [21] designed a $(1+\varepsilon)$ approximation for planar graphs.

2 Preliminaries and Algorithmic Overview

Let $[n]=\{1,2,\ldots,n\}$. Let $S$ be a set of $n$ integers. Let $\max(S)$ be the largest integer in $S$ and let $\min(S)$ be the smallest integer in $S$. For integers $i,j\in[n]$ where $i<j$, let $S[i,j]=\{p\in S : i\le p\le j\}$.

Our algorithms make use of the following Nearest Neighbor Search (NNS) data structures.

Problem 10 (The Nearest Neighbour Search problem [1, 4, 16, 25]).

Given a set $S\subseteq\mathbf{M}$ of size $n$, preprocess $S$ to support the following query: given an integer $p$, return $\operatorname{argmin}_{p'\in S}\{d(p,p')\}$.

Problem 11 (The Range Nearest Neighbour Search (RNNS) problem [5, 6, 9, 14, 15, 20, 22, 23]).

Given a set $S$ of $n$ integers, preprocess $S$ to support the following query: given $i,j\in[n]$ and an integer $p$, return $\operatorname{argmin}_{p'\in S[i,j]}\{d(p,p')\}$.
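For intuition, over integer points a sorted array with binary search already implements the NNS interface with $O(\log n)$ queries; the following sketch (ours) is a stand-in for the $O(\log\log u)$ van Emde Boas structure that is actually plugged in later in the paper:

```python
import bisect

class SortedNNS:
    """Nearest-neighbour search over a static set of integers.

    A binary-search stand-in for the van Emde Boas structure: the
    nearest neighbour of p is either the predecessor or the successor
    of p in sorted order."""

    def __init__(self, points):
        self.pts = sorted(points)

    def query(self, p):
        i = bisect.bisect_left(self.pts, p)
        candidates = self.pts[max(0, i - 1):i + 1]
        return min(candidates, key=lambda q: abs(q - p))

nns = SortedNNS([2, 7, 11])
nns.query(8)  # → 7
```

The RNNS variant additionally restricts the search to a subarray, which is commonly supported by range-search structures built over the same sorted order.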

2.1 Algorithmic Overview

Generic (Approximate) CDO Algorithm

Our algorithm for ACDO is based on a generic algorithm whose structure follows the structure of the algorithm in [17] for the exact CDO problem. The generic algorithm is a classic heavy-light algorithm; our algorithms are a new implementation of the generic algorithm.

For an integer parameter $0\le\tau\le n$, a color $c\in\mathcal{C}$ is said to be heavy if $|P(c)|\ge\tau$ and light otherwise. Let $\mathcal{H}=\{h_1,h_2,\ldots,h_{|\mathcal{H}|}\}$ be the set of heavy colors, and let $\mathcal{L}$ be the set of light colors. Notice that $|\mathcal{H}|\le\frac{n}{\tau}$. In the preprocessing phase, for each color $c\in\mathcal{C}$, the algorithm stores $P(c)$ in an NNS data structure, denoted by $NNS_c$. In addition, the algorithm pre-computes a matrix $E=\{e_{i,j}\}$ of size $|\mathcal{H}|\times|\mathcal{H}|$, such that $\delta_{h_i,h_j}\le e_{i,j}\le(1+\varepsilon)\delta_{h_i,h_j}$. An ACDO query on $c,c'\in\mathcal{C}$ is processed as follows. If both $c,c'\in\mathcal{H}$, then, without loss of generality, $c=h_i$ and $c'=h_j$. In such a case the algorithm returns $e_{i,j}$, which is a $(1+\varepsilon)$ approximation of $\delta_{h_i,h_j}$. Otherwise, without loss of generality, $c'$ is a light color, and the algorithm returns $\min_{\hat p\in P(c')}\{d(\hat p,c)\}=\delta_{c,c'}$, by performing $|P(c')|\le\tau$ NNS queries on $NNS_c$, one for each point in $P(c')$.
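The query dispatch of the generic algorithm can be sketched as follows (a simplified illustration with names of ours; the inner linear scan stands in for the $NNS_c$ queries of the real algorithm):

```python
def cdo_query(c1, c2, heavy_index, E, points_of):
    """Heavy-light query dispatch of the generic (approximate) CDO algorithm.

    heavy_index: maps each heavy color to its row/column in E.
    E: precomputed (approximate) distances between heavy color pairs.
    points_of: color -> list of its points (light colors have < tau points).
    """
    if c1 in heavy_index and c2 in heavy_index:
        # Both heavy: a single table lookup, O(1) time.
        return E[heavy_index[c1]][heavy_index[c2]]
    if c1 in heavy_index:
        c1, c2 = c2, c1          # make c1 the light color
    # c1 is light: one nearest-neighbour probe per point of P(c1);
    # a linear scan over P(c2) stands in for an NNS query here.
    return min(abs(p - q) for p in points_of[c1] for q in points_of[c2])

heavy_index = {'x': 0, 'y': 1}
E = [[0, 5], [5, 0]]
points_of = {'x': [1, 6], 'y': [3], 'z': [10]}
cdo_query('x', 'y', heavy_index, E, points_of)  # → 5 (table lookup)
cdo_query('z', 'x', heavy_index, E, points_of)  # → 4 (light-color probes)
```

With a real NNS structure in place of the scan, the light-color branch costs at most $\tau$ NNS queries, exactly as in the analysis below.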

Time complexity

The preprocessing and query time costs of the generic algorithm depend on the implementation of the NNS data structure, and on the time used for computing the matrix $E$. Thus, we express the time complexity of the generic algorithm as a function of $T_{p,NNS}(t)$, $T_{q,NNS}(t)$, and $T_E(\mathcal{H})$, which are the preprocessing time and query time of the NNS data structure on a set of size $t$, and the time used to compute $E$, respectively. In the preprocessing phase, the algorithm computes $E$ and creates an NNS data structure for each color in $\mathcal{C}$. Thus, the preprocessing time cost is

$$O\Big(T_E(\mathcal{H})+\sum_{c\in\mathcal{C}}T_{p,NNS}(|P(c)|)\Big).$$

In the query phase, the time cost is dominated by executing at most $\tau$ NNS queries on a set of size at most $n$, and so the time cost is $O(\tau\cdot T_{q,NNS}(n))$.

Constructing $E$ and solving ACDO on an array

In Section 3 we describe an efficient algorithm for constructing $E$ when $\mathbf{M}$ is the metric of integers. Our algorithm is structured as follows. For pairs of heavy colors whose color distance is smaller than some threshold, the algorithm computes their distance via a straightforward brute-force computation. The more challenging part is dealing with pairs of heavy colors with larger color distance. Our approach is to compute a series of Boolean matrices with specially designed properties, so that after the computation of the matrices we are able to scan the matrices and deduce a $(1+\varepsilon)$ approximation for every pair of colors with a rather large color distance. We remark that to compute the matrices used for the approximations, our algorithm makes use of FMM, which, as noted in Section 1, is required for obtaining improvements in runtime compared to the exact solution of [17].

To complete the proof of Theorem 5, in Section 4 we combine the runtime analysis of the generic algorithm with the runtime of constructing $E$ when $\mathbf{M}$ is defined by locations in an array, plugging in known NNS data structures.

Solving AMCDOCH on an array

In Section 5 we prove Theorem 6. Our algorithm is a reduction from AMCDOCH on an array to ACDO on an array. The structure of our reduction somewhat resembles the structure of the MCDOCH algorithm of [17]. In particular, in the algorithm of [17] they partition the points into sets of size τ, preprocess each set into an RNNS data structure, and for each pair of sets apply τ RNNS queries to compute the distance between the sets. To obtain a faster runtime, we compute the distance between every pair of sets by recoloring the input points using the sets as a new color scheme and then applying our ACDO algorithm with the new color set. One important property of our ACDO algorithm is that when a query takes place between two heavy colors, the query time is constant since the algorithm just looks up a single value in E. It turns out that essentially all of the colors in the new color set are heavy, and so after the preprocessing phase of our ACDO algorithm, we are able to compute the distance between a pair of sets in O(1) time. This property turns out to be helpful for our AMCDOCH algorithm, which is described in Section 5.

3 Computing $E$ when $\mathbf{M}$ is the integers

In this section, we show how to compute the matrix $E$ when our metric $\mathbf{M}$ is one dimensional, and for $p,p'\in\mathbf{M}$ we have $d(p,p')=|p-p'|$. We make the simplifying assumption that $\min(S)=1$ since, otherwise, one could shift $S$ by subtracting $\min(S)-1$ from each point.

Let $W$ be a positive integer parameter that will be determined later. The construction algorithm for $E$ has two conceptual parts. The first part deals with pairs of heavy colors with distance at most $\frac{1+2\varepsilon}{\varepsilon}W$, by computing the distances exactly using a brute-force method. The second part deals with pairs of heavy colors with distance greater than $\frac{1+2\varepsilon}{\varepsilon}W$. In this case, the algorithm computes an approximation of the distances by utilizing a series of matrices defined as follows.

Let $\ell_0=\lfloor\log_{1+\varepsilon}(\frac{1}{\varepsilon})\rfloor+1$ and let $\ell_{\max}=\lceil\log_{1+\varepsilon}(\frac{\max(S)}{W})\rceil$. Let $A=\{a_{i,j}\}$ be a Boolean $|\mathcal{H}|\times\lceil\frac{\max(S)}{W}\rceil$ matrix where

$$a_{i,j}=\begin{cases}1 & P(h_i)\cap S[(j-1)W+1,\,jW]\neq\emptyset\\0 & \text{otherwise.}\end{cases}$$

Thus, matrix $A$ indicates, for each heavy color and each block of size $W$ in $S$ which starts right after an integer multiple of $W$, whether the heavy color appears in the block or not.

For $\ell_0\le\ell\le\ell_{\max}$, let $B(\ell)=\{b^{(\ell)}_{i,j}\}$ be a Boolean $\lceil\frac{\max(S)}{W}\rceil\times|\mathcal{H}|$ matrix where

$$b^{(\ell)}_{i,j}=\begin{cases}1 & P(h_j)\cap S[(i-1)W+1,\,iW+(1+\varepsilon)^{\ell}W]\neq\emptyset\\0 & \text{otherwise.}\end{cases}$$

Matrix $B(\ell)$ indicates, for each block of size $W+(1+\varepsilon)^{\ell}W$ in $S$ which starts right after an integer multiple of $W$, and for each heavy color, whether the heavy color appears in the block or not.

Let $E(\ell)=A\cdot B(\ell)=\{e^{(\ell)}_{i,j}\}$, where the product is a Boolean matrix product. Thus, $E(\ell)$ roughly indicates for each pair of heavy colors whether their color distance is at most $W+(1+\varepsilon)^{\ell}W$.
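The three matrices are simple to realize for the array metric; the following sketch (names and numpy usage are ours) builds $A$, $B(\ell)$, and their Boolean product:

```python
import numpy as np

def block_matrices(points_of_heavy, max_s, W, eps, ell):
    """Build A, B(ell), and the Boolean product E(ell) for heavy colors.

    A[i][k]      = 1 iff heavy color i has a point in block ((k)W .. (k+1)W].
    B[k][j](ell) = 1 iff heavy color j has a point in
                   (kW .. (k+1)W + (1+eps)^ell * W].
    (0-based block index k corresponds to the paper's block k+1.)"""
    heavy = list(points_of_heavy)
    blocks = -(-max_s // W)                     # ceil(max(S) / W)
    reach = (1 + eps) ** ell * W
    A = np.zeros((len(heavy), blocks), dtype=bool)
    B = np.zeros((blocks, len(heavy)), dtype=bool)
    for i, h in enumerate(heavy):
        for p in points_of_heavy[h]:
            A[i][(p - 1) // W] = True
    for k in range(blocks):
        lo, hi = k * W + 1, (k + 1) * W + reach
        for j, h in enumerate(heavy):
            B[k][j] = any(lo <= p <= hi for p in points_of_heavy[h])
    # Boolean matrix product: E(ell)[i][j] = OR over k of A[i][k] AND B[k][j]
    return A, B, (A.astype(int) @ B.astype(int)) > 0

# delta('a','b') = 4: with W = 2 the pair is 0 in E(0) but 1 in E(1)
_, _, E0 = block_matrices({'a': [1], 'b': [5]}, 6, 2, 1.0, 0)
_, _, E1 = block_matrices({'a': [1], 'b': [5]}, 6, 2, 1.0, 1)
```

In the real algorithm the product is computed with FMM; the numpy `@` call is only a placeholder for that step.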

In Section 3.1 we state and prove lemmas regarding the connections between the matrices $E(\ell)$ and approximating color distances. In Section 3.2 we describe the algorithm for constructing $E$, and prove its correctness based on the lemmas of Section 3.1.

3.1 Properties of the $E(\ell)$ Matrices

The following lemma describes a connection between 0-entries in the $E(\ell)$ matrices and lower bounds on color distances.

Lemma 12.

For $h_i,h_j\in\mathcal{H}$ and integer $\ell_0\le\ell\le\ell_{\max}$, if $e^{(\ell)}_{i,j}=0$ and $e^{(\ell)}_{j,i}=0$, then $\delta_{h_i,h_j}>(1+\varepsilon)^{\ell}W$.

Proof.

Let $p\in P(h_i)$ and $p'\in P(h_j)$ be chosen such that $\delta_{h_i,h_j}=d(p,p')$. Assume towards a contradiction that $\delta_{h_i,h_j}\le(1+\varepsilon)^{\ell}W$. We focus on the case of $p\le p'$, and so $\delta_{h_i,h_j}=p'-p$; the proof for the case $p>p'$ is symmetrical. Thus, there exists an integer $1\le k\le\lceil\frac{\max(S)}{W}\rceil$ such that $p\in S[(k-1)W+1,\,kW]$, and so $a_{i,k}=1$. Therefore,

$$p'=p+\delta_{h_i,h_j}\le p+(1+\varepsilon)^{\ell}W\le kW+(1+\varepsilon)^{\ell}W.$$

We conclude that $p'\in S[(k-1)W+1,\,kW+(1+\varepsilon)^{\ell}W]$, implying that $b^{(\ell)}_{k,j}=1$, and so $e^{(\ell)}_{i,j}=1$, which is a contradiction.

The following lemma shows a connection between colors with distance greater than $\frac{1+2\varepsilon}{\varepsilon}W$ and entries in $E(\ell_0)$ whose value is 0.

Lemma 13.

For $h_i,h_j\in\mathcal{H}$, if $\delta_{h_i,h_j}>\frac{1+2\varepsilon}{\varepsilon}W$ then $e^{(\ell_0)}_{i,j}=0$ and $e^{(\ell_0)}_{j,i}=0$.

Proof.

Assume by contradiction that $e^{(\ell_0)}_{i,j}=1$ or $e^{(\ell_0)}_{j,i}=1$. We focus on the case of $e^{(\ell_0)}_{i,j}=1$; the proof for the case $e^{(\ell_0)}_{j,i}=1$ is symmetrical. Since $e^{(\ell_0)}_{i,j}=1$, there exists an integer $1\le k\le\lceil\frac{\max(S)}{W}\rceil$ such that $a_{i,k}=1$ and $b^{(\ell_0)}_{k,j}=1$. Therefore, there exist points $p\in P(h_i)$ and $p'\in P(h_j)$ such that $p\in S[(k-1)W+1,\,kW]$ and $p'\in S[(k-1)W+1,\,kW+(1+\varepsilon)^{\ell_0}W]$. Thus,

$$\begin{aligned}\delta_{h_i,h_j}&=\min_{\hat p\in P(h_i),\,\tilde p\in P(h_j)}\{d(\hat p,\tilde p)\}\le d(p,p')=\max(p-p',\,p'-p)\\ &\le\max(W-1,\,(1+\varepsilon)^{\ell_0}W+W-1)=(1+\varepsilon)^{\ell_0}W+W-1\\ &=(1+\varepsilon)^{\lfloor\log_{1+\varepsilon}(\frac1\varepsilon)\rfloor+1}W+W-1\le\big((1+\varepsilon)^{\log_{1+\varepsilon}(\frac1\varepsilon)+1}+1\big)W-1\\ &<\frac{1+2\varepsilon}{\varepsilon}W,\end{aligned}$$

which is a contradiction.

The following lemma describes a connection between 1-entries in the $E(\ell)$ matrices and upper bounds on color distances.

Lemma 14.

For $h_i,h_j\in\mathcal{H}$ and integer $\ell_0\le\ell\le\ell_{\max}$, if $e^{(\ell)}_{i,j}=1$, then $\delta_{h_i,h_j}<(1+\varepsilon)^{\ell+1}W$.

Proof.

If $e^{(\ell)}_{i,j}=1$, then there exists an integer $1\le k\le\lceil\frac{\max(S)}{W}\rceil$ such that $a_{i,k}=1$ and $b^{(\ell)}_{k,j}=1$. Hence, there exist $p\in P(h_i)$ and $p'\in P(h_j)$ such that $p\in S[(k-1)W+1,\,kW]$ and $p'\in S[(k-1)W+1,\,kW+(1+\varepsilon)^{\ell}W]$. Thus,

$$\delta_{h_i,h_j}\le|p-p'|=\max(p-p',\,p'-p)\le\max(W-1,\,(1+\varepsilon)^{\ell}W+W-1)=(1+\varepsilon)^{\ell}W+W-1<((1+\varepsilon)^{\ell}+1)W.$$

Notice that $\lfloor\log_{1+\varepsilon}(\frac1\varepsilon)\rfloor+1>\log_{1+\varepsilon}(\frac1\varepsilon)$. Finally, since $\ell\ge\ell_0=\lfloor\log_{1+\varepsilon}(\frac1\varepsilon)\rfloor+1$, we have

$$(1+\varepsilon)^{\ell}\ge(1+\varepsilon)^{\lfloor\log_{1+\varepsilon}(\frac1\varepsilon)\rfloor+1}>(1+\varepsilon)^{\log_{1+\varepsilon}(\frac1\varepsilon)}=\frac1\varepsilon,$$

and so $1<\varepsilon(1+\varepsilon)^{\ell}$. Thus,

$$((1+\varepsilon)^{\ell}+1)W<((1+\varepsilon)^{\ell}+\varepsilon(1+\varepsilon)^{\ell})W=(1+\varepsilon)^{\ell+1}W,$$

completing the proof.

The following lemma shows that $e^{(\ell)}_{i,j}$ is weakly monotone increasing with respect to $\ell$.

Lemma 15.

For $h_i,h_j\in\mathcal{H}$ and $\ell_0\le\ell\le\ell_{\max}$, if $e^{(\ell)}_{i,j}=1$, then for any $\ell<\ell'\le\ell_{\max}$ we have $e^{(\ell')}_{i,j}=1$.

Proof.

If $e^{(\ell)}_{i,j}=1$, then there exists an integer $1\le k\le\lceil\frac{\max(S)}{W}\rceil$ such that $a_{i,k}=1$ and $b^{(\ell)}_{k,j}=1$. Thus, there exists $p'\in P(h_j)$ such that $p'\in S[(k-1)W+1,\,kW+(1+\varepsilon)^{\ell}W]$. Moreover, since $\ell<\ell'$, we have $S[(k-1)W+1,\,kW+(1+\varepsilon)^{\ell}W]\subseteq S[(k-1)W+1,\,kW+(1+\varepsilon)^{\ell'}W]$, and so $b^{(\ell')}_{k,j}=1$. Finally, since $a_{i,k}=1$, we conclude that $e^{(\ell')}_{i,j}=1$.

The following lemma shows that in $E(\ell_{\max})$, for each pair of heavy colors, there exists at least one corresponding 1-entry.

Lemma 16.

For $h_i,h_j\in\mathcal{H}$, either $e^{(\ell_{\max})}_{i,j}=1$ or $e^{(\ell_{\max})}_{j,i}=1$.

Proof.

Assume by contradiction that $e^{(\ell_{\max})}_{i,j}=0$ and $e^{(\ell_{\max})}_{j,i}=0$. Hence, by Lemma 12, we have $\delta_{h_i,h_j}>(1+\varepsilon)^{\ell_{\max}}W=(1+\varepsilon)^{\lceil\log_{1+\varepsilon}(\frac{\max(S)}{W})\rceil}W$. Notice that $\max(S)\ge\delta_{h_i,h_j}$. Therefore,

$$\max(S)\ge\delta_{h_i,h_j}>(1+\varepsilon)^{\lceil\log_{1+\varepsilon}(\frac{\max(S)}{W})\rceil}W\ge(1+\varepsilon)^{\log_{1+\varepsilon}(\frac{\max(S)}{W})}W=\max(S),$$

which is a contradiction.

In the following lemma we show the main property of the matrices $E(\ell)$ used in the algorithm of Section 3.2 when dealing with pairs of heavy colors with color distance greater than $\frac{1+2\varepsilon}{\varepsilon}W$.

Lemma 17.

For $h_i,h_j\in\mathcal{H}$, if $\delta_{h_i,h_j}>\frac{1+2\varepsilon}{\varepsilon}W$, then there exists a unique integer $\ell_0\le\ell\le\ell_{\max}-1$ such that $e^{(\ell)}_{i,j}=e^{(\ell)}_{j,i}=0$ and either $e^{(\ell+1)}_{i,j}=1$ or $e^{(\ell+1)}_{j,i}=1$.

Proof.

By Lemma 16, at least one of $e^{(\ell_{\max})}_{i,j}$ and $e^{(\ell_{\max})}_{j,i}$ must be 1, so assume without loss of generality that $e^{(\ell_{\max})}_{i,j}=1$. Since $\delta_{h_i,h_j}>\frac{1+2\varepsilon}{\varepsilon}W$, by Lemma 13 $e^{(\ell_0)}_{i,j}=0$. Thus, by Lemma 15, there must exist exactly one $\ell_0\le\ell<\ell_{\max}$ such that $e^{(\ell)}_{i,j}=0$ and $e^{(\ell+1)}_{i,j}=1$.

To complete the proof we show uniqueness. If $e^{(\ell_{\max})}_{i,j}=e^{(\ell_{\max})}_{j,i}=1$, then assume towards a contradiction that there exist two different integers $\ell\neq\hat\ell$ such that $e^{(\ell)}_{i,j}=0$, $e^{(\ell)}_{j,i}=0$, $e^{(\ell+1)}_{i,j}=1$, $e^{(\hat\ell)}_{i,j}=0$, $e^{(\hat\ell)}_{j,i}=0$, and $e^{(\hat\ell+1)}_{j,i}=1$. Without loss of generality $\ell<\hat\ell$, and so by Lemma 15, $e^{(\ell+1)}_{i,j}=1$ together with $\ell+1\le\hat\ell$ implies $e^{(\hat\ell)}_{i,j}=1$, which is a contradiction.

Otherwise, if at least one of $e^{(\ell_{\max})}_{i,j}$ or $e^{(\ell_{\max})}_{j,i}$ is 0, then since we assumed without loss of generality that $e^{(\ell_{\max})}_{i,j}=1$, we have $e^{(\ell_{\max})}_{j,i}=0$, which together with Lemma 15 implies that for all $\ell_0\le\ell<\ell_{\max}$, $e^{(\ell)}_{j,i}=0$. Thus, $\ell$ is unique.

3.2 Algorithm for Computing 𝑬

Algorithm 1 $\mathrm{ConstructE}(S,\mathcal{H},W,\varepsilon)$.

In this section, we describe the algorithm for computing $E$, prove its correctness, and analyze its time cost. Pseudocode for the algorithm is given in Algorithm 1. The algorithm begins with a brute-force exact computation of color distances for all heavy colors $h_i$ and $h_j$ where $\delta_{h_i,h_j}\le\frac{1+2\varepsilon}{\varepsilon}W$. The brute-force computation costs $O(\frac{1+2\varepsilon}{\varepsilon}\cdot Wn)=O(\frac{Wn}{\varepsilon})$ time.

The rest of the algorithm deals with the case of $\delta_{h_i,h_j}>\frac{1+2\varepsilon}{\varepsilon}W$. First, the algorithm computes the matrices $E(\ell)$ from the matrices $A$ and $B(\ell)$. The time cost of computing all of the $E(\ell)$ matrices is dominated by the cost of $\lceil\log_{1+\varepsilon}(\max(S))\rceil=\Theta(\frac{\log\max(S)}{\varepsilon})$ matrix multiplications. Each multiplication is between a matrix of size $\frac n\tau\times\frac{\max(S)}{W}$ and a matrix of size $\frac{\max(S)}{W}\times\frac n\tau$. It is folklore knowledge that the product of two matrices of size $x\times y$ and $y\times z$ can be computed in $O(xyz\cdot\min(x,y,z)^{\omega-3})$ time. Thus, the time cost of computing the matrix multiplications is $O\big(\frac{n^2\max(S)}{\tau^2W}\cdot\min(\frac n\tau,\frac{\max(S)}{W})^{\omega-3}\cdot\frac{\log\max(S)}{\varepsilon}\big)$.

Finally, the algorithm utilizes the matrices $E(\ell)$ to complete the entries in $E$ which correspond to pairs of heavy colors that were not covered by the brute-force computation. To do so, for each such pair $(h_i,h_j)\in\mathcal{H}\times\mathcal{H}$, the algorithm scans all $e^{(\ell)}_{i,j}$ to find the unique $\ell$ from Lemma 17 such that $e^{(\ell)}_{i,j}=e^{(\ell)}_{j,i}=0$ and either $e^{(\ell+1)}_{i,j}=1$ or $e^{(\ell+1)}_{j,i}=1$, and sets $e_{i,j}$ to be $(1+\varepsilon)^{\ell+2}W$. Computing the entries in $E$ for all such pairs costs $O(|\mathcal{H}|^2\cdot\frac{\log\max(S)}{\varepsilon})=O((\frac n\tau)^2\cdot\frac{\log\max(S)}{\varepsilon})$ time, which is dominated by the cost of the matrix multiplications performed during the computation of all of the $E(\ell)$ matrices.
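The final scan can be sketched as follows (a simplified illustration of ours; here `E_levels[ell]` holds $E(\ell)$ with levels indexed from $0$, whereas the guarantee of Lemma 17 only concerns pairs whose transition level is at least $\ell_0$):

```python
def approx_distance(i, j, E_levels, W, eps):
    """Scan the stack of E(ell) matrices for the unique transition level
    of Lemma 17, and output the approximation (1+eps)^(ell+2) * W.

    E_levels[ell][i][j] is the Boolean entry e_{i,j}(ell)."""
    for ell in range(len(E_levels) - 1):
        below = not E_levels[ell][i][j] and not E_levels[ell][j][i]
        above = E_levels[ell + 1][i][j] or E_levels[ell + 1][j][i]
        if below and above:
            return (1 + eps) ** (ell + 2) * W
    raise ValueError("pair handled by the brute-force phase")

# Pair (0, 1): both entries are 0 at levels 0 and 1, and 1 at level 2,
# so the transition level is ell = 1.
E_levels = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],
]
approx_distance(0, 1, E_levels, 2, 1.0)  # → 16.0, i.e. (1+1)^3 * 2
```

By Lemmas 12 and 14, the returned value is sandwiched between $\delta_{h_i,h_j}$ and $(1+\varepsilon)^2\delta_{h_i,h_j}$, which the $\varepsilon'=\varepsilon/3$ rescaling below turns into a $(1+\varepsilon)$ guarantee.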

Lemma 18.

There exists an algorithm that computes $E$ in $\tilde O\big(\frac{Wn}{\varepsilon}+\frac{n^2\max(S)}{\tau^2W}\cdot\min(\frac n\tau,\frac{\max(S)}{W})^{\omega-3}\big)$ time, such that for $h_i,h_j\in\mathcal{H}$ we have $\delta_{h_i,h_j}\le e_{i,j}\le(1+\varepsilon)\delta_{h_i,h_j}$.

Proof.

The runtime of the algorithm follows from the discussion above.

If $\delta_{h_i,h_j}\le\frac{1+2\varepsilon}{\varepsilon}W$, then there exist $p\in P(h_i)$ and $\hat p\in P(h_j)$ such that $\delta_{h_i,h_j}=|\hat p-p|\le\frac{1+2\varepsilon}{\varepsilon}W$, and so, after the brute-force computation, we have $e_{i,j}=\delta_{h_i,h_j}$.

Otherwise, $\delta_{h_i,h_j}>\frac{1+2\varepsilon}{\varepsilon}W$, and so the algorithm sets $e_{i,j}$ to be $(1+\varepsilon)^{\ell+2}W$, where $\ell$ is the unique integer from Lemma 17; in particular, $e^{(\ell)}_{i,j}=e^{(\ell)}_{j,i}=0$ and either $e^{(\ell+1)}_{i,j}=1$ or $e^{(\ell+1)}_{j,i}=1$. By Lemmas 14 and 12, $\delta_{h_i,h_j}<(1+\varepsilon)^{\ell+2}W<(1+\varepsilon)^2\delta_{h_i,h_j}$.

Finally, to obtain a $(1+\varepsilon)$ approximation one can run the algorithm with approximation parameter $\varepsilon'=\frac{\varepsilon}{3}$, which does not affect the asymptotic time complexity.

4 Proof of Theorem 5

In this section, we analyze the combination of the construction algorithm for E given in Section 3.2 together with the generic ACDO algorithm described in Section 2.1. However, we focus on the type of instances of ACDO which are used in Section 5 for solving the approximate snippets problem. In particular, our metric is defined by locations in an array of size n, and so we have max(S)=n. Thus, in our case,

$$T_E(\mathcal{H})=O\Big(\frac{Wn}{\varepsilon}+\frac{n^{\omega}\max(\tau,W)^{3-\omega}}{\tau^2W}\cdot\frac{\log n}{\varepsilon}\Big).$$

If $n^{\frac{\omega-1}{\omega+1}}\log^{\frac12}n\le\tau\le n$, then we choose $W=(\frac n\tau)^{\frac{\omega-1}{2}}\log^{\frac12}n$. In such a case, we have $W=(\frac n\tau)^{\frac{\omega-1}{2}}\log^{\frac12}n\le\big(\frac{n}{n^{(\omega-1)/(\omega+1)}}\big)^{\frac{\omega-1}{2}}\log^{\frac12}n=n^{\frac{\omega-1}{\omega+1}}\log^{\frac12}n\le\tau$, and so

$$T_E(\mathcal{H})=O\Big(\Big(\frac n\tau\Big)^{\frac{\omega-1}{2}}\cdot\frac n\varepsilon\log^{\frac12}n+\frac{n^{\omega}\tau^{3-\omega}}{\tau^2(\frac n\tau)^{\frac{\omega-1}{2}}\log^{\frac12}n}\cdot\frac{\log n}{\varepsilon}\Big)=O\Big(\frac{n^{\frac{\omega+1}{2}}}{\varepsilon\,\tau^{\frac{\omega-1}{2}}}\log^{\frac12}n\Big).$$

If $1\le\tau\le n^{\frac{\omega-1}{\omega+1}}\log^{\frac12}n$, then we choose $W=n\tau^{-\frac{2}{\omega-1}}\log^{\frac{1}{\omega-1}}n$. In such a case we have $W=n\tau^{-\frac{2}{\omega-1}}\log^{\frac{1}{\omega-1}}n\ge n\cdot\big(n^{\frac{\omega-1}{\omega+1}}\big)^{-\frac{2}{\omega-1}}\log^{\frac{1}{\omega-1}}n=n^{\frac{\omega-1}{\omega+1}}\log^{\frac{1}{\omega-1}}n\ge\tau$, and so

$$T_E(\mathcal{H})=O\Big(\frac{n^2}{\varepsilon\,\tau^{\frac{2}{\omega-1}}}\log^{\frac{1}{\omega-1}}n+\frac{n^{\omega}\big(n\tau^{-\frac{2}{\omega-1}}\big)^{2-\omega}\log^{\frac{2-\omega}{\omega-1}}n}{\tau^{2}}\cdot\frac{\log n}{\varepsilon}\Big)=O\Big(\frac{n^2}{\varepsilon\,\tau^{\frac{2}{\omega-1}}}\log^{\frac{1}{\omega-1}}n\Big).$$

Thus, to summarize, we have shown

$$T_E(n)=\begin{cases}\tilde O\left(\frac{n^{2}}{\varepsilon\,\tau^{\frac{2}{\omega-1}}}\right)&\text{for }1\le\tau\le n^{\frac{\omega-1}{\omega+1}}\\[4pt]\tilde O\left(\frac{n^{\frac{\omega+1}{2}}}{\varepsilon\,\tau^{\frac{\omega-1}{2}}}\right)&\text{for }n^{\frac{\omega-1}{\omega+1}}\le\tau\le n.\end{cases}$$
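In each regime, the choice of $W$ is exactly the one that balances the brute-force term against the matrix-multiplication term of $T_E(n)$. A small numerical sanity check (illustrative only; log factors and $\varepsilon$ are dropped, and the concrete values of $n$, $\omega$, $\tau$ are arbitrary):

```python
def te_terms(n, tau, W, omega):
    """The two terms of T_E(n) with max(S) = n, without log/eps factors:
    brute-force cost W * n, and matrix-multiplication cost
    n^omega * max(tau, W)^(3 - omega) / (tau^2 * W)."""
    brute = W * n
    mm = n ** omega * max(tau, W) ** (3 - omega) / (tau ** 2 * W)
    return brute, mm

n, omega = 10.0 ** 6, 2.2
tau = n ** 0.5                       # heavy-tau regime: tau >= n^((omega-1)/(omega+1))
W = (n / tau) ** ((omega - 1) / 2)   # this regime's choice of W
brute, mm = te_terms(n, tau, W, omega)  # the two terms coincide
```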

Notice that in either case, $T_E(n)=\Omega(\max(n^{2}/\tau^{2},\,n))$.

For the NNS data structure we use the van Emde Boas data structure [25], which for a set of $m$ points from the integer universe $\{1,2,\dots,u\}$ has a preprocessing cost of $O(m\log\log u)$ and query time $O(\log\log u)$. In our setting, $u=n$, and so $T_{p,NNS}(P(c))=O(|P(c)|\log\log n)$ and $T_{q,NNS}(n)=O(\log\log n)$.
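As a runnable stand-in for the van Emde Boas structure (whose word-RAM implementation is beyond a short sketch), the following uses a sorted array with binary search: $O(m\log m)$ preprocessing and $O(\log m)$ queries instead of $O(\log\log u)$, but the same interface the ACDO algorithm relies on. The function names are illustrative.

```python
import bisect

def build_nns(points):
    """Preprocess integer positions for nearest-neighbor queries."""
    return sorted(points)

def nns_query(arr, q):
    """Return the stored point nearest to position q, via predecessor/
    successor lookups in the sorted array."""
    k = bisect.bisect_left(arr, q)
    cands = [arr[t] for t in (k - 1, k) if 0 <= t < len(arr)]
    return min(cands, key=lambda p: abs(p - q))
```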

Thus, the construction time of the ACDO algorithm is

$$O\left(T_E(n)+\sum_{c\in\mathcal C}T_{p,NNS}(P(c))\right)=O\left(T_E(n)+\sum_{c\in\mathcal C}|P(c)|\log\log n\right)=O\left(T_E(n)+n\log\log n\right)=\tilde O(T_E(n)),$$

and the query time is $O(\tau\cdot T_{q,NNS}(n))=O(\tau\log\log n)=\tilde O(\tau)$. To complete the proof of Theorem 5, we plug $\tau=\tilde O(n^{b})$ into $n^{a}=T_E(n)$, to obtain the following tradeoff curve for a fixed $\varepsilon$.

$$\begin{cases}a+\frac{2}{\omega-1}\,b=2&\text{if }0\le b\le\frac{\omega-1}{\omega+1}\\[4pt]\frac{2}{\omega-1}\,a+b=\frac{\omega+1}{\omega-1}&\text{if }\frac{\omega-1}{\omega+1}\le b\le1.\end{cases}$$
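The curve can be read as a function $a(b)$; substituting $\omega=2$ recovers the tradeoff stated in the abstract ($a+2b=2$ for $b\le\frac13$, and $2a+b=3$ for $\frac13\le b\le1$). A small check of both branches and of continuity at the breakpoint:

```python
def prep_exponent(b, omega):
    """Preprocessing exponent a on the tradeoff curve, as a function of
    the query exponent b."""
    if b <= (omega - 1) / (omega + 1):
        return 2 - (2 / (omega - 1)) * b        # a + (2/(omega-1)) b = 2
    return ((omega + 1) - (omega - 1) * b) / 2  # (2/(omega-1)) a + b = (omega+1)/(omega-1)
```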
 Remark 19.

One feature of the algorithm of Theorem 5, which is used in the proof of Theorem 6, is that the cost of an ACDO query where both $c$ and $c'$ are heavy is $O(1)$, since all the algorithm does is look up the answer in $E$.

5 Proof Sketch of Theorem 6

The proof of Theorem 6 is inspired by the exact MCDOCH algorithm of [17], combined with a new observation which allows us to leverage Theorem 5. Due to space considerations, in this section we only highlight the differences between our algorithm for AMCDOCH and the exact MCDOCH algorithm of [17]. The complete proof of Theorem 6 is given in the full version [12].

The relevant part of the algorithm of [17] creates an array $A$ of size $n$ which is a permutation of the points in $S$, and preprocesses $A$ into an RNNS data structure. Then, the algorithm partitions $A$ into blocks of size $\tau$, for an integer parameter $1\le\tau\le n$. Let $\mathcal I=\{I_1,I_2,\dots,I_{n/\tau}\}$ denote the set of blocks.

Following [17], the algorithm constructs a matrix $B=\{b_{i,j}\}$ of size $\frac n\tau\times\frac n\tau$, such that $b_{i,j}=d(I_i,I_j)$. To construct $B$, the algorithm of [17] performs $\tau$ RNNS queries for the computation of each entry in $B$. Computing $B$ turns out to be the dominating component in the preprocessing runtime of [17].

To obtain a faster preprocessing time in the approximate setting, instead of constructing $B$, our algorithm constructs an approximate matrix $\hat B=\{\hat b_{i,j}\}$ where $d(I_i,I_j)\le\hat b_{i,j}\le(1+\varepsilon)d(I_i,I_j)$. The construction of $\hat B$ utilizes the algorithm of Theorem 5 as follows. Define a new color set $\hat{\mathcal C}=\{\hat c_1,\hat c_2,\dots,\hat c_{|\mathcal I|}\}$ over the points in $S$, such that for each $\hat c_i\in\hat{\mathcal C}$ we have $P(\hat c_i)=I_i$. Notice that for every $i\ne j$ we have $I_i\cap I_j=\emptyset$, and so, using the colors of $\hat{\mathcal C}$ on $S$, every point in $S$ has exactly one color. Thus, the algorithm applies the algorithm of Theorem 5 on $S$, but with the colors of $\hat{\mathcal C}$, and with the query time designed to be $\tilde O(n^{b})=\tilde O(\tau)$. Now, $\hat b_{i,j}$ is set to be the answer of the ACDO query on colors $\hat c_i$ and $\hat c_j$, and so $d(I_i,I_j)=\delta_{\hat c_i,\hat c_j}\le\hat b_{i,j}\le(1+\varepsilon)\delta_{\hat c_i,\hat c_j}=(1+\varepsilon)d(I_i,I_j)$.
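Schematically, the construction of $\hat B$ looks as follows in Python, with an exact brute-force scan standing in for the $(1+\varepsilon)$-approximate ACDO query (the names are illustrative, not from the paper):

```python
def build_bhat(A, tau, acdo_query):
    """Partition the array A of point positions into blocks of size tau
    (block i plays the role of color c_i-hat), and fill entry (i, j) of
    B-hat with an ACDO query on the pair of block colors."""
    blocks = [A[s:s + tau] for s in range(0, len(A), tau)]
    return [[acdo_query(I, J) for J in blocks] for I in blocks]

def exact_stand_in(I, J):
    # Exact block distance; the paper's ACDO query would instead return a
    # value within a (1 + eps) factor of this.
    return min(abs(p - q) for p in I for q in J)
```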

In the last step of the preprocessing phase, the algorithm preprocesses $\hat B$ (instead of $B$ in [17]) using a 2D Range Minimum Query (2DRMQ) data structure [3], so that given a rectangle in $\hat B$, defined by its corners, the algorithm returns in $O(1)$ time the smallest entry in the rectangle.

Answering queries and correctness

The process for answering a query is the same as in [17], but using B^ instead of B. Thus, the correctness of the algorithm needs to be proven given the use of B^.

As shown in [17], for each $c\in\mathcal C$, $P(c)$ is exactly the set of points in some interval $A[x_c,y_c]$. The challenging part is answering an AMCDOCH query on colors $c$ and $c'$ when $P(c)$ and $P(c')$ are disjoint. Thus, assume without loss of generality that $x_c\le y_c<x_{c'}\le y_{c'}$. For the description here, we assume that $x_c$ (and $x_{c'}$) is the first location of some block, and $y_c$ (and $y_{c'}$) is the last location of some block. The more general cases are covered in the full version of this paper [12], with an additional execution of $\tilde O(\tau)$ RNNS queries.

The query algorithm executes a 2DRMQ query on the rectangle in $\hat B$ defined by the corners $\left(\frac{x_c-1}{\tau}+1,\frac{x_{c'}-1}{\tau}+1\right)$ and $\left(\frac{y_c}{\tau},\frac{y_{c'}}{\tau}\right)$. Let $\hat b_{i,j}$ be the answer returned by the 2DRMQ query. Notice that there exist $p\in A[x_c,y_c]$ and $p'\in A[x_{c'},y_{c'}]$ such that $\delta_{c,c'}=d(p,p')$. Thus, there exist integers $\frac{x_c-1}{\tau}+1\le\hat i\le\frac{y_c}{\tau}$ and $\frac{x_{c'}-1}{\tau}+1\le\hat j\le\frac{y_{c'}}{\tau}$ such that $p\in I_{\hat i}$ and $p'\in I_{\hat j}$. Thus, $\delta_{c,c'}=d(p,p')=d(I_{\hat i},I_{\hat j})$, and so

$$\hat b_{i,j}=\min_{\substack{\frac{x_c-1}{\tau}+1\le i'\le\frac{y_c}{\tau}\\ \frac{x_{c'}-1}{\tau}+1\le j'\le\frac{y_{c'}}{\tau}}}\left\{\hat b_{i',j'}\right\}\le\min_{\substack{\frac{x_c-1}{\tau}+1\le i'\le\frac{y_c}{\tau}\\ \frac{x_{c'}-1}{\tau}+1\le j'\le\frac{y_{c'}}{\tau}}}\left\{(1+\varepsilon)\,d(I_{i'},I_{j'})\right\}\le(1+\varepsilon)\,d(I_{\hat i},I_{\hat j})=(1+\varepsilon)\,\delta_{c,c'}.$$

For the lower bound, every entry in the rectangle satisfies $\hat b_{i',j'}\ge d(I_{i'},I_{j'})\ge\delta_{c,c'}$, since all points in these blocks are colored $c$ or $c'$; hence $\delta_{c,c'}\le\hat b_{i,j}\le(1+\varepsilon)\delta_{c,c'}$.
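The query step can be sketched with a naive rectangle scan in place of the $O(1)$-time 2DRMQ of [3]. The interval endpoints are assumed block-aligned and 1-based, as in the description above; the function name is illustrative.

```python
def block_rectangle_min(Bhat, x_c, y_c, x_cp, y_cp, tau):
    """Minimum entry of B-hat over the rectangle of block indices covering
    A[x_c..y_c] x A[x_cp..y_cp]; a linear scan stands in for the 2DRMQ."""
    rows = range((x_c - 1) // tau, y_c // tau)    # 0-based block indices
    cols = range((x_cp - 1) // tau, y_cp // tau)
    return min(Bhat[i][j] for i in rows for j in cols)
```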

Time Complexity

For the RNNS data structure we use the solution of [23], which on $n$ points (the size of $A$) has preprocessing time $O(n\log n)$ and query time $O(\log^{\varepsilon}n)=\tilde O(1)$.

By Remark 19 and the fact that each block has size $\tau=\tilde O(n^{b})$, each ACDO query used for constructing $\hat B$ costs $O(1)$ time. So, after preprocessing the ACDO data structure, constructing $\hat B$ costs $O\left(\left(\frac n\tau\right)^2\right)$ time. The preprocessing time of the 2DRMQ data structure of [3] when applied to $\hat B$ is $\tilde O\left(\left(\frac n\tau\right)^2\right)$. Thus, since $T_E(n)=\Omega(\max(n^{2}/\tau^{2},\,n))$, the total time cost of the preprocessing phase, which is composed of constructing the ACDO and RNNS data structures on $A$, computing $\hat B$, and preprocessing the 2DRMQ data structure, is

$$\tilde O\left(T_E(n)+(n/\tau)^{2}+n\right)=\tilde O(T_E(n))=\begin{cases}\tilde O\left(\frac{n^{2}}{\varepsilon\,\tau^{\frac{2}{\omega-1}}}\right)&\text{for }1\le\tau\le n^{\frac{\omega-1}{\omega+1}}\\[4pt]\tilde O\left(\frac{n^{\frac{\omega+1}{2}}}{\varepsilon\,\tau^{\frac{\omega-1}{2}}}\right)&\text{for }n^{\frac{\omega-1}{\omega+1}}\le\tau\le n.\end{cases}$$

The query process consists of $O(\tau)$ RNNS queries and a single 2DRMQ query. Thus, the query time is $O(\tau)$. Finally, similar to the proof of Theorem 5, we obtain the following tradeoff curve for a fixed $\varepsilon$.

$$\begin{cases}a+\frac{2}{\omega-1}\,b=2&\text{if }0\le b\le\frac{\omega-1}{\omega+1}\\[4pt]\frac{2}{\omega-1}\,a+b=\frac{\omega+1}{\omega-1}&\text{if }\frac{\omega-1}{\omega+1}\le b\le1.\end{cases}$$

6 Proof of Theorem 8: CLB for Exact MCDO on an Array

To prove Theorem 8, we design a reduction from $(\min,+)$-matrix product on values in $[\hat n]$ to MCDO. To do so, consider an unbalanced version of $(\min,+)$-matrix product where $A$ is of size $\hat n\times\hat m$, $B$ is of size $\hat m\times\hat n$, and all of the values are bounded by some positive integer $M$. It is straightforward to show that, for any $x\ge0$ such that $\hat m=\hat n^{x}$, solving unbalanced $(\min,+)$-matrix product with any polynomial time improvement over $(\hat n^{2}\hat m)^{1-o(1)}$ time is equivalent to solving $(\min,+)$-matrix product on balanced matrices with $\hat m=\hat n$ in $\hat n^{3-\Omega(1)}$ time. Thus, an algorithm for unbalanced $(\min,+)$-matrix product with values bounded by $M=\min(\hat n,\hat m)=\hat n$ in time less than $(\hat n^{2}\hat m)^{1-o(1)}$ would refute the Strong-APSP hypothesis.

The reduction

Our goal is to design a reduction from the unbalanced $(\min,+)$-matrix product problem with values bounded by $\hat n=\min(\hat n,\hat m)$ to MCDO. In preparation for the proof of Theorem 9, we first describe the reduction using a positive integer parameter $M$ as the upper bound on the values in the input matrices, and during the analysis we set $M=\hat n$.

For each $a_{i,j}$, the algorithm defines a point $a'_{i,j}=(M-a_{i,j})+9Mj$ with color $i$. For each $b_{i,j}$, the algorithm defines a point $b'_{i,j}=b_{i,j}+3M+9Mi$ with color $\hat n+j$. Thus, the largest point defined is at most $O(M\hat m)$, and there are $2\hat n$ colors. Notice that there cannot be two entries in the same row (column) of $A$ ($B$) that define the same point. However, it is possible that $a_{i,j}=a_{i',j}$ for $i\ne i'$, and so $a'_{i,j}=a'_{i',j}$, but the two points have different colors. A similar phenomenon can happen to points defined by entries of $B$ which share the same row. Thus, we merge all occurrences of the same point into a single occurrence, colored with all of the colors of the pre-merge occurrences. The set of points, denoted by $S$, contains at most $2\hat n\hat m$ multi-colored points in a metric defined by an array of size $O(M\hat m)$, and so the algorithm uses the MCDO algorithm to preprocess $S$. Finally, for each entry $d_{i,j}$ in $D$, the algorithm sets $d_{i,j}\leftarrow\delta_{i,\hat n+j}-2M$ by performing a query on the MCDO data structure.
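The reduction is easy to verify end-to-end on small instances; below, a brute-force scan over the constructed points replaces the MCDO data structure, and 0-based indices shift the $9Mj$ and $9Mi$ offsets harmlessly, since only differences between positions matter.

```python
def minplus_via_points(A, B, M):
    """Build the points a'_{i,k} = (M - A[i][k]) + 9*M*k (color i) and
    b'_{k,j} = B[k][j] + 3*M + 9*M*k (color n + j), then recover the
    (min,+)-product as D[i][j] = delta_{i, n+j} - 2*M."""
    n, m = len(A), len(A[0])
    pts_a = [[(M - A[i][k]) + 9 * M * k for k in range(m)] for i in range(n)]
    pts_b = [[B[k][j] + 3 * M + 9 * M * k for k in range(m)] for j in range(n)]
    return [[min(abs(p - q) for p in pts_a[i] for q in pts_b[j]) - 2 * M
             for j in range(n)] for i in range(n)]
```

For example, on $A=\begin{psmallmatrix}3&1\\2&5\end{psmallmatrix}$, $B=\begin{psmallmatrix}4&2\\1&6\end{psmallmatrix}$ with $M=6$, the function returns the $(\min,+)$-product $\begin{psmallmatrix}2&5\\6&4\end{psmallmatrix}$.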

Correctness

Suppose $d_{i,j}=a_{i,k}+b_{k,j}$ for some $k\in[\hat m]$. For every $k\in[\hat m]$, by definition we have $b'_{k,j}>a'_{i,k}$, and so $|a'_{i,k}-b'_{k,j}|=b'_{k,j}-a'_{i,k}=b_{k,j}+a_{i,k}+2M$. Since $a'_{i,k}$ is colored with color $i$ and $b'_{k,j}$ is colored with color $\hat n+j$, we have

$$\delta_{i,\hat n+j}-2M\le|a'_{i,k}-b'_{k,j}|-2M=a_{i,k}+b_{k,j}=d_{i,j}.\qquad(1)$$

Let $k',k''\in[\hat m]$. If $k'\ne k''$, then

$$|a'_{i,k'}-b'_{k'',j}|=\max\left(a'_{i,k'}-b'_{k'',j},\;b'_{k'',j}-a'_{i,k'}\right)=\max\left(9M(k'-k'')-2M-a_{i,k'}-b_{k'',j},\;b_{k'',j}+a_{i,k'}+9M(k''-k')+2M\right)\ge5M.$$

However, if $k'=k''$ then $|a'_{i,k'}-b'_{k',j}|=a_{i,k'}+b_{k',j}+2M\le4M$. Since $4M<5M$, we have

$$\delta_{i,\hat n+j}=\min_{p\in P(i),\,p'\in P(\hat n+j)}\{|p-p'|\}=\min_{1\le\hat k\le\hat m}\left\{|a'_{i,\hat k}-b'_{\hat k,j}|\right\}.$$

Thus, there exists some $\hat k\in[\hat m]$ such that $\delta_{i,\hat n+j}=|a'_{i,\hat k}-b'_{\hat k,j}|=a_{i,\hat k}+b_{\hat k,j}+2M\ge d_{i,j}+2M$, and combined with Equation 1 we have $d_{i,j}=\delta_{i,\hat n+j}-2M$.

Lower bound

Recall that in our setting $M=\hat n$. The reduction preprocesses an MCDO data structure on $n\le2\hat n\hat m$ points in an array metric of size $O(M\hat m)=O(n)$, and then answers $\hat n^{2}$ MCDO queries. If $t_p(n)$ and $t_q(n)$ are the preprocessing and query times, respectively, of the MCDO data structure, then the Strong-APSP conjecture implies that $t_p(\hat n\hat m)+\hat n^{2}\cdot t_q(\hat n\hat m)=(\hat n^{2}\hat m)^{1-o(1)}$.

Recall that our goal is to prove that $a+b\ge2-o(1)$. Thus, assume towards a contradiction that $a+b=2-\Omega(1)$, which, by straightforward algebraic manipulation, implies that $\frac{2a}{a-b}=1+\frac{2}{a-b}-\Omega(1)$. Moreover, since the unbalanced $(\min,+)$-matrix product is hard for any choice of $x\ge0$, we can choose $x=\frac{2}{a-b}-1$, and so $\hat n\hat m=\hat n^{1+x}=\hat n^{\frac{2}{a-b}}$. Notice that $2+\frac{2b}{a-b}=\frac{2a}{a-b}$. Thus, we have $t_p(\hat n\hat m)+\hat n^{2}\cdot t_q(\hat n\hat m)=O\left(\hat n^{\frac{2a}{a-b}}+\hat n^{2+\frac{2b}{a-b}}\right)=O\left(\hat n^{\frac{2a}{a-b}}\right)=O\left(\hat n^{1+\frac{2}{a-b}-\Omega(1)}\right)<(\hat n^{2}\hat m)^{1-o(1)}$, which contradicts the Strong-APSP hypothesis.
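The exponent bookkeeping in this step can be checked numerically. Here `a` and `b` are assumed MCDO exponents with $a+b<2$ and $a>b$ (the concrete values are illustrative only):

```python
def reduction_exponents(a, b):
    """Exponents of n-hat in the reduction when m-hat = n-hat^x with
    x = 2/(a - b) - 1 (assumes a > b)."""
    prep = 2 * a / (a - b)       # exponent of t_p(n-hat * m-hat)
    query = 2 + 2 * b / (a - b)  # exponent of n-hat^2 * t_q(n-hat * m-hat)
    target = 1 + 2 / (a - b)     # exponent of n-hat^2 * m-hat
    return prep, query, target

prep, query, target = reduction_exponents(1.5, 0.3)  # a + b = 1.8 < 2
```

The preprocessing and query exponents coincide by the identity $2+\frac{2b}{a-b}=\frac{2a}{a-b}$, and both fall short of the target exponent exactly when $a+b<2$.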

7 Conclusions and Open Problems

We have shown the existence of FMM-based algorithms for both ACDO and the $(1+\varepsilon)$-approximate snippets problem which, assuming $\omega=2$, are essentially optimal. Moreover, we proved CLBs for the exact version of CDO, implying that the exact versions of CDO and the snippets problem are conditionally strictly harder than their approximate versions.

We remark that one immediate way to improve our algorithms when $\omega>2$ is to apply fast rectangular matrix multiplication [2]. However, we chose not to describe such improvements since they are mostly technical and do not add additional insight into the problems that we address. Moreover, we remark that it is straightforward to adapt our CDO algorithms to return the two points (one of each color) that define the distance, in addition to the distance itself (see the full version [12]).

Our work leaves open the task of designing improved algorithms for more general metrics such as higher dimensional Euclidean space.

References

  • [1] Mohammad Reza Abbasifard, Bijan Ghahremani, and Hassan Naderi. A survey on nearest neighbor search methods. International Journal of Computer Applications, 2014.
  • [2] Josh Alman, Ran Duan, Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou. More asymmetry yields faster matrix multiplication. In Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2005–2039. SIAM, 2025. doi:10.1137/1.9781611978322.63.
  • [3] Amihood Amir, Johannes Fischer, and Moshe Lewenstein. Two-dimensional range minimum queries. In Combinatorial Pattern Matching: 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007. Proceedings 18, pages 286–294. Springer, 2007. doi:10.1007/978-3-540-73437-6_29.
  • [4] Alexandr Andoni. Nearest neighbor search: The old, the new, and the impossible. PhD thesis, Massachusetts Institute of Technology, 2009.
  • [5] Maxim Babenko, Pawel Gawrychowski, Tomasz Kociumaka, and Tatiana Starikovskaya. Wavelet trees meet suffix trees. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 572–591. SIAM, 2014.
  • [6] Djamal Belazzougui and Simon J Puglisi. Range predecessor and lempel-ziv parsing. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2053–2071. SIAM, 2016. doi:10.1137/1.9781611974331.CH143.
  • [7] Timothy M Chan, Virginia Vassilevska Williams, and Yinzhan Xu. Fredman’s trick meets dominance product: Fine-grained complexity of unweighted apsp, 3sum counting, and more. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023.
  • [8] Shiri Chechik. Improved distance oracles and spanners for vertex-labeled graphs. In Algorithms–ESA 2012: 20th Annual European Symposium, Ljubljana, Slovenia, September 10-12, 2012. Proceedings 20, pages 325–336. Springer, 2012. doi:10.1007/978-3-642-33090-2_29.
  • [9] Maxime Crochemore, Costas S Iliopoulos, Marcin Kubica, M Sohel Rahman, German Tischler, and Tomasz Waleń. Improved algorithms for the range next value problem and applications. Theoretical Computer Science, 434:23–34, 2012. doi:10.1016/J.TCS.2012.02.015.
  • [10] Jacob Evald, Viktor Fredslund-Hansen, and Christian Wulff-Nilsen. Near-optimal distance oracles for vertex-labeled planar graphs. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2021.
  • [11] Danny Hermelin, Avivit Levy, Oren Weimann, and Raphael Yuster. Distance oracles for vertex-labeled graphs. In Automata, Languages and Programming: 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part II 38, pages 490–501. Springer, 2011. doi:10.1007/978-3-642-22012-8_39.
  • [12] Noam Horowicz and Tsvi Kopelowitz. Color distance oracles and snippets: Separation between exact and approximate solutions. arXiv preprint, 2025. arXiv:2507.04578.
  • [13] Matti Karppa and Petteri Kaski. Probabilistic tensors and opportunistic boolean matrix multiplication. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 496–515. SIAM, 2019. doi:10.1137/1.9781611975482.31.
  • [14] Orgad Keller, Tsvi Kopelowitz, Shir Landau Feibish, and Moshe Lewenstein. Generalized substring compression. Theoretical Computer Science, 525:42–54, 2014. doi:10.1016/J.TCS.2013.10.010.
  • [15] Orgad Keller, Tsvi Kopelowitz, and Moshe Lewenstein. Range non-overlapping indexing and successive list indexing. In Algorithms and Data Structures: 10th International Workshop, WADS 2007, Halifax, Canada, August 15-17, 2007. Proceedings 10, pages 625–636. Springer, 2007. doi:10.1007/978-3-540-73951-7_54.
  • [16] Jon M Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, 1997.
  • [17] Tsvi Kopelowitz and Robert Krauthgamer. Color-distance oracles and snippets. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPIcs.CPM.2016.24.
  • [18] Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Higher lower bounds from the 3sum conjecture. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pages 1272–1287. SIAM, 2016. doi:10.1137/1.9781611974331.CH89.
  • [19] Tsvi Kopelowitz and Virginia Vassilevska Williams. Towards optimal set-disjointness and set-intersection data structures. In Artur Czumaj, Anuj Dawar, and Emanuela Merelli, editors, 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), volume 168 of LIPIcs, pages 74:1–74:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.ICALP.2020.74.
  • [20] Robert Krauthgamer and James R Lee. Navigating nets: Simple algorithms for proximity search. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, pages 798–807. Citeseer, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982913.
  • [21] Mingfei Li, Chu Chung Christopher Ma, and Li Ning. (1+ ε)-distance oracles for vertex-labeled planar graphs. In International Conference on Theory and Applications of Models of Computation, pages 42–51. Springer, 2013. doi:10.1007/978-3-642-38236-9_5.
  • [22] J Ian Munro, Yakov Nekrich, and Jeffrey S Vitter. Fast construction of wavelet trees. Theoretical Computer Science, 638:91–97, 2016. doi:10.1016/J.TCS.2015.11.011.
  • [23] Yakov Nekrich and Gonzalo Navarro. Sorted range reporting. In Algorithm Theory – SWAT 2012: 13th Scandinavian Symposium and Workshops, Helsinki, Finland, July 4-6, 2012. Proceedings 13, pages 271–282. Springer, 2012. doi:10.1007/978-3-642-31155-0_24.
  • [24] Avi Shoshan and Uri Zwick. All pairs shortest paths in undirected graphs with integer weights. In 40th Annual Symposium on Foundations of Computer Science (Cat. No. 99CB37039), pages 605–614. IEEE, 1999. doi:10.1109/SFFCS.1999.814635.
  • [25] Peter van Emde Boas. Preserving order in a forest in less than logarithmic time. In 16th Annual Symposium on Foundations of Computer Science (sfcs 1975), pages 75–84. IEEE, 1975. doi:10.1109/SFCS.1975.26.