Abstract 1 Introduction 2 Approximate 𝒌NN-realization in 𝒅 3 𝒌NN-Realization in 𝟏 4 Concluding remarks and extensions References

Embedding Graphs as Euclidean kNN-Graphs

Thomas Schibler ORCID University of California, Santa Barbara, CA, USA Subhash Suri ORCID University of California Santa Barbara, CA, USA Jie Xue ORCID New York University Shanghai, China
Abstract

Let G=(V,E) be a directed graph on n vertices where each vertex has out-degree k. We say that G is kNN-realizable in d-dimensional Euclidean space if there exists a point set P={p1,p2,,pn} in d along with a one-to-one mapping ϕ:VP such that for any u,vV, u is an out-neighbor of v in G if and only if ϕ(u) is one of the k nearest neighbors of ϕ(v); we call the map ϕ a kNN-realization of G in d. The kNN-realization problem, which aims to compute a kNN-realization of an input graph in d, is known to be NP-hard already for d=2 and k=1 [Eades and Whitesides, Theoretical Computer Science, 1996], and to the best of our knowledge has not been studied in dimension d=1. The main results of this paper are the following:

  • For any fixed dimension d2, we can efficiently compute an embedding realizing at least a 1ε fraction of G’s edges, or conclude that G is not kNN-realizable in d.

  • For d=1, we can decide in O(kn) time whether G is kNN-realizable and, if so, compute a realization in O(n2.5𝗉𝗈𝗅𝗒(logn)) time.

Keywords and phrases:
Geometric graphs, k-nearest neighbors, graph embedding, approximation algorithms
Copyright and License:
[Uncaptioned image] © Thomas Schibler, Subhash Suri, and Jie Xue; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Design and analysis of algorithms
Related Version:
Full Version: https://arxiv.org/pdf/2504.06503
Editors:
Oswin Aichholzer and Haitao Wang

1 Introduction

The kNN-graph of a set P of points in d is a directed graph with vertex set P and edges defined as follows: there is a directed edge from aP to bP if and only if b is one of the k-nearest neighbors of a in P (under the Euclidean distance). A directed graph G=(V,E) is kNN-realizable in d if it is isomorphic to the kNN-graph of a set of points in d. In this paper, we consider the following natural problems regarding kNN-graphs: Can we efficiently check whether a given directed graph is kNN-realizable in d? If so, can we efficiently find a kNN-realization of G? More formally, a kNN-realization (or kNN-embedding) of G in d is a one-to-one mapping ϕ:VP from the vertex set V of G to a set P of points in d that induces an isomorphism between G and the kNN-graph of P. Throughout the paper, we use the terms embedding and realization interchangeably.

The kNN-graph is a member of the well-known family of proximity graphs in computational geometry that includes minimum spanning trees, relative neighborhood graphs, Gabriel graphs, and Delaunay triangulations [11]. Over the past several decades, a substantial research effort has been directed to design efficient algorithms to compute these structures for an input set of points. The inverse problem – the focus of our paper – where we want to recover the points that produce a given proximity graph, however, remains less well understood.

An early example of a positive result in this direction is on Euclidean minimum spanning trees. Monma and Suri [22] show that any tree with maximum node degree 5 can be realized as a minimum spanning tree of points in the plane and, additionally, every planar point set admits a minimum spanning tree with degree at most 5. This settles the above problem for non-degenerate planar point sets but if we allow co-circularities, then a planar minimum spanning tree can have maximum node degree 6; in that case, the problem of computing a 1NN-realization was proved to be NP-complete by Eades and Whitesides [13]. Dillencourt [12] considers the problem of recognizing triangulations that can be realized as Delaunay triangulations in the plane, and it is also known that all outerplanar triangulations are realizable [24]. (These proofs are non-constructive, however, and only exponential time algorithms are known for computing the coordinates of the points in the realization [1].) Among other related results, Bose et al. [5] give a characterization of trees that can be realized in the plane as relative neighborhood or Gabriel graphs, and Eppstein et al. [15] establish some structural properties of kNN graphs of random points in the plane.

The kNN-realization is not directly related to the metric embedding problem but there are some obvious similarities. The input to metric embedding is a weighted graph satisfying triangle inequalities and the goal is to find an embedding realizing all pairwise distances. This is not always possible [20] – there are simple metrics that cannot be embedded in any finite dimensional Euclidean space. However, if we allow some distortion (i.e. approximation) of distances, then any n-node metric graph can be embedded in O(logn) dimensional space with polylog distortion [6], or using the celebrated Johnson-Lindenstrauss theorem [18] in dimension O(logn/ε2) with distortion 1+ε for any fixed constant ε>0.

In metric embedding the goal is to realize all pairwise distances (approximately), while in kNN-realization the goal is only to preserve all ordinal neighbor relations in a directed graph: a neighbor of each vertex must be closer than any of its non-neighbors but the choice of specific distances does not matter. Although there is some work in embedding ordinal relations as well, the main focus is specialized metric spaces and bounds on ordinal distortion. In particular, Alon et al. [2] consider embeddings with ordinal relaxations into ultrametric spaces, and Bădoiu et al. [7] improve their bounds for tree and line embeddings. A different line of research concerns triplets embedding, where the input is a set of triples (a,b,c) specifying constraints of the form d(a,b)<d(a,c). Even for embedding in one dimension 1, the general triplet problem as well as the related “betweenness” problem is MAXSNP-hard [9], and a FPTAS is known for maximizing the number of satisfied triplet constraints [17]. (If all (n3) triplet constraints are given, then the problem is trivial to solve – indeed, once the leftmost point is determined, the rest of the ordering can be easily decided.) In contrast with the general triplet constraints, the pairwise relations in a kNN-realization problem in 1 have a richer structure that allows us to solve the problem efficiently.

The kNN-realization is also related to the sphericity of graphs [19], where the goal is to embed a (undirected) graph G=(V,E) into an Euclidean space k such that there is a point p(u) for each vertex uV, and d(p(u),p(v))1 iff (u,v)E. The smallest dimension k that admits such an embedding is called the sphericity of G [19]. (Another related problem is the dimension of a graph, introduced by Erdős, Harary and Tutte [16]: it is the smallest number k such that G can be embedded into Euclidean k-space with every edge of G having length 1.) Along these lines, Chatziafratis and Indyk [8] also show that if we want to preserve the relative distances among the k nearest neighbors of each point, then the embedding dimension must grow linearly with k. Unlike these results, our work is algorithmic – we design efficient algorithms to realize a given graph G in a specified dimension d.

Euclidean embedding of neighbor relations is also studied in social sciences for geometric realization of preference orderings. Given a set V of n voters, a set C of m candidates, and a rank ordering of the candidates by each voter, we say that the preference graph can be realized in Euclidean d-space if each voter’s preferences are consistent with its Euclidean distances to all the candidates. In this case, the smallest dimension must satisfy dmin{n,m1} [4].

1.1 Main results

We study the kNN-realization problem, which takes a directed graph G as an input and aims to compute a kNN-realization of G in d (or decide the nonexistence of such a realization). Throughout we focus on the unranked kNN-realization problem where nearest neighbors are realized as an unordered set but our results also hold (with minor caveat) for the ranked version where edges of G specify each of the k neighbors in order (see Concluding Remarks). Our two main results are the following.

  1. 1.

    Assuming that G is kNN-realizable in d, we can find a d-dimensional realization in polynomial time that preserves at least a 1ε fraction of the edges of G, for any fixed ε>0. If our algorithm fails, we can also conclude that G is not kNN-realizable in d. In particular, our algorithm is an EPTAS (efficient polynomial-time approximation scheme) for the kNN-realization problem for fixed k and d.

  2. 2.

    We give a linear-time algorithm to decide wether G is kNN-realizable in 1. Specifically, the algorithm runs in O(kn) time, which is linear in the size of the input graph. If G is realizable, our algorithm can compute a realization in O(n2.5𝗉𝗈𝗅𝗒(logn)) time.

1.2 Basic definitions

For a (directed or undirected) graph G, we use V(G) and E(G) to denote the set of its vertices and edges, respectively. If G is a directed graph, we say G is k-regular if the out-degree of every vertex of G is exactly equal to k. We use 𝗂𝗇[v] (resp., 𝗈𝗎𝗍[v]) to denote the set consisting of v itself and all in-neighbors (resp., out-neighbors) of v. A kNN-realization of G=(V,E) in a metric space =(M,𝖽𝗂𝗌𝗍) is an injective map ϕ:VM such that 𝖽𝗂𝗌𝗍(ϕ(v),ϕ(u))<𝖽𝗂𝗌𝗍(ϕ(v),ϕ(u)), for any distinct triple v,u,uV where (v,u)E and (v,u)E. We say G is kNN-realizable in if there exists a kNN-realization of G in .

2 Approximate 𝒌NN-realization in 𝒅

We first observe that it is easy to decide in polynomial time whether a given graph G is kNN-realizable in a finite-dimensional Euclidean space. We just have to check the acyclicity of an auxiliary graph defined as follows. Let ΛG be a directed graph whose vertices correspond to pairs of vertices in V(G) with a directed edge ({v,u},{v,u}) for every distinct triple v,u,uV(G) where (v,u)E(G) and (v,u)E(G). Intuitively, ΛG encodes the -relation among the pairwise distances of points in any (potential) kNN-realization of G. If ΛG has a directed cycle, then clearly G is not kNN-realizable. Otherwise, a topological sort of ΛG gives a total ordering of the vertex pairs in G. With this ordering, we can find a kNN-realization of G in n using the result of Bilu and Linial [3].

On the other hand, deciding whether G is kNN-realizable in d, for a specific dimension d, is NP-hard. This is shown by Eades and Whitesides [14] who proved the hardness for d=2 and k=1. Therefore, in this section, we explore an approximation algorithm for the realization problem in any fixed dimension d. We first need to define an approximate solution to our problem. There is a natural way to measure how well a map ϕ:V(G)d approximates a kNN-realization: randomly sample an edge (u,v)E(G) and consider the probability that ϕ(v) is among the k nearest neighbors of ϕ(u). Formally, we introduce the following definition.

Definition 1 (approximate kNN-realization).

Let G be a k-regular directed graph. For c[0,1], a map ϕ:V(G)d is a 𝐜-approximate 𝐤NN-realization of G in d if

(u,v)E(G)σϕ(u,v)c|E(G)|,

where σϕ:V(G)×V(G){0,1} is the indicator function defined as σϕ(u,v)=1 if we have |{vV(G)\{u}:ϕ(u)ϕ(v)2ϕ(u)ϕ(v)2}|k and σϕ(u,v)=0 otherwise.

Our main result is an algorithm for computing a (1ε)-approximate kNN-realization of G in d with time complexity f(k,d,ε)nO(1), for any given ε>0 (provided that G is kNN-realizable in d), where f is some computable function. In other words, for fixed k and d, we obtain an efficient polynomial-time approximation scheme (EPTAS) for the kNN-realization problem in d.

At a high level, our algorithm consists of two main steps. In the first step, it computes a set EE(G) of edges such that |E|ε|E(G)| and each weakly-connected component of GE contains Oε(1) vertices. The existence of E follows from the result of Miller et al. [21] on graph separators, and the computation of E relies on the approximate minimum cut algorithm of Chuzhoy et al. [10]. The edges in E are the ones we sacrifice in our approximation and so, in the second step, our algorithm computes a map ϕ:V(G)d satisfying σϕ(u,v)=1 for all edges (u,v)E(G)\E. This turns out to be easy, since the weakly-connected components of GE are of size Oε(1) and we can work on these components individually. These two steps will be presented in Sections 2.1 and 2.2, respectively.

2.1 Splitting the graph by removing few edges

A balanced cut of an undirected graph H is a subset EE(H) such that every connected component of HE contains at most |V(H)|/2 vertices. Every directed graph G naturally corresponds to an undirected graph G0 defined as V(G0)=V(G) and E(G0)={{u,v}:(u,v)E(G) or (v,u)E(G)}. We need the following important result.

Lemma 2.

Let G be a directed graph that is kNN-realizable in d, and G0 be the undirected graph corresponding to G. Then any subgraph H of G0 with |V(H)|2 admits a balanced cut of size O(|V(H)|11d), where the constant hidden in O() only depends on k and d.

Proof sketch.

Let k,d be fixed numbers. The lemma follows from two results in [21]. Specifically, it was shown in [21] that if a graph G is kNN-realizable in d, then its corresponding undirected graph G0 satisfies the following conditions.

  1. 1.

    The degree of every vertex in G0 is O(1).

  2. 2.

    For every subgraph H of G0, there exists SV(H) such that |S|=O(|V(H)|11d) and every connected component of HS contains at most d+1d+2|V(H)| vertices.

We remark that [21] only claimed condition 2 above for the case H=G0, but the argument extends to any subgraph H of G0. Using the two conditions above, it is fairly easy to construct the desired balanced cut for any subgraph of G0.

A minimum balanced cut refers to a balanced cut consisting of the minimum number of edges. A c-approximate minimum balanced cut refers to a balanced cut whose size is at most c𝗈𝗉𝗍, where 𝗈𝗉𝗍 is the size of a minimum balanced cut. The above lemma implies that if G is kNN-realizable in d, then the size of a minimum balanced cut of any subgraph H of G0 is O(|V(H)|11d). We need the following algorithm by Chuzhoy et al. for computing approximate minimum balanced cuts.

Theorem 3 ([10]).

For any fixed number α>0, there exists an algorithm that, given an undirected graph H of n vertices and m edges, computes a logO(1/α)n-approximate minimum balanced cut of H in O(m1+α) time.

The following lemma is the main result of this section, which states that one can remove a small fraction of edges from G to split G into small components.

Lemma 4.

Let k,d be fixed numbers. Given a k-regular directed graph G and a number ε(0,1], one can either compute a subset EE(G) such that |E|ε|E(G)| and each weakly-connected component of GE contains at most (1ε)O(1) vertices, or conclude that G is not kNN-realizable in d, in O(|V(G)|1+α) time for any constant α>0. Here the constants hidden in O() depend on k, d, and α.

Proof.

Let G0 be the corresponding undirected graph of G, and α>0 be a constant. It suffices to compute EE(G0) such that |E|ε|E(G0)| and each connected component of G0E contains at most (1ε)O(1) vertices. Indeed, the edges in G corresponding to E form a subset of E(G) of size O(ε|E(G)|) which satisfies the desired property. This is fine since our algorithm works for an arbitrary ε>0.

Pick a constant α(0,α) and let ApprxMinCut(H) denote the algorithm in Theorem 3, which returns an approximate minimum balanced cut of H in O(|V(H)|1+α) time. Our algorithm, presented in Algorithm 1, computes EE(G0) by recursively applying ApprxMinCut as a sub-routine. We fix some threshold δ=(1ε)c for a sufficiently large c (depending on k, d, and α). If |V(G0)|δ, then we simply return E=. Otherwise, we apply ApprxMinCut(G0) to obtain an approximate minimum balanced cut of G0, and add the edges in the cut to E. Then for every connected component C of G0E, we recursively apply our algorithm on C, which returns a subset of ECE(C), and we add the edges in EC to E. Finally, we return E as the output of our algorithm.

Algorithm 1 CutEdge(G0).

Clearly, every connected component of G0E contains at most δ=(1ε)O(1) vertices. So it suffices to show |E|ε|E(G0)| and analyze the running time of the algorithm. To bound |E|, we observe that when applying ApprxMinCut on any subgraph H of G0, the size of the edge set obtained is of size |V(H)|11dlogO(1/α)|V(H)|. Indeed, a minimum balanced cut of H has size O(|V(H)|11d) by Lemma 2, and ApprxMinCut(H) returns a logO(1/α)|V(H)|-approximate minimum balanced cut of H. Since we defined δ=(1ε)c for a sufficiently large c, we may assume that the size of ApprxMinCut(H) is at most |V(H)|112d for any subgraph H of G0 such that |V(H)|>δ. Also, we may assume that ε(213d1)n113d>n112d for all n>δ. We shall prove that when applying our algorithm on a subgraph H of G0 with |V(H)|=n, the size of EE(H) returned is at most ε(nn113d). We apply induction on n. If V(H)δ, then our algorithm returns an empty set and thus the statement trivially holds. Assume the statement holds for |V(H)|[n1], and we consider the case |V(H)|=n (where n>δ). Our algorithm applies ApprxMinCut(H) to obtain an approximate minimum balanced cut E0E(H) of H. As aforementioned, |E0|n112d. Let C1,,Cr be the connected components of HE0. Set ni=|V(Ci)| for i[r]. We have nin/2 for all i[r], by the definition of a balanced cut. We recursively call our algorithm on each Ci to obtain a subset EiE(Ci). By our induction hypothesis, |Ei|ε(nini113d). Finally, the algorithm returns E=i=0rEi. Thus, we have |E|=|E0|+i=1r|Ei|. As i=1rni=n, i=1r|Ei|εnεi=1rni113d. Furthermore, since nin/2 for all i[r] and i=1rni=n, it holds i=1rni113d2(n2)113d=213dn113d. Combining the bounds for |E0| and i=1r|Ei|, we have

|E|n112d+εnε213dn113d.

Recall our assumption ε(213d1)n113dn112d. Together with the inequality above, this gives us |E|ε(nn113d), so our induction works. In particular, when we apply Algorithm 1 on G0, the output EE(G0) satisfies |E|ε(|V(G0)||V(G0)|113d)ε|V(G0)|.

To analyze the time complexity of Algorithm 1, we consider the recursion tree T of our algorithm when applying on G0. Each node tT corresponds to a recursive call, and denote by G0(t) the corresponding graph handled in that call (which is a subgraph of G0). The time cost at a node tT is O(|E(G0(t))|1+α), which is just O(|V(G0(t))|1+α) because |E(G0(t))|k|V(G0(t))|. Since ApprxMinCut always returns a balanced cut, we know that |V(G0(t))||V(G0(t))|/2 if t is the parent of t in T. This implies that the depth of T is O(log|V(G0)|). Also, the sum of |V(G0(t))| for all nodes tT at one level of T is at most |V(G0)|. Thus tT|V(G0(t))||V(G0)|log|V(G0)| and tT|V(G0(t))|1+α|V(G0)|1+αlog|V(G0)|. The latter implies that the time complexity of Algorithm 1 is O(|V(G0)|1+αlog|V(G0)|), which is O(|V(G0)|1+α) since α>α. We remark that a more careful analysis can be applied to show that the running time of our algorithm is actually O(|V(G0)|1+α) instead of O(|V(G0)|1+αlog|V(G0)|). But for simplicity, we chose this looser analysis, which is already sufficient for our purpose.

2.2 Computing the approximate realization

After computing the set EE(G) of edges using Lemma 4, we consider the graph GE. Let C1,,Cr be the weakly-connected components of GE. We have |V(Ci)|=(1ε)O(1) by Lemma 4. For each i[r], we want to compute a “kNN-realization” of Ci in d. Of course, here each Ci is not necessarily k-regular, and thus a kNN-realization of Ci is not defined. But we can slightly generalize the definition of kNN-realization as follows. For a directed graph C, we say a map ϕ:V(C)d is a quasi-kNN-realization of C in d if for every edge (u,v)E(C), |{vV(C)\{u}:ϕ(u)ϕ(v)2ϕ(u)ϕ(v)2}|k. By this definition, a kNN-realization is also a quasi-kNN-realization. Also, it is clear that if ϕ:V(H)d is a quasi-kNN-realization of a directed graph H in d, then for any subgraph C of H, the map ϕ|V(C) is a quasi-kNN-realization of C in d. Therefore, if G is kNN-realizable in d, then each of C1,,Cr admits a quasi-kNN-realization in d.

Next, we discuss how to compute a quasi-kNN-realization of each Ci in d (or conclude it does not exist). To this end, we need the following lemma.

Lemma 5.

Let C be a directed graph where |V(C)|k+1. If C admits a quasi-kNN-realization in d, then there exists a k-regular supergraph C of C with V(C)=V(C) such that C is kNN-realizable in d.

Proof.

Assume ϕ:V(C)d is a quasi-kNN-realization of C in d. We can slightly perturb ϕ so that for any distinct u,v,vV(C), ϕ(u)ϕ(v)2ϕ(u)ϕ(v)2}. Now define C as a directed graph with V(C)=V(C) and E(C)={(u,v)V(C)×V(C):uv and 𝗋𝖺𝗇𝗄ϕ(u,v)k}, where

𝗋𝖺𝗇𝗄ϕ(u,v)=|{vV(C)\{u}:ϕ(u)ϕ(v)2ϕ(u)ϕ(v)2}|.

The construction guarantees that C is k-regular and ϕ is a kNN-realization of C in d. Since ϕ is a quasi-kNN-realization of C, E(C)E(C) and thus C is a supergraph of C.

If |V(Ci)|k, then any map from V(Ci) to d is a quasi-kNN-realization of Ci. Otherwise, by the above lemma, to compute a quasi-kNN-realization of Ci in d, it suffices to consider every supergraph Ci of Ci with V(Ci)=V(Ci) and try to compute a kNN-realization of Ci (or conclude that Ci is not kNN-realizable). Note that the number of such supergraphs is at most exp((1ε)O(1)) because |V(Ci)|=(1ε)O(1). If we obtain a kNN-realization of some Ci, then it is a quasi-kNN-realization of Ci. To compute a kNN-realization ϕ:V(Ci)d of Ci, we formulate the problem as finding a solution to a system of (1ε)O(1) degree-2 polynomial inequalities on (1ε)O(1) variables. For each vV(Ci), we represent the coordinates of ϕ(v) by d variables x1(v),,xd(v). Then for all distinct u,v,vV(Ci) such that (u,v)E(Ci) and (u,v)E(Ci), we introduce a degree-2 polynomial inequality j=1d(xj(u)xj(v))2j=1d(xj(u)xj(v))2, which expresses ϕ(u)ϕ(v)2<ϕ(u)ϕ(v)2. Clearly, the solutions to this system of inequalities one-to-one correspond to the kNN-realizations of Ci in d. Renegar [23] showed that a system of p degree-2 polynomial inequalities on q variables can be solved in pO(q) time. Therefore, we can compute in (1ε)(1ε)O(1) time a kNN-realization of Ci in d (or decide its non-existence). This further implies that we can compute in (1ε)(1ε)O(1) time a quasi-kNN-realization of Ci in d (or decide its non-existence). The total time cost for all Ci is then (1ε)(1ε)O(1)n, since rn.

If Ci does not admit a quasi-kNN-realization in d for some i[r], then we can directly conclude that G is not kNN-realizable in d. Otherwise we construct a (1ε)-approximate kNN-realization of G in d as follows. Let ϕi:V(Ci)d be the quasi-kNN-realization of Ci in d we compute, for i[r]. Define a map ϕ:V(G)d as follows. Pick a vector xd such that x2>>maxi[r]maxu,vV(Ci)ϕi(u)ϕi(v)2. Then for each i[r] and each vV(Ci), set ϕ(v)=ix+ϕi(v). The choice of x guarantees that for a vertex vV(Ci), the ϕ-images of the vertices in Ci are closer to ϕ(v) than the ϕ-images of the vertices outside Ci. Also, for any u,vV(Ci), ϕ(u)ϕ(v)2=ϕi(u)ϕi(v)2. Therefore, for any u,vV(Ci), we have

{vV(G)\{u}:ϕ(u)ϕ(v)2ϕ(u)ϕ(v)2}
= {vV(Ci)\{u}:ϕ(u)ϕ(v)2ϕ(u)ϕ(v)2}
= {vV(Ci)\{u}:ϕi(u)ϕi(v)2ϕi(u)ϕi(v)2}.

Recall the function σϕ:V(G)×V(G){0,1} in Definition 1. The above equality shows that for u,vV(Ci), if |{vV(Ci)\{u}:ϕi(u)ϕi(v)2ϕi(u)ϕi(v)2}|k, then σϕ(u,v)=1. It follows that σϕ(u,v)=1 for any (u,v)E(Ci) because ϕi is a quasi-kNN-realization of Ci, so we have

(u,v)E(G)σϕ(u,v)i=1r|E(Ci)|=|E(G)||E|(1ε)|E(G)|.

Therefore, ϕ is a (1ε)-approximate kNN-realization of G in d. Combining the time costs for computing E and the maps ϕ1,,ϕr, the overall time complexity of our algorithm is O(n1+α), where O() hides a constant depending on k, d, α, and ε.

Theorem 6.

Let α>0 be any fixed number. Given a k-regular directed graph G of n vertices and a number ε>0, one can compute in f(k,d,ε)n1+α time a (1ε)-approximate kNN-realization of G in d, or conclude that G is not kNN-realizable in d, where f is some computable function depending on α.

3 𝒌NN-Realization in 𝟏

To the best of our knowledge, the kNN-realization problem on the line does not seem to have been studied, and it is the focus of this section. A number of line embedding problems are NP-hard as mentioned earlier, including triplet constraints and betweenness problems [9]. In the former, we are given a set of triplet constraints of the form d(a,b)<d(a,c), while in the latter each constraint (a,b,c) requires the ordering to satisfy a<b<c. Given the hardness result, the focus in these problems is on approximating the maximum number of satisfied constraints in the linear ordering [17].

Unlike these intractable embedding problems on the line, we show that the partial order constraints implied by a kNN-realization problem have sufficiently rich structure to admit a polynomial time solution. Most of the difficulty is in deciding whether a k-regular directed graph G is kNN-realizable on the line; if the answer is yes, then one can easily compute the embedding in polynomial time using linear programming. (It is also worth pointing out that the decision problem for kNN-realization on the line is significantly more complicated than the special case of triplet constraints in [17] when all (n3) triples are specified. Indeed, in that case, one can guess the leftmost point, and then all the remaining points are immediately determined by the triplet constraints. This is not the case in kNN-realization: fixing the leftmost point does not fix its neighbors’ order.)

The decision version of the kNN-realization problem, of course, is easy if we are given a permutation of G’s vertices (v1,,vn). In this case, we can easily compute a kNN-realization ϕ:V(G)1 satisfying ϕ(v1)<<ϕ(vn), or decide that a feasible realization does not exist, by formulating the problem as a linear program (LP) with n variables x1,,xn and the following set of constraints

  • xixj for all i,j[n] with ij,

  • Δi,j<Δi,k for all distinct i,j,k[n] such that (vi,vj)E(G) and (vi,vk)E(G), where Δi,j=xmax{i,j}xmin{i,j} and Δi,k=xmax{i,k}xmin{i,k}.

Clearly, if x1,,xn satisfy these constraints, then setting ϕ(vi)=xi gives us the desired realization. On the other hand, if ϕ is the realization we want, then the numbers x1,,xn where xi=ϕ(vi) must satisfy the constraints. Therefore, we begin with this key problem: efficiently computing an ordering of the images of the vertices on 1.

3.1 Finding the vertex ordering

We say an ordering (v1,,vn) of V(G) is a feasible vertex ordering of G if there exists a kNN-realization ϕ:V(G)1 of G in 1 satisfying ϕ(v1)<<ϕ(vn). The goal of this section is to give an O(kn)-time algorithm for computing a feasible vertex ordering of G (assuming it exists). We may assume, without loss of generality, that G is weakly-connected; otherwise we can consider each weakly-connected component of G individually.

Our first observation states two key properties of a feasible vertex ordering: (1) for each vi its k out-neighbors form a contiguous block, and for different vi’s their blocks have the same linear ordering as the vertex ordering, and (2) for each vi its in-neighbors also form a continuous block, which extends at most k vertices to the left and at most k to the right of vi. (Recall that both 𝗈𝗎𝗍[v] and 𝗂𝗇[v] include v, for all vertices vV(G).) See figure 1.

Observation 7.

A feasible vertex ordering (v1,,vn) of G satisfies the following.

  1. 1.

    There exist p1,,pn[nk] such that (i) 𝗈𝗎𝗍[vi]={vpi,,vpi+k} for all i[n], (ii) p1pn, and (iii) piipi+k for all in.

  2. 2.

    There exist q1,,qn,q1,,qn[n] such that (i) 𝗂𝗇[vi]={vqi,,vqi} for all i[n], (ii) q1qn, (iii) q1qn, and (iv) qiqiqi+2k for all i[n].

Proof.

Let ϕ:V(G)1 be a kNN-realization of G in 1 satisfying ϕ(v1)<<ϕ(vn). Clearly, for each i[n], the k+1 points in {ϕ(v1),,ϕ(vn)} closest to ϕ(vi) are {ϕ(vpi),,ϕ(vpi+k)} for some pi[nk] such that piipi+k. Furthermore, one can easily verify that p1pn. Assume pi>pj for some i,j[n] with i<j. Then we have ϕ(vpj)<ϕ(vpi)ϕ(vi)<ϕ(vj)ϕ(vpj+k)<ϕ(vpi+k). On the other hand, since vpi+k{ϕ(vpj),,ϕ(vpj+k)} and vpj{ϕ(vpi),,ϕ(vpi+k)},

ϕ(vj)ϕ(vpj)ϕ(vpi+k)ϕ(vj)ϕ(vpi+k)ϕ(vi)ϕ(vi)ϕ(vpj),

which contradicts the fact ϕ(vpj)<ϕ(vpi)ϕ(vi)<ϕ(vj). By the definition of kNN-realization, we see 𝗈𝗎𝗍[vi]={vpi,,vpi+k} for all i[n]. This proves condition 1.

Condition 2 follows from condition 1. Observe that for each i[n], it holds that 𝗂𝗇[vi]={vj:pjipj+k}={vj:ikpji}. Since p1pn, this implies 𝗂𝗇[vi]={vqi,,vqi} where qi=min{j:ikpji} and qi=max{j:ikpji}. The monotoncities q1qn and q1qn follow directly from the monotoncity p1pn. It suffices to show that qiqiqi+2k for all i[n]. The inequality qiqi is trivial. To see qiqi+2k, assume that qi>qi+2k. Then there exists j,j[n] with j>j+2k such that ikpji and ikpji. By condition 1, we have jpj and pjjk>j+k. Therefore, pj>j+kpj+k(ik)+k=i, which contradicts the fact ikpji. This proves condition 2.

Figure 1: A 2-regular graph on 6 vertices and its realization (top). The values of p,p,q,q for each vertex illustrate a natural ordering of the in and out-neighborhoods corresponding to the feasible vertex ordering v1,,v6 (bottom).

Consider the equivalence relation defined on V(G) as uv iff 𝗈𝗎𝗍[u]=𝗈𝗎𝗍[v]. Let 𝒞G be the set of equivalence classes of this relation, which is a partition of V(G) into groups of vertices with the same set of out neighbors. One can easily compute 𝒞G in O(k2n) time as follows. Given a vertex vV(G), we check whether 𝗈𝗎𝗍[u]=𝗈𝗎𝗍[v], for all u𝗈𝗎𝗍[v], which takes O(k2) time since |𝗈𝗎𝗍[v]|=k+1 and each equality test takes O(k) time. Thus, the time for computing all classes is O(k2n). Interestingly, we use Observation 7 to compute 𝒞G in O(kn) time, if G is kNN-realizable in 1, as shown in the following Lemma. Due to space constraints the proof is included in the full version.

Lemma 8.

Let G be a k-regular directed graph of n vertices. One can compute in O(kn) time a partition 𝒞 of V(G) such that 𝒞=𝒞G if G is kNN-realizable in 1.

Write r=|𝒞G|. Let (v1,,vn) be a feasible vertex ordering of G. Observation 7 implies that each class C𝒞G is a set of consecutive vertices in the sequence (v1,,vn), i.e., C={vα,,vβ} for some α,β[n]. Define 𝗈𝗎𝗍[C]=𝗈𝗎𝗍[v] for an arbitrary vertex vC. Therefore, there is a natural ordering (C1,,Cr) of the classes in 𝒞G, in which the indices of the vertices in Ci are smaller than the indices of the vertices in Cj for all i,j[r] with i<j. We call (C1,,Cr) a feasible class ordering of G.

Observation 9.

If (C1,,Cr) is the feasible class ordering of G corresponding to a feasible vertex ordering (v1,,vn) of G, then there exist c1,,cr[nk] satisfying

  • 𝗈𝗎𝗍[Ci]={vci,,vci+k} for all i[r],

  • c1<<cr,

  • if G is weakly-connected, then ci+1ci+k for all i[r1].

Proof.

Let p1,,pn[nk] be the indices in condition 1 of Observation 7, for the feasible vertex ordering (v1,,vn). For each i[r], set ci=ps where s=1+j=1i1|Cj|, and we then have 𝗈𝗎𝗍[Ci]=𝗈𝗎𝗍[vs]={vci,,vci+k}. Since p1pn, we have c1cr. Furthermore, for all i[r1], we have cici+1, simply because 𝗈𝗎𝗍[Ci]𝗈𝗎𝗍[Ci+1]. Thus, c1<<cr. To see the last property, assume ci+1>ci+k for some i[r1]. Then (j=1i𝗈𝗎𝗍[Cj])(j=i+1r𝗈𝗎𝗍[Cj])=. Note that j=1iCjj=1i𝗈𝗎𝗍[Cj] and j=i+1rCjj=i+1r𝗈𝗎𝗍[Cj]. So there is no edge between j=1iCj and j=i+1rCj, which implies that G is not weakly connected.

To compute a feasible vertex ordering of G, we first compute a feasible class ordering, using the previous observation. Due to space constraints the proof of the following lemma is included in the full version.

Lemma 10.

Let G be a weakly-connected k-regular directed graph of n vertices. Given 𝒞G, one can compute in O(kn) time an ordering of 𝒞G, which is a feasible class ordering of G if G is kNN-realizable in 1.

Next, we show how to compute a corresponding feasible vertex ordering of G given a feasible class ordering (C1,,Cr). By definition, we know that in the feasible vertex ordering, the vertices in Ci must appear before the vertices in Cj for all i,j[r] with i<j. So it suffices to figure out the “local” ordering of the vertices in each class Ci. We do this by considering the in-neighbors of each vertex. As observed before, for each vV(G), 𝗂𝗇[v] is the union of several classes in 𝒞G, and in addition, 𝗂𝗇[v]=i=pqCi for some p,q[r], by condition 2 of Observation 7. It turns out that we can sort the vertices in each class according to the values of p and q. Formally, we prove the following lemma.

Lemma 11.

Let G be a weakly-connected k-regular directed graph of n vertices, and r=|𝒞G|. Given an ordering (C1,,Cr) of 𝒞G, one can compute in O(kn) time an ordering (v1,,vn) of V(G) such that if (C1,,Cr) is a feasible class ordering of G, then (v1,,vn) is a feasible vertex ordering of G.

Proof.

Our algorithm for computing a feasible vertex ordering of G is presented in Algorithm 2. First, for each vV(G), we compute a pair R(v)=(p,q) where p[r] (resp., q[r]) is the minimum (resp., maximum) index such that Cp𝗂𝗇[v] (resp., Cq𝗂𝗇[v]). We define a partial order on index-pairs by setting (p,q)(p,q) if p+qp+q. Then we order the vertices in each class Ci as (u1,,u|Ci|) such that R(u1)R(u|Ci|). If there exist u,vCi such that R(u) and R(v) are incomparable under the order , then we just arbitrarily order the vertices in Ci. We will see later that in this case, (C1,,Cr) is not a feasible class ordering of G. By concatenating the orderings of C1,,Cr, we obtain an ordering (v1,,vn) of V(G).

We first show that Algorithm 2 can be implemented in O(kn) time. Computing R(v) for all vV(G) can be done in O(kn) time, because |𝗂𝗇[v]|=O(k) by (ii) of Observation 7. To see the time cost for ordering the vertices in each Ci, observe that |Ci|k. Indeed, v𝗈𝗎𝗍[v]=𝗈𝗎𝗍[Ci] for all vCi, which implies Ci𝗈𝗎𝗍[Ci] and thus |Ci||𝗈𝗎𝗍[Ci]|=k. If we sort the vertices in Ci directly, it takes O(klogk) time. One can improve the time cost to O(k) by observing that for any vCi, the pair R(v)=(p,q) satisfies pik and qi+k, by condition 2 of Observation 7, which implies 2i2kp+q2i+2k. In other words, in the sorting task, the keys are all in the range {2i2k,,2i+2k}, whose size is O(k). Thus, the task can be done in O(k) time using bucket sorting. It follows that Algorithm 2 can be implemented in O(kn) time.

To see the correctness of our algorithm, suppose (C1,,Cr) is a feasible class ordering of G, and let (v1,,vn) be a corresponding feasible vertex ordering of G. We do not necessarily have (v1,,vn)=(v1,,vn). However, as we will see, it holds that (R(v1),,R(vn))=(R(v1),,R(vn)), which turns out to be sufficient. By (ii) of Observation 7, for each vV(G) with R(v)=(p,q), we have 𝗂𝗇[v]=i=pqCi, and furthermore, R(v1)R(vn). Now consider a class Ci. Note that Ci={vα,,vβ}={vα,,vβ}, where α=1+j=1i1|Cj| and β=j=1i|Cj|. We have R(vα)R(vβ), and our algorithm guarantees that R(vα)R(vβ). Therefore, (R(vα),,R(vβ))=(R(vα),,R(vβ)). It then follows that (R(v1),,R(vn))=(R(v1),,R(vn)).

To see (v1,,vn) is a feasible vertex ordering of G, we further observe that the function π:V(G)V(G) defined as π(vi)=vi for i[n] is an automorphism of G. Consider indices i,j[n]. We have 𝗈𝗎𝗍[vi]=𝗈𝗎𝗍[vi], since vi and vi belong to the same class in 𝒞G. Thus, (vi,vj)E(G) iff (vi,vj)E(G). On the other hand, because R(vj)=R(vj), we have 𝗂𝗇[vj]=𝗂𝗇[vj] and hence (vi,vj)E(G) iff (vi,vj)E(G). As such, (vi,vj)E(G) iff (vi,vj)E(G), which implies that π is an automorphism of G. Now consider a kNN-realization ϕ:V(G)1 of G in 1 satisfying ϕ(v1)<<ϕ(vn). Set ϕ=ϕπ, which is a map from V(G) to 1. As π is an automorphism of G, ϕ is also a kNN-realization of G. Furthermore, ϕ(vi)=ϕ(π(vi))=ϕ(vi) for all i[n], which implies ϕ(v1)<<ϕ(vn). The existence of ϕ shows that (v1,,vn) is a feasible vertex ordering of G.

Algorithm 2 VertexOrder(G,(C1,,Cr)).
Theorem 12.

Given a k-regular directed graph G of n vertices, one can compute in O(kn) time an ordering (v1,,vn) of V(G), which is a feasible vertex ordering of G if G is kNN-realizable in 1.

Proof.

For the case where G is weakly-connected, the theorem follows directly from Lemmas 8, 10 and 11. Otherwise, we compute the orderings for the weakly-connected components of G individually and concatenate them; this gives us the desired ordering of V(G).

3.2 Deciding the realizability and computing the realization

To decide the kNN-realizability of G, we first run the algorithm of Theorem 12, which returns the vertex ordering (v1,,vn) of V(G), and then run the linear program given at the beginning of Section 3 for this ordering. Either the LP is feasible, in which case the solution gives a kNN-realization of G in 1, or LP is infeasible, in which case we can conclude that (v1,,vn) is not a feasible vertex ordering and thus G is not kNN-realizable in 1. Thus, the kNN-realization problem in 1 can be solved in polynomial time.

Interestingly, we can show that if we are only interested in the decision problem (and not actual embedding), then solving the LP is not necessary. Specifically, we can decide whether (v1,,vn) is a feasible vertex by simply checking condition 1 of Observation 7. If the ordering satisfies that condition, then the LP is always feasible. The proof of the following lemma is somewhat technical and is presented in the full version.

Lemma 13.

Let G be a k-regular directed graph of n vertices. An ordering (v1,,vn) of V(G) is a feasible vertex ordering of G iff it satisfies condition 1 of Observation 7. In particular, one can decide whether a given ordering of V(G) is a feasible vertex ordering of G or not in O(kn) time.

We can now state the main result of this section.

Theorem 14.

Given a k-regular directed graph G of n vertices, one can decide in O(kn) time whether G is kNN-realizable in 1, and if so, a kNN-realization of G in 1 can be computed in O(n2.5𝗉𝗈𝗅𝗒(logn)) time.

4 Concluding remarks and extensions

We considered the problem of realizing a directed graph G as an Euclidean kNN graph. Our key results are: (1) for any fixed d, we can efficiently embed at least a 1ε fraction of G’s edges in d or conclude that G is not realizable, and (2) a linear time algorithm to decide if G is realizable in 1. Our theorems extend to the case where the neighbors of each vertex in G are given as a ranked list, meaning that the embedding must satisfy ϕ(v)ϕ(ui)<ϕ(v)ϕ(ui+1), for i=1,,k1, where ui is the ith nearest neighbor of v (except in 1 where we need to solve the LP to decide if it is feasible). Our approximation scheme also applies to other proximity graphs that meet the following conditions: (1) the graph can be partitioned into constant-sized components using a sublinear size separator, and (2) each component’s edges can be embedded independently. For example, we can approximately embed Delaunay triangulations in the plane with maximum degree k=O(n13).

References

  • [1] A. Agrawal, S. Saurabh, and M. Zehavi. A finite algorithm for the realizabilty of a delaunay triangulation. In 17th International Symposium on Parameterized and Exact Computation, volume 249, pages 1:1–1:16, 2022. doi:10.4230/LIPICS.IPEC.2022.1.
  • [2] N. Alon, M. Bădoiu, E. D. Demaine, M. Farach-Colton, M. Hajiaghayi, and A. Sidiropoulos. Ordinal embeddings of minimum relaxation: General properties, trees, and ultrametrics. ACM Trans. Algorithms, 4(4), 2008. doi:10.1145/1383369.1383377.
  • [3] Y. Bilu and N. Linial. Monotone maps, sphericity and bounded second eigenvalue. Journal of Combinatorial Theory, Series B, 95(2):283–299, November 2005. doi:10.1016/J.JCTB.2005.04.005.
  • [4] A. Bogomolnaia and J.-F. Laslier. Euclidean preferences. Journal of Mathematical Economics, 43(2):87–98, 2007.
  • [5] P. Bose, W. J. Lenhart, and G. Liotta. Characterizing proximity trees. Algorithmica, 16:83–110, 1996. doi:10.1007/BF02086609.
  • [6] J. Bourgain. On lipschitz embedding of finite metric spaces in hilbert space. Israel J. Math, pages 46–52, 1985.
  • [7] M. Bădoiu, E. D. Demaine, M. Hajiaghayi, A. Sidiropoulos, and M. Zadimoghaddam. Ordinal embedding: Approximation algorithms and dimensionality reduction. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 21–34, 2008.
  • [8] V. Chatziafratis and P. Indyk. Dimension-accuracy tradeoffs in contrastive embeddings for triplets, terminals & top-k nearest neighbors. In Symposium on Simplicity in Algorithms, pages 230–243. 2024.
  • [9] B. Chor and M. Sudan. A geometric approach to betweenness. SIAM J. Discret. Math., 11(4):511–523, 1998. doi:10.1137/S0895480195296221.
  • [10] J. Chuzhoy, Y. Gao, J. Li, D. Nanongkai, R. Peng, and T. Saranurak. A deterministic algorithm for balanced cut with applications to dynamic connectivity, flows, and beyond. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 1158–1167. IEEE, 2020.
  • [11] M. de Berg, O. Cheong, M. van Kreveld, and M. H. Overmars. Computational Geometry: Algorithms and Applications. Springer, 2008.
  • [12] M. Dillencourt. Toughness and delaunay triangulations. In Proceedings of the Third Annual Symposium on Computational Geometry, pages 186–194, 1987.
  • [13] P. Eades and S. Whitesides. The realization problem for euclidean minimum spanning trees is np-hard. In Proceedings of the Tenth Annual Symposium on Computational Geometry, pages 49–56, 1994.
  • [14] P. Eades and S. Whitesides. The logic engine and the realization problem for nearest neighbor graphs. Theoretical Computer Science, 169(1):23–37, 1996. doi:10.1016/S0304-3975(97)84223-5.
  • [15] D. Eppstein, M. Paterson, and F. F. Yao. On nearest-neighbor graphs. Discret. Comput. Geom., 17:263–282, 1997. doi:10.1007/PL00009293.
  • [16] P. Erdős, F. Harari, and W. T. Tutte. On the dimension of a graph. Mathematika, 12:118–122, 1965.
  • [17] B. Fan, D. Ihara, N. Mohammadi, F Sgherzi, A. Sidiropoulos, and M. Valizadeh. Learning Lines with Ordinal Constraints. APPROX/RANDOM, 176:45:1–45:15, 2020. doi:10.4230/LIPICS.APPROX/RANDOM.2020.45.
  • [18] W. Johnson and J. Lindenstrauss. Extensions of lipschitz maps into a hilbert space. Contemporary Mathematics, 26:189–206, 1984.
  • [19] H. Maehara. Space graphs and sphericity. Discrete Applied Mathematics, 7(1):55–64, 1984. doi:10.1016/0166-218X(84)90113-6.
  • [20] J. Matousek. Lectures on Discrete Geometry. Springer, 2002.
  • [21] G. L. Miller, S.H. Teng, W. P. Thurston, and S. A. Vavasis. Separators for sphere-packings and nearest neighbor graphs. J. ACM, 44:1–29, 1997. doi:10.1145/256292.256294.
  • [22] C. L. Monma and S. Suri. Transitions in geometric minimum spanning trees. Discrete & Computational Geometry, 8:265–293, 1992. doi:10.1007/BF02293049.
  • [23] J. Renegar. On the computational complexity and geometry of the first-order theory of the reals. Journal of Symbolic Computation, 13(3):255–299, 1992.
  • [24] K. Sugihara. Simpler proof of a realizability theorem on delaunay triangulations. Information Processing Letters, 50(4):173–176, 1994. doi:10.1016/0020-0190(94)00027-1.