Abstract 1 Introduction 2 Preliminaries 3 Compression for Euclidean Embedding Editing 4 FPT algorithm for Euclidean Embedding Editing parameterized by solution size and dimension 5 FPT-Approximation for Unweighted Euclidean Embedding with Outliers 6 Lower bounds References

When Distances Lie: Euclidean Embeddings in the Presence of Outliers and Distance Violations

Matthias Bentert University of Bergen, Norway Fedor V. Fomin ORCID University of Bergen, Norway Petr A. Golovach ORCID University of Bergen, Norway M. S. Ramanujan ORCID University of Warwick, UK Saket Saurabh ORCID Institute of Mathematical Sciences, Chennai, India
University of Bergen, Norway
Abstract

Distance geometry explores the properties of distance spaces that can be exactly represented as the pairwise Euclidean distances between points in d (d1), or equivalently, distance spaces that can be isometrically embedded in d. In this work, we investigate whether a distance space can be isometrically embedded in d after applying a limited number of modifications. Specifically, we focus on two types of modifications: outlier deletion (removing points) and distance modification (adjusting distances between points). The central problem, Euclidean Embedding Editing, asks whether an input distance space on n points can be transformed, using at most k modifications, into a space that is isometrically embeddable in d.

We present several fixed-parameter tractable (FPT) and approximation algorithms for this problem. Our first result is an algorithm that solves Euclidean Embedding Editing in time (dk)𝒪(d+k)+n𝒪(1). The core subroutine of this algorithm, which is of independent interest, is a polynomial-time method for compressing the input distance space into an equivalent instance of Euclidean Embedding Editing with 𝒪((dk)2) points.

For the special but important case of Euclidean Embedding Editing where only outlier deletions are allowed, we improve the parameter dependence of the FPT algorithm and obtain a running time of min{(d+3)k,2d+k}n𝒪(1). Additionally, we provide an FPT-approximation algorithm for this problem, which outputs a set of at most 2𝖮𝗉𝗍 outliers in time 2dn𝒪(1). This 2-approximation algorithm improves upon the previous (3+ε)-approximation algorithm by Sidiropoulos, Wang, and Wang [SODA ’17]. Furthermore, we complement our algorithms with hardness results motivating our choice of parameterizations.

Keywords and phrases:
Parameterized Complexity, Euclidean Embedding, FPT-approximation
Funding:
Matthias Bentert: Supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 819416).
Fedor V. Fomin: Supported by the Research Council of Norway under BWCA project (grant no. 314528).
Petr A. Golovach: Supported by the Research Council of Norway under BWCA project (grant no. 314528).
M. S. Ramanujan: Supported by Engineering and Physical Sciences Research Council (EPSRC) grant EP/V044621/1.
Saket Saurabh: Supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 819416); and Swarnajayanti Fellowship grant DST/SJF/MSA-01/2017-18.
Copyright and License:
[Uncaptioned image] © Matthias Bentert, Fedor V. Fomin, Petr A. Golovach, M. S. Ramanujan, and
Saket Saurabh; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing Combinatorial algorithms
; Theory of computation Fixed parameter tractability
Related Version:
Full Version: http://arxiv.org/abs/2503.19093
Editors:
Oswin Aichholzer and Haitao Wang

1 Introduction

The Euclidean Distance Matrix (EDM) is a matrix containing the squared Euclidean distances between points in a set. A central problem in Distance Geometry is determining whether a given matrix is an EDM [5, 25, 11, 13]. That is, the task is to identify whether a given distance space can be isometrically embedded into 2-spaces, or equivalently, the pairwise Euclidean distances among points in d (d1). This problem has a rich history, originating with Cayley [7], whose observations were formalized by Menger [28], leading to Cayley-Menger determinants. Schoenberg [30] further advanced the field by characterizing 2-embeddable distances using negative type inequalities. Blumenthal’s monograph [5] remains a foundational text, documenting the theoretical underpinnings of this area.

This problem has numerous applications, including sensor network localization [29, 14], molecular conformation [21], data visualization [6], statistics [17], psychology [34, 31], learning image manifolds [36], handwriting recognition [24], studying musical rhythms [12], and signal processing [15]. The complexity of determining whether a matrix is an EDM is well understood, and it can be efficiently addressed using Singular Value Decomposition (SVD) [6, 35].

In many practical applications involving EDMs, errors due to noise, missing values, or approximation inaccuracies are common. To address these challenges, several heuristic approaches based on Multidimensional Scaling, Low-Rank Matrix Approximation, and Semidefinite Programming have been developed to reconstruct the matrix and create an embedding that minimizes the impact of these errors. We refer readers to [6, 15] for a comprehensive overview of the extensive literature on this topic.

Despite the importance and numerous practical applications of the EDM recognition problem with noise and errors, little was known about its computational complexity until recently. A notable exception is the work of Sidiropoulos, Wang, and Wang [32], which initiates the study of EDM in the presence of outliers. The work of Sidiropoulos et al. serves as the initial foundation for our studies.

In this paper, we address the algorithmic question of minimizing the number of edits required to transform a given distance matrix into EDM. Specifically, we aim to apply the minimum number of modifications to a given distance space (X,ρ) with distance function ρ, such that the resulting distance space can be isometrically embedded into a d-dimensional Euclidean space d. We consider two types of editing operations.

The first operation is element deletion. In matrix terms, this operation corresponds to deleting the row and column associated with the element we choose to remove. Following Sidiropoulos, Wang, and Wang [32], we refer to the elements removed from the distance space as outliers.

The second operation is distance modification. Let X(2) denote the set of unordered pairs of distinct elements in X={x1,x2,}. For a pair of elements xi,xjX, the modification operation alters the distance ρ(xi,xj). In terms of the distance matrix, this operation changes the ij-th and the ji-th entries. The problem of minimizing the number of modification operations to embed the resulting distance space in general metric spaces was studied in [18, 9], and embedding into ultrametric spaces was investigated in [9, 8].

Our primary algorithmic question is as follows: given integers kO and kM, can a distance space (X,ρ) be transformed into a distance space that is embeddable in d by removing at most kO outliers and performing at most kM modifications? In fact, we address an even more general weighted version of this problem, where each operation (outlier deletion or distance modification) is assigned a specific weight. More precisely, we study:

Two special cases of Euclidean Embedding Editing are of particular importance. The variant of the problem with kM=0, that is, of placing all but kO outlier points of a distance space (X,ρ) into a Euclidean space d of a given dimension d such that the Euclidean distance between any pair of points x, is equal to ρ(x,y), is called Euclidean Embedding with Outliers [32]. For the variant with kO=0, that is, without deletion of outliers, following [18], we use the name Euclidean Metric Violation Distance.

EXAMPLE: The distance space 𝒳=(X,ρ) with X={1,,9} and ρ defined by distance matrix D, where the ij-th entry of D is ρ2(i,j)

D=[071245145702112105412011144122110218214111201128552411075211048127089451285801542152910],

and kO=1, kM=2, d=2, all weights equal to one, and W=3, is a yes-instance of Euclidean Embedding Editing. Indeed, the distance space 𝒳=(X,ρ) defined by distance matrix D in Fig. 1 that can be obtained from 𝒳 by modifying distances between {1,2} and {3,5} and deleting outlier 7, is isometrically embedable in 2.

Figure 1: Matrix D and an embedding of its distance space in 2.

Our results.

First, we present a compression for Euclidean Embedding Editing– a polynomial-time algorithm that reduces an instance of the problem to an equivalent instance with the number of points bounded by 𝒪(k2d2) (Theorem 8), where k=kO+kM. As part of this compression, we also propose a (d+3)-approximation algorithm for Euclidean Embedding with Outliers, which runs in polynomial time (Lemma 10). Using this compression algorithm, we design an FPT algorithm for Euclidean Embedding Editing with a running time of (dk)𝒪(d+k)+n𝒪(1) (Theorem 20).

For Euclidean Embedding with Outliers, a particular but important special case of Euclidean Embedding Editing – where the goal is to determine whether the distance space, excluding up to kO outliers, can be embedded in d – we obtain better dependence on the parameters with a running time of min{(d+3)kO,2d+kO}n𝒪(1) (Theorem 21). Furthermore, for this problem, we propose a 2-approximation algorithm (Theorem 22). This randomized algorithm, with a running time of 2dn𝒪(1), guarantees a solution with at most 2𝖮𝗉𝗍 outliers. As is common in Computational Geometry, all our algorithms operate under the real RAM computational model, assuming that basic operations over real numbers can be executed in unit time.

We also complement our algorithmic results with lower-bound proofs, establishing that both Euclidean Embedding with Outliers and Euclidean Metric Violation Distance are NP-hard even when d=1 (Theorems 26 and 27). So, FPT algorithms for these problems parameterized by d alone appears to be out of reach, motivating our FPT-approximation algorithm. Additionally, we prove that Euclidean Embedding with Outliers is W[1]-hard when parameterized by kO alone (Theorem 28). These lower bounds indicate that to get an FPT algorithm for Euclidean Embedding Editing, it is important that we parameterize by both d and the solution size kO+kM. Importantly, our lower-bound results remain valid even in the unweighted case.

Related work.

The computational complexity study of Euclidean Embedding with Outliers was initiated by Sidiropoulos, Wang, and Wang [32]. They demonstrated that, assuming the Unique Games Conjecture, the problem of computing a minimum outlier embedding into d-dimensional Euclidean space for any d2 is NP-hard to approximate within a factor of 2ε for any ε>0. On the algorithmic side, they showed that a 2-approximation can be achieved in 𝒪(nd+3) time. Additionally, they established that a (3+ε)-approximation is achievable in (2/ε)dd𝒪(1)n2logn time. For exact solutions, they noted an algorithm with a runtime of 𝒪(nd+3)+2kOn2.

They also obtained results related to bicriteria approximation involving nonisomorphic embeddings, approximation of the number of outliers, and distortion. The first result, is (𝒪(δ),(2d+2)kO)-relative outlier embedding in d in time 1δ𝒪(1)2𝒪(d)n2logn. The second result, (𝒪(δ),2kO)-relative outlier embedding in time 1δ𝒪(1)kO𝒪(d)n2. (They define an algorithm as an 𝒪(f(δ),g(kO))-relative outlier embedding of 𝒳=(X,ρ) to d, for some functions f and g, if it either correctly decides that no embedding with distortion at most δ exists after removing kO outliers, or outputs a set Y of size g(kO) such that there is an embedding XY into d of distortion 𝒪(f(δ)).) Since allowing distortion could significantly decrease the number of outliers, our 2-approximation algorithm for Euclidean Embedding with Outliers, Theorem 22, is incomparable with these results.

We are not aware of any algorithms with guaranteed performance for Euclidean Metric Violation Distance. Parameterized and approximation algorithms for metric violation distance problems for general metric, tree distances and ultrametric could be found in [18, 8, 9, 20]. More generally, embeddings of various metric spaces are a fundamental primitive in the design of algorithms [22, 23, 27, 26, 2, 3], though prior work often focuses on minimizing embedding distortion.

2 Preliminaries

Let X be a set. A function ρ:X×X0 is a distance on X if: (i) ρ is symmetric, that is, for any x,yX, ρ(x,y)=ρ(y,x), and (ii) ρ(x,x)=0 for all xX. Then, (X,ρ) is called a distance space. If, in addition ρ satisfies a triange inequality: ρ(x,z)ρ(x,y)+ρ(y,z), for any x,y,zX, then ρ is called a semimetric on X. And if ρ(x,y)=0 only for x=y, then ρ is called a metric, and (X,ρ) a metric space.

More precisely, recall that for two points p,qd, the Euclidean distance between p and q is pq2=p,p+q,q2p,q. We say that distance space (X,ρ) is isometrically embeddable into d if there is a map, called isometric embedding, φ:Xd such that ρ(x,y)=φ(x)φ(y)2 for all x,yX. Notice that we do not require φ to be injective, that is, several points of (X,ρ) may be mapped to the same point of d. Throughout the paper, whenever we mention an embedding, we mean an isometric embedding. Moreover, when we use the term d-embedding, we are referring to embedding into d. A d-embeddable distance space is strongly d-embeddable if it is not (d1)-embeddable. We use X(2) to denote the set of unordered pairs of two distinct elements of X. As convention, we assume that the empty set of points is d-embeddable for every d.

Let 𝒳=(X,ρ) be a distance space where X={x1,,xn}, and let ρi,j=ρ(xi,xj) for all i,j{1,,n}. Then 𝒳 is equivalently defined by the distance matrix D(ρ), where D(ρ)i,j=ρi,j2.

We drop the explicit reference to ρ when it is clear from the context and instead of D(ρ), simply write D. Suppose that 𝒳 is embeddable into d. The ordered set P=(p1,,pn) of points in d is said to be a realization of 𝒳 if there is an embedding φ:Xd such that φ(xi)=pi for all i{1,,n}. We use the well-known property (see e.g. [15]) that a realization is unique up to rigid transformations of d, that is, distance-preserving transformations of Euclidean space (rotations, reflections, translations).

Proposition 1 ([15]).

Let (p1,,pn) and (q1,,qn) be ordered set of points in d. Then pipj2=qiqj2 for all i,j{1,,n} if and only if there is a rigid transformation of d mapping pi to qi for all i{1,,n}.

A realization can be constructed (if it exists) in polynomial time.

Proposition 2 ([1, 33]).

Given a distance space 𝒳=(X,ρ) with n points and a positive integer d, in 𝒪(n3) time, it can be decided whether 𝒳 can be embedded into d and, if such an embedding exists, then a realization can be constructed in this running time.

Definition 3 (Metric basis).

Let (X,ρ) be a d-embeddable distance space. A set YX is a metric basis if, given an isometric embedding φ of (Y,ρ) into d, there is a unique way to extend φ to an isometric embedding of (X,ρ). Equivalently, if a realization of (Y,ρ) is fixed then the embedding of any point of XY in a d-embedding of (X,ρ) is unique.

We will use the well-known characteristic of strong embeddability of a distance space into a Euclidean space. For r+1 points x0,x1,,xr of distance space (X,ρ) the Cayley-Menger determinant is the determinant of the matrix obtained from the distance matrix by prepending a row and a column whose first element is zero and the other elements are one. Formally, let ρi,j=ρ(xi,xj), i,j{0,,r}. Then the Cayley-Menger determinant is

CM(x0,x1,,xr)=det(0111110ρ0,12ρ0,22ρ0,r21ρ0,120ρ1,22ρ1,r21ρ0,22ρ1,220ρ2,r21ρ0,r2ρ1,r2ρ2,r20)
Proposition 4 ([5, Chapter IV]).

A distance space 𝒳=(X,ρ) with n points is strongly d-embeddable if and only if there exist d+1 points, say Xd={x0,,xd}, such that:

  1. 1.

    (1)j+1CM(x0,x1,,xj)>0 for 1jd, and

  2. 2.

    for any x,yXXd,

    CM(x0,x1,,xd,x)=CM(x0,x1,,xd,y)=CM(x0,x1,,xd,x,y)=0.

Equivalently (see, for example, [32]), 𝒳 is strongly d-embeddable if and only if there is a set of d+1 points Xd={x0,,xd} such that ({x0,,xj},ρ) is strongly j-embeddable for all j{1,,d}, and for every x,yXXd, (Xd{x}{y}) is d-embeddable.

Thus, the embedding of the distance space 𝒳=(X,ρ) into d is essentially characterized by d+1 “anchor” points of X. Note that in a realization of 𝒳 that is strongly d-embeddable, the anchor points correspond to a set of d+1 points in general position in d.

Following [5], we define sets of independent points.

Definition 5.

Let (X,ρ) be a distance space. For a nonnegative integer r, we say that YX of size r+1 is independent if (Y,ρ) is strongly r-embeddable.

Then our “anchors” are independent sets of size d+1. We use the following fact that the family of independent sets has matroid-like properties (implicit in [5, Chapter IV]).

Proposition 6 ([5, Chapter IV]).

Let (X,ρ) be a strongly d-embeddable distance space. Then, (i) any single-element set is independent, (ii) if YX is independent, then any ZY is independent, (iii) if Y,ZX are independent and |Y|>|Z| then there is a yYZ such that Z{y} is independent. Furthermore, the maximum size of an independent set is d+1 and any independent set Y of size d+1 is a metric basis.

Lemma 7 ().

Suppose (X,ρ) is a d-embeddable distance space, YX is independent and xXY. Then, (Y{x},ρ) is |Y|-embeddable.

For statements marked with (), the proofs are omitted in this extended abstract and is deferred to the full version.

3 Compression for Euclidean Embedding Editing

In this section, we show that, given an instance of Euclidean Embedding Editing, one can in polynomial time construct an equivalent instance where the number of points is upper-bounded by a polynomial of k=kO+kM and d and, moreover, the encoding of the weights is also polynomial in these parameters.

Theorem 8.

There is a polynomial-time algorithm that, given an instance (𝒳=(X,ρ),wO,wM,W,kO,kM,d) of Euclidean Embedding Editing, either solves the problem or constructs an equivalent instance (𝒳=(X,ρ),wO,wM,W,kO,kM,d) such that XX, and for k=kO+kM, it holds that |X|=𝒪((kd)2), W=2𝒪((kd)12), wO(x)=2𝒪((kd)12) for xX, and wM(x,y)=2𝒪((kd)12) for all {x,y}X(2).

We remark that Theorem 8 does not provide a kernelization algorithm according to the standard definition [10] as the size of the output instance is not upper-bounded by a function of the parameter – the distances between the points remain the same as in the input instance. Moreover, since Euclidean Embedding Editing is a decision problem, the word “solve” in the statement of Theorem 8 technically only refers to deciding correctly whether the instance is a yes-instance. However, if the instance is determined to be a yes-instance, then our algorithm can also output a solution that witnesses this, i.e., the outlier set and modified new distance values (if any). This additional feature of our compression algorithm will be used later in our FPT algorithm for Euclidean Embedding Editing.

In the proof of Theorem 8, we first design a polynomial-time (d+3)-approximation algorithm (Section 3.1). This algorithm allows us, given an instance of the problem, either find a set of at most (d+3)k points A such that (XA,ρ) is d-embeddable or conclude that we have a no-instance.

In the first case, we iteratively construct a sequence of metric bases Y1,,Y for (Z,ρ) using Proposition 6, where initially Z:=XA and in each iteration, we delete the points of the constructed basis from Z. If XA is sufficiently large, then there is a subsequence of bases of the same size with at least 2(kO+2kM)+1 elements. We select the first subsequence with this property. The crucial property exploited by our algorithm is that this subsequence uniquely defines embeddings of the majority of the points assuming that we have a yes-instance of the problem. We use this to give a series of reduction rules that identify points in A that should be outliers in any solution and irrelevant points of XA that could be deleted.

3.1 Bootstrapping the Compression through an Approximation

For a distance space 𝒳=(X,ρ), a d-outlier set is a set A of points such that (XA,ρ) is d-embeddable. In unweighted Euclidean Embedding with Outliers (which we call UEEO), the function wO assigns unit weight to every point. Sidiropoulos et al. [32] observed that this problem admits a simple greedy (d+3)-approximation running in n𝒪(d). We show that this algorithm can be sped up to run in n𝒪(1) time, i.e., independent of d in the exponent. To do so, we give a polynomial-time subroutine that produces a small hitting set for all minimal solutions. This subroutine will later be used as the first step of our compression algorithm for Euclidean Embedding Editing.

Lemma 9.

There is a polynomial-time algorithm that, given a distance space 𝒳=(X,ρ) and an integer d1, outputs a set of points A^ of size at most (d+3) such that if 𝒳 is not d-embeddable, then A^ intersects every inclusionwise minimal d-outlier set of 𝒳.

Proof.

Assume that the distance space 𝒳=(X,ρ) does not admit a d-embedding. Note that, in particular, this means that |X|3. We then do the following.

  1. 1.

    Set A:={p} for an arbitrary point pX.

  2. 2.

    While

    |A|d, do the following:

    1. (a)

      If there is a point xXA such that (A{x},ρ) is not |A|-embeddable, then set A^:=A{x} and return A^.

    2. (b)

      If there is a point xXA such that A{x} is independent, then set A:=A{x} and continue the loop, else exit the loop.

  3. 3.

    If there are two distinct points x,yXA such that (A{x,y},ρ) is not (|A|1)-embeddable, then A^:=A{x,y} and return A^.

Clearly, the algorithm runs in polynomial time. Since the while loop has at most d iterations, the set A when the loop is exited has size at most d+1 and so, the returned set A^ has size at most d+3.

We can now use Lemma 9 as a subroutine in the greedy approximation for the minimization variant of unweighted Euclidean Embedding with Outliers.

Lemma 10 ().

There is a polynomial-time algorithm that, given a distance space 𝒳=(X,ρ) and an integer d1, outputs a set of points A of size at most (d+3)𝖮𝗉𝗍 such that (XA,ρ) is embeddable into d where 𝖮𝗉𝗍 is the size of a smallest d-outlier set.

3.2 Compression

We are now ready to give the compression algorithm for Euclidean Embedding Editing.

Proof of Theorem 8.

Let (𝒳=(X,ρ),wO,wM,W,kO,kM,d) be an instance of Euclidean Embedding Editing. First, we preprocess the instance to reduce the number of points and then we explain how to reassign the weights.

Notice that if (𝒳,wO,wM,W,kO,kM,d) is a yes-instance, then it is possible to delete k=kO+kM points to obtain a distance space embeddable in d. In the first step of our algorithm, we call the algorithm from Lemma 10 for the instance (𝒳,k,d) of Euclidean Embedding with Outliers without weights. This algorithm outputs a set of points A of size at most (d+3)𝖮𝗉𝗍 such that (XA,ρ) is embeddable into d where 𝖮𝗉𝗍 is the minimum number of outliers. If |A|>(d+3)k then we conclude that (𝒳,wO,wM,W,kO,kM,d) is a no-instance of Euclidean Embedding Editing and stop. From now on, we assume that |A|(d+3)(kO+kM).

Let Y=XA. We first show that if Y has bounded size, then we can return the original instance of the problem and stop.

Reduction Rule 1.

If |Y|2(kO+2kM)(d+1)2 then return (𝒳,wO,wM,W,kO,kM,d).

We can apply this rule because if |Y|2(kO+2kM)(d+1)2 then |X|=|Y|+|A|2(kO+2kM)(d+1)2+(kO+kM)(d+3)5(kO+kM)(d+3)2=𝒪((kd)2). From now on, we assume that |Y|>2(kO+2kM)(d+1)2.

Because A is a feasible solution to Euclidean Embedding with Outliers, (Y,ρ) can be embedded into d. In the subsequent steps, we may delete some points in A and reduce the parameter kO. This may result in a trivial instance which we solve by applying the following rule whenever possible.

Reduction Rule 2.

If kO<0 then return a no-answer and stop. If kO0 and A= then return a yes-answer and stop.

We greedily partition Y into sets Y1,,Y of size at most d+1 such that each Yi is an inclusion-maximal independent set of points of Yj=1i1Yj, that is, Yi is an inclusion-maximal subset such that (Yi,ρ) is strongly (|Yi|1)-embeddable.

Assume that i1 and the sets Y1,,Yi1 are already constructed. We set Z:=Yj=1i1Yj. If Z=, then the partition is constructed. If Z, then we do the following:

  1. 1.

    To initiate the construction of Yi, we set Yi={x} for an arbitrary point xZ and set h=1.

  2. 2.

    While there is a point yZYi such that Yi{y} is independent, set Yi:=Yi{y} and h:=h+1.

Because Y is d-embeddable, Z is strongly t-embeddable for some td and contains an independent set of size t by Proposition 4. By Proposition 6, the described procedure will construct a set Yi of size t+1d+1 such that (Yi,ρ) is strongly t-embeddable. Notice that by Proposition 6, Yi is a metric basis for (Z,ρ).

Figure 2: A pair (Yi,Yj) of x-compatible sets. Notice that if ji and |Yj||Yi|=t+1 then (YiYj,ρ) admits a strong embedding into t by Proposition 6 because Yi is a metric basis for (YiYj,ρ). Furthermore, given a realization of (Yi,ρ) such that the points are mapped into an affine subspace L of dimension t, the points of Yj are mapped into L and the mapping is unique. If (Yi{x},ρ) is strongly t-embeddable then x is embedded in L and the embedding is unique. If (Yi{x},ρ) is strongly (t+1)-embeddable, that is, Yi{x} is independent then by Proposition 1, the distance space is embedded into a subspace L of dimension t+1 containing L and the embedding of x is unique up to the reflection with respect to L.

Let xA. We say that Yi for i{1,,} is x-compatible if (Yi{x},ρ) is embeddable in d. Otherwise, Yi is x-incompatible. Similarly, for two distinct sets Yi,Yj for i,j{1,,}, the pair (Yi,Yj) is x-compatible if (YiYj{x},ρ) is embeddable in d, and (Yi,Yj) is x-incompatible otherwise (see Figure 2).

Notice that if Yi is x-incompatible then for each solution to the instance either x is an outlier, or Yi contains an outlier, or at least one distance between the points Yi{x} should be modified.

For each i{1,,l}, 1|Yi|d+1. We partition the family {Y1,,Y} of sets into parts C1,,Cd+1 according to their size (some parts may be empty). Formally, for h{1,,d+1}, Ch={Yi1i and |Yi|=h}.

Consider the part Ch for some h{1,,d+1}. For a point xA, we say that two x-compatible sets Yi,YjCh are x-equivalent if the pair (Yi,Yj) is x-compatible, that is, (YiYj{x},ρ) is embeddable into d. We show the following claim.

Claim 11 ().

x-equivalency is an equivalence relation on the family of x-compatible sets in Ch. Furthermore, if Yi,YjCh are x-equivalent and the pair (Yj,Ys) for YsCh with hh is x-compatible then (Yi,Ys) is x-compatible.

The definition of x-equivalency implies the following property.

Claim 12.

Let R1 and R2 be two distinct classes of x-equivalent x-compatible sets. Then for any solution (O,D) to the considered instance, either (i) xO, or (ii) for each set Yi in R1 there is a yYi such that yO or there exists a z such that {y,z}D, or (iii) for each set Yj in R2 there is a yYj such that yO or there exists a z such that {y,z}D.

We use this property in the following rule. Because |Y|>2(kO+2kM)(d+1)2 and each Yi is of size at most d+1, there is h{1,,d+1} with Ch being of size at least 2(kO+2kM)+1. Let h be the maximum value of h{1,,d+1} with this property.

Reduction Rule 3.

If there is xA such that each x-equivalence class R of x-compatible sets of Ch has size at most |Ch|(kO+2kM)1, then set X:=X{x}, A:=A{x}, and kO:=kO1.

After the exhaustive application of Reduction 3, we have that for each xA, there is an x-equivalence class Rx of x-compatible sets in Ch that contains at least |Ch|kO2kM(kO+2kM)+1 sets and all other classes in Ch contain at most kO+2kM elements combined. We say that Rx is large. We exhaustively apply the following rule using the fact that Rx contains at least (kO+2kM)+1 sets.

Reduction Rule 4.

If there is xA such that there are at least kO+kM+1 sets YjCh for any hh such that (Yi,Yj) is x-incompatible for some YiRx, then set X:=X{x}, A:=A{x}, and kO:=kO1.

In the next crucial step of our algorithm, we identify important sets Yi for {1,,} and delete all other sets that are irrelevant.

We apply the following marking procedure that labels important sets Yi for i{1,,}. For every xA, we do the following:

  • mark every Yi in Ch for h>h,

  • for each xA, mark arbitrary (kO+2kM)+1 sets of Rx,

  • for each xA and each h{1,,h}, mark each set YjCh such that there is a set YiRx such that the pair (Yi,Yj) is x-incompatible.

We set I={i1i and Yi is marked} and Y=iIYi, and define X=AY.

Claim 13 ().

The instance (𝒳=(X,ρ),wO,wM,W,kO,kM,d) of Euclidean Embedding Editing is equivalent to the instance (𝒳=(X,ρ),wO,wM,W,kO,kM,d).

We have that the original instance (𝒳=(X,ρ),wO,wM,W,kO,kM,d) and the obtained instance (𝒳=(X,ρ),wO,wM,W,kO,kM,d) of Euclidean Embedding Editing are equivalent. We prove that |X|=𝒪(k2d2).

Claim 14 ().

|X|9(kO+kM)2(d+3)2.

Claim 15 ().

The overall running time of our algorithm is polynomial.

To compress the weights, we use the standard technique (see [16]) based on the algorithm of Frank and Tardos [19].  

4 FPT algorithm for Euclidean Embedding Editing parameterized by solution size and dimension

In this section, we give our FPT algorithm for Euclidean Embedding Editing.

We prepare for it by recalling the relevant definitions and facts from [4]. In what follows, let R be a real closed field and . Let 𝒫R[X1,,Xt] be a finite set of s polynomials each of degree at most .

Definition 16 ([4]).

A 𝒫-atom is one of P=0,P0, P0, P0, where P is a polynomial in 𝒫 and a quantifier-free 𝒫-formula is a formula constructed only from 𝒫-atoms together with the logical connectives , and ¬.

Proposition 17 (Theorem 13.13, [4]).

Let (X1)(Xt)F(X1,,Xt), be a sentence, where F(X1,,Xt) is a quantifier free 𝒫-formula. There exists an algorithm to decide the truth of the sentence with complexity111The measure of complexity here is the number of field operations. st+1𝒪(t) in D where D is the ring generated by the coefficients of the polynomials in 𝒫.

Definition 18.

Consider points x0,x1,,xr of distance space (X,ρ) and a set Z of pairs in X. For every i<j{0,,r}, if {xi,xj}Z, then ρ^i,j=ρ(xi,xj), otherwise ρ^i,j=zi,j where zi,j is an indeterminate. Then the Z-Augmented Cayley-Menger determinant is obtained from the Cayley-Menger determinant by replacing each ρi,j with ρ^i,j.

Observation 19.

Consider r+1 points x0,x1,,xr of distance space (X,ρ) and a set Z of pairs in X. The Z-Augmented Cayley-Menger determinant is a multi-variate polynomial with real coefficients, over the set {zi,ji<j,{xi,xj}Z} of indeterminates and where each monomial has degree at most 2(r+1).

Theorem 20.

Euclidean Embedding Editing is solvable in (d(kO+kM))𝒪(d+kM+kO)+n𝒪(1) time.

Proof.

Our FPT algorithm for Euclidean Embedding Editing works as follows. We begin by running the compression algorithm (Theorem 8). Following this, based on insights from Proposition 4, we reduce the task of solving the resulting instance to the task of testing a bounded number (in terms of k,d) of sentences with purely existential quantifications as described in Proposition 17, where s,,t are all bounded by functions of k and d.

We now formalize this idea. Let (𝒳=(X,ρ),wO,wM,W,kO,kM,d) be the given instance of Euclidean Embedding Editing. We run the algorithm of Theorem 8 on this instance. Recall that this algorithm either solves the instance or produces an equivalent instance (𝒳=(X,ρ),wO,wM,W,kO,kM,d) such that XX, and for k=kO+kM and a bound τ(k,d)=2𝒪((kd)12), we have that |X|=𝒪((kd)2). Note that since X has size bounded by 𝒪((kd)2), we may assume that kO is bounded by 𝒪((kd)2) and kM is bounded by 𝒪((kd)4).

Since we now have a bounded number of points in X, it is straightforward to guess a set ZOX of at most kO outliers and a subset ZM of pairs from X′′=XZO of size at most kM such that (i) wO(ZO)+wM(ZM)W and (ii) if there is a solution to the given instance, then there is a modification of the pairs in ZM such that (X′′,ρ′′) is embeddable into d, where ρ′′ differs from ρ (restricted to X′′) only among the pairs in ZM. We next guess the embedding dimension of (X′′,ρ′′), denoted by r. Note that rd.

By Proposition 4, we know that (X′′,ρ′′) is r-embeddable if and only if there is a subset X^={x0,,xr} of X′′ such that (i) (1)j+1CM(x0,x1,,xj)>0 for 1jr, and (ii) for any x,yX′′X^, CM(x0,x1,,xr,x)=CM(x0,x1,,xr,y)=CM(x0,x1,,xr,x,y)=0, where the matrices are derived from the distance matrix D(ρ′′).

We next guess X^. It remains to verify that there is a modification of the distances between each pair in Z such that the resulting space (X′′,ρ′′) is r-embeddable, i.e., satisfies the above two properties. To test this, we construct the formula F obtained by taking the conjunction of the atoms below and invoke Proposition 17 on the sentence ϕ obtained by prepending F with existential quantifications for every indeterminate {zi,ji<j,{xi,xj}Z}:

  1. 1.

    (1)j+1CMZM(x0,x1,,xj)>0, where 1jr.

  2. 2.

    CMZM(x0,x1,,xr,x)=0 for every xX′′X^.

  3. 3.

    CMZM(x0,x1,,xr,x,y)=0 for every x,yX′′X^.

  4. 4.

    zi,j0 for each indeterminate defined by ZM.

In the conclusion of the section, we remark that the parameter dependence can be improved for Euclidean Embedding with Outliers.

Theorem 21 ().

Euclidean Embedding with Outliers can be solved in time nO(1)min{(d+3)k,2d+k}.

5 FPT-Approximation for Unweighted Euclidean Embedding with Outliers

In this section, we give an FPT-time 2-approximation for Unweighted Euclidean Embedding with Outliers (UEEO) parameterized by the dimension d. Recall that in the optmization version of this problem, the input is (𝒳=(X,ρ),d) where (X,ρ) is a distance space and the goal is to output a d-outlier set of minimum size.

Our main idea is to give a randomized algorithm generating sequences U0,,Ud+1 and A1,,Ad+2, where each Ai is a d-outlier set that is associated with the independent set Ui1 of size i1. The construction of these sequences is roughly as follows, starting from U0= and proceeding iteratively. For each Ui1:

  1. 1.

    Any element that cannot be d-embedded along with Ui1 gets added to Ai.

  2. 2.

    We compute those elements that can be added to Ui1 without increasing its embedding dimension, and associate a graph with these such that every d-outlier set disjoint from Ui1 includes a vertex cover of this graph. We then 2-approximate the minimum vertex cover of this graph and add it to Ai.

  3. 3.

    For the remaining elements (call this set W):

    1. (a)

      Take all of W in Ai (which makes Ai a 2-approximation if at least half of W is in the solution),

    2. (b)

      Uniformly at random add an element of W to Ui1, resulting in Ui (which gives an independent set with sufficiently high probability if at least half of W is not in some fixed minimum d-outlier set).

Theorem 22.

Unweighted Euclidean Embedding with Outliers can be 2-approximated in 2dn𝒪(1) time by a randomized algorithm with one-sided constant probability of error.

Proof.

Let S𝗈𝗉𝗍 be a hypothetical minimum sized d-outlier set. Let X=XS𝗈𝗉𝗍. Let us assume that (X,ρ) is strongly d-embeddable. Else, it is strongly d-embeddable for some dd, in which case we guess d and set d:=d. In what follows, we give a randomized algorithm that aims to obtain a metric basis of (X,ρ) if certain pre-conditions are satisfied.

Alg-3(X,ρ,d):

  1. 1.

    Set U0:=.

  2. 2.

    For i=1 to d+2 do as follows.

    1. (a)

      Construct 𝒞𝖼𝗈𝗆𝗉i, the set of elements yXUi1 such that (U(i1){y},ρ) is (i2)-embeddable.

    2. (b)

      Construct 𝒞𝖽𝖾𝖿i, the set of elements yXUi1 such that (U(i1){y},ρ) is not d-embeddable.

    3. (c)

      Construct 𝒞𝗂𝗇𝖼𝗈𝗆𝗉i, the set of elements yX(Ui1𝒞𝖼𝗈𝗆𝗉i𝒞𝖽𝖾𝖿i) such that (U(i1){y},ρ) is not (i2)-embeddable.

    4. (d)

      Select an element ui uniformly at random from 𝒞𝗂𝗇𝖼𝗈𝗆𝗉i.

    5. (e)

      Set Ui:=U(i1){ui}.

Claim 23 ().

For every i{0,,d+1}, the following holds:

  1. 1.

    𝒞𝖽𝖾𝖿i and 𝒞𝖼𝗈𝗆𝗉i are disjoint.

  2. 2.

    Every d-outlier set of (X,ρ) disjoint from Ui contains 𝒞𝖽𝖾𝖿i+1.

  3. 3.

    If for all ji, 𝒞𝗂𝗇𝖼𝗈𝗆𝗉j, then Ui is an independent set of size i.

We next argue that for each i[d+1], the independent set Ui is preserved in X=XS𝗈𝗉𝗍, with sufficiently high probability if certain conditions are met.

Claim 24 ().

Let i[d+1] such that for all ji, |𝒞𝗂𝗇𝖼𝗈𝗆𝗉jS𝗈𝗉𝗍|<|𝒞𝗂𝗇𝖼𝗈𝗆𝗉j|2. Then, Pr[UiX]12i and Ui is an independent set of size i.

Next, we build a series of d-outlier sets A1,,Ad+2 as follows. For each i[d+2], create an instance of the Vertex Cover problem on the graph G𝖼𝗈𝗆𝗉i, where V(G𝖼𝗈𝗆𝗉i)=𝒞𝖼𝗈𝗆𝗉i, and there is an edge between two elements x,y𝒞𝖼𝗈𝗆𝗉i if they Ui1x,y is not (i2)-embeddable. Using a factor-2 approximation algorithm for Vertex Cover, obtain a set Qi such that |Qi|2|Oi|, where Oi is a minimum vertex cover of G𝖼𝗈𝗆𝗉i. Finally, set Ai:=Qi𝒞𝗂𝗇𝖼𝗈𝗆𝗉i𝒞𝖽𝖾𝖿i. Once every Ai is computed, we compute i=argmini[d+2]|Ai| and return Ai.

Claim 25 ().

Let [d+1] be the least integer such that for all j<, it holds that

|𝒞𝗂𝗇𝖼𝗈𝗆𝗉jS𝗈𝗉𝗍|<|𝒞𝗂𝗇𝖼𝗈𝗆𝗉j|2.

Then, A is a factor-2 approximation with probability at least 12.

Since we compute i=argmini[d+2]|Ai| and return Ai, by Claim 25 we are guaranteed that |Ai|2|S𝗈𝗉𝗍| with probability at least 12d+1.

Running Time Analysis.

We have already argued that we succeed with probability 12d+1. Thus, we can boost the success probability by independently running our polynomial-time algorithm 2d+1 times and returning the minimum size solution among these runs. Thus, the probability that the algorithm fails in all of the independent runs is upper bounded by (112d+1)2d+111e. Moreover, the total running time is upper bounded by 2dn𝒪(1).

6 Lower bounds

In this section, we complement our algorithmic results by proving computational lower bounds. These lower bounds also motivate our choice of parameterizations. First, we show that Euclidean Embedding Editing is ParaNP-hard when parameterized by d. More precisely, we show that even the unweighted versions of both Euclidean Embedding with Outliers and Euclidean Metric Violation Distance are NP-hard even for d=1.

Theorem 26 ().

Euclidean Embedding with Outliers is strongly NP-hard even for instances with unit weights, integer distances, and d=1.

Theorem 27 ().

Euclidean Metric Violation Distance is strongly NP-hard for the instances with unit weights, integer distances, and d=1.

Finally, we prove that Euclidean Embedding Editing is W[1]-hard when parameterized by kO+kM only. Moreover, the hardness holds even for the unweighted variant of Euclidean Embedding with Outliers.

Theorem 28 ().

Euclidean Embedding with Outliers parameterized by k is W[1]-hard for n-point instances with unit weights and integer distance matrices with entries in 𝒪(n).

References

  • [1] Jorge Alencar, Tibérius O. Bonates, Carlile Lavor, and Leo Liberti. An algorithm for realizing euclidean distance matrices. Electron. Notes Discret. Math., 50:397–402, 2015. doi:10.1016/J.ENDM.2015.07.066.
  • [2] Sanjeev Arora, James Lee, and Assaf Naor. Euclidean distortion and the sparsest cut. Journal of the American Mathematical Society, 21(1):1–21, 2008.
  • [3] Sanjeev Arora, Satish Rao, and Umesh Vazirani. Expander flows, geometric embeddings and graph partitioning. J. ACM, 56(2):5, 2009.
  • [4] Saugata Basu, Richard Pollack, and Marie-Françoise Roy. Algorithms in Real Algebraic Geometry (Algorithms and Computation in Mathematics). Springer-Verlag, Berlin, Heidelberg, 2006.
  • [5] Leonard Mascot Blumenthal. Theory and applications of distance geometry. Clarendon Press, Oxford, 1953.
  • [6] Ingwer Borg and Patrick J.F. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, 2005.
  • [7] Arthur Cayley. On the theory of determinants. Philosophical Magazine, 19:1–16, 1841.
  • [8] Moses Charikar and Ruiquan Gao. Improved approximations for ultrametric violation distance. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1704–1737. SIAM, 2024. doi:10.1137/1.9781611977912.68.
  • [9] Vincent Cohen-Addad, Chenglin Fan, Euiwoong Lee, and Arnaud De Mesmay. Fitting metrics and ultrametrics with minimum disagreements. In Proceedings of the 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 301–311. IEEE, 2022. doi:10.1109/FOCS54457.2022.00035.
  • [10] Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015. doi:10.1007/978-3-319-21275-3.
  • [11] J. Dattorro. Convex Optimization & Euclidean Distance Geometry. Meboo Publishing USA, 2008.
  • [12] E. D. Demaine, F. Gomez-Martin, H. Meijer, D. Rappaport, P. Taslakian, G. T. Toussaint, T. Winograd, and D. R. Wood. The distance geometry of music. Computational Geometry, 42(5):429–454, July 2009. doi:10.1016/J.COMGEO.2008.04.005.
  • [13] Michel Deza, Monique Laurent, and Robert Weismantel. Geometry of cuts and metrics, volume 2. Springer, 1997.
  • [14] L. Doherty, K. Pister, and L. El Ghaoui. Convex position estimation in wireless sensor networks. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), volume 3, pages 1655–1663. IEEE, 2001.
  • [15] Ivan Dokmanic, Reza Parhizkar, Juri Ranieri, and Martin Vetterli. Euclidean distance matrices: essential theory, algorithms, and applications. IEEE Signal Processing Magazine, 32(6):12–30, 2015. doi:10.1109/MSP.2015.2398954.
  • [16] Michael Etscheid, Stefan Kratsch, Matthias Mnich, and Heiko Röglin. Polynomial kernels for weighted problems. Journal of Computer System Sciences, 84:1–10, 2017. doi:10.1016/J.JCSS.2016.06.004.
  • [17] B. Everitt and S. Rabe-Hesketh. The Analysis of Proximity Data. Arnold, London, 1997.
  • [18] Chenglin Fan, Benjamin Raichel, and Gregory Van Buskirk. Metric violation distance: Hardness and approximation. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 196–209. SIAM, 2018. doi:10.1137/1.9781611975031.14.
  • [19] András Frank and Éva Tardos. An application of simultaneous diophantine approximation in combinatorial optimization. Combinatorica, 7(1):49–65, 1987. doi:10.1007/BF02579200.
  • [20] Anna C. Gilbert and Lalit Jain. If it ain’t broke, don’t fix it: Sparse metric repair. In 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 612–619. IEEE, 2017. doi:10.1109/ALLERTON.2017.8262793.
  • [21] T. F. Havel and K. Wüthrich. An evaluation of the combined use of nuclear magnetic resonance and distance geometry for the determination of protein conformations in solution. Journal of Molecular Biology, 182(2):281–294, 1985.
  • [22] Piotr Indyk. Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS), pages 10–33. IEEE, 2001. doi:10.1109/SFCS.2001.959878.
  • [23] Piotr Indyk and Jiri Matousek. Low-distortion embeddings of finite metric spaces. In in Handbook of Discrete and Computational Geometry, pages 177–196. CRC Press, 2004.
  • [24] Viren Jain and Lawrence K. Saul. Exploratory analysis and visualization of speech and music by locally linear embedding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3, pages 984–987. IEEE, 2004. doi:10.1109/ICASSP.2004.1326712.
  • [25] Leo Liberti and Carlile Lavor. Euclidean distance geometry, volume 3. Springer, 2017.
  • [26] Nathan Linial. Finite metric-spaces – combinatorics, geometry and algorithms. In Proceedings of the International Congress of Mathematicians, Vol. III, pages 573–586, Beijing, 2002. Higher Ed. Press.
  • [27] Nathan Linial, Eran London, and Yuri Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215–245, 1995. doi:10.1007/BF01200757.
  • [28] Karl Menger. Untersuchungen über allgemeine metrik. Mathematische Annalen, 100:75–163, 1928.
  • [29] N. Patwari, J. N. Ash, S. Kyperountas, A. O. Hero, R. L. Moses, and N. S. Correal. Locating the nodes: Cooperative localization in wireless sensor networks. IEEE Signal Processing Magazine, 22(4):54–69, July 2005. doi:10.1109/MSP.2005.1458287.
  • [30] Isaac J. Schoenberg. Remarks to maurice frechet’s article “sur la definition axiomatique d’une classe d’espace distances vectoriellement applicable sur l’espace de hilbert”. Annals of Mathematics, 36:724–732, 1935.
  • [31] R. Shepard. The analysis of proximities: multidimensional scaling with an unknown distance function, part i. Psychometrika, 27:125–140, 1962.
  • [32] Anastasios Sidiropoulos, Dingkang Wang, and Yusu Wang. Metric embeddings with outliers. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 670–689. SIAM, 2017. doi:10.1137/1.9781611974782.43.
  • [33] Manfred J. Sippl and Harold A. Scheraga. Solution of the embedding problem and decomposition of symmetric matrices. Proc. Nat. Acad. Sci. U.S.A., 82(8):2197–2201, 1985. doi:10.1073/pnas.82.8.2197.
  • [34] W. Torgerson. Theory and Methods of Scaling. Wiley, New York, 1958.
  • [35] Warren S. Torgerson. Multidimensional scaling: I. theory and method. Psychometrika, 17(4):401–419, 1952.
  • [36] Kilian Q. Weinberger and Lawrence K. Saul. Unsupervised learning of image manifolds by semidefinite programming. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 988–995. IEEE, 2004. doi:10.1109/CVPR.2004.256.