Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Range-aggregate query is an important type of queries with numerous applications. It aims to obtain some structural information (defined by an aggregate function F(⋅)) of the points (from a point set P) inside a given query range B. In this paper, we study the range-aggregate query problem in high dimensional space for two aggregate functions: (1) F(P ∩ B) is the farthest point in P ∩ B to a query point q in ℝ^d and (2) F(P ∩ B) is the minimum enclosing ball (MEB) of P ∩ B. For problem (1), called In-Range Farthest Point (IFP) Query, we develop a bi-criteria approximation scheme: For any ε > 0 that specifies the approximation ratio of the farthest distance and any γ > 0 that measures the "fuzziness" of the query range, we show that it is possible to pre-process P into a data structure of size Õ_{ε,γ}(dn^{1+ρ}) in Õ_{ε,γ}(dn^{1+ρ}) time such that given any ℝ^d query ball B and query point q, it outputs in Õ_{ε,γ}(dn^ρ) time a point p that is a (1-ε)-approximation of the farthest point to q among all points lying in a (1+γ)-expansion B(1+γ) of B, where 0 < ρ < 1 is a constant depending on ε and γ and the hidden constants in big-O notations depend only on ε, γ and Polylog(nd). For problem (2), we show that the IFP result can be applied to develop query scheme with similar time and space complexities to achieve a (1+ε)-approximation for MEB. To the best of our knowledge, these are the first theoretical results on such high dimensional range-aggregate query problems. Our results are based on several new techniques, such as multi-scale construction and ball difference range query, which are interesting in their own rights and could be potentially used to solve other range-aggregate problems in high dimensional space.

Ziyun Huang and Jinhui Xu. In-Range Farthest Point Queries and Related Problem in High Dimensions. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 75:1-75:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{huang_et_al:LIPIcs.ICALP.2022.75, author = {Huang, Ziyun and Xu, Jinhui}, title = {{In-Range Farthest Point Queries and Related Problem in High Dimensions}}, booktitle = {49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)}, pages = {75:1--75:21}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-235-8}, ISSN = {1868-8969}, year = {2022}, volume = {229}, editor = {Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.75}, URN = {urn:nbn:de:0030-drops-164167}, doi = {10.4230/LIPIcs.ICALP.2022.75}, annote = {Keywords: Farthest Point Query, Range Aggregate Query, Minimum Enclosing Ball, Approximation, High Dimensional Space} }

Document

**Published in:** LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)

In this paper, we present a framework for designing FPT approximation algorithms for many k-clustering problems. Our results are based on a new technique for reducing search spaces. A reduced search space is a small subset of the input data that has the guarantee of containing k clients close to the facilities opened in an optimal solution for any clustering problem we consider. We show, somewhat surprisingly, that greedily sampling O(k) clients yields the desired reduced search space, based on which we obtain FPT(k)-time algorithms with improved approximation guarantees for problems such as capacitated clustering, lower-bounded clustering, clustering with service installation costs, fault tolerant clustering, and priority clustering.

Qilong Feng, Zhen Zhang, Ziyun Huang, Jinhui Xu, and Jianxin Wang. A Unified Framework of FPT Approximation Algorithms for Clustering Problems. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 5:1-5:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{feng_et_al:LIPIcs.ISAAC.2020.5, author = {Feng, Qilong and Zhang, Zhen and Huang, Ziyun and Xu, Jinhui and Wang, Jianxin}, title = {{A Unified Framework of FPT Approximation Algorithms for Clustering Problems}}, booktitle = {31st International Symposium on Algorithms and Computation (ISAAC 2020)}, pages = {5:1--5:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-173-3}, ISSN = {1868-8969}, year = {2020}, volume = {181}, editor = {Cao, Yixin and Cheng, Siu-Wing and Li, Minming}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.5}, URN = {urn:nbn:de:0030-drops-133495}, doi = {10.4230/LIPIcs.ISAAC.2020.5}, annote = {Keywords: clustering, approximation algorithms, fixed-parameter tractability} }

Document

**Published in:** LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

In this paper, we study the following pattern search problem: Given a pair of point sets A and B in fixed dimensional space R^d, with |B| = n, |A| = m and n >= m, the pattern search problem is to find the translations T’s of A such that each of the identified translations induces a matching between T(A) and a subset B' of B with cost no more than some given threshold, where the cost is defined as the minimum bipartite matching cost of T(A) and B'. We present a novel algorithm to produce a small set of candidate translations for the pattern search problem. For any B' subseteq B with |B'| = |A|, there exists at least one translation T in the candidate set such that the minimum bipartite matching cost between T(A) and B' is no larger than (1+epsilon) times the minimum bipartite matching cost between A and B' under any translation (i.e., the optimal translational matching cost). We also show that there exists an alternative solution to this problem, which constructs a candidate set of size O(n log^2 n) in O(n log^2 n) time with high probability of success. As a by-product of our construction, we obtain a weak epsilon-net for hypercube ranges, which significantly improves the construction time and the size of the candidate set. Our technique can be applied to a number of applications, including the translational pattern matching problem.

Ziyun Huang, Qilong Feng, Jianxin Wang, and Jinhui Xu. Small Candidate Set for Translational Pattern Search. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 26:1-26:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{huang_et_al:LIPIcs.ISAAC.2019.26, author = {Huang, Ziyun and Feng, Qilong and Wang, Jianxin and Xu, Jinhui}, title = {{Small Candidate Set for Translational Pattern Search}}, booktitle = {30th International Symposium on Algorithms and Computation (ISAAC 2019)}, pages = {26:1--26:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-130-6}, ISSN = {1868-8969}, year = {2019}, volume = {149}, editor = {Lu, Pinyan and Zhang, Guochuan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.26}, URN = {urn:nbn:de:0030-drops-115222}, doi = {10.4230/LIPIcs.ISAAC.2019.26}, annote = {Keywords: Bipartite matching, Alignment, Discretization, Approximate algorithm} }

Document

**Published in:** LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

Clustering is a fundamental problem in unsupervised learning. In many real-world applications, the to-be-clustered data often contains various types of noises and thus needs to be removed from the learning process. To address this issue, we consider in this paper two variants of such clustering problems, called k-median with m outliers and k-means with m outliers. Existing techniques for both problems either incur relatively large approximation ratios or can only efficiently deal with a small number of outliers. In this paper, we present improved solution to each of them for the case where k is a fixed number and m could be quite large. Particularly, we gave the first PTAS for the k-median problem with outliers in Euclidean space R^d for possibly high m and d. Our algorithm runs in O(nd((1/epsilon)(k+m))^(k/epsilon)^O(1)) time, which considerably improves the previous result (with running time O(nd(m+k)^O(m+k) + (1/epsilon)k log n)^O(1))) given by [Feldman and Schulman, SODA 2012]. For the k-means with outliers problem, we introduce a (6+epsilon)-approximation algorithm for general metric space with running time O(n(beta (1/epsilon)(k+m))^k) for some constant beta>1. Our algorithm first uses the k-means++ technique to sample O((1/epsilon)(k+m)) points from input and then select the k centers from them. Compared to the more involving existing techniques, our algorithms are much simpler, i.e., using only random sampling, and achieving better performance ratios.

Qilong Feng, Zhen Zhang, Ziyun Huang, Jinhui Xu, and Jianxin Wang. Improved Algorithms for Clustering with Outliers. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 61:1-61:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{feng_et_al:LIPIcs.ISAAC.2019.61, author = {Feng, Qilong and Zhang, Zhen and Huang, Ziyun and Xu, Jinhui and Wang, Jianxin}, title = {{Improved Algorithms for Clustering with Outliers}}, booktitle = {30th International Symposium on Algorithms and Computation (ISAAC 2019)}, pages = {61:1--61:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-130-6}, ISSN = {1868-8969}, year = {2019}, volume = {149}, editor = {Lu, Pinyan and Zhang, Guochuan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.61}, URN = {urn:nbn:de:0030-drops-115573}, doi = {10.4230/LIPIcs.ISAAC.2019.61}, annote = {Keywords: Clustering with Outliers, Approximation, Random Sampling} }

Document

**Published in:** LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)

In this paper, we consider the following sum query problem:
Given a point set P in R^d, and a distance-based function f(p,q) (i.e. a function of the distance between p and q) satisfying some general properties,
the goal is to develop a data structure and a query algorithm for efficiently computing a (1+epsilon)-approximate solution to the sum
sum_{p in P} f(p,q) for any query point q in R^d and any small constant epsilon>0. Existing techniques for this problem are mainly based on some core-set techniques which often have difficulties to deal with functions with local domination property. Based on several new insights to this problem, we develop in this paper a novel technique to overcome these encountered difficulties. Our algorithm is capable of answering queries with high success probability in time no more than ~O_{epsilon,d}(n^{0.5 + c}), and the underlying
data structure can be constructed in ~O_{epsilon,d}(n^{1+c}) time for any c>0, where the hidden constant has only polynomial dependence on 1/epsilon and d.
Our technique is simple and can be easily implemented for practical purpose.

Ziyun Huang and Jinhui Xu. An Efficient Sum Query Algorithm for Distance-based Locally Dominating Functions. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 47:1-47:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{huang_et_al:LIPIcs.ISAAC.2017.47, author = {Huang, Ziyun and Xu, Jinhui}, title = {{An Efficient Sum Query Algorithm for Distance-based Locally Dominating Functions}}, booktitle = {28th International Symposium on Algorithms and Computation (ISAAC 2017)}, pages = {47:1--47:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-054-5}, ISSN = {1868-8969}, year = {2017}, volume = {92}, editor = {Okamoto, Yoshio and Tokuyama, Takeshi}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.47}, URN = {urn:nbn:de:0030-drops-82483}, doi = {10.4230/LIPIcs.ISAAC.2017.47}, annote = {Keywords: Sum Query, Distance-based Function, Local Domination, High Dimen- sions, Data Structure} }

Document

**Published in:** LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)

In this paper, we consider the distributed version of Support Vector Machine (SVM) under the coordinator model, where all input data (i.e., points in R^d space) of SVM are arbitrarily distributed among k nodes in some network with a coordinator which can communicate with all nodes. We investigate two variants of this problem, with and without outliers. For distributed SVM without outliers, we prove a lower bound on the communication complexity and give a distributed (1-epsilon)-approximation algorithm to reach this lower bound, where epsilon is a user specified small constant. For distributed SVM with outliers, we present a (1-epsilon)-approximation algorithm to explicitly remove the influence of outliers. Our algorithm is based on a deterministic distributed top t selection algorithm with communication complexity of O(k log (t)) in the coordinator model. Experimental results on benchmark datasets confirm the theoretical guarantees of our algorithms.

Yangwei Liu, Hu Ding, Ziyun Huang, and Jinhui Xu. Distributed and Robust Support Vector Machine. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 54:1-54:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{liu_et_al:LIPIcs.ISAAC.2016.54, author = {Liu, Yangwei and Ding, Hu and Huang, Ziyun and Xu, Jinhui}, title = {{Distributed and Robust Support Vector Machine}}, booktitle = {27th International Symposium on Algorithms and Computation (ISAAC 2016)}, pages = {54:1--54:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-026-2}, ISSN = {1868-8969}, year = {2016}, volume = {64}, editor = {Hong, Seok-Hee}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.54}, URN = {urn:nbn:de:0030-drops-68221}, doi = {10.4230/LIPIcs.ISAAC.2016.54}, annote = {Keywords: Distributed Algorithm, Communication Complexity, Robust Algorithm, SVM} }

Document

**Published in:** LIPIcs, Volume 51, 32nd International Symposium on Computational Geometry (SoCG 2016)

Truth Discovery is an important problem arising in data analytics related fields such as data mining, database, and big data. It concerns about finding the most trustworthy information from a dataset acquired from a number of unreliable sources. Due to its importance, the problem has been extensively studied in recent years and a number techniques have already been proposed. However, all of them are of heuristic nature and do not have any quality guarantee. In this paper, we formulate the problem as a high dimensional geometric optimization problem, called Entropy based Geometric Variance. Relying on a number of novel geometric techniques (such as Log-Partition and Modified Simplex Lemma), we further discover new insights to this problem. We show, for the first time, that the truth discovery problem can be solved with guaranteed quality of solution. Particularly, we show that it is possible to achieve a (1+eps)-approximation within nearly linear time under some reasonable assumptions. We expect that our algorithm will be useful for other data related applications.

Hu Ding, Jing Gao, and Jinhui Xu. Finding Global Optimum for Truth Discovery: Entropy Based Geometric Variance. In 32nd International Symposium on Computational Geometry (SoCG 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 51, pp. 34:1-34:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{ding_et_al:LIPIcs.SoCG.2016.34, author = {Ding, Hu and Gao, Jing and Xu, Jinhui}, title = {{Finding Global Optimum for Truth Discovery: Entropy Based Geometric Variance}}, booktitle = {32nd International Symposium on Computational Geometry (SoCG 2016)}, pages = {34:1--34:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-009-5}, ISSN = {1868-8969}, year = {2016}, volume = {51}, editor = {Fekete, S\'{a}ndor and Lubiw, Anna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2016.34}, URN = {urn:nbn:de:0030-drops-59264}, doi = {10.4230/LIPIcs.SoCG.2016.34}, annote = {Keywords: geometric optimization, data mining, high dimension, entropy} }