Near-Neighbor Preserving Dimension Reduction for Doubling Subsets of l_1

Emiris, Ioannis Z.; Margonis, Vasilis; Psarros, Ioannis

doi:10.4230/LIPIcs.APPROX-RANDOM.2019.47

File

LIPIcs.APPROX-RANDOM.2019.47.pdf

Filesize: 0.48 MB
13 pages

Document Identifiers

DOI: 10.4230/LIPIcs.APPROX-RANDOM.2019.47
URN: urn:nbn:de:0030-drops-112628

Author Details

Ioannis Z. Emiris

Department of Informatics & Telecommunications, National & Kapodistrian University of Athens, Greece
ATHENA Research & Innovation Center, Greece

Vasilis Margonis

Department of Informatics & Telecommunications, National & Kapodistrian University of Athens, Greece

Ioannis Psarros

Institute of Computer Science, University of Bonn, Germany

Acknowledgements

IZE is member of team AROMATH, joint between INRIA Sophia-Antipolis and NKUA. IP thanks Robert Krauthgamer for useful discussions on the topic.

Cite AsGet BibTex

Ioannis Z. Emiris, Vasilis Margonis, and Ioannis Psarros. Near-Neighbor Preserving Dimension Reduction for Doubling Subsets of l_1. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 47:1-47:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2019.47

Abstract

Randomized dimensionality reduction has been recognized as one of the fundamental techniques in handling high-dimensional data. Starting with the celebrated Johnson-Lindenstrauss Lemma, such reductions have been studied in depth for the Euclidean (l_2) metric, but much less for the Manhattan (l_1) metric. Our primary motivation is the approximate nearest neighbor problem in l_1. We exploit its reduction to the decision-with-witness version, called approximate near neighbor, which incurs a roughly logarithmic overhead. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with guaranteed bounded distortion only for the distances between a query point and a point set. Such embeddings are known to exist for both l_2 and l_1 metrics, as well as for doubling subsets of l_2. The case that remained open were doubling subsets of l_1. In this paper, we propose a dimension reduction by means of a near neighbor-preserving embedding for doubling subsets of l_1. Our approach is to represent the pointset with a carefully chosen covering set, then randomly project the latter. We study two types of covering sets: c-approximate r-nets and randomly shifted grids, and we discuss the tradeoff between them in terms of preprocessing time and target dimension. We employ Cauchy variables: certain concentration bounds derived should be of independent interest.

Subject Classification

ACM Subject Classification

Theory of computation → Nearest neighbor algorithms
Mathematics of computing → Dimensionality reduction

Keywords

Approximate nearest neighbor
Manhattan metric
randomized embedding

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

D. Achlioptas. Database-friendly Random Projections: Johnson-Lindenstrauss with Binary Coins. J. Comput. Syst. Sci., 66(4):671-687, 2003.
N. Ailon and B. Chazelle. The Fast Johnson-Lindenstrauss Transform and Approximate Nearest Neighbors. SIAM J. Comput., 39(1):302-322, May 2009.
E. Anagnostopoulos, I. Z. Emiris, and I. Psarros. Randomized Embeddings with Slack and High-Dimensional Approximate Nearest Neighbor. ACM Trans. Algorithms, 14(2):18:1-18:21, 2018. URL: https://doi.org/10.1145/3178540.
A. Andoni and P. Indyk. Near-optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. Commun. ACM, 51(1):117-122, 2008.
A. Andoni, T. Laarhoven, I. P. Razenshteyn, and E. Waingarten. Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors. In Proc. ACM-SIAM Symposium on Discrete Algorithms, SODA, Barcelona, Spain, pages 47-66, 2017.
A. Andoni, H. L. Nguyen, A. Nikolov, I. P. Razenshteyn, and E. Waingarten. Approximate near neighbors for general symmetric norms. In Proc. ACM Symposium on Theory of Computing, STOC, Montreal, Canada, pages 902-913, 2017.
Y. Bartal and L. A. Gottlieb. Dimension Reduction Techniques for 𝓁_p, (1 < p < 2), with Applications. In 32nd International Symposium on Computational Geometry, SoCG 2016, June 14-18, 2016, Boston, MA, USA, pages 16:1-16:15, 2016. URL: https://doi.org/10.4230/LIPIcs.SoCG.2016.16.
Y. Bartal and L. A. Gottlieb. Approximate Nearest Neighbor Search for 𝓁_p-Spaces (2 < p < ∞) via Embeddings. In Proc. LATIN: Theoretical Informatics - 13th Latin American Symp., Buenos Aires, Argentina, pages 120-133, 2018. URL: https://doi.org/10.1007/978-3-319-77404-6_10.
R. Cole and L. A. Gottlieb. Searching Dynamic Point Sets in Spaces with Bounded Doubling Dimension. In Proc. ACM Symp. Theory of Computing, pages 574-583, New York, USA, 2006. ACM.
D. Eppstein, S. Har-Peled, and A. Sidiropoulos. Approximate Greedy Clustering and Distance Selection for Graph Metrics. CoRR, abs/1507.01555, 2015. URL: http://arxiv.org/abs/1507.01555.
S. Har-Peled, P. Indyk, and R. Motwani. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality. Theory of Computing, 8(1):321-350, 2012. URL: https://doi.org/10.4086/toc.2012.v008a014.
S. Har-Peled and M. Mendel. Fast Construction of Nets in Low Dimensional Metrics, and Their Applications. In Proc. Symp. Computational Geometry, pages 150-158, 2005.
P. Indyk. On Approximate Nearest Neighbors under l_infinity Norm. J. Comput. Syst. Sci., 63(4):627-638, 2001.
P. Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307-323, 2006. URL: https://doi.org/10.1145/1147954.1147955.
P. Indyk and R. Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proc. ACM Symp. Theory of Computing, pages 604-613, 1998.
P. Indyk and A. Naor. Nearest-neighbor-preserving Embeddings. ACM Trans. Algorithms, 3(3), 2007.
R. Krauthgamer and J. R. Lee. Navigating Nets: Simple Algorithms for Proximity Search. In Proc. 15th Annual ACM-SIAM Symp. Discrete Algorithms, SODA'04, pages 798-807, 2004.
J.R. Lee, M. Mendel, and A. Naor. Metric structures in L_1: dimension, snowflakes, and average distortion. Eur. J. Comb., 26(8):1180-1190, 2005. URL: https://doi.org/10.1016/j.ejc.2004.07.002.