Near-Neighbor Preserving Dimension Reduction for Doubling Subsets of l_1

Authors Ioannis Z. Emiris, Vasilis Margonis, Ioannis Psarros



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2019.47.pdf
  • Filesize: 0.48 MB
  • 13 pages

Document Identifiers

Author Details

Ioannis Z. Emiris
  • Department of Informatics & Telecommunications, National & Kapodistrian University of Athens, Greece
  • ATHENA Research & Innovation Center, Greece
Vasilis Margonis
  • Department of Informatics & Telecommunications, National & Kapodistrian University of Athens, Greece
Ioannis Psarros
  • Institute of Computer Science, University of Bonn, Germany

Acknowledgements

IZE is member of team AROMATH, joint between INRIA Sophia-Antipolis and NKUA. IP thanks Robert Krauthgamer for useful discussions on the topic.

Cite AsGet BibTex

Ioannis Z. Emiris, Vasilis Margonis, and Ioannis Psarros. Near-Neighbor Preserving Dimension Reduction for Doubling Subsets of l_1. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 47:1-47:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2019.47

Abstract

Randomized dimensionality reduction has been recognized as one of the fundamental techniques in handling high-dimensional data. Starting with the celebrated Johnson-Lindenstrauss Lemma, such reductions have been studied in depth for the Euclidean (l_2) metric, but much less for the Manhattan (l_1) metric. Our primary motivation is the approximate nearest neighbor problem in l_1. We exploit its reduction to the decision-with-witness version, called approximate near neighbor, which incurs a roughly logarithmic overhead. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with guaranteed bounded distortion only for the distances between a query point and a point set. Such embeddings are known to exist for both l_2 and l_1 metrics, as well as for doubling subsets of l_2. The case that remained open were doubling subsets of l_1. In this paper, we propose a dimension reduction by means of a near neighbor-preserving embedding for doubling subsets of l_1. Our approach is to represent the pointset with a carefully chosen covering set, then randomly project the latter. We study two types of covering sets: c-approximate r-nets and randomly shifted grids, and we discuss the tradeoff between them in terms of preprocessing time and target dimension. We employ Cauchy variables: certain concentration bounds derived should be of independent interest.

Subject Classification

ACM Subject Classification
  • Theory of computation → Nearest neighbor algorithms
  • Mathematics of computing → Dimensionality reduction
Keywords
  • Approximate nearest neighbor
  • Manhattan metric
  • randomized embedding

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. D. Achlioptas. Database-friendly Random Projections: Johnson-Lindenstrauss with Binary Coins. J. Comput. Syst. Sci., 66(4):671-687, 2003. Google Scholar
  2. N. Ailon and B. Chazelle. The Fast Johnson-Lindenstrauss Transform and Approximate Nearest Neighbors. SIAM J. Comput., 39(1):302-322, May 2009. Google Scholar
  3. E. Anagnostopoulos, I. Z. Emiris, and I. Psarros. Randomized Embeddings with Slack and High-Dimensional Approximate Nearest Neighbor. ACM Trans. Algorithms, 14(2):18:1-18:21, 2018. URL: https://doi.org/10.1145/3178540.
  4. A. Andoni and P. Indyk. Near-optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. Commun. ACM, 51(1):117-122, 2008. Google Scholar
  5. A. Andoni, T. Laarhoven, I. P. Razenshteyn, and E. Waingarten. Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors. In Proc. ACM-SIAM Symposium on Discrete Algorithms, SODA, Barcelona, Spain, pages 47-66, 2017. Google Scholar
  6. A. Andoni, H. L. Nguyen, A. Nikolov, I. P. Razenshteyn, and E. Waingarten. Approximate near neighbors for general symmetric norms. In Proc. ACM Symposium on Theory of Computing, STOC, Montreal, Canada, pages 902-913, 2017. Google Scholar
  7. Y. Bartal and L. A. Gottlieb. Dimension Reduction Techniques for 𝓁_p, (1 < p < 2), with Applications. In 32nd International Symposium on Computational Geometry, SoCG 2016, June 14-18, 2016, Boston, MA, USA, pages 16:1-16:15, 2016. URL: https://doi.org/10.4230/LIPIcs.SoCG.2016.16.
  8. Y. Bartal and L. A. Gottlieb. Approximate Nearest Neighbor Search for 𝓁_p-Spaces (2 < p < ∞) via Embeddings. In Proc. LATIN: Theoretical Informatics - 13th Latin American Symp., Buenos Aires, Argentina, pages 120-133, 2018. URL: https://doi.org/10.1007/978-3-319-77404-6_10.
  9. R. Cole and L. A. Gottlieb. Searching Dynamic Point Sets in Spaces with Bounded Doubling Dimension. In Proc. ACM Symp. Theory of Computing, pages 574-583, New York, USA, 2006. ACM. Google Scholar
  10. D. Eppstein, S. Har-Peled, and A. Sidiropoulos. Approximate Greedy Clustering and Distance Selection for Graph Metrics. CoRR, abs/1507.01555, 2015. URL: http://arxiv.org/abs/1507.01555.
  11. S. Har-Peled, P. Indyk, and R. Motwani. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality. Theory of Computing, 8(1):321-350, 2012. URL: https://doi.org/10.4086/toc.2012.v008a014.
  12. S. Har-Peled and M. Mendel. Fast Construction of Nets in Low Dimensional Metrics, and Their Applications. In Proc. Symp. Computational Geometry, pages 150-158, 2005. Google Scholar
  13. P. Indyk. On Approximate Nearest Neighbors under l_infinity Norm. J. Comput. Syst. Sci., 63(4):627-638, 2001. Google Scholar
  14. P. Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307-323, 2006. URL: https://doi.org/10.1145/1147954.1147955.
  15. P. Indyk and R. Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proc. ACM Symp. Theory of Computing, pages 604-613, 1998. Google Scholar
  16. P. Indyk and A. Naor. Nearest-neighbor-preserving Embeddings. ACM Trans. Algorithms, 3(3), 2007. Google Scholar
  17. R. Krauthgamer and J. R. Lee. Navigating Nets: Simple Algorithms for Proximity Search. In Proc. 15th Annual ACM-SIAM Symp. Discrete Algorithms, SODA'04, pages 798-807, 2004. Google Scholar
  18. J.R. Lee, M. Mendel, and A. Naor. Metric structures in L_1: dimension, snowflakes, and average distortion. Eur. J. Comb., 26(8):1180-1190, 2005. URL: https://doi.org/10.1016/j.ejc.2004.07.002.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail