Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported license
The classic k-d tree data structure continues to be widely used in spite of its vulnerability to the so-called curse of dimensionality. Here we provide a rigorous explanation: for randomly rotated data, a k-d tree adapts to the intrinsic dimension of the data and is not affected by the ambient dimension, thus keeping the data structure efficient for objects such as low-dimensional manifolds and sparse data. The main insight of the analysis can be used as an algorithmic pre-processing step to realize the same benefit: rotate the data randomly; then build a k-d tree. Our work can be seen as a refinement of Random Projection trees [Dasgupta 2008], which also adapt to intrinsic dimension but incur higher traversal costs as the resulting cells are polyhedra and not cuboids. Using k-d trees after a random rotation results in cells that are cuboids, thus preserving the traversal efficiency of standard k-d trees.
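The pre-processing step named in the abstract (rotate the data randomly, then build a k-d tree) can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction analyzed in the paper. The helper names `random_rotation`, `rotate`, and `build_kd_tree` are hypothetical. The random orthogonal matrix is drawn by Gram-Schmidt on Gaussian vectors (so it may include a reflection). The tree uses median splits with axes cycled by depth, which is one common k-d tree variant and may differ from the splitting rule the paper assumes.

```python
import random


def random_rotation(d, rng):
    """Return a d x d random orthogonal matrix as a list of orthonormal rows.

    Assumption: built by Gram-Schmidt on i.i.d. Gaussian vectors; the result
    may have determinant -1 (a rotation composed with a reflection).
    """
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Two orthogonalization passes for better numerical stability.
        for _ in range(2):
            for u in rows:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-12:  # resample in the (unlikely) degenerate case
            rows.append([a / norm for a in v])
    return rows


def rotate(points, R):
    """Apply the orthogonal matrix R to every point (computes R @ p)."""
    return [[sum(r * x for r, x in zip(row, p)) for row in R] for p in points]


def build_kd_tree(points, depth=0, leaf_size=1):
    """Build a k-d tree as nested dicts; cells are axis-aligned cuboids.

    Assumption: median split on the coordinate axis depth % d.
    """
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "split": pts[mid][axis],
        "left": build_kd_tree(pts[:mid], depth + 1, leaf_size),
        "right": build_kd_tree(pts[mid:], depth + 1, leaf_size),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    d = 8
    # Toy data with low intrinsic dimension: points on a line in R^d.
    data = [[t if i == 0 else 0.0 for i in range(d)]
            for t in (rng.uniform(-1.0, 1.0) for _ in range(32))]
    R = random_rotation(d, rng)
    tree = build_kd_tree(rotate(data, R))
    print(tree["axis"], tree["split"])
```

Queries must be rotated with the same matrix `R` before traversing the tree. Because an orthogonal map preserves Euclidean distances, nearest neighbors in the rotated data correspond to nearest neighbors in the original data.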
@InProceedings{vempala:LIPIcs.FSTTCS.2012.48,
author = {Vempala, Santosh S.},
title = {{Randomly-oriented k-d Trees Adapt to Intrinsic Dimension}},
booktitle = {IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012)},
pages = {48--57},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-939897-47-7},
ISSN = {1868-8969},
year = {2012},
volume = {18},
editor = {D'Souza, Deepak and Radhakrishnan, Jaikumar and Telikepalli, Kavitha},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2012.48},
URN = {urn:nbn:de:0030-drops-38470},
doi = {10.4230/LIPIcs.FSTTCS.2012.48},
annote = {Keywords: Data structures, Nearest Neighbors, Intrinsic Dimension, k-d Tree}
}