The classic k-d tree data structure continues to be widely used in spite of its vulnerability to the so-called curse of dimensionality. Here we provide a rigorous explanation: for randomly rotated data, a k-d tree adapts to the intrinsic dimension of the data and is not affected by the ambient dimension, thus keeping the data structure efficient for objects such as low-dimensional manifolds and sparse data. The main insight of the analysis can be used as an algorithmic pre-processing step to realize the same benefit: rotate the data randomly; then build a k-d tree. Our work can be seen as a refinement of Random Projection trees [Dasgupta 2008], which also adapt to intrinsic dimension but incur higher traversal costs as the resulting cells are polyhedra and not cuboids. Using k-d trees after a random rotation results in cells that are cuboids, thus preserving the traversal efficiency of standard k-d trees.
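The pre-processing step described above (rotate the data randomly, then build a k-d tree) can be sketched as follows. This is a minimal illustration in pure Python, not the paper's implementation. The function names (`random_rotation`, `rotate`, `build_kd_tree`, `nearest`), the median-split rule that cycles through coordinates, and the toy data set are all assumptions made for the example. The random orthogonal matrix is obtained by Gram-Schmidt orthonormalization of Gaussian vectors; it may include a reflection, which does not affect distances.

```python
import math
import random

def random_rotation(dim, rng):
    """Random orthogonal dim x dim matrix (list of rows), built by
    Gram-Schmidt orthonormalization of independent Gaussian vectors."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:  # remove the components along rows already chosen
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows

def rotate(points, rot):
    """Apply the matrix rot to every point."""
    return [[sum(r * x for r, x in zip(row, p)) for row in rot] for p in points]

def build_kd_tree(points, depth=0, leaf_size=1):
    """Classic k-d tree: split at the median, cycling through the axes."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "split": pts[mid][axis],
        "left": build_kd_tree(pts[:mid], depth + 1, leaf_size),
        "right": build_kd_tree(pts[mid:], depth + 1, leaf_size),
    }

def nearest(node, q, best=None):
    """Exact nearest-neighbor search; returns (distance, point)."""
    if "leaf" in node:
        for p in node["leaf"]:
            d = math.dist(p, q)
            if best is None or d < best[0]:
                best = (d, p)
        return best
    gap = q[node["axis"]] - node["split"]
    near, far = ("left", "right") if gap < 0 else ("right", "left")
    best = nearest(node[near], q, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if best is None or abs(gap) < best[0]:
        best = nearest(node[far], q, best)
    return best

# Toy data: 200 points on a 2-dimensional plane embedded in R^8.
rng = random.Random(0)
dim = 8
points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] + [0.0] * (dim - 2)
          for _ in range(200)]
query = [0.1, -0.2] + [0.0] * (dim - 2)

rot = random_rotation(dim, rng)
tree = build_kd_tree(rotate(points, rot))   # rotate first, then build
dist, _ = nearest(tree, rotate([query], rot)[0])  # queries use the same rotation
```

Because the same orthogonal map is applied to the data and to each query, all pairwise distances are unchanged, so the tree over the rotated points returns the same nearest-neighbor distance as a search over the original points. The cells remain axis-aligned boxes in the rotated coordinate system, which is what keeps traversal as cheap as in a standard k-d tree.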
@InProceedings{vempala:LIPIcs.FSTTCS.2012.48,
  author =	{Vempala, Santosh S.},
  title =	{{Randomly-oriented k-d Trees Adapt to Intrinsic Dimension}},
  booktitle =	{IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012)},
  pages =	{48--57},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-47-7},
  ISSN =	{1868-8969},
  year =	{2012},
  volume =	{18},
  editor =	{D'Souza, Deepak and Radhakrishnan, Jaikumar and Telikepalli, Kavitha},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2012.48},
  URN =		{urn:nbn:de:0030-drops-38470},
  doi =		{10.4230/LIPIcs.FSTTCS.2012.48},
  annote =	{Keywords: Data structures, Nearest Neighbors, Intrinsic Dimension, k-d Tree}
}