ACM Other Conferences

10.1145/acmotherconferences

0000000

10.5555/0000000

Proceedings of the Dagstuhl Seminar Proceedings, Volume 4301

10.4230/DagSemProc.04301.3

The Priority R-Tree: A Practically Efficient and Worst-Case-Optimal R-Tree

Arge

Lars

Author

de Berg

Mark

Author

Haverkort

Herman J.

Author

Yi

Ke

Author

01 07 2005

1 26

The query efficiency of a data structure that stores a set of objects can normally be assessed by analysing the number of objects, pointers, etc. inspected when answering a query. However, if the data structure is too big to fit in main memory, data may need to be fetched from disk. In that case, the query time is easily dominated by the time spent moving the disk head to the correct locations, rather than by reading the data itself.

To reduce the number of disk accesses, one can group the data into blocks and strive to bound the number of distinct blocks accessed rather than the number of individual data objects read. An R-tree is a general-purpose data structure that stores a hierarchical grouping of geometric objects into blocks. Many heuristics have been designed to determine which objects should be grouped together, but none of them gives a guarantee on the resulting worst-case query time.

We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query by accessing $O((N/B)^{1-1/d} + T/B)$ blocks, where $N$ is the number of $d$-dimensional objects stored, $B$ is the number of objects per block, and $T$ is the number of objects whose bounding boxes intersect the query window. This is provably asymptotically optimal. Experiments show that the PR-tree performs similarly to the best known heuristics on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.

R-Trees

<book-part-wrapper xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" content-type="research-article">

<collection-meta collection-type="book-series">

<collection-id collection-id-type="doi">10.1145/acmotherconferences</collection-id>

<title-group>

<title>ACM Other Conferences</title>

</title-group>

</collection-meta>

<book-meta>

<book-id book-id-type="acm-id">0000000</book-id>

<book-id book-id-type="doi">10.5555/0000000</book-id>

<book-title-group>

<book-title>Proceedings of the Dagstuhl Seminar Proceedings, Volume 4301</book-title>

<alt-title alt-title-type="acronym"/>

</book-title-group>

</book-meta>

<book-part book-part-type="chapter" xml:lang="en">

<book-part-meta>

<book-part-id book-part-id-type="doi">10.4230/DagSemProc.04301.3</book-part-id>

<book-part-id book-part-id-type="article-no">3</book-part-id>

<subj-group subj-group-type="ccs2012"/>

<title-group>

<title>The Priority R-Tree: A Practically Efficient and Worst-Case-Optimal R-Tree</title>

</title-group>

<contrib-group>

<contrib>

<name>

<surname>Arge</surname>

<given-names>Lars</given-names>

</name>

<role>Author</role>

</contrib>

<contrib>

<name>

<surname>de Berg</surname>

<given-names>Mark</given-names>

</name>

<role>Author</role>

</contrib>

<contrib>

<name>

<surname>Haverkort</surname>

<given-names>Herman J.</given-names>

</name>

<role>Author</role>

</contrib>

<contrib>

<name>

<surname>Yi</surname>

<given-names>Ke</given-names>

</name>

<role>Author</role>

</contrib>

</contrib-group>

<pub-date date-type="publication">

<year>2005</year>

</pub-date>

<abstract>

The query efficiency of a data structure that stores a set of objects can normally be assessed by analysing the number of objects, pointers, etc. inspected when answering a query. However, if the data structure is too big to fit in main memory, data may need to be fetched from disk. In that case, the query time is easily dominated by the time spent moving the disk head to the correct locations, rather than by reading the data itself.

To reduce the number of disk accesses, one can group the data into blocks and strive to bound the number of distinct blocks accessed rather than the number of individual data objects read. An R-tree is a general-purpose data structure that stores a hierarchical grouping of geometric objects into blocks. Many heuristics have been designed to determine which objects should be grouped together, but none of them gives a guarantee on the resulting worst-case query time.

We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query by accessing $O((N/B)^{1-1/d} + T/B)$ blocks, where $N$ is the number of $d$-dimensional objects stored, $B$ is the number of objects per block, and $T$ is the number of objects whose bounding boxes intersect the query window. This is provably asymptotically optimal. Experiments show that the PR-tree performs similarly to the best known heuristics on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.

</abstract>

<kwd-group>

<kwd>R-Trees</kwd>

</kwd-group>

</book-part-meta>

<back>

<ref-list specific-use="unparsed"/>

</back>

</book-part>

</book-part-wrapper>