DagSemProc.04301.3.pdf
- Filesize: 427 kB
- 26 pages
The query efficiency of a data structure that stores a set of objects, can normally be assessed by analysing the number of objects, pointers etc. looked at when answering a query. However, if the data structure is too big to fit in main memory, data may need to be fetched from disk. In that case, the query efficiency is easily dominated by moving the disk head to the correct locations, rather than by reading the data itself. To reduce the number of disk accesses, once can group the data into blocks, and strive to bound the number of different blocks accessed rather than the number of individual data objects read. An R-tree is a general-purpose data structur that stores a hierarchical grouping of geometric objects into blocks. Many heuristics have been designed to determine which objects should be grouped together, but none of these heuristics could give a guarantee on the resulting worst-case query time. We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query by accessing $O((N/B)^{1-1/d} + T/B)$ blocks, where $N$ is the number of $d$-dimensional objects stored, $B$ is the number of objects per block, and $T$ is the number of objects whose bounding boxes intersect the query window. This is provably asymptotically optimal. Experiments show that the PR-tree performs similar to the best known heuristics on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.
Feedback for Dagstuhl Publishing