# Data Structures for Categorical Path Counting Queries

## File

LIPIcs.CPM.2021.15.pdf
• Filesize: 0.81 MB
• 17 pages

## Cite As

Meng He and Serikzhan Kazi. Data Structures for Categorical Path Counting Queries. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 15:1-15:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.CPM.2021.15

## Abstract

Consider an ordinal tree T on n nodes, each of which is assigned a category from an alphabet [σ] = {1,2,…,σ}. We preprocess the tree T in order to support categorical path counting queries, which ask for the number of distinct categories occurring on the path in T between two query nodes x and y. For this problem, we propose a linear-space data structure with query time O(√n lg((lg σ)/(lg w))), where w = Ω(lg n) is the word size in the word-RAM. We prove that, under the assumption that matrix multiplication cannot be solved in faster than cubic time using only combinatorial methods, our result is optimal, save for polylogarithmic speed-ups. For a trade-off parameter 1 ≤ t ≤ n, we propose an O(n + n²/t²)-word data structure with O(t lg((lg σ)/(lg w))) query time.

We also consider c-approximate categorical path counting queries, which must return an approximation to the number of distinct categories occurring on the query path, by counting each such category at least once and at most c times. We describe a linear-space data structure that supports 2-approximate categorical path counting queries in O((lg n)/(lg lg n)) time.

Next, we generalize categorical path counting queries to weighted trees. Here, a query specifies two nodes x, y and an orthogonal range Q. The answer to the resulting categorical path range counting query is the number of distinct categories occurring on the path from x to y, if only the nodes with weights falling inside Q are considered. We propose an O(n lg lg n + (n/t)⁴)-word data structure with O(t lg lg n) query time, or an O(n + (n/t)⁴)-word data structure with O(t lg^ε n) query time. For an appropriate choice of the trade-off parameter t, this implies a linear-space data structure with O(n^{3/4} lg^ε n) query time. We then extend the approach to trees weighted with vectors from [n]^{d}, where d ≥ 2 is a constant integer. We present a data structure with O(n lg^{d-1+ε} n + (n/t)^{2d+2}) words of space and O(t (lg^{d-1} n)/((lg lg n)^{d-2})) query time. For an O(n⋅polylog n)-space solution, one thus has O(n^{(2d+1)/(2d+2)}⋅polylog n) query time.

The inherent difficulty revealed by the lower bound we proved motivated us to consider data structures based on sketching. In unweighted trees, we propose a sketching data structure to solve the approximate categorical path counting problem, which asks for a (1±ε)-approximation (i.e. within a factor of 1±ε of the true answer) of the number of distinct categories on the given path, with probability 1-δ, where 0 < ε,δ < 1 are constants. The data structure occupies O(n+n/t lg n) words of space and has O(t lg n) query time. For trees weighted with d-dimensional weight vectors (d ≥ 1), we propose a data structure with O((n + n/t lg n) lg^d n) words of space and O(t lg^{d+1} n) query time. All these problems generalize the corresponding categorical range counting problems in Euclidean space ℝ^{d+1}, for respective d, by replacing one of the dimensions with a tree topology.
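To make the query definition concrete, the following is a minimal sketch of a naive baseline for exact categorical path counting, together with a check of the c-approximate condition described in the abstract. It is not one of the paper's data structures: it walks the path node by node, so a query costs time proportional to the path length. The parent-array representation (root marked by `-1`) and the names `compute_depths`, `distinct_categories_on_path`, and `is_c_approximation` are assumptions made for illustration only.

```python
from typing import List


def compute_depths(parent: List[int]) -> List[int]:
    """Depth of every node, given a parent array where the root has parent -1."""
    n = len(parent)
    depth = [-1] * n
    for v in range(n):
        # Climb until we reach a node of known depth (or step past the root),
        # then unwind the stack assigning depths top-down.
        stack = []
        u = v
        while u != -1 and depth[u] == -1:
            stack.append(u)
            u = parent[u]
        d = -1 if u == -1 else depth[u]
        while stack:
            d += 1
            depth[stack.pop()] = d
    return depth


def distinct_categories_on_path(
    parent: List[int], depth: List[int], category: List[int], x: int, y: int
) -> int:
    """Number of distinct categories on the tree path between x and y (inclusive)."""
    seen = set()
    # Repeatedly lift the deeper endpoint; the two walks meet at the
    # lowest common ancestor of the original x and y.
    while x != y:
        if depth[x] >= depth[y]:
            seen.add(category[x])
            x = parent[x]
        else:
            seen.add(category[y])
            y = parent[y]
    seen.add(category[x])  # the meeting node itself lies on the path
    return len(seen)


def is_c_approximation(answer: int, exact: int, c: int) -> bool:
    """True if `answer` counts each category at least once and at most c times in total,
    i.e. exact <= answer <= c * exact."""
    return exact <= answer <= c * exact


# Hypothetical 6-node tree: node 0 is the root, nodes 1 and 2 are its children,
# nodes 3 and 4 are children of 1, node 5 is a child of 2.
parent = [-1, 0, 0, 1, 1, 2]
category = [1, 2, 1, 3, 2, 3]
depth = compute_depths(parent)
```

For example, `distinct_categories_on_path(parent, depth, category, 3, 5)` visits nodes 3, 1, 0, 2, 5 with categories 3, 2, 1, 1, 3, so the exact answer is 3; any returned value between 3 and 6 would be a valid 2-approximate answer for that query.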

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Data structures design and analysis
##### Keywords
• data structures
• weighted trees
• path queries
• categorical queries
• coloured queries
• categorical path counting
• categorical path range counting
