Data Structures for Categorical Path Counting Queries

Authors: Meng He, Serikzhan Kazi




File

LIPIcs.CPM.2021.15.pdf
  • Filesize: 0.81 MB
  • 17 pages


Author Details

Meng He
  • Faculty of Computer Science, Dalhousie University, Halifax, Canada
Serikzhan Kazi
  • Faculty of Computer Science, Dalhousie University, Halifax, Canada

Cite As

Meng He and Serikzhan Kazi. Data Structures for Categorical Path Counting Queries. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 15:1-15:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.CPM.2021.15

Abstract

Consider an ordinal tree T on n nodes, each of which is assigned a category from an alphabet [σ] = {1,2,…,σ}. We preprocess the tree T in order to support categorical path counting queries, which ask for the number of distinct categories occurring on the path in T between two query nodes x and y. For this problem, we propose a linear-space data structure with query time O(√n lg((lg σ)/(lg w))), where w = Ω(lg n) is the word size in the word-RAM. We prove that, under the assumption that matrix multiplication cannot be solved in time faster than cubic with only combinatorial methods, our result is optimal, save for polylogarithmic speed-ups. For a trade-off parameter 1 ≤ t ≤ n, we propose an O(n + n²/t²)-word data structure with O(t lg((lg σ)/(lg w))) query time; setting t = √n recovers the linear-space bound.

We also consider c-approximate categorical path counting queries, which must return an approximation to the number of distinct categories occurring on the query path, by counting each such category at least once and at most c times. We describe a linear-space data structure that supports 2-approximate categorical path counting queries in O((lg n)/(lg lg n)) time.

Next, we generalize the categorical path counting queries to weighted trees. Here, a query specifies two nodes x,y and an orthogonal range Q. The answer to the resulting categorical path range counting query is the number of distinct categories occurring on the path from x to y, if only the nodes with weights falling inside Q are considered. We propose an O(n lg lg n + (n/t)⁴)-word data structure with O(t lg lg n) query time, or an O(n + (n/t)⁴)-word data structure with O(t lg^ε n) query time. Choosing t = n^{3/4} in the latter yields a linear-space data structure with O(n^{3/4} lg^ε n) query time. We then extend the approach to trees weighted with vectors from [n]^{d}, where d ≥ 2 is a constant integer. We present a data structure with O(n lg^{d-1+ε} n + (n/t)^{2d+2}) words of space and O(t (lg^{d-1} n)/((lg lg n)^{d-2})) query time. For an O(n⋅polylog n)-space solution, choosing t = n^{(2d+1)/(2d+2)} thus gives O(n^{(2d+1)/(2d+2)}⋅polylog n) query time.

The inherent difficulty revealed by the lower bound we proved motivated us to consider data structures based on sketching. In unweighted trees, we propose a sketching data structure to solve the approximate categorical path counting problem, which asks for a (1±ε)-approximation (i.e. a value within a factor of 1±ε of the true answer) of the number of distinct categories on the given path, with probability 1-δ, where 0 < ε,δ < 1 are constants. The data structure occupies O(n + (n/t) lg n) words of space and answers queries in O(t lg n) time. For trees weighted with d-dimensional weight vectors (d ≥ 1), we propose a data structure with O((n + (n/t) lg n) lg^d n) words of space and O(t lg^{d+1} n) query time.

All these problems generalize the corresponding categorical range counting problems in Euclidean space ℝ^{d+1}, for respective d, by replacing one of the dimensions with a tree topology.
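
To make the query semantics concrete, the following is a minimal brute-force sketch of a categorical path counting query. It is not the data structure from the paper: it simply walks the path and counts distinct categories, taking time linear in the path length. The parent-array representation and the names path_nodes, categorical_path_count, PARENT, and CATEGORY are assumptions introduced here for illustration only.

```python
from typing import List, Optional


def path_nodes(parent: List[Optional[int]], x: int, y: int) -> List[int]:
    """Return the nodes on the tree path from x to y, inclusive.

    parent[v] is the parent of node v, or None for the root
    (an assumed representation, not taken from the paper).
    """
    # Walk from x up to the root, remembering each ancestor's position.
    up_from_x: List[int] = []
    position = {}
    v: Optional[int] = x
    while v is not None:
        position[v] = len(up_from_x)
        up_from_x.append(v)
        v = parent[v]

    # Walk from y upwards until we meet an ancestor of x: that is the LCA.
    up_from_y: List[int] = []
    v = y
    while v not in position:
        up_from_y.append(v)
        v = parent[v]

    # Path is x .. LCA followed by the y-side nodes in top-down order.
    return up_from_x[: position[v] + 1] + up_from_y[::-1]


def categorical_path_count(
    parent: List[Optional[int]], category: List[int], x: int, y: int
) -> int:
    """Count distinct categories on the path between x and y (brute force)."""
    return len({category[v] for v in path_nodes(parent, x, y)})


# Hypothetical 6-node example tree: node 0 is the root.
PARENT = [None, 0, 0, 1, 1, 2]
CATEGORY = [1, 2, 1, 3, 2, 3]

if __name__ == "__main__":
    # Path 3 -> 1 -> 0 -> 2 -> 5 carries categories 3, 2, 1, 1, 3.
    print(categorical_path_count(PARENT, CATEGORY, 3, 5))
```

A baseline of this kind is useful mainly as a correctness reference when testing a faster structure; under the definition above, a c-approximate answer for the same query would lie between this exact count and c times it.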

Subject Classification

ACM Subject Classification
  • Theory of computation → Data structures design and analysis
Keywords
  • data structures
  • weighted trees
  • path queries
  • categorical queries
  • coloured queries
  • categorical path counting
  • categorical path range counting

