Document

**Published in:** LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Consider an ordinal tree T on n nodes, each of which is assigned a category from an alphabet [σ] = {1,2,…,σ}. We preprocess the tree T in order to support {categorical path counting queries}, which ask for the number of distinct categories occurring on the path in T between two query nodes x and y. For this problem, we propose a linear-space data structure with query time O(√n lg((lg σ)/(lg w))), where w = Ω(lg n) is the word size in the word-RAM. As shown in our proof, from the assumption that matrix multiplication cannot be solved in time faster than cubic (with only combinatorial methods), our result is optimal, save for polylogarithmic speed-ups. For a trade-off parameter 1 ≤ t ≤ n, we propose an O(n+ n²/t²)-word, O(t lg ((lg σ)/(lg w))) query time data structure. We also consider c-approximate categorical path counting queries, which must return an approximation to the number of distinct categories occurring on the query path, by counting each such category at least once and at most c times. We describe a linear-space data structure that supports 2-approximate categorical path counting queries in O((lg n)/(lg lg n)) time.
Next, we generalize the categorical path counting queries to weighted trees. Here, a query specifies two nodes x,y and an orthogonal range Q. The answer to thus formed categorical path range counting query is the number of distinct categories occurring on the path from x to y, if only the nodes with weights falling inside Q are considered. We propose an O(n lg lg n +(n/t)⁴)-word data structure with O(t lg lg n) query time, or an O(n+(n/t)⁴)-word} data structure with O(t lg^ε n) query time. For an appropriate choice of the trade-off parameter t, this implies a linear-space data structure with O(n^{3/4} lg^ε n) query time. We then extend the approach to the trees weighted with vectors from [n]^{d}, where d is a constant integer greater than or equal to 2. We present a data structure with O(n lg^{d-1+ε} n + (n/t)^{2d+2}) words of space and O(t (lg^{d-1} n)/((lg lg n)^{d-2})) query time. For an O(n⋅polylog n)-space solution, one thus has O(n^{{2d+1}/{2d+2}}⋅polylog n) query time.
The inherent difficulty revealed by the lower bound we proved motivated us to consider data structures based on {sketching}. In unweighted trees, we propose a sketching data structure to solve the approximate categorical path counting problem which asks for a (1±ε)-approximation (i.e. within 1±ε of the true answer) of the number of distinct categories on the given path, with probability 1-δ, where 0 < ε,δ < 1 are constants. The data structure occupies O(n+n/t lg n) words of space, for the query time of O(t lg n). For trees weighted with d-dimensional weight vectors (d ≥ 1), we propose a data structure with O((n + n/t lg n) lg^d n) words of space and O(t lg^{d+1} n) query time.
All these problems generalize the corresponding categorical range counting problems in Euclidean space ℝ^{d+1}, for respective d, by replacing one of the dimensions with a tree topology.

Meng He and Serikzhan Kazi. Data Structures for Categorical Path Counting Queries. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 15:1-15:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{he_et_al:LIPIcs.CPM.2021.15, author = {He, Meng and Kazi, Serikzhan}, title = {{Data Structures for Categorical Path Counting Queries}}, booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)}, pages = {15:1--15:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-186-3}, ISSN = {1868-8969}, year = {2021}, volume = {191}, editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.15}, URN = {urn:nbn:de:0030-drops-139662}, doi = {10.4230/LIPIcs.CPM.2021.15}, annote = {Keywords: data structures, weighted trees, path queries, categorical queries, coloured queries, categorical path counting, categorical path range counting} }

Document

**Published in:** LIPIcs, Volume 160, 18th International Symposium on Experimental Algorithms (SEA 2020)

We perform experimental studies on data structures that answer path median, path counting, and path reporting queries in weighted trees. These query problems generalize the well-known range median query problem in arrays, as well as the 2d orthogonal range counting and reporting problems in planar point sets, to tree structured data. We propose practical realizations of the latest theoretical results on path queries. Our data structures, which use tree extraction, heavy-path decomposition and wavelet trees, are implemented in both succinct and pointer-based form. Our succinct data structures are further specialized to be plain or entropy-compressed. Through experiments on large sets, we show that succinct data structures for path queries may present a viable alternative to standard pointer-based realizations, in practical scenarios. Compared to naïve approaches that compute the answer by explicit traversal of the query path, our succinct data structures are several times faster in path median queries and perform comparably in path counting and path reporting queries, while being several times more space-efficient. Plain pointer-based realizations of our data structures, requiring a few times more space than the naïve ones, yield up to 100-times speed-up over them.

Meng He and Serikzhan Kazi. Path Query Data Structures in Practice. In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 27:1-27:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{he_et_al:LIPIcs.SEA.2020.27, author = {He, Meng and Kazi, Serikzhan}, title = {{Path Query Data Structures in Practice}}, booktitle = {18th International Symposium on Experimental Algorithms (SEA 2020)}, pages = {27:1--27:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-148-1}, ISSN = {1868-8969}, year = {2020}, volume = {160}, editor = {Faro, Simone and Cantone, Domenico}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2020.27}, URN = {urn:nbn:de:0030-drops-121012}, doi = {10.4230/LIPIcs.SEA.2020.27}, annote = {Keywords: path query, path median, path counting, path reporting, weighted tree} }

Document

**Published in:** LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

We consider an ordinal tree T on n nodes, with each node assigned a d-dimensional weight vector w in {1,2,...,n}^d, where d in N is a constant. We study path queries as generalizations of well-known {orthogonal range queries}, with one of the dimensions being tree topology rather than a linear order. Since in our definitions d only represents the number of dimensions of the weight vector without taking the tree topology into account, a path query in a tree with d-dimensional weight vectors generalize the corresponding (d+1)-dimensional orthogonal range query. We solve {ancestor dominance reporting} problem as a direct generalization of dominance reporting problem, in time O(lg^{d-1}{n}+k) and space of O(n lg^{d-2}n) words, where k is the size of the output, for d >= 2. We also achieve a tradeoff of O(n lg^{d-2+epsilon}{n}) words of space, with query time of O((lg^{d-1} n)/(lg lg n)^{d-2}+k), for the same problem, when d >= 3. We solve {path successor problem} in O(n lg^{d-1}{n}) words of space and time O(lg^{d-1+epsilon}{n}) for d >= 1 and an arbitrary constant epsilon > 0. We propose a solution to {path counting problem}, with O(n(lg{n}/lg lg{n})^{d-1}) words of space and O((lg{n}/lg lg{n})^{d}) query time, for d >= 1. Finally, we solve {path reporting problem} in O(n lg^{d-1+epsilon}{n}) words of space and O((lg^{d-1}{n})/(lg lg{n})^{d-2}+k) query time, for d >= 2. These results match or nearly match the best tradeoffs of the respective range queries. We are also the first to solve path successor even for d = 1.

Meng He and Serikzhan Kazi. Path and Ancestor Queries over Trees with Multidimensional Weight Vectors. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 45:1-45:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{he_et_al:LIPIcs.ISAAC.2019.45, author = {He, Meng and Kazi, Serikzhan}, title = {{Path and Ancestor Queries over Trees with Multidimensional Weight Vectors}}, booktitle = {30th International Symposium on Algorithms and Computation (ISAAC 2019)}, pages = {45:1--45:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-130-6}, ISSN = {1868-8969}, year = {2019}, volume = {149}, editor = {Lu, Pinyan and Zhang, Guochuan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.45}, URN = {urn:nbn:de:0030-drops-115415}, doi = {10.4230/LIPIcs.ISAAC.2019.45}, annote = {Keywords: path queries, range queries, algorithms, data structures, theory} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail