{"@context":"https:\/\/schema.org\/","@type":"ScholarlyArticle","@id":"#article14943","name":"Data Structures for Categorical Path Counting Queries","abstract":"Consider an ordinal tree T on n nodes, each of which is assigned a category from an alphabet [\u03c3] = {1,2,\u2026,\u03c3}. We preprocess the tree T in order to support categorical path counting queries, which ask for the number of distinct categories occurring on the path in T between two query nodes x and y. For this problem, we propose a linear-space data structure with query time O(\u221an lg((lg \u03c3)\/(lg w))), where w = \u03a9(lg n) is the word size in the word-RAM model. We prove that, under the assumption that matrix multiplication cannot be solved in faster than cubic time using only combinatorial methods, our result is optimal, save for polylogarithmic speed-ups. For a trade-off parameter 1 \u2264 t \u2264 n, we propose an O(n + n\u00b2\/t\u00b2)-word data structure with O(t lg((lg \u03c3)\/(lg w))) query time. We also consider c-approximate categorical path counting queries, which must return an approximation to the number of distinct categories occurring on the query path by counting each such category at least once and at most c times. We describe a linear-space data structure that supports 2-approximate categorical path counting queries in O((lg n)\/(lg lg n)) time.\r\nNext, we generalize categorical path counting queries to weighted trees. Here, a query specifies two nodes x and y and an orthogonal range Q. The answer to such a categorical path range counting query is the number of distinct categories occurring on the path from x to y when only the nodes whose weights fall inside Q are considered. We propose an O(n lg lg n + (n\/t)\u2074)-word data structure with O(t lg lg n) query time, or an O(n + (n\/t)\u2074)-word data structure with O(t lg^\u03b5 n) query time. 
For an appropriate choice of the trade-off parameter t, this implies a linear-space data structure with O(n^{3\/4} lg^\u03b5 n) query time. We then extend the approach to trees weighted with vectors from [n]^{d}, where d is a constant integer greater than or equal to 2. We present a data structure with O(n lg^{d-1+\u03b5} n + (n\/t)^{2d+2}) words of space and O(t (lg^{d-1} n)\/((lg lg n)^{d-2})) query time. For an O(n\u22c5polylog n)-space solution, one thus has O(n^{(2d+1)\/(2d+2)}\u22c5polylog n) query time.\r\nThe inherent difficulty revealed by the lower bound we proved motivated us to consider data structures based on sketching. In unweighted trees, we propose a sketching data structure to solve the approximate categorical path counting problem, which asks for a (1\u00b1\u03b5)-approximation (i.e., within a factor of 1\u00b1\u03b5 of the true answer) of the number of distinct categories on the given path, with probability 1-\u03b4, where 0 < \u03b5,\u03b4 < 1 are constants. The data structure occupies O(n + (n\/t) lg n) words of space and supports queries in O(t lg n) time. 
For trees weighted with d-dimensional weight vectors (d \u2265 1), we propose a data structure with O((n + (n\/t) lg n) lg^d n) words of space and O(t lg^{d+1} n) query time.\r\nAll these problems generalize the corresponding categorical range counting problems in Euclidean space \u211d^{d+1}, for the respective values of d, by replacing one of the dimensions with a tree topology.","keywords":["data structures","weighted trees","path queries","categorical queries","coloured queries","categorical path counting","categorical path range counting"],"author":[{"@type":"Person","name":"He, Meng","givenName":"Meng","familyName":"He","email":"mailto:mhe@cs.dal.ca","affiliation":"Faculty of Computer Science, Dalhousie University, Halifax, Canada"},{"@type":"Person","name":"Kazi, Serikzhan","givenName":"Serikzhan","familyName":"Kazi","email":"mailto:skazi@dal.ca","affiliation":"Faculty of Computer Science, Dalhousie University, Halifax, Canada"}],"position":15,"pageStart":"15:1","pageEnd":"15:17","dateCreated":"2021-06-30","datePublished":"2021-06-30","isAccessibleForFree":true,"license":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode","copyrightHolder":[{"@type":"Person","name":"He, Meng","givenName":"Meng","familyName":"He","email":"mailto:mhe@cs.dal.ca","affiliation":"Faculty of Computer Science, Dalhousie University, Halifax, Canada"},{"@type":"Person","name":"Kazi, Serikzhan","givenName":"Serikzhan","familyName":"Kazi","email":"mailto:skazi@dal.ca","affiliation":"Faculty of Computer Science, Dalhousie University, Halifax, Canada"}],"copyrightYear":"2021","accessMode":"textual","accessModeSufficient":"textual","creativeWorkStatus":"Published","inLanguage":"en-US","sameAs":"https:\/\/doi.org\/10.4230\/LIPIcs.CPM.2021.15","publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","isPartOf":{"@type":"PublicationVolume","@id":"#volume6394","volumeNumber":191,"name":"32nd Annual Symposium on Combinatorial Pattern Matching (CPM 
2021)","dateCreated":"2021-06-30","datePublished":"2021-06-30","editor":[{"@type":"Person","name":"Gawrychowski, Pawe\u0142","givenName":"Pawe\u0142","familyName":"Gawrychowski","email":"mailto:gawry@cs.uni.wroc.pl","sameAs":"https:\/\/orcid.org\/0000-0002-6993-5440","affiliation":"University of Wroc\u0142aw, Poland"},{"@type":"Person","name":"Starikovskaya, Tatiana","givenName":"Tatiana","familyName":"Starikovskaya","email":"mailto:tat.starikovskaya@gmail.com","affiliation":"\u00c9cole normale sup\u00e9rieure, France"}],"isAccessibleForFree":true,"publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","hasPart":"#article14943","isPartOf":{"@type":"Periodical","@id":"#series116","name":"Leibniz International Proceedings in Informatics","issn":"1868-8969","isAccessibleForFree":true,"publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","hasPart":"#volume6394"}}}