Document

**Published in:** LIPIcs, Volume 283, 34th International Symposium on Algorithms and Computation (ISAAC 2023)

Maximal Common Subsequences (MCSs) between two strings X and Y are subsequences of both X and Y that are maximal under inclusion. MCSs relax and generalize the well known and widely used concept of Longest Common Subsequences (LCSs), which can be seen as MCSs of maximum length. While the number both LCSs and MCSs can be exponential in the length of the strings, LCSs have been long exploited for string and text analysis, as simple compact representations of all LCSs between two strings, built via dynamic programming or automata, have been known since the '70s. MCSs appear to have a more challenging structure: even listing them efficiently was an open problem open until recently, thus narrowing the complexity difference between the two problems, but the gap remained significant. In this paper we close the complexity gap: we show how to build DAG of polynomial size - in polynomial time - which allows for efficient operations on the set of all MCSs such as enumeration in Constant Amortized Time per solution (CAT), counting, and random access to the i-th element (i.e., rank and select operations). Other than improving known algorithmic results, this work paves the way for new sequence analysis methods based on MCSs.

Alessio Conte, Roberto Grossi, Giulia Punzi, and Takeaki Uno. A Compact DAG for Storing and Searching Maximal Common Subsequences. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 21:1-21:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{conte_et_al:LIPIcs.ISAAC.2023.21, author = {Conte, Alessio and Grossi, Roberto and Punzi, Giulia and Uno, Takeaki}, title = {{A Compact DAG for Storing and Searching Maximal Common Subsequences}}, booktitle = {34th International Symposium on Algorithms and Computation (ISAAC 2023)}, pages = {21:1--21:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-289-1}, ISSN = {1868-8969}, year = {2023}, volume = {283}, editor = {Iwata, Satoru and Kakimura, Naonori}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2023.21}, URN = {urn:nbn:de:0030-drops-193231}, doi = {10.4230/LIPIcs.ISAAC.2023.21}, annote = {Keywords: Maximal common subsequence, DAG, Compact data structures, Enumeration, Constant amortized time, Random access} }

Document

**Published in:** LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)

Molecular phylogenetics is a fundamental branch of biology. It studies the evolutionary relationships among the individuals of a population through their biological sequences, and may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories.
In this paper we develop a method called phyBWT, describing how to use the extended Burrows-Wheeler Transform (eBWT) for a collection of DNA sequences to directly reconstruct phylogeny, bypassing the alignment against a reference genome or de novo assembly. Our phyBWT hinges on the combinatorial properties of the eBWT positional clustering framework. We employ eBWT to detect relevant blocks of the longest shared substrings of varying length (unlike the k-mer-based approaches that need to fix the length k a priori), and build a suitable decomposition leading to a phylogenetic tree, step by step. As a result, phyBWT is a new alignment-, assembly-, and reference-free method that builds a partition tree without relying on the pairwise comparison of sequences, thus avoiding to use a distance matrix to infer phylogeny.
The preliminary experimental results on sequencing data show that our method can handle datasets of different types (short reads, contigs, or entire genomes), producing trees of quality comparable to that found in the benchmark phylogeny.

Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone, and Lorenzo Tattini. phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 23:1-23:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{guerrini_et_al:LIPIcs.WABI.2022.23, author = {Guerrini, Veronica and Conte, Alessio and Grossi, Roberto and Liti, Gianni and Rosone, Giovanna and Tattini, Lorenzo}, title = {{phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering}}, booktitle = {22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)}, pages = {23:1--23:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-243-3}, ISSN = {1868-8969}, year = {2022}, volume = {242}, editor = {Boucher, Christina and Rahmann, Sven}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.23}, URN = {urn:nbn:de:0030-drops-170577}, doi = {10.4230/LIPIcs.WABI.2022.23}, annote = {Keywords: Phylogeny, partition tree, BWT, positional cluster, alignment-free, reference-free, assembly-free} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Let Substr_k(X) denote the set of length-k substrings of a given string X for a given integer k > 0. We study the following basic string problem, called z-Shortest 𝒮_k-Equivalent Strings: Given a set 𝒮_k of n length-k strings and an integer z > 0, list z shortest distinct strings T₁,…,T_z such that Substr_k(T_i) = 𝒮_k, for all i ∈ [1,z]. The z-Shortest 𝒮_k-Equivalent Strings problem arises naturally as an encoding problem in many real-world applications; e.g., in data privacy, in data compression, and in bioinformatics. The 1-Shortest 𝒮_k-Equivalent Strings, referred to as Shortest 𝒮_k-Equivalent String, asks for a shortest string X such that Substr_k(X) = 𝒮_k.
Our main contributions are summarized below:
- Given a directed graph G(V,E), the Directed Chinese Postman (DCP) problem asks for a shortest closed walk that visits every edge of G at least once. DCP can be solved in 𝒪̃(|E||V|) time using an algorithm for min-cost flow. We show, via a non-trivial reduction, that if Shortest 𝒮_k-Equivalent String over a binary alphabet has a near-linear-time solution then so does DCP.
- We show that the length of a shortest string output by Shortest 𝒮_k-Equivalent String is in 𝒪(k+n²). We generalize this bound by showing that the total length of z shortest strings is in 𝒪(zk+zn²+z²n). We derive these upper bounds by showing (asymptotically tight) bounds on the total length of z shortest Eulerian walks in general directed graphs.
- We present an algorithm for solving z-Shortest 𝒮_k-Equivalent Strings in 𝒪(nk+n²log²n+zn²log n+|output|) time. If z = 1, the time becomes 𝒪(nk+n²log²n) by the fact that the size of the input is Θ(nk) and the size of the output is 𝒪(k+n²).

Giulia Bernardini, Alessio Conte, Esteban Gabory, Roberto Grossi, Grigorios Loukides, Solon P. Pissis, Giulia Punzi, and Michelle Sweering. On Strings Having the Same Length- k Substrings. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 16:1-16:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.CPM.2022.16, author = {Bernardini, Giulia and Conte, Alessio and Gabory, Esteban and Grossi, Roberto and Loukides, Grigorios and Pissis, Solon P. and Punzi, Giulia and Sweering, Michelle}, title = {{On Strings Having the Same Length- k Substrings}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {16:1--16:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.16}, URN = {urn:nbn:de:0030-drops-161439}, doi = {10.4230/LIPIcs.CPM.2022.16}, annote = {Keywords: string algorithms, combinatorics on words, de Bruijn graph, Chinese Postman} }

Document

**Published in:** LIPIcs, Volume 160, 18th International Symposium on Experimental Algorithms (SEA 2020)

The analysis of similar motions in a network provides useful information for different applications like route recommendation. We are interested in algorithms to efficiently retrieve trajectories that are similar to a given query trajectory. For this task many studies have focused on extracting the geometrical information of trajectories. In this paper we investigate the properties of trajectories moving along the paths of a network. We provide a similarity function by making use of both the temporal aspect of trajectories and the structure of the underlying network. We propose an approximation technique that offers the top-k similar trajectories with respect to a query trajectory in an efficient way with acceptable precision. We investigate our method over real-world networks, and our experimental results show the effectiveness of the proposed method.

Roberto Grossi, Andrea Marino, and Shima Moghtasedi. Finding Structurally and Temporally Similar Trajectories in Graphs. In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 24:1-24:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{grossi_et_al:LIPIcs.SEA.2020.24, author = {Grossi, Roberto and Marino, Andrea and Moghtasedi, Shima}, title = {{Finding Structurally and Temporally Similar Trajectories in Graphs}}, booktitle = {18th International Symposium on Experimental Algorithms (SEA 2020)}, pages = {24:1--24:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-148-1}, ISSN = {1868-8969}, year = {2020}, volume = {160}, editor = {Faro, Simone and Cantone, Domenico}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2020.24}, URN = {urn:nbn:de:0030-drops-120989}, doi = {10.4230/LIPIcs.SEA.2020.24}, annote = {Keywords: Graph trajectory, approximated similarity, top-k similarity query} }

Document

**Published in:** LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

A k-anticover of a string x is a set of pairwise distinct factors of x of equal length k, such that every symbol of x is contained into an occurrence of at least one of those factors. The existence of a k-anticover can be seen as a notion of non-redundancy, which has application in computational biology, where they are associated with various non-regulatory mechanisms. In this paper we address the complexity of the problem of finding a k-anticover of a string x if it exists, showing that the decision problem is NP-complete on general strings for k ≥ 3. We also show that the problem admits a polynomial-time solution for k=2. For unbounded k, we provide an exact exponential algorithm to find a k-anticover of a string of length n (or determine that none exists), which runs in O*(min {3^{(n-k)/3)}, ((k(k+1))/2)^{n/(k+1)) time using polynomial space.

Mai Alzamel, Alessio Conte, Shuhei Denzumi, Roberto Grossi, Costas S. Iliopoulos, Kazuhiro Kurita, and Kunihiro Wasa. Finding the Anticover of a String. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 2:1-2:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{alzamel_et_al:LIPIcs.CPM.2020.2, author = {Alzamel, Mai and Conte, Alessio and Denzumi, Shuhei and Grossi, Roberto and Iliopoulos, Costas S. and Kurita, Kazuhiro and Wasa, Kunihiro}, title = {{Finding the Anticover of a String}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {2:1--2:11}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.2}, URN = {urn:nbn:de:0030-drops-121270}, doi = {10.4230/LIPIcs.CPM.2020.2}, annote = {Keywords: Anticover, String algorithms, Stringology, NP-complete} }

Document

**Published in:** LIPIcs, Volume 138, 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019)

This paper investigates induced Steiner subgraphs as a variant of the classical Steiner trees, so as to compactly represent the (exponentially many) Steiner trees sharing the same underlying induced subgraph. We prove that the enumeration of all (inclusion-minimal) induced Steiner subgraphs is harder than the well-known Hypergraph Transversal enumeration problem if the number of terminals is not fixed. When the number of terminals is fixed, we propose a polynomial delay algorithm for listing all induced Steiner subgraphs of minimum size. We also propose a polynomial delay algorithm for listing the set of minimal induced Steiner subgraphs when the number of terminals is 3.

Alessio Conte, Roberto Grossi, Mamadou Moustapha Kanté, Andrea Marino, Takeaki Uno, and Kunihiro Wasa. Listing Induced Steiner Subgraphs as a Compact Way to Discover Steiner Trees in Graphs. In 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 138, pp. 73:1-73:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{conte_et_al:LIPIcs.MFCS.2019.73, author = {Conte, Alessio and Grossi, Roberto and Kant\'{e}, Mamadou Moustapha and Marino, Andrea and Uno, Takeaki and Wasa, Kunihiro}, title = {{Listing Induced Steiner Subgraphs as a Compact Way to Discover Steiner Trees in Graphs}}, booktitle = {44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019)}, pages = {73:1--73:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-117-7}, ISSN = {1868-8969}, year = {2019}, volume = {138}, editor = {Rossmanith, Peter and Heggernes, Pinar and Katoen, Joost-Pieter}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2019.73}, URN = {urn:nbn:de:0030-drops-110174}, doi = {10.4230/LIPIcs.MFCS.2019.73}, annote = {Keywords: Graph algorithms, enumeration, listing and counting, Steiner trees, induced subgraphs} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)

Exact string matching in labeled graphs is the problem of searching paths of a graph G=(V,E) such that the concatenation of their node labels is equal to the given pattern string P[1..m]. This basic problem can be found at the heart of more complex operations on variation graphs in computational biology, of query operations in graph databases, and of analysis operations in heterogeneous networks.
We prove a conditional lower bound stating that, for any constant epsilon>0, an O(|E|^{1 - epsilon} m)-time, or an O(|E| m^{1 - epsilon})-time algorithm for exact string matching in graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false. This holds even if restricted to undirected graphs with maximum node degree two, i.e. to zig-zag matching in bidirectional strings, or to deterministic directed acyclic graphs whose nodes have maximum sum of indegree and outdegree three. These restricted cases make the lower bound stricter than what can be directly derived from related bounds on regular expression matching (Backurs and Indyk, FOCS'16). In fact, our bounds are tight in the sense that lowering the degree or the alphabet size yields linear-time solvable problems.
An interesting corollary is that exact and approximate matching are equally hard (quadratic time) in graphs under SETH. In comparison, the same problems restricted to strings have linear-time vs quadratic-time solutions, respectively (approximate pattern matching having also a matching SETH lower bound (Backurs and Indyk, STOC'15)).

Massimo Equi, Roberto Grossi, Veli Mäkinen, and Alexandru I. Tomescu. On the Complexity of String Matching for Graphs. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 55:1-55:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{equi_et_al:LIPIcs.ICALP.2019.55, author = {Equi, Massimo and Grossi, Roberto and M\"{a}kinen, Veli and Tomescu, Alexandru I.}, title = {{On the Complexity of String Matching for Graphs}}, booktitle = {46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)}, pages = {55:1--55:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-109-2}, ISSN = {1868-8969}, year = {2019}, volume = {132}, editor = {Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.55}, URN = {urn:nbn:de:0030-drops-106314}, doi = {10.4230/LIPIcs.ICALP.2019.55}, annote = {Keywords: exact pattern matching, graph query, graph search, labeled graphs, string matching, string search, strong exponential time hypothesis, heterogeneous networks, variation graphs} }

Document

**Published in:** LIPIcs, Volume 117, 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)

We investigate a decomposition technique for listing problems in graphs and set systems. It is based on the Cartesian product of some iterators, which list the solutions of simpler problems. Our ideas applies to several problems, and we illustrate one of them in depth, namely, listing all minimum spanning trees of a weighted graph G. Here iterators over the spanning trees for unweighted graphs can be obtained by a suitable modification of the listing algorithm by [Shioura et al., SICOMP 1997], and the decomposition of G is obtained by suitably partitioning its edges according to their weights. By combining these iterators in a Cartesian product scheme that employs Gray coding, we give the first algorithm which lists all minimum spanning trees of G in constant delay, where the delay is the time elapsed between any two consecutive outputs. Our solution requires polynomial preprocessing time and uses polynomial space.

Alessio Conte, Roberto Grossi, Andrea Marino, Romeo Rizzi, and Luca Versari. Listing Subgraphs by Cartesian Decomposition. In 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 117, pp. 84:1-84:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{conte_et_al:LIPIcs.MFCS.2018.84, author = {Conte, Alessio and Grossi, Roberto and Marino, Andrea and Rizzi, Romeo and Versari, Luca}, title = {{Listing Subgraphs by Cartesian Decomposition}}, booktitle = {43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)}, pages = {84:1--84:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-086-6}, ISSN = {1868-8969}, year = {2018}, volume = {117}, editor = {Potapov, Igor and Spirakis, Paul and Worrell, James}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2018.84}, URN = {urn:nbn:de:0030-drops-96666}, doi = {10.4230/LIPIcs.MFCS.2018.84}, annote = {Keywords: Graph algorithms, listing, minimum spanning trees, constant delay} }

Document

**Published in:** LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

This paper proposes round-hashing, which is suitable for data storage on distributed servers and for implementing external-memory tables in which each lookup retrieves at most one single block of external memory, using a stash. For data storage, round-hashing is like consistent hashing as it avoids a full rehashing of the keys when new servers are added. Experiments show that the speed to serve requests is tenfold or more than the state of the art. In distributed data storage, this guarantees better throughput for serving requests and, moreover, greatly reduces decision times for which data should move to new servers as rescanning data is much faster.

Roberto Grossi and Luca Versari. Round-Hashing for Data Storage: Distributed Servers and External-Memory Tables. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 43:1-43:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{grossi_et_al:LIPIcs.ESA.2018.43, author = {Grossi, Roberto and Versari, Luca}, title = {{Round-Hashing for Data Storage: Distributed Servers and External-Memory Tables}}, booktitle = {26th Annual European Symposium on Algorithms (ESA 2018)}, pages = {43:1--43:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-081-1}, ISSN = {1868-8969}, year = {2018}, volume = {112}, editor = {Azar, Yossi and Bast, Hannah and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.43}, URN = {urn:nbn:de:0030-drops-95062}, doi = {10.4230/LIPIcs.ESA.2018.43}, annote = {Keywords: consistent hashing, external memory, hash tables} }

Document

**Published in:** LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)

A generalised degenerate string (GD string) S^ is a sequence of n sets of strings of total size N, where the ith set contains strings of the same length k_i but this length can vary between different sets. We denote the sum of these lengths k_0, k_1,...,k_{n-1} by W. This type of uncertain sequence can represent, for example, a gapless multiple sequence alignment of width W in a compact form. Our first result in this paper is an O(N+M)-time algorithm for deciding whether the intersection of two GD strings of total sizes N and M, respectively, over an integer alphabet, is non-empty. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in only linear space. A similar result can be obtained by employing an automata-based approach but its cost is alphabet-dependent. We then apply our string comparison algorithm to compute palindromes in GD strings. We present an O(min{W,n^2}N)-time algorithm for computing all palindromes in S^. Furthermore, we show a similar conditional lower bound for computing maximal palindromes in S^. Finally, proof-of-concept experimental results are presented using real protein datasets.

Mai Alzamel, Lorraine A. K. Ayad, Giulia Bernardini, Roberto Grossi, Costas S. Iliopoulos, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Degenerate String Comparison and Applications. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 21:1-21:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{alzamel_et_al:LIPIcs.WABI.2018.21, author = {Alzamel, Mai and Ayad, Lorraine A. K. and Bernardini, Giulia and Grossi, Roberto and Iliopoulos, Costas S. and Pisanti, Nadia and Pissis, Solon P. and Rosone, Giovanna}, title = {{Degenerate String Comparison and Applications}}, booktitle = {18th International Workshop on Algorithms in Bioinformatics (WABI 2018)}, pages = {21:1--21:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-082-8}, ISSN = {1868-8969}, year = {2018}, volume = {113}, editor = {Parida, Laxmi and Ukkonen, Esko}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.21}, URN = {urn:nbn:de:0030-drops-93236}, doi = {10.4230/LIPIcs.WABI.2018.21}, annote = {Keywords: degenerate strings, generalised degenerate strings, elastic-degenerate strings, string comparison, palindromes} }

Document

**Published in:** LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Pattern matching on a set of similar texts has received much attention, especially recently, mainly due to its application in cataloguing human genetic variation. In particular, many different algorithms have been proposed for the off-line version of this problem; that is, constructing a compressed index for a set of similar texts in order to answer pattern matching queries efficiently. However, the on-line, more fundamental, version of this problem is a rather undeveloped topic. Solutions to the on-line version can be beneficial for a number of reasons; for instance, efficient on-line solutions can be used in combination with partial indexes as practical trade-offs. We make here an attempt to close this gap via proposing two efficient algorithms for this problem. Notably, one of the algorithms requires time linear in the size of the texts' representation, for short patterns. Furthermore, experimental results confirm our theoretical findings in practical terms.

Roberto Grossi, Costas S. Iliopoulos, Chang Liu, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, Giovanna Rosone, Fatima Vayani, and Luca Versari. On-Line Pattern Matching on Similar Texts. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{grossi_et_al:LIPIcs.CPM.2017.9, author = {Grossi, Roberto and Iliopoulos, Costas S. and Liu, Chang and Pisanti, Nadia and Pissis, Solon P. and Retha, Ahmad and Rosone, Giovanna and Vayani, Fatima and Versari, Luca}, title = {{On-Line Pattern Matching on Similar Texts}}, booktitle = {28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)}, pages = {9:1--9:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-039-2}, ISSN = {1868-8969}, year = {2017}, volume = {78}, editor = {K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.9}, URN = {urn:nbn:de:0030-drops-73379}, doi = {10.4230/LIPIcs.CPM.2017.9}, annote = {Keywords: string algorithms, pattern matching, degenerate strings, elastic-degenerate strings, on-line algorithms} }

Document

**Published in:** LIPIcs, Volume 55, 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)

Due to the sheer size of real-world networks, delay and space become quite relevant measures for the cost of enumeration in network analytics. This paper presents efficient algorithms for listing maximum cliques in networks, providing the first sublinear-space bounds with guaranteed delay per enumerated clique, thus comparing favorably with the known literature.

Alessio Conte, Roberto Grossi, Andrea Marino, and Luca Versari. Sublinear-Space Bounded-Delay Enumeration for Massive Network Analytics: Maximal Cliques. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 55, pp. 148:1-148:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{conte_et_al:LIPIcs.ICALP.2016.148, author = {Conte, Alessio and Grossi, Roberto and Marino, Andrea and Versari, Luca}, title = {{Sublinear-Space Bounded-Delay Enumeration for Massive Network Analytics: Maximal Cliques}}, booktitle = {43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)}, pages = {148:1--148:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-013-2}, ISSN = {1868-8969}, year = {2016}, volume = {55}, editor = {Chatzigiannakis, Ioannis and Mitzenmacher, Michael and Rabani, Yuval and Sangiorgi, Davide}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2016.148}, URN = {urn:nbn:de:0030-drops-62927}, doi = {10.4230/LIPIcs.ICALP.2016.148}, annote = {Keywords: Enumeration algorithms, maximal cliques, network mining and analytics, reverse search, space efficiency} }

Document

Complete Volume

**Published in:** LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

LIPIcs, Volume 54, CPM'16, Complete Volume

27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@Proceedings{grossi_et_al:LIPIcs.CPM.2016, title = {{LIPIcs, Volume 54, CPM'16, Complete Volume}}, booktitle = {27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-012-5}, ISSN = {1868-8969}, year = {2016}, volume = {54}, editor = {Grossi, Roberto and Lewenstein, Moshe}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016}, URN = {urn:nbn:de:0030-drops-60935}, doi = {10.4230/LIPIcs.CPM.2016}, annote = {Keywords: Data Structures, Data Storage Representations, Coding and Information Theory, Theory of Computation Discrete Mathematics, Information Systems} }

Document

Front Matter

**Published in:** LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Front Matter, Table of Contents, Preface, List of Authors

27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 0:i-0:x, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{grossi_et_al:LIPIcs.CPM.2016.0, author = {Grossi, Roberto and Lewenstein, Moshe}, title = {{Front Matter, Table of Contents, Preface}}, booktitle = {27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)}, pages = {0:i--0:x}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-012-5}, ISSN = {1868-8969}, year = {2016}, volume = {54}, editor = {Grossi, Roberto and Lewenstein, Moshe}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.0}, URN = {urn:nbn:de:0030-drops-60916}, doi = {10.4230/LIPIcs.CPM.2016.0}, annote = {Keywords: Front Matter, Table of Contents, Preface, List of Authors} }

Document

**Published in:** LIPIcs, Volume 29, 34th International Conference on Foundation of Software Technology and Theoretical Computer Science (FSTTCS 2014)

Genomic Analysis, Plagiarism Detection, Data Mining, Intrusion Detection, Spam Fighting and Time Series Analysis are just some examples of applications where extraction of recurring patterns in sequences of objects is one of the main computational challenges. Several notions of patterns exist, and many share the common idea of strictly specifying some parts of the pattern and to don't care about the remaining parts. Since the number of patterns can be exponential in the length of the sequences, pattern extraction focuses on statistically relevant patterns, where any attempt to further refine or extend them causes a loss of significant information (where the number of occurrences changes). Output-sensitive algorithms have been proposed to enumerate and list these patterns, taking polynomial time O(n^c) per pattern for constant c > 1, which is impractical for massive sequences of very large length n.
We address the problem of extracting maximal patterns with at most k don't care symbols and at least q occurrences. Our contribution is to give the first algorithm that attains a stronger notion of output-sensitivity, borrowed from the analysis of data structures: the cost is proportional to the actual number of occurrences of each pattern, which is at most n and practically much smaller than n in real applications, thus avoiding the aforementioned cost of O(n^c) per pattern.

Roberto Grossi, Giulia Menconi, Nadia Pisanti, Roberto Trani, and Soren Vind. Output-Sensitive Pattern Extraction in Sequences. In 34th International Conference on Foundation of Software Technology and Theoretical Computer Science (FSTTCS 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 29, pp. 303-314, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@InProceedings{grossi_et_al:LIPIcs.FSTTCS.2014.303, author = {Grossi, Roberto and Menconi, Giulia and Pisanti, Nadia and Trani, Roberto and Vind, Soren}, title = {{Output-Sensitive Pattern Extraction in Sequences}}, booktitle = {34th International Conference on Foundation of Software Technology and Theoretical Computer Science (FSTTCS 2014)}, pages = {303--314}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-77-4}, ISSN = {1868-8969}, year = {2014}, volume = {29}, editor = {Raman, Venkatesh and Suresh, S. P.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2014.303}, URN = {urn:nbn:de:0030-drops-48513}, doi = {10.4230/LIPIcs.FSTTCS.2014.303}, annote = {Keywords: Pattern Extraction, Motif Detection, Pattern Discovery, Motif Trie} }

Document

**Published in:** LIPIcs, Volume 13, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2011)

In the packed string matching problem, each machine word accomodates
alpha characters, thus an n-character text occupies n/alpha memory
words. We extend the Crochemore-Perrin constant-space O(n)-time
string matching algorithm to run in optimal O(n/alpha) time and even
in real-time, achieving a factor alpha speedup over traditional
algorithms that examine each character individually. Our solution can
be efficiently implemented, unlike prior theoretical packed string
matching work. We adapt the standard RAM model and only use its AC0
instructions (i.e. no multiplication) plus two specialized AC0 packed
string instructions. The main string-matching instruction is
available in commodity processors (i.e. Intel's SSE4.2 and AVX
Advanced String Operations); the other maximal-suffix instruction is
only required during pattern preprocessing. In the absence of these
two specialized instructions, we propose theoretically-efficient
emulation using integer multiplication (not AC0) and table lookup.

Oren Ben-Kiki, Philip Bille, Dany Breslauer, Leszek Gasieniec, Roberto Grossi, and Oren Weimann. Optimal Packed String Matching. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2011). Leibniz International Proceedings in Informatics (LIPIcs), Volume 13, pp. 423-432, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2011)

Copy BibTex To Clipboard

@InProceedings{benkiki_et_al:LIPIcs.FSTTCS.2011.423, author = {Ben-Kiki, Oren and Bille, Philip and Breslauer, Dany and Gasieniec, Leszek and Grossi, Roberto and Weimann, Oren}, title = {{Optimal Packed String Matching}}, booktitle = {IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2011)}, pages = {423--432}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-34-7}, ISSN = {1868-8969}, year = {2011}, volume = {13}, editor = {Chakraborty, Supratik and Kumar, Amit}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2011.423}, URN = {urn:nbn:de:0030-drops-33558}, doi = {10.4230/LIPIcs.FSTTCS.2011.423}, annote = {Keywords: String matching, bit parallelism, real time, space efficiency} }

Document

**Published in:** LIPIcs, Volume 3, 26th International Symposium on Theoretical Aspects of Computer Science (2009)

Given string $S[1..N]$ and integer $k$, the {\em suffix selection} problem is to determine the $k$th lexicographically smallest amongst the suffixes $S[i\ldots N]$, $1 \leq i \leq N$. We study the suffix selection problem in the cache-aware model that captures two-level memory inherent in computing systems, for a \emph{cache} of limited size $M$ and block size $B$. The complexity of interest is the number of block transfers. We present an optimal suffix selection algorithm in the cache-aware model, requiring $\Theta\left(N/B\right)$ block transfers, for any string $S$ over an unbounded alphabet (where characters can only be compared), under the common tall-cache assumption (i.e. $M=\Omega\left(B^{1+\epsilon}\right)$, where $\epsilon<1$). Our algorithm beats the bottleneck bound for permuting an input array to the desired output array, which holds for nearly any nontrivial problem in hierarchical memory models.

Gianni Franceschini, Roberto Grossi, and S. Muthukrishnan. Optimal Cache-Aware Suffix Selection. In 26th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 3, pp. 457-468, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

Copy BibTex To Clipboard

@InProceedings{franceschini_et_al:LIPIcs.STACS.2009.1845, author = {Franceschini, Gianni and Grossi, Roberto and Muthukrishnan, S.}, title = {{Optimal Cache-Aware Suffix Selection}}, booktitle = {26th International Symposium on Theoretical Aspects of Computer Science}, pages = {457--468}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-09-5}, ISSN = {1868-8969}, year = {2009}, volume = {3}, editor = {Albers, Susanne and Marion, Jean-Yves}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2009.1845}, URN = {urn:nbn:de:0030-drops-18452}, doi = {10.4230/LIPIcs.STACS.2009.1845}, annote = {Keywords: } }

Document

**Published in:** LIPIcs, Volume 3, 26th International Symposium on Theoretical Aspects of Computer Science (2009)

We consider the problem of representing, in a compressed format, a bit-vector~$S$ of $m$ bits with $n$ $\mathbf{1}$s, supporting the following operations, where $b \in \{ \mathbf{0}, \mathbf{1} \}$:
\begin{itemize}
\item $\mathtt{rank}_b(S,i)$ returns the number of occurrences of bit $b$ in the prefix $S\left[1..i\right]$;
\item $\mathtt{select}_b(S,i)$ returns the position of the $i$th occurrence of bit $b$ in $S$.
\end{itemize}
Such a data structure is called \emph{fully indexable dictionary (\textsc{fid})} [Raman, Raman, and Rao, 2007], and is at least as powerful as predecessor data structures. Viewing $S$ as a set $X = \{ x_1, x_2, \ldots, x_n \}$ of $n$ distinct integers drawn from a universe $[m] = \{1, \ldots, m\}$, the predecessor of integer $y \in [m]$ in $X$ is given by $\ensuremath{\mathtt{select}^{}_1}(S, \ensuremath{\mathtt{rank}_1}(S,y-1))$. {\textsc{fid}}s have many applications in succinct and compressed data structures, as they are often involved in the construction of succinct representation for a variety of abstract data types.
Our focus is on space-efficient {\textsc{fid}}s on the \textsc{ram} model with word size $\Theta(\lg m)$ and constant time for all operations, so that the time cost is independent of the input size.
Given the bitstring $S$ to be encoded, having length $m$ and containing $n$ ones, the minimal amount of information that needs to be stored is $B(n,m) = \lceil \log {{m}\choose{n}} \rceil$. The state of the art in building a \textsc{fid}\ for~$S$ is given in~\mbox{}[P\v{a}tra\c{s}cu, 2008] using $B(m,n)+O( m / ( (\log m/ t) ^t) ) + O(m^{3/4}) $ bits, to support the operations in $O(t)$ time.
Here, we propose a parametric data structure exhibiting a time/space trade-off such that, for any real constants $0 < \delta \leq 1/2$, $0 < \varepsilon \leq 1$, and integer $s > 0$, it uses
\[ B(n,m) + O\left(n^{1+\delta} + n \left(\frac{m}{n^s}\right)^\varepsilon\right) \]
bits and performs all the operations in time $O(s\delta^{-1} + \varepsilon^{-1})$. The improvement is twofold: our redundancy can be lowered parametrically and, fixing $s = O(1)$, we get a constant-time \textsc{fid}\ whose space is $B(n,m) + O(m^\varepsilon/\mathrm{poly}(n))$ bits, for sufficiently large $m$. This is a significant improvement compared to the previous bounds for the general case.

Roberto Grossi, Alessio Orlandi, Rajeev Raman, and S. Srinivasa Rao. More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries. In 26th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 3, pp. 517-528, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

Copy BibTex To Clipboard

@InProceedings{grossi_et_al:LIPIcs.STACS.2009.1847, author = {Grossi, Roberto and Orlandi, Alessio and Raman, Rajeev and Rao, S. Srinivasa}, title = {{More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries}}, booktitle = {26th International Symposium on Theoretical Aspects of Computer Science}, pages = {517--528}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-09-5}, ISSN = {1868-8969}, year = {2009}, volume = {3}, editor = {Albers, Susanne and Marion, Jean-Yves}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2009.1847}, URN = {urn:nbn:de:0030-drops-18470}, doi = {10.4230/LIPIcs.STACS.2009.1847}, annote = {Keywords: } }