4 Search Results for "Giancarlo, Raffaele"


Document
Invited Talk
Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc. (Invited Talk)

Authors: Sharma V. Thankachan

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
In the past two decades, we have witnessed the design of various compact data structures for pattern matching over an indexed text [Navarro, 2016]. Popular indexes like the FM-index [Paolo Ferragina and Giovanni Manzini, 2005], compressed suffix arrays/trees [Roberto Grossi and Jeffrey Scott Vitter, 2005; Kunihiko Sadakane, 2007], the recent r-index [Travis Gagie et al., 2020; Takaaki Nishimoto and Yasuo Tabei, 2021], etc., capture the key functionalities of classic suffix arrays/trees [Udi Manber and Eugene W. Myers, 1993; Peter Weiner, 1973] in compact space. Mostly, they rely on the Burrows-Wheeler Transform (BWT) and its associated operations [Burrows and Wheeler, 1994]. However, compactly encoding some advanced suffix tree (ST) variants, like parameterized ST [Brenda S. Baker, 1993; S. Rao Kosaraju, 1995; Juan Mendivelso et al., 2020], order-isomorphic/preserving ST [Maxime Crochemore et al., 2016], two-dimensional ST [Raffaele Giancarlo, 1995; Dong Kyue Kim et al., 1998], etc. [Sung Gwan Park et al., 2019; Tetsuo Shibuya, 2000], collectively known as suffix trees with missing suffix links [Richard Cole and Ramesh Hariharan, 2003], has been challenging. The previous techniques are not easily extendable because these variants lack some of the structural properties of the standard ST that enable compression. However, some limited progress has been made in these directions recently [Arnab Ganguly et al., 2017; Travis Gagie et al., 2017; Gianni Decaroli et al., 2017; Dhrumil Patel and Rahul Shah, 2021; Arnab Ganguly et al., 2021; Sung-Hwan Kim and Hwan-Gue Cho, 2021; Sung-Hwan Kim and Hwan-Gue Cho, 2021; Arnab Ganguly et al., 2017; Arnab Ganguly et al., 2022; Arnab Ganguly et al., 2021]. This talk will briefly survey these results and highlight some interesting open problems.

Cite as

Sharma V. Thankachan. Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc. (Invited Talk). In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 3:1-3:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{thankachan:LIPIcs.CPM.2022.3,
  author =	{Thankachan, Sharma V.},
  title =	{{Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc.}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{3:1--3:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.3},
  URN =		{urn:nbn:de:0030-drops-161300},
  doi =		{10.4230/LIPIcs.CPM.2022.3},
  annote =	{Keywords: Text Indexing, Suffix Trees, String Matching}
}
Document
Inverse Suffix Array Queries for 2-Dimensional Pattern Matching in Near-Compact Space

Authors: Dhrumil Patel and Rahul Shah

Published in: LIPIcs, Volume 212, 32nd International Symposium on Algorithms and Computation (ISAAC 2021)


Abstract
In a 2-dimensional (2D) pattern matching problem, the text is arranged as a matrix 𝖬[1..n, 1..n] and consists of N = n × n symbols drawn from an alphabet set Σ of size σ. The query consists of an m × m square matrix 𝖯[1..m, 1..m] drawn from the same alphabet set Σ, and the task is to find all the locations in 𝖬 where 𝖯 appears as a (contiguous) submatrix. The patterns can be of any size, but as long as they are square in shape, data structures like suffix trees and suffix arrays exist [Raffaele Giancarlo, 1995; Dong Kyue Kim et al., 1998] for the task of efficient pattern matching. These are essentially 2D counterparts of the classic suffix trees and arrays known for traditional 1-dimensional (1D) pattern matching. They work based on a linearization of 2D suffixes that preserves the prefix match property (i.e., every pattern match is a prefix of some suffix). The main limitation of suffix trees and suffix arrays (in 1D) was their space utilization of O(N log N) bits, where N is the size of the text. This was suboptimal compared to N log σ bits of space, which is information-theoretically optimal for the text. With the advent of the field of succinct/compressed data structures, it became possible to develop compressed variants of suffix trees and arrays based on the Burrows-Wheeler Transform and LF-mapping (or the Φ function) [Roberto Grossi and Jeffrey Scott Vitter, 2005; Paolo Ferragina and Giovanni Manzini, 2005; Kunihiko Sadakane, 2007]. These data structures indeed achieve O(N log σ) bits of space or better. This gives rise to the question: analogous to the 1D case, can we design a succinct or compressed index for 2D pattern matching? Can there be a 2D compressed suffix tree? Are there analogues of the Burrows-Wheeler Transform or LF-mapping? The problem has been acknowledged for over a decade now, and there have been a few attempts at applying the Φ function [Ankur Gupta, 2004] and achieving entropy-based compression [Veli Mäkinen and Gonzalo Navarro, 2008]. However, a complexity breakthrough akin to the 1D case has yet to be achieved. We still do not know how to answer suffix array queries in O(N log σ) bits of space, which would lead to efficient pattern matching. However, for the first time, we show an interesting result: it is indeed possible to answer inverse suffix array (ISA) queries in near-compact space in O(polylog n) time. Our 2D succinct text index design is based on two 1D compressed suffix trees and takes O(N log log N + N log σ) bits of space, which is much smaller than the naive design that takes O(N log N) bits. Although the main problem is still elusive, this index gives hope for the existence of a full 2D succinct index with all functionalities similar to those of the 1D case.
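As a rough illustration of the problem statement in this abstract, the following sketch is a brute-force baseline, not the paper's index. It finds all top-left corners where a square pattern occurs in a square text, and it shows what suffix array and inverse suffix array queries return in the plain 1D case. The function names and the representation of a matrix as a list of equal-length strings are assumptions made for this example; indices are 0-based, unlike the 1-based notation of the abstract.

```python
from typing import List, Tuple


def find_square_pattern(text: List[str], pattern: List[str]) -> List[Tuple[int, int]]:
    """Brute-force 2D matching: 0-based (row, col) corners where pattern occurs.

    text is an n x n matrix and pattern an m x m matrix, each given as a
    list of equal-length strings (an assumption of this sketch).
    Runs in O(N * m^2) time, with no index at all.
    """
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            # Compare the m rows of the candidate submatrix with the pattern.
            if all(text[i + r][j:j + m] == pattern[r] for r in range(m)):
                hits.append((i, j))
    return hits


def suffix_array(s: str) -> List[int]:
    """Naive 1D suffix array: starting positions of suffixes in sorted order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def inverse_suffix_array(sa: List[int]) -> List[int]:
    """ISA[p] is the lexicographic rank of the suffix starting at position p."""
    isa = [0] * len(sa)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa


if __name__ == "__main__":
    M = ["abab", "baba", "abab", "baba"]
    P = ["ab", "ba"]
    print(find_square_pattern(M, P))
    sa = suffix_array("banana")
    print(sa, inverse_suffix_array(sa))
```

Storing SA and ISA explicitly, as done here, is what costs O(N log N) bits; the result described in the abstract answers ISA queries without materializing such arrays.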

Cite as

Dhrumil Patel and Rahul Shah. Inverse Suffix Array Queries for 2-Dimensional Pattern Matching in Near-Compact Space. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 60:1-60:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{patel_et_al:LIPIcs.ISAAC.2021.60,
  author =	{Patel, Dhrumil and Shah, Rahul},
  title =	{{Inverse Suffix Array Queries for 2-Dimensional Pattern Matching in Near-Compact Space}},
  booktitle =	{32nd International Symposium on Algorithms and Computation (ISAAC 2021)},
  pages =	{60:1--60:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-214-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{212},
  editor =	{Ahn, Hee-Kap and Sadakane, Kunihiko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2021.60},
  URN =		{urn:nbn:de:0030-drops-154932},
  doi =		{10.4230/LIPIcs.ISAAC.2021.60},
  annote =	{Keywords: Pattern Matching, Succinct Data Structures}
}
Document
A New Class of Searchable and Provably Highly Compressible String Transformations

Authors: Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, and Marinella Sciortino

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
The Burrows-Wheeler Transform is a string transformation that plays a fundamental role in the design of self-indexing compressed data structures. Over the years, researchers have successfully extended this transformation beyond the domain of strings. However, efforts to find non-trivial alternatives to the original, now 25 years old, Burrows-Wheeler string transformation have met with limited success. In this paper we bring new life to this area by introducing a whole new family of transformations that have all the "myriad virtues" of the BWT: they can be computed and inverted in linear time, they produce provably highly compressible strings, and they support linear time pattern search directly on the transformed string. This new family is a special case of a more general class of transformations based on context adaptive alphabet orderings, a concept introduced here. This more general class also includes the Alternating BWT, another invertible string transform recently introduced in connection with a generalization of Lyndon words.
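For readers unfamiliar with the original transformation that this abstract generalizes, here is a minimal sketch of the classic BWT and its inversion through the LF-mapping. It is a simple baseline under stated assumptions, not the new family of transformations from the paper and not a linear-time construction: it sorts all rotations explicitly, and it assumes a sentinel character `$` that is smaller than every character of the input. The function names are chosen for this example only.

```python
def bwt(s: str, sentinel: str = "$") -> str:
    """Classic BWT via sorted rotations of s + sentinel (quadratic-space sketch)."""
    if any(c <= sentinel for c in s):
        raise ValueError("sentinel must be smaller than every input character")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(r[-1] for r in rotations)


def inverse_bwt(b: str, sentinel: str = "$") -> str:
    """Invert the BWT by walking the LF-mapping backwards through the text."""
    n = len(b)
    # A stable sort of the last column yields the first column; position k of
    # the first column corresponds to position order[k] of the last column.
    order = sorted(range(n), key=lambda i: b[i])
    lf = [0] * n
    for f_pos, l_pos in enumerate(order):
        lf[l_pos] = f_pos
    out = []
    row = 0  # row 0 is the rotation that starts with the sentinel
    for _ in range(n - 1):
        out.append(b[row])  # character preceding the current rotation
        row = lf[row]
    return "".join(reversed(out))


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)
    print(inverse_bwt(transformed))
```

The inversion relies only on the fact that equal characters keep their relative order between the first and last columns, which is the property behind LF-mapping based search as well.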

Cite as

Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, and Marinella Sciortino. A New Class of Searchable and Provably Highly Compressible String Transformations. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{giancarlo_et_al:LIPIcs.CPM.2019.12,
  author =	{Giancarlo, Raffaele and Manzini, Giovanni and Rosone, Giovanna and Sciortino, Marinella},
  title =	{{A New Class of Searchable and Provably Highly Compressible String Transformations}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{12:1--12:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and Pissis, Solon P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.12},
  URN =		{urn:nbn:de:0030-drops-104833},
  doi =		{10.4230/LIPIcs.CPM.2019.12},
  annote =	{Keywords: Data Indexing and Compression, Burrows-Wheeler Transformation, Combinatorics on Words}
}
Document
Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures

Authors: Raffaele Giancarlo, Davide Corona, Valeria Di Benedetto, Alessandra Gabriele, and Filippo Utro

Published in: Dagstuhl Seminar Proceedings, Volume 10231, Structure Discovery in Biology: Motifs, Networks & Phylogenies (2010)


Abstract
In the quest for a mathematical measure able to capture and shed light on the dual notions of information and complexity in biosequences, Hazen et al. have introduced the notion of Functional Information (FI for short). It is also the result of earlier considerations and findings by Szostak and Carothers et al. Based on the experiments by Carothers et al. regarding FI in RNA binding activities, we decided to study the relation between FI and classic measures of complexity applied to protein-DNA interactions on a genome-wide scale. Using classic complexity measures, i.e., Shannon entropy and Kolmogorov Complexity, both as estimated by data compression, we found that FI applied to protein-DNA interactions is genuinely different from them. This fact, together with the non-triviality of the biological function considered, contributes to the establishment of FI as a novel and useful measure of biocomplexity. Remarkably, we also found a relationship, on a genome-wide scale, between the redundancy of a genomic region and its ability to interact with a protein. This latter finding further justifies some principles for the design of motif discovery algorithms. Finally, our experiments bring to light methodological limitations of Linguistic Complexity measures, i.e., a class of measures that is a function of the vocabulary richness of a sequence. Indeed, due to the technology and associated statistical preprocessing procedures used to conduct our studies, i.e., genome-wide ChIP-chip experiments, that class of measures cannot give any statistically significant indication about complexity and function. This is a serious limitation, given the widespread use of the technology.

References
J.M. Carothers, S.C. Oestreich, J.H. Davis, and J.W. Szostak. Informational complexity and functional activity of RNA structures. J. Am. Chem. Soc., 126 (2004), pp. 5130-5137.
R.M. Hazen, P.L. Griffin, J.M. Carothers, and J.W. Szostak. Functional Information and the emergence of biocomplexity. Proc. of Nat. Acad. Sci., 104 (2007), pp. 8574-8581.
J.W. Szostak. Functional Information: molecular messages. Nature, 423 (2003).
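The classic complexity measures named in this abstract can be approximated on a sequence in a few lines. The sketch below is only an illustration under stated assumptions: it computes the empirical order-0 Shannon entropy per symbol, and it uses the zlib compression ratio as a crude stand-in for a compression-based complexity estimate. The choice of zlib, the function names, and the ASCII encoding of sequences are assumptions of this example; the source does not say which compressors or estimators the authors used.

```python
import math
import zlib
from collections import Counter


def empirical_entropy(seq: str) -> float:
    """Order-0 empirical Shannon entropy of seq, in bits per symbol."""
    if not seq:
        return 0.0
    n = len(seq)
    # Sum of p * log2(1 / p) over the observed symbol frequencies p = c / n.
    return sum((c / n) * math.log2(n / c) for c in Counter(seq).values())


def compression_ratio(seq: str, level: int = 9) -> float:
    """Compressed size divided by raw size, using zlib (assumed compressor)."""
    raw = seq.encode("ascii")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, level)) / len(raw)


if __name__ == "__main__":
    repetitive = "ACGT" * 250
    print(empirical_entropy(repetitive))   # symbol frequencies are uniform
    print(compression_ratio(repetitive))   # yet the sequence is highly redundant
```

The repetitive example shows why the two estimates can disagree: order-0 entropy sees only symbol frequencies, while a compressor also exploits repeated substrings.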

Cite as

Raffaele Giancarlo, Davide Corona, Valeria Di Benedetto, Alessandra Gabriele, and Filippo Utro. Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)



@InProceedings{giancarlo_et_al:DagSemProc.10231.6,
  author =	{Giancarlo, Raffaele and Corona, Davide and Di Benedetto, Valeria and Gabriele, Alessandra and Utro, Filippo},
  title =	{{Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--13},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Apostolico, Alberto and Dress, Andreas and Parida, Laxmi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.6},
  URN =		{urn:nbn:de:0030-drops-26884},
  doi =		{10.4230/DagSemProc.10231.6},
  annote =	{Keywords: Functional activity, sequence complexity, combinatorics on words, protein-DNA interaction.}
}