DROPS

Document

DOI: 10.4230/LIPIcs.FUN.2024.13

The Great Textual Hoax: Boosting Sampled String Matching with Fake Samples

Authors: Simone Faro, Francesco Pio Marino, Andrea Moschetto, Arianna Pavone, and Antonio Scardace

Published in: LIPIcs, Volume 291, 12th International Conference on Fun with Algorithms (FUN 2024)

Abstract

Sampled String Matching is presented as an efficient solution to the string matching problem, aiming to tackle the space constraints of indexed string matching while purportedly reducing search times for online solutions. Despite the problem’s inception dating back to 1991, practical solutions have only recently emerged. These purportedly accelerate online searches by up to 35 times compared to conventional methods, achieved through a partial index occupying a mere 5% of the text size. This paper delves into the intricacies of one of the latest and most effective text sampling techniques, character distance sampling, which revolves around sampling distances between characters of a selected alphabet within the text. Specifically, we introduce fake samples while remaining honest! In other words, the study reveals that, interestingly, strategically introducing fake samples within the sampled sequence slashes the required index space by almost half, all without compromising the algorithm’s correctness. Additionally, since efficiency is everything, this approach, in turn, purportedly enhances the algorithm’s efficiency under specific conditions.
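
As a rough illustration of the idea behind character distance sampling, the sketch below records the distances between consecutive occurrences of a single pivot character. It is a simplified assumption for illustration only: the function name `char_distance_sample` and the single-character pivot are hypothetical, and the sketch covers neither the paper's index layout nor its fake samples.

```python
def char_distance_sample(text: str, pivot: str) -> list[int]:
    """Return the distances between consecutive occurrences of `pivot` in `text`.

    Hypothetical helper: a single pivot character stands in for the
    "selected alphabet" mentioned in the abstract.
    """
    # Positions of every occurrence of the pivot character.
    positions = [i for i, ch in enumerate(text) if ch == pivot]
    # Distance between each occurrence and the next one.
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    # 'a' occurs at positions 0, 3, 5, 7, 10 in "abracadabra".
    print(char_distance_sample("abracadabra", "a"))  # [3, 2, 2, 3]
```

A text with fewer than two pivot occurrences yields an empty sample, so a practical index would presumably need a pivot that is frequent enough to be useful.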

Cite as

Simone Faro, Francesco Pio Marino, Andrea Moschetto, Arianna Pavone, and Antonio Scardace. The Great Textual Hoax: Boosting Sampled String Matching with Fake Samples. In 12th International Conference on Fun with Algorithms (FUN 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 291, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{faro_et_al:LIPIcs.FUN.2024.13,
  author =	{Faro, Simone and Marino, Francesco Pio and Moschetto, Andrea and Pavone, Arianna and Scardace, Antonio},
  title =	{{The Great Textual Hoax: Boosting Sampled String Matching with Fake Samples}},
  booktitle =	{12th International Conference on Fun with Algorithms (FUN 2024)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-314-0},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{291},
  editor =	{Broder, Andrei Z. and Tamir, Tami},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FUN.2024.13},
  URN =		{urn:nbn:de:0030-drops-199211},
  doi =		{10.4230/LIPIcs.FUN.2024.13},
  annote =	{Keywords: string matching, sampling}
}

Document

DOI: 10.4230/LIPIcs.WABI.2020.19

Sequence Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations

Authors: Domenico Cantone, Simone Faro, and Arianna Pavone

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)

Abstract

Unbalanced translocations are among the most frequent chromosomal alterations, accounting for 30% of all losses of heterozygosity, a major genetic event causing inactivation of tumor suppressor genes. Despite their central role in genomic sequence analysis, little attention has been devoted to the problem of matching sequences allowing for this kind of chromosomal alteration. In this paper we investigate the approximate string matching problem when the edit operations are non-overlapping unbalanced translocations of adjacent factors. In particular, we first present an 𝒪(nm³)-time and 𝒪(m²)-space algorithm based on the dynamic-programming approach. Then we improve our first result by designing a second solution which makes use of the Directed Acyclic Word Graph of the pattern. In particular, we show that under the assumptions of equiprobability and independence of characters, our algorithm has an 𝒪(n log²_σ m) average time complexity, for an alphabet of size σ, still maintaining the 𝒪(nm³)-time and the 𝒪(m²)-space complexity in the worst case. To the best of our knowledge this is the first solution in the literature for the approximate string matching problem allowing for unbalanced translocations of factors.
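
To make the matching relation concrete, the following brute-force sketch checks whether one string can be obtained from another of the same length by swapping non-overlapping pairs of adjacent factors, so that a factor uv becomes vu. It is only an illustration under stated assumptions: the function name is hypothetical, the factors u and v may have any positive lengths (equal or not), and this is neither the dynamic-programming algorithm nor the DAWG-based algorithm described in the abstract.

```python
from functools import lru_cache


def matches_with_translocations(x: str, y: str) -> bool:
    """True if y equals x after non-overlapping swaps of adjacent factors.

    Hypothetical brute-force checker for illustration; each swap turns a
    factor u v of x into v u, with len(u) >= 1 and len(v) >= 1.
    """
    if len(x) != len(y):
        return False
    n = len(x)

    @lru_cache(maxsize=None)
    def ok(i: int) -> bool:
        # The suffixes starting at position i still have to be matched.
        if i == n:
            return True
        # Option 1: the current characters agree, so move one step forward.
        if x[i] == y[i] and ok(i + 1):
            return True
        # Option 2: a translocation of u = x[i:i+a] and v = x[i+a:i+total].
        for total in range(2, n - i + 1):
            for a in range(1, total):
                b = total - a
                if (y[i:i + b] == x[i + a:i + total]
                        and y[i + b:i + total] == x[i:i + a]
                        and ok(i + total)):
                    return True
        return False

    return ok(0)


if __name__ == "__main__":
    print(matches_with_translocations("abcde", "cabde"))  # u="ab", v="c"
    print(matches_with_translocations("abc", "cba"))      # not reachable
```

Searching a text with this checker would mean testing every window of the pattern's length, which is far slower than the bounds quoted above and is meant only to clarify the definition.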

Cite as

Domenico Cantone, Simone Faro, and Arianna Pavone. Sequence Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 19:1-19:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{cantone_et_al:LIPIcs.WABI.2020.19,
  author =	{Cantone, Domenico and Faro, Simone and Pavone, Arianna},
  title =	{{Sequence Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{19:1--19:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.19},
  URN =		{urn:nbn:de:0030-drops-128086},
  doi =		{10.4230/LIPIcs.WABI.2020.19},
  annote =	{Keywords: Text processing, approximate matching, inversions, sequence matching}
}

Search Results

Documents authored by Pavone, Arianna
