DROPS

Document

DOI: 10.4230/LIPIcs.CPM.2021.22

A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs

Authors: Sangsoo Park, Sung Gwan Park, Bastien Cazaux, Kunsoo Park, and Eric Rivals

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Abstract

The hierarchical overlap graph (HOG) is a graph that encodes overlaps from a given set P of n strings, as the overlap graph does. A best known algorithm constructs HOG in O(||P|| log n) time and O(||P||) space, where ||P|| is the sum of lengths of the strings in P. In this paper we present a new algorithm to construct HOG in O(||P||) time and space. Hence, the construction time and space of HOG are better than those of the overlap graph, which are O(||P|| + n²).

Cite as

Sangsoo Park, Sung Gwan Park, Bastien Cazaux, Kunsoo Park, and Eric Rivals. A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 22:1-22:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{park_et_al:LIPIcs.CPM.2021.22,
  author =	{Park, Sangsoo and Park, Sung Gwan and Cazaux, Bastien and Park, Kunsoo and Rivals, Eric},
  title =	{{A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{22:1--22:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.22},
  URN =		{urn:nbn:de:0030-drops-139736},
  doi =		{10.4230/LIPIcs.CPM.2021.22},
  annote =	{Keywords: overlap graph, hierarchical overlap graph, shortest superstring problem, border array}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.16

Cartesian Tree Matching and Indexing

Authors: Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

We introduce a new metric of match, called Cartesian tree matching, which means that two strings match if they have the same Cartesian trees. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called Cartesian suffix tree, and present an O(n) randomized time algorithm to build the Cartesian suffix tree. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree, called the parent-distance representation.

Cite as

Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park. Cartesian Tree Matching and Indexing. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 16:1-16:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{park_et_al:LIPIcs.CPM.2019.16,
  author =	{Park, Sung Gwan and Amir, Amihood and Landau, Gad M. and Park, Kunsoo},
  title =	{{Cartesian Tree Matching and Indexing}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{16:1--16:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.16},
  URN =		{urn:nbn:de:0030-drops-104879},
  doi =		{10.4230/LIPIcs.CPM.2019.16},
  annote =	{Keywords: Cartesian tree matching, Pattern matching, Indexing, Parent-distance representation}
}

Document

DOI: 10.4230/LIPIcs.SEA.2017.14

Algorithm Engineering for All-Pairs Suffix-Prefix Matching

Authors: Jihyuk Lim and Kunsoo Park

Published in: LIPIcs, Volume 75, 16th International Symposium on Experimental Algorithms (SEA 2017)

Abstract

All-pairs suffix-prefix matching is an important part of DNA sequence assembly where it is the most time-consuming part of the whole assembly. Although there are algorithms for all-pairs suffix-prefix matching which are optimal in the asymptotic time complexity, they are slower than SOF and Readjoiner which are state-of-the-art algorithms used in practice. In this paper we present an algorithm for all-pairs suffix-prefix matching that uses a simple data structure for storing input strings and advanced algorithmic techniques for matching, which together lead to fast running time in practice. Our algorithm is 14 times faster than SOF and 18 times faster than Readjoiner on average in real datasets and random datasets.

Cite as

Jihyuk Lim and Kunsoo Park. Algorithm Engineering for All-Pairs Suffix-Prefix Matching. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 14:1-14:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{lim_et_al:LIPIcs.SEA.2017.14,
  author =	{Lim, Jihyuk and Park, Kunsoo},
  title =	{{Algorithm Engineering for All-Pairs Suffix-Prefix Matching}},
  booktitle =	{16th International Symposium on Experimental Algorithms (SEA 2017)},
  pages =	{14:1--14:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-036-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{75},
  editor =	{Iliopoulos, Costas S. and Pissis, Solon P. and Puglisi, Simon J. and Raman, Rajeev},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2017.14},
  URN =		{urn:nbn:de:0030-drops-76174},
  doi =		{10.4230/LIPIcs.SEA.2017.14},
  annote =	{Keywords: all-pairs suffix-prefix matching, algorithm engineering, DNA sequence assembly}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ISAAC.2016.2

Compressed and Searchable Indexes for Highly Similar Strings (Invited Talk)

Authors: Kunsoo Park

Published in: LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)

Abstract

The collection indexing problem is defined as follows: Given a collection of highly similar strings, build a compressed index for the collection of strings, and when a pattern is given, find all occurrences of the pattern in the given strings. Since the index is compressed, we also need a separate operation which retrieves a specified substring of one of the given strings. Such a collection of highly similar strings can be found in genome sequences of a species and in documents stored in a version control system. Many indexes for the collection indexing problem have been developed, most of which use classical compression schemes such as run-length encoding and Lempel-Ziv compressions to exploit the similarity of the given strings. We introduce a new index for highly similar strings, called FM index of alignment. We start by finding common regions and non-common regions of highly similar strings. We need not find a multiple alignment of non-common regions. Finding common and non-common regions is much easier and simpler than finding a multiple alignment. Then we make a transformed alignment of the given strings, where gaps in a non-common region are put together into one gap. We define a suffix array of alignment on the transformed alignment, and the FM index of alignment is an FM index of this suffix array of alignment. The FM index of alignment supports the LF mapping and backward search, the key functionalities of the FM index. The FM index of alignment takes less space than other indexes and its pattern search is also fast.

Cite as

Kunsoo Park. Compressed and Searchable Indexes for Highly Similar Strings (Invited Talk). In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{park:LIPIcs.ISAAC.2016.2,
  author =	{Park, Kunsoo},
  title =	{{Compressed and Searchable Indexes for Highly Similar Strings}},
  booktitle =	{27th International Symposium on Algorithms and Computation (ISAAC 2016)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-026-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{64},
  editor =	{Hong, Seok-Hee},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.2},
  URN =		{urn:nbn:de:0030-drops-68359},
  doi =		{10.4230/LIPIcs.ISAAC.2016.2},
  annote =	{Keywords: Index for similar strings, FM index, Suffix array, Alignment}
}

Search Results

Documents authored by Park, Kunsoo

A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs

Abstract

Cite as

Cartesian Tree Matching and Indexing

Abstract

Cite as

Algorithm Engineering for All-Pairs Suffix-Prefix Matching

Abstract

Cite as

Compressed and Searchable Indexes for Highly Similar Strings (Invited Talk)

Abstract

Cite as

Thanks for your feedback!

Could not send message