Search Results

Documents authored by Kärkkäinen, Juha


Document
Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time

Authors: Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piątkowski

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)


Abstract
The Burrows-Wheeler transform (BWT) is a permutation whose applications are prevalent in data compression and text indexing. The bijective BWT (BBWT) is a bijective variant of it. Although it is known that the BWT can be constructed in linear time for integer alphabets by using a linear time suffix array construction algorithm, it was up to now only conjectured that the BBWT can also be constructed in linear time. We confirm this conjecture in the word RAM model by proposing a construction algorithm that is based on SAIS, improving the best known result of O(n lg n / lg lg n) time to linear. Since we can reduce the problem of constructing the extended BWT to constructing the BBWT in linear time, we obtain a linear-time algorithm computing the extended BWT at the same time.

Cite as

Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piątkowski. Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2021.7,
  author =	{Bannai, Hideo and K\"{a}rkk\"{a}inen, Juha and K\"{o}ppl, Dominik and Pi\k{a}tkowski, Marcin},
  title =	{{Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{7:1--7:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.7},
  URN =		{urn:nbn:de:0030-drops-139588},
  doi =		{10.4230/LIPIcs.CPM.2021.7},
  annote =	{Keywords: Burrows-Wheeler Transform, Lyndon words, Circular Suffix Array, Suffix Array Construction Algorithm}
}
Document
Indexing the Bijective BWT

Authors: Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Pia̧tkowski

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
The Burrows-Wheeler transform (BWT) is a permutation whose applications are prevalent in data compression and text indexing. The bijective BWT is a bijective variant of it that has not yet been studied for text indexing applications. We fill this gap by proposing a self-index built on the bijective BWT . The self-index applies the backward search technique of the FM-index to find a pattern P with O(|P| lg|P|) backward search steps.

Cite as

Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Pia̧tkowski. Indexing the Bijective BWT. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2019.17,
  author =	{Bannai, Hideo and K\"{a}rkk\"{a}inen, Juha and K\"{o}ppl, Dominik and Pia̧tkowski, Marcin},
  title =	{{Indexing the Bijective BWT}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{17:1--17:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.17},
  URN =		{urn:nbn:de:0030-drops-104887},
  doi =		{10.4230/LIPIcs.CPM.2019.17},
  annote =	{Keywords: Burrows-Wheeler Transform, Lyndon words, Text Indexing}
}
Document
Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet

Authors: Juha Kärkkäinen and Dominik Kempa

Published in: LIPIcs, Volume 75, 16th International Symposium on Experimental Algorithms (SEA 2017)


Abstract
The suffix array augmented with the LCP array is perhaps the most important data structure in modern string processing. There has been a lot of recent research activity on constructing these arrays in external memory. In this paper, we engineer the two fastest LCP array construction algorithms (ESA 2016) and improve them in three ways. First, we speed up the algorithms by up to a factor of two through parallelism. Just 8 threads is sufficient for making the algorithms essentially I/O bound. Second, we reduce the disk space usage of the algorithms making them in-place: The input (text and suffix array) is treated as read-only and the working disk space never exceeds the size of the final output (the LCP array). Third, we add support for large alphabets. All previous implementations assume the byte alphabet.

Cite as

Juha Kärkkäinen and Dominik Kempa. Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{karkkainen_et_al:LIPIcs.SEA.2017.17,
  author =	{K\"{a}rkk\"{a}inen, Juha and Kempa, Dominik},
  title =	{{Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet}},
  booktitle =	{16th International Symposium on Experimental Algorithms (SEA 2017)},
  pages =	{17:1--17:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-036-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{75},
  editor =	{Iliopoulos, Costas S. and Pissis, Solon P. and Puglisi, Simon J. and Raman, Rajeev},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2017.17},
  URN =		{urn:nbn:de:0030-drops-76116},
  doi =		{10.4230/LIPIcs.SEA.2017.17},
  annote =	{Keywords: LCP array, suffix array, external memory algorithms}
}
Document
Complete Volume
LIPIcs, Volume 78, CPM'17, Complete Volume

Authors: Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
LIPIcs, Volume 78, CPM'17, Complete Volume

Cite as

28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@Proceedings{karkkainen_et_al:LIPIcs.CPM.2017,
  title =	{{LIPIcs, Volume 78, CPM'17, Complete Volume}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017},
  URN =		{urn:nbn:de:0030-drops-75091},
  doi =		{10.4230/LIPIcs.CPM.2017},
  annote =	{Keywords: Data Structures, Data Storage Representations, Coding and Information Theory, Theory of Computation, Discrete Mathematics, Information Systems,}
}
Document
String Inference from Longest-Common-Prefix Array

Authors: Juha Kärkkäinen, Marcin Piatkowski, and Simon J. Puglisi

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)


Abstract
The suffix array, perhaps the most important data structure in modern string processing, is often augmented with the longest common prefix (LCP) array which stores the lengths of the LCPs for lexicographically adjacent suffixes of a string. Together the two arrays are roughly equivalent to the suffix tree with the LCP array representing the tree shape. In order to better understand the combinatorics of LCP arrays, we consider the problem of inferring a string from an LCP array, i.e., determining whether a given array of integers is a valid LCP array, and if it is, reconstructing some string or all strings with that LCP array. There are recent studies of inferring a string from a suffix tree shape but using significantly more information (in the form of suffix links) than is available in the LCP array. We provide two main results. (1) We describe two algorithms for inferring strings from an LCP array when we allow a generalized form of LCP array defined for a multiset of cyclic strings: a linear time algorithm for binary alphabet and a general algorithm with polynomial time complexity for a constant alphabet size. (2) We prove that determining whether a given integer array is a valid LCP array is NP-complete when we require more restricted forms of LCP array defined for a single cyclic or non-cyclic string or a multiset of non-cyclic strings. The result holds whether or not the alphabet is restricted to be binary. In combination, the two results show that the generalized form of LCP array for a multiset of cyclic strings is fundamentally different from the other more restricted forms.

Cite as

Juha Kärkkäinen, Marcin Piatkowski, and Simon J. Puglisi. String Inference from Longest-Common-Prefix Array. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 62:1-62:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{karkkainen_et_al:LIPIcs.ICALP.2017.62,
  author =	{K\"{a}rkk\"{a}inen, Juha and Piatkowski, Marcin and Puglisi, Simon J.},
  title =	{{String Inference from Longest-Common-Prefix Array}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{62:1--62:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.62},
  URN =		{urn:nbn:de:0030-drops-74989},
  doi =		{10.4230/LIPIcs.ICALP.2017.62},
  annote =	{Keywords: LCP array, string inference, BWT, suffix array, suffix tree, NP-hardness}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers

Authors: Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers

Cite as

28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{karkkainen_et_al:LIPIcs.CPM.2017.0,
  author =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.0},
  URN =		{urn:nbn:de:0030-drops-73178},
  doi =		{10.4230/LIPIcs.CPM.2017.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers}
}
Document
On the Size of Lempel-Ziv and Lyndon Factorizations

Authors: Juha Kärkkäinen, Dominik Kempa, Yuto Nakashima, Simon J. Puglisi, and Arseny M. Shur

Published in: LIPIcs, Volume 66, 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017)


Abstract
Lyndon factorization and Lempel-Ziv (LZ) factorization are both important tools for analysing the structure and complexity of strings, but their combinatorial structure is very different. In this paper, we establish the first direct connection between the two by showing that while the Lyndon factorization can be bigger than the non-overlapping LZ factorization (which we demonstrate by describing a new, non-trivial family of strings) it is always less than twice the size.

Cite as

Juha Kärkkäinen, Dominik Kempa, Yuto Nakashima, Simon J. Puglisi, and Arseny M. Shur. On the Size of Lempel-Ziv and Lyndon Factorizations. In 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 66, pp. 45:1-45:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{karkkainen_et_al:LIPIcs.STACS.2017.45,
  author =	{K\"{a}rkk\"{a}inen, Juha and Kempa, Dominik and Nakashima, Yuto and Puglisi, Simon J. and Shur, Arseny M.},
  title =	{{On the Size of Lempel-Ziv and Lyndon Factorizations}},
  booktitle =	{34th Symposium on Theoretical Aspects of Computer Science (STACS 2017)},
  pages =	{45:1--45:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-028-6},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{66},
  editor =	{Vollmer, Heribert and Vall\'{e}e, Brigitte},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2017.45},
  URN =		{urn:nbn:de:0030-drops-69878},
  doi =		{10.4230/LIPIcs.STACS.2017.45},
  annote =	{Keywords: Lempel-Ziv factorization, Lempel-Ziv parsing, LZ, Lyndon word, Lyndon factorization, Standard factorization}
}
Document
Faster External Memory LCP Array Construction

Authors: Juha Kärkkäinen and Dominik Kempa

Published in: LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)


Abstract
The suffix array, perhaps the most important data structure in modern string processing, needs to be augmented with the longest-common-prefix (LCP) array in many applications. Their construction is often a major bottleneck especially when the data is too big for internal memory. We describe two new algorithms for computing the LCP array from the suffix array in external memory. Experiments demonstrate that the new algorithms are about a factor of two faster than the fastest previous algorithm.

Cite as

Juha Kärkkäinen and Dominik Kempa. Faster External Memory LCP Array Construction. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 61:1-61:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{karkkainen_et_al:LIPIcs.ESA.2016.61,
  author =	{K\"{a}rkk\"{a}inen, Juha and Kempa, Dominik},
  title =	{{Faster External Memory LCP Array Construction}},
  booktitle =	{24th Annual European Symposium on Algorithms (ESA 2016)},
  pages =	{61:1--61:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-015-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{57},
  editor =	{Sankowski, Piotr and Zaroliagis, Christos},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.61},
  URN =		{urn:nbn:de:0030-drops-64026},
  doi =		{10.4230/LIPIcs.ESA.2016.61},
  annote =	{Keywords: LCP array, suffix array, external memory algorithms}
}
Document
Faster Sparse Suffix Sorting

Authors: Tomohiro I, Juha Kärkkäinen, and Dominik Kempa

Published in: LIPIcs, Volume 25, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)


Abstract
The sparse suffix sorting problem is to sort b=o(n) arbitrary suffixes of a string of length n using o(n) words of space in addition to the string. We present an O(n) time Monte Carlo algorithm using O(b.log(b)) space and an O(n.log(b)) time Las Vegas algorithm using O(b) space. This is a significant improvement over the best prior solutions of [Bille et al., ICALP 2013]: a Monte Carlo algorithm running in O(n.log(b)) time and O(b^(1+e)) space or O(n.log^2(b)) time and O(b) space, and a Las Vegas algorithm running in O(n.log^2(b)+b^2.log(b)) time and O(b) space. All the above results are obtained with high probability not just in expectation.

Cite as

Tomohiro I, Juha Kärkkäinen, and Dominik Kempa. Faster Sparse Suffix Sorting. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 25, pp. 386-396, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{i_et_al:LIPIcs.STACS.2014.386,
  author =	{I, Tomohiro and K\"{a}rkk\"{a}inen, Juha and Kempa, Dominik},
  title =	{{Faster Sparse Suffix Sorting}},
  booktitle =	{31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)},
  pages =	{386--396},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-65-1},
  ISSN =	{1868-8969},
  year =	{2014},
  volume =	{25},
  editor =	{Mayr, Ernst W. and Portier, Natacha},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2014.386},
  URN =		{urn:nbn:de:0030-drops-44738},
  doi =		{10.4230/LIPIcs.STACS.2014.386},
  annote =	{Keywords: string algorithms, sparse suffix sorting, sparse suffix trees, Karp-Rabin fingerprints, space-time tradeoffs}
}