31 Search Results for "Ukkonen, Esko"


Volume

LIPIcs, Volume 113

18th International Workshop on Algorithms in Bioinformatics (WABI 2018)

WABI 2018, August 20-22, 2018, Helsinki, Finland

Editors: Laxmi Parida and Esko Ukkonen

Document
Track A: Algorithms, Complexity and Games
Optimal Bounds for Distinct Quartics

Authors: Panagiotis Charalampopoulos, Paweł Gawrychowski, and Samah Ghazawi

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
A fundamental concept related to strings is that of repetitions. It has been extensively studied in many versions, from both purely combinatorial and algorithmic angles. One of the most basic questions is how many distinct squares, i.e., distinct strings of the form UU, a string of length n can contain as fragments. It turns out that this is always 𝒪(n), and the bound cannot be improved to sublinear in n [Fraenkel and Simpson, JCTA 1998]. Several similar questions about repetitions in strings have been considered, and by now we seem to have a good understanding of their repetitive structure. For higher-dimensional strings, the basic concept of periodicity has been successfully extended and applied to design efficient algorithms - it is inherently more complex than for regular strings. Extending the notion of repetitions and understanding the repetitive structure of higher-dimensional strings is however far from complete. Quartics were introduced by Apostolico and Brimkov [TCS 2000] as analogues of squares in two dimensions. Charalampopoulos, Radoszewski, Rytter, Waleń, and Zuba [ESA 2020] proved that the number of distinct quartics in an n×n 2D string is 𝒪(n²log²n) and that they can be computed in 𝒪(n²log²n) time. Gawrychowski, Ghazawi, and Landau [SPIRE 2021] constructed an infinite family of n×n 2D strings with Ω(n²log n) distinct quartics. This brings the challenge of determining asymptotically tight bounds. Here, we settle both the combinatorial and the algorithmic aspects of this question: the number of distinct quartics in an n×n 2D string is 𝒪(n²log n) and they can be computed in the worst-case optimal 𝒪(n²log n) time. As expected, our solution heavily exploits the periodic structure implied by occurrences of quartics. However, the two-dimensional nature of the problem introduces some technical challenges. 
Somewhat surprisingly, we overcome the final challenge for the combinatorial bound using a result of Marcus and Tardos [JCTA 2004] for permutation avoidance on matrices.
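
Illustrative sketch (not from the paper): the one-dimensional notion the abstract starts from, distinct squares UU occurring as fragments of a string, can be enumerated by brute force. The function name `distinct_squares` is a hypothetical helper chosen for this example. The cubic-time scan below only makes the definition concrete; it says nothing about the paper's 2D quartic algorithm or its 𝒪(n²log n) bound.

```python
def distinct_squares(s):
    """Return the set of distinct fragments of s that have the form UU."""
    n = len(s)
    found = set()
    for i in range(n):
        # half is |U|; the candidate square occupies s[i : i + 2*half]
        for half in range(1, (n - i) // 2 + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                found.add(s[i:i + 2 * half])
    return found


if __name__ == "__main__":
    print(sorted(distinct_squares("aaaa")))
    print(sorted(distinct_squares("abab")))
```

For example, "aaaa" contains the two distinct squares "aa" and "aaaa", even though "aa" occurs three times as a fragment.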

Cite as

Panagiotis Charalampopoulos, Paweł Gawrychowski, and Samah Ghazawi. Optimal Bounds for Distinct Quartics. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 39:1-39:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{charalampopoulos_et_al:LIPIcs.ICALP.2024.39,
  author =	{Charalampopoulos, Panagiotis and Gawrychowski, Pawe{\l} and Ghazawi, Samah},
  title =	{{Optimal Bounds for Distinct Quartics}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{39:1--39:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.39},
  URN =		{urn:nbn:de:0030-drops-201823},
  doi =		{10.4230/LIPIcs.ICALP.2024.39},
  annote =	{Keywords: 2D strings, quartics, repetitions, periodicity}
}
Document
Track A: Algorithms, Complexity and Games
Õptimal Dynamic Time Warping on Run-Length Encoded Strings

Authors: Itai Boneh, Shay Golan, Shay Mozes, and Oren Weimann

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
Dynamic Time Warping (DTW) distance is the optimal cost of matching two strings when extending runs of letters is for free. Therefore, it is natural to measure the time complexity of DTW in terms of the number of runs n (rather than the string lengths N). In this paper, we give an Õ(n²) time algorithm for computing the DTW distance. This matches (up to log factors) the known (conditional) lower bound, and should be compared with the previous fastest O(n³) time exact algorithm and the Õ(n²) time approximation algorithm. Our method also immediately implies an Õ(nk) time algorithm when the distance is bounded by k. This should be compared with the previous fastest O(n²k) and O(Nk) time exact algorithms and the Õ(nk) time approximation algorithm.
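
Illustrative sketch (not from the paper): the textbook dynamic program for DTW distance, quadratic in the string lengths N rather than in the number of runs n. The per-letter cost (0 for equal letters, 1 otherwise) and the name `dtw` are assumptions made for this example; the paper's Õ(n²) algorithm on run-length encoded strings is a different and more involved method.

```python
def dtw(x, y, cost=lambda a, b: 0 if a == b else 1):
    """Textbook DTW distance between two non-empty sequences x and y."""
    inf = float("inf")
    n, m = len(x), len(y)
    # D[i][j] = DTW distance between the prefixes x[:i] and y[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # x[i-1] is matched with y[j-1]; either side may repeat a letter
            D[i][j] = cost(x[i - 1], y[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]
            )
    return D[n][m]


if __name__ == "__main__":
    # Extending runs is free, so strings with the same run structure match at cost 0.
    print(dtw("aab", "abbb"))
```

The call `dtw("aab", "abbb")` returns 0 because both strings consist of a run of a followed by a run of b, which is the sense in which extending runs is free.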

Cite as

Itai Boneh, Shay Golan, Shay Mozes, and Oren Weimann. Õptimal Dynamic Time Warping on Run-Length Encoded Strings. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 30:1-30:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{boneh_et_al:LIPIcs.ICALP.2024.30,
  author =	{Boneh, Itai and Golan, Shay and Mozes, Shay and Weimann, Oren},
  title =	{{\~{O}ptimal Dynamic Time Warping on Run-Length Encoded Strings}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{30:1--30:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.30},
  URN =		{urn:nbn:de:0030-drops-201730},
  doi =		{10.4230/LIPIcs.ICALP.2024.30},
  annote =	{Keywords: Dynamic time warping, Fr\'{e}chet distance, edit distance, run-length encoding}
}
Document
Complete Volume
LIPIcs, Volume 113, WABI'18, Complete Volume

Authors: Laxmi Parida and Esko Ukkonen

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
LIPIcs, Volume 113, WABI'18, Complete Volume

Cite as

18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@Proceedings{parida_et_al:LIPIcs.WABI.2018,
  title =	{{LIPIcs, Volume 113, WABI'18, Complete Volume}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018},
  URN =		{urn:nbn:de:0030-drops-97246},
  doi =		{10.4230/LIPIcs.WABI.2018},
  annote =	{Keywords: Applied computing, Bioinformatics, Theory of computation, Design and analysis of algorithms, Mathematics of computing, Probabilistic inference problem}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Laxmi Parida and Esko Ukkonen

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{parida_et_al:LIPIcs.WABI.2018.0,
  author =	{Parida, Laxmi and Ukkonen, Esko},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.0},
  URN =		{urn:nbn:de:0030-drops-93028},
  doi =		{10.4230/LIPIcs.WABI.2018.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
A Duality-Based Method for Identifying Elemental Balance Violations in Metabolic Network Models

Authors: Hooman Zabeti, Tamon Stephen, Bonnie Berger, and Leonid Chindelevitch

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
Elemental balance, the property of having the same number of each type of atom on both sides of the equation, is a fundamental feature of chemical reactions. In metabolic network models, this property is typically verified on a reaction-by-reaction basis. In this paper we show how violations of elemental balance can be efficiently detected in an entire network, without the need for specifying the chemical formula of each of the metabolites, which enhances a modeler's ability to automatically verify that their model satisfies elemental balance. Our method makes use of duality theory, linear programming, and mixed integer linear programming, and runs efficiently on genome-scale metabolic networks (GSMNs). We detect elemental balance violations in 40 out of 84 metabolic network models in the BiGG database. We also identify a short list of reactions that are candidates for being elementally imbalanced. Out of these candidates, nearly half turn out to be truly imbalanced reactions, and the rest can be seen as witnesses of elemental balance violations elsewhere in the network. The majority of these violations involve a proton imbalance, a known challenge of metabolic network reconstruction. Our approach is efficient, easy to use and powerful. It can be helpful to metabolic network modelers during model verification. Our methods are fully integrated into the MONGOOSE software suite and are available at https://github.com/WGS-TB/MongooseGUI3.

Cite as

Hooman Zabeti, Tamon Stephen, Bonnie Berger, and Leonid Chindelevitch. A Duality-Based Method for Identifying Elemental Balance Violations in Metabolic Network Models. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 1:1-1:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{zabeti_et_al:LIPIcs.WABI.2018.1,
  author =	{Zabeti, Hooman and Stephen, Tamon and Berger, Bonnie and Chindelevitch, Leonid},
  title =	{{A Duality-Based Method for Identifying Elemental Balance Violations in Metabolic Network Models}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{1:1--1:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.1},
  URN =		{urn:nbn:de:0030-drops-93034},
  doi =		{10.4230/LIPIcs.WABI.2018.1},
  annote =	{Keywords: Metabolic network analysis, elemental imbalance, linear programming, model verification}
}
Document
Prefix-Free Parsing for Building Big BWTs

Authors: Christina Boucher, Travis Gagie, Alan Kuhnle, and Giovanni Manzini

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
High-throughput sequencing technologies have led to explosive growth of genomic databases; one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly-repetitive - a characteristic that can be exploited and enable the computation of the Burrows-Wheeler Transform (BWT), which underlies many popular indexes. In this paper, we introduce a preprocessing algorithm, referred to as prefix-free parsing, that takes a text T as input, and in one-pass generates a dictionary D and a parse P of T with the property that the BWT of T can be constructed from D and P using workspace proportional to their total size and O(|T|)-time. Our experiments show that D and P are significantly smaller than T in practice, and thus, can fit in a reasonable internal memory even when T is very large. Therefore, prefix-free parsing eases BWT construction, which is pertinent to many bioinformatics applications.
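
Illustrative sketch (not from the paper): a naive construction of the Burrows-Wheeler Transform by sorting all rotations, to make concrete what is being built. The function name `naive_bwt` and the `$` end marker are assumptions for this example. This approach needs memory quadratic in |T| and is exactly what becomes infeasible for large inputs; it does not implement prefix-free parsing.

```python
def naive_bwt(text, sentinel="$"):
    """Return the BWT of text, computed by sorting all rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    t = text + sentinel
    # Every rotation of t; the sentinel sorts before letters and digits in ASCII.
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


if __name__ == "__main__":
    print(naive_bwt("banana"))
```

On "banana" this yields "annb$aa": equal letters cluster together, which is the property that compressed indexes exploit.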

Cite as

Christina Boucher, Travis Gagie, Alan Kuhnle, and Giovanni Manzini. Prefix-Free Parsing for Building Big BWTs. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 2:1-2:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{boucher_et_al:LIPIcs.WABI.2018.2,
  author =	{Boucher, Christina and Gagie, Travis and Kuhnle, Alan and Manzini, Giovanni},
  title =	{{Prefix-Free Parsing for Building Big BWTs}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{2:1--2:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.2},
  URN =		{urn:nbn:de:0030-drops-93044},
  doi =		{10.4230/LIPIcs.WABI.2018.2},
  annote =	{Keywords: Burrows-Wheeler Transform, prefix-free parsing, compression-aware algorithms, genomic databases}
}
Document
Detecting Mutations by eBWT

Authors: Nicola Prezza, Nadia Pisanti, Marinella Sciortino, and Giovanna Rosone

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP array based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNPs discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in the eBWT of the reads collection, and we develop a tool finding SNPs with a simple scan of the eBWT and LCP arrays. Preliminary results show that our method requires much less coverage than state-of-the-art tools while drastically improving precision and sensitivity.

Cite as

Nicola Prezza, Nadia Pisanti, Marinella Sciortino, and Giovanna Rosone. Detecting Mutations by eBWT. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 3:1-3:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{prezza_et_al:LIPIcs.WABI.2018.3,
  author =	{Prezza, Nicola and Pisanti, Nadia and Sciortino, Marinella and Rosone, Giovanna},
  title =	{{Detecting Mutations by eBWT}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{3:1--3:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.3},
  URN =		{urn:nbn:de:0030-drops-93051},
  doi =		{10.4230/LIPIcs.WABI.2018.3},
  annote =	{Keywords: BWT, LCP Array, SNPs, Reference-free, Assembly-free}
}
Document
Haplotype-aware graph indexes

Authors: Jouni Sirén, Erik Garrison, Adam M. Novak, Benedict J. Paten, and Richard Durbin

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
The variation graph toolkit (VG) represents genetic variation as a graph. Each path in the graph is a potential haplotype, though most paths are unlikely recombinations of true haplotypes. We augment the VG model with haplotype information to identify which paths are more likely to be correct. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows-Wheeler transform. We demonstrate the scalability of the new implementation by indexing the 1000 Genomes Project haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.

Cite as

Jouni Sirén, Erik Garrison, Adam M. Novak, Benedict J. Paten, and Richard Durbin. Haplotype-aware graph indexes. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 4:1-4:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{siren_et_al:LIPIcs.WABI.2018.4,
  author =	{Sir\'{e}n, Jouni and Garrison, Erik and Novak, Adam M. and Paten, Benedict J. and Durbin, Richard},
  title =	{{Haplotype-aware graph indexes}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{4:1--4:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.4},
  URN =		{urn:nbn:de:0030-drops-93060},
  doi =		{10.4230/LIPIcs.WABI.2018.4},
  annote =	{Keywords: FM-indexes, variation graphs, haplotypes}
}
Document
Reconciling Multiple Genes Trees via Segmental Duplications and Losses

Authors: Riccardo Dondi, Manuel Lafond, and Celine Scornavacca

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events, where segmental duplication events and losses are associated with cost delta and lambda, respectively. We show that the problem is polynomial-time solvable when delta <= lambda (via LCA-mapping), while if delta > lambda the problem is NP-hard, even when lambda = 0 and a single gene tree is given, solving a long standing open problem on the complexity of the reconciliation problem. On the positive side, we give a fixed-parameter algorithm for the problem, where the parameters are delta/lambda and the number d of segmental duplications, of time complexity O(ceil[delta/lambda]^d * n * delta/lambda). Finally, we demonstrate the usefulness of this algorithm on two previously studied real datasets: we first show that our method can be used to confirm or refute hypothetical segmental duplications on a set of 16 eukaryotes, then show how we can detect whole genome duplications in yeast genomes.

Cite as

Riccardo Dondi, Manuel Lafond, and Celine Scornavacca. Reconciling Multiple Genes Trees via Segmental Duplications and Losses. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 5:1-5:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{dondi_et_al:LIPIcs.WABI.2018.5,
  author =	{Dondi, Riccardo and Lafond, Manuel and Scornavacca, Celine},
  title =	{{Reconciling Multiple Genes Trees via Segmental Duplications and Losses}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{5:1--5:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.5},
  URN =		{urn:nbn:de:0030-drops-93079},
  doi =		{10.4230/LIPIcs.WABI.2018.5},
  annote =	{Keywords: Gene trees/species tree reconciliation, phylogenetics, computational complexity, fixed-parameter algorithms}
}
Document
Protein Classification with Improved Topological Data Analysis

Authors: Tamal K. Dey and Sayan Mandal

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
Automated annotation and analysis of protein molecules have long been a topic of interest due to immediate applications in medicine and drug design. In this work, we propose a topology based, fast, scalable, and parameter-free technique to generate protein signatures. We build an initial simplicial complex using information about the protein's constituent atoms, including its radius and existing chemical bonds, to model the hierarchical structure of the molecule. Simplicial collapse is used to construct a filtration which we use to compute persistent homology. This information constitutes our signature for the protein. In addition, we demonstrate that this technique scales well to large proteins. Our method shows sizable time and memory improvements compared to other topology based approaches. We use the signature to train a protein domain classifier. Finally, we compare this classifier against models built from state-of-the-art structure-based protein signatures on standard datasets to achieve a substantial improvement in accuracy.

Cite as

Tamal K. Dey and Sayan Mandal. Protein Classification with Improved Topological Data Analysis. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 6:1-6:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{dey_et_al:LIPIcs.WABI.2018.6,
  author =	{Dey, Tamal K. and Mandal, Sayan},
  title =	{{Protein Classification with Improved Topological Data Analysis}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{6:1--6:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.6},
  URN =		{urn:nbn:de:0030-drops-93082},
  doi =		{10.4230/LIPIcs.WABI.2018.6},
  annote =	{Keywords: topological data analysis, persistent homology, simplicial collapse, supervised learning, topology based protein feature vector, protein classification}
}
Document
A Dynamic Algorithm for Network Propagation

Authors: Barak Sternberg and Roded Sharan

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
Network propagation is a powerful transformation that amplifies signal-to-noise ratio in biological and other data. To date, most of its applications in the biological domain employed standard techniques for its computation that require O(m) time for a network with n vertices and m edges. When applied in a dynamic setting where the network is constantly modified, the cost of these computations becomes prohibitive. Here we study, for the first time in the biological context, the complexity of dynamic algorithms for network propagation. We develop a vertex decremental algorithm that is motivated by various biological applications and can maintain propagation scores over general weights at an amortized cost of O(m/(n^{1/4})) per update. In application to real networks, the dynamic algorithm achieves significant, 50- to 100-fold, speedups over conventional static methods for network propagation, demonstrating its great potential in practice.
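
Illustrative sketch (not from the paper): one common static formulation of network propagation is a random walk with restart, iterated until the scores stabilize. The update rule, the degree normalization, the restart weight `alpha`, and the name `propagate` are all assumptions made for this example; the abstract does not spell out the formulation, and the paper's vertex decremental algorithm is not shown here.

```python
def propagate(adj, prior, alpha=0.5, iters=100):
    """Random walk with restart on an undirected graph.

    adj[v] lists the neighbours of vertex v (every vertex needs at least one);
    prior[v] is the initial score of v. Each iteration costs O(n + m).
    """
    n = len(adj)
    p = list(prior)
    for _ in range(iters):
        # Restart term: a (1 - alpha) share always comes from the prior.
        nxt = [(1 - alpha) * prior[v] for v in range(n)]
        for v in range(n):
            # Vertex v spreads an alpha share of its score evenly over its edges.
            share = alpha * p[v] / len(adj[v])
            for u in adj[v]:
                nxt[u] += share
        p = nxt
    return p


if __name__ == "__main__":
    path = [[1], [0, 2], [1]]  # path graph 0 - 1 - 2
    print(propagate(path, [1.0, 0.0, 0.0]))
```

Under this formulation, removing a vertex forces a full recomputation in the static setting, which is the cost a dynamic algorithm aims to avoid.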

Cite as

Barak Sternberg and Roded Sharan. A Dynamic Algorithm for Network Propagation. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 7:1-7:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{sternberg_et_al:LIPIcs.WABI.2018.7,
  author =	{Sternberg, Barak and Sharan, Roded},
  title =	{{A Dynamic Algorithm for Network Propagation}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{7:1--7:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.7},
  URN =		{urn:nbn:de:0030-drops-93095},
  doi =		{10.4230/LIPIcs.WABI.2018.7},
  annote =	{Keywords: Network propagation, Dynamic graph algorithm, protein-protein interaction network}
}
Document
New Absolute Fast Converging Phylogeny Estimation Methods with Improved Scalability and Accuracy

Authors: Qiuyi (Richard) Zhang, Satish Rao, and Tandy Warnow

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of leaves in the tree (once the shortest and longest branch lengths are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was DCM_NJ, published in SODA 2001. The main empirical advantage of DCM_NJ over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, DCM_NJ is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree methods are able to scale to large datasets with high accuracy. In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of DCM_NJ but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other AFC methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).

Cite as

Qiuyi (Richard) Zhang, Satish Rao, and Tandy Warnow. New Absolute Fast Converging Phylogeny Estimation Methods with Improved Scalability and Accuracy. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 8:1-8:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{zhang_et_al:LIPIcs.WABI.2018.8,
  author =	{Zhang, Qiuyi (Richard) and Rao, Satish and Warnow, Tandy},
  title =	{{New Absolute Fast Converging Phylogeny Estimation Methods with Improved Scalability and Accuracy}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{8:1--8:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.8},
  URN =		{urn:nbn:de:0030-drops-93108},
  doi =		{10.4230/LIPIcs.WABI.2018.8},
  annote =	{Keywords: phylogeny estimation, short quartets, sample complexity, absolute fast converging methods, neighbor joining, maximum likelihood}
}
Document
An Average-Case Sublinear Exact Li and Stephens Forward Algorithm

Authors: Yohei M. Rosen and Benedict J. Paten

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithms as long as the representative reference panel used in the model is sufficiently small. Specifically, the monoploid Li and Stephens model and its variants are linear in reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of individuals are underway, and others numbering in the millions are anticipated. To make the Li and Stephens forward algorithm for these datasets computationally tractable, we have created a numerically exact version of the algorithm with observed average case O(nk^{0.35}) runtime in number of genetic sites n and reference panel size k. This avoids any tradeoff between runtime and model complexity. We demonstrate that our approach also provides a succinct data structure for general purpose haplotype data storage. We discuss generalizations of our algorithmic techniques to other hidden Markov models.

Cite as

Yohei M. Rosen and Benedict J. Paten. An Average-Case Sublinear Exact Li and Stephens Forward Algorithm. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 9:1-9:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{rosen_et_al:LIPIcs.WABI.2018.9,
  author =	{Rosen, Yohei M. and Paten, Benedict J.},
  title =	{{An Average-Case Sublinear Exact Li and Stephens Forward Algorithm}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{9:1--9:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.9},
  URN =		{urn:nbn:de:0030-drops-93116},
  doi =		{10.4230/LIPIcs.WABI.2018.9},
  annote =	{Keywords: Haplotype, Hidden Markov Model, Forward Algorithm, Lazy Evaluation}
}
Document
External memory BWT and LCP computation for sequence collections with applications

Authors: Lavinia Egidi, Felipe A. Louza, Giovanni Manzini, and Guilherme P. Telles

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter, and tries to make the best use of it by splitting the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear-time algorithm. Next, it merges the partial BWTs in external memory and in the process it also computes the LCP values. We show that our algorithm performs O(n maxlcp) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value. The experimental results show that our algorithm outperforms the current best algorithm for collections of sequences with different lengths and when the average LCP of the collection is relatively small compared to the length of the sequences. In the second part of the paper, we show that our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP arrays, provide simple, scan-based, external memory algorithms for three well-known problems in bioinformatics: the computation of the all-pairs suffix-prefix overlaps, the computation of maximal repeats, and the construction of succinct de Bruijn graphs.
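
To make the two output arrays concrete, here is a naive in-memory sketch of the BWT and LCP array of a small collection. It sorts all suffixes explicitly and is not the external memory algorithm described above. The function name `bwt_lcp_collection` is hypothetical, and the sketch assumes that every sequence character sorts after `$` (true for DNA letters), that terminators of different sequences are ordered by sequence index, and that LCP values do not count terminators.

```python
def bwt_lcp_collection(seqs):
    """Naively compute the BWT and LCP array of a collection of strings.

    Each sequence gets a terminator '$'; equal suffixes from different
    sequences are ordered by sequence index, which mimics distinct
    terminators $_0 < $_1 < ... (an assumption of this sketch).
    """
    suffixes = []
    for j, s in enumerate(seqs):
        t = s + "$"
        for i in range(len(t)):
            # (suffix text, sequence index, start position)
            suffixes.append((t[i:], j, i))
    suffixes.sort()

    bwt = []
    lcp = []
    prev = None
    for suf, j, i in suffixes:
        # BWT symbol: the character preceding the suffix in its own
        # sequence, or '$' when the suffix is the whole sequence.
        bwt.append(seqs[j][i - 1] if i > 0 else "$")
        if prev is None:
            lcp.append(0)
        else:
            # Common prefix length with the previous suffix, never
            # counting the terminator at the end of either suffix.
            length = 0
            limit = min(len(prev), len(suf)) - 1
            while length < limit and prev[length] == suf[length]:
                length += 1
            lcp.append(length)
        prev = suf
    return "".join(bwt), lcp


bwt, lcp = bwt_lcp_collection(["ab", "b"])
print(bwt, lcp, max(lcp))
```

In this notation, the maxlcp term in the O(n maxlcp) bound corresponds to `max(lcp)`.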

Cite as

Lavinia Egidi, Felipe A. Louza, Giovanni Manzini, and Guilherme P. Telles. External memory BWT and LCP computation for sequence collections with applications. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 10:1-10:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{egidi_et_al:LIPIcs.WABI.2018.10,
  author =	{Egidi, Lavinia and Louza, Felipe A. and Manzini, Giovanni and Telles, Guilherme P.},
  title =	{{External memory BWT and LCP computation for sequence collections with applications}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{10:1--10:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.10},
  URN =		{urn:nbn:de:0030-drops-93122},
  doi =		{10.4230/LIPIcs.WABI.2018.10},
  annote =	{Keywords: Burrows-Wheeler Transform, Longest Common Prefix Array, All pairs suffix-prefix overlaps, Succinct de Bruijn graph, Maximal repeats}
}