DROPS

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2024.6

Bipartizing (Pseudo-)Disk Graphs: Approximation with a Ratio Better than 3

Authors: Daniel Lokshtanov, Fahad Panolan, Saket Saurabh, Jie Xue, and Meirav Zehavi

Published in: LIPIcs, Volume 317, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)

Abstract

In a disk graph, every vertex corresponds to a disk in ℝ² and two vertices are connected by an edge whenever the two corresponding disks intersect. Disk graphs form an important class of geometric intersection graphs, which generalizes both planar graphs and unit-disk graphs. We study a fundamental optimization problem in algorithmic graph theory, Bipartization (also known as Odd Cycle Transversal), on the class of disk graphs. The goal of Bipartization is to delete a minimum number of vertices from the input graph such that the resulting graph is bipartite. A folklore (polynomial-time) 3-approximation algorithm for Bipartization on disk graphs follows from the classical framework of Goemans and Williamson [Combinatorica'98] for cycle-hitting problems. For over two decades, this result has remained the best known approximation for the problem (in fact, even for Bipartization on unit-disk graphs). In this paper, we achieve the first improvement upon this result, by giving a (3-α)-approximation algorithm for Bipartization on disk graphs, for some constant α > 0. Our algorithm directly generalizes to the broader class of pseudo-disk graphs. Furthermore, our algorithm is robust in the sense that it does not require a geometric realization of the input graph to be given.

Cite as

Daniel Lokshtanov, Fahad Panolan, Saket Saurabh, Jie Xue, and Meirav Zehavi. Bipartizing (Pseudo-)Disk Graphs: Approximation with a Ratio Better than 3. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 317, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{lokshtanov_et_al:LIPIcs.APPROX/RANDOM.2024.6,
  author =	{Lokshtanov, Daniel and Panolan, Fahad and Saurabh, Saket and Xue, Jie and Zehavi, Meirav},
  title =	{{Bipartizing (Pseudo-)Disk Graphs: Approximation with a Ratio Better than 3}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)},
  pages =	{6:1--6:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-348-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{317},
  editor =	{Kumar, Amit and Ron-Zewi, Noga},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2024.6},
  URN =		{urn:nbn:de:0030-drops-209990},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2024.6},
  annote =	{Keywords: bipartization, geometric intersection graphs, approximation algorithms}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.8

Applying the Safe-And-Complete Framework to Practical Genome Assembly

Authors: Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Despite the long history of genome assembly research, there remains a large gap between the theoretical and practical work. There is practical software with little theoretical underpinning of accuracy on one hand and theoretical algorithms which have not been adopted in practice on the other. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs (simple omnitigs), giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the D. melanogaster and the C. elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible additional computational costs and either no or a small increase in the number of misassemblies.

Cite as

Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu. Applying the Safe-And-Complete Framework to Practical Genome Assembly. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{schmidt_et_al:LIPIcs.WABI.2024.8,
  author =	{Schmidt, Sebastian and Toivonen, Santeri and Medvedev, Paul and Tomescu, Alexandru I.},
  title =	{{Applying the Safe-And-Complete Framework to Practical Genome Assembly}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{8:1--8:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.8},
  URN =		{urn:nbn:de:0030-drops-206520},
  doi =		{10.4230/LIPIcs.WABI.2024.8},
  annote =	{Keywords: Genome assembly, Omnitigs, Safe-and-complete framework, graph algorithm, HiFi sequencing data, Assembly evaluation}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.5

Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant

Authors: Brian Brubach

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

We present a new approach to approximating the Maximum Duo-Preservation String Mapping Problem (MPSM) based on massaging the constraints into a tractable matching problem. MPSM was introduced in Chen, Chen, Samatova, Peng, Wang, and Tang [Chen et al., 2014] as the complement to the well-studied Minimum Common String Partition problem (MCSP). Prior work also considers the k-MPSM and k-MCSP variants in which each letter occurs at most k times in each string. The authors of [Chen et al., 2014] showed a k^2-appoximation for k >= 3 and 2-approximation for k = 2. Boria, Kurpisz, Leppänen, and Mastrolilli [Boria et al., 2014] gave a 4-approximation independent of k and showed that even 2-MPSM is APX-Hard. A series of improvements led to the current best bounds of a (2 + epsilon)-approximation for any epsilon > 0 in n^{O(1/epsilon)} time for strings of length n and a 2.67-approximation running in O(n^2) time, both by Dudek, Gawrychowski, and Ostropolski-Nalewaja [Dudek et al., 2017]. Here, we show that a 2.67-approximation can surprisingly be achieved in O(n) time for alphabets of constant size and O(n + alpha^7) for alphabets of size alpha. Recently, Mehrabi [Mehrabi, 2017] introduced the more general weighted variant, Maximum Weight Duo-Preservation String Mapping (MWPSM) and provided a 6-approximation. Our approach gives a 2.67-approximation to this problem running in O(n^3) time. This approach can also find an 8/(3(1-epsilon))-approximation to MWPSM for any epsilon > 0 in O(n^2 epsilon^{-1} lg{epsilon^{-1}}) time using the approximate weighted matching algorithm of Duan and Pettie [Duan and Pettie, 2014]. Finally, we introduce the first streaming algorithm for MPSM. We show that a single pass suffices to find a 4-approximation on the size of an optimal solution using only O(alpha^2 lg{n}) space.

Cite as

Brian Brubach. Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 5:1-5:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{brubach:LIPIcs.CPM.2018.5,
  author =	{Brubach, Brian},
  title =	{{Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{5:1--5:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.5},
  URN =		{urn:nbn:de:0030-drops-87066},
  doi =		{10.4230/LIPIcs.CPM.2018.5},
  annote =	{Keywords: approximation algorithm, maximum duo-preservation string mapping, minimum common string partition, string comparison, streaming algorithm, comparative genomics}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.13

A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment

Authors: Brian Brubach and Jay Ghurye

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

The classical Four Russians speedup for computing edit distance (a.k.a. Levenshtein distance), due to Masek and Paterson [Masek and Paterson, 1980], involves partitioning the dynamic programming table into k-by-k square blocks and generating a lookup table in O(psi^{2k} k^2 |Sigma|^{2k}) time and O(psi^{2k} k |Sigma|^{2k}) space for block size k, where psi depends on the cost function (for unit costs psi = 3) and |Sigma| is the size of the alphabet. We show that the O(psi^{2k} k^2) and O(psi^{2k} k) factors can be improved to O(k^2 lg{k}) time and O(k^2) space. Thus, we improve the time and space complexity of that aspect compared to Masek and Paterson [Masek and Paterson, 1980] and remove the dependence on psi. We further show that for certain problems the O(|Sigma|^{2k}) factor can also be reduced. Using this technique, we show a new algorithm for the fundamental problem of one-against-many banded alignment. In particular, comparing one string of length m to n other strings of length m with maximum distance d can be performed in O(n m + m d^2 lg{d} + n d^3) time. When d is reasonably small, this approaches or meets the current best theoretic result of O(nm + n d^2) achieved by using the best known pairwise algorithm running in O(m + d^2) time [Myers, 1986][Ukkonen, 1985] while potentially being more practical. It also improves on the standard practical approach which requires O(n m d) time to iteratively run an O(md) time pairwise banded alignment algorithm. Regarding pairwise comparison, we extend the classic result of Masek and Paterson [Masek and Paterson, 1980] which computes the edit distance between two strings in O(m^2/log{m}) time to remove the dependence on psi even when edits have arbitrary costs from a penalty matrix. Crochemore, Landau, and Ziv-Ukelson [Crochemore, 2003] achieved a similar result, also allowing for unrestricted scoring matrices, but with variable-sized blocks. In practical applications of the Four Russians speedup wherein space efficiency is important and smaller block sizes k are used (notably k < |Sigma|), Kim, Na, Park, and Sim [Kim et al., 2016] showed how to remove the dependence on the alphabet size for the unit cost version, generating a lookup table in O(3^{2k} (2k)! k^2) time and O(3^{2k} (2k)! k) space. Combining their work with our result yields an improvement to O((2k)! k^2 lg{k}) time and O((2k)! k^2) space.

Cite as

Brian Brubach and Jay Ghurye. A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 13:1-13:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{brubach_et_al:LIPIcs.CPM.2018.13,
  author =	{Brubach, Brian and Ghurye, Jay},
  title =	{{A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{13:1--13:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.13},
  URN =		{urn:nbn:de:0030-drops-86965},
  doi =		{10.4230/LIPIcs.CPM.2018.13},
  annote =	{Keywords: edit distance, banded alignment, one-against-many alignment, genomics, method of the Four Russians}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.3

Better Greedy Sequence Clustering with Fast Banded Alignment

Authors: Brian Brubach, Jay Ghurye, Mihai Pop, and Aravind Srinivasan

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Comparing a string to a large set of sequences is a key subroutine in greedy heuristics for clustering genomic data. Clustering 16S rRNA gene sequences into operational taxonomic units (OTUs) is a common method used in studying microbial communities. We present a new approach to greedy clustering using a trie-like data structure and Four Russians speedup. We evaluate the running time of our method in terms of the number of comparisons it makes during clustering and show in experimental results that the number of comparisons grows linearly with the size of the dataset as opposed to the quadratic running time of other methods. We compare the clusters output by our method to the popular greedy clustering tool UCLUST. We show that the clusters we generate can be both tighter and larger.

Cite as

Brian Brubach, Jay Ghurye, Mihai Pop, and Aravind Srinivasan. Better Greedy Sequence Clustering with Fast Banded Alignment. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{brubach_et_al:LIPIcs.WABI.2017.3,
  author =	{Brubach, Brian and Ghurye, Jay and Pop, Mihai and Srinivasan, Aravind},
  title =	{{Better Greedy Sequence Clustering with Fast Banded Alignment}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{3:1--3:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.3},
  URN =		{urn:nbn:de:0030-drops-76425},
  doi =		{10.4230/LIPIcs.WABI.2017.3},
  annote =	{Keywords: Sequence Clustering, Metagenomics, String Algorithms}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.23

Outlier Detection in BLAST Hits

Authors: Nidhi Shah, Stephen F. Altschul, and Mihai Pop

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

An important task in a metagenomic analysis is the assignment of taxonomic labels to sequences in a sample. Most widely used methods for taxonomy assignment compare a sequence in the sample to a database of known sequences. Many approaches use the best BLAST hit(s) to assign the taxonomic label. However, it is known that the best BLAST hit may not always correspond to the best taxonomic match. An alternative approach involves phylogenetic methods which take into account alignments and a model of evolution in order to more accurately define the taxonomic origin of sequences. The similarity-search based methods typically run faster than phylogenetic methods and work well when the organisms in the sample are well represented in the database. On the other hand, phylogenetic methods have the capability to identify new organisms in a sample but are computationally quite expensive. We propose a two-step approach for metagenomic taxon identification; i.e., use a rapid method that accurately classifies sequences using a reference database (this is a filtering step) and then use a more complex phylogenetic method for the sequences that were unclassified in the previous step. In this work, we explore whether and when using top BLAST hit(s) yields a correct taxonomic label. We develop a method to detect outliers among BLAST hits in order to separate the phylogenetically most closely related matches from matches to sequences from more distantly related organisms. We used modified BILD (Bayesian Integral Log Odds) scores, a multiple-alignment scoring function, to define the outliers within a subset of top BLAST hits and assign taxonomic labels. We compared the accuracy of our method to the RDP classifier and show that our method yields fewer misclassifications while properly classifying organisms that are not present in the database. Finally, we evaluated the use of our method as a pre-processing step before more expensive phylogenetic analyses (in our case TIPP) in the context of real 16S rRNA datasets. Our experiments demonstrate the potential of our method to be a filtering step before using phylogenetic methods.

Cite as

Nidhi Shah, Stephen F. Altschul, and Mihai Pop. Outlier Detection in BLAST Hits. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 23:1-23:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{shah_et_al:LIPIcs.WABI.2017.23,
  author =	{Shah, Nidhi and Altschul, Stephen F. and Pop, Mihai},
  title =	{{Outlier Detection in BLAST Hits}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{23:1--23:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.23},
  URN =		{urn:nbn:de:0030-drops-76512},
  doi =		{10.4230/LIPIcs.WABI.2017.23},
  annote =	{Keywords: Taxonomy classification, Metagenomics, Sequence alignment, Outlier detection}
}

Document

DOI: 10.4230/DagRep.6.8.91

Next Generation Sequencing (Dagstuhl Seminar 16351)

Authors: Gene Myers, Mihai Pop, Knut Reinert, and Tandy Warnow

Published in: Dagstuhl Reports, Volume 6, Issue 8 (2017)

Abstract

Next Generation Sequencing (NGS) data have begun to appear in many applications that are clinically relevant, such as resequencing of cancer patients, disease-gene discovery and diagnostics for rare diseases, microbiome analyses, and gene expression profiling. The analysis of sequencing data is demanding because of the enormous data volume and the need for fast turnaround time, accuracy, reproducibility, and data security. This Dagstuhl Seminar aimed at a free and deep exchange of ideas and needs between the communities of algorithmicists and theoreticians and practitioners from the biomedical field. It identified several relevant fields such as data structures and algorithms for large data sets, hardware acceleration, new problems in the upcoming age of genomes, etc. which were discussed in breakout groups.

Cite as

Gene Myers, Mihai Pop, Knut Reinert, and Tandy Warnow. Next Generation Sequencing (Dagstuhl Seminar 16351). In Dagstuhl Reports, Volume 6, Issue 8, pp. 91-130, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@Article{myers_et_al:DagRep.6.8.91,
  author =	{Myers, Gene and Pop, Mihai and Reinert, Knut and Warnow, Tandy},
  title =	{{Next Generation Sequencing (Dagstuhl Seminar 16351)}},
  pages =	{91--130},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2017},
  volume =	{6},
  number =	{8},
  editor =	{Myers, Gene and Pop, Mihai and Reinert, Knut and Warnow, Tandy},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.6.8.91},
  URN =		{urn:nbn:de:0030-drops-68395},
  doi =		{10.4230/DagRep.6.8.91},
  annote =	{Keywords: Cancer, DNA Sequence Assembly, Expression Profiles, Next Generation Sequencing, Sequence analysis, Software Engineering (Tools \& Libraries)}
}

7 Search Results for "Pop, Mihai"

Bipartizing (Pseudo-)Disk Graphs: Approximation with a Ratio Better than 3

Abstract

Cite as

Applying the Safe-And-Complete Framework to Practical Genome Assembly

Abstract

Cite as

Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant

Abstract

Cite as

A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment

Abstract

Cite as

Better Greedy Sequence Clustering with Fast Banded Alignment

Abstract

Cite as

Outlier Detection in BLAST Hits

Abstract

Cite as

Next Generation Sequencing (Dagstuhl Seminar 16351)

Abstract

Cite as

Thanks for your feedback!

Could not send message