DROPS

Volume

LIPIcs, Volume 181

31st International Symposium on Algorithms and Computation (ISAAC 2020)

ISAAC 2020, December 14-18, 2020, Hong Kong, China (Virtual Conference)

Editors: Yixin Cao, Siu-Wing Cheng, and Minming Li

Document

DOI: 10.4230/LIPIcs.ESA.2024.28

A Faster Algorithm for the Fréchet Distance in 1D for the Imbalanced Case

Authors: Lotte Blank and Anne Driemel

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)

Abstract

The fine-grained complexity of computing the {Fréchet distance } has been a topic of much recent work, starting with the quadratic SETH-based conditional lower bound by Bringmann from 2014. Subsequent work established largely the same complexity lower bounds for the {Fréchet distance } in 1D. However, the imbalanced case, which was shown by Bringmann to be tight in dimensions d ≥ 2, was still left open. Filling in this gap, we show that a faster algorithm for the {Fréchet distance } in the imbalanced case is possible: Given two 1-dimensional curves of complexity n and n^{α} for some α ∈ (0,1), we can compute their {Fréchet distance } in O(n^{2α} log² n + n log n) time. This rules out a conditional lower bound of the form O((nm)^{1-ε}) that Bringmann showed for d ≥ 2 and any ε > 0 in turn showing a strict separation with the setting d = 1. At the heart of our approach lies a data structure that stores a 1-dimensional curve P of complexity n, and supports queries with a curve Q of complexity m for the continuous {Fréchet distance } between P and Q. The data structure has size in 𝒪(nlog n) and uses query time in 𝒪(m² log² n). Our proof uses a key lemma that is based on the concept of visiting orders and may be of independent interest. We demonstrate this by substantially simplifying the correctness proof of a clustering algorithm by Driemel, Krivošija and Sohler from 2015.

Cite as

Lotte Blank and Anne Driemel. A Faster Algorithm for the Fréchet Distance in 1D for the Imbalanced Case. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 28:1-28:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{blank_et_al:LIPIcs.ESA.2024.28,
  author =	{Blank, Lotte and Driemel, Anne},
  title =	{{A Faster Algorithm for the Fr\'{e}chet Distance in 1D for the Imbalanced Case}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{28:1--28:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.28},
  URN =		{urn:nbn:de:0030-drops-210999},
  doi =		{10.4230/LIPIcs.ESA.2024.28},
  annote =	{Keywords: \{Fr\'{e}chet distance\}, distance oracle, data structures, time series}
}

Document

DOI: 10.4230/LIPIcs.ESA.2024.29

A Euclidean Embedding for Computing Persistent Homology with Gaussian Kernels

Authors: Jean-Daniel Boissonnat and Kunal Dutta

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)

Abstract

Computing persistent homology of large datasets using Gaussian kernels is useful in the domains of topological data analysis and machine learning as shown by Phillips, Wang and Zheng [SoCG 2015]. However, unlike in the case of persistent homology computation using the Euclidean distance or the k-distance, using Gaussian kernels involves significantly higher overhead, as all distance computations are in terms of the Gaussian kernel distance which is computationally more expensive. Further, most algorithmic implementations (e.g. Gudhi, Ripser, etc.) are based on Euclidean distances, so the question of finding a Euclidean embedding - preferably low-dimensional - that preserves the persistent homology computed with Gaussian kernels, is quite important. We consider the Gaussian kernel power distance (GKPD) given by Phillips, Wang and Zheng. Given an n-point dataset and a relative error parameter {ε} ∈ (0,1], we show that the persistent homology of the {Čech } filtration of the dataset computed using the GKPD can be approximately preserved using O({ε}^{-2}log n) dimensions, under a high stable rank condition. Our results also extend to the Delaunay filtration and the (simpler) case of the weighted Rips filtrations constructed using the GKPD. Compared to the Euclidean embedding for the Gaussian kernel function in ∼ n dimensions, which uses the Cholesky decomposition of the matrix of the kernel function applied to all pairs of data points, our embedding may also be viewed as dimensionality reduction - reducing the dimensionality from n to ∼ log n dimensions. Our proof utilizes the embedding of Chen and Phillips [ALT 2017], based on the Random Fourier Functions of Rahimi and Recht [NeurIPS 2007], together with two novel ingredients. The first one is a new decomposition of the squared radii of {Čech } simplices computed using the GKPD, in terms of the pairwise GKPDs between the vertices, which we state and prove. The second is a new concentration inequality for sums of cosine functions of Gaussian random vectors, which we call Gaussian cosine chaoses. We believe these are of independent interest and will find other applications in future.

Cite as

Jean-Daniel Boissonnat and Kunal Dutta. A Euclidean Embedding for Computing Persistent Homology with Gaussian Kernels. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 29:1-29:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{boissonnat_et_al:LIPIcs.ESA.2024.29,
  author =	{Boissonnat, Jean-Daniel and Dutta, Kunal},
  title =	{{A Euclidean Embedding for Computing Persistent Homology with Gaussian Kernels}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{29:1--29:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.29},
  URN =		{urn:nbn:de:0030-drops-211009},
  doi =		{10.4230/LIPIcs.ESA.2024.29},
  annote =	{Keywords: Persistent homology, Gaussian kernels, Random Fourier Features, Euclidean embedding}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2024.12

Universal Optimization for Non-Clairvoyant Subadditive Joint Replenishment

Authors: Tomer Ezra, Stefano Leonardi, Michał Pawłowski, Matteo Russo, and Seeun William Umboh

Published in: LIPIcs, Volume 317, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)

Abstract

The online joint replenishment problem (JRP) is a fundamental problem in the area of online problems with delay. Over the last decade, several works have studied generalizations of JRP with different cost functions for servicing requests. Most prior works on JRP and its generalizations have focused on the clairvoyant setting. Recently, Touitou [Noam Touitou, 2023] developed a non-clairvoyant framework that provided an O(√{n log n}) upper bound for a wide class of generalized JRP, where n is the number of request types. We advance the study of non-clairvoyant algorithms by providing a simpler, modular framework that matches the competitive ratio established by Touitou for the same class of generalized JRP. Our key insight is to leverage universal algorithms for Set Cover to approximate arbitrary monotone subadditive functions using a simple class of functions termed disjoint. This allows us to reduce the problem to several independent instances of the TCP Acknowledgement problem, for which a simple 2-competitive non-clairvoyant algorithm is known. The modularity of our framework is a major advantage as it allows us to tailor the reduction to specific problems and obtain better competitive ratios. In particular, we obtain tight O(√n)-competitive algorithms for two significant problems: Multi-Level Aggregation and Weighted Symmetric Subadditive Joint Replenishment. We also show that, in contrast, Touitou’s algorithm is Ω(√{n log n})-competitive for both of these problems.

Cite as

Tomer Ezra, Stefano Leonardi, Michał Pawłowski, Matteo Russo, and Seeun William Umboh. Universal Optimization for Non-Clairvoyant Subadditive Joint Replenishment. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 317, pp. 12:1-12:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{ezra_et_al:LIPIcs.APPROX/RANDOM.2024.12,
  author =	{Ezra, Tomer and Leonardi, Stefano and Paw{\l}owski, Micha{\l} and Russo, Matteo and Umboh, Seeun William},
  title =	{{Universal Optimization for Non-Clairvoyant Subadditive Joint Replenishment}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)},
  pages =	{12:1--12:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-348-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{317},
  editor =	{Kumar, Amit and Ron-Zewi, Noga},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2024.12},
  URN =		{urn:nbn:de:0030-drops-210050},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2024.12},
  annote =	{Keywords: Set Cover, Joint Replenishment, TCP-Acknowledgment, Subadditive Function Approximation, Multi-Level Aggregation}
}

@InProceedings{ezra_et_al:LIPIcs.APPROX/RANDOM.2024.12,
  author =	{Ezra, Tomer and Leonardi, Stefano and Paw{\l}owski, Micha{\l} and Russo, Matteo and Umboh, Seeun William},
  title =	{{Universal Optimization for Non-Clairvoyant Subadditive Joint Replenishment}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)},
  pages =	{12:1--12:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-348-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{317},
  editor =	{Kumar, Amit and Ron-Zewi, Noga},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2024.12},
  URN =		{urn:nbn:de:0030-drops-210050},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2024.12},
  annote =	{Keywords: Set Cover, Joint Replenishment, TCP-Acknowledgment, Subadditive Function Approximation, Multi-Level Aggregation}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2024.24

Learning-Augmented Maximum Independent Set

Authors: Vladimir Braverman, Prathamesh Dharangutte, Vihan Shah, and Chen Wang

Published in: LIPIcs, Volume 317, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)

Abstract

We study the Maximum Independent Set (MIS) problem on general graphs within the framework of learning-augmented algorithms. The MIS problem is known to be NP-hard and is also NP-hard to approximate to within a factor of n^(1-δ) for any δ > 0. We show that we can break this barrier in the presence of an oracle obtained through predictions from a machine learning model that answers vertex membership queries for a fixed MIS with probability 1/2+ε. In the first setting we consider, the oracle can be queried once per vertex to know if a vertex belongs to a fixed MIS, and the oracle returns the correct answer with probability 1/2 + ε. Under this setting, we show an algorithm that obtains an Õ((√Δ)/ε)-approximation in O(m) time where Δ is the maximum degree of the graph. In the second setting, we allow multiple queries to the oracle for a vertex, each of which is correct with probability 1/2 + ε. For this setting, we show an O(1)-approximation algorithm using O(n/ε²) total queries and Õ(m) runtime.

Cite as

Vladimir Braverman, Prathamesh Dharangutte, Vihan Shah, and Chen Wang. Learning-Augmented Maximum Independent Set. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 317, pp. 24:1-24:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{braverman_et_al:LIPIcs.APPROX/RANDOM.2024.24,
  author =	{Braverman, Vladimir and Dharangutte, Prathamesh and Shah, Vihan and Wang, Chen},
  title =	{{Learning-Augmented Maximum Independent Set}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)},
  pages =	{24:1--24:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-348-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{317},
  editor =	{Kumar, Amit and Ron-Zewi, Noga},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2024.24},
  URN =		{urn:nbn:de:0030-drops-210179},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2024.24},
  annote =	{Keywords: Learning-augmented algorithms, maximum independent set, graph algorithms}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2024.6

Cross Module Quickening - The Curious Case of C Extensions

Authors: Felix Berlakovich and Stefan Brunthaler

Published in: LIPIcs, Volume 313, 38th European Conference on Object-Oriented Programming (ECOOP 2024)

Abstract

Dynamic programming languages such as Python offer expressive power and programmer productivity at the expense of performance. Although the topic of optimizing Python has received considerable attention over the years, a key obstacle remains elusive: C extensions. Time and again, optimized run-time environments, such as JIT compilers and optimizing interpreters, fall short of optimizing across C extensions, as they cannot reason about the native code hiding underneath. To bridge this gap, we present an analysis of C extensions for Python. The analysis data indicates that C extensions come in different varieties. One such variety is to merely speed up a single thing, such as reading a file and processing it directly in C. Another variety offers broad access through an API, resulting in a domain-specific language realized by function calls. While the former variety of C extensions offer little optimization potential for optimizing run-times, we find that the latter variety does offer considerable optimization potential. This optimization potential rests on dynamic locality that C extensions cannot readily tap. We introduce a new, interpreter-based optimization leveraging this untapped optimization potential called Cross-Module Quickening. The key idea is that C extensions can use an optimization interface to register highly-optimized operations on C extension-specific datatypes. A quickening interpreter uses these information to continuously specialize programs with C extensions. To quantify the attainable performance potential of going beyond C extensions, we demonstrate a concrete instantiation of Cross-Module Quickening for the CPython interpreter and the popular NumPy C extension. We evaluate our implementation with the NPBench benchmark suite and report performance improvements by a factor of up to 2.84.

Cite as

Felix Berlakovich and Stefan Brunthaler. Cross Module Quickening - The Curious Case of C Extensions. In 38th European Conference on Object-Oriented Programming (ECOOP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 313, pp. 6:1-6:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{berlakovich_et_al:LIPIcs.ECOOP.2024.6,
  author =	{Berlakovich, Felix and Brunthaler, Stefan},
  title =	{{Cross Module Quickening - The Curious Case of C Extensions}},
  booktitle =	{38th European Conference on Object-Oriented Programming (ECOOP 2024)},
  pages =	{6:1--6:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-341-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{313},
  editor =	{Aldrich, Jonathan and Salvaneschi, Guido},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.6},
  URN =		{urn:nbn:de:0030-drops-208557},
  doi =		{10.4230/LIPIcs.ECOOP.2024.6},
  annote =	{Keywords: interpreter, optimizations, C extensions, Python}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.2

On the Complexity of the Median and Closest Permutation Problems

Authors: Luís Cunha, Ignasi Sau, and Uéverton Souza

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Genome rearrangements are events where large blocks of DNA exchange places during evolution. The analysis of these events is a promising tool for understanding evolutionary genomics, providing data for phylogenetic reconstruction based on genome rearrangement measures. Many pairwise rearrangement distances have been proposed, based on finding the minimum number of rearrangement events to transform one genome into the other, using some predefined operation. When more than two genomes are considered, we have the more challenging problem of rearrangement-based phylogeny reconstruction. Given a set of genomes and a distance notion, there are at least two natural ways to define the "target" genome. On the one hand, finding a genome that minimizes the sum of the distances from this to any other, called the median genome. On the other hand, finding a genome that minimizes the maximum distance to any other, called the closest genome. Considering genomes as permutations of distinct integers, some distance metrics have been extensively studied. We investigate the median and closest problems on permutations over the following metrics: breakpoint distance, swap distance, block-interchange distance, short-block-move distance, and transposition distance. In biological applications some values are usually very small, such as the solution value d or the number k of input permutations. For each of these metrics and parameters d or k, we analyze the closest and the median problems from the viewpoint of parameterized complexity. We obtain the following results: NP-hardness for finding the median/closest permutation regarding some metrics of distance, even for only k = 3 permutations; Polynomial kernels for the problems of finding the median permutation of all studied metrics, considering the target distance d as parameter; NP-hardness result for finding the closest permutation by short-block-moves; FPT algorithms and infeasibility of polynomial kernels for finding the closest permutation for some metrics when parameterized by the target distance d.

Cite as

Luís Cunha, Ignasi Sau, and Uéverton Souza. On the Complexity of the Median and Closest Permutation Problems. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 2:1-2:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{cunha_et_al:LIPIcs.WABI.2024.2,
  author =	{Cunha, Lu{\'\i}s and Sau, Ignasi and Souza, U\'{e}verton},
  title =	{{On the Complexity of the Median and Closest Permutation Problems}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{2:1--2:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.2},
  URN =		{urn:nbn:de:0030-drops-206468},
  doi =		{10.4230/LIPIcs.WABI.2024.2},
  annote =	{Keywords: Median problem, Closest problem, Genome rearrangements, Parameterized complexity}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.4

MEM-Based Pangenome Indexing for k-mer Queries

Authors: Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, and Ben Langmead

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO enables both queries that test k-mer presence/absence (membership queries) and that count the number of genomes containing k-mers in a window (conservation queries). MEMO’s index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8× smaller than a comparable KMC3 index and 11.4× smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 seconds, 2.5× faster than other approaches. MEMO’s small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.

Cite as

Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, and Ben Langmead. MEM-Based Pangenome Indexing for k-mer Queries. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 4:1-4:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{hwang_et_al:LIPIcs.WABI.2024.4,
  author =	{Hwang, Stephen and Brown, Nathaniel K. and Ahmed, Omar Y. and Jenike, Katharine M. and Kovaka, Sam and Schatz, Michael C. and Langmead, Ben},
  title =	{{MEM-Based Pangenome Indexing for k-mer Queries}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{4:1--4:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.4},
  URN =		{urn:nbn:de:0030-drops-206482},
  doi =		{10.4230/LIPIcs.WABI.2024.4},
  annote =	{Keywords: Pangenomics, Comparative genomics, Compressed indexing}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.8

Applying the Safe-And-Complete Framework to Practical Genome Assembly

Authors: Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Despite the long history of genome assembly research, there remains a large gap between the theoretical and practical work. There is practical software with little theoretical underpinning of accuracy on one hand and theoretical algorithms which have not been adopted in practice on the other. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs (simple omnitigs), giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the D. melanogaster and the C. elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible additional computational costs and either no or a small increase in the number of misassemblies.

Cite as

Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu. Applying the Safe-And-Complete Framework to Practical Genome Assembly. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{schmidt_et_al:LIPIcs.WABI.2024.8,
  author =	{Schmidt, Sebastian and Toivonen, Santeri and Medvedev, Paul and Tomescu, Alexandru I.},
  title =	{{Applying the Safe-And-Complete Framework to Practical Genome Assembly}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{8:1--8:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.8},
  URN =		{urn:nbn:de:0030-drops-206520},
  doi =		{10.4230/LIPIcs.WABI.2024.8},
  annote =	{Keywords: Genome assembly, Omnitigs, Safe-and-complete framework, graph algorithm, HiFi sequencing data, Assembly evaluation}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.17

A*PA2: Up to 19× Faster Exact Global Alignment

Authors: Ragnar Groot Koerkamp

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Motivation. Pairwise alignment is at the core of computational biology. Most commonly used exact methods are either based on O(ns) band doubling or O(n+s²) diagonal transition, where n is the sequence length and s the number of errors. However, as the length of sequences has grown, these exact methods are often replaced by approximate methods based on e.g. seed-and-extend and heuristics to bound the computed region. We would like to develop an exact method that matches the performance of these approximate methods. Recently, Astarix introduced the A* shortest path algorithm with the seed heuristic for exact sequence-to-graph alignment. A*PA adapted and improved this for pairwise sequence alignment and achieves near-linear runtime when divergence (error rate) is low, at the cost of being very slow when divergence is high. Methods. We introduce A*PA2, an exact global pairwise aligner with respect to edit distance. The goal of A*PA2 is to unify the near-linear runtime of A*PA on similar sequences with the efficiency of dynamic programming (DP) based methods. Like Edlib, A*PA2 uses Ukkonen’s band doubling in combination with Myers' bitpacking. A*PA2 1) uses large block sizes inspired by Block Aligner, 2) extends this with SIMD (single instruction, multiple data), 3) introduces a new profile for efficient computations, 4) introduces a new optimistic technique for traceback based on diagonal transition, 5) avoids recomputation of states where possible, and 6) applies the heuristics developed in A*PA and improves them using pre-pruning. Results. With the first 4 engineering optimizations, A*PA2-simple has complexity O(ns) and is 6× to 8× faster than Edlib for sequences ≥ 10 kbp. A*PA2-full also includes the heuristic and is often near-linear in practice for sequences with small divergence. The average runtime of A*PA2 is 19× faster than the exact aligners BiWFA and Edlib on >500 kbp long ONT (Oxford Nanopore Technologies) reads of a human genome having 6% divergence on average. On shorter ONT reads of 11% average divergence the speedup is 5.6× (avg. length 11 kbp) and 0.81× (avg. length 800 bp). On all tested datasets, A*PA2 is competitive with or faster than approximate methods.

Cite as

Ragnar Groot Koerkamp. A*PA2: Up to 19× Faster Exact Global Alignment. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 17:1-17:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{grootkoerkamp:LIPIcs.WABI.2024.17,
  author =	{Groot Koerkamp, Ragnar},
  title =	{{A*PA2: Up to 19× Faster Exact Global Alignment}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{17:1--17:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.17},
  URN =		{urn:nbn:de:0030-drops-206610},
  doi =		{10.4230/LIPIcs.WABI.2024.17},
  annote =	{Keywords: Edit distance, Pairwise alignment, A*, Shortest path, Dynamic programming}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.22

Anchorage Accurately Assembles Anchor-Flanked Synthetic Long Reads

Authors: Xiaofei Carl Zang, Xiang Li, Kyle Metcalfe, Tuval Ben-Yehezkel, Ryan Kelley, and Mingfu Shao

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Modern sequencing technologies allow for the addition of short-sequence tags, known as anchors, to both ends of a captured molecule. Anchors are useful in assembling the full-length sequence of a captured molecule as they can be used to accurately determine the endpoints. One representative of such anchor-enabled technology is LoopSeq Solo, a synthetic long read (SLR) sequencing protocol. LoopSeq Solo also achieves ultra-high sequencing depth and high purity of short reads covering the entire captured molecule. Despite the availability of many assembly methods, constructing full-length sequence from these anchor-enabled, ultra-high coverage sequencing data remains challenging due to the complexity of the underlying assembly graphs and the lack of specific algorithms leveraging anchors. We present Anchorage, a novel assembler that performs anchor-guided assembly for ultra-high-depth sequencing data. Anchorage starts with a kmer-based approach for precise estimation of molecule lengths. It then formulates the assembly problem as finding an optimal path that connects the two nodes determined by anchors in the underlying compact de Bruijn graph. The optimality is defined as maximizing the weight of the smallest node while matching the estimated sequence length. Anchorage uses a modified dynamic programming algorithm to efficiently find the optimal path. Through both simulations and real data, we show that Anchorage outperforms existing assembly methods, particularly in the presence of sequencing artifacts. Anchorage fills the gap in assembling anchor-enabled data. We anticipate its broad use as anchor-enabled sequencing technologies become prevalent. Anchorage is freely available at https://github.com/Shao-Group/anchorage; the scripts and documents that can reproduce all experiments in this manuscript are available at https://github.com/Shao-Group/anchorage-test.

Cite as

Xiaofei Carl Zang, Xiang Li, Kyle Metcalfe, Tuval Ben-Yehezkel, Ryan Kelley, and Mingfu Shao. Anchorage Accurately Assembles Anchor-Flanked Synthetic Long Reads. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 22:1-22:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{zang_et_al:LIPIcs.WABI.2024.22,
  author =	{Zang, Xiaofei Carl and Li, Xiang and Metcalfe, Kyle and Ben-Yehezkel, Tuval and Kelley, Ryan and Shao, Mingfu},
  title =	{{Anchorage Accurately Assembles Anchor-Flanked Synthetic Long Reads}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{22:1--22:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.22},
  URN =		{urn:nbn:de:0030-drops-206660},
  doi =		{10.4230/LIPIcs.WABI.2024.22},
  annote =	{Keywords: Genome assembly, de Bruijn graph, synthetic long reads, anchor-guided assembly, LoopSeq}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.23

AlfaPang: Alignment Free Algorithm for Pangenome Graph Construction

Authors: Adam Cicherski, Anna Lisiecka, and Norbert Dojer

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

The success of pangenome-based approaches to genomics analysis depends largely on the existence of efficient methods for constructing pangenome graphs that are applicable to large genome collections. In the current paper we present AlfaPang, a new pangenome graph building algorithm. AlfaPang is based on a novel alignment-free approach that allows to construct pangenome graphs using significantly less computational resources than state-of-the-art tools. The code of AlfaPang is freely available at https://github.com/AdamCicherski/AlfaPang.

Cite as

Adam Cicherski, Anna Lisiecka, and Norbert Dojer. AlfaPang: Alignment Free Algorithm for Pangenome Graph Construction. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 23:1-23:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{cicherski_et_al:LIPIcs.WABI.2024.23,
  author =	{Cicherski, Adam and Lisiecka, Anna and Dojer, Norbert},
  title =	{{AlfaPang: Alignment Free Algorithm for Pangenome Graph Construction}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{23:1--23:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.23},
  URN =		{urn:nbn:de:0030-drops-206673},
  doi =		{10.4230/LIPIcs.WABI.2024.23},
  annote =	{Keywords: pangenome, variation graph, genome alignment, population genomics}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.99

Constrained Level Planarity Is FPT with Respect to the Vertex Cover Number

Authors: Boris Klemz and Marie Diana Sieper

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

The problem Level Planarity asks for a crossing-free drawing of a graph in the plane such that vertices are placed at prescribed y-coordinates (called levels) and such that every edge is realized as a y-monotone curve. In the variant Constrained Level Planarity, each level y is equipped with a partial order ≺_y on its vertices and in the desired drawing the left-to-right order of vertices on level y has to be a linear extension of ≺_y. Constrained Level Planarity is known to be a remarkably difficult problem: previous results by Klemz and Rote [ACM Trans. Alg.'19] and by Brückner and Rutter [SODA'17] imply that it remains NP-hard even when restricted to graphs whose tree-depth and feedback vertex set number are bounded by a constant and even when the instances are additionally required to be either proper, meaning that each edge spans two consecutive levels, or ordered, meaning that all given partial orders are total orders. In particular, these results rule out the existence of FPT-time (even XP-time) algorithms with respect to these and related graph parameters (unless P=NP). However, the parameterized complexity of Constrained Level Planarity with respect to the vertex cover number of the input graph remained open. In this paper, we show that Constrained Level Planarity can be solved in FPT-time when parameterized by the vertex cover number. In view of the previous intractability statements, our result is best-possible in several regards: a speed-up to polynomial time or a generalization to the aforementioned smaller graph parameters is not possible, even if restricting to proper or ordered instances.

Cite as

Boris Klemz and Marie Diana Sieper. Constrained Level Planarity Is FPT with Respect to the Vertex Cover Number. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 99:1-99:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{klemz_et_al:LIPIcs.ICALP.2024.99,
  author =	{Klemz, Boris and Sieper, Marie Diana},
  title =	{{Constrained Level Planarity Is FPT with Respect to the Vertex Cover Number}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{99:1--99:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.99},
  URN =		{urn:nbn:de:0030-drops-202428},
  doi =		{10.4230/LIPIcs.ICALP.2024.99},
  annote =	{Keywords: Parameterized Complexity, Graph Drawing, Planar Poset Diagram, Level Planarity, Constrained Level Planarity, Vertex Cover, FPT, Computational Geometry}
}

@InProceedings{klemz_et_al:LIPIcs.ICALP.2024.99,
  author =	{Klemz, Boris and Sieper, Marie Diana},
  title =	{{Constrained Level Planarity Is FPT with Respect to the Vertex Cover Number}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{99:1--99:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.99},
  URN =		{urn:nbn:de:0030-drops-202428},
  doi =		{10.4230/LIPIcs.ICALP.2024.99},
  annote =	{Keywords: Parameterized Complexity, Graph Drawing, Planar Poset Diagram, Level Planarity, Constrained Level Planarity, Vertex Cover, FPT, Computational Geometry}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2024.31

Geometric Matching and Bottleneck Problems

Authors: Sergio Cabello, Siu-Wing Cheng, Otfried Cheong, and Christian Knauer

Published in: LIPIcs, Volume 293, 40th International Symposium on Computational Geometry (SoCG 2024)

Abstract

Let P be a set of at most n points and let R be a set of at most n geometric ranges, such as disks and rectangles, where each p ∈ P has an associated supply s_{p} > 0, and each r ∈ R has an associated demand d_r > 0. A (many-to-many) matching is a set 𝒜 of ordered triples (p,r,a_{pr}) ∈ P × R × ℝ_{> 0} such that p ∈ r and the a_{pr}’s satisfy the constraints given by the supplies and demands. We show how to compute a maximum matching, that is, a matching maximizing ∑_{(p,r,a_{pr}) ∈ 𝒜} a_{pr}. Using our techniques, we can also solve minimum bottleneck problems, such as computing a perfect matching between a set of n red points P and a set of n blue points Q that minimizes the length of the longest edge. For the L_∞-metric, we can do this in time O(n^{1+ε}) in any fixed dimension, for the L₂-metric in the plane in time O(n^{4/3 + ε}), for any ε > 0.

Cite as

Sergio Cabello, Siu-Wing Cheng, Otfried Cheong, and Christian Knauer. Geometric Matching and Bottleneck Problems. In 40th International Symposium on Computational Geometry (SoCG 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 293, pp. 31:1-31:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{cabello_et_al:LIPIcs.SoCG.2024.31,
  author =	{Cabello, Sergio and Cheng, Siu-Wing and Cheong, Otfried and Knauer, Christian},
  title =	{{Geometric Matching and Bottleneck Problems}},
  booktitle =	{40th International Symposium on Computational Geometry (SoCG 2024)},
  pages =	{31:1--31:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-316-4},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{293},
  editor =	{Mulzer, Wolfgang and Phillips, Jeff M.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2024.31},
  URN =		{urn:nbn:de:0030-drops-199768},
  doi =		{10.4230/LIPIcs.SoCG.2024.31},
  annote =	{Keywords: Many-to-many matching, bipartite, planar, geometric, approximation}
}

Document

DOI: 10.4230/DagRep.13.5.165

Computational Geometry (Dagstuhl Seminar 23221)

Authors: Siu-Wing Cheng, Maarten Löffler, Jeff M. Phillips, and Aleksandr Popov

Published in: Dagstuhl Reports, Volume 13, Issue 5 (2023)

Abstract

This report documents the program and the outcomes of the Dagstuhl Seminar 23221 "Computational Geometry". The seminar was held from May 29th to June 2nd, 2023, and 39 participants from various countries attended it, including two remote participants. Recent advances in computational geometry were presented and discussed, and new challenges were identified. This report collects the abstracts of the talks and the open problems presented at the seminar.

Cite as

Siu-Wing Cheng, Maarten Löffler, Jeff M. Phillips, and Aleksandr Popov. Computational Geometry (Dagstuhl Seminar 23221). In Dagstuhl Reports, Volume 13, Issue 5, pp. 165-181, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{cheng_et_al:DagRep.13.5.165,
  author =	{Cheng, Siu-Wing and L\"{o}ffler, Maarten and Phillips, Jeff M. and Popov, Aleksandr},
  title =	{{Computational Geometry (Dagstuhl Seminar 23221)}},
  pages =	{165--181},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2023},
  volume =	{13},
  number =	{5},
  editor =	{Cheng, Siu-Wing and L\"{o}ffler, Maarten and Phillips, Jeff M. and Popov, Aleksandr},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.13.5.165},
  URN =		{urn:nbn:de:0030-drops-193692},
  doi =		{10.4230/DagRep.13.5.165},
  annote =	{Keywords: Algorithms, Combinatorics, Geometric Computing, Reconfiguration, Uncertainty}
}

95 Search Results for "Cheng, Siu-Wing"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message