DROPS

DOI: 10.4230/LIPIcs.WABI.2024.4

MEM-Based Pangenome Indexing for k-mer Queries

Authors: Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, and Ben Langmead

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO supports both queries that test k-mer presence/absence (membership queries) and queries that count the number of genomes containing k-mers in a window (conservation queries). MEMO’s index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8× smaller than a comparable KMC3 index and 11.4× smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 seconds, 2.5× faster than other approaches. MEMO’s small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.
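
To make the two query types concrete, the following Python sketch computes them by brute force over plain strings. It is not MEMO's MEM-based index and says nothing about its performance; the function names, the half-open window convention and the toy sequences are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def window_kmers(reference, start, end, k):
    """List the k-mers lying fully inside reference[start:end] (assumed half-open window)."""
    return [reference[i:i + k] for i in range(start, end - k + 1)]


def membership_query(reference, start, end, k, genome):
    """For each k-mer in the window, report whether it occurs in genome."""
    present = kmers(genome, k)
    return [kmer in present for kmer in window_kmers(reference, start, end, k)]


def conservation_query(reference, start, end, k, genomes):
    """For each k-mer in the window, count how many genomes contain it."""
    kmer_sets = [kmers(g, k) for g in genomes]
    return [sum(kmer in s for s in kmer_sets)
            for kmer in window_kmers(reference, start, end, k)]


# Toy data (hypothetical): one reference window and three "genomes".
reference = "ACGTAC"
genomes = ["ACGTT", "CGTAC", "GGGG"]
print(membership_query(reference, 0, 6, 3, genomes[0]))
print(conservation_query(reference, 0, 6, 3, genomes))
```

This baseline rebuilds a k-mer set for every value of k, which is the dependence on k that the abstract says a single MEMO index avoids.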

Cite as

Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, and Ben Langmead. MEM-Based Pangenome Indexing for k-mer Queries. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 4:1-4:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{hwang_et_al:LIPIcs.WABI.2024.4,
  author =	{Hwang, Stephen and Brown, Nathaniel K. and Ahmed, Omar Y. and Jenike, Katharine M. and Kovaka, Sam and Schatz, Michael C. and Langmead, Ben},
  title =	{{MEM-Based Pangenome Indexing for k-mer Queries}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{4:1--4:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.4},
  URN =		{urn:nbn:de:0030-drops-206482},
  doi =		{10.4230/LIPIcs.WABI.2024.4},
  annote =	{Keywords: Pangenomics, Comparative genomics, Compressed indexing}
}

DOI: 10.4230/LIPIcs.ESA.2023.61

Solving Edge Clique Cover Exactly via Synergistic Data Reduction

Authors: Anthony Hevia, Benjamin Kallus, Summer McClintic, Samantha Reisner, Darren Strash, and Johnathan Wilson

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Abstract

The edge clique cover (ECC) problem, where the goal is to find a minimum-cardinality set of cliques that cover all the edges of a graph, is a classic NP-hard problem that has received much attention from both the theoretical and experimental algorithms communities. While small sparse graphs can be solved exactly via the branch-and-reduce algorithm of Gramm et al. [JEA 2009], larger instances can currently only be solved inexactly using heuristics with unknown overall solution quality. We revisit computing minimum ECCs exactly in practice by combining data reduction for both the ECC and vertex clique cover (VCC) problems. We do so by modifying the polynomial-time reduction of Kou et al. [Commun. ACM 1978] to transform a reduced ECC instance to a VCC instance; alternatively, we show it is possible to "lift" some VCC reductions to the ECC problem. Our experiments show that combining data reduction for both problems (which we call synergistic data reduction) enables finding exact minimum ECCs orders of magnitude faster than the technique of Gramm et al., and allows solving large sparse graphs on up to millions of vertices and edges that have never before been solved. With these new exact solutions, we evaluate the quality of recent heuristic algorithms on large instances for the first time. The most recent of these, EO-ECC by Abdullah et al. [ICCS 2022], solves 8 of the 27 instances for which we have exact solutions. It is our hope that our strategy rallies researchers to seek improved algorithms for the ECC problem.
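
As a small illustration of the problem definition only, the Python sketch below checks whether a candidate set of cliques is a valid edge clique cover of a graph. It does not implement the paper's data reductions or any solver; the function name `is_edge_clique_cover` and the toy graph are assumptions.

```python
from itertools import combinations


def is_edge_clique_cover(edges, cliques):
    """Check that every vertex set in cliques is a clique of the graph
    and that together they cover every edge."""
    edge_set = {frozenset(e) for e in edges}
    covered = set()
    for clique in cliques:
        for u, v in combinations(clique, 2):
            pair = frozenset((u, v))
            if pair not in edge_set:
                return False  # two vertices of this set are not adjacent
            covered.add(pair)
    return covered == edge_set


# Toy graph (hypothetical): a triangle 1-2-3 with a pendant edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(is_edge_clique_cover(edges, [{1, 2, 3}, {3, 4}]))
print(is_edge_clique_cover(edges, [{1, 2, 3}]))
```

The checker only verifies validity; finding a cover of minimum cardinality is the NP-hard part that the abstract addresses.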

Cite as

Anthony Hevia, Benjamin Kallus, Summer McClintic, Samantha Reisner, Darren Strash, and Johnathan Wilson. Solving Edge Clique Cover Exactly via Synergistic Data Reduction. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 61:1-61:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{hevia_et_al:LIPIcs.ESA.2023.61,
  author =	{Hevia, Anthony and Kallus, Benjamin and McClintic, Summer and Reisner, Samantha and Strash, Darren and Wilson, Johnathan},
  title =	{{Solving Edge Clique Cover Exactly via Synergistic Data Reduction}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{61:1--61:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.61},
  URN =		{urn:nbn:de:0030-drops-187148},
  doi =		{10.4230/LIPIcs.ESA.2023.61},
  annote =	{Keywords: Edge clique cover, Vertex clique cover, Data reduction, Degeneracy}
}

DOI: 10.4230/LIPIcs.OPODIS.2022.20

Computing Power of Hybrid Models in Synchronous Networks

Authors: Pierre Fraigniaud, Pedro Montealegre, Pablo Paredes, Ivan Rapaport, Martín Ríos-Wilson, and Ioan Todinca

Published in: LIPIcs, Volume 253, 26th International Conference on Principles of Distributed Systems (OPODIS 2022)

Abstract

During the last two decades, a small set of distributed computing models for networks has emerged, among which LOCAL, CONGEST, and Broadcast Congested Clique (BCC) play a prominent role. We consider hybrid models resulting from combining these three models. That is, we analyze the computing power of models that can, say, perform a constant number of rounds of CONGEST, then a constant number of rounds of LOCAL, then a constant number of rounds of BCC, possibly repeating this pattern a constant number of times. We specifically focus on 2-round models, and we establish the complete picture of the relative powers of these models. That is, for every pair of such models, we determine whether one is (strictly) stronger than the other, or whether the two models are incomparable. The separation results are obtained by approaching communication complexity through an original angle, which may be of independent interest. The two players are not bound to compute the value of a binary function, but the combined outputs of the two players are constrained by this value. In particular, we introduce the XOR-Index problem, in which Alice is given a binary vector x ∈ {0,1}ⁿ together with an index i ∈ [n], Bob is given a binary vector y ∈ {0,1}ⁿ together with an index j ∈ [n], and, after a single round of 2-way communication, Alice must output a boolean out_A, and Bob must output a boolean out_B, such that out_A ∧ out_B = x_j ⊕ y_i. We show that the communication complexity of XOR-Index is Ω(n) bits.
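
The output condition of XOR-Index can be written as a short predicate. The Python sketch below also includes a trivial protocol, not taken from the paper, in which Alice sends her whole input to Bob. It is meant only to show that about n bits suffice, which is consistent with the stated Ω(n) bound. The function names and the 0-based indexing are assumptions.

```python
from itertools import product


def is_valid_output(x, i, y, j, out_a, out_b):
    """Check the XOR-Index condition: out_A AND out_B == x_j XOR y_i."""
    return (out_a and out_b) == bool(x[j] ^ y[i])


def trivial_protocol(x, i, y, j):
    """Alice sends (x, i) to Bob in one round; Bob's message is unused.

    Alice's message costs n bits plus the bits needed to encode i.
    """
    message_from_alice = (tuple(x), i)
    received_x, received_i = message_from_alice
    out_b = bool(received_x[j] ^ y[received_i])  # Bob now knows x_j and y_i
    out_a = True  # Alice outputs True, so the AND equals Bob's bit
    return out_a, out_b


# Exhaustive check on all inputs of length n = 3 (0-based indices assumed).
n = 3
ok = all(
    is_valid_output(x, i, y, j, *trivial_protocol(x, i, y, j))
    for x in product((0, 1), repeat=n)
    for y in product((0, 1), repeat=n)
    for i in range(n)
    for j in range(n)
)
print(ok)
```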

Cite as

Pierre Fraigniaud, Pedro Montealegre, Pablo Paredes, Ivan Rapaport, Martín Ríos-Wilson, and Ioan Todinca. Computing Power of Hybrid Models in Synchronous Networks. In 26th International Conference on Principles of Distributed Systems (OPODIS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 253, pp. 20:1-20:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{fraigniaud_et_al:LIPIcs.OPODIS.2022.20,
  author =	{Fraigniaud, Pierre and Montealegre, Pedro and Paredes, Pablo and Rapaport, Ivan and R{\'\i}os-Wilson, Mart{\'\i}n and Todinca, Ioan},
  title =	{{Computing Power of Hybrid Models in Synchronous Networks}},
  booktitle =	{26th International Conference on Principles of Distributed Systems (OPODIS 2022)},
  pages =	{20:1--20:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-265-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{253},
  editor =	{Hillel, Eshcar and Palmieri, Roberto and Rivi\`{e}re, Etienne},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2022.20},
  URN =		{urn:nbn:de:0030-drops-176401},
  doi =		{10.4230/LIPIcs.OPODIS.2022.20},
  annote =	{Keywords: hybrid model, synchronous networks, LOCAL, CONGEST, Broadcast Congested Clique}
}

Brief Announcement

DOI: 10.4230/LIPIcs.DISC.2022.43

Brief Announcement: Computing Power of Hybrid Models in Synchronous Networks

Authors: Pierre Fraigniaud, Pedro Montealegre, Pablo Paredes, Ivan Rapaport, Martín Ríos-Wilson, and Ioan Todinca

Published in: LIPIcs, Volume 246, 36th International Symposium on Distributed Computing (DISC 2022)

Abstract

During the last two decades, a small set of distributed computing models for networks has emerged, among which LOCAL, CONGEST, and Broadcast Congested Clique (BCC) play a prominent role. We consider hybrid models resulting from combining these three models. That is, we analyze the computing power of models that can, say, perform a constant number of rounds of CONGEST, then a constant number of rounds of LOCAL, then a constant number of rounds of BCC, possibly repeating this pattern a constant number of times. We specifically focus on 2-round models, and we establish the complete picture of the relative powers of these models. That is, for every pair of such models, we determine whether one is (strictly) stronger than the other, or whether the two models are incomparable.

Cite as

Pierre Fraigniaud, Pedro Montealegre, Pablo Paredes, Ivan Rapaport, Martín Ríos-Wilson, and Ioan Todinca. Brief Announcement: Computing Power of Hybrid Models in Synchronous Networks. In 36th International Symposium on Distributed Computing (DISC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 246, pp. 43:1-43:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{fraigniaud_et_al:LIPIcs.DISC.2022.43,
  author =	{Fraigniaud, Pierre and Montealegre, Pedro and Paredes, Pablo and Rapaport, Ivan and R{\'\i}os-Wilson, Mart{\'\i}n and Todinca, Ioan},
  title =	{{Brief Announcement: Computing Power of Hybrid Models in Synchronous Networks}},
  booktitle =	{36th International Symposium on Distributed Computing (DISC 2022)},
  pages =	{43:1--43:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-255-6},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{246},
  editor =	{Scheideler, Christian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2022.43},
  URN =		{urn:nbn:de:0030-drops-172345},
  doi =		{10.4230/LIPIcs.DISC.2022.43},
  annote =	{Keywords: hybrid model, synchronous networks, LOCAL, CONGEST, Broadcast Congested Clique}
}

DOI: 10.4230/LIPIcs.ICDT.2019.24

Learning Definable Hypotheses on Trees

Authors: Emilie Grienenberger and Martin Ritzert

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

We study the problem of learning properties of nodes in tree structures. Those properties are specified by logical formulas, such as formulas from first-order or monadic second-order logic. We think of the tree as a database encoding a large dataset and therefore aim for learning algorithms which depend at most sublinearly on the size of the tree. We present a learning algorithm for quantifier-free formulas where the running time only depends polynomially on the number of training examples, but not on the size of the background structure. By a previous result on strings we know that for general first-order or monadic second-order (MSO) formulas a sublinear running time cannot be achieved. However, we show that by building an index on the tree in a linear time preprocessing phase, we can achieve a learning algorithm for MSO formulas with a logarithmic learning phase.

Cite as

Emilie Grienenberger and Martin Ritzert. Learning Definable Hypotheses on Trees. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 24:1-24:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{grienenberger_et_al:LIPIcs.ICDT.2019.24,
  author =	{Grienenberger, Emilie and Ritzert, Martin},
  title =	{{Learning Definable Hypotheses on Trees}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{24:1--24:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.24},
  URN =		{urn:nbn:de:0030-drops-103261},
  doi =		{10.4230/LIPIcs.ICDT.2019.24},
  annote =	{Keywords: monadic second-order logic, trees, query learning}
}

5 Search Results for "Ríos-Wilson, Martín"
