Search Results

Documents authored by Brown, Nathaniel K.


Artifact
Software
StephenHwang/MEMO

Authors: Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, and Ben Langmead


Abstract

Cite as

Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, Ben Langmead. StephenHwang/MEMO (Software, Source Code). Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@misc{dagstuhl-artifact-22508,
   title = {{StephenHwang/MEMO}}, 
   author = {Hwang, Stephen and Brown, Nathaniel K. and Ahmed, Omar Y. and Jenike, Katharine M. and Kovaka, Sam and Schatz, Michael C. and Langmead, Ben},
   note = {Software, version 1.0.0., swhId: \href{https://archive.softwareheritage.org/swh:1:dir:793f47e3260ebae1887b07175fe3087c8e93d1f8;origin=https://github.com/StephenHwang/MEMO;visit=swh:1:snp:b23bfa6e000a68e85c5b91961d022de194b4b86b;anchor=swh:1:rev:d61a1a995b8027ae3d3dbe449502e952321f7217}{\texttt{swh:1:dir:793f47e3260ebae1887b07175fe3087c8e93d1f8}} (visited on 2024-11-28)},
   url = {https://github.com/StephenHwang/MEMO},
   doi = {10.4230/artifacts.22508},
}
Artifact
Software
StephenHwang/MEMO_experiments

Authors: Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, and Ben Langmead


Abstract

Cite as

Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, Ben Langmead. StephenHwang/MEMO_experiments (Software, Experiments performed for paper). Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@misc{dagstuhl-artifact-22509,
   title = {{StephenHwang/MEMO\underlineexperiments}}, 
   author = {Hwang, Stephen and Brown, Nathaniel K. and Ahmed, Omar Y. and Jenike, Katharine M. and Kovaka, Sam and Schatz, Michael C. and Langmead, Ben},
   note = {Software, swhId: \href{https://archive.softwareheritage.org/swh:1:dir:d69ad61b0d1d563b3945a978b1396fd81be04732;origin=https://github.com/StephenHwang/MEMO_experiments;visit=swh:1:snp:c6a9c4193f1f39f83e8987cf1f9dda2ad2fc3e2d;anchor=swh:1:rev:b47d8f5f8a1d7ff511dad707c79f168feef8469f}{\texttt{swh:1:dir:d69ad61b0d1d563b3945a978b1396fd81be04732}} (visited on 2024-11-28)},
   url = {https://github.com/StephenHwang/MEMO_experiments},
   doi = {10.4230/artifacts.22509},
}
Document
MEM-Based Pangenome Indexing for k-mer Queries

Authors: Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, and Ben Langmead

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)


Abstract
Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO enables both queries that test k-mer presence/absence (membership queries) and that count the number of genomes containing k-mers in a window (conservation queries). MEMO’s index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8× smaller than a comparable KMC3 index and 11.4× smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 seconds, 2.5× faster than other approaches. MEMO’s small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.

Cite as

Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, and Ben Langmead. MEM-Based Pangenome Indexing for k-mer Queries. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 4:1-4:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{hwang_et_al:LIPIcs.WABI.2024.4,
  author =	{Hwang, Stephen and Brown, Nathaniel K. and Ahmed, Omar Y. and Jenike, Katharine M. and Kovaka, Sam and Schatz, Michael C. and Langmead, Ben},
  title =	{{MEM-Based Pangenome Indexing for k-mer Queries}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{4:1--4:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.4},
  URN =		{urn:nbn:de:0030-drops-206482},
  doi =		{10.4230/LIPIcs.WABI.2024.4},
  annote =	{Keywords: Pangenomics, Comparative genomics, Compressed indexing}
}
Document
RLBWT Tricks

Authors: Nathaniel K. Brown, Travis Gagie, and Massimiliano Rossi

Published in: LIPIcs, Volume 233, 20th International Symposium on Experimental Algorithms (SEA 2022)


Abstract
Until recently, most experts would probably have agreed we cannot backwards-step in constant time with a run-length compressed Burrows-Wheeler Transform (RLBWT), since doing so relies on rank queries on sparse bitvectors and those inherit lower bounds from predecessor queries. At ICALP '21, however, Nishimoto and Tabei described a new, simple and constant-time implementation. For a permutation π, it stores an O (r)-space table - where r is the number of positions i where either i = 0 or π (i + 1) ≠ π (i) + 1 - that enables the computation of successive values of π(i) by table look-ups and linear scans. Nishimoto and Tabei showed how to increase the number of rows in the table to bound the length of the linear scans such that the query time for computing π(i) is constant while maintaining O (r)-space. In this paper we refine Nishimoto and Tabei’s approach, including a time-space tradeoff, and experimentally evaluate different implementations demonstrating the practicality of part of their result. We show that even without adding rows to the table, in practice we almost always scan only a few entries during queries. We propose a decomposition scheme of the permutation π corresponding to the LF-mapping that allows an improved compression of the data structure, while limiting the query time. We tested our implementation on real-world genomic datasets and found that without compression of the table, backward-stepping is drastically faster than with sparse bitvector implementations but, unfortunately, also uses drastically more space. After compression, backward-stepping is competitive both in time and space with the best existing implementations.

Cite as

Nathaniel K. Brown, Travis Gagie, and Massimiliano Rossi. RLBWT Tricks. In 20th International Symposium on Experimental Algorithms (SEA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 233, pp. 16:1-16:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{brown_et_al:LIPIcs.SEA.2022.16,
  author =	{Brown, Nathaniel K. and Gagie, Travis and Rossi, Massimiliano},
  title =	{{RLBWT Tricks}},
  booktitle =	{20th International Symposium on Experimental Algorithms (SEA 2022)},
  pages =	{16:1--16:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-251-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{233},
  editor =	{Schulz, Christian and U\c{c}ar, Bora},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2022.16},
  URN =		{urn:nbn:de:0030-drops-165500},
  doi =		{10.4230/LIPIcs.SEA.2022.16},
  annote =	{Keywords: Compressed String Indexes, Repetitive Text Collections, Burrows-Wheeler Transform}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail