Search Results

Documents authored by Hübner, Lukas


Artifact
Software
gfkit

Authors: Lukas Hübner and Alexandros Stamatakis


Abstract

Cite as

Lukas Hübner, Alexandros Stamatakis. gfkit (Software, Source Code). Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@misc{dagstuhl-artifact-22505,
   title = {{gfkit}}, 
   author = {H\"{u}bner, Lukas and Stamatakis, Alexandros},
   note = {Software, swhId: \href{https://archive.softwareheritage.org/swh:1:dir:bffece502c3a579271ac3d91d66b051d089d02f2;origin=https://github.com/lukashuebner/gfkit;visit=swh:1:snp:52733351ce8859561ad78da4ab95296b29658f03;anchor=swh:1:rev:63871e016457fbefa4988bc17c21340f07f9a0f9}{\texttt{swh:1:dir:bffece502c3a579271ac3d91d66b051d089d02f2}} (visited on 2024-11-28)},
   url = {https://github.com/lukashuebner/gfkit},
   doi = {10.4230/artifacts.22505},
}
Document
Memoization on Shared Subtrees Accelerates Computations on Genealogical Forests

Authors: Lukas Hübner and Alexandros Stamatakis

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)


Abstract
The field of population genetics attempts to advance our understanding of evolutionary processes. It has applications, for example, in medical research, wildlife conservation, and - in conjunction with recent advances in ancient DNA sequencing technology - studying human migration patterns over the past few thousand years. The basic toolbox of population genetics includes genealogical trees, which describe the shared evolutionary history among individuals of the same species. They are calculated on the basis of genetic variations. However, in recombining organisms, a single tree is insufficient to describe the evolutionary history of the whole genome. Instead, a collection of correlated trees can be used, where each describes the evolutionary history of a consecutive region of the genome. The current corresponding state of-the-art data structure, tree sequences, compresses these genealogical trees via edit operations when moving from one tree to the next along the genome instead of storing the full, often redundant, description for each tree. We propose a new data structure, genealogical forests, which compresses the set of genealogical trees into a DAG. In this DAG identical subtrees that are shared across the input trees are encoded only once, thereby allowing for straight-forward memoization of intermediate results. Additionally, we provide a C++ implementation of our proposed data structure, called gfkit, which is 2.1 to 11.2 (median 4.0) times faster than the state-of-the-art tool on empirical and simulated datasets at computing important population genetics statistics such as the Allele Frequency Spectrum, Patterson’s f, the Fixation Index, Tajima’s D, pairwise Lowest Common Ancestors, and others. On Lowest Common Ancestor queries with more than two samples as input, gfkit scales asymptotically better than the state-of-the-art, and is thus up to 990 times faster. In conclusion, our proposed data structure compresses genealogical trees by storing shared subtrees only once, thereby enabling straight-forward memoization of intermediate results, yielding a substantial runtime reduction and a potentially more intuitive data representation over the state-of-the-art. Our improvements will boost the development of novel analyses and models in the field of population genetics and increases scalability to ever-growing genomic datasets.

Cite as

Lukas Hübner and Alexandros Stamatakis. Memoization on Shared Subtrees Accelerates Computations on Genealogical Forests. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 5:1-5:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{hubner_et_al:LIPIcs.WABI.2024.5,
  author =	{H\"{u}bner, Lukas and Stamatakis, Alexandros},
  title =	{{Memoization on Shared Subtrees Accelerates Computations on Genealogical Forests}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{5:1--5:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.5},
  URN =		{urn:nbn:de:0030-drops-206499},
  doi =		{10.4230/LIPIcs.WABI.2024.5},
  annote =	{Keywords: bioinformatics, population genetics, algorithms}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail