DROPS

Document

DOI: 10.4230/LIPIcs.FSTTCS.2025.15

Extending EFX Allocations to Further Multi-Graph Classes

Authors: Umang Bhaskar and Yeshwant Pandit

Published in: LIPIcs, Volume 360, 45th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2025)

Abstract

The existence of EFX allocations is one of the most significant open questions in fair division. Recent work by Christodoulou, Fiat, Koutsoupias, and Sgouritsa ("Fair allocation in graphs," EC 2023) establishes the existence of EFX allocations for graphical valuations, when agents are vertices in a graph, items are edges, and each item has zero value for all agents other than those at its endpoints. Thus, in this setting, each good has non-zero value for at most two agents, and there is at most one good valued by any pair of agents. This marks one of the few cases when an exact and complete EFX allocation is known to exist for more than three agents. In this work, we partially extend these results to multi-graphs, when each pair of vertices can have more than one edge between them. The existence of EFX allocations in multi-graphs is a natural open question given their existence in simple graphs. We show that EFX allocations exist, and can be computed in polynomial time, for agents with cancelable valuations in the following cases: (i) bipartite multi-graphs, (ii) multi-trees with monotone valuations, and (iii) multi-graphs with girth (2t-1), where t is the chromatic number of the multi-graph. The existence of EFX in cycle multi-graphs follows from (i), (iii), and the known existence of EFX for three agents.

Cite as

Umang Bhaskar and Yeshwant Pandit. Extending EFX Allocations to Further Multi-Graph Classes. In 45th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 360, pp. 15:1-15:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bhaskar_et_al:LIPIcs.FSTTCS.2025.15,
  author =	{Bhaskar, Umang and Pandit, Yeshwant},
  title =	{{Extending EFX Allocations to Further Multi-Graph Classes}},
  booktitle =	{45th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2025)},
  pages =	{15:1--15:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-406-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{360},
  editor =	{Aiswarya, C. and Mehta, Ruta and Roy, Subhajit},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2025.15},
  URN =		{urn:nbn:de:0030-drops-250958},
  doi =		{10.4230/LIPIcs.FSTTCS.2025.15},
  annote =	{Keywords: Fair Division, EFX, Multi-graphs}
}

Document

DOI: 10.4230/LIPIcs.WABI.2025.11

Mutational Signature Refitting on Sparse Pan-Cancer Data

Authors: Gal Gilad, Teresa M. Przytycka, and Roded Sharan

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)

Abstract

Mutational processes shape cancer genomes, leaving characteristic marks that are termed signatures. The level of activity of each such process, or its signature exposure, provides important information on the disease, improving patient stratification and the prediction of drug response. Thus, there is growing interest in developing refitting methods that decipher those exposures. Previous work in this domain was unsupervised in nature, employing algebraic decomposition and probabilistic inference methods. Here we provide a supervised approach to the problem of signature refitting and show its superiority over current methods. Our method, SuRe, leverages a neural network model to capture correlations between signature exposures in real data. We show that SuRe outperforms previous methods on sparse mutation data from tumor type specific data sets, as well as pan-cancer data sets, with an increasing advantage as the data become sparser. We further demonstrate its utility in clinical settings.

Cite as

Gal Gilad, Teresa M. Przytycka, and Roded Sharan. Mutational Signature Refitting on Sparse Pan-Cancer Data. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 11:1-11:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{gilad_et_al:LIPIcs.WABI.2025.11,
  author =	{Gilad, Gal and Przytycka, Teresa M. and Sharan, Roded},
  title =	{{Mutational Signature Refitting on Sparse Pan-Cancer Data}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{11:1--11:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.11},
  URN =		{urn:nbn:de:0030-drops-239374},
  doi =		{10.4230/LIPIcs.WABI.2025.11},
  annote =	{Keywords: mutational signatures, signature refitting, cancer genomics, genomic data analysis, somatic mutations}
}

Document

Position

DOI: 10.4230/TGDK.1.1.5

Knowledge Graphs for the Life Sciences: Recent Developments, Challenges and Opportunities

Authors: Jiaoyan Chen, Hang Dong, Janna Hastings, Ernesto Jiménez-Ruiz, Vanessa López, Pierre Monnin, Catia Pesquita, Petr Škoda, and Valentina Tamma

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

The term life sciences refers to the disciplines that study living organisms and life processes, and include chemistry, biology, medicine, and a range of other related disciplines. Research efforts in life sciences are heavily data-driven, as they produce and consume vast amounts of scientific data, much of which is intrinsically relational and graph-structured. The volume of data and the complexity of scientific concepts and relations referred to therein promote the application of advanced knowledge-driven technologies for managing and interpreting data, with the ultimate aim to advance scientific discovery. In this survey and position paper, we discuss recent developments and advances in the use of graph-based technologies in life sciences and set out a vision for how these technologies will impact these fields into the future. We focus on three broad topics: the construction and management of Knowledge Graphs (KGs), the use of KGs and associated technologies in the discovery of new knowledge, and the use of KGs in artificial intelligence applications to support explanations (explainable AI). We select a few exemplary use cases for each topic, discuss the challenges and open research questions within these topics, and conclude with a perspective and outlook that summarizes the overarching challenges and their potential solutions as a guide for future research.

Cite as

Jiaoyan Chen, Hang Dong, Janna Hastings, Ernesto Jiménez-Ruiz, Vanessa López, Pierre Monnin, Catia Pesquita, Petr Škoda, and Valentina Tamma. Knowledge Graphs for the Life Sciences: Recent Developments, Challenges and Opportunities. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 5:1-5:33, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{chen_et_al:TGDK.1.1.5,
  author =	{Chen, Jiaoyan and Dong, Hang and Hastings, Janna and Jim\'{e}nez-Ruiz, Ernesto and L\'{o}pez, Vanessa and Monnin, Pierre and Pesquita, Catia and \v{S}koda, Petr and Tamma, Valentina},
  title =	{{Knowledge Graphs for the Life Sciences: Recent Developments, Challenges and Opportunities}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{5:1--5:33},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.5},
  URN =		{urn:nbn:de:0030-drops-194791},
  doi =		{10.4230/TGDK.1.1.5},
  annote =	{Keywords: Knowledge graphs, Life science, Knowledge discovery, Explainable AI}
}

Document

Abstract

DOI: 10.4230/LIPIcs.WABI.2023.3

BATCH-SCAMPP: Scaling Phylogenetic Placement Methods to Place Many Sequences (Abstract)

Authors: Eleanor Wedell, Chengze Shen, and Tandy Warnow

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

Phylogenetic placement is the problem of placing one or more query sequences into a phylogenetic "backbone" tree, which may be a maximum likelihood tree on a multiple sequence alignment for a single gene, a taxonomy with leaves labeled by sequences for a single gene [Nidhi Shah et al., 2021], or a species tree [Jiang et al., 2023]. When the backbone tree is a tree estimated on a single gene, the most accurate techniques for phylogenetic placement are likelihood-based, and can be computationally intensive when the backbone trees are large [Chu and Warnow, 2023]. Phylogenetic placement into gene trees occurs when updating existing gene trees with newly observed sequences, but can also be applied in the "bulk" context, where many sequences are placed at the same time into the backbone tree. For example, phylogenetic placement can be used to taxonomically characterize shotgun sequencing reads generated for an environmental sample in metagenomic analysis [Nidhi Shah et al., 2021; Barbera et al., 2019]. The two most well known maximum likelihood phylogenetic placement methods are pplacer [Chu and Warnow, 2023] and EPA-ng [Barbera et al., 2019]. Of these two, EPA-ng is optimized for scaling the number of query sequences and is capable of placing millions of sequences into phylogenetic trees of up to a few thousand sequences [Barbera et al., 2019], and achieves sublinear runtime in the number of query sequences (see Figure 2 from [Balaban et al., 2022]). Previously we introduced the SCAMPP framework [Wedell et al., 2022] to enable both pplacer and EPA-ng to perform phylogenetic placement into ultra-large backbone trees, and we demonstrated its utility for placing into backbone trees with up to 200,000 sequences. By using maximum likelihood methods pplacer or EPA-ng within the SCAMPP framework, the resulting placements are more accurate than with APPLES-2 [Balaban et al., 2022], with the most notable accuracy improvement for fragmentary sequences, and are computationally similar for single query sequence placement [Wedell et al., 2022]. However, SCAMPP was designed to incrementally update a large tree, one query sequence at a time, and was not optimized for the other uses of phylogenetic placement, where batch placement of many sequencing reads is required. Here we introduce BATCH-SCAMPP, a technique that improves scalability in both dimensions: the number of query sequences being placed into the backbone tree and the size of the backbone tree. Furthermore, BATCH-SCAMPP is specifically designed to improve EPA-ng’s scalability to large backbone trees. Although BATCH-SCAMPP is based on SCAMPP, it uses a substantially modified design in order to be able to take advantage of EPA-ng’s ability to place many query sequences efficiently. The BATCH-SCAMPP method operates by allowing the input set of query sequences to suggest and then vote on placement subtrees, thus enabling many query sequences to select the same placement subtree. We pair BATCH-SCAMPP with EPA-ng to explore the capability of this approach for scaling to many query sequences. We show that this combination of techniques (which we call BSCAMPP+EPA-ng, or BSCAMPP(e)) not only provides high accuracy and scalability to large backbone trees, matching that of SCAMPP used with EPA-ng (i.e., SCAMPP(e)), but also achieves the goal of scaling sublinearly in the number of query sequences. Moreover, it is much more scalable than EPA-ng and faster than SCAMPP+EPA-ng: when placing 10,000 sequences into a backbone tree of 50,000 leaves, EPA-ng is unable to run due to memory issues, SCAMPP+EPA-ng requires 1421 minutes, and BSCAMPP(e) places all sequences in 7 minutes (all given the same computational resources. Figure 1 gives an example of this performance advantage on the nt78 [Chu and Warnow, 2023] simulated dataset.

Cite as

Eleanor Wedell, Chengze Shen, and Tandy Warnow. BATCH-SCAMPP: Scaling Phylogenetic Placement Methods to Place Many Sequences (Abstract). In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 3:1-3:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{wedell_et_al:LIPIcs.WABI.2023.3,
  author =	{Wedell, Eleanor and Shen, Chengze and Warnow, Tandy},
  title =	{{BATCH-SCAMPP: Scaling Phylogenetic Placement Methods to Place Many Sequences}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{3:1--3:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.3},
  URN =		{urn:nbn:de:0030-drops-186296},
  doi =		{10.4230/LIPIcs.WABI.2023.3},
  annote =	{Keywords: Phylogenetic Placement, EPA-ng, Phylogenetics}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.23

Outlier Detection in BLAST Hits

Authors: Nidhi Shah, Stephen F. Altschul, and Mihai Pop

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

An important task in a metagenomic analysis is the assignment of taxonomic labels to sequences in a sample. Most widely used methods for taxonomy assignment compare a sequence in the sample to a database of known sequences. Many approaches use the best BLAST hit(s) to assign the taxonomic label. However, it is known that the best BLAST hit may not always correspond to the best taxonomic match. An alternative approach involves phylogenetic methods which take into account alignments and a model of evolution in order to more accurately define the taxonomic origin of sequences. The similarity-search based methods typically run faster than phylogenetic methods and work well when the organisms in the sample are well represented in the database. On the other hand, phylogenetic methods have the capability to identify new organisms in a sample but are computationally quite expensive. We propose a two-step approach for metagenomic taxon identification; i.e., use a rapid method that accurately classifies sequences using a reference database (this is a filtering step) and then use a more complex phylogenetic method for the sequences that were unclassified in the previous step. In this work, we explore whether and when using top BLAST hit(s) yields a correct taxonomic label. We develop a method to detect outliers among BLAST hits in order to separate the phylogenetically most closely related matches from matches to sequences from more distantly related organisms. We used modified BILD (Bayesian Integral Log Odds) scores, a multiple-alignment scoring function, to define the outliers within a subset of top BLAST hits and assign taxonomic labels. We compared the accuracy of our method to the RDP classifier and show that our method yields fewer misclassifications while properly classifying organisms that are not present in the database. Finally, we evaluated the use of our method as a pre-processing step before more expensive phylogenetic analyses (in our case TIPP) in the context of real 16S rRNA datasets. Our experiments demonstrate the potential of our method to be a filtering step before using phylogenetic methods.

Cite as

Nidhi Shah, Stephen F. Altschul, and Mihai Pop. Outlier Detection in BLAST Hits. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 23:1-23:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{shah_et_al:LIPIcs.WABI.2017.23,
  author =	{Shah, Nidhi and Altschul, Stephen F. and Pop, Mihai},
  title =	{{Outlier Detection in BLAST Hits}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{23:1--23:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.23},
  URN =		{urn:nbn:de:0030-drops-76512},
  doi =		{10.4230/LIPIcs.WABI.2017.23},
  annote =	{Keywords: Taxonomy classification, Metagenomics, Sequence alignment, Outlier detection}
}

5 Search Results for "Shah, Nidhi"

Extending EFX Allocations to Further Multi-Graph Classes

Abstract

Cite as

Mutational Signature Refitting on Sparse Pan-Cancer Data

Abstract

Cite as

Knowledge Graphs for the Life Sciences: Recent Developments, Challenges and Opportunities

Abstract

Cite as

BATCH-SCAMPP: Scaling Phylogenetic Placement Methods to Place Many Sequences (Abstract)

Abstract

Cite as

Outlier Detection in BLAST Hits

Abstract

Cite as

Thanks for your feedback!

Could not send message