Documents authored by Bandeira, Nuno


Document
Computational Proteomics (Dagstuhl Seminar 19351)

Authors: Nuno Bandeira and Lennart Martens

Published in: Dagstuhl Reports, Volume 9, Issue 8 (2020)


Abstract
This report documents the program and the outcomes of Dagstuhl Seminar 19351 "Computational Proteomics". The Seminar was originally built around four topics: identification and quantification of DIA data; algorithms for the analysis of protein cross-linking data; creating an online view on complete, browsable proteomes from public data; and detecting interesting biology from proteomics findings. These four topics led to four corresponding breakout sessions, which in turn led to five offshoot breakout sessions. The abstracts presented here first describe the four topic introduction talks, as well as a fifth, cross-cutting topic talk on bringing proteomics data into clinical trials. These talk abstracts are followed by one abstract per breakout session, documenting that breakout's discussion and outcomes. An Executive Summary is also provided, which details the overall seminar structure, the relationship between the breakout sessions and topics, and the most important conclusions for the four topic-derived breakouts.

Cite as

Nuno Bandeira and Lennart Martens. Computational Proteomics (Dagstuhl Seminar 19351). In Dagstuhl Reports, Volume 9, Issue 8, pp. 70-83, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@Article{bandeira_et_al:DagRep.9.8.70,
  author =	{Bandeira, Nuno and Martens, Lennart},
  title =	{{Computational Proteomics (Dagstuhl Seminar 19351)}},
  pages =	{70--83},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2019},
  volume =	{9},
  number =	{8},
  editor =	{Bandeira, Nuno and Martens, Lennart},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.9.8.70},
  URN =		{urn:nbn:de:0030-drops-117724},
  doi =		{10.4230/DagRep.9.8.70},
  annote =	{Keywords: computational biology, computational mass spectrometry, proteomics}
}
Document
Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees

Authors: Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, and Yannis Papakonstantinou

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
Given a database of vectors, a cosine threshold query returns all vectors in the database having cosine similarity to a query vector above a given threshold. These queries arise naturally in many applications, such as document retrieval, image search, and mass spectrometry. The present paper considers the efficient evaluation of such queries, providing novel optimality guarantees and exhibiting good performance on real datasets. We take as a starting point Fagin’s well-known Threshold Algorithm (TA), which can be used to answer cosine threshold queries as follows: an inverted index is first built from the database vectors during pre-processing; at query time, the algorithm traverses the index partially to gather a set of candidate vectors to be later verified against the similarity threshold. However, directly applying TA in its raw form misses significant optimization opportunities. Indeed, we first show that one can take advantage of the fact that the vectors can be assumed to be normalized, to obtain an improved, tight stopping condition for index traversal and to efficiently compute it incrementally. Then we show that one can take advantage of data skewness to obtain better traversal strategies. In particular, we show a novel traversal strategy that exploits a common data skewness condition which holds in multiple domains including mass spectrometry, documents, and image databases. We show that under the skewness assumption, the new traversal strategy has a strong, near-optimal performance guarantee. The techniques developed in the paper are quite general since they can be applied to a large class of similarity functions beyond cosine.
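
The baseline procedure described in the abstract (build an inverted index, traverse it partially to gather candidates, then verify them against the threshold) can be sketched as follows. This is a minimal illustration only, not the paper's method: it uses the plain TA-style stopping bound rather than the improved tight stopping condition or the skewness-aware traversal, and it assumes sparse, non-negative, unit-normalized vectors stored as Python dicts. All function names (`normalize`, `build_index`, `cosine_threshold_query`) are hypothetical.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector (dict: dimension -> value) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()}


def build_index(vectors):
    """Inverted index: dimension -> postings (value, vector_id), largest value first."""
    index = defaultdict(list)
    for vid, vec in enumerate(vectors):
        for dim, val in vec.items():
            index[dim].append((val, vid))
    for postings in index.values():
        postings.sort(reverse=True)
    return index


def cosine_threshold_query(vectors, index, query, theta):
    """Return (vector_id, score) pairs with cosine similarity >= theta.

    Assumes all vectors and the query are unit-normalized with non-negative
    entries, so cosine similarity equals the dot product.
    """
    dims = [d for d in query if d in index]
    pos = {d: 0 for d in dims}  # current traversal position in each list
    candidates = set()
    while True:
        # Upper bound on the score of any vector not yet seen in any list:
        # each of its coordinates is at most the value at the current position.
        bound = sum(
            query[d] * (index[d][pos[d]][0] if pos[d] < len(index[d]) else 0.0)
            for d in dims
        )
        if bound < theta:
            break  # no unseen vector can reach the threshold
        advanced = False
        for d in dims:  # round-robin: advance one step in every list
            if pos[d] < len(index[d]):
                candidates.add(index[d][pos[d]][1])
                pos[d] += 1
                advanced = True
        if not advanced:
            break  # all lists exhausted
    # Verification phase: compute the exact score of each candidate.
    results = []
    for vid in sorted(candidates):
        score = sum(query[d] * vectors[vid].get(d, 0.0) for d in query)
        if score >= theta:
            results.append((vid, score))
    return results


# Usage with made-up toy data (three sparse vectors, one query).
vectors = [normalize(v) for v in [{0: 1.0, 1: 1.0}, {0: 1.0}, {2: 1.0}]]
index = build_index(vectors)
query = normalize({0: 1.0, 1: 1.0})
matches = cosine_threshold_query(vectors, index, query, theta=0.9)
```

The sketch stops traversing as soon as the upper bound on any unseen vector's score drops below the threshold, which is what keeps the traversal partial; the abstract's contributions concern tightening that bound using normalization and choosing a better traversal order than the round-robin used here.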

Cite as

Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, and Yannis Papakonstantinou. Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{li_et_al:LIPIcs.ICDT.2019.11,
  author =	{Li, Yuliang and Wang, Jianguo and Pullman, Benjamin and Bandeira, Nuno and Papakonstantinou, Yannis},
  title =	{{Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.11},
  URN =		{urn:nbn:de:0030-drops-103135},
  doi =		{10.4230/LIPIcs.ICDT.2019.11},
  annote =	{Keywords: Vector databases, Similarity search, Cosine, Threshold Algorithm}
}