2 Search Results for "Pullman, Benjamin"


Document
Steinhaus Filtration and Stable Paths in the Mapper

Authors: Dustin L. Arendt, Matthew Broussard, Bala Krishnamoorthy, Nathaniel Saul, and Amber Thrall

Published in: LIPIcs, Volume 332, 41st International Symposium on Computational Geometry (SoCG 2025)


Abstract
We define a new filtration called the Steinhaus filtration built from a single cover based on a generalized Steinhaus distance, a generalization of Jaccard distance. The homology persistence module of a Steinhaus filtration with infinitely many cover elements may not be q-tame, even when the covers are in a totally bounded space. While this may pose a challenge to derive stability results, we show that the Steinhaus filtration is stable when the cover is finite. We show that while the Čech and Steinhaus filtrations are not isomorphic in general, they are isomorphic for a finite point set in dimension one. Furthermore, the VR filtration completely determines the 1-skeleton of the Steinhaus filtration in arbitrary dimension. We then develop a language and theory for stable paths within the Steinhaus filtration. We demonstrate how the framework can be applied to several applications where a standard metric may not be defined but a cover is readily available. We introduce a new perspective for modeling recommendation system datasets. As an example, we look at a movies dataset and we find the stable paths identified in our framework represent a sequence of movies constituting a gentle transition and ordering from one genre to another. For explainable machine learning, we apply the Mapper algorithm for model induction by building a filtration from a single Mapper complex, and provide explanations in the form of stable paths between subpopulations. For illustration, we build a Mapper complex from a supervised machine learning model trained on the FashionMNIST dataset. Stable paths in the Steinhaus filtration provide improved explanations of relationships between subpopulations of images.

Cite as

Dustin L. Arendt, Matthew Broussard, Bala Krishnamoorthy, Nathaniel Saul, and Amber Thrall. Steinhaus Filtration and Stable Paths in the Mapper. In 41st International Symposium on Computational Geometry (SoCG 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 332, pp. 10:1-10:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{arendt_et_al:LIPIcs.SoCG.2025.10,
  author =	{Arendt, Dustin L. and Broussard, Matthew and Krishnamoorthy, Bala and Saul, Nathaniel and Thrall, Amber},
  title =	{{Steinhaus Filtration and Stable Paths in the Mapper}},
  booktitle =	{41st International Symposium on Computational Geometry (SoCG 2025)},
  pages =	{10:1--10:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-370-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{332},
  editor =	{Aichholzer, Oswin and Wang, Haitao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2025.10},
  URN =		{urn:nbn:de:0030-drops-231625},
  doi =		{10.4230/LIPIcs.SoCG.2025.10},
  annote =	{Keywords: Cover and nerve, Jaccard distance, persistence stability, Mapper, recommender systems, explainable machine learning}
}
Document
Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees

Authors: Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, and Yannis Papakonstantinou

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
Given a database of vectors, a cosine threshold query returns all vectors in the database having cosine similarity to a query vector above a given threshold. These queries arise naturally in many applications, such as document retrieval, image search, and mass spectrometry. The present paper considers the efficient evaluation of such queries, providing novel optimality guarantees and exhibiting good performance on real datasets. We take as a starting point Fagin’s well-known Threshold Algorithm (TA), which can be used to answer cosine threshold queries as follows: an inverted index is first built from the database vectors during pre-processing; at query time, the algorithm traverses the index partially to gather a set of candidate vectors to be later verified against the similarity threshold. However, directly applying TA in its raw form misses significant optimization opportunities. Indeed, we first show that one can take advantage of the fact that the vectors can be assumed to be normalized, to obtain an improved, tight stopping condition for index traversal and to efficiently compute it incrementally. Then we show that one can take advantage of data skewness to obtain better traversal strategies. In particular, we show a novel traversal strategy that exploits a common data skewness condition which holds in multiple domains including mass spectrometry, documents, and image databases. We show that under the skewness assumption, the new traversal strategy has a strong, near-optimal performance guarantee. The techniques developed in the paper are quite general since they can be applied to a large class of similarity functions beyond cosine.

Cite as

Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, and Yannis Papakonstantinou. Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{li_et_al:LIPIcs.ICDT.2019.11,
  author =	{Li, Yuliang and Wang, Jianguo and Pullman, Benjamin and Bandeira, Nuno and Papakonstantinou, Yannis},
  title =	{{Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.11},
  URN =		{urn:nbn:de:0030-drops-103135},
  doi =		{10.4230/LIPIcs.ICDT.2019.11},
  annote =	{Keywords: Vector databases, Similarity search, Cosine, Threshold Algorithm}
}
  • Refine by Type
  • 2 Document/PDF
  • 1 Document/HTML

  • Refine by Publication Year
  • 1 2025
  • 1 2019

  • Refine by Author
  • 1 Arendt, Dustin L.
  • 1 Bandeira, Nuno
  • 1 Broussard, Matthew
  • 1 Krishnamoorthy, Bala
  • 1 Li, Yuliang
  • Show More...

  • Refine by Series/Journal
  • 2 LIPIcs

  • Refine by Classification
  • 1 Information systems → Nearest-neighbor search
  • 1 Theory of computation
  • 1 Theory of computation → Data structures and algorithms for data management
  • 1 Theory of computation → Database query processing and optimization (theory)

  • Refine by Keyword
  • 1 Cosine
  • 1 Cover and nerve
  • 1 Jaccard distance
  • 1 Mapper
  • 1 Similarity search
  • Show More...

Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail