Search Results

Documents authored by Bercea, Ioana O.


Document
Daisy Bloom Filters

Authors: Ioana O. Bercea, Jakob Bæk Tejs Houen, and Rasmus Pagh

Published in: LIPIcs, Volume 294, 19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024)


Abstract
A filter is a widely used data structure for storing an approximation of a given set S of elements from some universe 𝒰 (a countable set). It represents a superset S' ⊇ S that is "close to S" in the sense that for x ∉ S, the probability that x ∈ S' is bounded by some ε > 0. The advantage of using a Bloom filter, when some false positives are acceptable, is that the space usage becomes smaller than what is required to store S exactly. Though filters are well-understood from a worst-case perspective, it is clear that state-of-the-art constructions may not be close to optimal for particular distributions of data and queries. Suppose, for instance, that some elements are in S with probability close to 1. Then it would make sense to always include them in S', saving space by not having to represent these elements in the filter. Questions like this have been raised in the context of Weighted Bloom filters (Bruck, Gao and Jiang, ISIT 2006) and Bloom filter implementations that make use of access to learned components (Vaidya, Knorr, Mitzenmacher, and Krask, ICLR 2021). In this paper, we present a lower bound for the expected space that such a filter requires. We also show that the lower bound is asymptotically tight by exhibiting a filter construction that executes queries and insertions in worst-case constant time, and has a false positive rate at most ε with high probability over input sets drawn from a product distribution. We also present a Bloom filter alternative, which we call the Daisy Bloom filter, that executes operations faster and uses significantly less space than the standard Bloom filter.

Cite as

Ioana O. Bercea, Jakob Bæk Tejs Houen, and Rasmus Pagh. Daisy Bloom Filters. In 19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 294, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bercea_et_al:LIPIcs.SWAT.2024.9,
  author =	{Bercea, Ioana O. and Houen, Jakob B{\ae}k Tejs and Pagh, Rasmus},
  title =	{{Daisy Bloom Filters}},
  booktitle =	{19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-318-8},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{294},
  editor =	{Bodlaender, Hans L.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2024.9},
  URN =		{urn:nbn:de:0030-drops-200491},
  doi =		{10.4230/LIPIcs.SWAT.2024.9},
  annote =	{Keywords: Bloom filters, input distribution, learned data structures}
}
Document
A Dynamic Space-Efficient Filter with Constant Time Operations

Authors: Ioana O. Bercea and Guy Even

Published in: LIPIcs, Volume 162, 17th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2020)


Abstract
A dynamic dictionary is a data structure that maintains sets of cardinality at most n from a given universe and supports insertions, deletions, and membership queries. A filter approximates membership queries with a one-sided error that occurs with probability at most ε. The goal is to obtain dynamic filters that are space-efficient (the space is 1+o(1) times the information-theoretic lower bound) and support all operations in constant time with high probability. One approach to designing filters is to reduce to the retrieval problem. When the size of the universe is polynomial in n, this approach yields a space-efficient dynamic filter as long as the error parameter ε satisfies log(1/ε) = ω(log log n). For the case that log(1/ε) = O(log log n), we present the first space-efficient dynamic filter with constant time operations in the worst case (whp). In contrast, the space-efficient dynamic filter of Pagh et al. [Anna Pagh et al., 2005] supports insertions and deletions in amortized expected constant time. Our approach employs the classic reduction of Carter et al. [Carter et al., 1978] on a new type of dictionary construction that supports random multisets.

Cite as

Ioana O. Bercea and Guy Even. A Dynamic Space-Efficient Filter with Constant Time Operations. In 17th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 162, pp. 11:1-11:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{bercea_et_al:LIPIcs.SWAT.2020.11,
  author =	{Bercea, Ioana O. and Even, Guy},
  title =	{{A Dynamic Space-Efficient Filter with Constant Time Operations}},
  booktitle =	{17th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2020)},
  pages =	{11:1--11:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-150-4},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{162},
  editor =	{Albers, Susanne},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2020.11},
  URN =		{urn:nbn:de:0030-drops-122582},
  doi =		{10.4230/LIPIcs.SWAT.2020.11},
  annote =	{Keywords: Data Structures}
}
Document
APPROX
On the Cost of Essentially Fair Clusterings

Authors: Ioana O. Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt, and Melanie Schmidt

Published in: LIPIcs, Volume 145, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)


Abstract
Clustering is a fundamental tool in data mining and machine learning. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters - especially if the data is already biased. At NIPS 2017, Chierichetti et al. [Flavio Chierichetti et al., 2017] proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair k-center problem and a O(t)-approximation for the fair k-median problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair k-center [Clemens Rösner and Melanie Schmidt, 2018]. We extend and improve the known results. Firstly, we give a 5-approximation for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives k-center, k-supplier, k-median, k-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.

Cite as

Ioana O. Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt, and Melanie Schmidt. On the Cost of Essentially Fair Clusterings. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 18:1-18:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{bercea_et_al:LIPIcs.APPROX-RANDOM.2019.18,
  author =	{Bercea, Ioana O. and Gro{\ss}, Martin and Khuller, Samir and Kumar, Aounon and R\"{o}sner, Clemens and Schmidt, Daniel R. and Schmidt, Melanie},
  title =	{{On the Cost of Essentially Fair Clusterings}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)},
  pages =	{18:1--18:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-125-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{145},
  editor =	{Achlioptas, Dimitris and V\'{e}gh, L\'{a}szl\'{o} A.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2019.18},
  URN =		{urn:nbn:de:0030-drops-112337},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2019.18},
  annote =	{Keywords: approximation, clustering, fairness, LP rounding}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail