DROPS

Document

DOI: 10.4230/LIPIcs.ITP.2025.7

A Verified Cost Model for Call-By-Push-Value

Authors: Zhuo Zoey Chen, Johannes Åman Pohjola, and Christine Rizkallah

Published in: LIPIcs, Volume 352, 16th International Conference on Interactive Theorem Proving (ITP 2025)

Abstract

The call-by-push-value λ-calculus allows for syntactically specifying the order of evaluation as part of the term language. Hence, it serves as a unifying language for embedding various evaluation strategies including call-by-value and call-by-name. Given the impact of call-by-push-value, it is remarkable that its adequacy as a model for computational complexity theory has not yet been studied. In this paper, we show that the call-by-push-value λ-calculus is reasonable for both time and space complexity. A reasonable cost model can encode other reasonable cost models with polynomial overhead in time and constant factor overhead in space. We achieve this by encoding call-by-push-value λ-calculus into Turing machines, following a simulation strategy by Forster et al.; for the converse direction, we prove that Levy’s encoding of the call-by-value λ-calculus has reasonable complexity bounds. The main results have been formalised in the HOL4 theorem prover.

Cite as

Zhuo Zoey Chen, Johannes Åman Pohjola, and Christine Rizkallah. A Verified Cost Model for Call-By-Push-Value. In 16th International Conference on Interactive Theorem Proving (ITP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 352, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ITP.2025.7,
  author =	{Chen, Zhuo Zoey and \r{A}man Pohjola, Johannes and Rizkallah, Christine},
  title =	{{A Verified Cost Model for Call-By-Push-Value}},
  booktitle =	{16th International Conference on Interactive Theorem Proving (ITP 2025)},
  pages =	{7:1--7:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-396-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{352},
  editor =	{Forster, Yannick and Keller, Chantal},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITP.2025.7},
  URN =		{urn:nbn:de:0030-drops-246067},
  doi =		{10.4230/LIPIcs.ITP.2025.7},
  annote =	{Keywords: lambda calculus, formalizations of computational models, computability theory, HOL, call-by-push-value reduction, time and space complexity, abstract machines}
}

Document

DOI: 10.4230/LIPIcs.WADS.2025.17

Dynamic Streaming Algorithms for Geometric Independent Set

Authors: Timothy M. Chan and Yuancheng Yu

Published in: LIPIcs, Volume 349, 19th International Symposium on Algorithms and Data Structures (WADS 2025)

Abstract

We present the first space-efficient, fully dynamic streaming algorithm for computing a constant-factor approximation of the maximum independent set size of n axis-aligned rectangles in two dimensions. For an arbitrarily small constant δ > 0, our algorithm obtains an O((1/δ)²) approximation and requires O(U^δ polylog n) space and update time with high probability, assuming that coordinates are integers bounded by U. We also obtain a similar result for fat objects in any constant dimension. This extends recent non-streaming algorithms by Bhore and Chan from SODA'25, and also greatly extends previous streaming results, which were limited to special types of geometric objects such as one-dimensional intervals and unit disks.

Cite as

Timothy M. Chan and Yuancheng Yu. Dynamic Streaming Algorithms for Geometric Independent Set. In 19th International Symposium on Algorithms and Data Structures (WADS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 349, pp. 17:1-17:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{chan_et_al:LIPIcs.WADS.2025.17,
  author =	{Chan, Timothy M. and Yu, Yuancheng},
  title =	{{Dynamic Streaming Algorithms for Geometric Independent Set}},
  booktitle =	{19th International Symposium on Algorithms and Data Structures (WADS 2025)},
  pages =	{17:1--17:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-398-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{349},
  editor =	{Morin, Pat and Oh, Eunjin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WADS.2025.17},
  URN =		{urn:nbn:de:0030-drops-242481},
  doi =		{10.4230/LIPIcs.WADS.2025.17},
  annote =	{Keywords: Geometric Independent Set, Dynamic Streaming Algorithms}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2025.38

The Complexity of Separability for Semilinear Sets and Parikh Automata

Authors: Elias Rojas Collins, Chris Köcher, and Georg Zetzsche

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

In a separability problem, we are given two sets K and L from a class 𝒞, and we want to decide whether there exists a set S from a class 𝒮 such that K ⊆ S and S ∩ L = ∅. In this case, we speak of separability of sets in 𝒞 by sets in 𝒮. We study two types of separability problems. First, we consider separability of semilinear sets (i.e. subsets of ℕ^d for some d) by sets definable by quantifier-free monadic Presburger formulas (or equivalently, the recognizable subsets of ℕ^d). Here, a formula is monadic if each atom uses at most one variable. Second, we consider separability of languages of Parikh automata by regular languages. A Parikh automaton is a machine with access to counters that can only be incremented, and have to meet a semilinear constraint at the end of the run. Both of these separability problems are known to be decidable with elementary complexity. Our main results are that both problems are coNP-complete. In the case of semilinear sets, coNP-completeness holds regardless of whether the input sets are specified by existential Presburger formulas, quantifier-free formulas, or semilinear representations. Our results imply that recognizable separability of rational subsets of Σ* × ℕ^d (shown decidable by Choffrut and Grigorieff) is coNP-complete as well. Another application is that regularity of deterministic Parikh automata (where the target set is specified using a quantifier-free Presburger formula) is coNP-complete as well.

Cite as

Elias Rojas Collins, Chris Köcher, and Georg Zetzsche. The Complexity of Separability for Semilinear Sets and Parikh Automata. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 38:1-38:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{collins_et_al:LIPIcs.MFCS.2025.38,
  author =	{Collins, Elias Rojas and K\"{o}cher, Chris and Zetzsche, Georg},
  title =	{{The Complexity of Separability for Semilinear Sets and Parikh Automata}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{38:1--38:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.38},
  URN =		{urn:nbn:de:0030-drops-241457},
  doi =		{10.4230/LIPIcs.MFCS.2025.38},
  annote =	{Keywords: Vector Addition System, Separability, Regular Language}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.75

Tight Bounds for Heavy-Hitters and Moment Estimation in the Sliding Window Model

Authors: Shiyuan Feng, William Swartworth, and David Woodruff

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

We consider the heavy-hitters and F_p moment estimation problems in the sliding window model. For F_p moment estimation with 1 < p ≤ 2, we show that it is possible to give a (1± ε) multiplicative approximation to the F_p moment with 2/3 probability on any given window of size n using Õ(1/(ε^p)log² n + 1/(ε²)log n) bits of space. We complement this result with a lower bound showing that our algorithm gives tight bounds up to factors of log log n and log1/(ε). As a consequence of our F₂ moment estimation algorithm, we show that the heavy-hitters problem can be solved on an arbitrary window using O(1/(ε²)log² n) space which is tight.

Cite as

Shiyuan Feng, William Swartworth, and David Woodruff. Tight Bounds for Heavy-Hitters and Moment Estimation in the Sliding Window Model. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 75:1-75:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{feng_et_al:LIPIcs.ICALP.2025.75,
  author =	{Feng, Shiyuan and Swartworth, William and Woodruff, David},
  title =	{{Tight Bounds for Heavy-Hitters and Moment Estimation in the Sliding Window Model}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{75:1--75:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.75},
  URN =		{urn:nbn:de:0030-drops-234524},
  doi =		{10.4230/LIPIcs.ICALP.2025.75},
  annote =	{Keywords: sketching, streaming, heavy hitters, sliding window, moment estimation}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.101

Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case

Authors: Shaofeng H.-C. Jiang and Jianing Lou

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

We devise ε-coresets for robust (k,z)-Clustering with m outliers through black-box reductions to vanilla clustering. Given an ε-coreset construction for vanilla clustering with size N, we construct coresets of size N⋅ polylog(kmε^{-1}) + O_z(min{kmε^{-1}, m ε^{-2z}log^z(kmε^{-1})}) for various metric spaces, where O_z hides 2^{O(zlog z)} factors. This increases the size of the vanilla coreset by a small multiplicative factor of polylog(kmε^{-1}), and the additive term is up to a (ε^{-1}log (km))^{O(z)} factor to the size of the optimal robust coreset. Plugging in recent vanilla coreset results of [Cohen-Addad, Saulpic and Schwiegelshohn, STOC'21; Cohen-Addad, Draganov, Russo, Saulpic and Schwiegelshohn, SODA'25], we obtain the first coresets for (k,z)-Clustering with m outliers with size near-linear in k while previous results have size at least Ω(k²) [Huang, Jiang, Lou and Wu, ICLR'23; Huang, Li, Lu and Wu, SODA'25]. Technically, we establish two conditions under which a vanilla coreset is as well a robust coreset. The first condition requires the dataset to satisfy special structures - it can be broken into "dense" parts with bounded diameter. We combine this with a new bounded-diameter decomposition that has only O_z(km ε^{-1}) non-dense points to obtain the O_z(km ε^{-1}) additive bound. Another sufficient condition requires the vanilla coreset to possess an extra size-preserving property. To utilize this condition, we further give a black-box reduction that turns a vanilla coreset to the one that satisfies the said size-preserving property, and this leads to the alternative O_z(mε^{-2z}log^{z}(kmε^{-1})) additive size bound. We also give low-space implementations of our reductions in the dynamic streaming setting. Combined with known streaming constructions for vanilla coresets [Braverman, Frahling, Lang, Sohler and Yang, ICML'17; Hu, Song, Yang and Zhong, arXiv'1802.00459], we obtain the first dynamic streaming algorithms for coresets for k-Median (and k-Means) with m outliers, using space Õ(k + m) ⋅ poly(dε^{-1}log Δ) for inputs on a discrete grid [Δ]^d.

Cite as

Shaofeng H.-C. Jiang and Jianing Lou. Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 101:1-101:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{jiang_et_al:LIPIcs.ICALP.2025.101,
  author =	{Jiang, Shaofeng H.-C. and Lou, Jianing},
  title =	{{Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{101:1--101:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.101},
  URN =		{urn:nbn:de:0030-drops-234781},
  doi =		{10.4230/LIPIcs.ICALP.2025.101},
  annote =	{Keywords: Coresets, clustering, outliers, streaming algorithms}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2025.3

Pydrofoil: Accelerating Sail-Based Instruction Set Simulators

Authors: Carl Friedrich Bolz-Tereick, Luke Panayi, Ferdia McKeogh, Tom Spink, and Martin Berger

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)

Abstract

We present Pydrofoil, a multi-stage compiler that generates instruction set simulators (ISSs) from processor instruction set architectures (ISAs) expressed in the high-level, verification-oriented ISA specification language Sail. Pydrofoil achieves a > 230× speedup over the C-based ISS generated by Sail on our benchmarks, thanks to the following insights. (i) An ISS is effectively an interpreter loop, and tracing just-in-time (JIT) compilers have proven effective at accelerating those, albeit mostly for dynamically typed languages. (ii) ISS workloads are highly atypical, dominated by intensive bit manipulation operations. Conventional compiler optimisations for general-purpose programming languages have limited impact for speeding up such workloads. We develop suitable domain-specific optimisations. (iii) Neither tracing JIT compilers, nor ahead-of-time (AOT) compilation alone, even with domain-specific optimisations, suffice for the generation of performant ISSs. Pydrofoil therefore implements a hybrid approach, pairing an AOT compiler with a tracing JIT built on the meta-tracing PyPy framework. AOT and JIT use domain-specific optimisations. Our benchmarks demonstrate that combining AOT and JIT compilers provides significantly greater performance gains than using either compiler alone.

Cite as

Carl Friedrich Bolz-Tereick, Luke Panayi, Ferdia McKeogh, Tom Spink, and Martin Berger. Pydrofoil: Accelerating Sail-Based Instruction Set Simulators. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 3:1-3:31, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bolztereick_et_al:LIPIcs.ECOOP.2025.3,
  author =	{Bolz-Tereick, Carl Friedrich and Panayi, Luke and McKeogh, Ferdia and Spink, Tom and Berger, Martin},
  title =	{{Pydrofoil: Accelerating Sail-Based Instruction Set Simulators}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{3:1--3:31},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.3},
  URN =		{urn:nbn:de:0030-drops-232962},
  doi =		{10.4230/LIPIcs.ECOOP.2025.3},
  annote =	{Keywords: Instruction set architecture, processor, domain-specific language, just-in-time compilation, meta-tracing}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2025.34

Taming and Dissecting Recursions Through Interprocedural Weak Topological Ordering

Authors: Jiawei Yang, Xiao Cheng, Bor-Yuh Evan Chang, Xiapu Luo, and Yulei Sui

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)

Abstract

Abstract interpretation provides a foundational framework for approximating program semantics by interpreting code through abstract domains using semantic functions over ordered sets along a program’s control flow graph (CFG). To facilitate fixpoint computation in abstract interpretation, weak topological ordering (WTO) is an effective strategy for handling loops, as it identifies strategic control points in the CFG where widening and narrowing operations should be applied. However, existing abstract interpreters still face challenges when extending WTO computation in the presence of recursive programs. Computing a precise whole-program WTO requires full context-sensitive analysis which is not scalable for large programs, while context-insensitive analysis introduces spurious cycles that compromise precision. Current approaches either ignore recursion (resulting in unsoundness) or rely on conservative approximations, sacrificing precision by adopting the greatest elements of abstract domains and applying widening at function boundaries without subsequent narrowing refinements. These can lead to undesired results for downstream tasks, such as bug detection. To address the above limitations, we present RecTopo, a new technique to boost the efficiency of precise abstract interpretation in the presence of recursive programs through interprocedural weak topological ordering (IWTO). Rather than pursuing an expensive whole-program WTO analysis, RecTopo employs an on-demand approach that strategically decomposes programs at recursion boundaries and constructs targeted IWTOs for each recursive component. RecTopo dissects and analyzes (nested) recursions through interleaved widening and narrowing operations. This approach enables precise control over interpretation ordering within recursive structures while eliminating spurious recursions through systematic correlation of control flow and call graphs. We implemented RecTopo and evaluated its effectiveness using an assertion-based checking client focused on buffer overflow detection, comparing it against three popular open-source abstract interpreters (IKOS, Clam, CSA). The experiments on 8312 programs from the NIST dataset demonstrate that, on average, RecTopo is 31.99% more precise and achieves a 17.49% higher recall rate compared to three other tools. Moreover, RecTopo exhibits an average precision improvement of 46.51% and a higher recall rate of 32.98% compared to our baselines across ten large open-source projects. Further ablation studies reveal that IWTO reduces spurious widening operations compared to whole-program WTO, resulting in a 12.83% reduction in analysis time.

Cite as

Jiawei Yang, Xiao Cheng, Bor-Yuh Evan Chang, Xiapu Luo, and Yulei Sui. Taming and Dissecting Recursions Through Interprocedural Weak Topological Ordering. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 34:1-34:31, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{yang_et_al:LIPIcs.ECOOP.2025.34,
  author =	{Yang, Jiawei and Cheng, Xiao and Chang, Bor-Yuh Evan and Luo, Xiapu and Sui, Yulei},
  title =	{{Taming and Dissecting Recursions Through Interprocedural Weak Topological Ordering}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{34:1--34:31},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.34},
  URN =		{urn:nbn:de:0030-drops-233265},
  doi =		{10.4230/LIPIcs.ECOOP.2025.34},
  annote =	{Keywords: Abstract interpretation, recursion, weak topological ordering}
}

Document

Vision

DOI: 10.4230/TGDK.1.1.6

Towards Ordinal Data Science

Authors: Gerd Stumme, Dominik Dürrschnabel, and Tom Hanika

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

Order is one of the main instruments to measure the relationship between objects in (empirical) data. However, compared to methods that use numerical properties of objects, the amount of ordinal methods developed is rather small. One reason for this is the limited availability of computational resources in the last century that would have been required for ordinal computations. Another reason - particularly important for this line of research - is that order-based methods are often seen as too mathematically rigorous for applying them to real-world data. In this paper, we will therefore discuss different means for measuring and ‘calculating’ with ordinal structures - a specific class of directed graphs - and show how to infer knowledge from them. Our aim is to establish Ordinal Data Science as a fundamentally new research agenda. Besides cross-fertilization with other cornerstone machine learning and knowledge representation methods, a broad range of disciplines will benefit from this endeavor, including, psychology, sociology, economics, web science, knowledge engineering, scientometrics.

Cite as

Gerd Stumme, Dominik Dürrschnabel, and Tom Hanika. Towards Ordinal Data Science. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 6:1-6:39, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{stumme_et_al:TGDK.1.1.6,
  author =	{Stumme, Gerd and D\"{u}rrschnabel, Dominik and Hanika, Tom},
  title =	{{Towards Ordinal Data Science}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{6:1--6:39},
  ISSN =	{2942-7517},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.6},
  URN =		{urn:nbn:de:0030-drops-194801},
  doi =		{10.4230/TGDK.1.1.6},
  annote =	{Keywords: Order relation, data science, relational theory of measurement, metric learning, general algebra, lattices, factorization, approximations and heuristics, factor analysis, visualization, browsing, explainability}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX-RANDOM.2019.27

Improved Algorithms for Time Decay Streams

Authors: Vladimir Braverman, Harry Lang, Enayat Ullah, and Samson Zhou

Published in: LIPIcs, Volume 145, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)

Abstract

In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and gives a time-decay coreset for polynomial time decay functions. We also consider the exponential time decay model for k-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores O(k log(h Delta)+h) points where h is the half-life of the decay function and Delta is the aspect ratio of the dataset. Our techniques extend to k-means clustering and M-estimators as well.

Cite as

Vladimir Braverman, Harry Lang, Enayat Ullah, and Samson Zhou. Improved Algorithms for Time Decay Streams. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 27:1-27:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{braverman_et_al:LIPIcs.APPROX-RANDOM.2019.27,
  author =	{Braverman, Vladimir and Lang, Harry and Ullah, Enayat and Zhou, Samson},
  title =	{{Improved Algorithms for Time Decay Streams}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)},
  pages =	{27:1--27:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-125-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{145},
  editor =	{Achlioptas, Dimitris and V\'{e}gh, L\'{a}szl\'{o} A.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2019.27},
  URN =		{urn:nbn:de:0030-drops-112429},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2019.27},
  annote =	{Keywords: Streaming algorithms, approximation algorithms, facility location and clustering}
}

Document

RANDOM

DOI: 10.4230/LIPIcs.APPROX-RANDOM.2019.62

Streaming Coreset Constructions for M-Estimators

Authors: Vladimir Braverman, Dan Feldman, Harry Lang, and Daniela Rus

Published in: LIPIcs, Volume 145, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)

Abstract

We introduce a new method of maintaining a (k,epsilon)-coreset for clustering M-estimators over insertion-only streams. Let (P,w) be a weighted set (where w : P - > [0,infty) is the weight function) of points in a rho-metric space (meaning a set X equipped with a positive-semidefinite symmetric function D such that D(x,z) <=rho(D(x,y) + D(y,z)) for all x,y,z in X). For any set of points C, we define COST(P,w,C) = sum_{p in P} w(p) min_{c in C} D(p,c). A (k,epsilon)-coreset for (P,w) is a weighted set (Q,v) such that for every set C of k points, (1-epsilon)COST(P,w,C) <= COST(Q,v,C) <= (1+epsilon)COST(P,w,C). Essentially, the coreset (Q,v) can be used in place of (P,w) for all operations concerning the COST function. Coresets, as a method of data reduction, are used to solve fundamental problems in machine learning of streaming and distributed data. M-estimators are functions D(x,y) that can be written as psi(d(x,y)) where ({X}, d) is a true metric (i.e. 1-metric) space. Special cases of M-estimators include the well-known k-median (psi(x) =x) and k-means (psi(x) = x^2) functions. Our technique takes an existing offline construction for an M-estimator coreset and converts it into the streaming setting, where n data points arrive sequentially. To our knowledge, this is the first streaming construction for any M-estimator that does not rely on the merge-and-reduce tree. For example, our coreset for streaming metric k-means uses O(epsilon^{-2} k log k log n) points of storage. The previous state-of-the-art required storing at least O(epsilon^{-2} k log k log^{4} n) points.

Cite as

Vladimir Braverman, Dan Feldman, Harry Lang, and Daniela Rus. Streaming Coreset Constructions for M-Estimators. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 62:1-62:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{braverman_et_al:LIPIcs.APPROX-RANDOM.2019.62,
  author =	{Braverman, Vladimir and Feldman, Dan and Lang, Harry and Rus, Daniela},
  title =	{{Streaming Coreset Constructions for M-Estimators}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)},
  pages =	{62:1--62:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-125-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{145},
  editor =	{Achlioptas, Dimitris and V\'{e}gh, L\'{a}szl\'{o} A.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2019.62},
  URN =		{urn:nbn:de:0030-drops-112778},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2019.62},
  annote =	{Keywords: Streaming, Clustering, Coresets}
}

Document

DOI: 10.4230/LIPIcs.APPROX-RANDOM.2018.7

Nearly Optimal Distinct Elements and Heavy Hitters on Sliding Windows

Authors: Vladimir Braverman, Elena Grigorescu, Harry Lang, David P. Woodruff, and Samson Zhou

Published in: LIPIcs, Volume 116, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)

Abstract

We study the distinct elements and l_p-heavy hitters problems in the sliding window model, where only the most recent n elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential (Datar et al., SODA 2002) and smooth histograms (Braverman and Ostrovsky, FOCS 2007) that may be of independent interest. We then show that the composable histogram{} along with a careful combination of existing techniques to track either the identity or frequency of a few specific items suffices to obtain algorithms for both distinct elements and l_p-heavy hitters that are nearly optimal in both n and epsilon. Applying our new composable histogram framework, we provide an algorithm that outputs a (1+epsilon)-approximation to the number of distinct elements in the sliding window model and uses O{1/(epsilon^2) log n log (1/epsilon)log log n+ (1/epsilon) log^2 n} bits of space. For l_p-heavy hitters, we provide an algorithm using space O{(1/epsilon^p) log^2 n (log^2 log n+log 1/epsilon)} for 0<p <=2, improving upon the best-known algorithm for l_2-heavy hitters (Braverman et al., COCOON 2014), which has space complexity O{1/epsilon^4 log^3 n}. We also show complementing nearly optimal lower bounds of Omega ((1/epsilon) log^2 n+(1/epsilon^2) log n) for distinct elements and Omega ((1/epsilon^p) log^2 n) for l_p-heavy hitters, both tight up to O{log log n} and O{log 1/epsilon} factors.

Cite as

Vladimir Braverman, Elena Grigorescu, Harry Lang, David P. Woodruff, and Samson Zhou. Nearly Optimal Distinct Elements and Heavy Hitters on Sliding Windows. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 116, pp. 7:1-7:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{braverman_et_al:LIPIcs.APPROX-RANDOM.2018.7,
  author =	{Braverman, Vladimir and Grigorescu, Elena and Lang, Harry and Woodruff, David P. and Zhou, Samson},
  title =	{{Nearly Optimal Distinct Elements and Heavy Hitters on Sliding Windows}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)},
  pages =	{7:1--7:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-085-9},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{116},
  editor =	{Blais, Eric and Jansen, Klaus and D. P. Rolim, Jos\'{e} and Steurer, David},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2018.7},
  URN =		{urn:nbn:de:0030-drops-94118},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2018.7},
  annote =	{Keywords: Streaming algorithms, sliding windows, heavy hitters, distinct elements}
}

@InProceedings{braverman_et_al:LIPIcs.APPROX-RANDOM.2018.7,
  author =	{Braverman, Vladimir and Grigorescu, Elena and Lang, Harry and Woodruff, David P. and Zhou, Samson},
  title =	{{Nearly Optimal Distinct Elements and Heavy Hitters on Sliding Windows}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)},
  pages =	{7:1--7:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-085-9},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{116},
  editor =	{Blais, Eric and Jansen, Klaus and D. P. Rolim, Jos\'{e} and Steurer, David},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2018.7},
  URN =		{urn:nbn:de:0030-drops-94118},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2018.7},
  annote =	{Keywords: Streaming algorithms, sliding windows, heavy hitters, distinct elements}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2018.21

Approximate Convex Hull of Data Streams

Authors: Avrim Blum, Vladimir Braverman, Ananya Kumar, Harry Lang, and Lin F. Yang

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

Given a finite set of points P subseteq R^d, we would like to find a small subset S subseteq P such that the convex hull of S approximately contains P. More formally, every point in P is within distance epsilon from the convex hull of S. Such a subset S is called an epsilon-hull. Computing an epsilon-hull is an important problem in computational geometry, machine learning, and approximation algorithms. In many applications, the set P is too large to fit in memory. We consider the streaming model where the algorithm receives the points of P sequentially and strives to use a minimal amount of memory. Existing streaming algorithms for computing an epsilon-hull require O(epsilon^{(1-d)/2}) space, which is optimal for a worst-case input. However, this ignores the structure of the data. The minimal size of an epsilon-hull of P, which we denote by OPT, can be much smaller. A natural question is whether a streaming algorithm can compute an epsilon-hull using only O(OPT) space. We begin with lower bounds that show, under a reasonable streaming model, that it is not possible to have a single-pass streaming algorithm that computes an epsilon-hull with O(OPT) space. We instead propose three relaxations of the problem for which we can compute epsilon-hulls using space near-linear to the optimal size. Our first algorithm for points in R^2 that arrive in random-order uses O(log n * OPT) space. Our second algorithm for points in R^2 makes O(log(epsilon^{-1})) passes before outputting the epsilon-hull and requires O(OPT) space. Our third algorithm, for points in R^d for any fixed dimension d, outputs, with high probability, an epsilon-hull for all but delta-fraction of directions and requires O(OPT * log OPT) space.

Cite as

Avrim Blum, Vladimir Braverman, Ananya Kumar, Harry Lang, and Lin F. Yang. Approximate Convex Hull of Data Streams. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 21:1-21:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{blum_et_al:LIPIcs.ICALP.2018.21,
  author =	{Blum, Avrim and Braverman, Vladimir and Kumar, Ananya and Lang, Harry and Yang, Lin F.},
  title =	{{Approximate Convex Hull of Data Streams}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{21:1--21:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.21},
  URN =		{urn:nbn:de:0030-drops-90254},
  doi =		{10.4230/LIPIcs.ICALP.2018.21},
  annote =	{Keywords: Convex Hulls, Streaming Algorithms, Epsilon Kernels, Sparse Coding}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2015.350

Clustering on Sliding Windows in Polylogarithmic Space

Authors: Vladimir Braverman, Harry Lang, Keith Levin, and Morteza Monemizadeh

Published in: LIPIcs, Volume 45, 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)

Abstract

In PODS 2003, Babcock, Datar, Motwani and O'Callaghan gave the first streaming solution for the k-median problem on sliding windows using O(frack k tau^4 W^2tau log^2 W) space, with a O(2^O(1/tau)) approximation factor, where W is the window size and tau in (0,1/2) is a user-specified parameter. They left as an open question whether it is possible to improve this to polylogarithmic space. Despite much progress on clustering and sliding windows, this question has remained open for more than a decade. In this paper, we partially answer the main open question posed by Babcock, Datar, Motwani and O'Callaghan. We present an algorithm yielding an exponential improvement in space compared to the previous result given in Babcock, et al. In particular, we give the first polylogarithmic space (alpha,beta)-approximation for metric k-median clustering in the sliding window model, where alpha and beta are constants, under the assumption, also made by Babcock et al., that the optimal k-median cost on any given window is bounded by a polynomial in the window size. We justify this assumption by showing that when the cost is exponential in the window size, no sublinear space approximation is possible. Our main technical contribution is a simple but elegant extension of smooth functions as introduced by Braverman and Ostrovsky, which allows us to apply well-known techniques for solving problems in the sliding window model to functions that are not smooth, such as the k-median cost.

Cite as

Vladimir Braverman, Harry Lang, Keith Levin, and Morteza Monemizadeh. Clustering on Sliding Windows in Polylogarithmic Space. In 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 45, pp. 350-364, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{braverman_et_al:LIPIcs.FSTTCS.2015.350,
  author =	{Braverman, Vladimir and Lang, Harry and Levin, Keith and Monemizadeh, Morteza},
  title =	{{Clustering on Sliding Windows in Polylogarithmic Space}},
  booktitle =	{35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)},
  pages =	{350--364},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-97-2},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{45},
  editor =	{Harsha, Prahladh and Ramalingam, G.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2015.350},
  URN =		{urn:nbn:de:0030-drops-56549},
  doi =		{10.4230/LIPIcs.FSTTCS.2015.350},
  annote =	{Keywords: Streaming, Clustering, Sliding windows}
}

13 Search Results for "Lang, Harry"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message