Tight Bounds for Heavy-Hitters and Moment Estimation in the Sliding Window Model

Feng, Shiyuan; Swartworth, William; Woodruff, David

doi:10.4230/LIPIcs.ICALP.2025.75

Tight Bounds for Heavy-Hitters and Moment Estimation in the Sliding Window Model

Shiyuan Feng EECS, Peking University, Beijing, China William Swartworth Carnegie Mellon University, Pittsburgh, PA, USA David Woodruff Carnegie Mellon University, Pittsburgh, PA, USA

Abstract

We consider the heavy-hitters and $F_{p}$ moment estimation problems in the sliding window model. For $F_{p}$ moment estimation with $1<p\leq 2$ , we show that it is possible to give a $(1\pm\epsilon)$ multiplicative approximation to the $F_{p}$ moment with $2/3$ probability on any given window of size $n$ using $\tilde{O}(\frac{1}{\epsilon^{p}}\log^{2}n+\frac{1}{\epsilon^{2}}\log n)$ bits of space. We complement this result with a lower bound showing that our algorithm gives tight bounds up to factors of $\log\log n$ and $\log\frac{1}{\epsilon}.$ As a consequence of our $F_{2}$ moment estimation algorithm, we show that the heavy-hitters problem can be solved on an arbitrary window using $O(\frac{1}{\epsilon^{2}}\log^{2}n)$ space which is tight.

Keywords and phrases:

sketching, streaming, heavy hitters, sliding window, moment estimation

Category:

Track A: Algorithms, Complexity and Games

Copyright and License:

2012 ACM Subject Classification:

Theory of computation

\rightarrow

Sketching and sampling

Related Version:

Full Version: http://arxiv.org/abs/2504.21175

Funding:

Authors William Swartworth and David Woodruff were supported by a Simons Investigator Award and Office of Naval Research award number N000142112647.

DOI:

10.4230/LIPIcs.ICALP.2025.75

Event:

52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Editors:

Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction

In some situations, such as monitoring network traffic, data is being received at an enormous rate and it is not always worthwhile to store the entire dataset. Instead we may only be interested in some statistics of the data, which we can hope to estimate from a data structure that uses much less space. This motivated the development of streaming algorithms, which process a sequence of updates one at a time, while maintaining some small sketch of the underlying data.

In some applications, it makes sense to focus on only the most recent updates. For instance, when tracking IP addresses over a network [20, 13], we might be more interested in tracking the heavy users on a week-to-week basis rather than over longer time periods. Even if we are interested in longer time periods, we may still be interested in obtaining more fine-grained information about how the heavy users change over time.

This type of problem motivated the sliding window model for streaming algorithms which has a considerable line of work [12, 8, 7, 21]. There are several versions of the sliding window model, but for us our main goal is to respond to queries about the previous $n$ elements of the stream.

We consider perhaps the two most foundational problems in streaming algorithms: moment estimation and heavy-hitters. Both of these models have received an enormous amount of attention in the streaming literature, beginning with [11] who proposed the CountSketch algorithm for finding heavy items, all the way to [6] who gave the optimal space bound for insertion-only streams ¹¹1This means that items may be added to the stream, but not removed., removing one of the $\log$ factors of CountSketch. While fully tight bounds are known for heavy-hitters (up to a $\log(1/\epsilon)$ factor in insertion streams), in the sliding window model tight bounds are still unknown. Previous work [7] gave a $\log^{2}n$ lower bound, and a $\log^{3}n$ upper bound in the sliding window model. We close this gap, and show that a $\log^{2}n$ dependence suffices for computing heavy-hitters on a particular sliding window.

In the $F_{p}$ moment estimation problem, the goal is to approximation the $p^{\text{th}}$ moment for the frequency vector of all items inserted into the stream. Like heavy-hitters, this problem has a long history going back to [1]. Since then, a line of work has led to nearly tight bounds both for large moments $p>2$ , which does not admit polylogarithmic sized sketches [2, 15], as well as the small moments $1\leq p\leq 2$ [14, 18] that we consider here. While this problem has received considerable attention in the sliding window model [8, 10, 21], there still remains a gap in the literature. We aim to close this gap for the problem of estimating the $F_{p}$ moment on a sliding window.

1.1 Our models

The sliding window model

There are several slightly different models that one can consider. By default we consider an insertion-only stream of length $m$ consisting of items $x_{1},\ldots,x_{m}.$ In the sliding window model, there is a window of size $W$ consisting of the $W$ most recent updates to the stream. A correct sliding window algorithm may be queried for the desired statistic at any time $t,$ and it must produce a correct estimate for the statistic on the stream $x_{m-W+1},\ldots,x_{m}$ of the $W$ most recent updates. When we refer to the failure probability for such an algorithm, we mean the probability of failure for a single query (for an arbitrary value of $t$ ).

One could also consider the problem of tracking a stream statistic over the sliding window at all times. This could be accomplished in our framework simply by making the failure probability $1/m$ .

One could also consider a more general model where there is no fixed window. Instead at a given query time $t$ , the algorithm receives a positive integer $n$ and must estimate the stream statistic on the portion of the stream in $(t-n,t].$ The difference is that $n$ is now not known until runtime. Our algorithms all apply to this more general model, and our lower bounds hold in the standard sliding window model. This model may be useful if one wants to observe how the stream has changed up to the present time. For instance if we were monitoring network traffic, we might be interested in which IP addresses occurred frequently in the past hour, day, week, etc.

$F_{p}$ moment estimation

In the version of the $(\varepsilon,\delta,F_{p})$ moment estimation problem that we consider, we receive a series of insertion-only updates $x_{1},\ldots,x_{m}.$ At any point in time $t$ the algorithm may be asked to output an estimate $\hat{F}$ of the $F_{p}$ moment of the window $W=[t-n+1,t]$ consisting of updates $x_{t-n+1},\ldots,x_{t}.$ If $f^{(W)}_{i}$ is the frequency count of universe item $i$ over the window, then the $F_{p}$ moment over the window is defined to be $F_{P}(f^{(W)})=\sum_{i}(f^{(W)}_{i})^{p}.$ In order to be correct, the algorithm must satisfy $\hat{F}=(1\pm\varepsilon)F_{p}(f^{(W)})$ with probability at least $1-\delta.$

Heavy hitters

We say that item $i$ is $(\varepsilon,\ell_{p})$ -heavy for a frequency vector $f$ if $f_{i}\geq\varepsilon\left\|f\right\|_{p}.$ In the $(\varepsilon,\ell_{p})$ -heavy hitters problem the goal is to output a collection $\mathcal{H}$ of $O(\varepsilon^{-p})$ universe items such that all $(\varepsilon,\ell_{p})$ -heavy items for $f^{(W)}$ are contained in $\mathcal{H}$ and such that all elements of $\mathcal{H}$ are at least $c\varepsilon$ -heavy where $c>0$ is a constant.

1.2 Prior Work

[8] introduced the smooth histogram approach for solving problems in the sliding window model. For a function $f$ , given adjacent substreams $A, B$ , and $C$ , an $(\alpha,\beta)$ -smooth function demands that if $(1-\beta)f(A\cup B)\leq f(B)$ , then $(1-\alpha)f(A\cup B\cup C)\leq f(B\cup C)$ for some parameters $0<\beta\leq\alpha<1$ . Throughout the stream, they use the smooth histogram to remove the unnecessary information and only keep track of the time when $F_{p}$ differs by $(1+\epsilon^{p})$ . For $p\in(1,2]$ , their work achieves space $O(\epsilon^{-(2+p)}\log^{3}n)$ .

[21] introduced a new tool called a Difference Estimator. The idea is to estimate the difference of $F_{p}(u+v)$ and $F_{p}(v)$ with additive error $\epsilon\cdot F_{p}(v)$ , given that $\max(F_{p}(u),F_{p}(u+v)-F_{p}(v))\leq\gamma F(v)$ , the dimension of the sketch only needs to be $\tilde{O}(\frac{\gamma^{2/p}}{\epsilon^{2}})$ . They use the idea in [8] to maintain a constant factor partition on the top level. Then using a binary tree-like structure to give a fine-grained partition, they use a Difference Estimator to exclude the part that is not in the window.

[6] gives a $\tilde{O}(\epsilon^{-2}\log n)$ bits of space algorithm to give strong tracking over $\ell_{2}$ in insertion-only stream. Also, [4] generalize this to $p\in(0,2)$ . They introduce a weak tracking property, which is a key motivation for our Strong Estimator .

For heavy hitters, [7] gives a $\tilde{O}(\epsilon^{-p}\log^{3}n)$ algorithm, with the $\ell_{2}$ estimation being the bottleneck. If given the $\ell_{2}$ estimation over the window, then their algorithm can work in space $\tilde{O}(\epsilon^{-p}\log^{2}n)$ . This is the result we apply to obtain improved heavy-hitter bounds from our $F_{2}$ estimator. They also prove an $\Omega(\epsilon^{-p}\log^{2}n)$ lower bound for the heavy hitter problem in the sliding window model.

1.3 Our Results

Table 1: Results in the Sliding Window Model.

Problem (in Sliding Window Model)

Previous Bound

New Bound

L_{p}

-Heavy Hitters,

p\in(0,2]

\tilde{O}(\epsilon^{-p}\log^{3}n)

\Omega(\epsilon^{-p}\log^{2}n)

[7]

\tilde{\mathcal{O}}(\epsilon^{-p}\log^{2}n)

F_{p}

Estimation,

p\in(1,2]

\tilde{\mathcal{O}}(\epsilon^{-2}\log^{3}n)

[21]

\tilde{O}(\epsilon^{-p}\log^{2}n+\epsilon^{-2}\log n)

,

\Omega(\epsilon^{-p}\log^{2}n+\epsilon^{-2}\log n)

$F_{p}$ estimation

We give an algorithm for $F_{p}$ moment estimation for $1<p\leq 2$ that achieves optimal bounds in the sliding window model, in terms of the accuracy $\varepsilon$ and the window size $n$ .

Theorem 1.

There is an algorithm that uses

O\Big{(}(\epsilon^{-p}\log^{2}n+\epsilon^{-2}\log n\log^{4}\frac{1}{\epsilon})% (\log\frac{1}{\epsilon}+(\log\log n)^{2})\Big{)}=\tilde{O}(\varepsilon^{-p}% \log^{2}n+\varepsilon^{-2}\log n)

bits of space, and solves the $(\varepsilon,F_{p})$ moment estimation problem in the sliding window model.

Note that this result is for approximating the $F_{p}$ on any fixed window with $2/3$ probability. To track $F_{p}$ at all times, and guarantee correctness at all times with good probability, one could apply our algorithm $O(\log m)$ times in parallel, and take the median estimate at each time.

Heavy hitters

As a consequence of our $F_{2}$ approximation scheme, we obtain an optimal heavy-hitters algorithm by combining with the results of [7].

Theorem 2.

Given $\epsilon>0$ and $0<p\leq 2$ , there exists an algorithm in the sliding window model that, with probability at least $\frac{2}{3}$ , outputs all indices $i\in[U]$ for which frequency $f_{i}\geq\epsilon\ell_{p}^{W}$ , and reports no indices $i\in[U]$ for which $f_{i}\leq\frac{\epsilon}{12}\ell_{p}^{W}$ . The algorithm has space complexity $O\left(\varepsilon^{-p}\log^{2}n\left(\log\frac{1}{\epsilon}+\log\log n\right)\right)$ .

An earlier version of [7] claimed this bound for the stronger problem of tracking the heavy-hitters on the window at all times. Unfortunately this original bound was incorrect, and their algorithm actually gives a $\log^{3}n$ dependence, even in our setting where we are interested in obtaining good success probability for a single window.

Lower bounds

We also show that our $F_{p}$ estimation algorithms are tight in the sliding window model up to $\log\frac{1}{\varepsilon}\log\log n$ factors. Specifically, we show

Theorem 3.

Fix $p\in(1,2]$ . Suppose that $\mathcal{A}$ is a streaming algorithm that operates on a window of size $n$ , and outputs a $(1\pm\varepsilon)$ approximation of $F_{p}$ on any fixed window with $9/10$ probability. For $n\geq U,$ $\mathcal{A}$ must use at least

\Omega\left(\varepsilon^{-p}\log^{2}(\varepsilon U)+\varepsilon^{-2}\log(% \varepsilon^{1/p}U)\right)

space.

The first term in the lower bound is our contribution. The second term applies to general insertion-only streams, and follows from a recent work of [5]. While the lower bound only applies to $p>1$ , our proof also applies to $p=1$ if the stream is allowed to contain empty insertions that do not affect the frequency counts, but that move the window. (Without empty insertions $p=1$ is trivial, since $F_{1}$ is always the window size.)

1.4 Notation

We use $m$ to denote the length of the stream, $U$ be the universe size, and $u_{1},\dots,u_{m}\in[U]$ to denote the elements in the stream. For sliding window, we use $n$ to denote the window size. We use $\text{poly}(n)$ to denote a constant degree polynomial in $n$ . We assume $m\leq\text{poly}(n),U\leq\text{poly}(n)$ .

For interval $[l,r]\subseteq[1,m]$ , we use $x^{(l,r)}$ to denote the frequency vector of elements $u_{l},u_{l+1},\dots,u_{r}$ . For $p\in(0,2]$ , we use $\ell_{p}^{(l,r)}$ to denote the $\ell_{p}$ norm of the frequency vector $x^{(l,r)}$ , more specifically, $\|x^{(l,r)}\|_{p}$ , and $F_{p}^{(l,r)}$ to denote the $F_{p}$ moment of frequency vector $x^{(l,r)}$ , more specifically, $\|x^{(l,r)}\|_{p}^{p}$ . For window $W=[l,r]$ , we also use $\ell_{p}^{W}$ or $F_{p}^{W}$ to denote the $\ell_{p}$ norm or $F_{p}$ frequency moment on window $W$ . We use $a=(1\pm\epsilon)b$ to denote $a\in[(1-\epsilon)b,(1+\epsilon)b]$ .

1.5 Technical Overview

$F_{p}$ estimation

To motivate our approach, consider approximating $F_{2}$ to within a constant factor. We first recall the framework introduced by [9] for the sliding window model. The rough idea is to start a new $F_{p}$ estimation sketch at each point in the stream. This clearly gives a sliding window estimate, but uses far too much space. However, the observation is that if sketch $i$ and sketch $i+2$ give similar estimates for $F_{p}$ at a given point in the stream, then they will continue to do so for the remainder of the stream. The estimate given by sketch $i+1$ is sandwiched between the estimates from the neighboring sketches, so sketch $i+1$ is redundant, and we may safely delete it. After deleting all redundant sketches, we are left with $O(\log n)$ sketches (for constant $\varepsilon$ ) with geometrically increasing $F_{p}$ estimates.

Each of these $F_{2}$ sketches uses $O(\log n\log\frac{1}{\delta})$ bits of space to estimate $F_{2}$ with probability $1-\delta$ , so this approach requires $O(\log^{2}n\log\frac{1}{\delta})$ bits. Since we only store $\log n$ sketches at a time, it might seem that we can take $\delta=O(\frac{1}{\log n}).$ This is a subtle point and has been a source of error in prior work.

In fact we cannot take $\delta=O(\frac{1}{\log n})$ if we are only guaranteed that our sketches correctly estimate $F_{2}$ with probability at least $1-\delta$ . To see why, suppose that each sketch we create is correct with probability at least $1-\delta$ and is otherwise allowed to be adversarially incorrect with probability $\delta.$ With $\delta=O(1/\log n),$ it is likely that two consecutive sketches will eventually both be incorrect. In that case, they can conspire to in turn sandwich and wipe out all sketches in front of them, from which we will not be able to recover. While extreme, this example illustrates that an $F_{p}$ estimation sketch with a failure probability guarantee is not sufficient for us. This is why prior approaches for estimating the $\ell_{p}$ norm or $F_{p}$ moment in sliding windows require $\Omega(\log^{3}n)$ bits of space, with one of the logarithmic factors arising from the need for sketches with high success rates. These high-success-rate sketches necessitate union bounds over all sketches, even though most of them are not kept by the algorithm when we make a query.

Of course we should not expect sketches to fail this way in practice. Could there be a somewhat stronger guarantee that rules out this type of adversarial behavior? In this work, we overcome this limitation by introducing a new tool: the Strong Estimator . This estimator enables a refined analysis that avoids the union-bound overhead, allowing us to achieve the first space complexity which breaks the $O(\log^{3}n)$ bit barrier.

The intuition of Strong Estimator is for a window $[l,r]$ , we do not need our estimator to be right (meaning give an $\epsilon$ approximation) on any sub-interval of this window, but rather only an additive error: $\epsilon$ fraction of the $\ell_{p}$ norm of the window suffices for maintaining the timestamps. Thus we can avoid the union bound for all sub-intervals to be right. The intuition is similar to the weak tracking in [4].

In Section $3$ , we show that Strong Estimator with the simplest algorithm in [8] suffices to give an $\epsilon$ approximation to the $\ell_{p}$ norm of the sliding window. However, this simple algorithm uses $\tilde{O}(\epsilon^{-3p}\log^{2}n)$ bits of space, which requires a high dependence on $\epsilon$ .

In Section $4$ , we introduce Difference Estimator [21] to improve the $\epsilon$ dependence. We first use our simple algorithm to give a constant factor partition on the top level. We then use a binary tree-like structure to make a more detailed partition between the top level, and use Difference Estimator estimator to exclude the contribution that is not in the window. The algorithm in [21] requires $\tilde{O}(\epsilon^{-2}\log^{3}n)$ space. The extra $\log n$ factor also comes from the high success probability requirement. Using the Strong Estimator and the structure of their algorithm, we can obtain an algorithm using $O(\epsilon^{-2}\log^{2}n)$ bits of memory. We further discover that the bottleneck of the algorithm comes from the Difference Estimator , which can be stored by rounded values using the rounding procedure in [16]. We thus improve the space to $\tilde{O}(\epsilon^{-p}\log^{2}n+\epsilon^{-2}\log n)$ bits, which gives a separation between $p=2$ and $p<2$ . We further prove a lower bound on $F_{p}$ moment estimation over sliding windows, see below, showing that our algorithm is nearly optimal.

Heavy Hitters

[7] showed that if one has an $\ell_{2}$ estimation algorithm over a sliding window, then one can find the $\ell_{2}$ heavy hitters with $\tilde{O}(\epsilon^{-2}\log^{2}n)$ additional bits of space. In [17], they show that an $\epsilon^{p/2}$ - heavy hitter algorithm for $\ell_{2}$ also finds $\epsilon$ - heavy hitters for $\ell_{p}$ . Thus, using our simplest algorithm to provide a constant $\ell_{2}$ estimation over the window will give us an algorithm that uses $\tilde{O}(\epsilon^{-p}\log^{2}n)$ bits of space, and returns all the $\epsilon$ - $\ell_{p}$ heavy hitters. This matches the lower bound $\Omega(\epsilon^{-p}\log^{2}n)$ for heavy hitters in [7].

Lower Bounds

For our lower bound for $F_{p}$ estimation, we construct a stream that consists of roughly $\log(n)$ blocks, where each block has half the $F_{p}$ value of the prior block. Within each block, we place $\varepsilon^{-p}$ items that are each $(\varepsilon,\ell_{p})$ -heavy for the block, and arrange them so that each of these heavy items occurs in a contiguous “mini-block”. Overall, there are $\frac{1}{\varepsilon^{p}}\log(n)$ of these “mini-blocks”.

The main observation is that a correct streaming algorithm for $F_{p}$ -moment estimation in the sliding window model must remember the location of each mini-block. Doing so requires $\log n$ bits of space per mini-block as long as there are at least, say, $n^{0.1}$ disjoint positions that each block can occur in, resulting in our $\Omega(\frac{1}{\varepsilon^{p}}\log^{2}n)$ lower bound.

To see why our algorithm must remember each location, consider one of the heavy items $x$ in our construction, and suppose we want to decide whether the mini-block for $x$ occurs entirely before or entirely after stream index $i .$ We first shift the sliding window so that it begins at index $i$ . We would like to decide whether or not $x$ occurs in the resulting window. By the geometric decay of the blocks, $x$ remains heavy for the entire window. Then to decide if $x$ remains in the window we append $Q$ copies of $x$ to the window, where $Q$ is a rough estimate of the $F_{p}$ over the window. If $x$ does not occur in the window, then appending the $x$ ’s increases the $F_{p}$ by $Q^{p}.$ On the other hand, if $x$ does occur roughly $\varepsilon Q$ times in the window then the $F_{p}$ increases by $(\varepsilon Q+Q)^{p}-(\varepsilon Q)^{p}=\Theta((1+\varepsilon)Q^{p}).$ Since $Q^{p}$ is approximately the $F_{p}$ moment of the window, a $\Theta(\varepsilon)$ heavy-hitters algorithm can distinguish between these two cases. A minor issue is that as stated, we would need to append the $x$ ’s without shifting the window past the index $i$ . However we can easily fix this by appending the $x$ ’s before performing appending singletons to accomplish the shift. To make the above intuition precise we give a reduction from the IndexGreater communication game in Section 5.

2 Preliminary Results

In this section, we will introduce some basic definitions and lemmas that form the foundation of our Strong Estimator. We will also introduce the previous results for heavy hitters. We first require the following definition for $p$ -stable distribution.

Definition 4 ( $p$ -stable distribution. [22]).

For $0<p\leq 2$ , there exists a probability distribution $\mathcal{D}_{p}$ called the $p$ -stable distribution so that for any positive integer $n$ with $Z_{1},\ldots,Z_{n}\sim\mathcal{D}_{p}$ and vector $x\in\mathbb{R}^{n}$ , then $\sum_{i=1}^{n}Z_{i}x_{i}\sim\|x\|_{p}Z$ for $Z\sim\mathcal{D}_{p}$ .

The probability density function $f_{X}$ of a $p$ -stable random variable $X$ satisfies $f_{X}(x)=\Theta\left(\frac{1}{1+|x|^{1+p}}\right)$ for $p<2$ , while the normal distribution corresponds to $p=2$ . Moreover, [19] details standard methods for generating $p$ -stable random variables by taking $\theta$ uniformly at random from the interval $\left[-\frac{\pi}{2},\frac{\pi}{2}\right],r$ uniformly at random from the interval $[0,1]$ , and setting

X=f(r,\theta)=\frac{\sin(p\theta)}{\cos^{1/p}(\theta)}\cdot\left(\frac{\cos(% \theta(1-p))}{\log\frac{1}{r}}\right)^{\frac{1}{p}-1}

These $p$ -stable random variables are crucial to obtaining our Strong Estimator.

Definition 5 (Strong Estimator).

Let $\epsilon_{1}\leq\epsilon_{2}\in(0,1)$ , $p\in(0,2]$ and let $u_{1},u_{2},\dots,u_{m}$ be the sequence of stream elements. Let $x^{(i,j)}$ denote the frequency vector for elements $u_{i},\dots,u_{j}.$ We say that $f_{p}$ has the $(\varepsilon_{1},\varepsilon_{2},\delta)$ Strong Estimator property on the window $[l,r]$ if with probability $1-\delta$ , we have the bound $f_{p}(x^{(a,b)})=(1\pm\epsilon_{1})\ell_{p}^{(a,b)}\pm\epsilon_{2}\ell_{p}^{(l% ,r)}$ for all sub-windows $[a,b]\subseteq[l,r]$ simultaneously. We say that $f_{p}$ has the $(\varepsilon_{1},\varepsilon_{2},\delta)$ Strong Estimator property if it has the $(\varepsilon_{1},\varepsilon_{2},\delta)$ Strong Estimator property on all windows $[l,r].$

For simplicity, we will use $f_{p}^{(a,b)}$ to denote the estimate $f_{p}(x^{(a,b)})$ .

To support the construction and analysis of such an estimator, we rely on several probabilistic tools. Lemma 6 gives a Chernoff-type concentration bound for $k$ -wise independent random variables. Lemma 7 provides tail bounds on the supremum of inner products between a $p$ -stable random vector and a sequence with an increasing frequency vector.

Lemma 6 ([3]).

Let $X_{1},\ldots X_{n}\in\{0,1\}$ be a sequence of $k$ -wise independent random variables, and let $\mu=\sum\mathbb{E}X_{i}$ . Then

\forall\lambda>0,\mathbb{P}\left(\sum X_{i}\geq(1+\lambda)\mu\right)\leq\exp% \left(-\Omega\left(\min\left\{\lambda,\lambda^{2}\right\}\mu\right)\right)+% \exp(-\Omega(k))

Lemma 7 ([4]).

Let $x^{(1)},x^{(2)},\ldots x^{(m)}\in\mathbb{R}^{n}$ satisfy $0\preceq x^{(1)}\preceq x^{(2)}\preceq\ldots\preceq x^{(m)}$ . Let $Z\in\mathbb{R}^{n}$ have $k$ -wise independent entries marginally distributed according to $\mathcal{D}_{p}$ . Then for some $C_{p}$ depending only on $p$ ,

\mathbb{P}\left(\sup_{k\leq m}\left|\left\langle Z,x^{(k)}\right\rangle\right|% \geq\lambda\left\|x^{(m)}\right\|_{p}\right)\leq C_{p}\left(\frac{1}{\lambda^{% 2p/(2+p)}}+k^{-1/p}\right)

Finally, we include two additional lemmas that are essential for our heavy hitter detection algorithm in the sliding window model.

Lemma 8, due to Braverman et al. [7], shows that a constant-factor approximation to the $\ell_{2}$ norm within a window suffices to recover all $\epsilon$ -heavy hitters under the $\ell_{2}$ norm, with provable accuracy guarantees and near-optimal space complexity.

Lemma 8 ([7]).

For a fixed window $W$ , if given a $\frac{1}{2}$ approximation of $\ell_{2}^{W}$ , then there is an algorithm that outputs all the $\epsilon$ heavy hitters for $\ell_{2}$ within the window, and can guarantee that all the output elements have frequency $\geq\frac{\epsilon}{12}\ell_{2}^{W}$ . This algorithm uses $O(\epsilon^{-2}\log^{2}n(\log\frac{1}{\epsilon}+\log\log n))$ and has success probability $\geq\frac{2}{3}$ .

To extend this result to the $\ell_{p}$ setting for any $p\in(0,2]$ , we use Lemma 9, which establishes that any algorithm capable of recovering $\epsilon^{p/2}$ -heavy hitters under $\ell_{2}$ (with a suitable tail guarantee) also successfully identifies all $\epsilon$ -heavy hitters under the $\ell_{p}$ norm.

Lemma 9 ([17]).

For any $p\in(0,2]$ , any algorithm that returns the $\epsilon^{p/2}$ -heavy hitters for $\ell_{2}$ satisfying the tail guarantee also finds the $\epsilon$ -heavy hitters for $\ell_{p}$ .

3 Warmup: Constant-factor $F_{p}$ approximation

In this section, we will introduce our result for Strong Estimator. This estimator enables us to design a constant-factor approximation algorithm for $F_{p}$ , and in combination with previous results, it leads to a nearly optimal algorithm for identifying $\epsilon$ -heavy hitters over sliding windows.

3.1 Constructing a Strong Estimator

First, we will show how to construct a space efficient Strong Estimator defined in Definition 5.

Construction 10.

Let $d$ , $r$ and $s$ be parameters to be specified later, the Strong Estimator takes $\Pi\in\mathbb{R}^{d\times U}$ to be a random matrix with entries drawn according to $\mathcal{D}_{p}$ , and such that the rows are r-wise independent, and all entries within a row are s-wise independent. For frequency vector $x^{(a,b)}$ , the estimate $f_{p}(x^{(a,b)})$ is given by $\operatorname{median}(|(\Pi x^{(a,b)})_{1}|,|(\Pi x^{(a,b)})_{2}|,\dots,|(\Pi x% ^{(a,b)})_{d}|)$ .

The following lemma shows that, by appropriately setting the parameters $d$ , $r$ , and $s$ , the above construction yields a valid Strong Estimator.

Lemma 11.

Let $\epsilon_{1},\epsilon_{2}\in(0,1)$ with $\epsilon_{2}\leq\epsilon_{1}$ , $\delta\in(0,1)$ , and $p\in[1,2]$ . Given a stream of elements $u_{1},u_{2},\dots,u_{m}$ , Construction 10 yields an $(\epsilon_{1},\epsilon_{2},\delta)$ Strong Estimator by setting the parameters as follows:

d=\Theta\left(\epsilon_{1}^{-2}\left(\log\tfrac{1}{\epsilon_{2}}+\log\tfrac{1}% {\delta}\right)\right),\quad r=\Theta\left(\log\tfrac{1}{\epsilon_{2}}+\log% \tfrac{1}{\delta}\right),\quad s=\Theta\left(\epsilon_{1}^{-p}\right).

To show the space efficiency, we also analyze the space complexity of storing the matrix $\Pi$ used in the construction.

Lemma 12.

The memory required to store the matrix $\Pi$ for an $(\epsilon_{1},\epsilon_{2},\delta)$ Strong Estimator is

O\left(\epsilon_{1}^{-p}\left(\log\tfrac{1}{\epsilon_{2}}+\log\tfrac{1}{\delta% }\right)\log(mU)\right)\text{ bits}.

Last, we will show an additional property of Strong Estimator, which is crucial for solving $F_{p}$ estimation over sliding windows.

Lemma 13 (Additional Property for the Strong Estimator).

The $(\varepsilon_{1},\varepsilon_{2},\delta)$ Strong Estimator $f_{p}$ has the following property:

For a sequence of elements in the stream $u_{1},u_{2},\dots,u_{m}$ and a window $[l,r]$ , with probability $\geq 1-\delta\cdot\log r$ , for any interval $[a,b]$ that $1\leq a<l,l\leq b\leq r$ , $f_{p}(x^{(a,b)})=(1\pm(\epsilon_{1}+2\epsilon_{2}))\ell_{p}^{(a,b)}\pm 2% \epsilon_{2}\ell_{p}^{(l,r)}$ .

3.2 Constant-factor $F_{p}$ approximation

First, we will introduce the algorithm given in [8]: it maintains $O(\epsilon^{-p}\log n)$ timestamps to track the times at which $\ell_{p}$ changes by a factor of $(1+\epsilon^{p})$ . This algorithm requires $O(\log^{3}n)$ bits of space, where one $\log n$ factor accounts for the number of sketches, another $\log n$ arises from the need for high success probability in order to apply a union bound over all sketches, and the final $\log n$ comes from the fact that each entry in the sketch requires $\log n$ bits.

Our key improvement over their algorithm is that, by using the Strong Estimator , we no longer require the sketches to have high success probability. The Strong Estimator provides guarantees over all sub-intervals directly, eliminating the need for a union bound.

We present an $\epsilon$ -approximation algorithm for $\ell_{p}$ over sliding windows that uses $\tilde{O}(\epsilon^{-3p}\log^{2}n)$ space. This is the first algorithm achieving $o(\log^{3}n)$ space for this problem. As $\ell_{p}$ estimation and $F_{p}$ estimation are fundamentally the same, we do not distinguish between the two here.

The following theorem formalizes the guarantees of our approach:

Theorem 14.

For any fixed window $W$ , with probability $\geq 1-\delta$ , Algorithm 1 will give an $\epsilon$ approximation of $\ell_{p}^{W}$ , and use $O(\epsilon^{-3p}\log^{2}n(\log\frac{1}{\epsilon}+\log\frac{1}{\delta}+\log\log n))$ bits of space.

We now apply our algorithm to the heavy hitter problem in sliding windows. Combining our $\frac{1}{2}$ -approximation for $\ell_{2}$ with known results for heavy hitters (Lemma 8 and Lemma 9), we obtain the following result:

Theorem 15 (Main Theorem on Heavy Hitters).

Given $\epsilon>0$ and $0<p\leq 2$ , there exists an algorithm in the sliding window model that, with probability at least $\frac{2}{3}$ , outputs all indices $i\in[U]$ for which frequency $f_{i}\geq\epsilon\ell_{p}^{W}$ , and reports no indices $i\in[U]$ for which $f_{i}\leq\frac{\epsilon}{12}\ell_{p}^{W}$ . The algorithm has space complexity (in bits) $O\left(\frac{1}{\epsilon^{p}}\log^{2}n\left(\log\frac{1}{\epsilon}+\log\log n% \right)\right)$ .

Proof.

We use Algorithm 1 to obtain a $\frac{1}{2}$ approximation over $\ell_{p}^{W}$ , the rest of the proof follows by Lemma 8 and Lemma 9. $\hfill\blacktriangleleft$

Algorithm 1 Smooth Histogram with Strong Estimator .

Input: Stream $a_{1},a_{2},\dots\in[U]$ , window length $n$ , approximate ratio $\epsilon\in(0,1)$ and error rate $\delta\in(0,1)$ , parameter $p$ .

Output: At each time $i$ , output a $(1\pm\epsilon)$ approximation to $\ell_{p}$ in window $[i-n,i]$ .

Initialization:

1.

Generate the matrix $\Pi\in\mathbb{R}^{d\times U}$ for an $(\frac{\epsilon^{p}}{64},\frac{\epsilon^{p}}{128},\delta^{\prime}=\frac{\delta% }{O(\log m)})$ Strong Estimator, where $d=O(\epsilon^{-2p}\log\frac{1}{\epsilon\delta^{\prime}}))$ .
2.

Random matrix $\pi\in\mathbb{R}^{d^{\prime}\times U}$ for a Indyk’s p-stable Sketch, where $d^{\prime}=O(\epsilon^{-2}\log\frac{1}{\delta})$ .

Let $f^{(D)}$ be the estimated value of $\|x^{(D)}\|_{p}$ from Strong Estimator on data stream $D$ , and $g^{(D)}$ be the estimated value of $\|x^{(D)}\|_{p}$ from Indyk’s p-stable Sketch.

During the Stream:

Define $t_{k}(j)$ to be the $k$ -th timestamps when the stream goes to $j$ . The following lemma shows that the $\ell_{p}$ norm does not decreases significantly between two consecutive timestamps.

Lemma 16.

Let $p\in[1,2]$ , for fixed window $W=[i-n,i]$ , with probability $\geq 1-\delta$ , either $t_{1}(i)+1=t_{2}(i)$ , or $\ell_{p}^{(t_{2}(i),i)}\geq(1-\frac{\epsilon}{2})\ell_{p}^{(t_{1}(i),i)}$ .

Now we prove theorem 14.

Proof of Theorem 14.

Our Indyk’s p-stable Sketch will give a $\frac{\epsilon}{4}$ approximation with probability $\geq 1-\delta$ . By a union bound over the event of Lemma 16 and the condition that Indyk’s p-stable Sketch is right, by adjusting the constant factor for $\delta$ , we obtain an $\epsilon$ approximation of $\ell_{p}^{W}$ with probability $\geq 1-\delta$ .

For the space, as the value $f$ is bounded by $\text{poly}(n)$ , the number of timestamps will be bounded by $O(\epsilon^{-p}\log n)$ .

For each timestamp, we keep a vector with dimension $d=O(\epsilon^{-2p}\log\frac{\log m}{\epsilon\delta})$ for the Strong Estimator , and a vector with dimension $d=O(\epsilon^{-2}\log\frac{1}{\delta})$ for Indyk’s p-stable Sketch . So the total space for the sketches is $O(\epsilon^{-3p}\log^{2}n(\log\frac{1}{\epsilon}+\log\log n+\log\frac{1}{% \delta}))$ , as we have assumed $m\leq\text{poly}(n)$ . The space to store the matrix is of lower order by Lemma 12. $\hfill\blacktriangleleft$

4 Improving the $\varepsilon$ dependence

4.1 Preliminary Results

Here we use $F(u)$ to denote $\|u\|_{p}^{p}$ for frequency vector $u$ .

First, we introduce the definition and construction of Difference Estimator, which is essential in our algorithm that gives tight bound on $F_{p}$ estimation over sliding window.

Definition 17 (Difference Estimator. Definition 8.2, [21]).

Given a stream $\mathcal{S}$ and fixed times $t_{1}$ , $t_{2}$ , and $t_{3}$ , let frequency vectors $u$ and $v$ be induced by the updates of $\mathcal{S}$ between times $\left[t_{1},t_{2}\right)$ and $\left[t_{2},t_{3}\right)$ . Given an accuracy parameter $\varepsilon>0$ and a failure probability $\delta\in(0,1)$ , a streaming algorithm $\mathcal{C}\left(t_{1},t_{2},t,\gamma,\varepsilon,\delta\right)$ is a $(\gamma,\varepsilon,\delta)$ -suffix difference estimator for a function $F$ if, with probability at least $1-\delta$ , it outputs an additive $\varepsilon\cdot F\left(v+w_{t}\right)$ approximation to $F\left(u+v+w_{t}\right)-F\left(v+w_{t}\right)$ for all frequency vectors $w_{t}$ induced by $\left[t_{3},t\right)$ for times $t>t_{3}$ , given $\min(F(u),F(u+v)-F(v))\leq\gamma\cdot F(v)$ for a ratio parameter $\gamma\in(0,1]$ .

Lemma 18 (Construction of Difference Estimator. Lemma 4.9, Lemma 3.6 [21]).

Let the dimension $d=\mathcal{O}\left(\frac{\gamma^{2/p}}{\varepsilon^{2}}\left(\log\frac{1}{% \varepsilon}+\log\frac{1}{\delta}\right)\right)$ .

For $0<p<2$ , it suffices to use a sketching matrix $A\in\mathbb{R}^{d\times U}$ of i.i.d. entries drawn from the $p$ -stable distribution $\mathcal{D}_{p}$ , with dimension $d$ to obtain a $(\gamma,\varepsilon,\delta)$ -difference estimator for $F_{p}$ . To store such a matrix $A$ requires $O(d\log n(\log\log n)^{2})$ space.

Moreover, the estimation of $F(u+v)-F(v)$ is given by:

1.

For a parameter $q=3$ , let each $z_{i}=\prod_{j=q(i-1)+1}^{qi}\left(Au+Av\right)_{j}^{p/q}$ and $z_{i}^{\prime}=$ $\prod_{j=q(i-1)+1}^{qi}(Av)_{j}^{p/q}$ .
2.

Output the arithmetic mean of $\left(z_{1}-z_{1}^{\prime}\right),\left(z_{2}-z_{2}^{\prime}\right),\ldots,% \left(z_{d/q}-z_{d/q}^{\prime}\right)$ .

For $p=2$ , it suffices to use a sketching matrix $M\in\mathbb{R}^{d\times U}$ that each entry $M_{i,j}$ of $M$ is a 4-wise independent random sign scaled by $\frac{1}{\sqrt{d}}$ to obtain a $(\gamma,\epsilon,\delta)$ -difference estimator for $F_{2}$ . To store such a matrix $M$ requires $O(d\log n)$ space.

The estimation of $F(u+v)-F(v)$ is given by $\|M(u+v)\|_{2}^{2}-\|Mv\|_{2}^{2}$ .

Next, we introduce the rounding procedure described in [16]. Although this procedure is originally designed for a distributed setting, the process of merging two sketches can be interpreted as two nodes transmitting their respective sketches to a parent node, which then performs a new rounding operation based on the information received from its children.

Definition 19 ([16] Rounding random variable).

For any real value $r\in\mathbb{R}$ , let $i_{r}\in\mathbb{Z}$ and $\alpha_{i}\in\{1,-1\}$ be such that $(1+\gamma)^{i_{r}}\leq\alpha_{i}r\leq(1+\gamma)^{i_{r}+1}$ . Now fix $p_{r}$ such that:

\alpha_{i}r=p_{r}(1+\gamma)^{i_{r}+1}+\left(1-p_{r}\right)(1+\gamma)^{i_{r}}

We then define the rounding random variable $\Gamma_{\gamma}(r)$ by

\Gamma_{\gamma}(r)=\begin{cases}0&\text{ if }r=0\\ \alpha_{i}(1+\gamma)^{i_{r}+1}&\text{ with probability }p_{r}\\ \alpha_{i}(1+\gamma)^{i_{r}}&\text{ with probability }1-p_{r}\end{cases}

Procedure 1 (Recursive Randomized Rounding).

1.

Choose random vector $Z\in\mathbb{R}^{n}$ using shared randomness.
2.

Receive rounded sketches $r_{j_{1}},r_{j_{2}},\ldots,r_{j_{t_{j}}}\in\mathbb{R}$ from the $t_{j}$ children of node $j$ in the prior layer (if any such children exist).
3.

Compute $x_{j}=\left\langle X_{j},Z\right\rangle+r_{j_{1}}+r_{j_{2}}+\cdots+r_{j_{t}}% \in\mathbb{R}$ .
4.

Compute $r_{j}=\Gamma\left(x_{j}\right)$ . If player $j\neq\mathcal{C}$ , then send $r_{j}$ it to the parent node of $j$ in $T$ . If $j=\mathcal{C}$ , then output $r_{j}$ as the approximation to $\langle Z,\mathcal{X}\rangle$ .

Lemma 20 (Lemma 2, [16]).

Let $U$ be the universe space, and $m$ be the total point in the graph, $d$ be the diameter of this graph, $\epsilon,\delta\in(0,1)$ . Fix $p\in[1,2]$ , and let $Z=\left(Z_{1},Z_{2},\ldots,Z_{U}\right)\sim D_{p}^{U}$ . Then the above procedure when run on $\gamma=(\epsilon\delta/(d\log(Um)))^{C}$ for a sufficiently large constant $C$ , produces an estimate $r_{C}$ of $\langle Z,\mathcal{X}\rangle$ , held at the center vertex $\mathcal{C}$ , such that $\mathbb{E}\left[r_{\mathcal{C}}\right]=\langle Z,\mathcal{X}\rangle$ . Moreover, over the randomness used to draw $Z$ , with probability $1-\delta$ for $p<2$ , and with probability $1-e^{-1/\delta}$ for Gaussian $Z$ , we have $\mathbb{E}\left[\left(r_{\mathcal{C}}-\langle Z,\mathcal{X}\rangle\right)^{2}% \right]\leq(\epsilon/\delta)^{2}\|\mathcal{X}\|_{p}$ . Thus, with probability at least $1-O(\delta)$ , we have

\left|r_{\mathcal{C}}-\langle Z,\mathcal{X}\rangle\right|\leq\epsilon\|% \mathcal{X}\|_{p}

Moreover, if $Z=\left(Z_{1},Z_{2},\ldots,Z_{n}\right)\in\mathbb{R}^{n}$ where each $Z_{i}\in\{1,-1\}$ is a 4-wise independent Rademacher variable, then the above bound holds with $p=2$ (and with probability $1-\delta$ ).

Now, we give our own rounding procedure.

Procedure 2.

Suppose we want to update the rounded sketch of window $[l,r]$ . The sketch has a matrix $\Pi\in D_{p}^{d\times U}$ for $p\in(0,2)$ . For $p=2$ , the sketch has a matrix $\Pi\in\{1,-1\}^{d\times U}$ , with each entry being a Rademacher variable and each row being 4-wise independent. The precise sketch for window $[l,r]$ is $(|\Pi x^{(l,r)}|_{1},\dots,|\Pi x^{(l,r)}|_{d})\in\mathbb{R}^{d}$ . Let $\mathcal{P}(\Pi x^{(l,r)})\in\mathbb{R}^{d}$ be the sketch vector after the rounding procedure.

There are 2 cases: First is given a non-rounded sketch, and then rounding the sketch. In this case, for each $i\in[d]$ , $\mathcal{P}(\Pi x^{(l,r)})_{i}=\Gamma_{\gamma}((\Pi x^{(l,r)})_{i})$ .

The second case is given the rounded sketch of 2 sub-intervals $[l,k]$ and $[k+1,r]$ . For this case, $\mathcal{P}(\Pi x^{(l,r)})_{i}=\Gamma_{\gamma}(\mathcal{P}(\Pi x^{(l,k)})_{i}+% \mathcal{P}(\Pi x^{(k+1,r)})_{i})$ .

Lemma 21.

Let $[l,r]$ be a window, $\Pi\in\mathbb{R}^{d\times U}$ be the sketching matrix (meaning each entry is sampled from $\mathcal{D}_{p}$ if $0<p<2$ , or is a Rademacher variable if $p=2$ ), and we are running procedure $\mathcal{P}$ described in Procedure 2 with parameter $\gamma=(\frac{\epsilon\delta d}{m\log(Um)})^{C}$ to get a rounded vector for sketch $\Pi x^{(l,r)}$ . Then for all $i\in[d]$ , we have $\Big{|}\mathcal{P}(\Pi x^{(l,r)})_{i}-(\Pi x^{(l,r)})_{i}\Big{|}\leq\epsilon\|% x^{(l,r)}\|_{p}$ .

4.2 Algorithm Overview

The structure of our algorithm follows [21]. Let the current time be $r$ and query window be $W=[r-n+1,r]$ . Our algorithm first uses Algorithm 1 to maintain a constant factor partition on the top level. We will have $O(\log n)$ timestamps on the top level. Let them be $t_{1}<t_{2}<\dots<t_{s}$ .

For each of the $O(\log n)$ pieces of the stream, we will partition the substream $[t_{i},t_{i+1}]$ into more detailed timestamps. We will have a binary tree-like structure, with $\beta=O(\log\epsilon^{-p})$ at the top level, and each level $j$ will keep roughly $O(2^{j})$ timestamps. Define the $k$ -th timestamp on level $j$ between top level $i$ to be $t_{i,j,k}$ .

We will use Difference Estimator to estimate the difference of $F_{p}$ between 2 nearby timestamps and the current time $r$ , to exclude the contribution from $t_{1}$ to $r-n$ . As the Difference Estimator is maintained by a binary tree-like structure, we only need to query the Difference Estimator a constant number of times on each level. So if each Difference Estimator gives an $\eta$ additive error of $F_{p}^{W}$ , then our final approximation will have an $O(\eta\cdot\beta)$ additive error of $F_{p}^{W}$ . So a value $\eta=O(\frac{\epsilon}{\log\frac{1}{\epsilon}})$ suffices and the Difference Estimator uses $O(\epsilon^{-2}\log^{2}n\text{poly}(\log\frac{1}{\epsilon}))$ space in total.

To maintain the binary tree-like structure for the Difference Estimator, the algorithm by [21] uses a $p$ -stable sketch with high success probability, and thus by union bound, these sketch always can rule out unnecessary timestamps and retain only the essential ones.

Our algorithm relies on the analysis of Strong Estimator, and no such union bound will be used (thus saving a factor of $\log n$ ), so it will require a more careful analysis. In the following content, we introduce our new techniques.

Algorithm 2 Moment Estimation in the Sliding Window with Optimal Space.

Input: Stream $u_{1},u_{2},\dots\in[U]$ , window length $n$ , approximate ratio $\epsilon\in(0,1)$ and error rate $\delta\in(0,1)$ , parameter $p$ .

Output:At each time $i$ , output a $(1+\varepsilon)$ -approximation to $F_{p}$ for window $[i-n,i]$ .

Initialization: $\beta\leftarrow\left\lceil\log\frac{100\max(p,1)}{\varepsilon^{\max(p,1)}}% \right\rceil,\eta\leftarrow\frac{\epsilon}{2^{25}\cdot\log\frac{1}{\epsilon}},% \gamma_{j}\leftarrow 2^{3-j}$ for all $j\in[\beta]$ , $\delta_{dif}=\frac{\delta}{2^{25}\cdot\log\frac{1}{\epsilon}}$

1.

Let $\epsilon_{0}=\frac{1}{2}$ , prepare the matrix $\Pi_{top}$ for $(\frac{\epsilon_{0}^{p}}{64},\frac{\epsilon_{0}^{p}}{128},\delta_{SE}=\frac{% \delta}{\log O(\log m)})$ Strong Estimator to maintain a constant factor partition on the top level.
2.

Prepare the matrix $\pi$ for a Indyk’s p-stable Sketch with dimension $O(\epsilon^{2}\log\frac{1}{\delta})$ .
3.

To maintain the subroutine level, for each $j\in[\beta]$ , prepare $C\cdot 2^{j}\log n$ random matrices $\Pi_{i}\in d$ for a large constant $C$ , each responsible for an $(\frac{1}{8},\cdot\frac{2^{-1/p}\epsilon}{2^{32}},\delta_{sub}=\frac{\delta}{O% (C\epsilon^{-p}\log n)})$ Strong Estimator . Store them in a queue $q_{j}$ .
4.

For each level $j\in[\beta]$ , generate the matrix $\Pi_{dif}$ for the $(\gamma_{j},\eta,\delta_{dif})$ -Difference Estimator, with dimension $d=O(\frac{\gamma_{j}^{2/p}}{\eta^{2}}(\log\frac{1}{\eta}+\log\frac{1}{\delta_{% dif}}))$ .

4.3 Rounding Techniques

In this subsection, we present several space-saving techniques that allow us to maintain correctness guarantees while significantly reducing the storage requirements of the sketching data structures. First, we describe how rounding can be applied to the Strong Estimator sketches $f_{i,j,k}$ to reduce the number of bits required per entry. Next, we address the challenge of bounding the number of timestamps in the absence of a global union bound, introducing a rotating random set technique that probabilistically limits the number of active sketches. Finally, we apply a fine-grained rounding scheme to compress both the Indyk’s p-stable Sketch and Difference Estimator sketches, ensuring that their space usage remains within our overall complexity bounds.

Storing the Sketch via Strong Estimator $f_{i,j,k}$

We now describe how to apply rounding to the estimates computed by Strong Estimator to achieve space efficiency.

Throughout the algorithm, the sketch $f_{i,j,k}$ is used to estimate the $\ell_{p}$ norm over the interval from timestamp $t_{i,j,k}$ to either another timestamp $t^{\prime}$ or the current time. While we maintain exact estimates for intervals ending at the current time, we observe that for estimates ending at previous timestamps $t^{\prime}$ , we can round the values to save space. Specifically, we round $f_{i,j,k}^{(t_{i,j,k},t^{\prime})}$ to the nearest power of $(1+\frac{1}{16})$ . The following claim shows that this rounding preserves the Strong Estimator property up to a small additive loss:

Claim 22.

Let $u_{1},\dots,u_{m}$ be a stream and $[l,r]$ an arbitrary window. Suppose $f$ satisfies the $(\epsilon_{1},\epsilon_{2})$ Strong Estimator property on $[l,r]$ . Define $\hat{f}^{(a,b)}=\text{Round}(f^{(a,b)})$ to the nearest power of $1+\frac{1}{16}$ . Then $\hat{f}$ satisfies the $(\epsilon_{1}+\frac{1}{8},(1+\frac{1}{16})\epsilon_{2})$ Strong Estimator property on $[l,r]$ .

Thus, in our analysis, since we use $(\frac{1}{8},\frac{2^{-1/p}\epsilon}{2^{32}},\delta_{\text{sub}})$ -Strong Estimator , we may safely treat the rounded sketches as $(\frac{1}{4},\frac{2^{-1/p}\epsilon}{2^{30}},\delta_{\text{sub}})$ -Strong Estimator without loss of generality.

However, directly storing $O(s^{2})$ rounded values (where $s=O(\epsilon^{-p}\log n)$ is the number of timestamps) would still be space-inefficient. To address this, we introduce the following compression technique:

Technique 1.

For each timestamp $t_{i,j,k}$ , we maintain an array of length $O(\log n)$ . The $c$ -th entry stores the rank of the earliest timestamp $t^{\prime}$ such that $f_{i,j,k}^{(t_{i,j,k},t^{\prime})}\geq(1+\frac{1}{16})^{c}$ .

A potential issue with this scheme is that the value $f_{i,j,k}^{(t_{i,j,k},t_{u})}$ might be “overestimated” due to a later timestamp $t_{v}$ with $f_{i,j,k}^{(t_{i,j,k},t_{v})}>f_{i,j,k}^{(t_{i,j,k},t_{u})}$ . However, this does not violate the Strong Estimator property, since $f_{i,j,k}^{(t_{i,j,k},t_{u})}<f_{i,j,k}^{(t_{i,j,k},t_{v})}<(1+\epsilon_{1})% \ell_{p}^{(t_{i,j,k},t_{v})}+\epsilon_{2}\ell_{p}^{(l,r)}<(1+\epsilon_{1})\ell% _{p}^{(t_{i,j,k},t_{u})}+\epsilon_{2}\ell_{p}^{(l,r)}$ .

Therefore, this approximation remains valid, and the total space required to store all the Strong Estimator sketches is reduced to $O(s\log n\log s)$ bits.

Challenges in Bounding the Number of Timestamps

The analysis in [21] relies on applying a union bound over all sketches and all times, which introduces an additional $\log n$ factor in the space complexity. In contrast, we leverage the Strong Estimator to eliminate the need for such a union bound. However, this also introduces new challenges.

Consider the current window $W=[r-n+1,r]$ , and let $l$ be the earliest index such that $F_{p}^{(l,r)}\leq 2F_{p}^{W}$ . Conditioned on the Strong Estimator satisfying its guarantee over $[l,r]$ , we can show that the number of timestamps at the first level is bounded and provides reliable guidance for constructing Difference Estimator .

The complication arises because this guarantee only holds with probability $1-\delta$ , and without a global union bound, we cannot ensure that the number of timestamps is always bounded across time. For instance, if the same matrix $\Pi$ is reused across all Strong Estimator instances, then in the unlikely case that these matrices fail simultaneously, the number of timestamps could grow unbounded.

To mitigate this, we introduce the following technique:

Technique 2 (Rotating Random Sets).

For each level $j\in[\beta]$ , we maintain a queue $q_{j}$ that holds $C\cdot 2^{j}\log n$ independent random matrices $\Pi$ , for a sufficiently large constant $C$ .

Each Strong Estimator instance $f_{i,j,k}$ uses a distinct matrix $\Pi$ from the queue that has not been used before. When a sketch is deleted, its corresponding matrix is returned to $q_{j}$ , making it available for reuse.

Lemma 23.

Fix any time $t$ . With probability at least $1-\frac{1}{\text{poly}(n)}$ , the queue $q_{j}$ is non-empty for all levels $j\in[\beta]$ . This ensures that there are at most $O(2^{j}\log n)$ active timestamps at level $j$ .

Hence, by a union bound over all levels and time, with high probability, the total number of timestamps across all levels remains bounded by $O(\epsilon^{-p}\log n)$ throughout the entire stream.

Rounding the Indyk’s p-stable Sketch and Difference Estimator

Our algorithm maintains only $O(\log n)$ timestamps at the top level. However, each of these timestamps stores an instance of Indyk’s p-stable Sketch sketch, which uses $O(\epsilon^{-2}\log n)$ bits, and the total space for the Difference Estimator sketches at each level is also $\tilde{O}(\epsilon^{-2}\log n)$ . To reduce the overall space to $\tilde{O}(\epsilon^{-p}\log^{2}n+\epsilon^{-2}\log n)$ , we apply the rounding technique from [16] to compress the sketch values.

Lemma 24 (Correctness of rounding Indyk’s p-stable Sketch ).

Let $\epsilon^{\prime},\delta^{\prime}\in(0,1)$ , and $[l,r]$ be a window, $\Pi\in\mathbb{R}^{d\times U}$ be the sketch matrix, $\gamma=(\frac{\epsilon^{\prime}\delta^{\prime}}{dm\log(Um)})^{C}$ , $\mathcal{P}(\Pi x^{(l,r)})$ be the sketch vector after rounding procedure 2 with parameter $\gamma$ .

Then $\operatorname{median}(|\mathcal{P}(\Pi x^{(l,r)})_{1}|,\dots,|\mathcal{P}(\Pi x% ^{(l,r)})_{d}|)=\operatorname{median}(|(\Pi x^{(l,r)})_{1}|,\dots,|(\Pi x^{(l,% r)})_{d}|)\pm\epsilon^{\prime}\|x^{(l,r)}\|_{p}$ with probability $\geq 1-\delta^{\prime}$ .

Proof.

The proof simply follows Lemma 21. $\hfill\blacktriangleleft$

Setting the $\epsilon^{\prime},\delta^{\prime}$ in Lemma 24 to be $\frac{1}{4}\epsilon,\frac{1}{4}\delta$ , we can store the rounded vector in space $O(d\log(\frac{1}{\gamma}\log n))=O(d(\log\frac{1}{\epsilon\delta}+\log\log n))$ .

Next, we prove the correctness of rounding the Difference Estimator, for $p=2$ . The argument is the same as Lemma 24.

Lemma 25 (Correctness of rounding the Difference Estimator ).

Let $\epsilon_{0},\delta_{0}\in(0,1),q=3$ , $u, v$ be 2 frequent vector, $A\in\mathbb{R}^{d\times U}$ be the sketch matrix, $\gamma=(\frac{\epsilon_{0}\delta_{0}}{dm\log(Um)})^{C}$ , for $p\in(0,2)$ , recall the way Difference Estimator estimate the difference between $\|u+v\|_{p}^{p}$ and $\|v\|_{p}^{p}$ is by calculate the mean of $\prod_{j=q(i-1)+1}^{qi}|(Au+Av)_{j}|^{p/q}-\prod_{j=q(i-1)+1}^{qi}|(Av)_{j}|^{% p/q}$ . Then the mean of $\prod_{j=q(i-1)+1}^{qi}|(\mathcal{P}(Au+Av))_{j}|^{p/q}-\prod_{j=q(i-1)+1}^{qi% }|(\mathcal{P}(Av))_{j}|^{p/q}$ only deviate $O((\epsilon_{0}\frac{d^{2}}{\delta_{0}^{2}})^{q/p})\cdot F(v)$ from the original estimation with probability $\geq 1-2\delta_{0}$ .

Thus, setting $\epsilon_{0}=O(\frac{\epsilon\delta^{2}}{d^{2}})$ and $\delta_{0}=O(\delta)$ , we can store the each Difference Estimator in space $O(d(\log\frac{1}{\epsilon\delta}+\log\log n))$ .

4.4 Nearly Optimal $F_{p}$ Estimation

Now we introduce our main result for $F_{p}$ estimation.

Theorem 26.

Let $p\in[1,2]$ , for any fixed window $W$ , with probability $\geq\frac{2}{3}$ , Algorithm 2 will give an $\epsilon$ approximation of $F_{p}^{W}$ , and use $\tilde{O}(\epsilon^{-p}\log^{2}n+\epsilon^{-2}\log n)$ bits of space, more specifically, $O((\epsilon^{-p}\log^{2}n+\epsilon^{-2}\log n\log^{4}\frac{1}{\epsilon})(\log% \frac{1}{\epsilon}+(\log\log n)^{2}))$ .

Let the query window be $W=[r-n,r]$ , and let $l$ be the smallest index that $F_{p}^{(l,r)}\leq 2F_{p}^{W}$ , and $W^{\prime}=[l,r]$ . First, we prove that our top level Strong Estimator gives a constant approximation. The proof can be easily seen from Lemma 16.

Lemma 27 (Constant factor partitions in top level).

With probability $\geq 1-\delta$ , $F_{p}^{W}\leq F_{p}^{\left(t_{1},r\right)}\leq 2F_{p}^{W}$ and $\frac{1}{2}F_{p}^{W}\leq F_{p}^{\left(t_{2},r\right)}\leq F_{p}^{W}$ .

Proof.

As we use a $(\frac{\epsilon^{p}}{64},\frac{\epsilon^{p}}{64},\delta_{SE})$ Strong Estimator for $\epsilon=\frac{1}{2}$ . By Lemma 16, $\ell_{p}^{(t_{2},r)}\geq\frac{3}{4}\ell_{p}^{(t_{1},r)}$ . So $F_{p}^{(t_{1},r)}\leq(\frac{4}{3})^{p}F_{p}^{(t_{2},r)}\leq 2F_{p}^{W}$ and $F_{p}^{(t_{2},r)}\geq(\frac{3}{4})^{p}F_{p}^{(t_{1},r)}\geq\frac{1}{2}F_{p}^{W}$ . $\hfill\blacktriangleleft$

Let $\mathcal{E}$ be the event that the top level Strong Estimator gives a constant factor partition, and for the subroutine levels, all the $O(\epsilon^{-p}\log n)$ instances of Strong Estimator satisfy the $(\frac{1}{2},\frac{2^{-1/p}\cdot\epsilon}{2^{30}})$ Strong Estimator property on window $W^{\prime}$ . In the subroutine level, we use $(\frac{1}{2},\frac{2^{1/p}\cdot\epsilon}{2^{30}},\delta_{Sub}<\frac{\delta}{O(% \epsilon^{p}\log n)})$ Strong Estimator, so by a union bound, all the $O(\epsilon^{p}\log n)$ instances of Strong Estimator satisfy the Strong Estimator property in window $W^{\prime}$ with probability $\geq 1-\delta$ .

So by adjusting the constant for $\delta$ , with probability $\geq 1-\delta$ , $\mathcal{E}$ will happen.

We now show an upper bound on the moment of each substream whose contribution is estimated by the difference estimator, i.e., the difference estimators are well-defined.

Lemma 28 (Upper bounds on splitting times).

Let $p\in[1,2]$ . Conditioned on $\mathcal{E}$ , we have that for each $j\in[\beta]$ and all $k$ , either $t_{1,j,k+1}=t_{1,j,k}+1$ or $F_{p}^{\left(t_{1,j,k},t_{1,j,k+1}\right)}\leq 2^{-j-7}\cdot F_{p}^{W}$ .

Lemma 29 (Accuracy of difference estimators. Lemma A.3 [21]).

Let $p\in[1,2]$ . Conditioned on the event in Lemma 28 holding, we have that for each $j\in[\beta]$ and all $k$ ,

$\operatorname{SDiffEsT}\left(t_{i,j,k-1},t_{i,j,k},t,\gamma_{j},\eta,\delta\right)$ gives an additive $\eta\cdot F_{p}\left(t_{i,j,k},t\right)$ approximation to $F_{p}^{\left(t_{i,j,k-1},t\right)}-F_{p}^{\left(t_{i,j,k},t\right)}$ with probability at least $1-\delta$ .

Lemma 30 (Geometric upper bounds on splitting times. Lemma A.4 [21]).

Let $p\in[1,2]$ . Conditioned on the event in Lemma 28 holding, we have that for each $j\in[\beta]$ and each $k$ that either $t_{i,j,k+1}=t_{i,j,k}+1$ or $F_{p}^{\left(t_{i,j,k},m\right)}-F_{p}^{\left(t_{i,j,k+1},m\right)}\leq 2^{-j/% p-2}p\cdot F_{p}^{\left(t_{i,j,k+1},m\right)}$ .

Lemma 31 (Geometric Lower bounds on splitting times).

Let $p\in[1,2]$ , conditioned on $\mathcal{E}$ , we have that for each $j\in[\beta]$ and each $k$ that

F_{p}^{\left(t_{i,j,k-1},t_{i,j,k+1}\right)}>2^{-j-21}\cdot F_{p}^{W}.

Lemma 32 (Number of level $j$ difference estimators. Lemma A.6 [21]).

Let $p\in[1,2]$ , conditioned on the Geometric Lower bounds on splitting times to hold (Lemma 31) , we have that for each $j\in[\beta]$ , that $k\leq 2^{j+23}$ for any $t_{1,j,k}$ .

We next bound the number of level $j$ difference estimators that can occur from the end of the previous level $j-1$ difference estimator to the time when the sliding window begins. We say a difference estimator $\mathcal{C}_{1,j,k}$ is active if $k\in\left[a_{j},b_{j}\right]$ for the indices $a_{j}$ and $b_{j}$ defined in StitchSW. The active difference estimators in each level will be the algorithms whose output are subtracted from the initial rough estimate to form the final estimate of $F_{p}^{W}$ .

Lemma 33 (Number of active level $j$ difference estimators. Lemma A.7 [21]).

Conditioned on the Upper bounds and Geometric lower bounds on splitting times to hold (Lemma 28 and 31) , for $p\in[1,2]$ and each $j\in[\beta]$ , let $a_{j}$ be the smallest index such that $t_{1,j,a_{j}}\geq c_{j-1}$ and let $b_{j}$ be the largest index such that $t_{1,j,b_{j}}\leq r-n$ . Then conditioned on $\mathcal{E}$ , we have $b_{j}-a_{j}\leq 2^{24}$ .

Lemma 34 (Correctness of sliding window algorithm).

For $p\in[1,2]$ , Algorithm 2 outputs a $(1\pm\varepsilon)$ -approximation to $F_{p}^{W}$ with probability $\geq 1-\delta$ .

Now we will prove the space bound.

Lemma 35.

Let $\delta$ be a constant $\frac{1}{3}$ , Algorithm 2 uses space $O((\epsilon^{-p}\log^{2}n+\epsilon^{-2}\log n\log^{4}\frac{1}{\epsilon})(\log% \frac{1}{\epsilon}+(\log\log n)^{2}))$ bits.

Combining Lemma 34, which establishes the approximation guarantee, with the space bound given in Lemma 35, we conclude the proof of Theorem 26.

5 Lower bounds

We consider the following communication problem, which is a generalization of the classic GreaterThan communication problem.

Problem 36.

In the $\text{IndexGreater}(N,k)$ problem, Alice receives a sequence $(x_{1},\ldots,x_{k})\in[2^{N}]^{k}$ . Bob receives an an index $j$ and a number $y\in[2^{N}].$ Alice sends a one-way message to Bob, and Bob must decide if $y>x_{j}$ or if $y\leq x_{j}.$

Lemma 37.

Problem 36 requires $\Omega(Nk)$ communication to solve with $2/3$ probability.

This lower bound is known. For instance [7] applies it in their lower bounds as well. We supply a short proof in the appendix from the somewhat more standard Augmented Index problem.

Lemma 38.

Let $p\in(1,2],$ and let $W, N, k, r$ be parameters satisfying the following:

$\blacksquare$

$n\leq U$
$\blacksquare$

$2^{r}Nk^{1-1/p}\leq n^{1-1/p}$
$\blacksquare$

$2^{r}Nk\leq n$

Suppose that $\mathcal{A}$ is a streaming algorithm that operates on sliding windows of size $n$ , and gives a $1\pm\frac{1}{12k^{1/p}}$ multiplicative approximation to $F_{p}$ on any fixed window with $9/10$ probability, over the universe $[U]$ .

Then $\mathcal{A}$ uses at least $\Omega(kr\log N)$ space.

Proof.

See the appendix of the full version. $\hfill\blacktriangleleft$

Theorem 39.

Fix $p\in(1,2]$ . Suppose that $\mathcal{A}$ is a streaming algorithm that operates on a window of size $n$ , and outputs a $(1\pm\varepsilon)$ approximation of $F_{p}$ on any fixed window with $9/10$ probability. Suppose that $\varepsilon\geq n^{-1/p}.$ Then $\mathcal{A}$ must use at least

\Omega\left(\frac{(p-1)^{2}}{\varepsilon^{p}}\log^{2}(\varepsilon\min(n,U)^{1/% p})+\frac{1}{\varepsilon^{2}}\log(\varepsilon^{1/p}\min(U,n))\right)

space. In particular for constant $p$ and $n\geq U,$ $\mathcal{A}$ must use at least

\Omega\left(\frac{1}{\varepsilon^{p}}\log^{2}(\varepsilon U)+\frac{1}{% \varepsilon^{2}}\log(\varepsilon^{1/p}U)\right)

space.

Proof.

The second term in the lower bound is shown by [5] in their Theorem 11 for general insertion-only streams.

For the first term, first suppose that $n\leq U.$ In Lemma 38, set $k=\frac{1}{\varepsilon^{p}}$ . Then set $2^{r}$ and $N$ to both be $\Theta((\varepsilon^{p-1}n^{1-1/p})^{1/2})$ so that the conditions of Lemma 38 hold. This gives the stated bound for $n\leq U.$ To handle the case where $n\geq U$ , note that we can reduce from the $n\leq U$ setting; simply duplicate each item in the stream $\frac{n}{U}$ times. $\hfill\blacktriangleleft$

References

[1] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20–29, 1996. doi:10.1145/237814.237823.
[2] Ziv Bar-Yossef, Thathachar S Jayram, Ravi Kumar, and D Sivakumar. An information statistics approach to data stream and communication complexity. Journal of Computer and System Sciences, 68(4):702–732, 2004. doi:10.1016/J.JCSS.2003.11.006.
[3] Mihir Bellare and John Rompel. Randomness-efficient oblivious sampling. In Proceedings 35th Annual Symposium on Foundations of Computer Science, pages 276–287. IEEE, 1994. doi:10.1109/SFCS.1994.365687.
[4] Jarosław Błasiok, Jian Ding, and Jelani Nelson. Continuous monitoring of $\ell_{p}$ norms in data streams. arXiv preprint arXiv:1704.06710, 2017.
[5] Mark Braverman and Or Zamir. Optimality of frequency moment estimation. arXiv preprint arXiv:2411.02148, 2024. doi:10.48550/arXiv.2411.02148.
[6] Vladimir Braverman, Stephen R Chestnut, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P Woodruff. Bptree: An $\ell_{2}$ heavy hitters algorithm using constant memory. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 361–376, 2017.
[7] Vladimir Braverman, Elena Grigorescu, Harry Lang, David P Woodruff, and Samson Zhou. Nearly optimal distinct elements and heavy hitters on sliding windows. arXiv preprint arXiv:1805.00212, 2018. arXiv:1805.00212.
[8] Vladimir Braverman and Rafail Ostrovsky. Smooth histograms for sliding windows. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 283–293, 2007. doi:10.1109/FOCS.2007.55.
[9] Vladimir Braverman and Rafail Ostrovsky. Smooth histograms for sliding windows. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 283–293. IEEE, 2007. doi:10.1109/FOCS.2007.55.
[10] Vladimir Braverman and Rafail Ostrovsky. Effective computations on sliding windows. SIAM Journal on Computing, 39(6):2113–2131, 2010. doi:10.1137/090749281.
[11] Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming, pages 693–703. Springer, 2002. doi:10.1007/3-540-45465-9_59.
[12] Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. SIAM journal on computing, 31(6):1794–1813, 2002. doi:10.1137/S0097539701398363.
[13] Erik D Demaine, Alejandro López-Ortiz, and J Ian Munro. Frequency estimation of internet packet streams with limited space. In European Symposium on Algorithms, pages 348–360. Springer, 2002. doi:10.1007/3-540-45749-6_33.
[14] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM (JACM), 53(3):307–323, 2006. doi:10.1145/1147954.1147955.
[15] Piotr Indyk and David Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 202–208, 2005. doi:10.1145/1060590.1060621.
[16] Rajesh Jayaram and David P. Woodruff. Towards optimal moment estimation in streaming and distributed models. ACM Trans. Algorithms, 19(3), June 2023. doi:10.1145/3596494.
[17] Hossein Jowhari, Mert Sağlam, and Gábor Tardos. Tight bounds for $L_{p}$ samplers, finding duplicates in streams, and related problems. In Proceedings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’11, pages 49–58, New York, NY, USA, 2011. Association for Computing Machinery. doi:10.1145/1989284.1989289.
[18] Daniel M Kane, Jelani Nelson, and David P Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1161–1178. SIAM, 2010. doi:10.1137/1.9781611973075.93.
[19] James L. Jr. Nolan. Stable distributions. models for heavy tailed data, 2001. URL: https://api.semanticscholar.org/CorpusID:117487370.
[20] Subhabrata Sen and Jia Wang. Analyzing peer-to-peer traffic across large networks. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurment, pages 137–150, 2002. doi:10.1145/637201.637222.
[21] David P Woodruff and Samson Zhou. Tight bounds for adversarially robust streams and sliding windows via difference estimators. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 1183–1196. IEEE, 2022.
[22] Vladimir M Zolotarev. One-dimensional stable distributions, volume 65. American Mathematical Soc., 1986.

[bib.bib1] [1] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20–29, 1996. doi:10.1145/237814.237823.

[bib.bib2] [2] Ziv Bar-Yossef, Thathachar S Jayram, Ravi Kumar, and D Sivakumar. An information statistics approach to data stream and communication complexity. Journal of Computer and System Sciences, 68(4):702–732, 2004. doi:10.1016/J.JCSS.2003.11.006.

[bib.bib3] [3] Mihir Bellare and John Rompel. Randomness-efficient oblivious sampling. In Proceedings 35th Annual Symposium on Foundations of Computer Science, pages 276–287. IEEE, 1994. doi:10.1109/SFCS.1994.365687.

[bib.bib4] [4] Jarosław Błasiok, Jian Ding, and Jelani Nelson. Continuous monitoring of $\ell_{p}$ norms in data streams. arXiv preprint arXiv:1704.06710, 2017.

[bib.bib5] [5] Mark Braverman and Or Zamir. Optimality of frequency moment estimation. arXiv preprint arXiv:2411.02148, 2024. doi:10.48550/arXiv.2411.02148.

[bib.bib6] [6] Vladimir Braverman, Stephen R Chestnut, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P Woodruff. Bptree: An $\ell_{2}$ heavy hitters algorithm using constant memory. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 361–376, 2017.

[bib.bib7] [7] Vladimir Braverman, Elena Grigorescu, Harry Lang, David P Woodruff, and Samson Zhou. Nearly optimal distinct elements and heavy hitters on sliding windows. arXiv preprint arXiv:1805.00212, 2018. arXiv:1805.00212.

[bib.bib8] [8] Vladimir Braverman and Rafail Ostrovsky. Smooth histograms for sliding windows. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 283–293, 2007. doi:10.1109/FOCS.2007.55.

[bib.bib9] [9] Vladimir Braverman and Rafail Ostrovsky. Smooth histograms for sliding windows. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 283–293. IEEE, 2007. doi:10.1109/FOCS.2007.55.

[bib.bib10] [10] Vladimir Braverman and Rafail Ostrovsky. Effective computations on sliding windows. SIAM Journal on Computing, 39(6):2113–2131, 2010. doi:10.1137/090749281.

[bib.bib11] [11] Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming, pages 693–703. Springer, 2002. doi:10.1007/3-540-45465-9_59.

[bib.bib12] [12] Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. SIAM journal on computing, 31(6):1794–1813, 2002. doi:10.1137/S0097539701398363.

[bib.bib13] [13] Erik D Demaine, Alejandro López-Ortiz, and J Ian Munro. Frequency estimation of internet packet streams with limited space. In European Symposium on Algorithms, pages 348–360. Springer, 2002. doi:10.1007/3-540-45749-6_33.

[bib.bib14] [14] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM (JACM), 53(3):307–323, 2006. doi:10.1145/1147954.1147955.

[bib.bib15] [15] Piotr Indyk and David Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 202–208, 2005. doi:10.1145/1060590.1060621.

[bib.bib16] [16] Rajesh Jayaram and David P. Woodruff. Towards optimal moment estimation in streaming and distributed models. ACM Trans. Algorithms, 19(3), June 2023. doi:10.1145/3596494.

[bib.bib17] [17] Hossein Jowhari, Mert Sağlam, and Gábor Tardos. Tight bounds for $L_{p}$ samplers, finding duplicates in streams, and related problems. In Proceedings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’11, pages 49–58, New York, NY, USA, 2011. Association for Computing Machinery. doi:10.1145/1989284.1989289.

[bib.bib18] [18] Daniel M Kane, Jelani Nelson, and David P Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1161–1178. SIAM, 2010. doi:10.1137/1.9781611973075.93.

[bib.bib19] [19] James L. Jr. Nolan. Stable distributions. models for heavy tailed data, 2001. URL: https://api.semanticscholar.org/CorpusID:117487370.

[bib.bib20] [20] Subhabrata Sen and Jia Wang. Analyzing peer-to-peer traffic across large networks. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurment, pages 137–150, 2002. doi:10.1145/637201.637222.

[bib.bib21] [21] David P Woodruff and Samson Zhou. Tight bounds for adversarially robust streams and sliding windows via difference estimators. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 1183–1196. IEEE, 2022.

[bib.bib22] [22] Vladimir M Zolotarev. One-dimensional stable distributions, volume 65. American Mathematical Soc., 1986.

Tight Bounds for Heavy-Hitters and Moment Estimation in the Sliding Window Model

Abstract

Keywords and phrases:

Category:

Copyright and License:

2012 ACM Subject Classification:

Related Version:

Funding:

DOI:

Event:

Editors:

Series and Publisher:

1 Introduction

1.1 Our models

The sliding window model

𝑭𝒑 moment estimation

Heavy hitters

1.2 Prior Work

1.3 Our Results

𝑭𝒑 estimation

Theorem 1.

Heavy hitters

Theorem 2.

Lower bounds

Theorem 3.

1.4 Notation

1.5 Technical Overview

𝑭𝒑 estimation

Heavy Hitters

Lower Bounds

2 Preliminary Results

Definition 4 (p-stable distribution. [22]).

Definition 5 (Strong Estimator).

Lemma 6 ([3]).

Lemma 7 ([4]).

Lemma 8 ([7]).

Lemma 9 ([17]).

3 Warmup: Constant-factor 𝑭𝒑 approximation

3.1 Constructing a Strong Estimator

Construction 10.

Lemma 11.

Lemma 12.

Lemma 13 (Additional Property for the Strong Estimator).

3.2 Constant-factor 𝑭𝒑 approximation

Theorem 14.

Theorem 15 (Main Theorem on Heavy Hitters).

Proof.

Lemma 16.

Proof of Theorem 14.

4 Improving the 𝜺 dependence

4.1 Preliminary Results

Definition 17 (Difference Estimator. Definition 8.2, [21]).

Lemma 18 (Construction of Difference Estimator. Lemma 4.9, Lemma 3.6 [21]).

Definition 19 ([16] Rounding random variable).

Procedure 1 (Recursive Randomized Rounding).

Lemma 20 (Lemma 2, [16]).

Procedure 2.

Lemma 21.

4.2 Algorithm Overview

4.3 Rounding Techniques

Storing the Sketch via Strong Estimator 𝒇𝒊,𝒋,𝒌

Claim 22.

Technique 1.

Challenges in Bounding the Number of Timestamps

Technique 2 (Rotating Random Sets).

Lemma 23.

Rounding the Indyk’s p-stable Sketch and Difference Estimator

Lemma 24 (Correctness of rounding Indyk’s p-stable Sketch ).

Proof.

Lemma 25 (Correctness of rounding the Difference Estimator ).

4.4 Nearly Optimal 𝑭𝒑 Estimation

Theorem 26.

Lemma 27 (Constant factor partitions in top level).

Proof.

Lemma 28 (Upper bounds on splitting times).

Lemma 29 (Accuracy of difference estimators. Lemma A.3 [21]).

Lemma 30 (Geometric upper bounds on splitting times. Lemma A.4 [21]).

Lemma 31 (Geometric Lower bounds on splitting times).

Lemma 32 (Number of level j difference estimators. Lemma A.6 [21]).

Lemma 33 (Number of active level j difference estimators. Lemma A.7 [21]).

$F_{p}$ moment estimation

$F_{p}$ estimation

$F_{p}$ estimation

Definition 4 ( $p$ -stable distribution. [22]).

3 Warmup: Constant-factor $F_{p}$ approximation

3.2 Constant-factor $F_{p}$ approximation

4 Improving the $\varepsilon$ dependence

Storing the Sketch via Strong Estimator $f_{i,j,k}$

4.4 Nearly Optimal $F_{p}$ Estimation

Lemma 32 (Number of level $j$ difference estimators. Lemma A.6 [21]).

Lemma 33 (Number of active level $j$ difference estimators. Lemma A.7 [21]).