Unit Interval Selection in Random Order Streams
Abstract
We consider the Unit Interval Selection problem in the one-pass random order streaming model. In this setting, an algorithm is presented with a sequence of unit-length intervals on the line that arrive in uniform random order, one at a time, and the objective is to output (an approximation of) a largest set of disjoint intervals using space linear in the size of an optimal solution. Previous work only considered adversarially ordered streams and established that, within these space constraints, a -approximation can be achieved in such streams, and this is best possible, in that going beyond such an approximation factor requires space [Emek et al., TALG’16] .
In this work, we show that an improved expected approximation factor can be achieved if the input stream is in uniform random order, where the expectation is taken over the stream order. More specifically, we give a one-pass streaming algorithm with expected approximation factor that uses space , where denotes an optimal solution. We also show that random order algorithms with expected approximation factor above require space , and algorithms that compute a better than -approximation with probability above also require space.
On a technical level, we design an algorithm for the restricted domain , for some constant , and use standard techniques to obtain an algorithm for unrestricted domains. For the restricted domain , we run recursive instances of our algorithm, with each instance targeting the situation where a specific interval of an optimal solution arrives first. We establish the interesting property of our algorithm that it performs worst when the input stream consists solely of a set of independent intervals. It then remains to analyse the algorithm on these simple instances.
Our lower bound is proved via communication complexity arguments, similar in spirit to the robust communication lower bounds established by [Chakrabarti et al., Theory Comput. 2016].
Keywords and phrases:
Random order streaming algorithms, unit interval selectionFunding:
Magnús M. Halldórsson: Partially supported by Icelandic Research Fund grant 2511609.Copyright and License:
Christian Konrad, and Kheeran K. Naidu; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Streaming, sublinear and near linear time algorithms ; Theory of computation Random order and robust communication complexityEditors:
Meena Mahajan, Florin Manea, Annabelle McIver, and Nguyễn Kim ThắngSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
In the one-pass streaming model of computation, an algorithm is presented with a stream of input items that, a priori, arrive in arbitrary order. The algorithm is tasked with computing a solution while maintaining a memory of size sublinear in the length of the input stream.
We consider the Unit Interval Selection problem in the streaming model. In this problem, the input stream consists of closed unit-length intervals on the line, presented to the algorithm in arbitrary order, and the objective is to compute a pairwise non-overlapping subset of largest cardinality.
Emek et al. [12] initiated the study of Unit Interval Selection in the one-pass streaming model. They gave a -approximation algorithm that uses words111A word is the amount of space required to store a single interval. of space, where denotes an optimal solution, and showed that bits of space are necessary for algorithms with even slightly better approximation guarantees, which renders their algorithm space optimal. Subsequently, Cabello and Pérez-Lantero [7] gave a simpler algorithm with the same guarantee. We emphasize that these results hold for streams that are in adversarial/arbitrary order.
Given that the adversarial order setting is fully understood, in this paper, we ask whether improved algorithms are possible if the input stream is in uniform random order. Indeed, random order streams have received significant attention, and, for many problems, improvements beyond known lower bounds that hold for adversarial orderings are known, such as Quantile Estimation [13], Maximum Matching [15, 5, 2], Bounded Degree Property Testing [17], and edge-arrival Set Cover [16]. Lower bounds for random order streams were pioneered by Chakrabarti et al. [9], with subsequent work [8] further developing these ideas. The ideas from [8] have since been employed in various random order streaming lower bound proofs, including [2] and [3]. The random order setting, which assumes worst-case instances and random order arrivals, is a less pessimistic setting than the arbitrary order setting, which assumes both worst-case instances and worst-case orderings, and is therefore more relevant from a practitioner’s point of view.
1.1 Our Results
As our main result, we show that we can go beyond the -approximation barrier for adversarial order streams when considering a uniform random arrival order.
Theorem 1.
There is a deterministic one-pass words of space streaming algorithm for Unit Interval Selection on random order streams with expected approximation factor , where the expectation is taken over the random order of the stream.
We then complement our algorithm with the following limitation result:
Theorem 2.
Let be any small constant. Then:
-
1.
Any randomized one-pass streaming algorithm for random-order Unit Interval Selection with expected approximation factor must use bits of space.
-
2.
For any , any randomized one-pass streaming algorithm for random-order Unit Interval Selection with approximation factor that succeeds with probability at least on any input instance must use bits of space.
The expectation in (1) and the probability in (2) are taken over both the stream order and the random choices of the algorithm.
We note that, in the previous theorem, (1) immediately follows from (2), however, the result stated in (1) is more suitable for comparison to our algorithmic result. Furthermore, the result in (2) explains why our algorithm can only go beyond the -approximation barrier that holds for adversarial order streams in expectation, and not with high constant probability.
Our algorithmic result combined with our lower bound result show that the optimal achievable approximation ratio lies in the interval , and we leave it as a great open question to further narrow this gap.
1.2 Techniques
We will now discuss the main ideas behind both our algorithm and our lower bound.
Algorithm.
We make use of an established technique that, given a one-pass streaming algorithm for unit-length intervals that are all contained in the restricted domain , for some constant integer , and has approximation factor , produces an algorithm for unrestricted domains with only a constant factor blow-up in space and a factor blow-up in the approximation factor [11]. We thus see that it is enough to design an algorithm for the restricted domain , for some large but constant , large enough so as to minimize the impact of the factor on the final approximation factor.
To this end, as a key building block of our algorithm, we employ the following strategy: We maintain the left-most interval observed thus far in the stream, and feed the substream of intervals that lie to the right of into a recursive instance of our algorithm. This building block then returns the interval together with the solution produced by the recursive call. To see the strength of this method, denote by an arbitrary but fixed optimal solution. Furthermore, assume that the left-most interval arrives before any of the other intervals in . In this case, the algorithm would have either picked as the interval or an interval that is even further left as , and the substream fed into the recursive call of the algorithm contains , i.e., an independent set that is only by one smaller. In other words, the algorithm has acted optimally thus far since one interval was selected and a subinstance that contains an independent set of size was created whose solution can be combined with the selected interval .
Unfortunately, the event that appears first among the intervals of has a probability of only (over the random order of the stream). Our aim, however, is to make this approach work, even if any other interval of is the first to arrive among the intervals in . First, by symmetry, we see that the approach above also works when the right-most interval of arrives first. Hence, suppose now that the -th (counted from left to right) optimal interval arrives before any of the other optimal intervals, for some . Furthermore, let be the largest integer strictly smaller than the left boundary of , and let be the smallest integer strictly larger than the right boundary of . Then, we construct two solutions and pick the larger one, as follows. For solution 1, we combine the outputs of two independent runs of our algorithm, one on the domain , and one on the domain . While these runs ignore intervals that intersect with the point and may thus potentially miss a crucial optimal interval that intersects this point, we can now exploit the fact that the interval is the left-most optimal interval in the domain . Hence, as discussed above, the run on does not make any mistake when selecting the interval as the leftmost interval (or an interval within that lies even further to the left as ) into the final solution. For solution 2, we combine the outputs of the two runs of the algorithm on the domains , and , respectively, and now exploit the fact that is the right-most optimal interval in the domain .
Last, since our algorithm cannot know which interval of will indeed arrive first, we run the above strategy for every integer . We observe that this still yields an algorithm with only constant space complexity since the domain is of constant size, which further implies that the optimal solution within this domain is also only of constant size .
In our analysis, we establish the monotonicity property that adding intervals to the input stream never decreases the size of the output produced by the algorithm. In other words, the worst-case performance of our algorithm is achieved when the algorithm is run on an independent set – a special case that could of course be solved optimally by just reporting these intervals. We can thus focus on analysing the performance of the algorithm on this special case. Due to the many recursive calls of the algorithm, we obtain a recursive formula for the solution size obtained. We observe that the approximation factor of our algorithm decreases as increases. Recall, however, that we still need to apply the technique mentioned above to go from the restricted domain to an unrestricted domain, which incurs a further loss in the approximation factor by a factor of . Using a computer programme, we see that, for , the final algorithm has an approximation factor of at least , which is provably very close to the best possible choice. This specific choice of is discussed in more detail in the full version of this paper.
Lower Bound.
Our lower bound is obtained via a reduction to the problem in the one-way two-party communication complexity setting, by combining the clique gadgets already used in [12] and [7] for adversarial order lower bounds with the random order lower bound ideas by Cormode et al. [8].
In the problem, there are two players, denoted Alice and Bob. Alice holds a vector consisting of uniform independent random bits, and Bob is given a uniform random index . Alice sends a message to Bob and Bob outputs the bit . The objective is to obtain a communication protocol that communicates a message of smallest possible size. It is known that requires a message of size . Since streaming algorithms can be used as one-way two-party communication protocols (Alice runs the streaming algorithm on her part of the input and sends the memory state to Bob, who then completes the execution of the algorithm on his part), lower bounds on the message size translate and yield lower bounds on the space used by streaming algorithms.
The problem can be reduced to Unit Interval Selection as follows. Using the bits in , Alice constructs a clique of mutually overlapping intervals, i.e., a stack of intervals such that, for every , the -th interval is slightly further to the right than the -th interval. The bits are encoded in this construction in that, depending on the value of , the -th interval is ever so slightly moved, which, however, does not affect the logic behind the construction. Crucially, an algorithm that outputs the -th interval of the clique stack, for any , must know the value . Then, based on the index , Bob constructs two wing intervals, which surround, but are independent of, the interval corresponding to . The key property is that every clique interval, other than the one corresponding to , must intersect exactly one of these wing intervals, or, in other words, the only independent set of size consists of the wing intervals together with the -th interval of the stack. This in turn implies that any algorithm on this construction that computes a better than -approximation must report the interval corresponding to , and its location allows recovering the bit .
When bringing this construction into the random-order setting, for the resulting construction to be hard it is necessary that the two wing intervals appear after the -th clique interval since otherwise the positions of the wing intervals reveal the index and an algorithm could then simply collect the -th clique interval when it arrives, obtain an optimal solution to the interval selection problem, and solve the underlying instance. We, however, observe that, under a uniform random arrival order, with probability , the two wing intervals indeed arrive after the -th clique interval, and we prove that this case indeed remains hard to solve. Our expected approximation factor lower bound of is then explained by the fact that, with probability , an algorithm can only obtain a solution of size , and with probability , an algorithm can obtain a solution of optimal size , which results in an expected solution size , or an approximation factor of .
On a technical level, our approach is a reduction via the use of shared public randomness in the spirit of the argument from [8]. Alice and Bob sample a shared random permutation of the intervals that are used in the hard instance. Alice constructs each clique interval which arrives before a wing interval via the bits in , and then Bob constructs the wing intervals via , and also the remaining clique intervals via sampling public random variables. This results in a construction that reflects the bit of the input to precisely when the interval corresponding to arrives before either wing interval, and this occurs with probability , as desired, which yields our lower bound results.
1.3 Further Related Work
Besides unit-length intervals, intervals of arbitrary lengths have also been considered. Emek et al. [12] showed that, in adversarial order streams, a -approximation for arbitrary-length intervals can be achieved with space , and they also proved that algorithms with improved approximation factor must use space . Cabello and Pérez-Lantero [7] gave a simpler algorithm for this case. More recently, Dark et al. [11] explored algorithms for weighted interval streams and streams that allow for deletions. In a different line of work, the problem of estimating the size of a largest disjoint set of intervals has also been considered [7, 4]. Besides intervals, other geometric objects, such as squares and rectangles [10], and segments and 2-intervals [6], have also been explored.
Interval Selection has also been studied in the streaming sliding window model, where the objective is to maintain a solution on the most recent intervals. Alexandru and Konrad [1] showed that Interval Selection is harder in the sliding window model than in the one-pass streaming setting in that, for both unit-length intervals and arbitrary-length intervals, algorithms cannot achieve the same approximation guarantees as in the streaming setting when using space .
1.4 Outline
2 Preliminaries
We use to denote the length of the input stream and to denote the set .
2.1 Interval Selection
In this work, we consider closed intervals , where . We mostly focus on intervals of unit length, i.e., when . We say that is the left endpoint of and is the right endpoint of . We say that an interval is further left than an interval if its left endpoint is strictly smaller, i.e., holds. Similarly, is further right of if its right endpoint is strictly larger, i.e., holds.
Two intervals are independent if , and a set of intervals is an independent set if the intervals contained in are pairwise independent. The Unit Interval Selection problem asks for an independent set of intervals of largest size, where we also assume that all intervals have length .
We use to denote an arbitrary but fixed independent set of largest size, and we write to denote the size of a largest independent set, where the parameter passed on to is either a stream or a set or intervals. For a parameter , we say that an independent set is a -approximation if .
We also assume that any interval can be stored in a single word of space. Under the plausible assumption that the endpoint can be encoded in bits of space, this yields space overhead.
2.2 Communication Complexity
We use the standard one-way two-party communication complexity setting, first introduced by [19]. Our lower bounds will then be derived from a reduction to the well known problem. We refer the reader to [18] for a comprehensive overview of communication complexity.
In this setting, we have two parties that we denote by Alice and Bob. They each hold a portion of the input, and , respectively, and their goal is to compute/approximate a function on their inputs. To this end, Alice sends a single message to Bob, , and after receiving the message, Bob outputs or a suitable approximation thereof. Then, the size of Alice’s message is the relevant notion of complexity, and the goal is to send a message that is as small as possible. Alice and Bob can make use of infinite sequences of both private and public/shared random bits.
Definition 3.
For an integer , is the one-way two-party communication game where Alice holds a vector and Bob holds an index . Alice sends a single message to Bob who then outputs the value .
Let denote the uniform distribution over instances. This is the distribution where is sampled uniformly at random from and is sampled independently from and uniformly at random from its support .
It is well-known that solving requires a message of size , even in a distributional sense when the input is chosen from , and also regardless of any public randomness used.
Theorem 4 ( Hardness (e.g., [18])).
Let be any constant. Let be a one-way communication protocol for . Let denote any public randomness used by . If
then must send a message of size bits.
3 Random-Order Unit Interval Selection Algorithm
We now prove Theorem 1. To this end, we first present our algorithm (see Algorithm 1), and then analyse it.
Input: A domain , where , and
Initialisation:
During the Stream:
Input: A stream of unit-length intervals, each of which is fully contained in .
Output:
Our algorithm proceeds as follows. It is parametrised by two integers with , and operates under the assumption that all intervals fed into the algorithm are fully contained in the domain . Our overall goal is to design an algorithm that operates on the domain , for some integer . To accommodate for this, we will see later that, for technical reasons, we need to invoke our algorithm with parameters and .
Next, our algorithm considers all integers in as split points. For each such split point , the algorithm maintains in variables and the closest interval to that is located left of and the closest interval to that is located right of , respectively. It also executes four recursive calls of the algorithm associated with . The recursive calls and are instantiated on the domains and , respectively, and all intervals that fall into these domains are fed into these algorithms. The recursive calls and are also instantiated on the domains and , however, only intervals that are left of (and within the domain ) are fed into , and intervals that right of (and within the domain ) are fed into .
For each split point , two potential outputs are considered: The output generated by , combined with the output of as well as the interval , and the output generated by , combined with the output of as well as the interval . We note that, by construction, both these potential outputs form independent sets. The output of the algorithm then is the largest output of this kind when considering all split points.
In the following, we write to denote the output produced by algorithm when run on the input stream .
3.1 Monotonicity Property
We first prove that removing any interval from the input stream cannot increase the size of the output regardless of the order of the stream.
Lemma 5.
Let be an instance of Algorithm 1 instantiated with integers . Let be any input stream (not necessarily random-order) whose intervals are fully contained in , and let be any substream of . Then .
Proof.
Our proof is via induction over the quantity . We first show that the recursion tree will always have finite size.
Claim 6.
For any value of , the recursion tree of has finite size.
Proof.
Each recursive subproblem is instantiated before any intervals arrive. Thus, the size of the recursion tree depends only on and not on the input stream. Each recursive instance has a domain that is by one smaller than the domain of its parent, and an instance with domain size at most 1 creates no subproblems.
We now proceed with the induction.
Base Case.
Suppose is an instance of Algorithm 1 such that . Then .
Proof.
If , then no unit length intervals fit within . Therefore, and must both be empty streams, and so .
Inductive Hypothesis.
Suppose the lemma holds for all instances of such that .
We now consider when .
Inductive Step.
By construction, the output of is formed by combining the outputs of its recursive children, together with either or , for some . We will first argue that if any or stores an interval under , then it must also store an interval under . We will then argue that all of the recursive children must output solutions at least as large under compared to . This is sufficient to prove the claim.
First, W.L.O.G., suppose some is non-empty under . Then, there must exist some interval in which is contained in the domain . Then, this same interval must also be contained in ( is a superstream of ), and so must also be non-empty under . This argument holds for all relevant integers , and also for each .
We now turn to the substreams fed into the recursive children of , and begin with the instances and . W.L.O.G. consider for some relevant integer . Then, by construction, the size of the domain of is at most . Next, let denote the substream of intervals fed into when is fed into , and the substream fed in when is fed into . Then, is precisely the substream of intervals in that are fully contained in . Then, every interval in is also contained in and so , and therefore every interval in is also contained in . This gives us that is a substream of . We can now apply the inductive hypothesis to obtain
The same argument applies for every relevant integer , and also for each .
Finally, we now consider the instances and . W.L.O.G. consider for some relevant integer . Then, by construction, the size of the domain of is at most . Next, let denote the substream of intervals fed into when is fed into , and the substream fed in when is fed into . Then, these are each the substreams of and of the intervals fully contained in , which are also further left and independent of at the time each interval arrives. We now observe that whenever an interval in arrives in , the interval stored in must be at least as far right as when the same interval arrives in - by the construction of the algorithm. Therefore, every interval in is also contained in , giving us that is a substream of . We can now apply the inductive hypothesis to obtain
The same argument applies for every relevant integer , and also for each .
This concludes the proof of the Lemma.
3.2 Bounding the Expected Approximation Factor
We now derive a bound on the expected approximation factor achieved by Algorithm 1. To this end, let be an arbitrary but fixed integer, and we consider the performance of our algorithm when all intervals lie within the domain . For technical reasons, we run our algorithm on the domain and analyse the performance of this instantiation of the algorithm.
For a set of intervals that constitutes the input instance, we denote by the set of permutations of . Observe that the input stream then constitutes a uniform random element of .
We are interested in the expected solution size produced by the algorithm as a function of , i.e., the optimal solution size. For this purpose, for any integer , we define the following quantity:
| (1) |
i.e., the worst-case expected performance of the algorithm on instances with optimal solution size . For , define .
We first observe that it suffices to analyse the performance of our algorithm on streams consisting solely of independent intervals.
Observation 7.
Let be any input instance, and let denote the optimal intervals in . Then,
Proof.
For a permutation of the intervals , we denote by the substream of optimal intervals. Then, since is a substream of , we can use Lemma 5 to obtain:
The result then follows from the observation that and are identically distributed.
We will thus assume from now on that the input instance is a collection of disjoint intervals. We then take the input stream to be for the remainder of this subsection, and omit writing that is the input to .
In the following, we will establish lower bounds on , for every . To this purpose, we treat the cases explicitly, and then provide a recurrence relation for larger values of .
Lemma 8.
For each , .
Proof.
First, implies the input stream has no intervals, which is trivial. Next, if , then the input stream consists of a single interval. This interval will be picked up by since constitutes the left-most interval in the domain .
Now suppose that , which implies that the input consists of two intervals and with further left of . If arrives before , then will store , and will be fed into and returned by this recursive call. The interval will then be stored as the right most interval fed into , ensuring that . We thus obtain that . Then, by the construction of the algorithm, we get
Similarly, if arrives first then and the output of constitutes a candidate solution of size .
Next, we consider the case . We write
to denote the optimal intervals ordered from left to right, and consider the cases for which interval from arrives first. We then fix some , and suppose that arrives in the input stream before any other interval in . Let , for some real , and let
denote the smallest integer larger than that does not intersect with , and the largest integer smaller than or equal to which may only intersect with ’s left endpoint, respectively. Note that when is an integer, we have , but . This choice is due to the fact that every recursive instantiation of our algorithm is run on a domain with a closed left endpoint and an open right endpoint.
Lemma 9.
Conditioned on arriving first, both of the following hold:
-
1.
, and
-
2.
.
Proof.
We only prove 1) – the proof of 2) is similar and omitted.
By definition, the interval is further right and independent of , and so will be stored in . Next, observe that each of the intervals are fed into in uniform random-order. Then, by definition of , we obtain . Finally, combining and via linearity of expectation yields the desired bound.
Remark.
We note that each interval being contained in the domain implies that and also . Recall that we run algorithm on the domain , which implies that both the instances and are guaranteed to exist.
The next lemma follows by a similar proof.
Lemma 10.
Conditioned on arriving first, both of the following hold:
-
1.
, and
-
2.
.
Proof.
We only prove 2) – the proof of 1) is similar and omitted.
By construction, the substream fed into consists of the intervals in that do not overlap with the point . Then, since each interval is of unit length, and also holds, contains either the or left-most intervals of , and these are fed into in a uniform random order. We thus obtain . Finally, by Lemma 5, is a non-decreasing function, which implies that holds, and completes the proof.
We now combine these lemmas to form a recurrence for . In the following, for convenience, we write the maximum function over multiple lines.
Lemma 11.
Conditioned on arriving first, the following holds:
Proof.
The proof follows almost immediately from Lemmas 9 and 10:
| (Construction of the algorithm.) | ||||
| (Jensen’s inequality.) | ||||
| (Lemmas 9 and 10.) | ||||
In the following lemma, we obtain a bound on the expected approximation factor of our algorithm.
Lemma 12.
The expected approximation factor achieved by on intervals that lie in the domain is bounded from below by
where , , and for all .
Proof.
Let and consider an input instance that establishes the minimum in the definition of . We consider a run of on . Then:
| (Law of total expectation.) | ||||
| (Uniform random arrival order.) | ||||
| (By Lemma 11.) | ||||
In particular, this gives us a recurrence relation for , which we will later use to compute values numerically. The expected approximation factor is then simply this quantity multiplied by .
Lastly, when our algorithm is run on intervals in the domain , then we only know that . Hence, to bound the approximation factor of our algorithm on such instances, we consider all possible values of and pick the minimum approximation factor for these cases, which establishes the lemma.
3.3 Bounding the Space Used
In this section, we establish a bound on the space used by our algorithm.
Lemma 13.
Let be an instance of Algorithm 1 run on domain boundaries with . Then, uses words of space.
Proof.
By construction, consists of the following:
-
recursive instances of the algorithm, each on input domains of length at most ,
-
at most many stored intervals (in each and ).
Then, letting denote the space used by an instance of Algorithm 1 on a domain of length , we obtain the recurrence
which implies .
3.4 Going to Unrestricted Domains
We now use the shifting window idea, originally by Hochbaum and Mass [14], and later applied by [7] and [11], that allows going from restricted to unrestricted domains.
Our lemma below is a straightforward reformulation to random-order streams of a corresponding lemma given in [11] for adversarial order streams. We note that the proofs are quite similar.
Lemma 14.
Let be an integer. Let be any (deterministic) algorithm for random-order unit-length interval selection on the restricted domain that computes an -approximation in expectation (over the random order of the stream) using space . Then there is a (deterministic) algorithm for random-order unit-length interval selection on unrestricted domains that computes an -approximation in expectation (over the random order of the stream) using space that uses as a black box.
Proof.
Let be any deterministic algorithm for random-order unit-length interval selection on the restricted domain that computes an -approximation in expectation (over the random order of the stream) using space . We use as a black-box to present an algorithm for unrestricted domains.
To this end, for all , we first define the window . Initially, we mark every window as inactive. For each interval arriving in the stream, let be the set of windows that fully contain . For each , if is marked inactive, create a new instance of corresponding to , and feed into . Then mark as active. Otherwise, if is already marked active then there is already an instance of running that corresponds to . In this case, we feed the interval into the instance of . We note that, as a technicality, whenever an interval is to be fed into an instance corresponding to a window then is given the transformed interval so that ’s input is contained within the restricted domain .
For each window , let denote the output from the instance of corresponding to at the end of the stream. We implicitly transform each interval in back so that they correspond to intervals on the unrestricted domain. At the end of the stream output a maximum independent set of intervals from
We optimize the space used by the algorithm by only storing the set of windows that are marked as active, i.e. each window is implicitly marked as inactive until the time it is explicitly marked as active. Then, a window marked as inactive uses zero space.
The space used by the algorithm is precisely the space used by windows marked as active, and each window uses space (to store as well as mark it active). For each window , define and to be the windows shifted by one to the left and the right, respectively. Then define
Now define
Next, for every interval , there exists exactly many values of such that . Therefore . Also, the set of windows marked as active must be a subset of . If not, there is a window marked as active that intersects no interval in , and so there is an interval in that intersects no interval in . This contradicts the maximality of a maximum independent set. It follows that the total space used by the algorithm is bounded by
For any window , the set of intervals in contained in is independent of the order of . Therefore, the substream of on is also a random-order stream222This is where our argument adds on to the similar argument by [11]..
Next, for each , define the set of disjoint windows and define . Then, using that every interval in is contained in exactly windows and only a single window from any given ,
Therefore, there exists some such that
| (2) |
We now consider the set of windows . Each window in is disjoint so is a valid independent set of the intervals of . Then, taking the expectation over the random order of the stream,
| ( is a valid independent set.) | ||||
| (Linearity of Expectation and the substream of on is also random-order.) | ||||
| ( computes an -approximation in expectation.) | ||||
| (Each is disjoint, so each is contained in exactly one .) | ||||
| (Definition of .) | ||||
| (Equation 2.) | ||||
This meta-algorithm for unrestricted domains is deterministic. Therefore, the final algorithm for unrestricted domains is deterministic if and only if is deterministic.
3.5 Completing the Analysis
Theorem 1. [Restated, see original statement.]
There is a deterministic one-pass words of space streaming algorithm for Unit Interval Selection on random order streams with expected approximation factor , where the expectation is taken over the random order of the stream.
Proof.
The proof is by combining Algorithm 1 with Lemma 14 to obtain an algorithm for random-order unit-length interval streams without any restrictions on the input domain. We first consider the space used by this approach. We then lower bound the expected approximation factor achieved on unrestricted domains with a function that only depends on the integer window length (from Lemma 14), and finally numerically optimise this function.
From Subsection 3.2, on a window of length , we run an algorithm with domain size . Then, from Lemmas 13 and 14, this yields an algorithm for unrestricted domains that uses
words of space, where denotes the space used by an instance of the algorithm instantiated on a domain of length (from Lemma 13). Then, when , this space collapses to become .
We now consider the approximation factor of the unrestricted algorithm. Observe that a largest independent set in the window is of size at most . Therefore, combining Lemmas 12 and 14, for any , there is an algorithm for random-order unit-length interval selection on unrestricted domains that achieves an expected approximation factor of
| (3) |
It remains to pick a suitable constant . Computing (Equation 3) numerically with (using the recurrence relation for of Lemma 12) yields a lower bound of (slightly) more than for the expected approximation factor, completing the proof. We discuss the choice of further in the full version of this paper.
4 The Lower Bound
We now establish our lower bound result as stated in Theorem 2.
Given a random order Unit Interval Selection algorithm , we define protocol that uses to solve the problem:
Protocol for :
Input
Let Alice and Bob be two players holding a instance from the uniform distribution . Let be any algorithm for random-order Unit Interval Selection. For each , define the uniformly and independently distributed binary random variable . Furthermore, define the uniformly distributed random variables . Let denote the set containing these random variables, that is, .
The Protocol
-
1.
Alice and Bob use shared public randomness to sample each variable in . They also sample a bijection .
-
2.
Let .
-
3.
For each such that , Alice constructs the interval
-
4.
Alice runs on these constructed intervals using the ordering . Note that defines an ordering on these intervals by associating each with . Alice then sends the state of to Bob as the message.
-
5.
For each such that , Bob constructs the interval
Bob also constructs the intervals
and
-
6.
Bob uses the message from Alice to continue running on these intervals using the ordering , associating with , and with , under .
Output
If either produces an output of size at most , or , then Bob outputs a uniform random bit. Otherwise, it must be that outputs a solution of size three, and . As observed below, this solution must contain the interval . Then, if is located at the position then Bob outputs the bit , otherwise the interval must be located at position and Bob outputs the bit .
Protocol works as follows. Given a uniform random input to , Alice and Bob first create a random interval instance that is fully described by uniform random bits as well as the index from the input: Each bit corresponds to an interval so that the intervals all mutually overlap as in Figure 1. The role of the bit is as follows: If then the interval is shifted slightly to the right, while this does not happen if . Then, the index translates into two additional intervals and that surround the interval so that constitutes the unique independent set of size .
We observe that the instance created so far is a uniform random instance since all the variables are independent uniform random variables.
Next, Alice and Bob choose a uniform random ordering of the intervals in this instance. We stress that, by construction, is independent of the intervals created, i.e., of the quantities as well as .
Alice now replaces some of the bits by the values as follows: If in the uniform random ordering the interval arrives before both and , then is updated to the value . Distributionally speaking, we observe that this process replaces uniform random bits by different uniform random bits (recall that the input to the instance is a uniform random instance). Hence, the distribution of the resulting interval instance is a uniform distribution, and, most importantly, the order of the intervals is random and independent of the instance. The reduction thus creates a random order stream.
We observe that Alice only updates intervals that arrive before the intervals and , or, in other words, the decision as to which intervals are updated depends on the stream order. One may falsely conclude that the resulting instance may therefore be correlated with the arrival order , which would mean that the resulting stream is not in uniform random order. We stress, however, that this is not the case since this process only replaces uniform random bits by a different set of uniform random bits. It is therefore not possible to establish a correlation.
Next, we see that whenever Alice encoded the correct bit into interval , which, as we will see, happens with probability , and if, at the same time, the algorithm succeeds, then Alice and Bob are able to solve the underlying instance. Otherwise, Bob outputs a uniform random bit. We will see that if succeeds with probability greater than then the protocol has a success probability strictly larger than . The lower bound for as stated in Theorem 4 then implies that must communicate bits, and thus, algorithm must also use at least as much space.
Theorem 2. [Restated, see original statement.]
Let be any small constant. Then:
-
1.
Any randomized one-pass streaming algorithm for random-order Unit Interval Selection with expected approximation factor must use bits of space.
-
2.
For any , any randomized one-pass streaming algorithm for random-order Unit Interval Selection with approximation factor that succeeds with probability at least on any input instance must use bits of space.
The expectation in (1) and the probability in (2) are taken over both the stream order and the random choices of the algorithm.
Proof.
Let be a randomized algorithm for Unit Interval Selection with approximation factor , for any small constant , that succeeds with probability at least , for some , on every input instance, where the probability is taken over both the stream order and the random coin flips of the algorithm. We will first prove that must use space, i.e., we establish result 2. Then, we argue that result 2 immediately implies result 1.
To see that result 2 holds, we first observe that, whenever algorithm succeeds then it must output a solution of size since its approximation factor is strictly larger than and every instance created contains an optimal solution of size . Second, as argued above, no matter the input instance created, it is presented to in uniform random order, which implies that the success probability of the algorithm is indeed . Third, we see that, with probability , the interval is created by Alice and thus correctly reflects the bit . We are thus interested in the probability that both reflects and the algorithm succeeds on the resulting instance.
Using the events “Alice creates ” and “ succeeds”, we obtain:
| (4) |
Next, we use the fact that the success probability of is at least . Hence,
which implies Using this bound in Equality 4 yields . In other words, with probability , Bob solves the underlying instance.
Lastly, in all other cases, i.e., with probability , Bob outputs a uniform random bit. Hence, the protocol solves the instance with probability . Then, by Theorem 4, the resulting protocol must use a message of size bits, and since the message size is identical to the space used by our algorithm, the same lower bound applies to the space usage of . Last, since in our construction, result 2 follows.
To see that result 1 holds, consider an algorithm for Unit Interval Selection with expected approximation factor , for some , where the expectation is taken over the random coin flips of the algorithm as well as the stream order. We will now argue that achieves an approximation factor of at least with probability . Then, by result 2 we obtain that must use bits of space, thus establishing result 1.
We compute:
which implies as desired.
5 Conclusion
In this paper, we gave a one-pass random order streaming algorithm for Unit Interval Selection with expected approximation factor that uses space , where the expectation is taken over the order of the stream. We also proved that, within this space constraint, no algorithm can achieve an expected approximation factor beyond , and no algorithm can have an approximation factor better than with probability above .
We conclude with two open questions.
-
1.
Our results are not tight. Is there either an algorithm with improved expected approximation guarantee or is it possible to prove a stronger lower bound?
-
2.
We have not addressed arbitrary-length intervals. Is there an algorithm with expected approximation factor above for Interval Selection on arbitrary-length intervals?
References
- [1] Cezar-Mihail Alexandru and Christian Konrad. Interval Selection in Sliding Windows. In Timothy Chan, Johannes Fischer, John Iacono, and Grzegorz Herman, editors, 32nd Annual European Symposium on Algorithms (ESA 2024), volume 308 of Leibniz International Proceedings in Informatics (LIPIcs), pages 8:1–8:17, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ESA.2024.8.
- [2] Sepehr Assadi and Soheil Behnezhad. Beating two-thirds for random-order streaming matching. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021, July 12-16, 2021, Glasgow, Scotland (Virtual Conference), volume 198 of LIPIcs, pages 19:1–19:13. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.ICALP.2021.19.
- [3] Sepehr Assadi and Janani Sundaresan. (noisy) gap cycle counting strikes back: Random order streaming lower bounds for connected components and beyond. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 183–195, 2023. doi:10.1145/3564246.3585192.
- [4] Ainesh Bakshi, Nadiia Chepurko, and David P. Woodruff. Weighted Maximum Independent Set of Geometric Objects in Turnstile Streams. In Jarosław Byrka and Raghu Meka, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020), volume 176 of Leibniz International Proceedings in Informatics (LIPIcs), pages 64:1–64:22, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.APPROX/RANDOM.2020.64.
- [5] Aaron Bernstein. Improved bounds for matching in random-order streams. Theor. Comp. Sys., 68(4):758–772, December 2023. doi:10.1007/s00224-023-10155-7.
- [6] Sujoy Bhore, Fabian Klute, and Jelle J. Oostveen. On streaming algorithms for geometric independent set and clique. In Parinya Chalermsook and Bundit Laekhanukit, editors, Approximation and Online Algorithms - 20th International Workshop, WAOA 2022, Potsdam, Germany, September 8-9, 2022, Proceedings, volume 13538 of Lecture Notes in Computer Science, pages 211–224. Springer, 2022. doi:10.1007/978-3-031-18367-6_11.
- [7] Sergio Cabello and Pablo Pérez-Lantero. Interval selection in the streaming model. Theor. Comput. Sci., 702:77–96, 2017. doi:10.1016/J.TCS.2017.08.015.
- [8] Amit Chakrabarti, Graham Cormode, and Andrew McGregor. Robust lower bounds for communication and stream computation. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC ’08, pages 641–650, New York, NY, USA, 2008. Association for Computing Machinery. doi:10.1145/1374376.1374470.
- [9] Amit Chakrabarti, T. S. Jayram, and Mihai Pundefinedtraşcu. Tight lower bounds for selection in randomly ordered streams. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’08, pages 720–729, USA, 2008. Society for Industrial and Applied Mathematics.
- [10] Graham Cormode, Jacques Dark, and Christian Konrad. Independent sets in vertex-arrival streams. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 45:1–45:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.ICALP.2019.45.
- [11] Jacques Dark, Adithya Diddapur, and Christian Konrad. Interval selection in data streams: Weighted intervals and the insertion-deletion setting. In Patricia Bouyer and Srikanth Srinivasan, editors, 43rd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2023, December 18-20, 2023, IIIT Hyderabad, Telangana, India, volume 284 of LIPIcs, pages 24:1–24:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.FSTTCS.2023.24.
- [12] Yuval Emek, Magnús M. Halldórsson, and Adi Rosén. Space-constrained interval selection. ACM Trans. Algorithms, 12(4), September 2016. doi:10.1145/2886102.
- [13] Sudipto Guha and Andrew McGregor. Stream order and order statistics: Quantile estimation in random-order streams. SIAM Journal on Computing, 38(5):2044–2059, 2009. doi:10.1137/07069328X.
- [14] Dorit S. Hochbaum and Wolfgang Maass. Approximation schemes for covering and packing problems in image processing and vlsi. J. ACM, 32(1):130–136, January 1985. doi:10.1145/2455.214106.
- [15] Michael Kapralov. Better bounds for matchings in the streaming model. In Sanjeev Khanna, editor, Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 1679–1697. SIAM, 2013. doi:10.1137/1.9781611973105.121.
- [16] Sanjeev Khanna, Christian Konrad, and Cezar-Mihail Alexandru. Set cover in the one-pass edge-arrival streaming model. In Floris Geerts, Hung Q. Ngo, and Stavros Sintos, editors, Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2023, Seattle, WA, USA, June 18-23, 2023, pages 127–139. ACM, 2023. doi:10.1145/3584372.3588678.
- [17] Morteza Monemizadeh, S. Muthukrishnan, Pan Peng, and Christian Sohler. Testable Bounded Degree Graph Properties Are Random Order Streamable. In Ioannis Chatzigiannakis, Piotr Indyk, Fabian Kuhn, and Anca Muscholl, editors, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017), volume 80 of Leibniz International Proceedings in Informatics (LIPIcs), pages 131:1–131:14, Dagstuhl, Germany, 2017. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ICALP.2017.131.
- [18] Anup Rao and Amir Yehudayoff. Communication Complexity: and Applications. Cambridge University Press, 2020.
- [19] Andrew Chi-Chih Yao. Some complexity questions related to distributive computing (preliminary report). In Proceedings of the eleventh annual ACM symposium on Theory of computing, pages 209–213, 1979. doi:10.1145/800135.804414.
