
IID Prophet Inequality with Random Horizon: Going Beyond Increasing Hazard Rates

Giordano Giambartolomei Department of Informatics, King’s College London, UK Frederik Mallmann-Trenn Department of Informatics, King’s College London, UK Raimundo Saona Institute of Science and Technology Austria, Klosterneuburg, Austria
Abstract

Prophet inequalities are a central object of study in optimal stopping theory. In the iid model, a gambler sees values in an online fashion, sampled independently from a given distribution. Upon observing each value, the gambler either accepts it as a reward, or irrevocably rejects it and proceeds to observe the next value. The goal of the gambler, who cannot see the future, is to maximise the expected value of the reward while competing against the expectation of a prophet (the offline maximum). In other words, one seeks to maximise the gambler-to-prophet ratio of the expectations.

This model has been studied with infinite, finite and unknown number of values. When the gambler faces a random number of values, the model is said to have a random horizon. We consider the model in which the gambler is given a priori knowledge of the horizon’s distribution. Alijani et al. (2020) designed a single-threshold algorithm achieving a ratio of 1/2 when the random horizon has an increasing hazard rate and is independent of the values. We prove that with a single threshold, a ratio of 1/2 is actually achievable for several larger classes of horizon distributions, with the largest being known as the 𝒢 class in reliability theory. Moreover, we show that this does not extend to its dual, the 𝒢¯ class (which includes the decreasing hazard rate class), while it can be extended to low-variance horizons. Finally, we construct the first example of a family of horizons, for which multiple thresholds are necessary to achieve a nonzero ratio. We establish that the Secretary Problem optimal stopping rule provides one such algorithm, paving the way towards the study of the model beyond single-threshold algorithms.

Keywords and phrases:
Online algorithms, Prophet Inequality, Random Horizon, Secretary Problem
Category:
Track A: Algorithms, Complexity and Games
Funding:
Giordano Giambartolomei: EPSRC grants EP/W005573/1 and EP/X021696/1.
Frederik Mallmann-Trenn: EPSRC grant EP/W005573/1.
Raimundo Saona: ERC grant CoG 863818 (ForM-SMArt), ANID Chile grant ACT210005, French Agence Nationale de la Recherche (ANR) grant ANR-21-CE40-0020 (CONVERGENCE), and Austrian Science Fund (FWF) grant 10.55776/COE12.
Copyright and License:
© Giordano Giambartolomei, Frederik Mallmann-Trenn, and Raimundo Saona; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms
Related Version:
Full Version: https://arxiv.org/abs/2407.11752
Acknowledgements:
We would like to thank José Correa for his valuable advice, and Bruno Ziliotto and Vasilis Livanos for early conversations.
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction

Prophet inequalities are a central object of study in optimal stopping theory. A gambler sees nonnegative values in an online fashion, sampled from an instance of independent random variables {Xi} with known distributions {𝒟i}, in adversarial, random or selected order, depending on the particular model. When observing each value, the gambler either accepts it as a reward, or irrevocably rejects it and proceeds with observing the next value. The goal of the gambler, who cannot see the future, is to maximise the expected value of the reward while competing against the expectation of a prophet (out of metaphor, the offline maximum or supremum, depending on whether the instance is finite or not). In other words, one seeks to maximise the gambler-to-prophet ratio of the expectations.

The gambler represents any online decision maker, such as an algorithm or stopping rule. Probabilistically, we will refer to it as a stopping time τ. Informally, the term online implies that the gambler, unable to see the future, will always stop at a time τ such that the event {τ=i} depends only on the first i values observed.

1.1 Prophet inequality models

Several models and extensions of prophet inequalities are present in the literature. We introduce the variants upon which our model is built, briefly reviewing the state-of-the-art.

The very first prophet inequality model, typically referred to as the classical Prophet Inequality (PI), is due to Krengel and Sucheston [25, 26]. The given instance is composed of countably many integrable independent nonnegative random variables {X_i} with known distributions {𝒟_i} in a fixed given order, usually referred to as adversarial. The gambler-to-prophet ratio to be maximised is therefore 𝔼X_τ over 𝔼 sup_i X_i. When working with infinite instances, one often considers only finite stopping rules (ℙ(τ < ∞) = 1) and variables such that 𝔼 sup_i X_i < ∞. The 1/2-hardness of PI (shown by Garling [25]) is tight [26]. A single-threshold 2-approximation is also possible [35].

IID PI specialises PI to {X_i} independent and identically distributed (iid) according to a given distribution 𝒟. The hardness of this problem is 1/β ≈ 0.7451, where β is derived as the solution to an integral equation [21]. The β-approximation quantile strategy devised in [10] makes the aforementioned hardness tight.

IID Prophet Inequality with Random Horizon

The IID PI with Random Horizon (RH) was first introduced in [17]. Consider a random variable H, which we will call the (random) horizon, with given discrete distribution ℋ, that is, supported on an arbitrary subset of ℕ. This assumption comes with no loss of generality (formally, it rules out that the horizon has mass at zero). H will be assumed finite (ℙ(H < ∞) = 1) and integrable (𝔼H < ∞), which we denote H ∈ ℒ¹(Ω). RH is a relaxation of IID PI, considering integrable iid random variables X_1, …, X_H, with given distribution 𝒟 independent of ℋ. This setup, supported on a probability space (Ω, ℱ, ℙ), models the game where the gambler, facing an unknown number of values X_1, …, X_H, can still use a priori knowledge of ℋ when maximising returns. More precisely, denote [n] ≜ {1, …, n}. The goal is to maximise the gambler-to-prophet ratio of 𝔼X_τ over 𝔼M_H, where M_H ≜ max_{i∈[H]} X_i and both expectations run over the randomness of the iid copies of X ∼ 𝒟 and of H. The gambler must select a value in ignorance of whether it is the last or not. If the gambler fails to stop by the time the last value is inspected, the return is zero. This ignorance, added to the usual nonanticipative constraint, yields a model no easier than IID PI. On top of the usual disadvantage of playing against a prophet, who knows all future realisations of the values and chooses the largest, the gambler now competes against a prophet who can also foresee the random number of values.

1.2 Previous bounds on IID Prophet Inequality with Random Horizon

No constant-approximations are possible for RH [2, Theorem 1.6] (we refer to this by saying that RH is hard). Distributional knowledge of the horizon is not enough to guarantee, in expectation, that the gambler’s return can attain a worst-case constant multiple of the prophet’s return. Therefore, we impose additional restrictions on the distribution of the horizon. The largest distributional class for which a constant-approximation has been found so far is that of horizons with increasing hazard rate λ(h). For simplicity, we will use increasing and decreasing instead of nondecreasing and nonincreasing, respectively. Recall that the hazard rate λ(h) of a horizon H is defined as follows. Denote the survival function S(h) ≜ ℙ(H ≥ h). If |supp(H)| = ∞, then for every h ∈ ℕ, λ(h) ≜ ℙ(H = h)/S(h). If |supp(H)| < ∞, then λ(h) is defined analogously for all h ≤ sup supp(H), while it is set to 1 for all h > sup supp(H).

Definition 1 (IHR and DHR class).

H is Increasing Hazard Rate (IHR) if for every h ∈ ℕ, λ(h) ≤ λ(h + 1). H is Decreasing Hazard Rate (DHR) if for every h ≥ inf supp(H), λ(h) ≥ λ(h + 1).

One says that IHR and DHR are dual classes, and the geometric distribution is the only discrete distribution that belongs to both of them.
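These definitions are straightforward to check computationally. The following sketch (ours, not from the paper) computes discrete hazard rates from a pmf, verifies that the uniform distribution on {1, 2, 3} is IHR, and confirms the constant hazard rate of the geometric distribution, which is why it sits in both classes:

```python
import math

def hazard_rates(pmf):
    """lambda(h) = P(H = h) / P(H >= h) for a horizon with finite support {1..n}."""
    rates, tail = [], sum(pmf)
    for p in pmf:
        rates.append(p / tail)  # tail currently equals P(H >= h)
        tail -= p
    return rates

# A finite-support example: H uniform on {1, 2, 3} has increasing hazard rate.
rates = hazard_rates([1 / 3, 1 / 3, 1 / 3])
assert all(a <= b + 1e-12 for a, b in zip(rates, rates[1:]))  # IHR

# Geometric(q) on {1, 2, ...}: S(h) = (1-q)**(h-1), hence
# lambda(h) = q*(1-q)**(h-1) / (1-q)**(h-1) = q, constant in h.
q = 0.3
lam = [q * (1 - q) ** (h - 1) / (1 - q) ** (h - 1) for h in range(1, 20)]
assert all(math.isclose(x, q) for x in lam)  # both IHR and DHR
```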

The state-of-the-art for RH consists in the findings of [2]. In [2, Theorem 3.2], through the use of second-order stochastic dominance, it is shown that if H ∈ IHR, then 𝔼X_τ ≥ (2 − 1/μ)^{−1} 𝔼M_H, where μ ≜ 𝔼H ≥ 1. This ensures a (uniform) 2-approximation on the IHR class. Studying monotone hazard rates stems from interpreting RH from a reliability theory point of view: an optimal stopping game played under an evolving risk (or ageing), namely the risk that the game ends by the next step. Despite this natural interpretation, and although more than 15 years have passed, no progress has been made on classes larger than the IHR class.

1.3 Our contributions

In this paper we contribute to the study of RH by:

  • Extending existing characterisations of the optimal algorithm to unbounded horizons.

  • Complementing and extending key ideas from [2] to yield the existence of single-threshold 2-approximations on important superclasses of the IHR class, culminating in the 𝒢 class.

  • Showing that the 𝒢¯ class (dual of the 𝒢 class) admits hard horizons for single-threshold algorithms.

  • Deriving single-threshold constant-approximations for sufficiently concentrated horizons with finite second moments.

  • Showing that RH admits families of horizons for which single-threshold algorithms are not competitive, despite the Secretary Problem (SP) optimal stopping rule providing a constant-approximation (see Section 1.5 for details on SP and its variants).

In this section we give formal statements of the above results, with the exception of Theorem 14 in Section 2, due to the overall degree of technicalities involved. Informally speaking, this theorem characterises the optimal algorithm in terms of a discounted infinite optimal stopping problem, with a special focus on unbounded horizons. As a technical tool, the equivalence of stopping rules for RH with stopping rules for the discounted problem is leveraged not only to obtain hardness results for single-threshold algorithms, but also to formalise heuristic arguments involving the optimal algorithm under geometric horizon. More specifically: we exploit the equivalence between stopping rules for RH and corresponding rules for the discounted problem to extend hardness results to neighbouring instances; we show that with a geometric horizon, if the value distribution is bounded, the optimal algorithm of RH is single-threshold, pinpointing the structural equation for the threshold.

The next three results involve stochastic orders, which we now introduce. Note that several standard facts regarding stochastic orders are often relied upon throughout this work. They will be recalled as remarks, whose proof can always be found in [39]. In the following definitions, H and G always refer to horizons.

Definition 2 (Probability Generating Function Order).

Given two horizons H, G, we say that G is dominated by H in the probability generating function (pgf) order, and denote it as G ≼_pgf H, if for every t ∈ (0, 1), 𝔼(t^G) ≥ 𝔼(t^H).

 Remark 3.

G ≼_pgf H if and only if, for all t ∈ (0, 1), Σ_{i=1}^∞ S_G(i) t^i ≤ Σ_{i=1}^∞ S_H(i) t^i.

The 𝒢 class, consisting of every positive discrete distribution that dominates, in the pgf order, the (ℕ-valued) geometric distribution with the same mean, was introduced in [24].

Definition 4 (𝒢 class).

The 𝒢 class consists of every horizon that dominates, in the pgf order, the geometric distribution with the same mean: 𝒢 ≜ {H : G ≼_pgf H, G ∼ Geom(1/𝔼H)}.

The dual class is obtained by reversing the pgf ordering, and is denoted 𝒢¯. It is well known that IHR ⊂ 𝒢, with the inclusion being strict. Similarly, DHR ⊂ 𝒢¯. In applications of reliability theory, the classes within 𝒢 and its dual are a staple of modelling many aspects of ageing and, more generally, of an evolving risk of failure.
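Membership in the 𝒢 class can be probed numerically straight from Definition 4. In the sketch below (ours), a deterministic horizon H = m, which is IHR and hence should belong to 𝒢, is tested against Geom(1/m) on a grid of t; the inequality 𝔼(t^G) ≥ t^m is in fact guaranteed by Jensen's inequality, since t^h is convex in h:

```python
def pgf_geometric(t, p):
    """E[t^G] for G ~ Geom(p) supported on {1, 2, ...}."""
    return p * t / (1 - (1 - p) * t)

def pgf_finite(t, pmf):
    """E[t^H] for H with P(H = h) = pmf[h-1], h = 1..len(pmf)."""
    return sum(ph * t ** h for h, ph in enumerate(pmf, start=1))

# Deterministic horizon H = m (an IHR distribution) against Geom(1/m):
m = 5
pmf = [0.0] * (m - 1) + [1.0]
ts = [k / 100 for k in range(1, 100)]
# G dominated by H in the pgf order: E[t^G] >= E[t^H] on (0, 1), so H is in G.
assert all(pgf_geometric(t, 1 / m) >= pgf_finite(t, pmf) - 1e-12 for t in ts)
```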

Theorem 5.

For all H ∈ 𝒢 with μ ≜ 𝔼H, a single-threshold (2 − 1/μ)-approximation exists.

From [2, Theorem 3.5] it follows that any 2-approximation on the 𝒢 class is tight. More specifically, the 2-approximation yielding Theorem 5 is essentially the same as the one used on the IHR class in [2, Theorem 3.1]. The key feature of the stopping rule is to determine the value p such that ℙ(X > p) = 1/μ, and then to accept the first value exceeding p. The 𝒢 class may seem somewhat abstract, compared to its subclasses, which have immediate intuitive interpretations in terms of ageing concepts, such as the IHR class. Its possible interpretations are quite general and go beyond mere ageing, as can be read in [24]. To gain some intuition of how much more general a class it is, the reader is referred to Section 3, where we describe some of its largest and most well-known subclasses, as evidence that Theorem 5 generalises [2, Theorem 3.1, Theorem 3.2] significantly. We conclude by mentioning that statistical tests for the continuous analogue of the 𝒢 class have also been developed [23]. The possibility of adapting them to tests for the 𝒢 class adds to the practical significance of the result.
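The threshold rule of Theorem 5 is easy to simulate. The Monte Carlo sketch below is ours; Exp(1) values and a geometric horizon (which belongs to 𝒢) are illustrative choices, and the threshold solves ℙ(X > p) = 1/μ, that is p = ln μ for Exp(1) values:

```python
import math
import random

random.seed(0)

def simulate(mu=5.0, trials=100_000):
    """Single-threshold rule: accept the first value exceeding p, P(X > p) = 1/mu.
    Toy instance: values ~ Exp(1), horizon ~ Geom(1/mu) (a G-class horizon)."""
    p = math.log(mu)          # P(Exp(1) > p) = exp(-p) = 1/mu
    q = 1.0 / mu              # parameter of the geometric horizon
    gambler = prophet = 0.0
    for _ in range(trials):
        h = 1
        while random.random() > q:   # sample H ~ Geom(q) on {1, 2, ...}
            h += 1
        xs = [random.expovariate(1.0) for _ in range(h)]
        prophet += max(xs)                            # prophet's return M_H
        gambler += next((x for x in xs if x > p), 0.0)  # first value above p, else 0
    return gambler / prophet

ratio = simulate()
# Theorem 5 guarantees a ratio of at least 1/(2 - 1/mu) = 5/9 for mu = 5,
# so the simulated ratio should comfortably clear the 1/2 mark.
assert 0.5 < ratio < 1.0
```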

Our next result requires introducing the stopping rule for SP with deterministic horizon m. This consists of rejecting the first r − 1 values, and subsequently accepting the first value ranked better than all the previous ones. The waiting time r_m ∼ m/e as m → ∞ yields the optimal stopping rule (see Section 1.5 for details). In the following, we provide the first example of a family of horizons for which no single-threshold constant-approximation is possible, and yet an adaptation of the SP stopping rule to random horizons provably yields a constant-approximation. The family of horizons we will consider is of the form {H_m}_{m ≥ M}, with pmfs parametrised by ε > 0, M large enough and m ≜ sup supp(H_m) < ∞, and will therefore be denoted M(ε). Let ζ_{r_m} be the (optimal) stopping rule for SP with deterministic horizon m. Taking the minimum τ_m ≜ ζ_{r_m} ∧ (H_m + 1) produces a natural adaptation of the SP stopping rule to the random horizon H_m.
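The adapted rule is simple to implement. The sketch below is ours: the uniform horizon and uniform values are toy choices for illustration only, not the Pareto construction used in our hard family; it estimates how often the horizon-truncated secretary rule still selects the overall maximum:

```python
import math
import random

random.seed(1)

def sp_rule(values, r):
    """Secretary rule: skip the first r - 1 values, then stop at the first value
    beating everything seen before it; returns None if it never stops."""
    best_seen = max(values[:r - 1], default=-math.inf)
    for i in range(r - 1, len(values)):
        if values[i] > best_seen:
            return i
    return None

def success_probability(m=200, trials=20_000):
    """P(the adapted rule picks the overall maximum) under a toy random horizon,
    uniform on {m//2, ..., m} (our illustrative choice)."""
    r = max(1, round(m / math.e))    # waiting time roughly m / e
    hits = 0
    for _ in range(trials):
        h = random.randint(m // 2, m)
        xs = [random.random() for _ in range(h)]
        stop = sp_rule(xs, r)        # truncation at the horizon is implicit:
        if stop is not None and xs[stop] == max(xs):  # stopping past H pays zero
            hits += 1
    return hits / trials

prob = success_probability()
assert prob > 0.2   # a constant fraction of successes, in the spirit of Theorem 6
```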

Theorem 6.

There exists a family of horizons M(ε) such that, for all fixed ε small enough and arbitrary M, no single-threshold constant-approximation is possible. Nonetheless, for every fixed ε, no matter how small, the random-horizon SP rule τ_m is an approximate constant-approximation, as long as M = M(ε) is fixed large enough.

For brevity, we say that the family of horizons M(ε) is hard for single-threshold algorithms, but the random horizon SP rule is competitive on it. The SP stopping rule only needs information regarding the relative ranks of the values, not the values themselves. Our result opens up a fresh line of investigation into new classes of distributions, to be handled via multiple threshold algorithms, which cannot be accessed via single-threshold algorithms.

Since we showed that single-threshold 2-approximations exist for the 𝒢 class, finding whether single-threshold constant-approximations exist for the 𝒢¯ class turns into an interesting problem. The equivalent characterisation of stopping rules for RH in terms of a discounted stopping problem, combined with a perturbation argument on M(ε), yields a negative answer.

Theorem 7.

No single-threshold constant-approximation is possible on the 𝒢¯ class.

In Section 4.2 we will describe in more detail some of the largest and most well-known subclasses of the 𝒢¯ class. They are no less important than the corresponding subclasses of the 𝒢 class. For example, discrete DHR distributions are crucial in several areas: they have been shown to describe well the number of seasons a show is run before it gets cancelled, and the number of periods until failure of a device (a system, or a component, that functions until first failure) governed by a continuous DHR life distribution in the grouped-data case [27]. Theorem 7 motivates our conjecture that single-threshold 2-approximations are not possible on the DHR class either. Since our characterisation of stopping rules for RH extends to unbounded horizons, it will be a crucial tool in proving this conjecture within our framework, as DHR horizons have support of the form [a, ∞). Furthermore, suitably combining the techniques yielding Theorems 5 and 7 with the fact that the tight geometric instance for 2-approximations belongs to the 𝒢 class motivates our second conjecture that the 𝒢 class is essentially optimal for 2-approximations on ℒ¹ horizons.

In our last result, we go back to single-threshold algorithms under horizons H satisfying only 𝔼(H²) < ∞, denoted H ∈ ℒ²(Ω). We show that it is possible to exploit concentration bounds in order to ensure that the same single-threshold algorithm used for the 𝒢 and 𝒢¯ classes is still a 2-approximation. In the following result, W₀ denotes the principal branch of the Lambert function.

Theorem 8.

For every H having both μ ≜ 𝔼H and σ² ≜ Var H finite, such that

σ² ≤ μ² [1 − 1/μ + W₀(−(2 − 1/μ) e^{−(2 − 1/μ)})], (1)

a single-threshold (2 − 1/μ)-approximation exists.

The Lambert function is readily evaluated, and one can reliably compute numerically the magnitude of concentration required by (1), so that a 2-approximation is ensured. To exemplify, we consider the large-market limit, that is, μ → ∞. Under this computationally simpler hypothesis, taking the square root on both sides of (1) yields approximately CV ≜ σ/μ ≤ √(1 + W₀(−2e^{−2})) ≈ 0.770, where CV is the coefficient of variation of the horizon. Variations on this procedure show that both weaker and stronger than 2 constant-approximations are possible for all suitably concentrated horizons. Consider again the upper bound on the CV provided by square-rooting (1) in the large-market limit: CV ≤ √(1 + W₀(−2e^{−2})). Replacing 2 with C > 2 yields CV ≤ √(C − 1 + W₀(−Ce^{−C})). This ensures a weaker C-approximation in the large-market limit. As x ≥ 1 grows, W₀(−xe^{−x}) is negative, strictly increasing (valued −1 at x = 1) and vanishing. Thus the upper bound on the CV ensuring a C-approximation for C growing large scales roughly as √(C − 1), meaning that there will be, although progressively weaker, constant-approximations, as long as CV < √(C − 1). In the other direction, stronger C-approximations are ensured when taking admissible 1 < C < 2. When reducing C, we clearly cannot go past e/(e − 1), as the upper bound on the CV vanishes (approaching a deterministic horizon). At this point, our estimates hit the optimum competitive ratio for single-threshold algorithms on IID PI, 1 − 1/e [13]. This generalises as follows.
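To make the bound concrete, the sketch below (ours) evaluates W₀ by Newton's method (where available, `scipy.special.lambertw` could be used instead) and computes the large-market CV thresholds discussed above:

```python
import math

def lambert_w0(x, iters=60):
    """Principal branch W0: solve w * exp(w) = x by Newton's method, x > -1/e.
    Starting from w = 0, the iteration converges monotonically (f is convex)."""
    w = 0.0
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - x) / ((w + 1.0) * ew)
    return w

def cv_bound(c):
    """Large-market upper bound on CV = sigma/mu ensuring a C-approximation:
    sqrt(C - 1 + W0(-C * exp(-C)))."""
    return math.sqrt(max(0.0, c - 1.0 + lambert_w0(-c * math.exp(-c))))

# C = 2 recovers the 2-approximation threshold CV <= 0.770 (approximately):
assert abs(cv_bound(2.0) - 0.770) < 0.005
# The bound vanishes as C decreases to e/(e - 1), the single-threshold optimum:
assert cv_bound(math.e / (math.e - 1.0)) < 0.02
# ...and it grows roughly like sqrt(C - 1) for large C:
assert cv_bound(10.0) > 2.9
```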

Corollary 9.

For every H having both μ ≜ 𝔼H and σ² ≜ Var H finite, and every constant C ≥ [1 − (1 − 1/μ)^μ]^{−1}, if CV ≤ √(C − 1 + W₀(−Ce^{−C})), then a single-threshold C-approximation exists.

The impact of our results concerns online auctions and, more generally, stochastic matching problems, where random horizons are considered and the connection to our model is well-established under perishable inventory assumptions (see Section 1.5).

1.4 Our techniques

The toolbox we assembled can be described in three key points.

Discounted infinite problem.

The optimal algorithm is characterised via a correspondence between stopping rules for RH and stopping rules for a discounted optimal stopping problem, with the discount factors given by the survival function of the horizon. More general optimal stopping problems (with unbounded random horizon) do not always admit an optimal algorithm [37]. We show that for RH an optimal algorithm always exists. This is crucial for geometric and DHR horizons, which are unbounded. The generalisation is the product of an original measure-theoretic construction, leveraging trace σ-algebras.

Stochastic orders.

The extension of the 2-approximation from the IHR class to the 𝒢 class and to ℒ² horizons is obtained by exploiting stochastic orders, which had not previously been applied to prophet inequality problems. Several distributional classes, starting from the IHR class, can be equivalently defined through some stochastic ordering with respect to the geometric distribution (as previously seen, the largest of such classes exploits the pgf order). Thus our Theorem 5 not only extends to the 𝒢 class, but also simplifies the arguments of [2]. The proof of Theorem 8 leverages extremal properties of specific Bernoulli distributions with respect to another stochastic order, the Laplace transform order (see Section 5 for details), among distributions having finite variance.

Hard instance.

Our analysis of the hard instance for single-threshold algorithms is also novel. The literature on prophet inequalities has focused only on bounded discrete instances that are hard for every algorithm. On the other hand, the instance we provide exploits a continuous Pareto distribution which, combined with a suitable family of horizons, is both hard for single-threshold algorithms and competitive for the SP stopping rule against the offline maximum. This requires a more sophisticated and analytic approach. The phenomenon in which certain algorithms fail on an instance but others do not is subtle. To the best of our knowledge, a competitive analysis of the SP stopping rule against the offline maximum with a random horizon is also new (see Section 1.5). It is worth noting that this hard instance is designed to live in a close neighbourhood of the 𝒢¯ class, so that perturbing it suitably yields, at the same time, a hard family in the 𝒢¯ class.

1.5 Additional related work and motivation

The literature on prophet inequality models with deterministic horizon is vast (see [11, 12, 29, 19] for surveys). We only mention a few models closely related to PI and IID PI for context. In the Random Order PI, values arrive in a uniform random order; it is 0.688-competitive [7] and 0.7235-hard [15]. In the Order Selection PI, the order can be chosen by the gambler; it is 0.7258-competitive [6] and inherits 1/β-hardness from IID PI. Very recently, Oracle-augmented PIs have also been introduced [18]. The existing literature strictly related to RH is limited to [2, 17, 30], which focus on online automated mechanism design (see [38] for a survey), with emphasis on the multiple-items setting. In [2] standard machinery has been put in place, which extends results for single-item RH to multiple items, provided the horizons for each item are uniformly bounded. This applies to our results, too. [30] considers the problem of unknown market size (that is, random horizon with unknown distribution), motivated by search keyword auctions such as Google's. As previously mentioned, this assumption is hard [17]. In practice, distributional knowledge is available via the history of past transactions, from which RH arises as a natural model. On a technical level, RH is also well recognised as connected to other online stochastic matching problems with a random number of queries, which have recently been shown to admit 2-approximations [3]. A potential ground for applications of our techniques would be improving such guarantees.

The Secretary Problem (SP) has been extensively studied (see [16, 14] for variations). The classical SP is an online problem where there are n secretaries for hire, who show up in a uniform random order at interviews. An employer, who wants to maximise the probability of hiring the best based solely on the relative ranks seen so far, decides at each interview whether to hire, or to reject and keep interviewing. The optimal stopping rule rejects the first r − 1 of the n secretaries, and accepts the first secretary with relative rank 1 thereafter. The value r = r_n is such that, as n → ∞, both r/n and the probability of selecting the best tend to 1/e ≈ 0.3679 [28].

When the number of secretaries is random, denoted by N, the secretaries are interviewed in a uniform random order, conditionally on N. Depending on the horizon’s distribution, the optimal policy can involve a more complicated set Γ composed of multiple islands, inside which it is optimal to stop at a secretary with relative rank 1, and outside which it is not [33]. The model has been shown to be hard [1].

SP has also been studied in its full-information variant, with and without a random horizon. The common continuous distribution, from which the quality measurement of each of the n secretaries is sampled independently, is known to the employer, who directly inspects, at each interview, the secretary’s true quality [16, 4, 32]. Both the no- and full-information SP have also been studied under the harder hypothesis of a random freeze [36, 37]. More recent developments extend SP to combinatorial structures and unknown horizons [20, 31].

1.6 Organisation of the paper

In Section 2 we give preliminaries on the formal approach to random-horizon optimal stopping problems, and prove the equivalence with a discounted optimal stopping problem. In Section 3 we prove Theorem 5. In Section 4 we prove Theorem 6. In Section 4.2 we prove Theorem 7. In Section 5 we prove Theorem 8.

2 Preliminaries

In this section we set up our notation, give the details of the probabilistic model, and characterise the optimal algorithm.

2.1 Notation

We denote ℕ₀ ≜ ℕ ∪ {0}, ℕ̄ ≜ ℕ ∪ {∞}, ℝ̄ ≜ ℝ ∪ {−∞, ∞}, [n] ≜ {1, 2, …, n}, [0] ≜ {0}, [n]₀ ≜ {0} ∪ [n], a ∧ b ≜ min(a, b) and a ∨ b ≜ max(a, b) for all a, b ∈ ℝ̄. We follow standard asymptotic notation for nonnegative functions (and sequences, with trivial extensions to signed functions when the use of absolute value is consistent); here we point out only the slightly less common f(x) ≍ g(x) as x → a ∈ ℝ̄, meaning f(x) = 𝒪(g(x)) and f(x) = Ω(g(x)). We adopt the convention of omitting the dependence on all parameters not involved in the main limiting process (which will be clear from the context), regardless of whether uniformity (when not relevant to the context) holds within the parametric range. Absolutely continuous distributions will be referred to as continuous. Discontinuous distributions are assumed to satisfy the usual regularity assumption (isolated jumps). Convergence in distribution of random variables is denoted as weak convergence: →_w. Almost surely is abbreviated a.s., and H ∼ G denotes that H, G are identically distributed random variables. The essential supremum of a collection of random variables is denoted esssup. S(h) ≜ ℙ(H ≥ h) is called the survival function of H (written S_H(h) if necessary). We follow the standard probabilistic functional notation for powers, such as ℙ²(⋅) ≜ [ℙ(⋅)]² and 𝔼²(⋅) ≜ [𝔼(⋅)]².

2.2 Probabilistic model

As aforementioned, in this section (and this section alone) RH is studied from the point of view of optimal stopping, hence horizons admitting mass at zero will also be considered. In Section 1.1 we ruled out any mass at zero, since in all subsequent sections, concerned with competitive analysis, this will be the standard assumption. A formal convention, in fact, restores the equivalence with a scenario admitting mass at zero, from a competitive analysis point of view. Add a value X_0 ≜ 0, and set X_τ ≜ X_0 = M_{[0]} for all ω ∈ {H = 0}. Informally, when the game does not start, both the gambler and the prophet receive no return. Adopt the convention 0/0 = 1. Having set X_0 = 0 ensures that the game always starts, unless ω ∈ {H = 0}. Assuming the absence of a penalty for entering the game even though it does not start yields the equivalence: the ratio being 1 does not affect the worst-case competitive ratio.

In RH the gambler must select a value in ignorance of whether it is the last or not: at each step i ∈ ℕ the gambler only learns whether the inspected value x_i is the last or not (that is, whether ω ∈ {H = i} or ω ∈ {H > i}) after the decision of accepting or rejecting it has been made (that is, in the (i+1)-st step). Probabilistically, it is standard to model this as follows. The random list X_1, …, X_H is constructed from the infinite underlying process 𝐗 ≜ {X_i} of iid copies of X, whose natural filtration (intuitively, the information associated with the history of the process up to the present) is the sequence 𝔉_i ≜ σ(X_1, …, X_i), i ∈ ℕ, where 𝔉_0 ≜ {∅, Ω} (no information at the start: this is the implicit starting point of all filtrations we will introduce). Let ν ≜ 𝔼X < ∞. The nonanticipative condition and the aforementioned ignorance are modelled by requiring that {τ = i} ∈ ℱ_i ≜ σ(𝟙{H = 0}, X_1, …, 𝟙{H = i − 1}, X_i) for every i ∈ ℕ. Intuitively, this second filtration represents the information associated with the history of the game, which combines both the history of the process of the values and the state of the game in all preceding steps (that is, whether it ended or is still running). We recover the original X_1, …, X_H by imposing that the reward is zero if the gambler fails to stop by time H on the infinite sequence, that is, defining the reward sequence {Y_i}, where Y_i ≜ X_i 𝟙{H ≥ i}, to which we add Y_0 ≜ X_0.

For technical reasons, we consider the slightly less common class of all possibly infinite stopping times for the problem 𝐗 (resp. 𝐘), denoted 𝒯 (resp. 𝒯¯). Formally, τ ∈ 𝒯 (resp. 𝒯¯) means that {τ = i} ∈ 𝔉_i (resp. ℱ_i) for all i ∈ ℕ₀, also admitting ℙ(τ = ∞) > 0. This requires that we compactify the problem, by prescribing a conventional, yet natural, reward value in the event that the gambler never stops. We set Y_∞ ≜ lim sup_i Y_i = 0: since ℙ(H < ∞) = 1, if the gambler never stops, a.s. the horizon will be reached, so there can be no reward. In this way infinite stopping rules do not increase the largest possible expected reward, and the equivalence with the original formulation is maintained. In conclusion, RH has been reformulated as an infinite optimal stopping problem with reward sequence 𝐘 ≜ {Y_i}, i ∈ ℕ̄₀, underlying process 𝐗 and filtration {ℱ_i}, i ∈ ℕ̄₀, where ℱ_∞ ≜ σ(∪_{i∈ℕ₀} ℱ_i) (the complete information about the history of the process, which is the ending point of all infinite filtrations we will introduce), with respect to the class 𝒯¯. We seek the value of the problem 𝒱(𝐘) ≜ sup_{τ∈𝒯¯} 𝔼Y_τ and, if it exists, an optimal stopping rule τ¯ ∈ 𝒯¯. Optimal means that 𝔼Y_τ¯ = 𝒱(𝐘). The reference to the horizon H can be made explicit, whenever the need arises, via the notation 𝐘(H) ≜ {Y_i(H)}, i ∈ ℕ̄₀.

2.3 Optimal algorithm

Under suitable hypotheses, an optimal stopping problem with independent horizon H, having survival function S(i), is equivalent to a finite (if H is bounded) or infinite (if H is unbounded) optimal stopping problem discounted by the factors S(i). The discounted problem is 𝐙 ≜ {Z_i}, i ∈ ℕ̄₀, where Z_0 ≜ Y_0, Z_i ≜ S(i)X_i for every i ∈ ℕ, and Z_∞ ≜ lim sup_i Z_i = 0 = Y_∞. Thus, for 𝐙 we use the filtration {𝔉_i}, having set 𝔉_0 and 𝔉_∞ as usual. The latter is used when the problem is infinite. Thus 𝒱(𝐙) ≜ sup_{ζ∈𝒯} 𝔼Z_ζ, and the aforementioned equivalence means that 𝒱(𝐘) = 𝒱(𝐙).
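The identity behind this equivalence can be sanity-checked numerically for a fixed threshold rule ζ: by independence of the horizon, 𝔼[X_ζ 𝟙{H ≥ ζ}] = 𝔼[S(ζ)X_ζ]. The Monte Carlo sketch below is ours, with toy parameters (Exp(1) values, geometric horizon, threshold 1):

```python
import random

random.seed(2)

mu, p = 4.0, 1.0                 # horizon mean and threshold (illustrative choices)
q = 1.0 / mu                     # P(H = h) = q (1-q)^(h-1), so S(i) = (1-q)^(i-1)
trials = 200_000
y_sum = z_sum = 0.0
for _ in range(trials):
    # Run the threshold rule zeta on the infinite iid sequence of Exp(1) values.
    i, x = 1, random.expovariate(1.0)
    while x <= p:
        i, x = i + 1, random.expovariate(1.0)
    z_sum += (1 - q) ** (i - 1) * x     # Z_zeta = S(zeta) * X_zeta
    h = 1                               # independent horizon H ~ Geom(q)
    while random.random() > q:
        h += 1
    y_sum += x if i <= h else 0.0       # Y_tau = X_zeta * 1{H >= zeta}
# The two estimates of the same expected return agree up to Monte Carlo noise.
assert abs(y_sum - z_sum) / trials < 0.02
```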

The bounded horizon case is solved in [37] by backward induction, which does not extend to the unbounded case, where an optimal stopping rule is needed explicitly. The main contribution of our Theorem 14 is showing that RH with unbounded horizon H admits an explicit optimal stopping rule of the form τ¯ ≜ ζ¯ ∧ (H + 1), where we take ζ¯ to be the Snell stopping rule for 𝐙. Let us recall a few facts, adapted from [8, 9] to our compactified framework. Let 𝒯_i be the class of all stopping rules in 𝒯 that do not stop before step i.

Definition 10 (Snell envelope).

The Snell envelope of 𝐙 is {V_i}, where V_i ≜ esssup_{ζ∈𝒯_i} 𝔼_i Z_ζ, and 𝔼_i denotes conditional expectation given 𝔉_i.

Note that 𝒱(𝐙) = V_0. The intuitive interpretation of the stochastic process 𝐕 ≜ {V_i} is the best expected return available at step i among the rules that have reached step i.

 Remark 11.

Since 𝔼 sup_i Z_i < ∞, for every i ∈ ℕ₀ the Snell envelope satisfies

V_i = Z_i ∨ 𝔼_i V_{i+1}. (2)

Informally, (2) represents an extension of the dynamic programming principle to the infinite process {V_i}: at time i it is optimal to stop whenever the currently inspected value Z_i is at least the highest expected future return, as per the previous informal characterisation of V_i. We use (2) to compute V_0. Note that for finite problems it coincides with backward induction.
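For a bounded horizon, this recursion takes a particularly simple form on the discounted problem: since the values are iid, the continuation value c_i ≜ 𝔼V_{i+1} is deterministic and V_i = Z_i ∨ c_i. The sketch below (ours, with an illustrative finite-support value distribution) computes 𝒱(𝐙) by backward induction:

```python
def discounted_value(values, probs, survival):
    """Backward induction for Z_i = S(i) * X_i, i = 1..m, with iid discrete X.
    survival[i-1] = S(i) = P(H >= i). Returns cont[0] = E[V_1] = V(Z),
    since Z_0 = 0."""
    m = len(survival)
    cont = [0.0] * (m + 1)            # cont[i] = E[V_{i+1}], with cont[m] = 0
    for i in range(m, 0, -1):
        s = survival[i - 1]
        # V_i = max(S(i) * X_i, E_i V_{i+1}); take expectation over X_i.
        cont[i - 1] = sum(pr * max(s * x, cont[i]) for x, pr in zip(values, probs))
    return cont[0]

# Deterministic horizon H = 2 (S(1) = S(2) = 1), X uniform on {0, 1, 2}:
v = discounted_value([0, 1, 2], [1 / 3, 1 / 3, 1 / 3], [1.0, 1.0])
# Step 2: accept anything, so E[V_2] = E[X] = 1. Step 1: accept iff X_1 >= 1,
# giving V(Z) = E[max(X, 1)] = (1 + 1 + 2) / 3 = 4/3.
assert abs(v - 4 / 3) < 1e-9
```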

Definition 12 (Snell stopping rule).

The Snell rule is ζ¯ ≜ inf{i ∈ ℕ₀ : Z_i ≥ 𝔼_i V_{i+1}}.

 Remark 13.

If 𝔼 sup_i Z_i < ∞ and Z_∞ ≥ lim sup_{i→∞} Z_i, then the Snell rule ζ¯ is optimal.

Theorem 14.

Let H be a finite horizon with μ ≜ 𝔼H and possibly mass at zero, and let m ≜ sup supp(H), with 𝐘 and 𝐙 as previously defined.

It holds that: 𝒱(𝐘) = 𝒱(𝐙); there exists ζ¯ ∈ 𝒯 such that 𝔼Z_ζ¯ = 𝒱(𝐙); and τ¯ ≜ ζ¯ ∧ (H + 1) is such that 𝔼Y_τ¯ = 𝒱(𝐘).

Furthermore, ζ¯ is characterised as: the backward induction stopping rule for 𝐙(m) if m < ∞; the Snell rule for 𝐙 if m = ∞.

Idea of the proof.

The case $m < \infty$ leads to the equivalent characterisation as backward induction on $\{Z_i\}_{i \in [m]_0}$ [37, Theorem 2.1]. For the case $m = \infty$, a crucial step is to show that for every stopping rule $\sigma \in \mathcal{T}$, we can construct a stopping rule $\zeta \in \mathcal{T}$ such that $\mathbb{E} Y_\sigma = \mathbb{E} Y_\tau$ (that is, $\sigma$ and $\tau$ are equivalent), where $\tau \triangleq \zeta \wedge (H+1) \in \mathcal{T}$. This requires an inductive measure-theoretic construction. For every $i$, let $E_i \triangleq \{\sigma = i\} \cap \{H \geq i\}$. We have that $E_i$ belongs to the trace on $\{H \geq i\}$ of the filtration of $\mathbf{Y}$, which coincides with $\mathcal{F}_i \cap \{H \geq i\}$, where the intersections denote, as standard, the corresponding trace $\sigma$-algebras. Next, we construct inductively mutually disjoint events $A_i \in \mathcal{F}_i$ such that $E_i = A_i \cap \{H \geq i\}$. These events $\{A_i\}$ are then relied upon to define $\zeta$ as, informally speaking, a stopping rule that stops when $\sigma$ successfully stops by the time the horizon has realised, and does not stop otherwise. Formally, denote $C \triangleq (\bigcup_i A_i)^c$, and define, for every $i$,

$$\zeta(\omega) \triangleq i \quad \text{if } \omega \in A_i, \tag{3}$$

while setting $\zeta \triangleq \infty$ on $C$. Following the standard convention $\mathbb{E} Y_\sigma \triangleq \mathbb{E} \sum_i Y_i \mathbb{1}_{\{\sigma = i\}} + \mathbb{E} Y_\infty \mathbb{1}_{\{\sigma = \infty\}}$, we verify by direct computation that $\mathbb{E} Y_\sigma = \mathbb{E} Y_\tau$ is ensured by $Y_\infty = 0$.

We show that $\mathcal{V}(\mathbf{Y}) = \mathcal{V}(\mathbf{Z})$ as follows. First, note that $Y_\infty = Z_\infty = 0$. Second, for every $\sigma \in \mathcal{T}$, $\mathbb{E} Y_\sigma = \mathbb{E} Z_\zeta$, where $\zeta$ is defined as in (3). Third, for every $\zeta \in \mathcal{T}$,

$$\mathbb{E} Z_\zeta = \mathbb{E} Y_\tau, \tag{4}$$

with $\tau \triangleq \zeta \wedge (H+1)$. These facts imply that $\sup_{\sigma \in \mathcal{T}} \mathbb{E} Y_\sigma = \sup_{\zeta \in \mathcal{T}} \mathbb{E} Z_\zeta$.

Given an optimal stopping rule $\bar{\zeta} \in \mathcal{T}$, we have that $\bar{\tau} \triangleq \bar{\zeta} \wedge (H+1) \in \mathcal{T}$ provides an optimal rule, based on (4). Furthermore, our assumptions ensure that $\mathbb{E} \sup_i Z_i < \infty$ and $\limsup_i Z_i = 0$. Hence, the Snell envelope $\mathbf{V}$ of $\mathbf{Z}$ satisfies (2) by Remark 11, and the Snell rule yields the optimal stopping.

We also derive a result giving the intuitive interpretation of $\mathbb{E}_i V_{i+1}$ as the best expected return available at step $i$ among the rules that will not stop by or at step $i$.

Lemma 15.

Let $\mathbf{V}$ be the Snell envelope of $\mathbf{Z}$. Then $\mathbb{E}_i V_{i+1} = \operatorname{ess\,sup}_{\zeta \in \mathcal{T}_{i+1}} \mathbb{E}_i Z_\zeta$ a.s.

3 Warm-up: a 𝟐-approximation on the 𝓖 class

We build familiarity with the model by proving Theorem 5, which derives a 2-approximation on the $\mathcal{G}$ class, extending the results of [2, Section 3]. Recall that $X \sim \mathcal{D}$ is the distribution of the values and $H \sim \mathcal{H}$ that of the horizon (with no mass at zero and $\mu \triangleq \mathbb{E}H$). The expected return of a single-threshold algorithm is readily computed.

Lemma 16.

Let $\pi > 0$ and consider the stopping rule $\tau_\pi \triangleq \inf\{i : Y_i \geq \pi\} \in \mathcal{T}$, where $Y_i \triangleq X_i \mathbb{1}_{\{H \geq i\}}$. Then $\mathbb{E} X_{\tau_\pi} = c_\pi\, \mathbb{E}(X \mid X \geq \pi)$, where $c_\pi = c_\pi(\mathcal{H}, \mathcal{D}) \triangleq 1 - \mathbb{E}[\mathbb{P}(X < \pi)^H]$.

The quantity $c_\pi$ yields the competitive ratio. With this in mind, we upper-bound $\mathbb{E} M_H$ by an ex-ante relaxation of the prophet, combined with a simple variational argument.
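Lemma 16 lends itself to a quick Monte Carlo sanity check. The following sketch (ours, with hypothetical choices — $\operatorname{Exp}(1)$ values and a geometric horizon — not the paper's proof) simulates the single-threshold rule and compares its average reward with $c_\pi\, \mathbb{E}(X \mid X \geq \pi)$:

```python
import random, math

# X ~ Exp(1), H ~ Geom(success 0.3) independent of the values, threshold pi.
# Lemma 16 predicts E X_{tau_pi} = c_pi * E(X | X >= pi), where
# c_pi = 1 - E[P(X < pi)^H]; for Exp(1), E(X | X >= pi) = pi + 1.
random.seed(0)
pi, p_succ = 1.0, 0.3
t = 1.0 - math.exp(-pi)                       # P(X < pi)

E_tH = p_succ * t / (1.0 - (1.0 - p_succ) * t)  # E t^H in closed form
c_pi = 1.0 - E_tH
predicted = c_pi * (pi + 1.0)

n, total = 200_000, 0.0
for _ in range(n):
    h = 1
    while random.random() > p_succ:           # sample H
        h += 1
    for _ in range(h):                        # observe at most H values
        x = random.expovariate(1.0)
        if x >= pi:
            total += x                        # accept and stop
            break                             # reward 0 if never accepted
estimate = total / n

assert abs(estimate - predicted) < 0.05
```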

Lemma 17.

Let $X$ be continuous and $p > 0$ such that $\mathbb{P}(X \geq p) = 1/\mu$. Then we have that $\mathbb{E} M_H \leq \mathbb{E}(X \mid X \geq p)$.

Idea of the proof.

Let $f(x)$ be the probability density function (pdf) of $M_H$ and $v(x)$ the pdf of $X$. In order to upper-bound $\mathbb{E} M_H = \int_0^\infty x f(x)\,dx \triangleq I(f)$, we solve the variational problem of maximising the functional $I(f)$ subject to the constraints: $\int_0^\infty f(x)\,dx = 1$; $0 \leq f(x) \leq \mu v(x)$ for all $x \in [0, \infty)$; $f$ nonnegative and Lebesgue integrable on $[0, \infty)$. The second constraint follows from a union bound. The maximal solution is $\bar{f}(x) = \mu v(x)$ for all $x \geq p$, and zero otherwise, with $p$ as in the claim. Then $I(\bar{f}) = \mathbb{E}(X \mid X \geq p)$.

To prove Theorem 5 we extend Lemma 17 to discontinuous $X$ via stochastic tie-breaking. The technique is standard with deterministic horizons, but with random horizons it is not yet clear that the assumptions imposed on RH are robust enough. We will show that an underlying property of the prophet, Uniform Integrability (UI), is the key. Before that, we derive the analogue of Lemma 16 for a randomised algorithm (randomisation does not increase the optimal value of optimal stopping problems). Attach to every $X_i$ a biased coin flip, that is, iid Bernoulli random variables $B_i \sim B \sim \operatorname{Ber}(q)$, independent of $X$ and $H$, with $q$ denoting the probability of $1$ (heads). Denote the class of adapted, thus randomised, stopping rules as $\bar{\mathcal{T}}$.

Lemma 18.

Let $\pi > 0$ and consider the stopping rule $\tau_{\pi,q} \triangleq \inf\{i : Y_i \geq \pi,\ B_i = 1\} \in \bar{\mathcal{T}}$. Then $\mathbb{E} X_{\tau_{\pi,q}} = c_{\pi,q}\, \mathbb{E}(X \mid X \geq \pi)$, where $c_{\pi,q} = c_{\pi,q}(\mathcal{H}, \mathcal{D}) \triangleq 1 - \mathbb{E}\{[1 - q\,\mathbb{P}(X \geq \pi)]^H\}$.

Rewriting $c_\pi = 1 - \mathbb{E}\{[1 - \mathbb{P}(X \geq \pi)]^H\}$ in Lemma 16 helps to see the statement of Lemma 18 as its natural generalisation, although it is combinatorially tedious to prove.

Idea of the proof of Theorem 5.

Let $X$ be continuous. By Lemmas 16 and 17,

$$\mathbb{E} X_{\tau_p} \geq c_p\, \mathbb{E} M_H,$$

where $c_p \triangleq 1 - \mathbb{E}[\mathbb{P}(X < p)^H]$. For any $H \in \mathcal{G}$, $G \leq_{\mathrm{pgf}} H$ with $G \sim \operatorname{Geom}(1/\mu)$, $\mu = \mathbb{E}H$. Let $t \triangleq \mathbb{P}(X < p) = 1 - 1/\mu$, so that $c_p = 1 - \mathbb{E} t^H$; then

$$\mathbb{E} t^H \leq \mathbb{E} t^G = \frac{1}{\mu} \sum_h t^h (1 - 1/\mu)^{h-1} = \frac{1 - 1/\mu}{2 - 1/\mu}.$$

This implies that $c_p \geq (2 - 1/\mu)^{-1}$ and the result follows.
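The geometric pgf computation can be verified numerically for a sample value of $\mu$ (our sketch; $\mu = 3$ is arbitrary):

```python
# For t = 1 - 1/mu and G ~ Geom(1/mu) on {1, 2, ...}, the series
# E t^G = sum_h (1/mu)(1 - 1/mu)^(h-1) t^h should equal
# (1 - 1/mu) / (2 - 1/mu), giving c_p >= 1 / (2 - 1/mu).
mu = 3.0
t = 1.0 - 1.0 / mu

E_tG = sum((1.0 / mu) * (1.0 - 1.0 / mu) ** (h - 1) * t ** h
           for h in range(1, 2000))           # truncated geometric series

assert abs(E_tG - (1.0 - 1.0 / mu) / (2.0 - 1.0 / mu)) < 1e-9
assert abs((1.0 - E_tG) - 1.0 / (2.0 - 1.0 / mu)) < 1e-9
```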

Let $X$ be discontinuous with cumulative distribution function (cdf) $V(x)$ attaining level $1 - 1/\mu$ within a jump at $p$, say $\lim_{\varepsilon \to 0^+} V(p - \varepsilon) < 1 - 1/\mu < V(p)$. Let $D \triangleq \{j_k\}$ be the set of discontinuities (increasingly enumerated), one of which is $p$. Let $\{\varepsilon(l)\}$ be a small enough, positive, monotonically vanishing sequence. Linearly interpolate $V(x)$ on disjoint intervals of length $\varepsilon(l)$ and keep it unaltered otherwise. This yields an approximating sequence of continuous and piecewise differentiable cdf's $V_l(x)$ of continuous random variables $X^{(l)}$. Upon showing that the family $\{X^{(l)}\}$ is UI, define $M_H^{(l)} \triangleq \max\{X_1^{(l)}, \ldots, X_H^{(l)}\}$. RH assumptions ensure that $\{M_H^{(l)}\}$ is UI too. Since $X^{(l)} \xrightarrow{w} X$, also $M_H^{(l)} \xrightarrow{w} M_H$. By Lemma 17, for every $l$ there is $p_l$ such that $\mathbb{P}(X^{(l)} \geq p_l) = 1/\mu$ and $\mathbb{E} M_H^{(l)} \leq \mathbb{E}(X^{(l)} \mid X^{(l)} \geq p_l)$. By construction, $p_l \to p$ as $l \to \infty$, so $\mathbb{E}(X^{(l)} \mid X^{(l)} \geq p_l) \to \mathbb{E}(X \mid X \geq p)$. All these facts ensure that

$$\mathbb{E} M_H \leq \mathbb{E}(X \mid X \geq p).$$

To conclude, for any $H \in \mathcal{G}$, since $G \leq_{\mathrm{pgf}} H$, by Lemma 18 the single-threshold algorithm $\tau_{p,\bar{q}} \in \bar{\mathcal{T}}$, with $\bar{q} \triangleq [\mu\, \mathbb{P}(X \geq p)]^{-1}$, is such that

$$\mathbb{E} X_{\tau_{p,\bar{q}}} \geq c_{p,\bar{q}}\, \mathbb{E} M_H,$$

where $c_{p,\bar{q}} \triangleq 1 - \mathbb{E}\{[1 - \bar{q}\, \mathbb{P}(X \geq p)]^H\} = 1 - \mathbb{E}[(1 - 1/\mu)^H] \geq (2 - 1/\mu)^{-1}$.

 Remark 19.

If $X$ is discontinuous with $1/\mu$ falling within a jump of the survival function of $X$, the $(2 - 1/\mu)$-approximation is a single-threshold randomised algorithm: the threshold is the value $p$ at which the jump occurs, and the randomisation parameter is $[\mu\, \mathbb{P}(X \geq p)]^{-1}$. The argument above shows, however, that if $X$ is discontinuous but $1/\mu$ does not fall within a jump, the randomisation parameter is $1$, meaning that the nonrandomised algorithm for the continuous case suffices to yield a $(2 - 1/\mu)$-approximation.

In the rest of the paper we omit any explicit adaptation of similar tie-breaking arguments, as we have shown that RH is robust enough. Some of the most well-known subclasses of the $\mathcal{G}$ class, which have earned considerable interest in applications, are: IHRE (Increasing Hazard Rate in Expectation); HIHRE (Harmonically Increasing Hazard Rate in Expectation); NBU (New Better than Used); NBUE (New Better than Used in Expectation). The following tower of inclusions (whose dual will follow in the next section) is well-established [5, 34, 22] and known to be strict: $\mathrm{IHR} \subset \mathrm{IHRE} \subset \mathrm{HIHRE} \subset \mathrm{NBU} \subset \mathrm{NBUE} \subset \mathrm{HNBUE} \subset \mathcal{G}$.

The tight instance for the $(2 - 1/\mu)$-approximation involves a geometric horizon and was studied in [2, Theorem 3.5]. Their basic assumptions, concerning a two-point distribution, are intuitive facts: that an optimal algorithm with geometric horizon should exist, and that it has a single threshold. We show mathematically that both hold, with the second extending, more generally, to any bounded value distribution. As a bonus, we obtain an equation for the optimal threshold. Recall that $\operatorname{Geom}(1-q)$ denotes the geometric distribution with failure probability $0 < q < 1$.

Lemma 20.

Let $H \sim \operatorname{Geom}(1-q)$, and $X$ such that $\operatorname{supp}(X) = [a, b]$, $0 \leq a \leq b < \infty$. Then there exists a threshold $V_0$ such that the corresponding single-threshold algorithm is optimal.

Idea of the proof.

By Theorem 14, $\mathcal{V}(\mathbf{Y}) = \mathcal{V}(\mathbf{Z}) = V_0$. By (2), it holds that $V_0 = \mathbb{E} V_1 = \mathbb{E}(Z_1 \vee \mathbb{E}_1 V_2)$. Since $Z_i = q^{i-1} X_i$, by time invariance and Lemma 15 it follows that $\mathbb{E}_1 V_2 = q\, \mathbb{E} V_1 = q V_0$. Therefore, $V_0 = \mathbb{E}(X \vee q V_0)$; equivalently,

$$f(V_0) \triangleq V_0 + \int_{q V_0}^b F(x)\,dx = b,$$

where $F$ is the cdf of $X$. The function $f(V)$ is strictly increasing and differentiable on $[0, b]$, with $f(0) \leq b \leq f(b)$. Hence the value $V_0$ exists. By Remark 13 and Definition 12, and time invariance again, it is optimal to stop at the first value greater than or equal to $V_0$ (the expected return).
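For a concrete (hypothetical) instance, $X \sim \operatorname{Uniform}(0,1)$, one has $\mathbb{E} \max(X, a) = (1 + a^2)/2$ for $a \in [0, 1]$, and the fixed point $V_0 = \mathbb{E}(X \vee q V_0)$ — equivalent to $f(V_0) = b$ above — can be located by bisection:

```python
# X ~ Uniform(0, 1), H ~ Geom(1 - q): solve V0 = E max(X, q V0) by bisection.
# For uniform values E max(X, a) = (1 + a^2)/2 on [0, 1], so with q = 0.5 the
# fixed point solves V0 = (1 + (V0/2)^2)/2, i.e. V0 = 4 - 2*sqrt(3), which we
# use as an independent closed-form check.
q = 0.5

def excess(c):
    a = q * c
    return (1.0 + a * a) / 2.0 - c            # E max(X, q c) - c, decreasing in c

lo, hi = 0.0, 1.0                             # excess(0) = 1/2 > 0 > excess(1)
for _ in range(100):
    mid = (lo + hi) / 2.0
    if excess(mid) > 0.0:
        lo = mid
    else:
        hi = mid
V0 = (lo + hi) / 2.0

assert abs(excess(V0)) < 1e-9
assert abs(V0 - (4.0 - 2.0 * 3.0 ** 0.5)) < 1e-9
```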

4 A first step beyond single-threshold algorithms

In this section we construct a parametric family of horizons which is hard for any single-threshold algorithm, and yet allows for an adaptive competitive algorithm. We also show that it can be perturbed so that it enters the $\bar{\mathcal{G}}$ class, while remaining hard for single-threshold algorithms.

4.1 The hard instance

Let us start with the hard instance $X$ for the values, which is a Pareto (Type I) distribution with scale parameter $1$ and shape parameter $1 + \varepsilon$, where $\varepsilon > 0$ is fixed small enough. The corresponding cdf, denoted by $V(x)$, vanishes on $(-\infty, 1)$, whereas on $[1, \infty)$

$$V(x) = 1 - x^{-(1+\varepsilon)}. \tag{5}$$

Since $M_n$ has cdf $F_n(x) = V^n(x)$, which is differentiable for all $x \neq 1$, we obtain that its density (defined almost everywhere) $f_n(x)$ vanishes on $(-\infty, 1)$, whereas $f_n(x) = n V^{n-1}(x) V'(x)$ on $(1, \infty)$. By direct computation, relying on the properties of the Gamma and Beta functions, it follows that $\mathbb{E} M_n \sim \Gamma(1 - (1+\varepsilon)^{-1})\, n^{1/(1+\varepsilon)}$ as $n \to \infty$, uniformly in $\varepsilon > 0$. Consequently, in conjunction with the instance $X$ for the values, we consider a family of horizons $\{H_m\}$, with $m$ large enough, such that for every fixed such $m$, $\operatorname{supp}(H_m) = [\ell, m] \cap \mathbb{N}$, with lower limit $1 < \ell$. For every $m$ fixed, we define the pmf of each horizon by letting, for every $\ell \leq h \leq m$,

$$p_h^{(m)} \triangleq \mathbb{P}(H_m = h) = Z_m^{-1}\, h^{-\frac{1}{1+2\varepsilon}}, \tag{6}$$

having defined the normalising constant $Z_m \triangleq \sum_{h=\ell}^m h^{-\frac{1}{1+2\varepsilon}}$. Through standard integral estimates of summations, it is shown that

$$Z_m \sim \left[1 + (2\varepsilon)^{-1}\right] m^{\frac{2\varepsilon}{1+2\varepsilon}}, \tag{7}$$

and that the following holds.

Lemma 21.

As $m \to \infty$, uniformly in $\varepsilon > 0$ small enough, $\mathbb{E} M_{H_m} \sim N m^{\frac{1}{1+\varepsilon}}$, where $N = N(\varepsilon) \triangleq \Gamma\!\left(1 - \tfrac{1}{1+\varepsilon}\right) \left[1 + \tfrac{\varepsilon}{(1+\varepsilon)(1+2\varepsilon)}\right]^{-1} \left[1 + \tfrac{1}{2\varepsilon}\right]^{-1}$.

By (7) and integral estimates akin to those leading to Lemma 21, it is straightforward to compute that for every $n$, as $m \to \infty$,

$$\mathbb{E} H_m^n = \frac{\sum_{h=\ell}^m h^{n - \frac{1}{1+2\varepsilon}}}{Z_m} \sim \frac{m^{n+1-\frac{1}{1+2\varepsilon}}}{\left(n + 1 - (1+2\varepsilon)^{-1}\right)\left[1 + (2\varepsilon)^{-1}\right] m^{\frac{2\varepsilon}{1+2\varepsilon}}} = \frac{2\varepsilon\, m^n}{n + 2(n+1)\varepsilon}. \tag{8}$$
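The first-moment case ($n = 1$) of (8) can be checked numerically (our sketch, with $\ell = 2$ and $\varepsilon = 0.1$; the convergence is slow, so only the trend towards the limit is asserted):

```python
# For the horizons H_m with pmf proportional to h^(-1/(1+2*eps)) on [2, m],
# equation (8) with n = 1 predicts E H_m -> 2*eps*m / (1 + 4*eps) as m grows.
# The ratio approaches 1 from above, slowly.
eps = 0.1
a = 1.0 / (1.0 + 2.0 * eps)

ratios = []
for m in (10**4, 10**5, 10**6):
    Z = sum(h ** (-a) for h in range(2, m + 1))
    EH = sum(h ** (1.0 - a) for h in range(2, m + 1)) / Z
    ratios.append(EH / (2.0 * eps * m / (1.0 + 4.0 * eps)))

assert ratios[0] > ratios[1] > ratios[2] > 1.0   # decreasing towards 1
assert ratios[2] < 1.2
```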

The properties of the Gamma function ensure that $\Gamma(x) \sim x^{-1}$ as $x \to 0^+$. As a result, as $\varepsilon$ vanishes, $N \sim 2\varepsilon \left(1 - (1+\varepsilon)^{-1}\right)^{-1} \to 2$, and we have the following.

 Remark 22.

As $m \to \infty$, uniformly in $\varepsilon > 0$ small enough, $\mathbb{E} M_{H_m} = \Theta\!\left(m^{\frac{1}{1+\varepsilon}}\right)$.

We are now ready to prove the hardness of the family of horizons $\{H_m\}$ for single-threshold algorithms. A stopping rule with a single threshold $\pi$ is denoted by $\tau_\pi$. A crucial point is that the algorithm facing $H_m$ may set a threshold $\pi_m$ exploiting all distributional knowledge regarding $X$ and $H_m$. Yet, this is not enough to obtain a constant approximation on the family $(X, \{H_m\})$ for all suitably small $\varepsilon$.

Proposition 23.

For fixed $\varepsilon > 0$ small enough, given the instance of copies of $X$ and the family of horizons $\{H_m\}$ of (5) and (6) respectively, we have that for every sequence of thresholds $\{\pi_m\}$,

$$\liminf_{m \to \infty} \frac{\mathbb{E} X_{\tau_{\pi_m}}}{\mathbb{E} M_{H_m}} = 0.$$

Idea of the proof.

Our strategy is to characterise the sequence of optimal single thresholds $\{\bar{\pi}_m\}$ (corresponding to horizons $\{H_m\}$ for $m$ large enough and $\varepsilon$ small enough), where $\bar{\pi}_m$ maximises the value obtained, $\mathbb{E} X_{\tau_{\pi_m}} \equiv \mathbb{E} Y_{\tau_{\pi_m}}(H_m)$, over all single-threshold strategies $\pi_m$. Denote by $r_m$ the gambler-to-prophet ratio of $\mathbb{E} Y_{\tau_{\bar{\pi}_m}}(H_m)$ to $\mathbb{E} M_{H_m}$. As a consequence of the characterisation obtained, it will follow that there is always a subsequence $\{m_j\}$ such that $r_{m_j}$ vanishes as $j \to \infty$. This yields the claim, since $r_{m_j}$ is an upper bound on the gambler-to-prophet ratio of any single-threshold algorithm.

Since $X$ is continuous, by Lemma 16 we compute $\mathbb{E} Y_{\tau_\pi}(H_m)$, which is nontrivial only if $\pi > 1$. This is expressed as the product of

$$\mathbb{E}(X \mid X \geq \pi) = (1 + 1/\varepsilon)\pi \qquad \text{and} \qquad c_\pi = 1 - \mathbb{E}\!\left[V^{H_m}(\pi)\right] = 1 - \mathbb{E}\!\left[\left(1 - 1/\pi^{1+\varepsilon}\right)^{H_m}\right].$$

Next, we show analytically that the optimal sequence of thresholds $\{\bar{\pi}_m\}$ satisfies the optimality equation

$$g_m(\bar{\pi}_m) \triangleq \mathbb{E}\!\left[\left(1 - 1/\bar{\pi}_m^{1+\varepsilon}\right)^{H_m}\right] + \frac{1+\varepsilon}{\bar{\pi}_m^{1+\varepsilon}}\, \mathbb{E}\!\left[H_m \left(1 - 1/\bar{\pi}_m^{1+\varepsilon}\right)^{H_m - 1}\right] = 1.$$
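The optimality equation is the first-order condition for the single-threshold return: writing the return as $(1 + 1/\varepsilon)\pi\, c_\pi$, its derivative in $\pi$ equals $(1 + 1/\varepsilon)(1 - g_m(\pi))$. The following sketch (ours, with hypothetical parameters $\varepsilon = 0.2$, $m = 2000$, $\ell = 2$) maximises the exact return over a threshold grid and checks $g_m \approx 1$ at the maximiser:

```python
# Pareto values with shape 1 + eps and horizon H_m as in (6) with l = 2.
# The exact single-threshold return is (1 + 1/eps) * pi * c_pi; the grid
# maximiser should satisfy the optimality equation g_m(pi) ~= 1.
eps, m = 0.2, 2000
a = 1.0 / (1.0 + 2.0 * eps)
hs = list(range(2, m + 1))
w = [h ** (-a) for h in hs]                   # unnormalised pmf of H_m
Z = sum(w)

def stats(pi):
    s = pi ** (-(1.0 + eps))                  # P(X >= pi)
    t = 1.0 - s
    EtH = sum(wh * t ** h for wh, h in zip(w, hs)) / Z
    EHtH1 = sum(wh * h * t ** (h - 1) for wh, h in zip(w, hs)) / Z
    value = (1.0 + 1.0 / eps) * pi * (1.0 - EtH)
    g = EtH + (1.0 + eps) * s * EHtH1
    return value, g

grid = [pi for pi in (1.05 * 1.02 ** k for k in range(400)) if pi < m]
best_pi = max(grid, key=lambda pi: stats(pi)[0])

assert grid[0] < best_pi < grid[-1]           # interior maximiser
assert abs(stats(best_pi)[1] - 1.0) < 0.1     # optimality equation holds
```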

To conclude, by these facts and Lemma 21 and Remark 22, we have that, as $m \to \infty$, uniformly in $\varepsilon > 0$ small enough,

$$r_m = \frac{c_{\bar{\pi}_m}\, \mathbb{E}(X \mid X \geq \bar{\pi}_m)}{\mathbb{E} M_{H_m}} \sim \frac{(1 + 1/\varepsilon)\, c_{\bar{\pi}_m} \bar{\pi}_m}{N m^{\frac{1}{1+\varepsilon}}} \asymp \frac{c_{\bar{\pi}_m} \bar{\pi}_m}{\varepsilon\, m^{\frac{1}{1+\varepsilon}}}.$$

Consider that {π¯m} is either bounded or unbounded.

Bounded case.

Since $c_{\bar{\pi}_m} < 1$, $r_m$ vanishes as $m \to \infty$, for any fixed $\varepsilon$ small enough. Note that this case also covers the trivial case in which there exists a subsequence $\{\bar{\pi}_{m_j}\}$ such that $\bar{\pi}_{m_j} \to 1$, for which the numerator of $r_{m_j}$ becomes upper-bounded by $\nu \triangleq \mathbb{E}X$.

Intermediate case.

Before moving on to the general unbounded case, we consider the subcase of divergence to infinity with $m^{\frac{1}{1+\varepsilon}} = o(\bar{\pi}_m)$. Note that by Bernoulli's inequality we have that

$$c_{\bar{\pi}_m} \triangleq 1 - \mathbb{E}\!\left[\left(1 - 1/\bar{\pi}_m^{1+\varepsilon}\right)^{H_m}\right] \leq \frac{\mathbb{E} H_m}{\bar{\pi}_m^{1+\varepsilon}}.$$

By (8) with $n = 1$, it holds that $r_m$ vanishes as $m \to \infty$ for every fixed $\varepsilon$ small enough, since

$$\frac{c_{\bar{\pi}_m} \bar{\pi}_m}{m^{\frac{1}{1+\varepsilon}}} \leq \frac{\mathbb{E} H_m}{m^{\frac{1}{1+\varepsilon}}\, \bar{\pi}_m^{\varepsilon}} \asymp \varepsilon \left(\frac{m^{\frac{1}{1+\varepsilon}}}{\bar{\pi}_m}\right)^{\varepsilon} \to 0.$$
Unbounded case.

More generally, we can rely on the previous point to show that in the unbounded case $r_m$ can only vanish. Consider that, since $0 \leq r_m \leq 1$, if all of its convergent subsequences $\{r_{m_j}\}$ vanish for all $\varepsilon$ small enough, then $\{r_m\}$ vanishes for all $\varepsilon$ small enough. We show the sufficient condition by contradiction: assume that, as $j \to \infty$, $r_{m_j} \to \rho = \rho(\varepsilon) \in (0, 1]$, with $\rho(\varepsilon)$ bounded away from $0$ no matter how small $\varepsilon$ is taken. Then we have that, as $j \to \infty$, for $\varepsilon > 0$ fixed small enough, $\mu_j \triangleq m_j / \bar{\pi}_{m_j}^{1+\varepsilon}$ is asymptotically equivalent to a bounded sequence, so it must be bounded too. Thus, we can consider a convergent subsequence of $\{\mu_j\}$, denoted $\{\mu_{j_k}\}$, such that $\mu_{j_k} \to \alpha = \alpha(\varepsilon) \geq 0$ as $k \to \infty$. A sharp asymptotic analysis of $c_{\bar{\pi}_{m_{j_k}}}$ reveals that the optimality equation satisfied by $\bar{\pi}_{m_{j_k}}$ allows only for $\alpha$ bounded away from zero, unless it is identically zero. In what follows, it is useful to rewrite the optimality equation as

$$\mathbb{E}\!\left[\phi(\bar{\pi}_{m_{j_k}}, H_{m_{j_k}})\right] = 1, \qquad \text{where} \qquad \phi(\pi, H) \triangleq \left(1 - 1/\pi^{1+\varepsilon}\right)^{H-1}\left[1 - 1/\pi^{1+\varepsilon} + (1+\varepsilon) H / \pi^{1+\varepsilon}\right],$$

and to denote $g_{m_{j_k}}(\bar{\pi}_{m_{j_k}}) \triangleq \mathbb{E}[\phi(\bar{\pi}_{m_{j_k}}, H_{m_{j_k}})]$. By estimating asymptotically the series expansion of $\phi(\bar{\pi}_{m_{j_k}}, H_{m_{j_k}})$ with respect to the first argument, uniformly in $\alpha > 0$, the following bound is shown to hold as $k \to \infty$:

$$g_{m_{j_k}}(\bar{\pi}_{m_{j_k}}) \leq 1 + \varepsilon\left[(2 + \alpha) e^{-\alpha} - 2\right] + \mathcal{O}(\varepsilon^2) + \mathcal{O}(1/m_{j_k}).$$

It follows that, for $\varepsilon > 0$ small enough, $g_{m_{j_k}}(\bar{\pi}_{m_{j_k}}) < 1$ as $k \to \infty$, provided the strictly decreasing function $h(\alpha) \triangleq (2 + \alpha) e^{-\alpha} - 2$ is bounded away from zero. Note that $h(\alpha)$ is negative for $\alpha > 0$ and vanishes as $\alpha \to 0$. Since we have previously shown that if $\alpha > 0$ exists it must be bounded away from zero as $\varepsilon$ vanishes (and thus $h(\alpha)$ is bounded away from zero too), in this case the optimality equation cannot be satisfied, as long as $\varepsilon$ is taken sufficiently small. Thus only $\alpha = 0$ does not yield a contradiction at this point, meaning that, for all $\varepsilon$ small enough, $\mu_{j_k} \to 0$ as $k \to \infty$. By boundedness, also $\mu_j \to 0$ as $j \to \infty$. This corresponds to the scenario of the previous point (replace, in that argument, $m$ with $m_j$), and therefore $r_{m_j}$ vanishes. This contradicts the assumption, which ensures $r_{m_j} \to \rho > 0$, and therefore we must have $\rho = 0$.

The sequence $\{r_m\}$ is thus shown to always admit a vanishing subsequence.

4.2 Hardness of the 𝓖¯ class

The most notable subclasses of the $\bar{\mathcal{G}}$ class are the duals of those introduced at the end of Section 3: $\mathrm{DHR} \subset \mathrm{DHRE} \subset \mathrm{HDHRE} \subset \mathrm{NWU} \subset \mathrm{NWUE} \subset \mathrm{HNWUE} \subset \bar{\mathcal{G}}$. A hint towards the hardness of the $\bar{\mathcal{G}}$ class is that the duality mirrors the estimates of Section 3: informally speaking, the lower bound which provides the guaranteed competitive ratio turns into an upper bound. This mirroring is significant, because the ex-ante relaxation of the prophet is tight on geometric horizons, and is essentially maxed out by the very definition of the $\mathcal{G}$ class.

Idea of the proof of Theorem 7.

We show that a perturbation of the hard instance from the previous section prevents any single-threshold constant-approximation on the 𝒢¯ class.

Step 1.

We start by constructing a family $\tilde{\mathcal{H}}_M(\varepsilon) \subset \bar{\mathcal{G}}$, which is a sequence of horizons $\tilde{H}_m$ arising as a perturbation of the horizons $H_m$ in the hard family $\mathcal{H}_M(\varepsilon)$ of (6), with fixed $\ell = 2$, as $m$ grows, for all fixed $0 < \varepsilon < 1/4$ small enough. This is done by adding to the pmf of said $H_m$ a (to be suitably rescaled) mass at one, $\delta_m \triangleq C m^{\frac{2\varepsilon}{1+2\varepsilon}}$, where $C = C(\varepsilon) \triangleq (3 + 10\varepsilon)/(2\varepsilon)$. We denote the perturbed horizon by $\tilde{H}_m$ for every $m, \varepsilon$ considered, and observe that the corresponding normalising constants $\tilde{Z}_m$ and $Z_m$ for the pmf's, and the expectations $\tilde{\mu}_m \triangleq \mathbb{E}\tilde{H}_m$ and $\mu_m \triangleq \mathbb{E}H_m$, satisfy, as $m$ grows:

$$\tilde{Z}_m \sim \tilde{C}\, m^{\frac{2\varepsilon}{1+2\varepsilon}}, \qquad \tilde{C} \triangleq C + 1 + \frac{1}{2\varepsilon}, \tag{9}$$
$$\tilde{\mu}_m \sim \frac{1}{\tilde{C}}\left(C + \frac{1+2\varepsilon}{1+4\varepsilon}\, m\right). \tag{10}$$

Trivially, these follow from (7) and from (8) with $n = 1$, together with the definitions $\tilde{Z}_m \triangleq Z_m + \delta_m$ and $\tilde{\mu}_m = \delta_m / \tilde{Z}_m + \mu_m Z_m / \tilde{Z}_m$, respectively. To show that $\tilde{H}_m \in \bar{\mathcal{G}}$ for all $m$ large enough, we have to establish that eventually (short for "for all $m$ large enough" from now on), for all $t \in (0, 1)$,

$$\frac{1}{\tilde{Z}_m}\left[\delta_m t + \sum_{h=2}^m t^h h^{-\frac{1}{1+2\varepsilon}}\right] \geq \frac{1}{\tilde{\mu}_m} \sum_{h=1}^\infty \left(1 - 1/\tilde{\mu}_m\right)^{h-1} t^h,$$

which we recast as

$$\sum_{h=2}^m t^h h^{-\frac{1}{1+2\varepsilon}} \geq \frac{\epsilon_m t}{1 - t(1 - 1/\tilde{\mu}_m)} - \delta_m t, \tag{11}$$

having set $\epsilon_m \triangleq \tilde{Z}_m / \tilde{\mu}_m$. Let $\eta_m \triangleq (1 - \epsilon_m / \delta_m)/(1 - 1/\tilde{\mu}_m)$. We establish that, eventually, for all $t \in (0, \eta_m]$,

$$\frac{\epsilon_m t}{1 - t(1 - 1/\tilde{\mu}_m)} - \delta_m t \leq 0, \tag{12}$$

and that, eventually, for all $t \in [\eta_m, 1)$,

$$\sum_{h=2}^m h(h-1)\, t^{h-2} h^{-\frac{1}{1+2\varepsilon}} < \frac{2 \epsilon_m (1 - 1/\tilde{\mu}_m)}{\left[1 - t(1 - 1/\tilde{\mu}_m)\right]^3}. \tag{13}$$

This implies (11) for all $t \in (0, 1)$: eventually, on $(0, \eta_m]$, (11) holds with strict inequality directly by (12), while for all $t \in [\eta_m, 1)$ the difference between the left-hand and right-hand sides of (11), denoted $f_m(t)$, is strictly concave ((13) states precisely that $f_m''(t) < 0$). Since $f_m(t) > 0$ for all $t \in (0, \eta_m]$, and since $f_m$ vanishes at both ends of the unit interval, it cannot vanish on $(\eta_m, 1)$, meaning that there cannot be any crossings of the two sides of (11) on $(\eta_m, 1)$, which therefore holds everywhere.
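The pgf domination defining membership in the dual class can be spot-checked numerically for moderate $m$ (our sketch, with $\varepsilon = 0.1$ and $m = 10^4$; the formal statement is asymptotic, so this is only a plausibility check):

```python
# Perturbed horizon of Step 1 with l = 2: unnormalised weights h^(-1/(1+2eps))
# for h = 2..m, plus mass delta_m at h = 1. Its pgf should dominate that of
# Geom(1/mu_tilde) on (0, 1), which is the G-bar membership condition.
eps, m = 0.1, 10**4
a = 1.0 / (1.0 + 2.0 * eps)
C = (3.0 + 10.0 * eps) / (2.0 * eps)
delta = C * m ** (2.0 * eps / (1.0 + 2.0 * eps))

w = [h ** (-a) for h in range(2, m + 1)]
Z_t = delta + sum(w)
mu_t = (delta + sum(h * wh for h, wh in zip(range(2, m + 1), w))) / Z_t

margins = []
for t in [0.05 * k for k in range(1, 20)] + [0.999]:
    pgf_tilde = (delta * t + sum(wh * t ** h
                                 for wh, h in zip(w, range(2, m + 1)))) / Z_t
    pgf_geom = (t / mu_t) / (1.0 - t * (1.0 - 1.0 / mu_t))
    margins.append(pgf_tilde - pgf_geom)

assert min(margins) >= 0.0                    # pgf of H~_m dominates geometric
```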

Step 2.

Next, we show that, with the values set to the hard instance $X$ of (5) and $\varepsilon > 0$ fixed small enough as in Step 1, $\mathbb{E} M_{\tilde{H}_m} = \Omega(\mathbb{E} M_{H_m})$ and $\mathbb{E} X_{\tau_{\pi_m}} \equiv \mathbb{E} Y_{\tau_{\pi_m}}(\tilde{H}_m) \leq \mathbb{E} Y_{\tau_{\pi_m}}(H_m)$ as $m$ grows, for every sequence of thresholds $\{\pi_m\}$. The first fact follows from Remarks 22 and 9, observing that the law of total expectation yields, eventually,

$$\mathbb{E} M_{\tilde{H}_m} = \frac{\delta_m}{\tilde{Z}_m} \mathbb{E} X + \frac{Z_m}{\tilde{Z}_m} \mathbb{E} M_{H_m} > \frac{Z_m}{\tilde{Z}_m} \mathbb{E} M_{H_m} \geq \frac{1}{\tilde{C}}\, \mathbb{E} M_{H_m}.$$

The second fact follows from comparing the survival functions $\tilde{S}_m(i)$ and $S_m(i)$ of $\tilde{H}_m$ and $H_m$. For all $i \geq 3$, $\tilde{S}_m(i) = S_m(i) Z_m / \tilde{Z}_m$, whereas $\tilde{S}_m(2) = S_m(2) - \delta_m / \tilde{Z}_m$; in either case $\tilde{S}_m(i) \leq S_m(i)$. Consider next a single-threshold algorithm $\tau_\pi$, where $\pi = \pi_m$, on horizon $\tilde{H}_m$. By Theorem 14 (note that the notations $\tau$ and $\sigma$ used there are now swapped, as our focus here is on the original stopping rule), for any horizon $\tilde{H}_m$ there is a stopping rule equivalent to $\tau_\pi$ for the problem $\mathbf{Y}(\tilde{H}_m)$, denoted $\sigma_\pi \triangleq \zeta_\pi \wedge (\tilde{H}_m + 1)$, where $\zeta_\pi$ is, as per (3), an algorithm for the discounted problem $\mathbf{Z} = \{\tilde{S}_m(i) X_i\}$ which stops only when $\tau_\pi$ successfully stops by the time the horizon has realised. Intuitively, we can construct $\zeta_\pi$ as the stopping rule with single threshold $\pi$ on the events $\{\tau_\pi = i\} \cap \{\tilde{H}_m \geq i\}$; otherwise it stops at $m$ on $\{\tau_\pi > \tilde{H}_m\}$. The proof of Theorem 14 shows that this can be done so that $\zeta_\pi$ is adapted to $\mathbf{X}$, and as a result, as $m$ grows,

$$\mathbb{E} Y_{\tau_\pi}(\tilde{H}_m) = \mathbb{E} \sum_{i=1}^m \tilde{S}_m(i) X_i \mathbb{1}_{\{\zeta_\pi = i\}} \leq \mathbb{E} \sum_{i=1}^m S_m(i) X_i \mathbb{1}_{\{\zeta_\pi = i\}} = \mathbb{E} Y_{\tau_\pi}(H_m).$$
Step 3.

Finally, suppose by contradiction that a constant approximation exists on the $\bar{\mathcal{G}}$ class: then there exist a sequence of thresholds $\{\pi_m\}$ and $c > 0$ such that $\tau_{\pi_m}$ achieves ratio at least $c$ on $\tilde{\mathcal{H}}_M(\varepsilon) \subset \bar{\mathcal{G}}$, by Step 1. By Proposition 23, for all $\varepsilon > 0$ fixed small enough there exists a subsequence $\{\pi_{m_j}\}$ such that $r_{m_j}$, the gambler-to-prophet ratio of $\mathbb{E} Y_{\tau_{\pi_{m_j}}}(H_{m_j})$ over $\mathbb{E} M_{H_{m_j}}$, vanishes. Let $\tilde{r}_{m_j}$ be the gambler-to-prophet ratio of $\mathbb{E} Y_{\tau_{\pi_{m_j}}}(\tilde{H}_{m_j})$ over $\mathbb{E} M_{\tilde{H}_{m_j}}$. Combining this with the asymptotics of Step 2 yields the following contradiction as $j$ grows: $c \leq \tilde{r}_{m_j} = \mathcal{O}(r_{m_j}) \to 0$.

4.3 The competitive Secretary Problem rule

In this section we show that a straightforward adaptation of the SP stopping rule with deterministic horizon $m$ is competitive on the horizons $H_m$ defined in (6). Consider any instance of the values $X$ which is a nonnegative, integrable, continuous random variable. Consider the process $\mathbf{X} \triangleq (X_1, \ldots, X_m)$, where the $X_i$ are iid copies of $X$. We equivalently characterise it in the context of SP via a uniform random permutation $\pi = (\pi_1, \ldots, \pi_m)$. Consider $\mathbf{X}_\pi \triangleq (X_{\pi_1}, \ldots, X_{\pi_m})$. Since $\mathbf{X}$ is exchangeable, $\mathbf{X}_\pi \sim \mathbf{X}$. Consider now the waiting time $r_m - 1$ for the SP stopping rule, denoted $\zeta_{r_m}$, and recall that $r_m \sim m/e$. Consider $\mathbf{X}_\pi$ conditionally on $\mathbf{X}$, denoted $(\mathbf{X}_\pi \mid \mathbf{X})$. This is the process of the realised values arriving in a uniform random order, and it reformulates the usual setup for SP. In this reformulation, the classical result of [28] can be restated as

$$\mathbb{P}\!\left(X_{\pi_{\zeta_{r_m}}} = M_m \,\middle|\, \mathbf{X}\right) \geq \frac{1}{e}. \tag{14}$$

Given the stopping rule $\zeta_{r_m}$, denote the event of winning (that is, choosing the best) by $W \triangleq \{X_{\pi_{\zeta_{r_m}}} = M_m\}$, and that of winning at time $i$ by $W_i \triangleq W \cap \{i = \inf\{j \geq r_m : X_{\pi_j} = M_m\}\}$. Note that, by partitioning $W$ conditionally on $\mathbf{X}$, we have that

$$\mathbb{P}(W \mid \mathbf{X}) = \sum_{i=r_m}^m \mathbb{P}(W_i \mid \mathbf{X}). \tag{15}$$

The proof of (14) relies on the following fact, which we will also exploit:

$$\mathbb{P}(W_i \mid \mathbf{X}) = \frac{r_m - 1}{m} \cdot \frac{1}{i - 1}. \tag{16}$$
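The classical waiting-time rule and the $1/e$ guarantee in (14) are easy to reproduce empirically (our sketch, with hypothetical uniform values and deterministic horizon $m = 200$):

```python
import random, math

# Secretary rule: observe the first r - 1 values with r = ceil(m/e), then stop
# at the first value exceeding everything seen so far. The probability of
# selecting the overall maximum should be about 1/e.
random.seed(1)
m, trials = 200, 20_000
r = math.ceil(m / math.e)                     # waiting time is r - 1

wins = 0
for _ in range(trials):
    xs = [random.random() for _ in range(m)]
    best_seen = max(xs[: r - 1])
    choice = None
    for x in xs[r - 1 :]:
        if x > best_seen:                     # first running maximum
            choice = x
            break
    if choice == max(xs):
        wins += 1

assert abs(wins / trials - 1.0 / math.e) < 0.025
```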

When considering RH, instead of working with $\mathbf{X}$ as usual, we start from $\mathbf{X}_\pi$, defining accordingly $Y_i(H_m) \triangleq X_{\pi_i} \mathbb{1}_{\{H_m \geq i\}}$. This is equivalent, since $\mathbf{X}_\pi \sim \mathbf{X}$ implies that, for every stopping rule $\tau \in \mathcal{T}$, the expected return $\mathbb{E} Y_\tau(H_m)$ is unchanged by the permutation. Yet it is formally advantageous in the following argument.

Idea of the proof of Theorem 6.

Consider the stopping rule $\tau_m \triangleq \zeta_{r_m} \wedge (H_m + 1) \in \mathcal{T}$. By (4) in Theorem 14 first, and the law of total expectation with respect to $\mathbf{X}$ second, we have that

$$\mathbb{E} Y_{\tau_m}(H_m) = \mathbb{E}\!\left[S_m(\zeta_{r_m})\, X_{\pi_{\zeta_{r_m}}}\right] = \mathbb{E}\!\left\{\mathbb{E}\!\left[S_m(\zeta_{r_m})\, X_{\pi_{\zeta_{r_m}}} \,\middle|\, \mathbf{X}\right]\right\}.$$

Since at any given step (greater than rm1) stopping at the overall maximum implies stopping at the relative maximum,

$$\mathbb{E}\!\left[S_m(i)\, X_{\pi_i} \mathbb{1}_{\{\zeta_{r_m} = i\}} \,\middle|\, \mathbf{X}\right] \geq \mathbb{E}\!\left[S_m(i)\, X_{\pi_i} \mathbb{1}_{W_i} \,\middle|\, \mathbf{X}\right] = S_m(i)\, M_m\, \mathbb{P}(W_i \mid \mathbf{X}).$$

Therefore,

$$\mathbb{E}\!\left[S_m(\zeta_{r_m})\, X_{\pi_{\zeta_{r_m}}} \,\middle|\, \mathbf{X}\right] \geq M_m \sum_{i=r_m}^m S_m(i)\, \mathbb{P}(W_i \mid \mathbf{X}).$$

By (7) and comparison of the summation with the corresponding integral, as in Lemma 21, we estimate the survival function of $H_m$ as $m \to \infty$: since $r_m \sim m/e$, for all $i > r_m$ we have that

$$S_m(i) > 1 - \left[\frac{i-1}{m+1}\right]^{\frac{2\varepsilon}{1+2\varepsilon}}.$$

Define the sequence of functions

$$f_m(\varepsilon) \triangleq (m+1)^{-\frac{2\varepsilon}{1+2\varepsilon}}\, \frac{r_m - 1}{m} \int_{r_m - 2}^{m-1} x^{-\frac{1}{1+2\varepsilon}}\,dx, \qquad g(\varepsilon) \triangleq \frac{1}{e}\left\{1 - \left[1 + \frac{1}{2\varepsilon}\right]\left(1 - e^{-\frac{2\varepsilon}{1+2\varepsilon}}\right)\right\}.$$

For every fixed ε>0,

$$\sum_{i=r_m}^m S_m(i)\, \mathbb{P}(W_i \mid \mathbf{X}) \geq \sum_{i=r_m}^m \mathbb{P}(W_i \mid \mathbf{X}) - \sum_{i=r_m}^m \left(\frac{i-1}{m+1}\right)^{\frac{2\varepsilon}{1+2\varepsilon}} \mathbb{P}(W_i \mid \mathbf{X}) \geq \frac{1}{e} - f_m(\varepsilon) \to g(\varepsilon)$$

as $m \to \infty$, where we used (15), (16), (14) and the usual integral estimate in the second inequality, whereas the asymptotics come from $r_m \sim m/e$. It is crucial that $g(\varepsilon)$ is positive and strictly increasing on $(0, \infty)$: even though $g(\varepsilon) = \mathcal{O}(\varepsilon)$ as $\varepsilon \to 0$, no matter how small, it still provides a constant approximation for every fixed $\varepsilon$.
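The qualitative behaviour of $g(\varepsilon)$ claimed above is immediate to confirm numerically (our sketch):

```python
import math

# The limiting constant g(eps) from the display above: positive, increasing
# in eps, of order eps near zero, and bounded above by 1/e.
def g(eps):
    x = 2.0 * eps / (1.0 + 2.0 * eps)
    return (1.0 - (1.0 + 1.0 / (2.0 * eps)) * (1.0 - math.exp(-x))) / math.e

assert 0.0 < g(0.1) < g(0.5) < g(1.0) < 1.0 / math.e
```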

Putting all the above together yields

$$\mathbb{E}\!\left[S_m(\zeta_{r_m})\, X_{\pi_{\zeta_{r_m}}} \,\middle|\, \mathbf{X}\right] \geq (g(\varepsilon) - \delta)\, M_m,$$

where $\delta \to 0^+$ as $m \to \infty$. This implies the final estimate

$$\mathbb{E} X_{\tau_m} \equiv \mathbb{E} Y_{\tau_m}(H_m) \geq (g(\varepsilon) - \delta)\, \mathbb{E} M_m.$$

Fix $\varepsilon > 0$ small and consider the horizons $\{H_m\}_{m \geq M}$ as defined in (6), where $M = M(\varepsilon)$ is large enough. By Proposition 23, the RH problem under this family does not admit single-threshold constant approximations for the fixed $\varepsilon$, assuming it is small enough (the hard instance for the values being the Pareto distribution defined in (5)). On the other hand, by our final estimate, $\tau_m$ provides an approximate $g(\varepsilon)$-approximation: for every (small) $\varepsilon > 0$ fixed, setting $M$ large enough and using $H_m \leq m$ yields that, for every instance $X$,

$$\mathbb{E} X_{\tau_m} \geq (g(\varepsilon) - \delta)\, \mathbb{E} M_{H_m},$$

where $\delta = \delta(M) \to 0$ as $M \to \infty$.

5 A 𝟐-approximation for concentrated 𝓛𝟐-horizons

Finally, we prove Theorem 8, extending the 2-approximation for the 𝒢 class to sufficiently concentrated horizons. We will make use of the following stochastic order.

Definition 24 (Laplace transform order).

Given nonnegative random variables $H$ and $G$, $G$ is dominated by $H$ in the Laplace transform (Lt) order, denoted $G \leq_{\mathrm{Lt}} H$, if for all $s > 0$, $\mathbb{E} e^{-sG} \geq \mathbb{E} e^{-sH}$.

 Remark 25.

Consider a horizon $H$, having finite expectation $\mu$ and variance $\sigma^2$, and random variables $B \sim \operatorname{Ber}\!\left(\frac{\mu^2}{\mu^2 + \sigma^2}\right)$ and $G \triangleq \frac{\mu^2 + \sigma^2}{\mu} B$. Then $G \leq_{\mathrm{Lt}} H$.
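Remark 25 can be spot-checked for a simple (hypothetical) horizon; note that $G$ matches the first two moments of $H$ by construction:

```python
import math

# H uniform on {1, 2, 3}: mu = 2, sigma^2 = 2/3, so G puts mass 1/7 at 0 and
# mass 6/7 at (mu^2 + sigma^2)/mu = 7/3. The Laplace transform order
# G <=_Lt H requires E exp(-s G) >= E exp(-s H) for every s > 0.
mu, var = 2.0, 2.0 / 3.0
atom = (mu * mu + var) / mu
p1 = mu * mu / (mu * mu + var)

diffs = []
for s in (0.01, 0.1, 0.5, 1.0, 2.0, 10.0):
    lap_G = (1.0 - p1) + p1 * math.exp(-s * atom)
    lap_H = sum(math.exp(-s * h) for h in (1, 2, 3)) / 3.0
    diffs.append(lap_G - lap_H)

assert min(diffs) >= -1e-12                   # domination on the sampled grid
```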

Idea of the proof of Theorem 8.

Recall that the single-threshold 2-approximation of Theorem 5 attains competitive ratio at least

$$c_{p,\bar{q}} = 1 - \mathbb{E}\!\left[(1 - 1/\mu)^H\right] = 1 - \mathbb{E}\!\left(e^{\bar{s} H}\right)$$

with $\bar{s} \triangleq \log(1 - 1/\mu)$. Since $G \leq_{\mathrm{Lt}} H$, where $G$ is the two-point distribution of Remark 25, it follows that

$$c_{p,\bar{q}} \geq 1 - \mathbb{E}\!\left(e^{\bar{s} G}\right) = \frac{\mu^2}{\mu^2 + \sigma^2}\left[1 - (1 - 1/\mu)^{\frac{\mu^2 + \sigma^2}{\mu}}\right]. \tag{17}$$

As a result, an inequality that yields a single-threshold $(2 - 1/\mu)$-approximation is

$$\frac{\mu^2}{\mu^2 + \sigma^2}\left[1 - (1 - 1/\mu)^{\frac{\mu^2 + \sigma^2}{\mu}}\right] \geq \frac{1}{2 - 1/\mu}, \tag{18}$$

and straightforward estimates show that this is ensured by $e^{-x} \leq 1 - x(2 - 1/\mu)^{-1}$, having defined $x \triangleq 1 + \sigma^2/\mu^2$. Furthermore, defining the constraints $y \triangleq x - (2 - 1/\mu) > -1$ and $-2 < \bar{y} \triangleq -(2 - 1/\mu) < -1$, we can rewrite the condition above as $y e^y \leq \bar{y} e^{\bar{y}}$. Under the constraints given, this is more explicitly rewritten as $y \leq W_0(\bar{y} e^{\bar{y}})$, where $W_0$ denotes the principal branch of the Lambert function. The definition of $y$ and $x$ yields (1). It is possible to substitute suitable values of $C > 1$ for $2 - 1/\mu$ in (1); the only issue would arise near the branching point of the Lambert function. With this caveat in mind, we derive Corollary 9 by substituting admissible values of $C > 1$ for $2 - 1/\mu$ in (18). Since we obtain

$$\mathrm{CV} \leq \sqrt{1 - 1/\mu + W_0(-C e^{-C})} = \sqrt{C - 1 + W_0(-C e^{-C})},$$

this ensures a $C$-approximation. The degenerate cases establish insightful connections with the literature. The deterministic case ($\mathrm{CV} = 0$) can be recovered directly from (17), which can be reformulated as the guarantee that the competitive ratio ($1/C$) is always at least the left-hand side of

$$\frac{1 - (1 - 1/\mu)^{\mu(1 + \mathrm{CV}^2)}}{1 + \mathrm{CV}^2} \geq \frac{1 - e^{-(1 + \mathrm{CV}^2)}}{1 + \mathrm{CV}^2}.$$

Therefore, if $\mathrm{CV} = 0$, the optimal single-threshold algorithm exploited ensures a competitive ratio of $1 - 1/e$. This value is the global maximum of the function on the right-hand side, and coincides with the optimal competitive ratio of single-threshold algorithms for (deterministic) IID PI [13, Theorem 21]. The left-hand side achieves its global maximum $1 - (1 - 1/\mu)^\mu$ at $\mathrm{CV} = 0$, reflecting that a deterministic horizon is no harder than a random one. Hence any admissible $C$ is constrained to be greater than $[1 - (1 - 1/\mu)^\mu]^{-1}$. This proves Corollary 9 and justifies the admissibility condition.
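The Lambert-function reformulation can be illustrated numerically (our sketch; $\mu = 10$ and the sample points are arbitrary, and $W_0$ is computed by a simple Newton iteration rather than a library routine):

```python
import math

# Newton iteration for the principal branch W0 (valid for z >= -1/e), used to
# express the sufficient condition e^{-x} <= 1 - x/c as x <= c + W0(-c e^{-c})
# with c = 2 - 1/mu. We check this equivalence pointwise for mu = 10.
def lambert_w0(z):
    w = 0.0 if z >= 0.0 else -0.5             # start inside the W0 branch
    for _ in range(100):
        ew = math.exp(w)
        w -= (w * ew - z) / (ew * (1.0 + w))
    return w

mu = 10.0
c = 2.0 - 1.0 / mu
x_max = c + lambert_w0(-c * math.exp(-c))     # threshold on x = 1 + CV^2

checks = [(math.exp(-x) <= 1.0 - x / c, x <= x_max)
          for x in (0.5, 1.0, 1.2, 1.4, 1.6, 1.8)]
assert all(lhs == rhs for lhs, rhs in checks)
assert 1.4 < x_max < 1.5
```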

References

  • [1] A. R. Abdel-Hamid, J. A. Bather, and G. B. Trustrum. The secretary problem with an unknown number of candidates. J. Appl. Probab., 19(3):619–630, 1982.
  • [2] R. Alijani, S. Banerjee, S. Gollapudi, K. Munagala, and K. Wang. Predict and match: prophet inequalities with uncertain supply. Proc. ACM Meas. Anal. Comput. Syst., 4(1):Article 4, 2020.
  • [3] A. Aouad and W. Ma. A nonparametric framework for online stochastic matching with correlated arrivals. In EC, page 114. ACM, 2023. doi:10.1145/3580507.3597773.
  • [4] T. Bojdecki. On optimal stopping of a sequence of independent random variables - probability maximizing approach. Stoch. Process. Their Appl., 6:153–163, 1978.
  • [5] C. Bracquemond, D. Roy, and M. Xie. On some discrete notions of aging. In System and bayesian reliability. Essays in honor of Prof. E. Barlow on his 70th birthday, pages 185–197. World Scientific, 2001.
  • [6] A. Bubna and A. Chiplunkar. Prophet inequality: order selection beats random order. In EC, pages 302–336. ACM, 2023. doi:10.1145/3580507.3597687.
  • [7] Z. Chen, Z. Huang, D. Li, and Z. G. Tang. Prophet secretary and matching: the significance of the largest item. In SODA, pages 1371–1401, 2025.
  • [8] Y. S. Chow and H. Robbins. On optimal stopping rules. Z. Wahrscheinlichkeitstheorie, 2:33–49, 1963.
  • [9] Y. S. Chow, H. Robbins, and D. Siegmund. Great expectations: the theory of optimal stopping. Houghton Mifflin, 1971.
  • [10] J. Correa, P. Foncea, R. Hoeksma, R. Oosterwijk, and T. Vredeveld. Posted price mechanisms for a random stream of customers. In EC, pages 169–189. ACM, 2017.
  • [11] J. Correa, P. Foncea, R. Hoeksma, R. Oosterwijk, and T. Vredeveld. Recent developments in prophet inequalities. SIGecom Exch., 17(1):61–70, 2018. doi:10.1145/3331033.3331039.
  • [12] J. Correa, P. Foncea, D. Pizarro, and V. Verdugo. From pricing to prophets, and back! Oper. Res. Lett., 47:25–29, 2019. doi:10.1016/J.ORL.2018.11.010.
  • [13] S. Ehsani, M. Hajiaghayi, T. Kesselheim, and S. Singla. Prophet secretary for combinatorial auctions and matroids. In SODA, pages 700–714. SIAM, 2018. doi:10.1137/1.9781611975031.46.
  • [14] P. R. Freeman. The secretary problem and its extensions: a review. Int. Stat. Rev., 51(2):189–206, 1983.
  • [15] G. Giambartolomei, F. Mallmann-Trenn, and R. Saona. Prophet inequalities: Separating Random Order from Order Selection. arXiv:2304.04024 [cs.DS].
  • [16] J. P. Gilbert and F. Mosteller. Recognizing the maximum of a sequence. J. Am. Stat. Assoc., 61(313):35–73, 1966.
  • [17] M. T. Hajiaghayi, R. D. Kleinberg, and T. Sandholm. Automated online mechanism design and prophet inequalities. In AAAI, pages 58–65, 2007.
  • [18] S. Har-Peled, E. Harb, and V. Livanos. Oracle-augmented prophet inequalities. In ICALP, pages 81:1–81:19, 2024.
  • [19] T. P. Hill and R. P. Kertz. A survey of prophet inequalities in optimal stopping theory. Contemp. Math., 125:191–207, 1992.
  • [20] M. Hoefer and B. Kodric. Combinatorial secretary problems with ordinal information. In ICALP, pages 133:1–133:14, 2017.
  • [21] R. P. Kertz. Stop rule and supremum expectations of i.i.d. random variables: a complete comparison by conjugate duality. J. Multivar. Anal., 19(1):88–112, 1986.
  • [22] B. Klefsjö. The hnbue and hnwue classes of life distributions. Naval Res. Logist. Quart., 29:331–344, 1982.
  • [23] B. Klefsjö. Testing exponentiality against hnbue. Scand. J. Stat., 10(2):65–75, 1983.
  • [24] B. Klefsjö. A useful ageing property based on the laplace transform. J. Appl. Prob., 20:615–626, 1983.
  • [25] U. Krengel and L. Sucheston. Semiamarts and finite values. Bull. Amer. Math. Soc., 83(4):745–747, 1977.
  • [26] U. Krengel and L. Sucheston. On semiamarts, amarts, and processes with finite value. Adv. Probab. Related Topics, 4:197–266, 1978.
  • [27] N. A. Langberg, R. V. Léon, J. Lynch, and F. Proschan. Extreme points of the class of discrete decreasing failure life distributions. Math. Oper. Res., 5(1):35–42, 1980. doi:10.1287/MOOR.5.1.35.
  • [28] D. V. Lindley. Dynamic programming and decision theory. J. R. Stat. Soc. Ser. C Appl. Stat., 10(1):39–51, 1961.
  • [29] B. Lucier. An economic view of prophet inequalities. SIGecom Exch., 16(1):24–47, 2017. doi:10.1145/3144722.3144725.
  • [30] M. Mahdian and A. Saberi. Multi-unit auctions with unknown supply. In CEC, pages 243–249. ACM, 2006. doi:10.1145/1134707.1134734.
  • [31] S. Oveis Gharan and J. Vondrák. On variants of the matroid secretary problem. Algorithmica, 67:472–497, 2013.
  • [32] Z. Poroziński. The full-information best choice problem with a random number of observations. Stoch. Process. Their Appl., 24:293–307, 1987.
  • [33] E. L. Presman and I. M. Sonin. The best choice problem for a random number of objects. Theory Probab. its Appl., 17(4):657–668, 1972.
  • [34] T. Rolski. Mean residual life. Bull. Int. Statist. Inst., 46:220–266, 1975.
  • [35] E. Samuel-Cahn. Comparisons of threshold stop rule and maximum for independent nonnegative random variables. Ann. Probab., 12(4):1213–1216, 1984.
  • [36] E. Samuel-Cahn. The best-choice secretary problem with random freeze on jobs. Stoch. Process. Their Appl., 55(2):315–327, 1995.
  • [37] E. Samuel-Cahn. Optimal stopping with random horizon with application to the full-information best-choice problem with random freeze. J. Am. Stat. Assoc., 91(433):357–364, 1996.
  • [38] T. Sandholm. Automated mechanism design: A new application area for search algorithms. In CP, pages 19–36, 2003.
  • [39] M. Shaked and J. G. Shanthikumar. Stochastic orders. Springer, 2007.