Designing Exploration Contracts
Abstract
We study a natural application of contract design in the context of sequential exploration problems. In our principal–agent setting, a search task is delegated to an agent. The agent performs a sequential exploration of boxes, suffers the exploration cost for each inspected box, and selects the content (called the prize) of one inspected box as outcome. Agent and principal obtain an individual value based on the selected prize. To influence the search, the principal a-priori designs a contract with a non-negative payment to the agent for each potential prize. The goal of the principal is to maximize her expected reward, i.e., value minus payment. Interestingly, this natural contract scenario shares close relations with the Pandora’s Box problem.
We show how to compute optimal contracts for the principal in several scenarios. A popular and important subclass is that of linear contracts, and we show how to compute optimal linear contracts in polynomial time. For general contracts, we obtain optimal contracts under the standard assumption that the agent suffers cost but obtains value only from the transfers by the principal. More generally, for general contracts with non-zero agent values for outcomes we show how to compute an optimal contract in two cases: (1) when each box has only one prize with non-zero value for principal and agent, (2) for i.i.d. boxes with a single prize with positive value for the principal.
Keywords and phrases:
Exploration, Contract Design, Pandora’s Box ProblemFunding:
Martin Hoefer: Supported by DFG Research Unit ADYN (project number 411362735) and DFG grant Ho 3831/9-1 (project number 514505843).Copyright and License:
![[Uncaptioned image]](x1.png)
2012 ACM Subject Classification:
Theory of computation Algorithmic game theory and mechanism designEditors:
Olaf Beyersdorff, Michał Pilipczuk, Elaine Pimentel, and Nguyễn Kim ThắngSeries and Publisher:

1 Introduction
In many real-world situations, e.g., the search for a job candidate or house, a decision maker is faced with the choice between different alternatives of a-priori unknown value, which can be explored at some cost. Due to missing qualifications or time constraints, in many markets such exploration tasks are not executed directly by the decision maker. Rather, a decision maker (called principal in the following) can delegate the exploration to an agent , whose incentives are potentially misaligned with that of .
Indeed, suppose the principal intends to buy a house. They might know an inspection cost and have some prior knowledge about the value of each house. However, they might not be qualified to determine the exact market value by inspecting it. They decide to delegate the search to a real-estate agent. The agent can have an individual valuation for each house (e.g., a provision that is paid internally for selling the house). This individual value might also depend on the condition of the house, which is revealed only after inspection. Similar situations arise, e.g., when the principal wants to invest in a financial product and delegates this search to a financial agent.
In these settings, as a consequence of delegating the search, can merely observe the outcome of the search, not the other actions taken by . This leads to a possibly undesirable outcome for , even when they have the power to commit to accepting or rejecting certain outcomes [5].
A natural approach to align the incentives of an agent with the goals of the principal in such hidden-action settings are contracts. A contract allows for utility transfers to be made from the principal to the agent as a function of the outcome of the delegated task. While contract theory is a well-established [9] and celebrated [35] area in economics, the interest in algorithmic contract design has surged only relatively recently, see e.g., [20, 21, 17, 14, 27].
In this work, we initiate the study of algorithmic contract design for exploration problems. Specifically, we consider the following problem (for a fully formal definition, see Section 2): There are boxes, the -th of which contains a prize, drawn independently from a known probability distribution, and can be opened by paying a known exploration cost . The -th prize from box has some utilities and for and , respectively. Based on that information, commits to a contract, specifying a transfer for each prize, where we assume non-negative transfers (limited liability). Given such a contract, picks an adaptive search strategy that allows to optimize the trade-off between value of the selected prize (for ) and total exploration cost. uses the strategy to sequentially open boxes by paying the exploration cost. Depending on the observed prize, it determines the next box to open or to stop by selecting a prize from an open box. Assuming selects prize from box , the resulting final utility of is . The goal is to efficiently compute a contract that maximizes the expectation of the latter quantity. Here we assume that ties in the optimization of are broken in favor of , which is a standard assumption that makes the maximization problem well-defined.
For the problem of computing optimal contracts, there is a solution by linear programming that runs in time polynomial in the number of outcomes and the number of actions of the agent (cf., e.g., [20]). Note that this result does not yield a polynomial-time algorithm for our problem since the action space is extremely succinctly represented – its size is lower-bounded by the number of opening orders, .
Our Contribution.
We show that, in a variety of natural cases, optimal contracts can be computed in polynomial time (such as, e.g., optimal linear contracts, or optimal contracts when the agent has no intrinsic value). While our results leave open the question whether computing optimal contracts efficiently is always possible in general, our insights suggest that doing so is a highly non-trivial task and represents one (of several) very interesting avenue for future research.
Notably, faces an exploration problem known in the literature as the Pandora’s Box Problem. It was famously proposed and analyzed by Weitzman [39], who characterized optimal search strategies by a simple greedy rule. Starting from [33], this problem has recently received a lot of attention at the intersection of economics and computer science (see, e.g., [38, 26, 8, 10, 6, 25] and a recent survey [7]). We reformulate Weitzman’s characterization for ’s task: The fair cap of box is a value such that the expected amount of exceeding the fair cap is precisely . The boxes are considered in non-increasing order of fair caps. If the best (initially 0) observed so far is larger than the next fair cap, the next box is opened; if it is smaller, a corresponding prize is accepted. If neither is the case, is indifferent.
The first observation we make is that Weitzman’s policy does not solve ’s problem entirely. The reason is that breaks ties in favor of – recall that this is the standard assumption that makes the maximization problem well-defined. Note that the number of boxes that share some given fair cap may be linear in , leading to a number of possible ways of choosing the order that is exponential in . Indeed, the order of such boxes may matter to since different orders may lead to stopping with prizes that lead to vastly different utilities for . A complicating factor is that ties whether to select a prize with agent utility equal to have to be broken simultaneously. Interestingly, we show that the problem of breaking ties can be reduced – in polynomial time but in a non-obvious way – to solving different instances of the classic Pandora’s box problem.
With a solution of ’s problem at hand, the main trade-off inherent to an optimal contract is as follows: Transfers manipulate the fair caps, the resulting exploration order, and the acceptance decisions of in such a way that leads to a more favorable outcome for – but transfers are costly for which is unfavorable.
We start by considering linear contracts [30, 20], a class of simple and very intuitive contracts that has received a lot of attention due to its widespread use in applications. The agent receives a constant fraction of the principal’s revenue as commission. Formally, a linear contract is represented by a number , which defines transfers .
To compute linear exploration contracts, we identify a polynomial-time computable set of critical values of at which the set of -optimal policies (i.e., the set of policies that satisfy the above description) changes. By reasoning about the behavior of ’s utility between any two such values, we can argue that the optimal contract needs to be a critical value. This type of argument is reminiscent of arguments from the recent literature (e.g., [17, 22, 16]), but here it requires special care. Due to the tie breaking in favor of , which – as discussed above – is a key property here, we need to keep track of the set of -optimal policies.
For standard contract design, the authors in [20] describe a class of instances in which the restriction to linear contracts leads to a multiplicative loss of in ’s achievable utility, where is the number of actions. In this class of instances, action with cost deterministically leads to an individual outcome with value . We can create one box for each action , with opening cost and a single outcome with agent utility and principal utility . This leads to a straightforward one-to-one correspondence between contracts in the two instances that preserves utilities and linearity. Thus, the multiplicative loss of transfers to our model, which motivates our search for optimal general exploration contracts.
We first consider general contracts under the standard assumption that , i.e., that the agent has no intrinsic motivation to explore any boxes and is only motivated by payments from . In contract design, this assumption is usually without loss of generality, since arbitrary intrinsic agent values can be accounted for with action costs to obtain an equivalent instance with agent values . However, for the exploration contracts we consider, such an adjustment changes the costs of exploration strategies (rather than individual boxes). This substantially changes the structure of the problem. Indeed, as we see below, optimal policies for the problem with non-zero agent value have substantially different properties than the ones for zero value.
Clearly, the utility that can extract by solving the exploration problem themselves (i.e., ignoring the agent and paying the cost themselves) is a straightforward baseline; no policy can extract more value for . Solving the problem from ’s perspective yields fair caps . If all , then by transferring any value exceeding to , the -optimal policy is adopted by , without a loss in utility for . This implies, in particular, that delegating the exploration to using an optimal contract yields the same utility for as if she performed the exploration on her own. Formally, the idea is somewhat inspired by [33] for the original Pandora’s Box problem but different.
An orthogonal special case that has been considered in the literature (see, e.g., [5]) is that of binary boxes. That is, each box contains one of only two prizes, and one of them has value for both and while the other one is arbitrary. Without any payments, the fair cap (from ’s perspective again) of each box is its basic fair cap. The principal now faces the question of whether to lift (by transfers) the basic fair caps of favorable boxes above (or up to) the fair caps of less favorable boxes. While the latter is true in general, the special structure of binary boxes allows us to compute an optimal exploration contract by considering boxes in non-increasing order of their basic fair caps and moving their fair cap greedily. Note that this algorithm alone does not suffice to tackle more general cases. Indeed, when boxes are not binary, it may be profitable for the principal to choose different payments for two boxes with the same distribution and the same fair caps – a condition we encounter in the next and final case.
The final case we consider has a more general value structure, but we compromise on the boxes having different distributions. Specifically, we consider instances with only a single positive prize for , and all distributions are identical. Still, transfers can be made depending on the box from which the prize originates. Again, the complexity is reduced by the fact that only a single payment per box has to be decided. Through an intricate sequence of exchange arguments and the help of continuization, we establish that there is an optimal contract with a simple structure: (i) In a first phase, the prize with positive principal value gets immediately accepted, and all transfers are identical; (ii) all transfers for boxes in the second phase are also identical. These conditions allow to find the optimal contract by enumerating a polynomial-sized set of contracts.
In this final case, we also give a family of instances in which both the above phases exist and have substantial lengths in any optimal solution. Hence, perhaps surprisingly, such a relatively simple case already has optimal contracts with a fairly complicated structure.
Overview.
After a review of related work in the subsequent section, we formally introduce our problem in Section 2 along with preliminary observations. Section 3 treats linear contracts. Section 4 deals with general contracts in several special cases (no agent value in Section 4.1 and binary boxes in Section 4.2). General contracts for i.i.d. boxes with a single positive prize are discussed in Section 4.3. Missing proofs are provided in the full version of the paper.
1.1 Related Work
Contract design has recently gained a lot of attention from a computational point of view. Duetting et al. [20] advocated the study of linear contracts as an alternative to the more complicated (and sometimes unintuitive) general contracts. They show robustness results and give parameterized approximation guarantees w.r.t. optimal (general) contracts. For settings in which the outcome space is succinctly represented, hardness results are shown in [21]. In a similar but different approach, combinatorial contracts [17] let the agent choose a subset of actions which stochastically determines the (binary) outcome. Important recent work in the area of combinatorial contracts includes [22], [16]. Contracts were also studied with multiple agents [18, 16], ambiguity [19], and private types [2] with connections to classic mechanism design [1]. More generally, contracts have been of interest in specific application domains, such as classification in machine learning [36].
Most closely related is prior work by Postl [34], who analyzes our contract-design problem for two boxes without intrinsic agent value. The paper provides an explicit characterization of the chosen box in the optimal contract, but neither considers algorithmic aspects nor the reduction to the standard Pandora’s Box problem we discover in Sec. 4.1. Concurrent to our work, Ezra et al. [23] study the special case without agent value. In this special case, they also give a polynomial-time algorithm for linear contracts (without discussing the issues of tie-breaking in favor of ). For general contracts, the paper studies a technical approach when there is a constant number of prizes. Finally, the authors show NP-hardness for correlated distributions in the boxes.
Delegation is a related approach in the principal–agent framework and also received a lot of attention in the economics literature starting from seminal work of Holstrom [31]. Computational aspects of delegation are receiving interest recently, initiated by Kleinberg and Kleinberg [32]. In these models, the principal delegates a (search) task to an agent , again with potentially different interests, who has to inspect alternatives and propose an observed prize to . Instead of committing to a contract with payments, limits the set of prizes she is willing to accept upon proposal by . A prominent objective is to find good acceptance policy for and bound their performance by the delegation gap, which measures the multiplicative loss in utility for in comparison to the case when performs the (undelegated) search problem themselves. On a technical level, there are close connections to prophet inequalities [32] and online contention resolution schemes, which were established in different variants, such as multiple agents [37], stochastic probing [4, 5], as an online problem [11], or with limited information about agent utilities [13]. Perhaps most related is a version of the problem studied in [5]. Here, a principal delegates an instance of the Pandora’s Box problem by committing to a set of acceptable outcomes. An agent has to perform the costly inspection of alternatives. This standard delegation setting has some obvious limitations, e.g., the agent would never perform an inspection if his expected intrinsic value does not cover the inspection cost. A model variant with transfers is also studied in [5] in which the principal can reimburse the agent for exploration costs. Crucially, this is impossible in hidden-action principal-agent settings that we consider, where simply cannot directly reimburse for his expenses, since she cannot observe which inspections have been performed by .
2 Preliminaries
There are two parties, a principal and an agent , and boxes. Each box has an opening cost and contains one of possible prizes. Prize in box yields a value-pair , where is the value for and the value for . For each box , the prize in the box is distributed independently according to a known prior. We denote by the probability that prize is in box .
would like to motivate to open and inspect boxes in order to find and select a good prize (for ). cannot observe the actions taken by (i.e., which boxes were opened in which order), only the final selected box with the prize inside is revealed. Instead, can commit to an exploration contract (or simply, contract) : For each selected prize , she specifies an amount 111We make the natural assumption that , i.e., every transfer in the contract is bounded by the prize value of . For all scenarios we study here (linear contracts, no agent value, binary boxes, i.i.d. with single positive prize for ) it is straightforward to see that this assumption is w.l.o.g. – there is an optimal contract that satisfies for all . It is a very interesting open problem to prove that this holds beyond the cases we study here. that she pays to . Then the final utility of is .
Our model is an instance of the standard principal-agent model, where the hidden action of the agent is the policy to inspect and select from the boxes. Specifically, the task of is to open and inspect boxes to find and select at most one prize from an opened box. Formally, a policy specifies in every situation, based on the previously seen prizes, which box to open next, if any, or otherwise which prize to select from an opened box. Given a contract , she is facing the classic Pandora’s Box problem [39]: She obtains a utility of for the selected prize ( if no prize is selected) minus the opening cost for all opened boxes. Weitzman [39] shows that every optimal policy of that maximizes his expected total utility (called -optimal policy in the following) is in the form of the following index policy: For each box compute an index or fair cap such that
The boxes are considered in non-increasing order of fair caps. Suppose box is the next unopened box in this order, and the best prize has found thus far is . If no box has been opened, then , and otherwise for some where is a box that has already been opened, and prize has been found in it. There are three cases: (1) : opens box ; (2) : is indifferent between opening box and stopping opening boxes; and (3) : stops opening boxes. Note that any box with is never opened. For each we define the capped value . A policy as described above always selects the prize in box , i.e., the prize with the largest capped agent value.
Even under these conditions, there may still be several -optimal policies. Specifically,
-
(i)
the choice of is not unique when ,
-
(ii)
the non-decreasing order of fair caps may not be unique,
-
(iii)
in the case above, the choice of whether to stop or continue is not unique,
-
(iv)
among the observed prizes, the prize maximizing may not be unique.
We assume that breaks ties in favor of 222This is a standard assumption in bi-level problems. On a technical level, it ensures that the optimization problem of finding an optimal contract for is well-defined.. Among the set of -optimal policies, she selects a -optimal one, i.e., a policy that maximizes the expected utility for . (Note that such a policy may be vastly different from an -optimal policy among all policies.) We call a policy that is selected in this way an optimal policy (under contract ).
In case (iv), it is clear how to resolve the ambiguity in favor of : Simply select maximizing . The other cases, however, can give rise to a very large number of potential policies. Interestingly, in the full version of the paper we show that it is always possible to implement the tie-breaking in polynomial time and find an optimal policy under contract . We prove this result through a reduction to the original Pandora’s Box problem.
Theorem 1.
Given a contract , an optimal policy can be computed in polynomial time.
We study contracts for that steer the index policy executed by towards good outcomes for . We explore both linear and general contracts. A linear contract is given by a single number , and . Linear contracts are popular because of their simplicity, but they often suffer from substantial limitations of the achievable revenue. As such, we also explore general contracts, in which we only require every payment to be non-negative for every and . A (linear) contract that, among all possible (linear) contracts, achieves maximum expected utility for when executes an optimal policy under , is called optimal (linear) contract.
3 Optimal Linear Contracts
In this section we consider linear contracts. Recall that these are contracts characterized by a single such that for all . Our result is the following.
Theorem 2.
An optimal linear exploration contract can be computed in polynomial time.
Let such that the set of -optimal policies (i.e., index policies) is the same for all . Consider the expected utility of according to an optimal policy (i.e., an -optimal policy among this set of -optimal policies) as a function of within the interval . The following straightforward observation implies that this function is a linear and non-increasing function of .
Observation 3.
For every policy, there exists a constant such that the expected utility of as a function of is .
Hence, within , an optimal contract is . To find a polynomial-time algorithm for computing the global optimum , it therefore suffices to show that, in polynomial time, one can find a partition of the interval into (polynomially many) subintervals such that, in each subinterval, the set of -optimal policies is constant. (Strictly speaking, we will encounter a technicality at the endpoints of the subintervals.) In the following, we will define a (polynomial-time computable) set of critical values in such that, in an interval (strictly) between any two consecutive critical values, the set of -optimal policies is constant.
Towards this definition, for any given and every , let denote the value of prize in box for . Furthermore, let denote the (unique) fair cap of box with as a function of , i.e., it holds that
for all . Note that is a continuous function of in .
We call a critical value if or one of the following properties holds:
-
(a)
There exist boxes with such that the order of their fair caps changes at . Formally, , and there exists such that for all or for all .
-
(b)
There exist with and such that the order between the value of prize for and the fair cap of box changes. Formally, , and there exists such that for all or for all .
-
(c)
There exists with such that .
Observe that the critical values indeed have the property that, for any two such values , the set of -optimal policies is constant within . Note that any -optimal policy is also -optimal under both and , but there are potentially additional -optimal policies under or under . Together with Observation 3, this implies that the optimal contract is a critical value.
The following auxiliary lemma supports the analysis of the number of critical values.
Lemma 4.
For each with , the fair cap is a monotone convex piece-wise linear function with at most linear segments.
Proof.
For any , it holds by definition. An equivalent formulation would be
where . Hence, is the weighted average (minus offset ) of all that are in , i.e., above . For a fixed set , this means that the fair cap is linear in , as every is linear in . Therefore, the slope of only changes at intersections with some , where the set changes. We argue now that the fair cap can only increase at those intersections (cf. Fig. 1).
For every and with , but for some , there are two cases:
-
1.
. Then , but for some . Hence the slope of was greater than the slope of before the intersection at . The new weighted average of affine functions that are above the fair cap can only increase when is not contributing to that average anymore.
-
2.
. Then , but . Hence the slope of was less than the slope of before the intersection at . The new weighted average of affine functions that are above the fair cap can only increase when is starting to contribute to that average.
Thus the slope of is never decreasing when increases, which makes convex. As is an affine function, every can intersect at most twice with . Therefore, there are at most linear segments of on the interval .
According to Lemma 4, there are at most critical values induced by case (c). By the same lemma, there are at most critical values for every box with induced by case (b), as is convex and there are affine functions , each of which intersects at most twice. Thus, critical values in total are due to cases (b) and (c). Similarly, each linear segment of can intersect with another convex function at most twice. As there are at linear segments for and possible functions , there are at most such intersections for . Thus, case (a) induces at most critical values. Overall, the number of critical values is polynomial in and , and the set of them can be computed in polynomial time.
By Theorem 1, we can compute an optimal contract for any given critical value in polynomial time. The expected utility of under this contract can also be computed in polynomial time. Thus, we can enumerate all critical values and take the best contract with respect to the expected utility for , implying Theorem 2.
4 Optimal General Contracts
4.1 No Intrinsic Agent Value
In the consideration of contract problems, it is often assumed that the agent has no intrinsic value (only cost) and receives benefit only via transfers from the principal. In this case, we have for all . We show that under this assumption, we can compute an optimal contract in polynomial time.
Theorem 5.
Suppose that holds for all . Then an optimal exploration contract can be computed in polynomial time.
Without any transfers from , has no intrinsic motivation to open any box – except the ones with inspection cost 0. Specifically, if the agent opens box , the (expected) payments from have to cover the inspection costs for , i.e., it holds that . As a direct consequence, cannot obtain more utility from a contract with over the one obtained by exploring boxes and paying inspection costs by herself (she can simply imitate ’s behavior under the contract).
Definition 6.
A given contract implements a policy if (1) is optimal under , and (2) for all .
As observed above, an optimal contract cannot yield more utility for than an optimal policy that she can apply by herself (paying the costs herself). Hence, the next lemma implies Theorem 5.
Lemma 7.
There exists a contract that implements .
Proof.
Recall that the following is an optimal policy for (when she pays the costs herself). Define fair caps for each box such that . Then opens boxes in non-increasing order of their fair caps as long as the best prize observed so far does not exceed the fair cap of the next box.
A contract that implements is given by payments for all . Clearly, we have
(1) |
Consequently, is faced with an instance of the Pandora’s Box problem where her expected prize in each box equals the opening cost. Thus, if applies the (-optimal) index policy for the emerging instance, the fair caps for are given by for all .
This means that is entirely indifferent about the opening order. As long as there are only prizes with , is also indifferent about stopping to open further boxes and selecting any previously observed prize at any point in time. Thus, ’s behavior under these conditions can be assumed to be consistent with the one imposed by .
Restrictions only arise from the fact that must stop immediately whenever a prize with is drawn from box , according to ’s index policy. However, this is consistent with the behavior of the index policy for : Note that if and only if . Hence, also stops when prize is drawn from box , because exceeds the fair cap of box and, thus, the fair cap of the next box in the order.
Therefore, is an optimal policy under contract . Together with (1) this shows that implements .
4.2 Binary Boxes
We study a subclass of the problem where every box contains two prizes with positive probability. One of those prizes has value 0 for both and , and is called 0-prize. The other prize (the positive prize) is denoted by the value pair with and has probability . Consequently, the 0-prize is drawn from box with probability . Note that there cannot be a positive payment for the 0-prize. Thus, payments for box are given by a single , the payment for the positive prize.
Depending on a payment of , we define the fair cap of box for by
(2) |
Lemma 8.
Given a contract , there is an optimal policy that has the following properties. opens boxes in order of the fair cap defined by (2). Boxes with the same fair cap are ordered according to the value . accepts the first box with a positive prize (if any).
Proof.
As discussed in Section 2, the fair cap for boxes is unique whenever . For boxes with we also assume that the fair cap is given as in (2). This is the smallest fair cap for this box and defers the inspection of box to the latest point. Clearly, the agent is indifferent regarding this choice. Consequently, it is -optimal to stop immediately and accept as soon as opens a box with a positive prize in it, since the value is .
To argue that this behavior is also -optimal, suppose some box with gets a higher fair cap and gets opened earlier (in accordance with the resulting index policy of ). With such fair cap for , does not stop until all subsequent boxes with fair cap are opened. As such, we can defer the opening of box until after these boxes are opened. Now if there are several boxes with the same fair caps, it is optimal for if the opening is done in non-increasing order of . This guarantees that there is no subsequent box with the same fair cap and a prize that has a better value for . Conversely, any box with the same fair cap and potentially better value for has been inspected before. As such, using minimal fair caps for boxes with , breaking ties w.r.t. , and stopping as soon as a positive prize is observed is also -optimal.
Let us define a basic fair cap of box to be . A box with negative basic fair cap would not be considered by unless the payment is at least , in which case the fair cap would be precisely . Note that boxes with prohibitively high cost are w.l.o.g. never opened by under any contract . In what follows, we exclude such boxes from consideration. For the remaining boxes , and, hence, . Whenever , the exploration cost exceeds the expected value for in box . As such, a transfer of is clearly necessary to motivate to inspect box at all. We renormalize the box by adjusting the value of the positive prize to . Then the basic fair cap becomes . Indeed, assuming is w.l.o.g.: Consider any contract in which for all in some non-empty set of boxes , and the optimal policy under . Note that choosing a contract where is increased to for such boxes does not hurt: This will increase the fair caps in to 0. An -optimal policy for can be obtained from by opening the boxes in the end if no prize has been found yet, in any order. The resulting utility for in such cases is non-negative rather than 0 (as ); the utility in all other cases does not change.
In the remainder of the section, we consider the instance with renormalized boxes, i.e., each box has a basic fair cap . Furthermore, we re-number the boxes and assume that the indices are assigned in non-increasing order of basic fair caps, i.e. for all . We break ties in the ordering w.r.t. non-increasing value of .
Our next observation will allow us to restrict attention to the permutation and the fair caps in the policy of . By (2), fair caps are linear in payments. Clearly, for an optimal contract we strive to set smallest payments (and, hence, smallest fair caps) to induce a behavior of . For any given contract , we say implements an ordering of boxes if there is an optimal policy under that considers boxes in the order of . For the reverse direction, we need a slightly more technical definition.
Definition 9.
For any given ordering of boxes, we define a contract as follows: For each position (occupied with box ), let be the latest position of any subsequent box (including itself) with the highest basic fair cap of the subsequent boxes. In , we assign payments to the unique positive prizes for the boxes on positions such that they all have a fair cap of exactly . If has feasible payments and implements , we call the canonical contract for .
We have the following lemma concerning whether is the canonical contract for .
Lemma 10.
For any ordering , we can decide in polynomial time if is the canonical contract for (and compute it in this case). If is not the canonical contract for , then every contract that implements is suboptimal for the given instance. Otherwise, the canonical contract has point-wise minimal payments among all contracts that implement .
Proof.
The construction of and the subsequent feasibility check described in Definition 9 can be done in polynomial time. Note that if is the canonical contract for , it uses point-wise minimal payments for implementing : Having less payments for any box would directly contradict implementation of , since every optimal policy considers boxes in the order of their fair caps, which are lower bounded by the respective basic fair caps. Thus, if is not the canonical contract for due to infeasible payments, then implementing is impossible with any contract, because even higher payments would be necessary. If is not the canonical contract for , but the construction in Definition 9 defines feasible payments, it defines a canonical contract of some other ordering using point-wise minimal payments. Note that instead of only emerges because of tie-breaking in favor of , which implies that boxes with identical fair caps have to be considered in non-increasing order of . Thus it could be possible to implement with another contract , but additional payments would be required to remove (some of the) ties. However, since tie-breaking was already done in favor of , this contract clearly yields less utility than and hence is not an optimal contract for the given instance.
In the following, we will discuss how to compute the fair caps for the policy of in an optimal contract. By Lemma 10, the order that we compute in this way has to be implemented by .
Let denote the expected utility for under a contract given by fair caps . Algorithm 1 computes optimal fair caps for all boxes. Initially, the fair cap of each box is given by its basic fair cap. Then we calculate the optimal fair cap of boxes iteratively. In iteration of the outer for-loop, we find the optimal fair cap for box among the fair caps of previous boxes. Crucially, it is sufficient to test which of those fair caps yield maximum utility, even if the final fair caps of subsequent boxes are not determined yet. Note that by (2), to raise the fair cap of any box to , the required payment is a constant independent of the probabilities and values in box – more precisely, it is exactly .
Theorem 11.
Algorithm 1 computes an optimal exploration contract for binary boxes in polynomial time.
Proof.
For the proof, we show inductively that the decision of the algorithm in each iteration of the outer for-loop regarding box is optimal. Formally, there is an optimal ordering of all boxes that, when restricted to boxes , is identical to the ordering computed by the algorithm after iteration . By Lemma 10, this implies that the selected fair cap for box is optimal (and, thus, the payment in ), as boxes are numbered and considered in a non-increasing order of their basic fair caps.
For the rest of the proof we show the following statement. For every , let denote the ordering of boxes after iteration of the outer for-loop in Algorithm 1. There is an optimal ordering such that the relative order of boxes is identical in and .
We proceed by induction. The statement is trivially true for . Now suppose the statement holds for every of the first iterations, and consider iteration . Let denote the position of box w.r.t. boxes in , and let denote its position . Note that position corresponds to a position in the overall ordering (since for definition of we ignore boxes ). Clearly, the statement holds when .
For the two remaining cases ( and ) we show that the optimal ordering can be changed sequentially into one that has without decreasing the expected value for . We discuss the proof for case 1: . The proof of the other case is very similar (as seen in the full version of the paper).
Consider . Now suppose box was brought to position (w.r.t. first boxes) in . The fair cap of must be strictly increased, since otherwise would receive the same payment and the same position in and (due to optimal tie-breaking by the agent in non-increasing order of principal value). Note that position corresponds to a position in .
Consider the set of boxes located between positions and in . Consider the first box with and the position in . Let be the boxes located before box in positions . We use for the combined probability of selection in and for the combined expected value of a selected box in .
Let be the remaining value of box after a payment that lifts the fair cap of to the basic fair cap of . We split the analysis into two cases: and . If , consider the move of to position . Since there are only boxes in , we could place box on position in – right after boxes from , similar to the position of box in . We use to denote the payment required to lift the fair cap of box from its basic fair cap to the one required at position . Moreover, is the additional payment required to lift the fair cap of box from the one at position to the one at . Since the algorithm places at instead of , we know that
which implies
Suppose now we move from to , then we have the same terms with instead of . Clearly, since we know that the move is also profitable. This means that we can move to position without deteriorating the value of . Thereby we reduce the number of boxes between the positions of in and .
For the second case consider . We here consider two moves in – either swap box right after box , or swap box right before box . Neither of these moves shall maintain the value of , and we show that this is impossible.
Let be the boxes located between and in . We use and to denote selection probability and expected value of selected box, respectively. Now if we swap box after box , a strict decrease in value yields
Note that ensuring the same fair cap as box at position is enough, since and box is then sorted after box . We obtain
and, since ,
(3) |
For the other move, we see that a strict decrease in value yields
Note that ensuring the same fair cap as box at position is enough, since and box is then sorted before box . We obtain
so
This implies , since otherwise we derive , a contradiction. Now, if , , and , the above implies
(4) |
At least one swap (either moving right after , or moving right before ) must be non-deteriorating in terms of solution value. Using this swap, we can change and decrease the set of agents between and without loss in value.
Now using these swaps (either the one for or one of the two for ) iteratively, we can turn into an ordering such that there are only boxes in . Then moving box from to is not harmful. We let denote the overall probability that a box in is chosen and the expected value. Let be the additional payment required to raise the fair cap from the one of position to the one of position .
Indeed, our algorithm could have placed box at position , but preferred to place in position . This means that
which also reflects the change in value if the move is executed in . This proves the inductive step when .
4.3 I.I.D. with Single Positive Prize for Principal
Finally, we study the subclass of the problem where all distributions are identical, with the further restriction that there is only a single positive prize for the principal. Even this restriction has a perhaps surprisingly complicated solution. We show the following result.
Theorem 12.
There exists an algorithm that computes an optimal exploration contract for the i.i.d. case with a single positive prize for the principal in polynomial time.
We introduce some simplified notation for this subclass. All of the boxes have an identical distribution over prizes and the same opening cost . For each prize , the utilities for and are and , respectively. Without loss of generality, prize is the only prize for which has a positive value, i.e., for all . The probability that prize is drawn when a box is opened is denoted by . For the ease of notation, we define and . For each box , defines a (non-negative) payment of for prize . We may assume and as otherwise would be the unique transfers.
We make two further assumptions that are without loss of generality. First, the values for are unique: If there are with , replace both prizes by a new one with probability . Second, we assume .
Similarly as in Section 4.2, let be the fair cap of a box for when payment is made for prize of that box. We call the basic fair cap. Note that there is an optimal contract in which each fair cap is either or some value , where , with . The reason is that, if that were not the case, we could decrease the payment for each box, until that property is satisfied while keeping the policy optimal. Importantly, the agent value of outcome 0 (which is given by ) need not be considered here: Whenever outcome 0 is drawn from a box with , the process stops immediately since, by definition of the fair cap and the index policy, it must hold , where is the fair cap of the next box.
Observe that can be achieved with different payments , as long as . This may influence the stopping behavior of as the payment may determine the order of and for some prize . To achieve some fair cap larger than , however, there is a unique payment that achieves this fair cap because changing a values above the fair cap that occurs with non-zero probability always changes the fair cap by definition.
Now consider an optimal contract. Index the boxes in the order in which they are opened, and let be the corresponding fair caps. We first concentrate on a potential set of basic boxes with in the final rounds of the process.
If , then every positive payment of would increase the fair cap to above . Therefore, if basic boxes with fair cap exist in an optimal contract, must hold for all these boxes.
Otherwise, if , suppose is the best other prize for that is not exceeding the basic fair cap, i.e., . Observe that for every basic box , an optimal payment either fulfills for some , or . W.l.o.g., when prize 0 is drawn in a basic box with , the process stops (by tie-breaking in favor of , who cannot improve), and when an outcome with is drawn in any round before , the process continues (again by tie-breaking in favor of , who gets if is accepted).
We next show that all basic boxes with can be assumed to be opened first among the set of basic boxes. The argument is an exchange argument and given in the full version of the paper.
Lemma 13.
There is an optimal contract where all basic boxes with are opened first among the set of basic boxes.
We now divide the process into two phases. In every round of Phase 1, and prize 0 gets accepted by immediately if it is found. In Phase 2, and prize 0 is not accepted immediately (only if in the last round a prize 0 represents the best one for ). Suppose Phase 1 contains boxes and Phase 2 boxes . Note that either phase might be empty.
Before analyzing the process, we show an example that shows the perhaps counter-intuitive fact that the optimal contract might indeed involve two such phases, each of substantial length. Moreover, even though all boxes have the same distribution and the same fair cap, in the optimal contract they are assigned one of two different payments depending on their position in the inspection sequence.
Example 14.
Consider an instance with . Prize 0 has value and for some determined below, as well as . The other prizes have , and , . The inspection cost is , and the basic fair cap is .
Clearly, if prize 1 is found in a box, then always accepts. If prize 2 is found, continues to inspect further boxes. Suppose draws a prize 0 in the first round. It is rather unlikely that will run through all remaining boxes without finding a prize 1 and eventually accept prize 0 from round 1. Specifically, this probability is . Now can set , leading to direct acceptance by and immediate profit of . Alternatively, can decide to set (and, in an optimal contract, then throughout) gambling for a high profit of in the end. The gambling strategy is unprofitable if
Suppose satisfies this inequality, then round 1 is part of Phase 1 in an optimal contract. Via the same calculation, we can determine further rounds of Phase 1. accepts prizes 0 and 1 in all earlier rounds , so reaching round is only possible if prize 2 has been found in all these rounds.
Now suppose for some
Then wants to set for all , i.e., this becomes Phase 1 with direct acceptance of prize 0. For all rounds , it is more profitable to gamble that no prize 1 will arrive in the remaining rounds and will secure a value of . This represents Phase 2.
Turning to the analysis, we start by examining Phase 1. Using a relatively straightforward exchange argument given in the full version of the paper, we show the following.
Lemma 15.
There is an optimal contract in which all boxes of Phase 1 have the same fair cap.
In Phase 2, all boxes have fair cap , but this does not exclude the possibility of different payments. If the first box of Phase 2, box , is about to be opened, the highest agent value observed so far (if any) is at most . In particular, prize 0 was never observed at that point, since this would have led to acceptance of in an earlier round.
During Phase 2, the process only stops early when an outcome is drawn. In this case, the utility for is 0. If the process does not stop early, receives a utility only if outcome 0 is the best outcome for the agent among all outcomes.
could have different payments for different boxes as long as, for every box , there is some such that . If an outcome from a box with payment (for some ) is the maximum after rounds, then
-
outcome 0 must have been drawn from any number of boxes that have this payment , and
-
in all boxes with a payment of at most , some outcome was drawn, and
-
in all remaining boxes, some outcome was drawn.
Let denote the number of boxes of Phase 2 with such payment, for every . Furthermore, for every , let denote the combined probability of all outcomes (other than 0) that are not better for , and denote the number of boxes with the according payment. The probability that outcome 0 with a payment of is the best outcome after rounds is then given by
Thus, the expected utility for for Phase 2 is
With this at hand, we show the following lemma in the full version of the paper.
Lemma 16.
There is an optimal contract in which all boxes of Phase 2 have the same payment for prize 0. There is such that (and for all ).
By the previous lemmas, it suffices to enumerate all combinations of the length of Phase 1, the fair cap in Phase 1, and the unique payment of prize 0 for Phase 2 (determined by a prize to be targeted, a payment of 0, or a payment resulting in value for for prize 0). These are polynomially many combinations, and for each combination the expected utility of for the resulting contract can be computed in polynomial time. Theorem 12 follows.
References
- [1] Tal Alon, Paul Duetting, Yingkai Li, and Inbal Talgam-Cohen. Bayesian analysis of linear contracts. In Proc. 24th Conf. Econ. Comput. (EC), page 66, 2023. doi:10.1145/3580507.3597795.
- [2] Tal Alon, Paul Dütting, and Inbal Talgam-Cohen. Contracts with private cost per unit-of-effort. In Proc. 22nd Conf. Econ. Comput. (EC), pages 52–69, 2021. doi:10.1145/3465456.3467651.
- [3] Yakov Babichenko, Inbal Talgam-Cohen, Haifeng Xu, and Konstantin Zabarnyi. Regret-minimizing bayesian persuasion. Games Econ. Behav., 136:226–248, 2022. doi:10.1016/J.GEB.2022.09.001.
- [4] Curtis Bechtel and Shaddin Dughmi. Delegated Stochastic Probing. In Proc. Symp. Innov. Theoret. Comput. Sci. (ITCS), pages 37:1–37:19, 2021. doi:10.4230/LIPICS.ITCS.2021.37.
- [5] Curtis Bechtel, Shaddin Dughmi, and Neel Patel. Delegated Pandora’s box. In Proc. 23rd Conf. Econ. Comput. (EC), pages 666–693, 2022. doi:10.1145/3490486.3538267.
- [6] Hedyeh Beyhaghi and Linda Cai. Pandora’s problem with nonobligatory inspection: Optimal structure and a PTAS. In Proc. 55th Symp. Theory Comput. (STOC), pages 803–816, 2023. doi:10.1145/3564246.3585217.
- [7] Hedyeh Beyhaghi and Linda Cai. Recent developments in Pandora’s Box problem: Variants and applications. ACM SIGecom Exchanges, 21(1):20–34, 2023. doi:10.1145/3699814.3699817.
- [8] Hedyeh Beyhaghi and Robert Kleinberg. Pandora’s problem with nonobligatory inspection. In Proc. 20th Conf. Econ. Comput. (EC), pages 131–132, 2019. doi:10.1145/3328526.3329626.
- [9] Patrick Bolton and Mathias Dewatripont. Contract Theory. MIT Press, 2005.
- [10] Shant Boodaghians, Federico Fusco, Philip Lazos, and Stefano Leonardi. Pandora’s box problem with order constraints. Math. Oper. Res., 48(1):498–519, 2023. doi:10.1287/MOOR.2022.1271.
- [11] Pirmin Braun, Niklas Hahn, Martin Hoefer, and Conrad Schecker. Delegated online search. In Proc. Int. Joint Conf. Artif. Intell. (IJCAI), pages 2528–2536, 2023. doi:10.24963/IJCAI.2023/281.
- [12] Matteo Castiglioni, Andrea Celli, Alberto Marchesi, and Nicola Gatti. Online Bayesian persuasion. In Proc. Conf. Adv. Neural Inf. Processing Syst. (NeurIPS), 2020.
- [13] Matteo Castiglioni, Alberto Marchesi, and Nicola Gatti. Bayesian agency: Linear versus tractable contracts. In Proc. 22nd Conf. Econ. Comput. (EC), pages 285–286, 2021. doi:10.1145/3465456.3467602.
- [14] Matteo Castiglioni, Alberto Marchesi, and Nicola Gatti. Designing menus of contracts efficiently: The power of randomization. In Proc. 23rd Conf. Econ. Comput. (EC), pages 705–735, 2022. doi:10.1145/3490486.3538270.
- [15] Bolin Ding, Yiding Feng, Chien-Ju Ho, Wei Tang, and Haifeng Xu. Competitive information design for pandora’s box. In Proc. Symp. Discret. Algorithms (SODA), pages 353–381, 2023. doi:10.1137/1.9781611977554.CH15.
- [16] Shaddin Dughmi, Neel Bharatkumar Patel, Aditya Prasad, and Ram Deo-Campo Vuong. On supermodular contracts and dense subgraphs. In Proc. Symp. Discret. Algorithms (SODA), pages 109–132, 2024. doi:10.1137/1.9781611977912.6.
- [17] Paul Dütting, Tomer Ezra, Michal Feldman, and Thomas Kesselheim. Combinatorial contracts. In Proc. Symp. Found. Comput. Sci. (FOCS), pages 815–826, 2021. doi:10.1109/FOCS52979.2021.00084.
- [18] Paul Dütting, Tomer Ezra, Michal Feldman, and Thomas Kesselheim. Multi-agent contracts. In Proc. Symp. Theory Comput. (STOC), pages 1311–1324, 2023. doi:10.1145/3564246.3585193.
- [19] Paul Dütting, Michal Feldman, and Daniel Peretz. Ambiguous contracts. In Proc. 24th Conf. Econ. Comput. (EC), page 539, 2023.
- [20] Paul Dütting, Tim Roughgarden, and Inbal Talgam-Cohen. Simple versus optimal contracts. In Proc. 20th Conf. Econ. Comput. (EC), pages 369–387, 2019. doi:10.1145/3328526.3329591.
- [21] Paul Dütting, Tim Roughgarden, and Inbal Talgam-Cohen. The complexity of contracts. SIAM J. Comput., 50(1):211–254, 2021. doi:10.1137/20M132153X.
- [22] Paul Dütting, Michal Feldman, and Yoav Gal Tzur. Combinatorial contracts beyond gross substitutes. In Proc. Symp. Discret. Algorithms (SODA), pages 92–108, 2024. doi:10.1137/1.9781611977912.5.
- [23] Tomer Ezra, Michal Feldman, and Maya Schlesinger. Sequential contracts. CoRR, abs/2403.09545, 2024. doi:10.48550/arXiv.2403.09545.
- [24] Yiding Feng, Wei Tang, and Haifeng Xu. Online Bayesian recommendation with no regret. In Proc. 23rd Conf. Econ. Comput. (EC), pages 818–819, 2022. doi:10.1145/3490486.3538327.
- [25] Hu Fu, Jiawei Li, and Daogao Liu. Pandora box problem with nonobligatory inspection: Hardness and approximation scheme. In Proc. 55th Symp. Theory Comput. (STOC), pages 789–802, 2023. doi:10.1145/3564246.3585229.
- [26] Anupam Gupta, Haotian Jiang, Ziv Scully, and Sahil Singla. The markovian price of information. In Proc. Int. Conf. Integer Prog. and Comb. Opt. (IPCO), pages 233–246, 2019. doi:10.1007/978-3-030-17953-3_18.
- [27] Guru Guruganesh, Jon Schneider, Joshua R. Wang, and Junyao Zhao. The power of menus in contract design. In Proc. 24th Conf. Econ. Comput. (EC), pages 818–848, 2023. doi:10.1145/3580507.3597735.
- [28] Niklas Hahn, Martin Hoefer, and Rann Smorodinsky. Prophet inequalities for Bayesian persuasion. In Proc. Int. Joint Conf. Artif. Intell. (IJCAI), pages 175–181, 2020. doi:10.24963/IJCAI.2020/25.
- [29] Niklas Hahn, Martin Hoefer, and Rann Smorodinsky. The secretary recommendation problem. Games Econ. Behav., 134:199–228, 2022. doi:10.1016/J.GEB.2022.05.002.
- [30] Bengt Holmstrom and Paul Milgrom. Aggregation and linearity in the provision of intertemporal incentives. Econometrica, 55(2):303–328, 1987.
- [31] Bengt Holmström. Moral hazard and observability. Bell J. Econ., 10(1):74–91, 1979.
- [32] Jon M. Kleinberg and Robert D. Kleinberg. Delegated search approximates efficient search. In Proc. 19th Conf. Econ. Comput. (EC), pages 287–302, 2018. doi:10.1145/3219166.3219205.
- [33] Robert D. Kleinberg, Bo Waggoner, and E. Glen Weyl. Descending price optimally coordinates search. In Proc. 17th Conf. Econ. Comput. (EC), pages 23–24, 2016. doi:10.1145/2940716.2940760.
- [34] P. Postl. Delegated search: Procedure matters. Discussion Paper 04-17, Department of Economics, University of Birmingham, 2004.
- [35] Royal Swedish Academy of Sciences. Scientific background on the 2016 nobel prize in economic sciences, 2016.
- [36] Eden Saig, Inbal Talgam-Cohen, and Nir Rosenfeld. Delegated classification. In Proc. Conf. Adv. Neural Inf. Processing Syst. (NeurIPS), 2023.
- [37] Suho Shin, Keivan Rezaei, and Mohammadtaghi Hajiaghayi. Delegating to multiple agents. In Proc. 24th Conf. Econ. Comput. (EC), pages 1081–1126, 2023. doi:10.1145/3580507.3597669.
- [38] Sahil Singla. The price of information in combinatorial optimization. In Proc. Symp. Discret. Algorithms (SODA), pages 2523–2532, 2018. doi:10.1137/1.9781611975031.161.
- [39] Martin L. Weitzman. Optimal Search for the Best Alternative. Econometrica, May, 47(3):641–654, 1979.
- [40] You Zu, Krishnamurthy Iyer, and Haifeng Xu. Learning to persuade on the fly: Robustness against ignorance. In Proc. 22nd Conf. Econ. Comput. (EC), pages 927–928, 2021. doi:10.1145/3465456.3467593.