Nontrivial and Universal Helping for Wait-free Queues and Stacks

This paper studies two approaches to formalize helping in wait-free implementations of shared objects. The first approach is based on operation valency, and it allows us to make the important distinction between trivial and nontrivial helping. We show that a wait-free implementation of a queue from Common2 objects (e.g., Test&Set) requires nontrivial helping. In contrast, there is a wait-free implementation of a stack from Common2 objects with only trivial helping. This separation might shed light on the difficulty of implementing a queue from Common2 objects. The other approach formalizes the helping mechanism employed by Herlihy's universal wait-free construction and is based on having an operation by one process restrict the possible linearizations of operations by other processes. We show that objects possessing such universal helping can be used to solve consensus.


Introduction
A key component in the design of concurrent applications is the set of shared objects that provide higher-level semantics for communication among processes. For example, a shared queue, to which processes can concurrently enqueue and dequeue, allows them to share tasks, and similarly a shared stack. Shared objects are implemented from more basic primitives supported by the multiprocessing architecture, e.g., reads, writes, Test&Set, or Compare&Swap. An implementation is wait-free if an operation on the shared object is guaranteed to terminate after a finite number of steps; the implementation is nonblocking if it only ensures that some operation (perhaps by another process) completes in this situation. Clearly, a wait-free implementation is nonblocking but not necessarily vice versa.

Many implementations of shared objects, especially the wait-free ones, include one process helping another process to make progress. The helping mechanism is often some code that is added to a nonblocking implementation. Typically, the code uses only reads and writes, in addition to the primitives used in the nonblocking implementation (e.g., Test&Set). The aim of this extra code is that processes that complete an operation "help" the blocked processes to terminate, so that the resulting implementation is wait-free.

Model of Computation
We consider a system with n asynchronous processes, $p_1, \ldots, p_n$. Processes communicate with each other by applying primitives to shared base objects; the primitives can be read and write, or more powerful primitives like Test&Set or Compare&Swap. Any process may crash at any time in an execution, namely, it stops taking steps from that point on. A process that does not crash is correct.
A (high-level) concurrent object, or data type, is defined by a state machine consisting of a set of states, a set of operations, and a set of transitions between states. Such a specification is known as sequential. In the rest of the paper we will concentrate on stacks and queues.
A shared stack provides two operations, push(·) and pop(). A push(x) operation puts x at the top of the stack, and a pop() removes and returns the value at the top, if there is one; otherwise it returns ⊥. A shared queue provides operations enq(·) and deq(). An enq(x) operation puts x at the tail of the queue, and a deq() removes and returns the value at the head of the queue, if there is one; otherwise it returns ⊥.
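The two sequential specifications above can be written down directly as small state machines. The following is a minimal sketch; the class names, the use of `None` for ⊥, and the `True` return value of `push` are assumptions made for illustration (the text only states later that enqueues return true).

```python
from collections import deque

BOTTOM = None  # stands in for the empty response ⊥ (an assumption of this sketch)


class SeqStack:
    """Sequential specification of a stack: push(x) and pop()."""

    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)  # x becomes the new top
        return True            # assumed acknowledgement value

    def pop(self):
        # Remove and return the top value, or BOTTOM if the stack is empty.
        return self._items.pop() if self._items else BOTTOM


class SeqQueue:
    """Sequential specification of a queue: enq(x) and deq()."""

    def __init__(self):
        self._items = deque()

    def enq(self, x):
        self._items.append(x)  # x is placed at the tail
        return True

    def deq(self):
        # Remove and return the head value, or BOTTOM if the queue is empty.
        return self._items.popleft() if self._items else BOTTOM
```

These classes describe only the sequential behavior; they say nothing about how a concurrent implementation is built from base objects.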
An implementation of an object O is a distributed algorithm A consisting of local state machines $A_1, \ldots, A_n$. Local machine $A_i$ specifies which primitives $p_i$ executes in order to return a response when it invokes an operation of O. An implementation is wait-free if every process completes each of its invocations in a finite number of its steps. Formally, if a process executes infinitely many steps in an execution, it completes all its invocations. An implementation is nonblocking if whenever processes take steps, at least one of the operations terminates. Namely, in every infinite execution, infinitely many invocations are completed. Thus, a wait-free implementation is nonblocking but not necessarily vice versa.
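To make the gap between the two progress conditions concrete, here is a hedged sketch of a counter incremented with a Compare&Swap retry loop. It is not taken from the paper; `CASRegister`, `increment`, and `run` are hypothetical names, and the lock inside `CASRegister` only models the atomicity of the primitive. A failed Compare&Swap means that some other increment succeeded, so the loop is nonblocking, but nothing bounds the number of retries of a particular process, so it is not wait-free.

```python
import threading


class CASRegister:
    """A register with read and Compare&Swap; the lock models atomicity only."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def increment(reg):
    """Nonblocking increment: retry until our own Compare&Swap succeeds."""
    attempts = 0
    while True:
        attempts += 1
        old = reg.read()
        if reg.compare_and_swap(old, old + 1):
            return attempts  # number of tries is unbounded in principle


def run(num_threads=4, per_thread=1000):
    """Run several incrementing threads and return the final counter value."""
    reg = CASRegister()

    def worker():
        for _ in range(per_thread):
            increment(reg)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return reg.read()
```

Helping mechanisms of the kind studied in this paper are one way to turn such a retry loop into a wait-free operation.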
A configuration C of the system is a collection containing the states of all base objects and processes. A configuration is initial if base objects and processes are in initial states. Given a configuration C, for any process p, p(C) denotes the configuration after p takes its next step. A process p is idle in a configuration C if p is in a state in which all its operations are completed.
An execution of the system is modelled by a history, which is a possibly infinite sequence of invocations and responses of high-level operations and primitives. For a set of processes S, an S-execution is an execution in which only processes in S take steps. If S = {p}, we say that the execution is p-solo. An operation op in a history is complete if both its invocation inv(op) and response res(op) appear in the history. An operation is pending if only its invocation appears in the history.
A history H induces a natural partial order $<_H$ on the operations of H: $op <_H op'$ if and only if res(op) precedes inv(op'). Two operations are concurrent if they are incomparable. A sequential history alternates matching invocations and responses and starts with an invocation event. Hence, if H is sequential, $<_H$ induces a total order.
Linearizability [13] is the standard notion used to identify a correct implementation. Roughly speaking, an implementation is linearizable if each operation appears to take effect atomically at some time between its invocation and its response.
Let A be an implementation of an object O. A history H of A is linearizable if H can be extended by adding response events for some pending invocations such that the sequence $H'$ containing only the invocations and responses of O agrees with the specification of O, namely, there is an initial state of O and a sequence of invocations and responses that produces $H'$. We say that A is linearizable if each of its histories is linearizable.
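For very small histories, linearizability of a queue history can be checked by brute force. The sketch below is an illustration under simplifying assumptions, not part of the paper: it handles complete operations only (pending invocations are ignored), each operation is a dictionary with hypothetical keys `inv`, `res`, `name`, `arg`, and `out`, and `None` plays the role of ⊥. It tries every total order that respects the partial order $<_H$ and tests it against the sequential specification.

```python
from collections import deque
from itertools import permutations


def respects_real_time(order):
    """True if no operation is placed before one that completed before it began."""
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if b["res"] < a["inv"]:  # b finished before a started, yet a comes first
                return False
    return True


def legal_queue(order):
    """True if the sequence of operations agrees with the sequential queue."""
    q = deque()
    for op in order:
        if op["name"] == "enq":
            q.append(op["arg"])
        else:  # deq
            expected = q.popleft() if q else None
            if op["out"] != expected:
                return False
    return True


def is_linearizable(ops):
    """Exhaustive search over all orders; exponential, for tiny histories only."""
    return any(respects_real_time(p) and legal_queue(p) for p in permutations(ops))
```

For example, if enq(1) and enq(2) overlap in time, a later deq() may return either value; if enq(1) completes before enq(2) starts, a deq() returning 2 first is rejected.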
In the consensus problem, each process proposes a value and is required to decide on a value such that the following properties are satisfied in every execution:
Termination: every correct process decides.
Agreement: all correct processes decide on the same value.
Validity: every decided value is a proposed value.
Consensus is universal [12] in the sense that from reads and writes and objects solving consensus among n processes, it is possible to obtain a wait-free implementation for n processes of any concurrent object with a sequential specification. The consensus number of a primitive [12] is the maximum number n such that it is possible to solve consensus on n processes from reads, writes and the primitive. For example, the consensus number of Test&Set is 2. Hence, Test&Set allows us to implement any concurrent object in a system with 2 processes.
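The claim that Test&Set solves consensus for two processes can be illustrated with the classical protocol: each process publishes its proposal and then applies Test&Set; the winner decides its own value and the loser adopts the winner's. The sketch below is an illustration only; the class and function names are hypothetical, and the lock merely models the atomicity of the primitive.

```python
import threading


class TestAndSet:
    """One-shot Test&Set bit; the lock models the atomicity of the primitive."""

    def __init__(self):
        self._set = False
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            old = self._set
            self._set = True
            return old  # False only for the unique winner


class TwoProcessConsensus:
    def __init__(self):
        self._proposals = [None, None]  # read/write registers, one per process
        self._tas = TestAndSet()

    def propose(self, i, value):
        """Process i (0 or 1) proposes value and returns the decided value."""
        self._proposals[i] = value       # publish before competing
        if not self._tas.test_and_set():
            return value                 # winner decides its own proposal
        return self._proposals[1 - i]    # loser adopts the winner's proposal


def decide_concurrently(v0, v1):
    """Run both processes as threads and return their two decisions."""
    obj = TwoProcessConsensus()
    decisions = [None, None]

    def run(i, v):
        decisions[i] = obj.propose(i, v)

    threads = [threading.Thread(target=run, args=(i, v)) for i, v in enumerate((v0, v1))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

The loser always finds a value to adopt because the winner writes its proposal before it applies Test&Set.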

Separating Stacks and Queues with Nontrivial (Valency-Based) Helping
In this section, we present a notion of helping that differentiates between queues and stacks: any queue implementation must exhibit this kind of helping, but there is a stack implementation that does not (essentially, that of [1]). This sheds some light on the difficulty of finding a wait-free implementation of a queue from Common2.
Let A be a wait-free linearizable implementation of a data type T, such as a stack or queue. The input for an invocation of an operation of T is from some domain V and the output of a response is from the domain V ∪ {⊥}, where ⊥ ∉ V denotes the empty or initial state of T.
Let C be a reachable configuration of A and let opType(·) be an operation by a process p. We say that opType(·) is v-univalent in C if, in every configuration reachable from C in which opType(·) is complete, its output is v; otherwise, opType(·) is multivalent in C. We say that opType(·) is critical on v in C if it is multivalent in C and v-univalent in p(C).
Definition 1 (Nontrivial and trivial (valency-based) helping). Process q helps process p ≠ q in configuration C if there is a multivalent operation opType(·) by p in C that is v-univalent in q(C). We say that q nontrivially helps p if v ≠ ⊥; otherwise, it trivially helps p. An implementation of a type T has nontrivial (trivial) helping if it has a reachable configuration C such that some process q nontrivially (trivially) helps process p in C.
Directly from the previous definition we get the following claim.
Claim 2. If C is a reachable configuration of an algorithm without nontrivial helping, and an operation op by p is multivalent in C, then op is not v-univalent in q(C), for any value v ≠ ⊥ and process q ≠ p.
The proof of the next theorem captures the challenging "tail chasing" phenomenon one faces when trying to implement a queue from objects in Common2. Observe that in the case of a queue implementation, only dequeues can be nontrivially helped since enqueues always return true, and are therefore trivially univalent.
Theorem 3. Any two-process wait-free linearizable queue implementation from read/write and Test&Set operations has nontrivial helping.
Proof. Assume, by way of contradiction, there is such an implementation A without nontrivial helping. Let p and q be two distinct processes, and let $C_{init}$ be the initial configuration of A.

For any $k \geq 1$, we construct an execution $\alpha_k$ of p and q, starting with $C_{init}$ and ending in configuration $C_k$. In $\alpha_k$, p executes a single $deq_p()$ operation, and the following properties hold:
1. q is idle in $C_k$,
2. p has at least k steps in $\alpha_k$,
3. in every linearization of $\alpha_k$, all enqueues appear in the same order and enqueue distinct values,
4. there is no linearization of $\alpha_k$ in which $deq_p()$ outputs ⊥, and
5. $deq_p()$ is multivalent in $C_k$ (in particular, it is pending).
We proceed by induction. For the base case, k = 1, let $\alpha_1$ be the execution that starts at $C_{init}$ and in which p completes enq(1) alone and then runs $deq_p()$ until it is critical on 1. This execution exists because A is wait-free. Clearly, there is no linearization of $\alpha_1$ in which $deq_p()$ outputs ⊥. The other properties also hold.
Suppose that we have constructed $\alpha_k$, $k \geq 1$; we show how to obtain $\alpha_{k+1}$. Let $\beta_1$ be the q-solo extension of $\alpha_k$ in which q completes enq(z), where z is a value that is not enqueued in $\alpha_k$, and then starts a $deq_q()$ operation. Let $\beta_2$ be an extension of $\alpha_k \beta_1$ in which p and q take steps until both their dequeue operations are critical. The extension $\beta_2$ exists because, first, A is wait-free and, second, by Claim 2, a step of p does not make $deq_q()$ univalent, and a step of q does not make $deq_p()$ univalent.
Let C be the configuration at the end of $\alpha_k \beta_1 \beta_2$; note that $deq_p()$ is critical on some value $y_p$ in C and that $deq_q()$ is critical on some value $y_q$ in C.
Note that neither $y_p$ nor $y_q$ is ⊥ since the queue has at least two values in C. This holds since the induction hypothesis is that there is no linearization of $\alpha_k$ in which $deq_p()$ outputs ⊥, and in the extension $\beta_1 \beta_2$, q first completes an enqueue and then starts a dequeue.
By a similar argument, there is no linearization of $\alpha_k \beta_1 \beta_2$ in which either $deq_p()$ or $deq_q()$ outputs ⊥.
The following claim is where the specification of a queue comes into play. (This claim does not hold for a stack, for example.)
Claim 4. $y_p = y_q$.
Proof. Suppose, by way of contradiction, that $y_p \neq y_q$. By the induction hypothesis, in every linearization of $\alpha_k$, the order of enqueues is the same. The same holds for $\alpha_k \beta_1 \beta_2$ because q is idle at the end of $\alpha_k$, by the induction hypothesis, and so its enqueue in $\beta_1$ happens after all enqueues in $\alpha_k$. Suppose, without loss of generality, that enq($y_q$) precedes enq($y_p$) in every linearization of $\alpha_k \beta_1 \beta_2$. Consider the p-solo extension in which p completes $deq_p()$, ending with configuration D.
Since $deq_p()$ is critical on $y_p$ in C, it outputs $y_p$ in D. We claim that $deq_q()$ outputs $y_q$ in every extension from D. Otherwise, another dequeue outputs $y_p$ in some extension from D. Since this dequeue starts after $deq_p()$ completes, it must be linearized after $deq_p()$. This contradicts the linearizability of A, since in every linearization of $\alpha_k \beta_1 \beta_2$, enq($y_q$) precedes enq($y_p$). Therefore, $deq_q()$ is $y_q$-univalent in D. This contradicts Claim 2, since a step of p makes $deq_q()$ univalent on a non-⊥ value.
Note that there is no extension of C in which $deq_p()$ and $deq_q()$ output the same value because the enqueues in $\alpha_k$ have distinct values and in $\beta_1$, q enqueues z, a value that is not enqueued in $\alpha_k$. Assume that p is poised to access $R_p$ in C (i.e., the next step of p is on $R_p$) and that q is poised to access $R_q$ in C. If $R_p \neq R_q$, then in the p-solo extension of q(p(C)) in which p completes $deq_p()$, its output is $y = y_p = y_q$. But in p(q(C)), the local state of p and the state of the shared memory are the same as in q(p(C)). Hence, in the p-solo extension of p(q(C)), p completes $deq_p()$ with output y, as well. This contradicts the fact that $deq_q()$ is critical on y in C. Thus, $R_p = R_q = R$.

O P O D I S
A similar argument, by case analysis, shows that p and q must apply Test&Set primitives to R in C, and that the value of R is 0 in C. These facts are used in the proof of the next claim.

Claim 5. $deq_p()$ is not critical in q(C).
Proof. Let $y = y_p = y_q$ be the value that $deq_p()$ and $deq_q()$ are critical on in C. Suppose, by way of contradiction, that $deq_p()$ is critical on $y'$ in q(C). We have that $y' \neq y$.
Let $\gamma$ be an extension of $\alpha_k \beta_1 \beta_2\, q$ in which $deq_p()$ completes with some output. Write $\gamma = \lambda_1\, p\, \lambda_2$, where $\lambda_1$ is p-free ($\lambda_1$ might be empty). Since p and q are about to perform Test&Set primitives on R in C, the state of the shared memory and the local state of p are the same at the end of $\alpha_k \beta_1 \beta_2\, q\, \lambda_1\, p$ and $\alpha_k \beta_1 \beta_2\, q\, p\, \lambda_1$, because in both executions q is the first process accessing R (using Test&Set) and then when p accesses R (also using Test&Set), it gets false, no matter when it accesses R. Then, p is in the same local state in $\alpha_k \beta_1 \beta_2\, q\, \lambda_1\, p\, \lambda_2$ and in $\alpha_k \beta_1 \beta_2\, q\, p\, \lambda_1 \lambda_2$. We have that $deq_p()$ is critical in q(C), which implies that its output in $\alpha_k \beta_1 \beta_2\, q\, p\, \lambda_1 \lambda_2$ is $y'$, and thus the output of $deq_p()$ in $\alpha_k \beta_1 \beta_2\, q\, \lambda_1\, p\, \lambda_2$ is $y'$ too, since, as already said, the local state of p is the same in both executions. This implies that $deq_p()$ is univalent in q(C), contrary to our assumption that it is critical in q(C).
Let $\alpha_{k+1} = \alpha_k \beta_1 \beta_2\, q\, p\, \beta_3$, where $\beta_3$ is the q-solo extension in which q completes its $deq_q()$ ($\beta_3$ exists because A is wait-free). See Figure 1. We argue that $\alpha_{k+1}$ has the desired properties.
1. q is idle at the end of $\alpha_{k+1}$ because in $\beta_3$ it completes $deq_q()$ and does not start a new operation.
2. p has at least k + 1 steps of $deq_p()$ in $\alpha_{k+1}$, since p has at least k steps of $deq_p()$ in $\alpha_k$, by the induction hypothesis, and at least one step in $\beta_1 \beta_2\, q\, p\, \beta_3$.
3. By the induction hypothesis, in every linearization of $\alpha_k$, the enqueue operations follow the same order. The enqueue of q in $\beta_1$ happens after all enqueues in $\alpha_k$. Then, in every linearization of $\alpha_{k+1}$, the enqueues follow the same order. The enqueues in $\alpha_{k+1}$ enqueue distinct values because that is true for $\alpha_k$, by the induction hypothesis, and the enqueue of q in $\beta_1$ enqueues a value that is not in $\alpha_k$.
4. As argued above, there is no linearization of $\alpha_k \beta_1 \beta_2$ in which either $deq_p()$ or $deq_q()$ outputs ⊥. In $\beta_3$, q just completes $deq_q()$. Then, there is no linearization of $\alpha_{k+1}$ in which $deq_p()$ outputs ⊥.
5. Since $deq_p()$ is not critical in q(C), it is multivalent at the end of $\alpha_k \beta_1 \beta_2\, q\, p$. Since $\beta_3$ is q-solo, Claim 2 implies that $deq_p()$ is multivalent at the end of $\alpha_{k+1}$.
This yields an execution of A in which p executes an infinite number of steps but its $deq_p()$ operation does not complete, contradicting the wait-freedom of A.
As we just saw, any wait-free implementation of a queue from Test&Set must have nontrivial helping. This is not the case for stack implementations, as we show next.
Theorem 6. There is an n-process wait-free linearizable stack implementation from read/write and m-process Test&Set primitives, 2 ≤ m ≤ n, without nontrivial helping.
Proof. First, we show that an n-process wait-free linearizable Test&Set operation can be implemented from 2-process Test&Set without nontrivial helping. Afek et al. [2] present an n-process wait-free linearizable implementation of a Test&Set operation from 2-process one-shot swap primitives. Let us call this algorithm A. It is easy to check that A does not have nontrivial helping. It is also easy to implement one-shot 2-process swap from 2-process Test&Set without helping (the processes just use Test&Set to decide who swaps first), and hence, from A we can get an n-process wait-free linearizable implementation of a Test&Set operation from 2-process Test&Set without nontrivial helping. Let us call the resulting algorithm B.
Now, consider Afek et al.'s stack implementation [1] (Figure 2). We argue that the implementation does not have nontrivial helping: just note that there is no configuration C in which process q makes an operation of another process p v-univalent, v ≠ ⊥, because the only way a pop operation becomes univalent on a non-⊥ value is by winning the Test&Set in line 6; thus, it is impossible that a multivalent pop operation by p in C becomes univalent on a non-⊥ value in q(C), with q ≠ p.
Afek et al. proved that one can get an n-process wait-free linearizable Fetch&Add from 2-process wait-free linearizable Test&Set primitives [2]. Let C be this implementation. Moreover, the stack implementation obtained by using B and C in Afek et al.'s stack has no nontrivial helping because, as mentioned already, B has no helping (in the sense described above) and Afek et al.'s stack implementation has no such helping either. Note that it does not matter whether C has helping or not (trivial or nontrivial) because, as already pointed out, the only way a pop operation can become univalent on a non-⊥ value is by winning the Test&Set in line 6, hence C cannot change this.
From Theorems 3 and 6, we get that nontrivial helping is a distinguishing factor between stacks and queues: while a stack can be implemented without nontrivial helping from read/write and Test&Set, any implementation of a queue from the same primitives necessitates nontrivial helping. Although the stack implementation of Theorem 6 is without nontrivial helping, it does have trivial helping. An example is when a process p reads the counter range in line 3 when there is only a single non-⊥ value in T[1, ..., t] (where t is the value that p reads), and then a process q ≠ p reads range after p and takes the only non-⊥ value in T[1, ..., t] (namely, q overtakes p). When q wins in line 6, it makes p's pop operation ⊥-univalent because p will scan the range T[1, ..., t] without seeing any non-⊥ value, and will therefore return ⊥ in line 7.
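The overtaking scenario can be replayed on a small model of a stack in the spirit of Afek et al.'s algorithm [1]. Figure 2 is not reproduced in this text, so the sketch below is an assumption-laden reconstruction: the names `AfekStyleStack`, `read_range`, and `scan_from` are hypothetical, the line numbers of Figure 2 do not map onto it, the array has a fixed capacity, and a lock models the atomicity of Fetch&Add and Test&Set. A push reserves a slot with Fetch&Add and writes its value; a pop reads the counter and scans downwards, claiming a value by winning the Test&Set of its slot.

```python
import threading

BOTTOM = None  # stands in for ⊥


class AfekStyleStack:
    def __init__(self, capacity=1024):
        self._range = 0                           # Fetch&Add counter
        self._items = [BOTTOM] * (capacity + 1)   # slots 1..capacity
        self._taken = [False] * (capacity + 1)    # one Test&Set bit per slot
        self._lock = threading.Lock()             # models primitive atomicity only

    def _fetch_and_add(self, delta):
        with self._lock:
            old = self._range
            self._range += delta
            return old

    def _test_and_set(self, i):
        with self._lock:
            old = self._taken[i]
            self._taken[i] = True
            return old  # False only for the unique winner of slot i

    def push(self, x):
        i = self._fetch_and_add(1) + 1  # reserve the next slot
        self._items[i] = x

    def read_range(self):
        return self._fetch_and_add(0)   # first step of a pop

    def scan_from(self, t):
        # Remaining steps of a pop: scan slots t..1 and try to claim a value.
        for i in range(t, 0, -1):
            x = self._items[i]
            if x is not BOTTOM and not self._test_and_set(i):
                return x
        return BOTTOM

    def pop(self):
        return self.scan_from(self.read_range())
```

Splitting pop into `read_range` and `scan_from` lets a test interleave two pops by hand: if p reads the range, q then pops the only value, and p resumes its scan, p returns ⊥. The step of q made p's outcome ⊥, which is trivial helping in the sense of Definition 1; p can obtain a non-⊥ value only by winning a Test&Set itself.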

Universal (Linearization-Based) Helping
In this section we propose another formalization of helping, in which a process ensures that operations by other processes are eventually linearized. This definition captures helping mechanisms such as the one used in Herlihy's universal wait-free construction [12]. We evaluate the power of this helping mechanism via consensus and compare it with the valency-based helping notion studied in Section 3. Throughout this section, we assume, without loss of generality, that the first step of every operation is to publish its signature (i.e., the operation type and its parameters) to the shared memory so that it may be helped by other processes. An operation is pending if it has published its signature but did not yet terminate.

Definition 7 (Universal (linearization-based) helping). Consider an n-process wait-free linearizable implementation of a data type T. The implementation has universal helping if every infinite extension $\alpha\beta$ of a finite history $\alpha$ has a finite prefix $\gamma$ with a linearization $lin(\gamma)$, which satisfies the following conditions:
$lin(\gamma)$ contains every pending high-level operation of $\alpha$ (see Figure 3), in addition to all high-level operations that complete in $\gamma$.
Every extension $\gamma\lambda$ of $\gamma$ has a linearization $lin'(\gamma\lambda)$ such that $lin(\gamma)$ is a prefix of it.
If the above conditions are satisfied for every $\gamma$ such that some process completes f(n) or more high-level operations in the extension $\gamma - \alpha$, for some function $f : \mathbb{N} \to \mathbb{N}$, then we say the implementation has f-universal helping.
Universal helping requires that the progress of some processes eventually ensures that all pending invocations are linearized and all processes make progress. f-universal helping bounds from above the number of high-level operations a process needs to perform in order to ensure the progress of other processes.
Theorem 8. Let B be an n-process nonblocking linearizable implementation of a queue or stack. If B has f-universal helping, then n-process consensus can be solved using B and read/write registers.
Proof. First assume that B implements a queue. Figure 4 shows the pseudocode of an algorithm that solves consensus using B and read/write registers. Each process $p_i$ first writes its proposal to Vals[i] (initialized to ⊥) in Line 01 and then performs f(n) + 1 enq(i) operations in Line 02.
To solve consensus, $p_i$ computes a snapshot that reads the state of the queue from the shared memory to a local variable S (Line 03) and then invokes a single deq() operation using state S in Line 04 (we say that $p_i$ locally simulates the deq() operation). Finally, $p_i$ decides (in Line 05) on the value proposed by the process whose identifier it dequeues in Line 04. The snapshot in Line 03 is taken as follows. In all executions of B in which each process executes at most f(n) + 1 enqueue operations, processes access a finite set of base objects in the shared memory. Let R be the set of all base objects accessed in those executions. Then, processes use any read/write wait-free snapshot algorithm to take a snapshot of R. We now prove that the algorithm is correct.
Termination. Every correct process decides, as each process invokes a finite number of operations of B, which is nonblocking.
Validity. The view stored to S in Line 03 represents a state of B in which the queue is non-empty, since at least a single enq() operation completed and no deq() operations were invoked. Moreover, if $p_i$ gets d from its local simulation, $p_d$ participated in the execution. It follows that every correct process $p_i$ decides on a proposed value.
Agreement. We prove that all correct processes dequeue the same value in Line 04, from which agreement follows easily. Let E be an execution of the algorithm of Figure 4. Let $p_i$, $p_j$ be two distinct correct processes. Let $\alpha_i$ (resp. $\alpha_j$) be the shortest prefix of E in which the first enqueue operation performed by $p_i$ (resp. $p_j$) in Line 02 completes. Let $\gamma_i$ (resp. $\gamma_j$) denote the shortest extension of $\alpha_i$ (resp. $\alpha_j$) in which $p_i$ (resp. $p_j$) completes the last enqueue operation in Line 02. WLOG, assume that $\gamma_i$ is a prefix of $\gamma_j$, that is, $p_i$ is the first to complete Line 02. In $\gamma_i$, $p_i$ completed f(n) enq() operations after completing its first enqueue operation on B. Consequently, Definition 7 guarantees that its first enq(i) operation, as well as any operations that preceded it and pending operations that were concurrent with it, are linearized in $lin(\gamma_i)$ and their order is fixed.
A similar argument gives the same result for stacks. The difference is that in the local simulation, a process simulates pop operations until it gets an empty response, and then decides on the proposed value of the process whose identifier was popped last (hence, pushed first).
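The control flow of the queue-based algorithm (Lines 01 to 05 as described in the proof) can be sketched as follows. This is only an illustration under strong assumptions: Figure 4 is not reproduced here, the names `AtomicQueue`, `make_consensus`, and `run_all` are hypothetical, and the stand-in queue is made atomic with a lock, so its snapshot is a plain copy of the state rather than a read/write snapshot of the base objects of an implementation B with f-universal helping.

```python
import threading
from collections import deque


class AtomicQueue:
    """Stand-in for B; a lock makes each operation atomic (an assumption)."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def enq(self, x):
        with self._lock:
            self._items.append(x)

    def snapshot(self):
        with self._lock:
            return deque(self._items)  # private copy for local simulation


def make_consensus(n, f_n):
    """Return propose(i, value) for n processes; f_n plays the role of f(n)."""
    vals = [None] * n          # Vals[0..n-1], initialised to ⊥ (None)
    queue = AtomicQueue()

    def propose(i, value):
        vals[i] = value                  # Line 01: publish the proposal
        for _ in range(f_n + 1):         # Line 02: f(n) + 1 enq(i) operations
            queue.enq(i)
        state = queue.snapshot()         # Line 03: snapshot of the queue state
        d = state.popleft()              # Line 04: locally simulated deq()
        return vals[d]                   # Line 05: decide on p_d's proposal

    return propose


def run_all(values, f_n=2):
    """Run one thread per proposal and return the list of decisions."""
    propose = make_consensus(len(values), f_n)
    decisions = [None] * len(values)

    def worker(i):
        decisions[i] = propose(i, values[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Because no process ever dequeues from the shared queue, the head observed in any snapshot is the identifier of the first linearized enqueue, which is why all local simulations return the same identifier.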
Herlihy's universal construction [12] has f-universal helping. Thus, Theorem 8 implies that, for stacks and queues, Herlihy's construction uses the full power of n-consensus in the sense that the resulting implementations can actually be used to solve n-consensus.
The next lemma will be used to show that for queues and stacks, universal helping implies nontrivial helping.

Lemma 9. Let T be a data type with two operations put(x) and get() such that, for distinct processes p and q, there is an infinite sequential execution S of T containing only put operations by q with a prefix $S'$ such that:
P1: For every prefix $S' \cdot S''$ of S, in every sequential extension $S' \cdot S'' \cdot p.get() : return\ y$, $y \neq \bot$ holds, where ⊥ is the initial state of T.
P2: In every pair of sequential extensions $S' \cdot p.get() : return\ y_1 \cdot q.get() : return\ z_1$ and $S' \cdot q.get() : return\ z_2 \cdot p.get() : return\ y_2$, $y_1 \neq y_2$ holds.
Then, any wait-free linearizable implementation of T with universal helping also has nontrivial helping.
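Properties P1 and P2 can be sanity-checked on the sequential specifications of a stack and a queue, with put standing for push or enq and get for pop or deq. The sketch below is a finite test, not a proof: it takes S to be puts of distinct values 1, 2, ..., examines only a bounded horizon of prefixes, and all function names are hypothetical.

```python
from collections import deque


def run(kind, ops):
    """Apply a sequential history to an empty stack or queue; return get outputs."""
    items = deque()
    outs = []
    for op in ops:
        if op[0] == "put":
            items.append(op[1])
        elif not items:
            outs.append(None)             # ⊥ on an empty object
        elif kind == "stack":
            outs.append(items.pop())      # newest value
        else:
            outs.append(items.popleft())  # oldest value
    return outs


def check_lemma9_properties(kind, prefix_len=1, horizon=6):
    """Return (P1 holds, P2 holds) for S' = first prefix_len puts of S."""
    S = [("put", v) for v in range(1, horizon + 1)]  # distinct values
    S1 = S[:prefix_len]
    # P1: after any prefix of S extending S', a get returns a non-⊥ value.
    p1 = all(
        run(kind, S[:k] + [("get",)])[0] is not None
        for k in range(prefix_len, horizon + 1)
    )
    # P2: p's get returns different values when it runs first or second.
    # The sequential specification ignores process identities, so both
    # orders are the same operation sequence and only p's position differs.
    outs = run(kind, S1 + [("get",), ("get",)])
    y1, y2 = outs[0], outs[1]
    return p1, y1 != y2
```

With a one-element prefix, the second get returns ⊥ while the first returns the value, so P2 still holds.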
Proof. Consider sequential executions S and $S'$ of T as the lemma assumes. Let A be a wait-free linearizable implementation of T with universal helping. Let $\alpha$ be an execution of A in which q completes alone all operations in $S'$, in that order. We claim that a $get_p()$ by p is multivalent in the configuration at the end of $\alpha$ (note that at the end of $\alpha$, $get_p()$ has not even started): by property P2, the output of $get_p()$ in the extension of $\alpha$ in which $get_p()$ is completed alone and then a $get_q()$ by q is completed alone is different from the output in the extension in which the operations are completed in the opposite order. Now, let $\alpha'$ be an extension of $\alpha$ in which p executes alone a $get_p()$ until the operation is critical. Let $\beta$ be an infinite extension of $\alpha'$ in which q runs alone and executes the operations in $S - S'$, in that order. Since A has universal helping, there is a finite prefix $\gamma$ of $\beta$ such that there is a linearization $lin(\gamma)$ containing $get_p()$. Moreover, for every extension $\lambda$ of $\gamma$, there is a linearization $lin'(\lambda)$ such that $lin(\gamma) = lin'(\gamma)$. Intuitively, this means that the linearization order of $get_p()$ in $\gamma$ is fixed, hence it is univalent at the end of $\gamma$. We formally prove this.
Let v be the return value of $get_p()$ in $lin(\gamma)$. We claim that $get_p()$ is v-univalent in the configuration at the end of $\gamma$. Let $\lambda$ be any extension of $\gamma$ in which $get_p()$ is completed. Let u be the output value of $get_p()$ in $\lambda$. Since $get_p()$ is completed in $\lambda$, any linearization of $\lambda$ contains $get_p()$. As noted above, there is a linearization $lin'(\lambda)$ of $\lambda$ such that $lin(\gamma) = lin'(\gamma)$, which implies that u = v. We conclude that $get_p()$ is v-univalent at the end of $\gamma$. We now show that v ≠ ⊥. Observe that $lin(\gamma)$ must have the form $S' \cdot S'' \cdot get_p() : return\ v \cdot S'''$, for some sequences $S''$ and $S'''$ of put operations by q. Note that $S' \cdot S'' \cdot S'''$ is a prefix of S, since q executes its operations in the order they appear in S. Thus, by property P1, it follows that v ≠ ⊥.
Finally, since $get_p()$ is multivalent at the end of $\alpha$ and it is univalent on a non-⊥ value at the end of $\gamma$, there must be a prefix of $\gamma$ that ends in a configuration C in which $get_p()$ is multivalent but it is univalent in q(C) on a non-⊥ value. Therefore, A has nontrivial helping.
Corollary 10. A wait-free linearizable implementation of a queue or stack with universal helping has nontrivial helping.
Proof. For the case of the stack, S is the infinite execution in which some process q performs a sequence of push operations with distinct values and $S'$ is any non-empty prefix of S. The case of the queue is defined similarly.
Figure 5 presents a stack implementation that has nontrivial helping but not universal helping, as established by Lemma 14 in the appendix. It augments the wait-free stack of Afek et al. [1] with a helping mechanism, added by lines 01-05 in push and lines 08-14 in pop. Each process $p_i \neq p_n$ that wants to push value x first checks if $p_n$'s current pop operation is pending (lines 02-03), and if so, tries to help $p_n$ by directly giving x to its current pop operation. If $p_i$ succeeds in updating H[j] in line 05, then it does not access the items array. In that case, $p_n$ is not able to update H[j] in line 11, implying that it must take the value in h_items that $p_i$ left for it (lines 12-14). If $p_n$ manages to update H[j] in line 11, then no process succeeded in helping it and it proceeds as in Afek et al.'s stack (lines 15-23). Similarly, processes whose push operation fails to help $p_n$ proceed as in Afek et al.'s stack (lines 06-07).
In Appendix A, we prove that the algorithm in Figure 5 is a wait-free linearizable implementation of a stack that has nontrivial helping but not universal helping. Together with Lemma 9, this implies that, for stacks, universal helping is strictly stronger than nontrivial helping.

Related Notions
This section compares valency-based helping and universal helping to the definition of helping in [3] and the notion of strong linearizability [9].

Relation to the help definition of [3]
Helping is formalized in [3] as follows. A linearizable implementation of a concurrent object has helping, which we call here linearization-based helping, if there is an execution $\alpha$ with distinct operations $op_1$ and $op_2$ by p and q, such that
1. there are linearizations $lin(\alpha)$ and $lin'(\alpha)$ such that $op_1$ precedes $op_2$ in $lin(\alpha)$ and $op_2$ precedes $op_1$ in $lin'(\alpha)$, and
2. in every linearization of $\alpha \cdot r$, for some r ≠ p (possibly r = q), $op_1$ precedes $op_2$.
In a sense, valency-based helping and linearization-based helping are incomparable. On the one hand, valency-based helping allows us to distinguish stacks and queues, as queues need nontrivial (valency-based) helping and stacks do not (Theorems 3 and 6), while both stacks and queues necessarily have linearization-based helping [3]. On the other hand, valency-based helping cannot capture helping among enqueues, as they always return true. Nevertheless, enqueues are taken into account, since the dequeues in an execution reveal how the helping mechanism determines the order of enqueues.

which each process is helped just once. The resulting implementation has linearization-based helping but not universal helping because universal helping requires that every pending operation is eventually linearized, which does not happen once every process in the execution has been helped, since from this point on some operations may be blocked forever.
Theorem 11. For every data type T, every nonblocking or wait-free implementation of T with universal helping has linearization-based helping, while the opposite is not necessarily true.

Relation to strong linearizability [9]
Roughly speaking, an implementation of a data type is strongly linearizable [9] if once an operation is linearized, its linearization order cannot be changed in the future. More specifically, there is a function L mapping each execution to a linearization, and the function is prefix-closed: for every two executions $\alpha$ and $\beta$, if $\alpha$ is a prefix of $\beta$, then $L(\alpha)$ is a prefix of $L(\beta)$.
In a sense, universal helping can be thought of as a sort of eventual strong linearizability. For every execution $\alpha$, as it is extended, there is eventually an extension $\alpha'$ with a linearization $lin(\alpha\alpha')$ such that for every execution $\beta$, if $\alpha\alpha'$ is a prefix of $\beta$, then there is a linearization $lin'(\beta)$ with $lin(\alpha\alpha') = lin'(\alpha\alpha')$. We stress that universal helping provides the property that pending operations are linearized eventually, which is not guaranteed by strong linearizability.
The simulation in the proof of Theorem 8 solves consensus because from some point on, all processes agree on a first operation and this agreement cannot be changed as a result of future steps. The following theorem can be proven using a simulation similar to the one in the proof of Theorem 8, with the difference being that each process only needs to complete a single enqueue because the linearization order of that operation does not change in the future.
Theorem 12. Let B be an n-process strongly-linearizable nonblocking implementation of a queue (stack). Then, n-process consensus can be solved from B.
The previous theorem shows that, for some data types, strong linearizability for n processes can only be obtained through consensus number n; thus strong linearizability is costly, even if we are looking for nonblocking implementations. However, for stacks, linearizability can be obtained from consensus number 2 as there are wait-free stack implementations from Test&Set [1].

Corollary 13. There is no n-process strongly-linearizable nonblocking implementation of a queue (stack) from primitives with consensus number less than n.
All previous impossibility results on strongly-linearizable implementations that we are aware of consider only implementations from consensus-number 1 base objects [7,10].

Discussion
We have considered two ways to formalize helping in implementations of shared objects, one that is based on operation valency and another that is based on possible linearizations.
We used these notions to study the kind of helping needed in wait-free implementations of queues and stacks, from Test&Set and stronger primitives. In this work we used an ad-hoc definition of nontrivial helping for queues and stacks, but this notion can be generalized by defining two disjoint sets of output values, trivial and nontrivial, and defining trivial and nontrivial helping accordingly.

Proof. We first prove that the implementation is linearizable (clearly, it is wait-free). Let α be an execution of the algorithm. Intuitively, we show that there is an execution γ of Afek et al.'s stack implementation (see Figure 2) such that the operations in γ respect the real-time order of the operations in α and the outputs are the same. Thus, α is linearizable since γ is linearizable.
In any execution of the algorithm, a push_i(x) operation of p_i matches the j-th pop_n() operation of p_n if p_i successfully updates H[j] during the execution. We call such a pair of operations a matching.
Let k be the number of matchings in α. By induction on k, we show that α is linearizable. If k = 0, then α is linearizable because it corresponds to some execution of Afek et al.'s implementation. Suppose that the claim holds for k − 1. Below we show that it holds for k.
Let push_i(x) by p_i and pop_n() by p_n be the k-th matching in α. Note that push_i(x) and pop_n() are concurrent in α. Moreover, the Compare&Swap in line 05 of push_i(x) precedes the Compare&Swap in line 11 of pop_n() (if p_n ever executes it).
We now construct an execution α' that is easier to reason about than α. Let β be the longest prefix of α that does not have the Compare&Swap in line 05 of push_i(x). Thus, in the configuration at the end of β, p_i is about to perform the Compare&Swap in line 05, and p_n is about to perform the Compare&Swap in line 11.
Let β_i and β_n respectively denote the subsequences of α − β containing only the steps of push_i(x) and pop_n(). Let λ be the subsequence of α − β obtained by removing the steps of β_i and β_n.
Then, α' is the execution β β_i β_n λ. Intuitively, in α', the steps of push_i(x) and pop_n() are placed together.
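The construction of α' is a purely syntactic regrouping of the steps of α, which the following Python sketch makes explicit. The encoding of a step as a pair (operation, label) and the labels such as `cas05` are assumptions made for illustration; they are not part of the algorithm.

```python
def regroup(alpha, push_op, pop_op, cas_label):
    """Return beta + beta_i + beta_n + lam for the execution alpha.

    alpha is a list of (operation, label) steps (hypothetical encoding).
    beta is the longest prefix without the given Compare&Swap of push_op.
    """
    cut = next(
        (k for k, step in enumerate(alpha) if step == (push_op, cas_label)),
        len(alpha),
    )
    beta, rest = alpha[:cut], alpha[cut:]
    beta_i = [s for s in rest if s[0] == push_op]   # remaining steps of the push
    beta_n = [s for s in rest if s[0] == pop_op]    # remaining steps of the pop
    lam = [s for s in rest if s[0] not in (push_op, pop_op)]
    return beta + beta_i + beta_n + lam


alpha = [
    ("other", "w1"), ("push_i", "read"), ("pop_n", "read"),
    ("push_i", "cas05"), ("other", "w2"), ("pop_n", "cas11"),
    ("push_i", "ret"), ("other", "w3"), ("pop_n", "ret"),
]
alpha_prime = regroup(alpha, "push_i", "pop_n", "cas05")
for step in alpha_prime:
    print(step)
```

The regrouping keeps every step and preserves the relative order of the steps within each operation; only the interleaving after β changes.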
It can be seen that there is no process that can distinguish between α and α': since neither p_i nor p_n change items or range in push_i(x) and pop_n(), the position in the execution when they take the steps in β_i and β_n does not affect other operations. Moreover, α' respects the real-time order in α: if an operation op_1 precedes op_2 in α, op_1 also precedes op_2 in α'. Although there may be concurrent operations in α that are not concurrent in α', this is not a problem for linearizability. Therefore, if α' is linearizable, then α is linearizable too. We now show that it is.
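The claim about real-time order can be stated as a containment between precedence relations. In the hypothetical Python sketch below, an execution is a list of (operation, event) pairs, and an operation a precedes b if the last step of a comes before the first step of b. The sketch assumes that all listed operations are complete in both executions.

```python
def precedes(execution):
    """Set of pairs (a, b) such that operation a ends before b starts."""
    first, last = {}, {}
    for k, (op, _) in enumerate(execution):
        first.setdefault(op, k)  # index of the first step of op
        last[op] = k             # index of the last step of op
    return {(a, b) for a in last for b in first if a != b and last[a] < first[b]}


def respects_real_time(original, reordered):
    """True if every precedence in original also holds in reordered."""
    return precedes(original) <= precedes(reordered)


alpha = [("a", "inv"), ("a", "res"), ("b", "inv"),
         ("c", "inv"), ("b", "res"), ("c", "res")]
alpha_prime = [("a", "inv"), ("a", "res"), ("b", "inv"),
               ("b", "res"), ("c", "inv"), ("c", "res")]

print(respects_real_time(alpha, alpha_prime))  # True
print(respects_real_time(alpha_prime, alpha))  # False
```

In this example b and c are concurrent in the first execution but ordered in the second. The second execution only adds a precedence, so any linearization of it is also consistent with the real-time order of the first, while the reverse direction fails.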
Consider the following execution γ that starts with β and then:
1. p_n executes the Compare&Swap in line 11 (hence sets H[j]).
2. p_i executes three consecutive steps, which correspond to lines 05, 06 and 07 (because it cannot set H[j]).
3. If pop_n() (by p_n) is completed in α', p_n completes it in γ (thus it outputs x).
4. If push_i(x) is completed in α', p_i completes it in γ.
5. λ is appended at the end.
Thus, in γ, p_n is about to take its output from items, p_i places x in items (at the top of the stack) and p_n takes it from there. The steps of λ, following push_i(x) and pop_n(), proceed as in α and the only difference is that the Fetch&Add in lines 06 and 15 outputs in γ an integer larger than in α, since p_i adds 1 to range in γ in operation push_i(x).
Also observe that γ respects the real-time order in α'. By induction hypothesis, γ is linearizable, since it has k − 1 matchings. Let lin(γ) be a linearization of γ. From the properties of γ just described, it follows that lin(γ) is actually a linearization of α' as well, hence a linearization of α. Therefore, the implementation is linearizable.
We now show that the algorithm has nontrivial helping. Starting at the initial configuration, let α be the execution in which p_n completes alone a push(1) operation and then starts a pop() operation and stops just before executing the Compare&Swap in line 11. Then, p_1 starts a push(2) operation and stops just before executing the Compare&Swap in line 05.
Let C be the configuration at the end of α. We claim that the pop() is multivalent in C. Indeed, let x ≥ 3. In the extension of α in which first p_2 completes a push(x) alone and then p_n completes its pop(), the output of the pop() is x. Also, note that pop() is 2-univalent in p_1(C) because there is no extension of p_1(C) in which p_n updates H[1] in line 11, so if it ever returns a value, this must be the value in h_items[1][1] (which is 2).
Finally, we prove that the implementation has no universal helping. Starting at the initial configuration, let α be the execution in which p_1 starts pop() and stops before executing the Fetch&Add in line 15. Let β be the infinite extension of α in which p_2 completes alone (infinitely many) push operations with distinct values. If the algorithm had universal helping, then there would be a finite prefix γ of β with a linearization lin(γ) containing pop(), such that for every extension λ of γ, there is a linearization lin'(λ) with lin(γ) = lin'(γ).
Let γ be such a prefix of β and let λ be the extension of γ in which p_2 completes any pending operation in γ and a push(x), where x is greater than any value in γ. Let λ' be the extension of λ in which p_1 completes its pop() operation. Observe that p_1's operation outputs x in λ'. Moreover, there is no linearization lin'(λ') of λ' with lin(γ) = lin'(γ) because push(x) does not appear in γ. Thus, the implementation has no universal helping.
Termination. Every correct process decides.
Agreement. Processes decide on the same value.
Validity. Processes decide proposed values.
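These three properties can be phrased as checks over a finished run. The sketch below assumes a hypothetical encoding in which `proposals` and `decisions` map process names to values and `correct` is the set of processes that do not crash. It tests a single run after the fact and does not itself solve consensus.

```python
def check_consensus(proposals, decisions, correct):
    """Evaluate the three consensus properties on one finished run."""
    decided = set(decisions.values())
    return {
        # Termination: every correct process has decided.
        "termination": all(p in decisions for p in correct),
        # Agreement: at most one distinct value was decided.
        "agreement": len(decided) <= 1,
        # Validity: every decided value was proposed by some process.
        "validity": decided <= set(proposals.values()),
    }


proposals = {"p1": "a", "p2": "b", "p3": "c"}
good = check_consensus(proposals, {"p1": "b", "p2": "b"}, correct={"p1", "p2"})
bad = check_consensus(proposals, {"p1": "a", "p2": "b"}, correct={"p1", "p2", "p3"})
print(good)
print(bad)
```

In the first run p3 is assumed to have crashed, so it is not required to decide. The second run violates termination (p3 is correct but never decides) and agreement, while validity still holds.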

Figure 3 Universal helping: every pending operation is eventually linearized.

Figure 4 Solving consensus using B.
These notions might facilitate further study of the relations between nonblocking and wait-free implementations.

Since no dequeue operations are applied to (the shared copy of) B, lin(γ_i) consists of enqueue operations only. Let enq(k) be the first operation in lin(γ_i). Let β_i (resp. β_j) denote the shortest prefix of E in which p_i (resp. p_j) completes Line 04. Since γ_i is a prefix of both β_i and β_j, it follows from Definition 7 that there are linearizations lin'(β_i) and lin'(β_j) in which enq(k) is the first operation. It follows in turn that the dequeue operations of both p_i and p_j in Line 04 return k, hence they both decide on vals[k] in Line 05.
OPODIS 2015