Brief Announcement: Time, Fences and the Ordering of Events in TSO
Abstract
Total Store Order (TSO) is one of the most popular relaxed memory models in multiprocessor architectures, widely implemented, for example, in Intel’s x86 and x64 platforms. It delays write visibility via store buffers, thereby allowing a significant improvement in efficiency. This, however, complicates reasoning about correctness, as executions may violate sequential consistency, and synchronization primitives must sometimes be used to restore it. We present a semantic framework with effective tools that pinpoint when such synchronization is necessary under TSO. We define a TSO-specific occurs-before relation, adapting Lamport’s happens-before to TSO, and prove that events at different sites can be temporally ordered only via an occurs-before chain. Analyzing how fences and RMWs create these chains lets us identify when they are unavoidable. We present in this BA how these results impact linearizable implementations of registers, capturing information flow and causality in TSO. The full version of this work provides details, as well as results regarding the need for synchronization in linearizable implementations of additional objects.
Keywords and phrases: TSO, linearizability, happens before, fences, synchronization actions
2012 ACM Subject Classification: Theory of computation → Distributed computing models; Theory of computation → Concurrent algorithms
Editor: Dariusz R. Kowalski
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction and Related Work
Modern multiprocessors rely on relaxed memory models to improve performance through techniques such as store buffering and out-of-order execution. Among these, Total Store Order (TSO) – used in Intel’s x86 and x64 architectures – is one of the most widely deployed. TSO increases efficiency by allowing writes to be held temporarily in per-process store buffers, deferring their visibility to other processors. However, this optimization breaks sequential consistency (SC) [11], making reasoning about correctness more difficult: a program that is correct under SC may fail under TSO due to delayed write visibility. To ensure correct behavior under TSO, programmers can use synchronization primitives such as memory fences and read-modify-write (RMW) operations. These enforce memory ordering by flushing buffers or performing atomic accesses. While effective, they limit concurrency and reduce performance [4, 9, 17]. Determining when such synchronization is truly necessary is therefore a central question in the study of weak memory models such as TSO.
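As an illustration (not taken from the paper), the classic store-buffering litmus test shows why TSO executions can violate SC: each process buffers its own write and then reads the other’s variable from main memory before either write has been dispatched. A minimal Python sketch of this scenario, with our own names throughout:

```python
# Illustrative sketch: the store-buffering (SB) litmus test.
# Under SC at least one process must read 1; under TSO, buffered
# writes allow both reads to return 0.

def run_sb(dispatch_before_reads):
    mem = {"x": 0, "y": 0}
    buf = {"p": [], "q": []}  # per-process FIFO store buffers

    def write(p, var, val):
        buf[p].append((var, val))          # writes go to the buffer first

    def read(p, var):
        for v, val in reversed(buf[p]):    # own pending writes are visible
            if v == var:
                return val
        return mem[var]                    # otherwise read main memory

    def dispatch_all(p):
        while buf[p]:
            var, val = buf[p].pop(0)       # oldest buffered write reaches memory
            mem[var] = val

    write("p", "x", 1)
    write("q", "y", 1)
    if dispatch_before_reads:              # a fence after each write would force this
        dispatch_all("p"); dispatch_all("q")
    r1 = read("p", "y")
    r2 = read("q", "x")
    return r1, r2
```

Without dispatching, both reads return 0, an outcome no SC interleaving of the four operations allows; forcing the buffers to drain first (as a fence would) restores the SC outcomes.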
Lamport’s happens-before relation [10] is central to asynchronous computing, underlying vector clocks, race detection, causal memory [12, 7, 1], and more. Happens-before was originally defined for asynchronous message-passing models, where it captures all of the information about timing that processes can obtain. In a recent paper, we proved a theorem called Delaying the Future (DtF) that provides a close formal connection between happens-before and the ability to reorder actions and events in asynchronous message-passing systems [14]. Roughly speaking, DtF implies that if there is no message chain between an operation O and an operation O′ in a given run r, then there is a run that is indistinguishable to all processes from r in which O takes place after O′ does in real time. This is of importance, for example, when considering linearizable implementations of concurrent objects, since the real-time order of operation invocations and completions plays a central role in the definition of linearizability [8]. The happens-before relation has been considered in different models, including TSO [3]. In this work, we introduce a new definition for the analogue of happens-before in TSO, which we call the occurs-before relation. One of the differences is that our definition stems from an operational model of TSO, whereas previous definitions such as that of [3] are formulated using declarative models. As a result, our approach employs a lower level of abstraction, making it easier to derive concrete implementation constraints.
We demonstrate the analogy between the classical happens-before relation over message chains and our occurs-before relation by proving a version of the DtF theorem for TSO, obtained by replacing “happens before” with “occurs before.” This makes it possible to prove necessary conditions on the use of synchronization in linearizable TSO implementations, much in the spirit and style of what [14] does for linearizable register implementations in asynchronous message passing.
Related Work.
Prior work has tackled the need for synchronization actions in weak memory models from both practical and theoretical perspectives. Automated tools attempt to minimize fences while preserving correctness [2, 17], and impossibility results show that synchronization is sometimes unavoidable.
Attiya et al. [4] and Castañeda et al. [6] are most directly related to our TSO results. Both works establish conditions under which linearizable implementations require the use of synchronization operations. In [4], covering arguments are used to show that precise communication patterns involving reads and writes are unavoidable in implementing classic and widely used specifications. Their results apply to objects whose methods are strongly non-commutative – such as sets, queues, or stacks – but not to registers. The more recent [6] proves a mergeability theorem for traces of TSO and related weak memory models, and applies it to obtain results about objects with one-sided non-commutative methods, such as registers. They show that linearizable TSO implementations of such objects, including registers, must use fences or RMWs in some executions. In [14], a DtF theorem is proved for asynchronous message-passing systems; it is then used to prove the need to construct message chains between operations in linearizable register implementations in that model.
The current paper can be viewed as performing an analysis for TSO analogous to that of [14]. Our analysis highlights the role of synchronization primitives, extending lower bounds on their use in TSO. Practically, such lower bounds identify the synchronization mechanisms that implementations must employ, and indicate when further attempts to remove them would be futile.
2 Model and Preliminary Definitions
This section outlines the main features of our model; the reader should consult the full version [15] for additional details. The model is based on the operational definition of TSO given in [6]. (Similar, though slightly different, operational models for TSO have appeared, e.g., in [16, 5].) It consists of a set of processes and a finite set of shared variables. A basic TSO state is a pair (m, B), where m describes the main memory and B assigns a queue called a store buffer to every process. Processes can perform actions of the forms read(x), write(x, v), fence, and RMW(x, v, v′). These correspond, respectively, to reading the value of x (from the local buffer if it contains an unexecuted write to x, and from the physical variable in memory otherwise); writing the value v to x (the pair (x, v) is appended to the local buffer); performing a fence (which can occur only when the local buffer is empty); and performing a read-modify-write action (which can occur only if the buffer is empty and the value of x in the memory is v, and whose effect is to write v′ directly to the memory). To account for propagating values from the store buffer to variables in memory, we associate with each process p a “local dispatcher” component dp with a single action, dispatch, whose impact is to propagate a write command from the buffer queue to the memory. We call processes and dispatchers agents. Reads and writes are always enabled, while fence, RMW, and dispatch actions can take place only if specific preconditions hold.
We think of an action taking place as an event. It is convenient to consider two types of read events: readB(x) and readM(x). The former occurs when the value of x is read from the store buffer, and the latter when the value of x is read from the variable in memory. In both instances, the value returned by the read action is the value v that was read.
Using B|x to denote the restriction of a store buffer B to pairs of the form (x, v), i.e., to writes to x, the preconditions and effects of the event of a single action being performed in a TSO state are defined formally in the full version [15].
A TSO action is considered a memory access of the variable x if it is either an RMW on x, a dispatch action resulting in a write to x in memory, or a read action resulting in a readM(x) event.
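The transition rules described in prose above can be made concrete with a minimal executable sketch. The class and method names below are ours, not the paper’s notation; the sketch implements buffer-else-memory reads, buffered writes, the emptiness preconditions of fence and RMW, and the dispatcher:

```python
# A minimal operational sketch of the TSO rules described above
# (names are ours). Preconditions are checked explicitly.

class TSOState:
    def __init__(self, variables):
        self.mem = {x: 0 for x in variables}   # main memory m
        self.buf = {}                          # per-process store buffers B

    def _buffer(self, p):
        return self.buf.setdefault(p, [])

    def write(self, p, x, v):
        self._buffer(p).append((x, v))         # always enabled: append to buffer

    def read(self, p, x):
        pending = [v for (y, v) in self._buffer(p) if y == x]
        if pending:
            return ("readB", pending[-1])      # newest unexecuted write to x
        return ("readM", self.mem[x])          # otherwise read main memory

    def fence(self, p):
        # precondition: p's buffer is empty
        if self._buffer(p):
            raise RuntimeError("fence is enabled only on an empty buffer")

    def rmw(self, p, x, old, new):
        # precondition: empty buffer and mem[x] == old; writes directly to memory
        if self._buffer(p) or self.mem[x] != old:
            raise RuntimeError("RMW precondition violated")
        self.mem[x] = new

    def dispatch(self, p):
        # p's local dispatcher propagates the oldest buffered write to memory
        if self._buffer(p):
            x, v = self._buffer(p).pop(0)
            self.mem[x] = v
```

For instance, after `write("p", "x", 1)`, process p sees a readB event for x while any other process still obtains a readM of the old value until p’s dispatcher moves.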
Runs and Protocols.
A protocol maps each process’ local state to a nonempty set of enabled actions. Local states record the sequence of actions observed so far. A global state consists of the TSO state together with the local state of each process. A run is an infinite sequence of global states, starting with empty buffers and initial local states; each step applies a joint action composed of process, dispatcher, and environment actions, with no conflicting memory accesses to the same variable. A run is a run of a protocol if every process action in it is allowed by the protocol. We make two liveness assumptions: (i) every enabled dispatch eventually occurs, ensuring that fences terminate, and (ii) any process with an enabled action eventually moves.
Nodes and Tags.
A node θ = (a, t) denotes agent a at time t; the action of a at time t (possibly a null action) is the action of the node. The write performed at a node is taken to have a unique tag; the matching dispatch, readB, and readM events on that value inherit the same tag.
3 The Occurs-before Relation in TSO
In asynchronous message-passing systems, the only way that a protocol can ensure that an action takes place later than a specific event at another process is by forcing the action to be delayed until a message chain from the second process has been constructed; this is captured by Lamport’s happens-before relation [10]. We now define an analogous relation for TSO, where scheduling and propagation are asynchronous and processes lack direct knowledge of event timing. Complicating matters is the fact that distinct processes can read different values of a variable at the same time. Our occurs-before relation also builds chains across process timelines, and is defined as follows. (We do not call this relation “happens-before” because the latter has become synonymous with message chains over the last five decades, and the new relation in TSO has a different flavor.)
Definition 1 (TSO occurs before).
Let r be a run, and let θ = (a, t) and θ′ = (a′, t′) be nodes of r. We define the binary occurs before relation ⪯ between nodes in r as follows:
1. (a, t) ⪯ (a, t′) for every agent a, whenever t ≤ t′.
2. θ ⪯ θ′ if θ is a write event of a process p, θ′ is an event of its dispatcher dp, and θ and θ′ carry the same tag.
3. θ ⪯ θ′ if t < t′ and θ and θ′ are both memory accesses of the same variable, unless
   (a) θ and θ′ are both read actions, or
   (b) , , and .
4. θ ⪯ θ′ if θ and θ′ carry the same tag and θ′ is a read event.
5. θ ⪯ θ″ if θ ⪯ θ′ and θ′ ⪯ θ″ for some node θ′.
What makes the occurs-before relation interesting and useful is the fact that, in a precise sense, it covers all the information that may be available to the processes regarding the ordering of events. We will show that, roughly speaking, if an event e in a run r is not related by occurs-before to another event e′, then there is a run that is indistinguishable from r in which e takes place strictly before e′ does. We now turn to prove a more general result which will imply this fact.
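The chain-based character of the relation can be illustrated computationally. The sketch below uses a simplified edge set of our own choosing (not the exact clauses of Definition 1): base edges between nodes (agent, time) are given explicitly, program order within each agent is added, and transitivity corresponds to path composition, so an occurs-before chain is simply a directed path:

```python
# Sketch (our simplification): occurs-before as reachability over base
# edges between nodes (agent, time). Program order within an agent is
# added automatically; transitivity is path composition (BFS).

from collections import deque

def occurs_before(edges, theta, theta_prime):
    """True iff there is an occurs-before chain from theta to theta_prime."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    # program order: (a, t) precedes (a, t') whenever t < t'
    nodes = {n for e in edges for n in e} | {theta, theta_prime}
    for a, t in nodes:
        for b, s in nodes:
            if a == b and t < s:
                adj.setdefault((a, t), set()).add((b, s))
    seen, queue = {theta}, deque([theta])
    while queue:
        u = queue.popleft()
        if u == theta_prime:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False
```

For example, with a write-to-dispatch edge and a dispatch-to-read edge, a chain runs from an earlier node of the writer to a later node of the reader, but not in the reverse direction.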
4 Delaying the Future in TSO and its implications
Our goal is to use the occurs-before relation to prove lower bounds and necessary conditions on protocol structure. First, we show that processes cannot guarantee a specific event order without an occurs-before chain. If a process has the same local state at two execution points where it moves, it will take the same action at both points; events that did not modify its state cannot affect its behavior. This motivates the definitions of past(N), the set of nodes in the causal past of a set of nodes N, and of local equivalence of runs.
Definition 2 (The past).
For a set N of nodes in a run r, we define past(N) = {θ : θ ⪯ θ′ for some θ′ ∈ N}, and past(θ) = past({θ}) for a single node θ. (See Figure 1 for an illustration.)
Definition 3 (Local Equivalence).
Two runs r and r′ are called locally equivalent, denoted by r ∼ r′, if for every process p, a local state of p appears in r iff it appears in r′.
We can now show:
Theorem 4 (Delaying the future in TSO).
Let r be a TSO run of a protocol, let N be a set of nodes of r, and let d > 0. Then there exists a TSO run r′ of the same protocol such that:
(a) Every agent performs exactly the same actions in the same order in both runs; moreover, an action that agent a performs at time t in r is performed in r′ at time t if (a, t) ∈ past(N), and it is performed at time t + d otherwise.
(b) Every process p’s local states are shifted accordingly: a local state that p has at time t in r appears in r′ at time t if (p, t) ∈ past(N), and at time t + d otherwise.
Intuitively, Theorem 4 implies that everything outside the past of a node (or set of nodes) can be moved forward in time by an arbitrary amount, while preserving the timing of events in this past. This implies that without an occurs-before connection, events and operations preceding a given operation can be shifted so that they no longer precede it, without changing its outcome. If changing the real-time order of two operations can affect the correctness of a given execution, as it typically does when linearizability is required, then implementations must ensure that operations whose order should not be changed are related by occurs-before. In some cases, e.g., when a process runs solo, it is possible to show that establishing such a relation requires a process to perform explicit synchronization operations. This is the subject of the next section.
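The timing transformation of Theorem 4 can be sketched on a toy encoding (ours, not the paper’s): every node outside past(N) is delayed by d while nodes in the past keep their times. Because past(N) is closed under occurs-before, and in particular downward closed along each agent’s timeline, each agent’s own order of actions is preserved:

```python
# Toy sketch of the Theorem 4 transformation: delay by d every node
# outside past(N), keep the times of nodes inside it.

def delay_future(events, past, d):
    """events: list of ((agent, time), action); past: set of (agent, time)."""
    shifted = [((a, t if (a, t) in past else t + d), act)
               for (a, t), act in events]
    shifted.sort(key=lambda e: e[0][1])    # replay in the new real-time order
    return shifted

events = [(("p", 1), "write x 1"),
          (("dp", 2), "dispatch x"),
          (("q", 3), "readM x")]
past = {("p", 1), ("dp", 2)}               # the past of the dispatch node
```

With d = 10, the read by q moves to time 13 while the write and its dispatch keep times 1 and 2: the event outside the past no longer interleaves with the past, yet every agent performs the same actions in the same order.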
5 Implementing Linearizable Operations in TSO
Roughly speaking, an implementation of an object is a protocol satisfying the object’s specification. It is said to be linearizable if, in every execution of the implementation, operations appear to occur instantaneously, in a manner that is consistent with the real-time order of the original execution and satisfies the sequential specification of the object. See [8] for a formal definition.
Registers.
We consider implementations of a multi-writer multi-reader register, in which every process may perform (linearizable) write and read operations, in the TSO memory model. For obstruction-free register implementations, the techniques used in the proof of Theorem 4 and in [13] can be used to show:
Corollary 5.
Let n ≥ 1 and let I be an obstruction-free linearizable implementation of a register in TSO for n processes. Then for every k ≥ 1 there must be a run of I in which exactly k write operations are performed, and each of the writes performs a fence or an RMW.
We remark that Castañeda et al. prove in [6] that spec-available implementations of linearizable registers in TSO must contain a run in which at least one write or read operation performs a fence or RMW action. Since every obstruction-free implementation is in particular spec-available, the same follows for obstruction-free implementations; for these, Corollary 5 provides sharper bounds. In [15], Theorem 4 (DtF) is applied in the style of Theorem 6 of [14] to show that linearizable register implementations must create occurs-before relations between operations, to ensure linearizability. In addition, DtF is applied to the analysis of linearizable implementations of snapshot objects.
The Need for Synchronization to Establish Occurs Before.
In a precise sense, Corollary 5 makes use of the fact that a process performing a linearizable write must guarantee that the value it is writing is recorded in memory. We can show that the only way for a process to know that the value it wrote can be read by others is either by way of a read that provides it with feedback from other processes, or by synchronization.
Definition 6 (feedback loop).
We say that there is a feedback loop between nodes θ = (p, t) and θ′ = (p, t′) in run r if there is a node θ″ of an agent other than p and its dispatcher such that θ ⪯ θ″ ⪯ θ′. We say that an operation of p contains a feedback loop in r if there is a feedback loop between the node at which the operation is invoked and the node at which it completes.
Theorem 7.
Let O be an operation of a process p in a run r, containing a write with tag g. If
1. O does not contain a feedback loop, and
2. p does not perform a fence or RMW action during O,
then there is a run in which the write tagged g is not propagated to memory before O completes.
This is a powerful result. In TSO, a process shares information only by writing to memory and making the write visible. If completing an operation requires others to observe it, the process must either obtain feedback from another process or use a fence or RMW operation. In obstruction-free protocols, feedback cannot be guaranteed, so fences or RMWs become necessary to ensure visibility before operations can complete.
6 Discussion
While we focus in this BA on registers, the full version [15] also covers snapshots, and extending the approach to the study of linearizable implementations of other shared objects appears promising. The register results of [6] exploit the fact that write operations are one-sided non-commutative with respect to read operations, yielding (roughly) that there exist runs where at least one of two adjacent write–read operations must contain a fence or RMW. Our framework strengthens this: we prove that there exist runs where in fact the writes must contain a fence or RMW. In typical register implementations, writes complete with an acknowledgment independently of preceding reads, meaning that writes are not one-sided non-commutative. Nevertheless, our tools show that there are runs where writes must contain fences or RMWs. In addition, the DtF theorem and its uses, both in the TSO setting and in [14], raise the question of whether similar results hold in other memory models. Finally, we observe that although our occurs-before relation in TSO is analogous to happens-before in message passing, it differs in that occurs-before does not necessarily convey information, even under full information, whereas happens-before does. Nevertheless, like happens-before, the occurs-before relation remains a necessary condition for ordering in many cases. This fundamental difference is both interesting and worth deeper investigation.
References
- [1] Mustaque Ahamad, Gil Neiger, James E Burns, Prince Kohli, and Phillip W Hutto. Causal memory: Definitions, implementation, and programming. Distributed Computing, 9(1):37–49, 1995. doi:10.1007/BF01784241.
- [2] Jade Alglave, Daniel Kroening, Vincent Nimal, and Daniel Poetzl. Don’t sit on the fence: A static analysis approach to automatic fence insertion. ACM Trans. Program. Lang. Syst., 39(2), May 2017. doi:10.1145/2994593.
- [3] Jade Alglave, Luc Maranget, Susmit Sarkar, and Peter Sewell. Fences in weak memory models. In Tayssir Touili, Byron Cook, and Paul B. Jackson, editors, Computer Aided Verification, 22nd International Conference, CAV 2010, Edinburgh, UK, July 15-19, 2010. Proceedings, volume 6174 of Lecture Notes in Computer Science, pages 258–272. Springer, 2010. doi:10.1007/978-3-642-14295-6_25.
- [4] Hagit Attiya, Rachid Guerraoui, Danny Hendler, Petr Kuznetsov, Maged M. Michael, and Martin Vechev. Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated. In 38th POPL, New York, NY, USA, 2011. Association for Computing Machinery. doi:10.1145/1926385.1926442.
- [5] Gérard Boudol and Gustavo Petri. Relaxed memory models: an operational approach. In 36th POPL, New York, NY, USA, 2009. Association for Computing Machinery. doi:10.1145/1480881.1480930.
- [6] Armando Castañeda, Gregory V. Chockler, Brijesh Dongol, and Ori Lahav. What cannot be implemented on weak memory? In DISC 2024, LIPIcs, 2024. doi:10.4230/LIPICS.DISC.2024.11.
- [7] Cormac Flanagan and Stephen N Freund. Fasttrack: Efficient and precise dynamic race detection. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 121–133. ACM, 2009. doi:10.1145/1542476.1542490.
- [8] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463–492, July 1990. doi:10.1145/78969.78972.
- [9] Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. Repairing sequential consistency in C/C++11. SIGPLAN Not., 52(6):618–632, June 2017. doi:10.1145/3140587.3062352.
- [10] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, July 1978. doi:10.1145/359545.359563.
- [11] Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, C-28(9):690–691, 1979. doi:10.1109/TC.1979.1675439.
- [12] Friedemann Mattern. Virtual time and global states of distributed systems. In Parallel and Distributed Algorithms, pages 215–226. North-Holland, 1989.
- [13] Yoram Moses. Relating knowledge and coordinated action: The knowledge of preconditions principle. In R. Ramanujam, editor, 15th TARK, pages 231–245, 2015. doi:10.4204/EPTCS.215.17.
- [14] Raïssa Nataf and Yoram Moses. Communication Requirements for Linearizable Registers. In DISC 2024, Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany, 2024. doi:10.4230/LIPIcs.DISC.2024.33.
- [15] Raïssa Nataf and Yoram Moses. Time, fences and the ordering of events in TSO, 2025. arXiv:2508.11415.
- [16] Scott Owens, Susmit Sarkar, and Peter Sewell. A better x86 memory model: x86-TSO. In Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics, TPHOLs ’09, pages 391–407, Berlin, Heidelberg, 2009. Springer-Verlag. doi:10.1007/978-3-642-03359-9_27.
- [17] Viktor Vafeiadis and Francesco Zappa Nardelli. Verifying fence elimination optimisations. In Proceedings of the 18th International Conference on Static Analysis, SAS’11, pages 146–162, Berlin, Heidelberg, 2011. Springer-Verlag. doi:10.1007/978-3-642-23702-7_14.
