David Schwartz, Luís Pina
Creative Commons Attribution 4.0 International license
Record/Replay (RR) allows developers to record an execution and then replay it exactly as it was recorded. RR enables deterministic replay of non-deterministic behaviors in a different environment than the one used for the recording, so developers can capture complex bugs in production and find their root cause during development. Unfortunately, RR still introduces non-negligible performance overhead, which limits its applicability. The two main sources of such overhead in state-of-the-art RR systems are multi-threading and I/O-bound workloads. To ensure high fidelity when replaying multi-threaded executions, recordings either capture the total order of events, which the replayer then enforces, or capture a partial order that requires further processing before replay. Recording also effectively doubles the I/O performed, as the recorder needs to perform the original I/O and then record it to a log. Such increased I/O severely limits the performance of I/O-dominated workloads. In this paper, we present two complementary techniques to reduce the overhead of RR. First, we introduce Relaxed Total Order (RTO), an online-computable weakening of total order that preserves the cross-thread constraints needed for replay while avoiding unnecessary serialization. We design RTO to be compatible with Multi-Version eXecution (MVX), enabling online deterministic replay without pre-processing the recording log or heavyweight coordination. We formalize RTO’s strictness and correctness, showing that it is a novel point between partial and total order. Our prototype implementation on top of an existing state-of-the-art RR system reduces recording overhead from 21.0% to 15.3% and more than halves replay overhead, from 67.5% to 31.7%.
Second, we combine RR with MVX to eliminate RR’s poor performance on I/O-bound workloads. Our hybrid design uses a follower variant to absorb the extra I/O needed for logging and to backfill as much I/O as possible from the same underlying system, keeping that work off the critical path of the user-facing leader. Our prototype reduces the overhead of recording I/O-bound programs from 192.5% to just 25.5%, without penalizing other, more common workloads. Together, RTO and hybrid MVX/RR substantially narrow the gap between today’s RR systems and practical, low-overhead, always-on deployment.
@InProceedings{schwartz_et_al:LIPIcs.ECOOP.2026.24,
author = {Schwartz, David and Pina, Lu{\'\i}s},
title = {{Optimizing Record/Replay Through Relaxed Total Ordering and Multi-Version eXecution}},
booktitle = {40th European Conference on Object-Oriented Programming (ECOOP 2026)},
pages = {24:1--24:28},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-423-9},
ISSN = {1868-8969},
year = {2026},
volume = {372},
editor = {Krebbers, Robbert and Silva, Alexandra},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2026.24},
URN = {urn:nbn:de:0030-drops-261207},
doi = {10.4230/LIPIcs.ECOOP.2026.24},
annote = {Keywords: Record Replay, Multi-version Execution, Java Virtual Machine}
}