, Isaac Keslassy, Alexander Shpiner, Liran Liss
Creative Commons Attribution 4.0 International license
Existing transport protocols in commodity datacenter networks struggle to provide low collective completion times (CCTs) to AI training collectives, as packet losses and retransmissions significantly degrade performance. We propose dcSim, an efficient transport that achieves low CCTs and practically lossless performance with commodity switches. In dcSim, each packet first employs a small simulation probe to traverse the network and explore congestion along a candidate path. Only packets whose simulation probes succeed are then transmitted, expecting to succeed as well. Evaluations confirm that dcSim achieves faster CCTs than existing schemes, with small queues and virtually zero packet loss. Finally, dcSim also excels in adverse conditions, including oversubscribed topologies.
@InProceedings{straussman_et_al:OASIcs.NINeS.2026.19,
author = {Straussman, Dan and Keslassy, Isaac and Shpiner, Alexander and Liss, Liran},
title = {{Simulate Before Sending: Rethinking Transport in Datacenter Networks}},
booktitle = {1st New Ideas in Networked Systems (NINeS 2026)},
pages = {19:1--19:22},
series = {Open Access Series in Informatics (OASIcs)},
ISBN = {978-3-95977-414-7},
ISSN = {2190-6807},
year = {2026},
volume = {139},
editor = {Argyraki, Katerina and Panda, Aurojit},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.NINeS.2026.19},
URN = {urn:nbn:de:0030-drops-256044},
doi = {10.4230/OASIcs.NINeS.2026.19},
annote = {Keywords: Datacenter networks, transport protocols, AI training, lossless networks}
}