DROPS

Document

Software Performance Engineering (Dagstuhl Seminar 25341)

Authors: Chen Ding, Charles E. Leiserson, Yihan Sun, and Bruce Hoppe

Published in: Dagstuhl Reports, Volume 15, Issue 8 (2026)

Abstract

This report documents the program and the outcomes of Dagstuhl Seminar 25341 on software performance engineering. This seminar convened researchers from diverse intellectual communities across computer science to synthesize a collective understanding of this fragmented discipline and advance software performance engineering as a rigorous and principled scientific field in its own right. With connections established by this seminar, we are creating a unique arena for computer science researchers to cross-fertilize by sharing performance-engineering tools, techniques, opportunities, challenges, and open problems in their own domains of expertise. The major activities included 29 talks on recent research, five working groups that discussed topics like tools, community-building, education, and LLMs, and three "world cafe" sessions with in-depth conversations among participants on guiding questions for advancing software performance engineering.

Cite as

Chen Ding, Charles E. Leiserson, Yihan Sun, and Bruce Hoppe. Software Performance Engineering (Dagstuhl Seminar 25341). In Dagstuhl Reports, Volume 15, Issue 8, pp. 1-28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@Article{ding_et_al:DagRep.15.8.1,
  author =	{Ding, Chen and Leiserson, Charles E. and Sun, Yihan and Hoppe, Bruce},
  title =	{{Software Performance Engineering (Dagstuhl Seminar 25341)}},
  pages =	{1--28},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2026},
  volume =	{15},
  number =	{8},
  editor =	{Ding, Chen and Leiserson, Charles E. and Sun, Yihan and Hoppe, Bruce},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.15.8.1},
  URN =		{urn:nbn:de:0030-drops-257759},
  doi =		{10.4230/DagRep.15.8.1},
  annote =	{Keywords: applications, productivity tools, software performance engineering, theory and practice}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2025.37

Parallel Joinable B-Trees in the Fork-Join I/O Model

Authors: Michael T. Goodrich, Yan Gu, Ryuto Kitagawa, and Yihan Sun

Published in: LIPIcs, Volume 359, 36th International Symposium on Algorithms and Computation (ISAAC 2025)

Abstract

Balanced search trees are widely used in computer science to efficiently maintain dynamic ordered data. To support efficient set operations (e.g., union, intersection, difference) using trees, the join-based framework is widely studied. This framework has received particular attention in the parallel setting, and has been shown to be effective in enabling simple and theoretically efficient set operations on trees. Despite the widespread adoption of parallel join-based trees, a major drawback of previous work on such data structures is the inefficiency of their input/output (I/O) access patterns. Some recent work (e.g., C-trees and PaC-trees) focused on more I/O-friendly implementations of these algorithms. Surprisingly, however, there have been no results on bounding the I/O-costs for these algorithms. It remains open whether these algorithms can provide tight, provable guarantees in I/O-costs on trees. This paper studies efficient parallel algorithms for set operations based on search tree algorithms using a join-based framework, with a special focus on achieving I/O efficiency in these algorithms. To better capture the I/O-efficiency in these algorithms in parallel, we introduce a new computational model, the Fork-Join I/O Model, to measure the I/O costs in fork-join parallelism. This model measures the total block transfers (I/O work) and their critical path (I/O span). Under this model, we propose our new solution based on B-trees. Our parallel algorithm computes the union, intersection, and difference of two B-trees with O(m log_B(n/m)) I/O work and O(log_B m ⋅ log₂ log_B n + log_B n) I/O span, where n and m ≤ n are the sizes of the two trees, and B is the block size.

Cite as

Michael T. Goodrich, Yan Gu, Ryuto Kitagawa, and Yihan Sun. Parallel Joinable B-Trees in the Fork-Join I/O Model. In 36th International Symposium on Algorithms and Computation (ISAAC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 359, pp. 37:1-37:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{goodrich_et_al:LIPIcs.ISAAC.2025.37,
  author =	{Goodrich, Michael T. and Gu, Yan and Kitagawa, Ryuto and Sun, Yihan},
  title =	{{Parallel Joinable B-Trees in the Fork-Join I/O Model}},
  booktitle =	{36th International Symposium on Algorithms and Computation (ISAAC 2025)},
  pages =	{37:1--37:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-408-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{359},
  editor =	{Chen, Ho-Lin and Hon, Wing-Kai and Tsai, Meng-Tsung},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2025.37},
  URN =		{urn:nbn:de:0030-drops-249451},
  doi =		{10.4230/LIPIcs.ISAAC.2025.37},
  annote =	{Keywords: Parallel algorithm, I/O efficiency, search trees, B-trees}
}

Document

DOI: 10.4230/OASIcs.SpaceCHI.2025.18

Advancing Intelligent Personal Assistants for Human Spaceflight

Authors: Leonie Bensch, Oliver Bensch, and Tommy Nilsson

Published in: OASIcs, Volume 130, Advancing Human-Computer Interaction for Space Exploration (SpaceCHI 2025)

Abstract

The Artemis program and upcoming missions to Mars mark a new era of human space exploration that will require new tools to support astronaut autonomy in the absence of real-time communication with Earth. This paper investigates the role of voice-based intelligent personal assistants (IPAs) in future crewed space missions. Through semi-structured interviews with astronauts (n=3) and spaceflight experts (n=12), we identify key user-centered design requirements for IPAs in this uniquely constrained and safety-critical environment. Our thematic analysis reveals core requirements for flexibility, reliability, offline capability, and multimodal interaction. Drawing on these findings, we outline design guidelines for next-generation IPAs and discuss how technologies such as retrieval-augmented generation (RAG), knowledge graphs, and augmented reality should be combined to support flexible, reliable, and multimodal IPAs for future human spaceflight missions.

Cite as

Leonie Bensch, Oliver Bensch, and Tommy Nilsson. Advancing Intelligent Personal Assistants for Human Spaceflight. In Advancing Human-Computer Interaction for Space Exploration (SpaceCHI 2025). Open Access Series in Informatics (OASIcs), Volume 130, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bensch_et_al:OASIcs.SpaceCHI.2025.18,
  author =	{Bensch, Leonie and Bensch, Oliver and Nilsson, Tommy},
  title =	{{Advancing Intelligent Personal Assistants for Human Spaceflight}},
  booktitle =	{Advancing Human-Computer Interaction for Space Exploration (SpaceCHI 2025)},
  pages =	{18:1--18:18},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-384-3},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{130},
  editor =	{Bensch, Leonie and Nilsson, Tommy and Nisser, Martin and Pataranutaporn, Pat and Schmidt, Albrecht and Sumini, Valentina},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SpaceCHI.2025.18},
  URN =		{urn:nbn:de:0030-drops-240082},
  doi =		{10.4230/OASIcs.SpaceCHI.2025.18},
  annote =	{Keywords: Conversational Assistant, Intelligent Personal Assistant, Artificial Intelligence, Astronaut, Human Spaceflight, Generative Pre-Trained Transformer (GPT), Retrieval Augmented Generation (RAG), Knowledge Graphs, Augmented Reality, Voice Assistant, Long Duration Spaceflight}
}

@InProceedings{bensch_et_al:OASIcs.SpaceCHI.2025.18,
  author =	{Bensch, Leonie and Bensch, Oliver and Nilsson, Tommy},
  title =	{{Advancing Intelligent Personal Assistants for Human Spaceflight}},
  booktitle =	{Advancing Human-Computer Interaction for Space Exploration (SpaceCHI 2025)},
  pages =	{18:1--18:18},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-384-3},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{130},
  editor =	{Bensch, Leonie and Nilsson, Tommy and Nisser, Martin and Pataranutaporn, Pat and Schmidt, Albrecht and Sumini, Valentina},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SpaceCHI.2025.18},
  URN =		{urn:nbn:de:0030-drops-240082},
  doi =		{10.4230/OASIcs.SpaceCHI.2025.18},
  annote =	{Keywords: Conversational Assistant, Intelligent Personal Assistant, Artificial Intelligence, Astronaut, Human Spaceflight, Generative Pre-Trained Transformer (GPT), Retrieval Augmented Generation (RAG), Knowledge Graphs, Augmented Reality, Voice Assistant, Long Duration Spaceflight}
}

Document

DOI: 10.4230/LIPIcs.WADS.2025.49

On the I/O Complexity of the Cocke-Younger-Kasami Algorithm and of a Family of Related Dynamic Programming Algorithms

Authors: Lorenzo De Stefani and Vedant Gupta

Published in: LIPIcs, Volume 349, 19th International Symposium on Algorithms and Data Structures (WADS 2025)

Abstract

Asymptotically tight lower bounds are derived for the Input/Output (I/O) complexity of a class of dynamic programming algorithms, including matrix chain multiplication, optimal polygon triangulation, and the construction of optimal binary search trees. Assuming no recomputation of intermediate values, we establish an Ω(n³/(√M B)) I/O lower bound, where n denotes the size of the input and M denotes the size of the available fast memory (cache). When recomputation is allowed, we show that the same bound holds for M < cn, where c is a positive constant. In the case where M ≥ 2n, we show an Ω(n/B) I/O lower bound. We also discuss algorithms for which the number of executed I/O operations matches asymptotically each of the presented lower bounds, which are thus asymptotically tight. Additionally, we refine our general method to obtain a lower bound for the I/O complexity of the Cocke-Younger-Kasami algorithm, where the size of the grammar impacts the I/O complexity. An upper bound with asymptotically matching performance in many cases is also provided.

Cite as

Lorenzo De Stefani and Vedant Gupta. On the I/O Complexity of the Cocke-Younger-Kasami Algorithm and of a Family of Related Dynamic Programming Algorithms. In 19th International Symposium on Algorithms and Data Structures (WADS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 349, pp. 49:1-49:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{destefani_et_al:LIPIcs.WADS.2025.49,
  author =	{De Stefani, Lorenzo and Gupta, Vedant},
  title =	{{On the I/O Complexity of the Cocke-Younger-Kasami Algorithm and of a Family of Related Dynamic Programming Algorithms}},
  booktitle =	{19th International Symposium on Algorithms and Data Structures (WADS 2025)},
  pages =	{49:1--49:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-398-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{349},
  editor =	{Morin, Pat and Oh, Eunjin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WADS.2025.49},
  URN =		{urn:nbn:de:0030-drops-242800},
  doi =		{10.4230/LIPIcs.WADS.2025.49},
  annote =	{Keywords: I/O complexity, Dynamic Programming Algorithms, Lower Bounds, Recomputation, Cocke-Younger-Kasami}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2025.36

Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees

Authors: Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H.M.N. Dilum Bandara, Shiping Chen, Jianjun Zhao, and Yulei Sui

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)

Abstract

The vulnerability of neural networks to adversarial perturbations has necessitated formal verification techniques that can rigorously certify the quality of neural networks. As the state-of-the-art, branch-and-bound (BaB) is a "divide-and-conquer" strategy that applies off-the-shelf verifiers to sub-problems for which they perform better. While BaB can identify the sub-problems that are necessary to be split, it explores the space of these sub-problems in a naive "first-come-first-served" manner, thereby suffering from an issue of inefficiency to reach a verification conclusion. To bridge this gap, we introduce an order over different sub-problems produced by BaB, concerning with their different likelihoods of containing counterexamples. Based on this order, we propose a novel verification framework Oliva that explores the sub-problem space by prioritizing those sub-problems that are more likely to find counterexamples, in order to efficiently reach the conclusion of the verification. Even if no counterexample can be found in any sub-problem, it only changes the order of visiting different sub-problems and so will not lead to a performance degradation. Specifically, Oliva has two variants, including Oliva^GR, a greedy strategy that always prioritizes the sub-problems that are more likely to find counterexamples, and Oliva^SA, a balanced strategy inspired by simulated annealing that gradually shifts from exploration to exploitation to locate the globally optimal sub-problems. We experimentally evaluate the performance of Oliva on 690 verification problems spanning over 5 models with datasets MNIST and CIFAR-10. Compared to the state-of-the-art approaches, we demonstrate the speedup of Oliva for up to 25× in MNIST, and up to 80× in CIFAR-10.

Cite as

Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H.M.N. Dilum Bandara, Shiping Chen, Jianjun Zhao, and Yulei Sui. Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 36:1-36:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{zhang_et_al:LIPIcs.ECOOP.2025.36,
  author =	{Zhang, Guanqin and Fukuda, Kota and Zhang, Zhenya and Bandara, H.M.N. Dilum and Chen, Shiping and Zhao, Jianjun and Sui, Yulei},
  title =	{{Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{36:1--36:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.36},
  URN =		{urn:nbn:de:0030-drops-233281},
  doi =		{10.4230/LIPIcs.ECOOP.2025.36},
  annote =	{Keywords: neural network verification, branch and bound, counterexample potentiality, simulated annealing, stochastic optimization}
}

@InProceedings{zhang_et_al:LIPIcs.ECOOP.2025.36,
  author =	{Zhang, Guanqin and Fukuda, Kota and Zhang, Zhenya and Bandara, H.M.N. Dilum and Chen, Shiping and Zhao, Jianjun and Sui, Yulei},
  title =	{{Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{36:1--36:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.36},
  URN =		{urn:nbn:de:0030-drops-233281},
  doi =		{10.4230/LIPIcs.ECOOP.2025.36},
  annote =	{Keywords: neural network verification, branch and bound, counterexample potentiality, simulated annealing, stochastic optimization}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.26

Dynamic Direct Access of MSO Query Evaluation over Strings

Authors: Pierre Bourhis, Florent Capelli, Stefan Mengel, and Cristian Riveros

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

We study the problem of evaluating a Monadic Second Order (MSO) query over strings under updates in the setting of direct access. We present an algorithm that, given an MSO query with first-order free variables represented by an unambiguous variable-set automaton 𝒜 with state set Q and variables X and a string s, computes a data structure in time 𝒪(|Q|^ω⋅ |X|² ⋅ |s|) and, then, given an index i retrieves, using the data structure, the i-th output of the evaluation of 𝒜 over s in time 𝒪(|Q|^ω ⋅ |X|³ ⋅ log(|s|)²) where ω is the exponent for matrix multiplication. Ours is the first efficient direct access algorithm for MSO query evaluation over strings; such algorithms so far had only been studied for first-order queries and conjunctive queries over relational data. Our algorithm gives the answers in lexicographic order where, in contrast to the setting of conjunctive queries, the order between variables can be freely chosen by the user without degrading the runtime. Moreover, our data structure can be updated efficiently after changes to the input string, allowing more powerful updates than in the enumeration literature, e.g. efficient deletion of substrings, concatenation and splitting of strings, and cut-and-paste operations. Our approach combines a matrix representation of MSO queries and a novel data structure for dynamic word problems over semi-groups which yields an overall algorithm that is elegant and easy to formulate.

Cite as

Pierre Bourhis, Florent Capelli, Stefan Mengel, and Cristian Riveros. Dynamic Direct Access of MSO Query Evaluation over Strings. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 26:1-26:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bourhis_et_al:LIPIcs.ICDT.2025.26,
  author =	{Bourhis, Pierre and Capelli, Florent and Mengel, Stefan and Riveros, Cristian},
  title =	{{Dynamic Direct Access of MSO Query Evaluation over Strings}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{26:1--26:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.26},
  URN =		{urn:nbn:de:0030-drops-229675},
  doi =		{10.4230/LIPIcs.ICDT.2025.26},
  annote =	{Keywords: Query evaluation, direct access, MSO queries}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.14

Repairing Databases over Metric Spaces with Coincidence Constraints

Authors: Youri Kaminsky, Benny Kimelfeld, Ester Livshits, Felix Naumann, and David Wajc

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

Datasets often contain values that naturally reside in a metric space: numbers, strings, geographical locations, machine-learned embeddings in a vector space, and so on. We study the computational complexity of repairing inconsistent databases that violate integrity constraints, where the database values belong to an underlying metric space. The goal is to update the database values to retain consistency while minimizing the total distance between the original values and the repaired ones. We consider what we refer to as coincidence constraints, which include unary key constraints, inclusion constraints, foreign keys, and generally any restriction on the relationship between the numbers of cells of different labels (attributes) coinciding in a single value, for a fixed attribute set. We begin by showing that the problem is APX-hard for general metric spaces. We then present an algorithm solving the problem optimally for tree metrics, which generalize both the line metric (i.e., where repaired values are numbers) and the discrete metric (i.e., where we simply count the number of changed values). Combining our algorithm for tree metrics and a classic result on probabilistic tree embeddings, we design a (high probability) logarithmic-ratio approximation for general metrics. We also study the variant of the problem where we limit the allowed change of each individual value. In this variant, it is already NP-complete to decide the existence of any legal repair for a general metric, and we present a polynomial-time repairing algorithm for the case of a line metric.

Cite as

Youri Kaminsky, Benny Kimelfeld, Ester Livshits, Felix Naumann, and David Wajc. Repairing Databases over Metric Spaces with Coincidence Constraints. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kaminsky_et_al:LIPIcs.ICDT.2025.14,
  author =	{Kaminsky, Youri and Kimelfeld, Benny and Livshits, Ester and Naumann, Felix and Wajc, David},
  title =	{{Repairing Databases over Metric Spaces with Coincidence Constraints}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.14},
  URN =		{urn:nbn:de:0030-drops-229554},
  doi =		{10.4230/LIPIcs.ICDT.2025.14},
  annote =	{Keywords: Database repairs, metric spaces, coincidence constraints, inclusion constraints, foreign-key constraints}
}

Document

DOI: 10.4230/LIPIcs.ESA.2023.40

Efficient Parallel Output-Sensitive Edit Distance

Authors: Xiangyun Ding, Xiaojun Dong, Yan Gu, Youzhe Liu, and Yihan Sun

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Abstract

In this paper, we study efficient parallel edit distance algorithms, both in theory and in practice. Given two strings A[1..n] and B[1..m], and a set of operations allowed to edit the strings, the edit distance between A and B is the minimum number of operations required to transform A into B. In this paper, we use edit distance to refer to the Levenshtein distance, which allows for unit-cost single-character edits (insertions, deletions, substitutions). Sequentially, a standard Dynamic Programming (DP) algorithm solves edit distance with Θ(nm) cost. In many real-world applications, the strings to be compared are similar to each other and have small edit distances. To achieve highly practical implementations, we focus on output-sensitive parallel edit-distance algorithms, i.e., to achieve asymptotically better cost bounds than the standard Θ(nm) algorithm when the edit distance is small. We study four algorithms in the paper, including three algorithms based on Breadth-First Search (BFS), and one algorithm based on Divide-and-Conquer (DaC). Our BFS-based solution is based on the Landau-Vishkin algorithm. We implement three different data structures for the longest common prefix (LCP) queries needed in the algorithm: the classic solution using parallel suffix array, and two hash-based solutions proposed in this paper. Our DaC-based solution is inspired by the output-insensitive solution proposed by Apostolico et al., and we propose a non-trivial adaption to make it output-sensitive. All of the algorithms studied in this paper have good theoretical guarantees, and they achieve different tradeoffs between work (total number of operations), span (longest dependence chain in the computation), and space. We test and compare our algorithms on both synthetic data and real-world data, including DNA sequences, Wikipedia texts, GitHub repositories, etc. Our BFS-based algorithms outperform the existing parallel edit-distance implementation in ParlayLib in all test cases. On cases with fewer than 10⁵ edits, our algorithm can process input sequences of size 10⁹ in about ten seconds, while ParlayLib can only process sequences of sizes up to 10⁶ in the same amount of time. By comparing our algorithms, we also provide a better understanding of the choice of algorithms for different input patterns. We believe that our paper is the first systematic study in the theory and practice of parallel edit distance.

Cite as

Xiangyun Ding, Xiaojun Dong, Yan Gu, Youzhe Liu, and Yihan Sun. Efficient Parallel Output-Sensitive Edit Distance. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 40:1-40:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{ding_et_al:LIPIcs.ESA.2023.40,
  author =	{Ding, Xiangyun and Dong, Xiaojun and Gu, Yan and Liu, Youzhe and Sun, Yihan},
  title =	{{Efficient Parallel Output-Sensitive Edit Distance}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{40:1--40:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.40},
  URN =		{urn:nbn:de:0030-drops-186935},
  doi =		{10.4230/LIPIcs.ESA.2023.40},
  annote =	{Keywords: Edit Distance, Parallel Algorithms, String Algorithms, Dynamic Programming, Pattern Matching}
}

Document

DOI: 10.4230/LIPIcs.DISC.2021.12

Space and Time Bounded Multiversion Garbage Collection

Authors: Naama Ben-David, Guy E. Blelloch, Panagiota Fatourou, Eric Ruppert, Yihan Sun, and Yuanhao Wei

Published in: LIPIcs, Volume 209, 35th International Symposium on Distributed Computing (DISC 2021)

Abstract

We present a general technique for garbage collecting old versions for multiversion concurrency control that simultaneously achieves good time and space complexity. Our technique takes only O(1) time on average to reclaim each version and maintains only a constant factor more versions than needed (plus an additive term). It is designed for multiversion schemes using version lists, which are the most common. Our approach uses two components that are of independent interest. First, we define a novel range-tracking data structure which stores a set of old versions and efficiently finds those that are no longer needed. We provide a wait-free implementation in which all operations take amortized constant time. Second, we represent version lists using a new lock-free doubly-linked list algorithm that supports efficient (amortized constant time) removals given a pointer to any node in the list. These two components naturally fit together to solve the multiversion garbage collection problem - the range-tracker identifies which versions to remove and our list algorithm can then be used to remove them from their version lists. We apply our garbage collection technique to generate end-to-end time and space bounds for the multiversioning system of Wei et al. (PPoPP 2021).

Cite as

Naama Ben-David, Guy E. Blelloch, Panagiota Fatourou, Eric Ruppert, Yihan Sun, and Yuanhao Wei. Space and Time Bounded Multiversion Garbage Collection. In 35th International Symposium on Distributed Computing (DISC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 209, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bendavid_et_al:LIPIcs.DISC.2021.12,
  author =	{Ben-David, Naama and Blelloch, Guy E. and Fatourou, Panagiota and Ruppert, Eric and Sun, Yihan and Wei, Yuanhao},
  title =	{{Space and Time Bounded Multiversion Garbage Collection}},
  booktitle =	{35th International Symposium on Distributed Computing (DISC 2021)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-210-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{209},
  editor =	{Gilbert, Seth},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2021.12},
  URN =		{urn:nbn:de:0030-drops-148143},
  doi =		{10.4230/LIPIcs.DISC.2021.12},
  annote =	{Keywords: Lock-free, data structures, memory management, snapshot, version lists}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2021.60

A Parallel Batch-Dynamic Data Structure for the Closest Pair Problem

Authors: Yiqiu Wang, Shangdi Yu, Yan Gu, and Julian Shun

Published in: LIPIcs, Volume 189, 37th International Symposium on Computational Geometry (SoCG 2021)

Abstract

We propose a theoretically-efficient and practical parallel batch-dynamic data structure for the closest pair problem. Our solution is based on a serial dynamic closest pair data structure by Golin et al., and supports batches of insertions and deletions in parallel. For a data set of size n, our data structure supports a batch of insertions or deletions of size m in O(m(1+log ((n+m)/m))) expected work and O(log (n+m)log^*(n+m)) depth with high probability, and takes linear space. The key techniques for achieving these bounds are a new work-efficient parallel batch-dynamic binary heap, and careful management of the computation across sets of points to minimize work and depth. We provide an optimized multicore implementation of our data structure using dynamic hash tables, parallel heaps, and dynamic k-d trees. Our experiments on a variety of synthetic and real-world data sets show that it achieves a parallel speedup of up to 38.57x (15.10x on average) on 48 cores with hyper-threading. In addition, we also implement and compare four parallel algorithms for static closest pair problem, for which we are not aware of any existing practical implementations. On 48 cores with hyper-threading, the static algorithms achieve up to 51.45x (29.42x on average) speedup, and Rabin’s algorithm performs the best on average. Comparing our dynamic algorithm to the fastest static algorithm, we find that it is advantageous to use the dynamic algorithm for batch sizes of up to 20% of the data set. As far as we know, our work is the first to experimentally evaluate parallel closest pair algorithms, in both the static and the dynamic settings.

Cite as

Yiqiu Wang, Shangdi Yu, Yan Gu, and Julian Shun. A Parallel Batch-Dynamic Data Structure for the Closest Pair Problem. In 37th International Symposium on Computational Geometry (SoCG 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 189, pp. 60:1-60:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{wang_et_al:LIPIcs.SoCG.2021.60,
  author =	{Wang, Yiqiu and Yu, Shangdi and Gu, Yan and Shun, Julian},
  title =	{{A Parallel Batch-Dynamic Data Structure for the Closest Pair Problem}},
  booktitle =	{37th International Symposium on Computational Geometry (SoCG 2021)},
  pages =	{60:1--60:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-184-9},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{189},
  editor =	{Buchin, Kevin and Colin de Verdi\`{e}re, \'{E}ric},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2021.60},
  URN =		{urn:nbn:de:0030-drops-138594},
  doi =		{10.4230/LIPIcs.SoCG.2021.60},
  annote =	{Keywords: Closest Pair, Parallel Algorithms, Dynamic Algorithms, Experimental Algorithms}
}

Document

DOI: 10.4230/LIPIcs.ESA.2018.44

Algorithmic Building Blocks for Asymmetric Memories

Authors: Yan Gu, Yihan Sun, and Guy E. Blelloch

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

Abstract

The future of main memory appears to lie in the direction of new non-volatile memory technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of energy, bandwidth, and latency. This asymmetry can have a significant effect on algorithm design, and in many cases it is possible to reduce writes at the cost of more reads. This paper studies which algorithmic techniques are useful in designing practical write-efficient algorithms. We focus on several fundamental algorithmic building blocks including unordered set/map implemented using hash tables, comparison sort, and graph traversal algorithms including breadth-first search and Dijkstra's algorithm. We introduce new algorithms and implementations that can reduce writes, and analyze the performance experimentally using a software simulator. Finally, we summarize interesting lessons and directions in designing write-efficient algorithms that can be valuable to share.

Cite as

Yan Gu, Yihan Sun, and Guy E. Blelloch. Algorithmic Building Blocks for Asymmetric Memories. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 44:1-44:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gu_et_al:LIPIcs.ESA.2018.44,
  author =	{Gu, Yan and Sun, Yihan and Blelloch, Guy E.},
  title =	{{Algorithmic Building Blocks for Asymmetric Memories}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{44:1--44:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.44},
  URN =		{urn:nbn:de:0030-drops-95070},
  doi =		{10.4230/LIPIcs.ESA.2018.44},
  annote =	{Keywords: Asymmetric Memory, I/O Cost, Write-Efficient Algorithms, Hash Tables, Graph-Traversal Algorithms}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2017.26

Efficient Construction of Probabilistic Tree Embeddings

Authors: Guy E. Blelloch, Yan Gu, and Yihan Sun

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

Abstract

In this paper we describe an algorithm that embeds a graph metric (V,d_G) on an undirected weighted graph G=(V,E) into a distribution of tree metrics (T,D_T) such that for every pair u,v in V, d_G(u,v)<=d_T(u,v) and E_T[d_T(u,v)]<=O(log n)d_G(u,v). Such embeddings have proved highly useful in designing fast approximation algorithms, as many hard problems on graphs are easy to solve on tree instances. For a graph with n vertices and m edges, our algorithm runs in O(m log n) time with high probability, which improves the previous upper bound of O(m log^3 n) shown by Mendel et al. in 2009. The key component of our algorithm is a new approximate single-source shortest-path algorithm, which implements the priority queue with a new data structure, the bucket-tree structure. The algorithm has three properties: it only requires linear time in terms of the number of edges in the input graph; the computed distances have the distance preserving property; and when computing the shortest-paths to the k-nearest vertices from the source, it only requires to visit these vertices and their edge lists. These properties are essential to guarantee the correctness and the stated work bound. Using this shortest-path algorithm, we show how to generate an intermediate structure, the approximate dominance sequences of the input graph, in O(m log n) time, and further propose a simple yet efficient algorithm to converted this sequence to a tree embedding in O(n log n) time, both with high probability. Combining the three subroutines gives the stated work bound of the algorithm. We also show a new application of probabilistic tree embeddings: they can be used to accelerate the construction of a series of approximate distance oracles.

Cite as

Guy E. Blelloch, Yan Gu, and Yihan Sun. Efficient Construction of Probabilistic Tree Embeddings. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{blelloch_et_al:LIPIcs.ICALP.2017.26,
  author =	{Blelloch, Guy E. and Gu, Yan and Sun, Yihan},
  title =	{{Efficient Construction of Probabilistic Tree Embeddings}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{26:1--26:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.26},
  URN =		{urn:nbn:de:0030-drops-75034},
  doi =		{10.4230/LIPIcs.ICALP.2017.26},
  annote =	{Keywords: Graph Algorithm, Metric Embeddings, Probabilistic Tree Embeddings, Single-source Shortest-paths}
}

12 Search Results for "Sun, Yihan"

Software Performance Engineering (Dagstuhl Seminar 25341)

Abstract

Cite as

Parallel Joinable B-Trees in the Fork-Join I/O Model

Abstract

Cite as

Advancing Intelligent Personal Assistants for Human Spaceflight

Abstract

Cite as

On the I/O Complexity of the Cocke-Younger-Kasami Algorithm and of a Family of Related Dynamic Programming Algorithms

Abstract

Cite as

Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees

Abstract

Cite as

Dynamic Direct Access of MSO Query Evaluation over Strings

Abstract

Cite as

Repairing Databases over Metric Spaces with Coincidence Constraints

Abstract

Cite as

Efficient Parallel Output-Sensitive Edit Distance

Abstract

Cite as

Space and Time Bounded Multiversion Garbage Collection

Abstract

Cite as

A Parallel Batch-Dynamic Data Structure for the Closest Pair Problem

Abstract

Cite as

Algorithmic Building Blocks for Asymmetric Memories

Abstract

Cite as

Efficient Construction of Probabilistic Tree Embeddings

Abstract

Cite as

Thanks for your feedback!

Could not send message