DROPS

Document

DOI: 10.4230/LIPIcs.ESA.2023.40

Efficient Parallel Output-Sensitive Edit Distance

Authors: Xiangyun Ding, Xiaojun Dong, Yan Gu, Youzhe Liu, and Yihan Sun

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Abstract

In this paper, we study efficient parallel edit distance algorithms, both in theory and in practice. Given two strings A[1..n] and B[1..m], and a set of operations allowed to edit the strings, the edit distance between A and B is the minimum number of operations required to transform A into B. In this paper, we use edit distance to refer to the Levenshtein distance, which allows for unit-cost single-character edits (insertions, deletions, substitutions). Sequentially, a standard Dynamic Programming (DP) algorithm solves edit distance with Θ(nm) cost. In many real-world applications, the strings to be compared are similar to each other and have small edit distances. To achieve highly practical implementations, we focus on output-sensitive parallel edit-distance algorithms, i.e., to achieve asymptotically better cost bounds than the standard Θ(nm) algorithm when the edit distance is small. We study four algorithms in the paper, including three algorithms based on Breadth-First Search (BFS), and one algorithm based on Divide-and-Conquer (DaC). Our BFS-based solution is based on the Landau-Vishkin algorithm. We implement three different data structures for the longest common prefix (LCP) queries needed in the algorithm: the classic solution using parallel suffix array, and two hash-based solutions proposed in this paper. Our DaC-based solution is inspired by the output-insensitive solution proposed by Apostolico et al., and we propose a non-trivial adaption to make it output-sensitive. All of the algorithms studied in this paper have good theoretical guarantees, and they achieve different tradeoffs between work (total number of operations), span (longest dependence chain in the computation), and space. We test and compare our algorithms on both synthetic data and real-world data, including DNA sequences, Wikipedia texts, GitHub repositories, etc. Our BFS-based algorithms outperform the existing parallel edit-distance implementation in ParlayLib in all test cases. On cases with fewer than 10⁵ edits, our algorithm can process input sequences of size 10⁹ in about ten seconds, while ParlayLib can only process sequences of sizes up to 10⁶ in the same amount of time. By comparing our algorithms, we also provide a better understanding of the choice of algorithms for different input patterns. We believe that our paper is the first systematic study in the theory and practice of parallel edit distance.

Cite as

Xiangyun Ding, Xiaojun Dong, Yan Gu, Youzhe Liu, and Yihan Sun. Efficient Parallel Output-Sensitive Edit Distance. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 40:1-40:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{ding_et_al:LIPIcs.ESA.2023.40,
  author =	{Ding, Xiangyun and Dong, Xiaojun and Gu, Yan and Liu, Youzhe and Sun, Yihan},
  title =	{{Efficient Parallel Output-Sensitive Edit Distance}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{40:1--40:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.40},
  URN =		{urn:nbn:de:0030-drops-186935},
  doi =		{10.4230/LIPIcs.ESA.2023.40},
  annote =	{Keywords: Edit Distance, Parallel Algorithms, String Algorithms, Dynamic Programming, Pattern Matching}
}

Document

DOI: 10.4230/LIPIcs.DISC.2021.12

Space and Time Bounded Multiversion Garbage Collection

Authors: Naama Ben-David, Guy E. Blelloch, Panagiota Fatourou, Eric Ruppert, Yihan Sun, and Yuanhao Wei

Published in: LIPIcs, Volume 209, 35th International Symposium on Distributed Computing (DISC 2021)

Abstract

We present a general technique for garbage collecting old versions for multiversion concurrency control that simultaneously achieves good time and space complexity. Our technique takes only O(1) time on average to reclaim each version and maintains only a constant factor more versions than needed (plus an additive term). It is designed for multiversion schemes using version lists, which are the most common. Our approach uses two components that are of independent interest. First, we define a novel range-tracking data structure which stores a set of old versions and efficiently finds those that are no longer needed. We provide a wait-free implementation in which all operations take amortized constant time. Second, we represent version lists using a new lock-free doubly-linked list algorithm that supports efficient (amortized constant time) removals given a pointer to any node in the list. These two components naturally fit together to solve the multiversion garbage collection problem - the range-tracker identifies which versions to remove and our list algorithm can then be used to remove them from their version lists. We apply our garbage collection technique to generate end-to-end time and space bounds for the multiversioning system of Wei et al. (PPoPP 2021).

Cite as

Naama Ben-David, Guy E. Blelloch, Panagiota Fatourou, Eric Ruppert, Yihan Sun, and Yuanhao Wei. Space and Time Bounded Multiversion Garbage Collection. In 35th International Symposium on Distributed Computing (DISC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 209, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bendavid_et_al:LIPIcs.DISC.2021.12,
  author =	{Ben-David, Naama and Blelloch, Guy E. and Fatourou, Panagiota and Ruppert, Eric and Sun, Yihan and Wei, Yuanhao},
  title =	{{Space and Time Bounded Multiversion Garbage Collection}},
  booktitle =	{35th International Symposium on Distributed Computing (DISC 2021)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-210-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{209},
  editor =	{Gilbert, Seth},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2021.12},
  URN =		{urn:nbn:de:0030-drops-148143},
  doi =		{10.4230/LIPIcs.DISC.2021.12},
  annote =	{Keywords: Lock-free, data structures, memory management, snapshot, version lists}
}

Document

DOI: 10.4230/LIPIcs.ESA.2018.44

Algorithmic Building Blocks for Asymmetric Memories

Authors: Yan Gu, Yihan Sun, and Guy E. Blelloch

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

Abstract

The future of main memory appears to lie in the direction of new non-volatile memory technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of energy, bandwidth, and latency. This asymmetry can have a significant effect on algorithm design, and in many cases it is possible to reduce writes at the cost of more reads. This paper studies which algorithmic techniques are useful in designing practical write-efficient algorithms. We focus on several fundamental algorithmic building blocks including unordered set/map implemented using hash tables, comparison sort, and graph traversal algorithms including breadth-first search and Dijkstra's algorithm. We introduce new algorithms and implementations that can reduce writes, and analyze the performance experimentally using a software simulator. Finally, we summarize interesting lessons and directions in designing write-efficient algorithms that can be valuable to share.

Cite as

Yan Gu, Yihan Sun, and Guy E. Blelloch. Algorithmic Building Blocks for Asymmetric Memories. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 44:1-44:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gu_et_al:LIPIcs.ESA.2018.44,
  author =	{Gu, Yan and Sun, Yihan and Blelloch, Guy E.},
  title =	{{Algorithmic Building Blocks for Asymmetric Memories}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{44:1--44:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.44},
  URN =		{urn:nbn:de:0030-drops-95070},
  doi =		{10.4230/LIPIcs.ESA.2018.44},
  annote =	{Keywords: Asymmetric Memory, I/O Cost, Write-Efficient Algorithms, Hash Tables, Graph-Traversal Algorithms}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2017.26

Efficient Construction of Probabilistic Tree Embeddings

Authors: Guy E. Blelloch, Yan Gu, and Yihan Sun

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

Abstract

In this paper we describe an algorithm that embeds a graph metric (V,d_G) on an undirected weighted graph G=(V,E) into a distribution of tree metrics (T,D_T) such that for every pair u,v in V, d_G(u,v)<=d_T(u,v) and E_T[d_T(u,v)]<=O(log n)d_G(u,v). Such embeddings have proved highly useful in designing fast approximation algorithms, as many hard problems on graphs are easy to solve on tree instances. For a graph with n vertices and m edges, our algorithm runs in O(m log n) time with high probability, which improves the previous upper bound of O(m log^3 n) shown by Mendel et al. in 2009. The key component of our algorithm is a new approximate single-source shortest-path algorithm, which implements the priority queue with a new data structure, the bucket-tree structure. The algorithm has three properties: it only requires linear time in terms of the number of edges in the input graph; the computed distances have the distance preserving property; and when computing the shortest-paths to the k-nearest vertices from the source, it only requires to visit these vertices and their edge lists. These properties are essential to guarantee the correctness and the stated work bound. Using this shortest-path algorithm, we show how to generate an intermediate structure, the approximate dominance sequences of the input graph, in O(m log n) time, and further propose a simple yet efficient algorithm to converted this sequence to a tree embedding in O(n log n) time, both with high probability. Combining the three subroutines gives the stated work bound of the algorithm. We also show a new application of probabilistic tree embeddings: they can be used to accelerate the construction of a series of approximate distance oracles.

Cite as

Guy E. Blelloch, Yan Gu, and Yihan Sun. Efficient Construction of Probabilistic Tree Embeddings. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{blelloch_et_al:LIPIcs.ICALP.2017.26,
  author =	{Blelloch, Guy E. and Gu, Yan and Sun, Yihan},
  title =	{{Efficient Construction of Probabilistic Tree Embeddings}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{26:1--26:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.26},
  URN =		{urn:nbn:de:0030-drops-75034},
  doi =		{10.4230/LIPIcs.ICALP.2017.26},
  annote =	{Keywords: Graph Algorithm, Metric Embeddings, Probabilistic Tree Embeddings, Single-source Shortest-paths}
}

Search Results

Documents authored by Sun, Yihan

Efficient Parallel Output-Sensitive Edit Distance

Abstract

Cite as

Space and Time Bounded Multiversion Garbage Collection

Abstract

Cite as

Algorithmic Building Blocks for Asymmetric Memories

Abstract

Cite as

Efficient Construction of Probabilistic Tree Embeddings

Abstract

Cite as

Thanks for your feedback!

Could not send message