7 Search Results for "Arroyuelo, Diego"


Document
Faster Algorithm for Bounded Tree Edit Distance in the Low-Distance Regime

Authors: Tomasz Kociumaka and Ali Shahali

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)


Abstract
The tree edit distance is a natural dissimilarity measure between rooted ordered trees whose nodes are labeled over an alphabet Σ. It is defined as the minimum number of node edits (insertions, deletions, and relabelings) required to transform one tree into the other. The weighted variant assigns a cost ≥ 1 to each edit (based on node labels) and minimizes the total cost rather than the number of edits. The unweighted tree edit distance between two trees of total size n can be computed in 𝒪(n^{2.6857}) time; in contrast, computing the weighted tree edit distance is fine-grained equivalent to the All-Pairs Shortest Paths (APSP) problem and requires n³/2^Ω(√{log n}) time [Nogler, Polak, Saha, Vassilevska Williams, Xu, Ye; STOC'25]. These super-quadratic running times are impractical for large, similar trees, which motivates the bounded version of the problem: the running time is parameterized by the distance k, enabling faster algorithms when k ≪ n. Prior algorithms for bounded unweighted tree edit distance run in 𝒪(nk² log n) time [Akmal & Jin; ICALP'21] and 𝒪(n + k⁷ log k) time [Das, Gilbert, Hajiaghayi, Kociumaka, Saha; STOC'23]. For the weighted variant, only an 𝒪(n + k^{15})-time algorithm is known [Das, Gilbert, Hajiaghayi, Kociumaka, Saha; STOC'23]. We present an 𝒪(n + k⁶ log k)-time algorithm for bounded tree edit distance in both the weighted and unweighted settings. First, we devise a simpler 𝒪(nk² log n)-time algorithm for the weighted setting. Next, we exploit periodic structures in the input trees via an optimized universal kernel: we modify prior 𝒪(n)-time, 𝒪(k⁵)-size kernels so that they generate such structured instances, enabling an efficient analysis.
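
For readers who want the definition in executable form, here is a minimal Python sketch (illustrative only, not the paper's algorithm) that computes the unweighted tree edit distance with the classical memoized forest recurrence; the example trees are made up.

from functools import lru_cache

def size(forest):
    """Number of nodes in a forest, i.e. a tuple of (label, children) trees."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Unweighted edit distance between two ordered forests."""
    if not f1:
        return size(f2)                           # insert every node of f2
    if not f2:
        return size(f1)                           # delete every node of f1
    (lab1, kids1), rest1 = f1[-1], f1[:-1]
    (lab2, kids2), rest2 = f2[-1], f2[:-1]
    return min(
        forest_dist(rest1 + kids1, f2) + 1,       # delete the rightmost root of f1
        forest_dist(f1, rest2 + kids2) + 1,       # insert the rightmost root of f2
        forest_dist(rest1, rest2) + forest_dist(kids1, kids2)
            + (0 if lab1 == lab2 else 1),         # match or relabel the two roots
    )

def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))

a = ("f", (("a", ()), ("b", (("c", ()),))))       # f(a, b(c))
b = ("g", (("a", ()), ("c", ())))                 # g(a, c)
print(tree_edit_distance(a, b))                   # -> 2 (delete b, relabel f to g)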

Cite as

Tomasz Kociumaka and Ali Shahali. Faster Algorithm for Bounded Tree Edit Distance in the Low-Distance Regime. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 94:1-94:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{kociumaka_et_al:LIPIcs.ESA.2025.94,
  author =	{Kociumaka, Tomasz and Shahali, Ali},
  title =	{{Faster Algorithm for Bounded Tree Edit Distance in the Low-Distance Regime}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{94:1--94:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.94},
  URN =		{urn:nbn:de:0030-drops-245634},
  doi =		{10.4230/LIPIcs.ESA.2025.94},
  annote =	{Keywords: tree edit distance, edit distance, kernelization, dynamic programming}
}
Document
Fast and Memory-Efficient BWT Construction of Repetitive Texts Using Lyndon Grammars

Authors: Jannik Olbrich

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)


Abstract
The Burrows-Wheeler Transform (BWT) serves as the basis for many important sequence indexes. On very large datasets (e.g. genomic databases), classical BWT construction algorithms are often infeasible because they usually need to have the entire dataset in main memory. Fortunately, such large datasets are often highly repetitive. It can thus be beneficial to compute the BWT from a compressed representation. We propose an algorithm for computing the BWT via the Lyndon straight-line program, a grammar based on the standard factorization of Lyndon words. Our algorithm can also be used to compute the extended BWT (eBWT) of a multiset of sequences. We empirically evaluate our implementation and find that we can compute the BWT and eBWT of very large datasets faster and/or with less memory than competing methods.
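
As a reminder of what is being computed (the paper's contribution is constructing it efficiently from a grammar-compressed representation, not the transform itself), here is the textbook naive BWT obtained by sorting all rotations of the text followed by a sentinel:

def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text + sentinel and take the last column."""
    s = text + sentinel                           # sentinel must not occur in text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

print(bwt("banana"))   # -> annb$aa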

Cite as

Jannik Olbrich. Fast and Memory-Efficient BWT Construction of Repetitive Texts Using Lyndon Grammars. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 60:1-60:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{olbrich:LIPIcs.ESA.2025.60,
  author =	{Olbrich, Jannik},
  title =	{{Fast and Memory-Efficient BWT Construction of Repetitive Texts Using Lyndon Grammars}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{60:1--60:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.60},
  URN =		{urn:nbn:de:0030-drops-245286},
  doi =		{10.4230/LIPIcs.ESA.2025.60},
  annote =	{Keywords: Burrows-Wheeler Transform, Grammar compression}
}
Document
BWT Indexes for Optimal Joins in Graph Databases

Authors: Diego Arroyuelo and Gonzalo Navarro

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
Graph databases represent data as a labeled directed graph, where the labels refer to properties that connect the entities represented by their source and target vertices. Queries feature, most prominently, sets of edges where the source, target, and/or label can be variables; each instantiation of the variables under which all the edges occur in the graph is a solution to the query. Worst-case-optimal algorithms to solve those queries have been devised, but they pose significant space requirements. This overhead has hindered the adoption of worst-case-optimal algorithms in real systems. We show that a representation of the graph based on the extended BWT (eBWT), where each edge is seen as an independent string of length 3 (source, label, target), supports worst-case-optimal algorithms while using almost no extra space on top of the raw data. We then show how the idea generalizes to the relational model, where the strings can be longer than 3 and several eBWTs are needed to obtain worst-case optimality. The aim of minimizing space in that case leads us to consider novel eBWT variants, where columns other than the last can be chosen. Finally, we show how the same graph representation can be used to solve other typical queries, such as finding graph paths that match regular expressions.
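
To illustrate only the query semantics described above (this brute-force evaluator has nothing to do with the eBWT-based index or with worst-case optimality), the sketch below enumerates the variable bindings under which every pattern edge occurs in a small invented graph:

from itertools import product

def is_var(x):
    return isinstance(x, str) and x.startswith("?")

def evaluate(edges, query):
    """Yield every binding of the variables such that all pattern edges occur in the graph."""
    variables = sorted({x for pat in query for x in pat if is_var(x)})
    constants = sorted({x for e in edges for x in e})
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        instantiated = [tuple(binding.get(x, x) for x in pat) for pat in query]
        if all(e in edges for e in instantiated):
            yield binding

graph = {("alice", "follows", "bob"), ("bob", "follows", "carol"),
         ("alice", "likes", "carol")}
query = [("alice", "follows", "?x"), ("?x", "follows", "?y")]
print(list(evaluate(graph, query)))   # -> [{'?x': 'bob', '?y': 'carol'}]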

Cite as

Diego Arroyuelo and Gonzalo Navarro. BWT Indexes for Optimal Joins in Graph Databases. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{arroyuelo_et_al:OASIcs.Manzini.14,
  author =	{Arroyuelo, Diego and Navarro, Gonzalo},
  title =	{{BWT Indexes for Optimal Joins in Graph Databases}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{14:1--14:19},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.14},
  URN =		{urn:nbn:de:0030-drops-239222},
  doi =		{10.4230/OASIcs.Manzini.14},
  annote =	{Keywords: Graph databases, Ring index, extended BWT, compact data structures}
}
Document
Elias-Fano Compression for Space-Efficient Rank and Select Structures

Authors: Lannie Dalton Hough and Abhinav Bhatele

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)


Abstract
Bit vectors are an important component of many data structures, which are used in a variety of applications and domains including databases, search engines, and computational biology. Many use cases depend on being able to perform rank and/or select queries on the bit vector. No existing rank and select structure is simultaneously the most space-efficient and the most time-efficient; there is a tradeoff between the two. In practice, the smallest rank and select data structures, cs-poppy and pasta-flat, impose a space overhead of 3.51%, or 3.125% if only rank needs to be supported. In this paper, we present a new data structure, orzo, which reduces the overhead of the rank component by a further 26.5%. We preserve the desirable cache-centric design decisions made in prior work, which allows us to minimize the performance penalty of creating a smaller data structure.
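
For readers unfamiliar with the two query types, the sketch below shows their semantics on a deliberately naive Python bit vector (a full prefix-sum array, so its overhead is enormous compared with the succinct structures discussed above); it is illustrative only, and the example bits are made up.

class NaiveBitVector:
    """Plain-Python stand-in for a succinct rank/select structure."""
    def __init__(self, bits):
        self.bits = list(bits)
        self.prefix = [0]                       # prefix[i] = number of 1s in bits[0..i)
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)

    def rank1(self, i):
        """Number of 1 bits among positions [0, i)."""
        return self.prefix[i]

    def select1(self, k):
        """Position of the k-th 1 bit (1-indexed), or -1 if there is no such bit."""
        ones = 0
        for pos, b in enumerate(self.bits):
            ones += b
            if b and ones == k:
                return pos
        return -1

bv = NaiveBitVector([1, 0, 0, 1, 1, 0, 1])
print(bv.rank1(5), bv.select1(3))   # -> 3 4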

Cite as

Lannie Dalton Hough and Abhinav Bhatele. Elias-Fano Compression for Space-Efficient Rank and Select Structures. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 23:1-23:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{hough_et_al:LIPIcs.SEA.2025.23,
  author =	{Hough, Lannie Dalton and Bhatele, Abhinav},
  title =	{{Elias-Fano Compression for Space-Efficient Rank and Select Structures}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{23:1--23:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.23},
  URN =		{urn:nbn:de:0030-drops-232617},
  doi =		{10.4230/LIPIcs.SEA.2025.23},
  annote =	{Keywords: rank and select, cache-aware, succinct data structures, bit vector}
}
Document
The Equivalence Problem of E-Pattern Languages with Length Constraints Is Undecidable

Authors: Dirk Nowotka and Max Wiedenhöft

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)


Abstract
Patterns are words consisting of terminals and variables. The language of a pattern is the set of words obtained by uniformly substituting all variables with words that contain only terminals. Length constraints restrict the valid substitutions of variables by associating the variables of a pattern with a system (or a disjunction of systems) of linear Diophantine inequalities. Pattern languages with length constraints contain only those words in which all variables are substituted by words whose lengths fulfill the given length constraints. We consider the membership, inclusion, and equivalence problems for erasing and non-erasing pattern languages with length constraints. Our main result shows that the erasing equivalence problem, one of the most prominent open problems in the realm of patterns, becomes undecidable if length constraints are allowed in addition to variable equality. Additionally, we show that the terminal-free inclusion problem, a prominent problem known to be undecidable in the binary case for patterns without any constraints, is also undecidable for all larger alphabets in this setting. Finally, we show that considering regular constraints, i.e., additionally associating variables with regular languages as restrictions on valid substitutions, together with length constraints results in undecidability of the non-erasing equivalence problem. This gives a first upper bound on the constraints needed to obtain undecidability in this case, as the problem is trivially decidable without constraints and its decidability is unknown when only regular or only length constraints are considered.
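
To ground the definitions, here is a brute-force membership test for E-pattern languages together with a simple length constraint; it is purely illustrative (membership is decidable, while the paper's undecidability results concern inclusion and equivalence), and the pattern, word, and constraint are invented for the example:

from itertools import product

def substitutions(pattern, word):
    """Yield every substitution of the variables that maps pattern onto word."""
    variables = sorted({c for c in pattern if c.isupper()})
    # Enumerate a substitution length for each variable, then check consistency.
    for lengths in product(range(len(word) + 1), repeat=len(variables)):
        lens = dict(zip(variables, lengths))
        sub, pos, ok = {}, 0, True
        for c in pattern:
            if c.isupper():                                  # variable occurrence
                piece = word[pos:pos + lens[c]]
                if len(piece) != lens[c] or sub.setdefault(c, piece) != piece:
                    ok = False
                    break
            else:                                            # terminal symbol
                piece = c
                if not word.startswith(c, pos):
                    ok = False
                    break
            pos += len(piece)
        if ok and pos == len(word):
            yield sub

# Pattern X c X (uppercase X is a variable) with the length constraint |X| >= 1.
print([s for s in substitutions("XcX", "abcab") if len(s["X"]) >= 1])
# -> [{'X': 'ab'}]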

Cite as

Dirk Nowotka and Max Wiedenhöft. The Equivalence Problem of E-Pattern Languages with Length Constraints Is Undecidable. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 4:1-4:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{nowotka_et_al:LIPIcs.CPM.2025.4,
  author =	{Nowotka, Dirk and Wiedenh\"{o}ft, Max},
  title =	{{The Equivalence Problem of E-Pattern Languages with Length Constraints Is Undecidable}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{4:1--4:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.4},
  URN =		{urn:nbn:de:0030-drops-230988},
  doi =		{10.4230/LIPIcs.CPM.2025.4},
  annote =	{Keywords: Patterns, Pattern Languages, Length Constraints, Regular Constraints, Decidability, Undecidability, Membership, Inclusion, Equivalence}
}
Document
The Trie Measure, Revisited

Authors: Jarno N. Alanko, Ruben Becker, Davide Cenzato, Travis Gagie, Sung-Hwan Kim, Bojana Kodric, and Nicola Prezza

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)


Abstract
In this paper, we study the following problem: given n subsets S₁, …, S_n of an integer universe U = {0, …, u-1}, having total cardinality N = ∑_{i = 1}ⁿ |S_i|, find a prefix-free encoding enc : U → {0,1}^+ minimizing the so-called trie measure, i.e., the total number of edges in the n binary tries T₁, …, T_n, where T_i is the trie packing the encoded integers {enc(x) : x ∈ S_i}. We first observe that this problem is equivalent to that of merging u sets with the cheapest sequence of binary unions, a problem which in [Ghosh et al., ICDCS 2015] is shown to be NP-hard. Motivated by the hardness of the general problem, we focus on particular families of prefix-free encodings. We start by studying the fixed-length shifted encoding of [Gupta et al., Theoretical Computer Science 2007]. Given a parameter 0 ≤ a < u, this encoding sends each x ∈ U to (x + a) mod u, interpreted as a bit-string of log u bits. We develop the first efficient algorithms that find the value of a minimizing the trie measure when this encoding is used. Our two algorithms run in O(u + N log u) and O(N log² u) time, respectively. We proceed by studying ordered encodings (a.k.a. monotone or alphabetic), and describe an algorithm finding the optimal such encoding in O(N + u³) time. Within the same running time, we show how to compute the best shifted ordered encoding, provably no worse than both the optimal shifted and optimal ordered encodings. We provide implementations of our algorithms and discuss how these encodings perform in practice.
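
As a concrete reading of the objective (illustrative only; the paper's algorithms avoid evaluating all shifts from scratch), the following sketch computes the trie measure of the fixed-length shifted encoding for every shift a on a tiny invented instance, counting one trie edge per distinct non-empty prefix of the encoded keys:

from math import ceil, log2

def trie_measure(sets, u, a):
    bits = max(1, ceil(log2(u)))                 # fixed code length
    total = 0
    for s in sets:
        codes = {format((x + a) % u, f"0{bits}b") for x in s}
        # One trie edge per distinct non-empty prefix of the encoded keys.
        prefixes = {code[:i] for code in codes for i in range(1, bits + 1)}
        total += len(prefixes)
    return total

sets = [{1, 2, 3}, {5, 6}]
u = 8
best = min(range(u), key=lambda a: trie_measure(sets, u, a))
print(best, trie_measure(sets, u, best))   # -> 3 10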

Cite as

Jarno N. Alanko, Ruben Becker, Davide Cenzato, Travis Gagie, Sung-Hwan Kim, Bojana Kodric, and Nicola Prezza. The Trie Measure, Revisited. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 19:1-19:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{alanko_et_al:LIPIcs.CPM.2025.19,
  author =	{Alanko, Jarno N. and Becker, Ruben and Cenzato, Davide and Gagie, Travis and Kim, Sung-Hwan and Kodric, Bojana and Prezza, Nicola},
  title =	{{The Trie Measure, Revisited}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{19:1--19:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.19},
  URN =		{urn:nbn:de:0030-drops-231135},
  doi =		{10.4230/LIPIcs.CPM.2025.19},
  annote =	{Keywords: Succinct data structures, degenerate strings, integer encoding}
}
Document
Trie-Compressed Adaptive Set Intersection

Authors: Diego Arroyuelo and Juan Pablo Castillo

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)


Abstract
We introduce space- and time-efficient algorithms and data structures for the offline set intersection problem. We show that a sorted integer set S ⊆ [0..u) of n elements can be represented in compressed space while supporting k-way intersections in adaptive O(kδ lg(u/δ)) time, where δ is the alternation measure introduced by Barbay and Kenyon. Our experimental results suggest that our approaches are competitive in practice, outperforming the most efficient alternatives (Partitioned Elias-Fano indexes, Roaring Bitmaps, and Recursive Universe Partitioning (RUP)) in several scenarios and, in general, offering relevant space-time trade-offs.
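
The sketch below conveys the adaptive flavor of such intersections on plain sorted arrays; it uses galloping (doubling) search in the spirit of Barbay and Kenyon, not the trie-compressed representation from the paper, and the input lists are invented:

from bisect import bisect_left

def gallop(arr, target, lo):
    """Smallest index i >= lo with arr[i] >= target (len(arr) if none)."""
    if lo >= len(arr) or arr[lo] >= target:
        return lo
    step = 1
    while lo + step < len(arr) and arr[lo + step] < target:
        step *= 2                                 # doubling search
    return bisect_left(arr, target, lo + step // 2 + 1, min(lo + step + 1, len(arr)))

def intersect(lists):
    """Intersect k sorted, duplicate-free integer lists adaptively."""
    k = len(lists)
    if k == 0 or any(not l for l in lists):
        return []
    if k == 1:
        return list(lists[0])
    pos, result = [0] * k, []
    candidate, count, i = lists[0][0], 1, 1       # candidate seen in 'count' lists so far
    while True:
        p = gallop(lists[i], candidate, pos[i])
        pos[i] = p
        if p == len(lists[i]):
            return result                         # one list is exhausted
        if lists[i][p] == candidate:
            count += 1
            if count == k:                        # candidate occurs in every list
                result.append(candidate)
                if p + 1 == len(lists[i]):
                    return result
                pos[i] = p + 1
                candidate, count = lists[i][p + 1], 1
        else:
            candidate, count = lists[i][p], 1     # skip ahead to a new candidate
        i = (i + 1) % k

print(intersect([[1, 3, 7, 9], [3, 4, 7, 10], [0, 3, 7, 8, 9]]))   # -> [3, 7]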

Cite as

Diego Arroyuelo and Juan Pablo Castillo. Trie-Compressed Adaptive Set Intersection. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 1:1-1:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@InProceedings{arroyuelo_et_al:LIPIcs.CPM.2023.1,
  author =	{Arroyuelo, Diego and Castillo, Juan Pablo},
  title =	{{Trie-Compressed Adaptive Set Intersection}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{1:1--1:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.1},
  URN =		{urn:nbn:de:0030-drops-179552},
  doi =		{10.4230/LIPIcs.CPM.2023.1},
  annote =	{Keywords: Set intersection problem, Adaptive Algorithms, Compressed and compact data structures}
}
  • Refine by Type
  • 7 Document/PDF
  • 6 Document/HTML

  • Refine by Publication Year
  • 6 2025
  • 1 2023

  • Refine by Author
  • 2 Arroyuelo, Diego
  • 1 Alanko, Jarno N.
  • 1 Becker, Ruben
  • 1 Bhatele, Abhinav
  • 1 Castillo, Juan Pablo

  • Refine by Series/Journal
  • 6 LIPIcs
  • 1 OASIcs

  • Refine by Classification
  • 5 Theory of computation → Data compression
  • 2 Theory of computation → Data structures and algorithms for data management
  • 2 Theory of computation → Design and analysis of algorithms
  • 2 Theory of computation → Sorting and searching
  • 1 Information systems → Information retrieval query processing

  • Refine by Keyword
  • 1 Adaptive Algorithms
  • 1 Burrows-Wheeler Transform
  • 1 Compressed and compact data structures
  • 1 Decidability
  • 1 Equivalence
