DROPS

Document

Research

On the Computational Cost of Knowledge Graph Embeddings

Authors: Victor Charpenay, Mansour Zoubeirou A Mayaki, and Antoine Zimmermann

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 4, Issue 1 (2026)

Abstract

Over the past decade, numerous Knowledge Graph Embedding (KGE) models have been designed and evaluated on reference datasets, with steadily increasing performance. In this paper, we re-evaluate these models with respect to their computational efficiency during training, by estimating the computational cost of the procedure expressed in floating-point operations. We design a cost model based on analytical expressions and apply it to a collection of 20 KGE models, representative of the state of the art. We show that dimensionality and parameter efficiency, used in the literature to compare models with each other, are not suitable to evaluate the true cost of models. Through fixed-budget experiments, a novel approach to evaluate KGE models based on cost estimates, we re-assess the relative performance of model families compared to the state of the art. Bilinear models such as ComplEx underperform with a low computational budget while hyperbolic linear models appear to offer no particular benefit compared to simpler Euclidean models, especially the MuRE model. Neural models, such as ConvE or CompGCN, achieve reasonable performance in the literature but their high computational cost appears unnecessary when compared with other models. The trade-off between efficiency and expressivity of both linear and neural models is to be further explored.

Cite as

Victor Charpenay, Mansour Zoubeirou A Mayaki, and Antoine Zimmermann. On the Computational Cost of Knowledge Graph Embeddings. In Transactions on Graph Data and Knowledge (TGDK), Volume 4, Issue 1, pp. 1:1-1:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@Article{charpenay_et_al:TGDK.4.1.1,
  author =	{Charpenay, Victor and Zoubeirou A Mayaki, Mansour and Zimmermann, Antoine},
  title =	{{On the Computational Cost of Knowledge Graph Embeddings}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{1:1--1:30},
  ISSN =	{2942-7517},
  year =	{2026},
  volume =	{4},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.4.1.1},
  URN =		{urn:nbn:de:0030-drops-256863},
  doi =		{10.4230/TGDK.4.1.1},
  annote =	{Keywords: Knowledge Graph Embedding, Parameter Efficiency, Computational Budget, Green AI}
}

Research

DOI: 10.4230/TGDK.3.3.4

Mining Inter-Document Argument Structures in Scientific Papers for an Argument Web

Authors: Florian Ruosch, Cristina Sarasua, and Abraham Bernstein

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 3 (2025)

Abstract

In Argument Mining, predicting argumentative relations between texts (or spans) remains one of the most challenging aspects, even more so in the cross-document setting. This paper makes three key contributions to advance research in this domain. We first extend an existing dataset, the Sci-Arg corpus, by annotating it with explicit inter-document argumentative relations, thereby allowing arguments to be distributed over several documents forming an Argument Web; these new annotations are published using Semantic Web technologies (RDF, OWL). Second, we explore and evaluate three automated approaches for predicting these inter-document argumentative relations, establishing critical baselines on the new dataset. We find that a simple classifier based on discourse indicators with access to context outperforms neural methods. Third, we conduct a comparative analysis of these approaches for both intra- and inter-document settings, identifying statistically significant differences in results that indicate the necessity of distinguishing between these two scenarios. Our findings highlight significant challenges in this complex domain and open crucial avenues for future research on the Argument Web of Science, particularly for those interested in leveraging Semantic Web technologies and knowledge graphs to understand scholarly discourse. With this, we provide the first stepping stones in the form of a benchmark dataset, three baseline methods, and an initial analysis for a systematic exploration of this field relevant to the Web of Data and Science.

Cite as

Florian Ruosch, Cristina Sarasua, and Abraham Bernstein. Mining Inter-Document Argument Structures in Scientific Papers for an Argument Web. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 3, pp. 4:1-4:33, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@Article{ruosch_et_al:TGDK.3.3.4,
  author =	{Ruosch, Florian and Sarasua, Cristina and Bernstein, Abraham},
  title =	{{Mining Inter-Document Argument Structures in Scientific Papers for an Argument Web}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{4:1--4:33},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{3},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.3.4},
  URN =		{urn:nbn:de:0030-drops-252159},
  doi =		{10.4230/TGDK.3.3.4},
  annote =	{Keywords: Argument Mining, Large Language Models, Knowledge Graphs, Link Prediction}
}

Survey

DOI: 10.4230/TGDK.3.2.1

Resilience in Knowledge Graph Embeddings

Authors: Arnab Sharma, N'Dah Jean Kouagou, and Axel-Cyrille Ngonga Ngomo

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2 (2025)

Abstract

In recent years, knowledge graphs have gained interest and witnessed widespread applications in various domains, such as information retrieval, question answering, and recommendation systems, among others. Large-scale knowledge graphs to this end have demonstrated their utility in effectively representing structured knowledge. To further facilitate the application of machine learning techniques, knowledge graph embedding models have been developed. Such models can transform entities and relationships within knowledge graphs into vectors. However, these embedding models often face challenges related to noise, missing information, distribution shift, adversarial attacks, etc. This can lead to sub-optimal embeddings and incorrect inferences, thereby negatively impacting downstream applications. While the existing literature has focused so far on adversarial attacks on KGE models, the challenges related to the other critical aspects remain unexplored. In this paper, we first give a unified definition of resilience, encompassing several factors such as generalization, in-distribution generalization, distribution adaption, and robustness. After formalizing these concepts for machine learning in general, we define them in the context of knowledge graphs. To find the gap in the existing works on resilience in the context of knowledge graphs, we perform a systematic survey, taking into account all the aspects mentioned previously. Our survey results show that most of the existing works focus on a specific aspect of resilience, namely robustness. After categorizing such works based on their respective aspects of resilience, we discuss the challenges and future research directions.

Cite as

Arnab Sharma, N'Dah Jean Kouagou, and Axel-Cyrille Ngonga Ngomo. Resilience in Knowledge Graph Embeddings. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2, pp. 1:1-1:38, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@Article{sharma_et_al:TGDK.3.2.1,
  author =	{Sharma, Arnab and Kouagou, N'Dah Jean and Ngonga Ngomo, Axel-Cyrille},
  title =	{{Resilience in Knowledge Graph Embeddings}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{1:1--1:38},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{2},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.2.1},
  URN =		{urn:nbn:de:0030-drops-248117},
  doi =		{10.4230/TGDK.3.2.1},
  annote =	{Keywords: Knowledge graphs, Resilience, Robustness}
}

DOI: 10.4230/LIPIcs.ESA.2025.21

A Unified FPT Framework for Crossing Number Problems

Authors: Éric Colin de Verdière and Petr Hliněný

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

The basic (and traditional) crossing number problem is to determine the minimum number of crossings in a topological drawing of an input graph in the plane. We develop a unified framework that smoothly captures many generalized crossing number problems, and that yields fixed-parameter tractable (FPT) algorithms for them not only in the plane but also on surfaces. Our framework takes the following form. We fix a surface S, an integer r, and a map κ from the set of topological drawings of graphs in S to ℤ_+ ∪ {∞}, satisfying some natural monotonicity conditions, but essentially describing the allowed drawings and how we want to count the crossings in them. Then deciding whether an input graph G has an allowed drawing D on S with κ(D) ≤ r can be done in time quadratic in the size of G (and exponential in other parameters). More generally, we may take as input an edge-colored graph, and distinguish crossings by the colors of the involved edges; and we may allow a bounded number of edge removals and vertex splits to be performed on G before drawing it. The proof is a reduction to the embeddability of a graph on a two-dimensional simplicial complex. This framework implies, in a unified way, quadratic FPT algorithms for many topological crossing number variants established in the graph drawing community. Some of these variants already had previously published FPT algorithms, mostly relying on Courcelle’s metatheorem, but for many of those, we obtain an algorithm with a better runtime. Moreover, our framework extends, at no cost, to these crossing number variants in any fixed surface.

Cite as

Éric Colin de Verdière and Petr Hliněný. A Unified FPT Framework for Crossing Number Problems. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{colindeverdiere_et_al:LIPIcs.ESA.2025.21,
  author =	{Colin de Verdi\`{e}re, \'{E}ric and Hlin\v{e}n\'{y}, Petr},
  title =	{{A Unified FPT Framework for Crossing Number Problems}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{21:1--21:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.21},
  URN =		{urn:nbn:de:0030-drops-244897},
  doi =		{10.4230/LIPIcs.ESA.2025.21},
  annote =	{Keywords: computational geometry, fixed-parameter tractability, graph drawing, graph embedding, crossing number, two-dimensional simplicial complex, surface}
}

DOI: 10.4230/LIPIcs.WADS.2025.49

On the I/O Complexity of the Cocke-Younger-Kasami Algorithm and of a Family of Related Dynamic Programming Algorithms

Authors: Lorenzo De Stefani and Vedant Gupta

Published in: LIPIcs, Volume 349, 19th International Symposium on Algorithms and Data Structures (WADS 2025)

Abstract

Asymptotically tight lower bounds are derived for the Input/Output (I/O) complexity of a class of dynamic programming algorithms, including matrix chain multiplication, optimal polygon triangulation, and the construction of optimal binary search trees. Assuming no recomputation of intermediate values, we establish an Ω(n³/(√M B)) I/O lower bound, where n denotes the size of the input and M denotes the size of the available fast memory (cache). When recomputation is allowed, we show that the same bound holds for M < cn, where c is a positive constant. In the case where M ≥ 2n, we show an Ω(n/B) I/O lower bound. We also discuss algorithms for which the number of executed I/O operations matches asymptotically each of the presented lower bounds, which are thus asymptotically tight. Additionally, we refine our general method to obtain a lower bound for the I/O complexity of the Cocke-Younger-Kasami algorithm, where the size of the grammar impacts the I/O complexity. An upper bound with asymptotically matching performance in many cases is also provided.

Cite as

Lorenzo De Stefani and Vedant Gupta. On the I/O Complexity of the Cocke-Younger-Kasami Algorithm and of a Family of Related Dynamic Programming Algorithms. In 19th International Symposium on Algorithms and Data Structures (WADS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 349, pp. 49:1-49:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{destefani_et_al:LIPIcs.WADS.2025.49,
  author =	{De Stefani, Lorenzo and Gupta, Vedant},
  title =	{{On the I/O Complexity of the Cocke-Younger-Kasami Algorithm and of a Family of Related Dynamic Programming Algorithms}},
  booktitle =	{19th International Symposium on Algorithms and Data Structures (WADS 2025)},
  pages =	{49:1--49:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-398-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{349},
  editor =	{Morin, Pat and Oh, Eunjin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WADS.2025.49},
  URN =		{urn:nbn:de:0030-drops-242800},
  doi =		{10.4230/LIPIcs.WADS.2025.49},
  annote =	{Keywords: I/O complexity, Dynamic Programming Algorithms, Lower Bounds, Recomputation, Cocke-Younger-Kasami}
}

DOI: 10.4230/LIPIcs.MFCS.2025.48

Generalized De Bruijn Words, Invertible Necklaces, and the Burrows-Wheeler Transform

Authors: Gabriele Fici and Estéban Gabory

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

We define generalized de Bruijn words as those words having a Burrows-Wheeler transform that is a concatenation of permutations of the alphabet. We show that generalized de Bruijn words are in 1-to-1 correspondence with Hamiltonian cycles in the generalized de Bruijn graphs, introduced in the early '80s in the context of network design. When the size of the alphabet is a prime p, we define invertible necklaces as those whose BWT-matrix is non-singular. We show that invertible necklaces of length n correspond to normal bases of the finite field 𝔽_{pⁿ}, and that they form an Abelian group isomorphic to the Reutenauer group RG_pⁿ. Using known results in abstract algebra, we can make a bridge between generalized de Bruijn words and invertible necklaces. In particular, we highlight a correspondence between binary de Bruijn words of order d+1, binary necklaces of length 2^{d} having an odd number of 1’s, invertible BWT matrices of size 2^{d}× 2^{d}, and normal bases of the finite field 𝔽_{2^{2^{d}}}.
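
The definition in the first sentence of this abstract can be checked mechanically on small words. The following is a minimal Python sketch, not the authors' code: it assumes the word is treated as a cyclic word, that the BWT is the last column of the lexicographically sorted rotations, and that the blocks are consecutive and of length equal to the alphabet size. The function names `bwt` and `is_generalized_de_bruijn` are hypothetical and chosen for illustration only.

```python
def bwt(word: str) -> str:
    """BWT of a cyclic word: last column of its lexicographically sorted rotations."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def is_generalized_de_bruijn(word: str, alphabet: str) -> bool:
    """Check whether BWT(word) is a concatenation of permutations of `alphabet`.

    Assumption: the blocks are consecutive, non-overlapping, and each has
    length len(alphabet).
    """
    k = len(alphabet)
    if k == 0 or not word or len(word) % k != 0:
        return False
    transformed = bwt(word)
    target = sorted(alphabet)
    return all(
        sorted(transformed[i:i + k]) == target
        for i in range(0, len(transformed), k)
    )


if __name__ == "__main__":
    # "00010111" is a binary de Bruijn word of order 3.
    print(bwt("00010111"), is_generalized_de_bruijn("00010111", "01"))
    # "0001" is not: its BWT contains the block "00".
    print(bwt("0001"), is_generalized_de_bruijn("0001", "01"))
```

Sorting all rotations takes quadratic space and is only meant for toy inputs; a suffix-array-based BWT would be the usual choice for longer words.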

Cite as

Gabriele Fici and Estéban Gabory. Generalized De Bruijn Words, Invertible Necklaces, and the Burrows-Wheeler Transform. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 48:1-48:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{fici_et_al:LIPIcs.MFCS.2025.48,
  author =	{Fici, Gabriele and Gabory, Est\'{e}ban},
  title =	{{Generalized De Bruijn Words, Invertible Necklaces, and the Burrows-Wheeler Transform}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{48:1--48:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.48},
  URN =		{urn:nbn:de:0030-drops-241555},
  doi =		{10.4230/LIPIcs.MFCS.2025.48},
  annote =	{Keywords: Burrows-Wheeler Transform, Generalized de Bruijn Word, Generalized de Bruijn Graph, Circulant Matrix, Invertible Necklace, Sandpile Group, Reutenauer Group}
}

DOI: 10.4230/LIPIcs.GIScience.2025.18

U-Prithvi: Integrating a Foundation Model and U-Net for Enhanced Flood Inundation Mapping

Authors: Vit Kostejn, Yamil Essus, Jenna Abrahamson, and Ranga Raju Vatsavai

Published in: LIPIcs, Volume 346, 13th International Conference on Geographic Information Science (GIScience 2025)

Abstract

In recent years, large pre-trained models, commonly referred to as foundation models, have become increasingly popular for various tasks leveraging transfer learning. This trend has expanded to remote sensing, where transformer-based foundation models such as Prithvi, msGFM, and SatSwinMAE have been utilized for a range of applications. While these transformer-based models, particularly the Prithvi model, exhibit strong generalization capabilities, they are limited in capturing fine-grained details compared to convolutional neural network architectures like U-Net in segmentation tasks. In this paper, we propose a novel architecture, U-Prithvi, which combines the strengths of the Prithvi transformer with those of U-Net. We introduce a RandomHalfMaskLayer to ensure balanced learning from both models during training. Our approach is evaluated on the Sen1Floods11 dataset for flood inundation mapping, and experimental results demonstrate that U-Prithvi outperforms both individual models, including on out-of-sample data. While this principle is illustrated using the Prithvi model, it is easily adaptable to other foundation models.

Cite as

Vit Kostejn, Yamil Essus, Jenna Abrahamson, and Ranga Raju Vatsavai. U-Prithvi: Integrating a Foundation Model and U-Net for Enhanced Flood Inundation Mapping. In 13th International Conference on Geographic Information Science (GIScience 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 346, pp. 18:1-18:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{kostejn_et_al:LIPIcs.GIScience.2025.18,
  author =	{Kostejn, Vit and Essus, Yamil and Abrahamson, Jenna and Vatsavai, Ranga Raju},
  title =	{{U-Prithvi: Integrating a Foundation Model and U-Net for Enhanced Flood Inundation Mapping}},
  booktitle =	{13th International Conference on Geographic Information Science (GIScience 2025)},
  pages =	{18:1--18:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-378-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{346},
  editor =	{Sila-Nowicka, Katarzyna and Moore, Antoni and O'Sullivan, David and Adams, Benjamin and Gahegan, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.GIScience.2025.18},
  URN =		{urn:nbn:de:0030-drops-238479},
  doi =		{10.4230/LIPIcs.GIScience.2025.18},
  annote =	{Keywords: GeoAI, flood mapping, foundation model, U-Net, Prithvi}
}

Invited Talk

DOI: 10.4230/LIPIcs.WABI.2025.1

Recursive Parsing and Grammar Compression in the Era of Pangenomics (Invited Talk)

Authors: Christina Boucher

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)

Abstract

Prefix-Free Parsing (PFP) and its recursive variant (RPFP) provide a scalable framework for compressing and indexing large genomic datasets. By enabling efficient construction of succinct data structures, these methods support fast and memory-efficient read alignment across thousands of genomes. Their deterministic and modular design makes them especially well-suited for pangenomics and large-scale sequence analysis.

Cite as

Christina Boucher. Recursive Parsing and Grammar Compression in the Era of Pangenomics (Invited Talk). In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 1:1-1:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{boucher:LIPIcs.WABI.2025.1,
  author =	{Boucher, Christina},
  title =	{{Recursive Parsing and Grammar Compression in the Era of Pangenomics}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{1:1--1:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.1},
  URN =		{urn:nbn:de:0030-drops-239278},
  doi =		{10.4230/LIPIcs.WABI.2025.1},
  annote =	{Keywords: Prefix-Free Parsing, Recursive Prefix-Free Parsing, Grammar-Based Compression, Succinct Data Structures, RePair Compression}
}

Research

DOI: 10.4230/OASIcs.Grossi.10

Faster Run-Length Compressed Suffix Arrays

Authors: Nathaniel K. Brown, Travis Gagie, Giovanni Manzini, Gonzalo Navarro, and Marinella Sciortino

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)

Abstract

We first review how we can store a run-length compressed suffix array (RLCSA) for a text T of length n over an alphabet of size σ whose Burrows-Wheeler Transform (BWT) consists of r runs in O(r log(n/r) + r log σ + σ) bits such that later, given character a and the suffix-array (SA) interval for P, we can find the SA interval for aP in O(log r_a + log log n) time, where r_a is the number of runs of copies of a in the BWT. We then show how to modify the RLCSA such that we find the SA interval for aP in only O(log r_a) time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and then replacing rank queries on sparse bitvectors by a constant number of select queries. We also review two-level indexing and discuss how our faster RLCSA may be useful in improving it. Finally, we briefly discuss how two-level indexing may speed up a recent heuristic for finding maximal exact matches of a pattern with respect to an indexed text.

Cite as

Nathaniel K. Brown, Travis Gagie, Giovanni Manzini, Gonzalo Navarro, and Marinella Sciortino. Faster Run-Length Compressed Suffix Arrays. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{brown_et_al:OASIcs.Grossi.10,
  author =	{Brown, Nathaniel K. and Gagie, Travis and Manzini, Giovanni and Navarro, Gonzalo and Sciortino, Marinella},
  title =	{{Faster Run-Length Compressed Suffix Arrays}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{10:1--10:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.10},
  URN =		{urn:nbn:de:0030-drops-238095},
  doi =		{10.4230/OASIcs.Grossi.10},
  annote =	{Keywords: Run-length compressed suffix arrays, interpolative coding, two-level indexing}
}

DOI: 10.4230/OASIcs.Manzini.6

Optimizing the Performance of the FM-Index for Large-Scale Data

Authors: Eddie Ferro and Christina Boucher

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The FM-index is a fundamental data structure used in bioinformatics to efficiently search for strings and index genomes. However, the FM-index can pose computational challenges, particularly in the context of large-scale genomic datasets, due to the complexity of its underlying components and data encodings. In this paper, we present a comprehensive review of efficient variants of the FM-index and the encoding strategies used to improve performance. We examine hardware-accelerated techniques, such as memory-efficient data layouts and cache-aware structures, as well as software-level innovations, including algorithmic refinements and compact representations. The reviewed work demonstrates substantial gains in both speed and scalability, making methods that use the FM-index more practical for high-throughput genomic applications. By analyzing the trade-offs and design choices of these variants, we highlight how combining hardware-aware and software-centric strategies enables more efficient FM-index construction and usage across a range of bioinformatics tasks.

Cite as

Eddie Ferro and Christina Boucher. Optimizing the Performance of the FM-Index for Large-Scale Data. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 6:1-6:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{ferro_et_al:OASIcs.Manzini.6,
  author =	{Ferro, Eddie and Boucher, Christina},
  title =	{{Optimizing the Performance of the FM-Index for Large-Scale Data}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{6:1--6:21},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.6},
  URN =		{urn:nbn:de:0030-drops-239140},
  doi =		{10.4230/OASIcs.Manzini.6},
  annote =	{Keywords: FM-Index Acceleration, Run-Length Encoding, Suffix Array Optimization, Burrows-Wheeler Transform, Efficient Backward Search}
}

DOI: 10.4230/LIPIcs.SAT.2025.32

Efficient Certified Reasoning for Binarized Neural Networks

Authors: Jiong Yang, Yong Kiam Tan, Mate Soos, Magnus O. Myreen, and Kuldeep S. Meel

Published in: LIPIcs, Volume 341, 28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025)

Abstract

Neural networks have emerged as essential components in safety-critical applications - these use cases demand complex, yet trustworthy computations. Binarized Neural Networks (BNNs) are a type of neural network where each neuron is constrained to a Boolean value; they are particularly well-suited for safety-critical tasks because they retain much of the computational capacities of full-scale (floating-point or quantized) deep neural networks, but remain compatible with satisfiability solvers for qualitative verification and with model counters for quantitative reasoning. However, existing methods for BNN analysis suffer from either limited scalability or susceptibility to soundness errors, which hinders their applicability in real-world scenarios. In this work, we present a scalable and trustworthy approach for both qualitative and quantitative verification of BNNs. Our approach introduces a native representation of BNN constraints in a custom-designed solver for qualitative reasoning, and in an approximate model counter for quantitative reasoning. We further develop specialized proof generation and checking pipelines with native support for BNN constraint reasoning, ensuring trustworthiness for all of our verification results. Empirical evaluations on a BNN robustness verification benchmark suite demonstrate that our certified solving approach achieves a 9× speedup over prior certified CNF and PB-based approaches, and our certified counting approach achieves a 218× speedup over the existing CNF-based baseline. In terms of coverage, our pipeline produces fully certified results for 99% and 86% of the qualitative and quantitative reasoning queries on BNNs, respectively. This is in sharp contrast to the best existing baselines which can fully certify only 62% and 4% of the queries, respectively.

Cite as

Jiong Yang, Yong Kiam Tan, Mate Soos, Magnus O. Myreen, and Kuldeep S. Meel. Efficient Certified Reasoning for Binarized Neural Networks. In 28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 341, pp. 32:1-32:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{yang_et_al:LIPIcs.SAT.2025.32,
  author =	{Yang, Jiong and Tan, Yong Kiam and Soos, Mate and Myreen, Magnus O. and Meel, Kuldeep S.},
  title =	{{Efficient Certified Reasoning for Binarized Neural Networks}},
  booktitle =	{28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025)},
  pages =	{32:1--32:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-381-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{341},
  editor =	{Berg, Jeremias and Nordstr\"{o}m, Jakob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SAT.2025.32},
  URN =		{urn:nbn:de:0030-drops-237665},
  doi =		{10.4230/LIPIcs.SAT.2025.32},
  annote =	{Keywords: Neural network verification, proof certification, SAT solving, approximate model counting}
}

Document

DOI: 10.4230/LIPIcs.SEA.2025.4

U-Index: A Universal Indexing Framework for Matching Long Patterns

Authors: Lorraine A. K. Ayad, Gabriele Fici, Ragnar Groot Koerkamp, Grigorios Loukides, Rob Patro, Giulio Ermanno Pibiri, and Solon P. Pissis

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)

Abstract

Motivation. Text indexing is a fundamental and well-studied problem. Classic solutions to this problem either replace the original text with a compressed representation, e.g., the FM-index and its variants, or keep it uncompressed but attach some redundancy - an index - to accelerate matching, e.g., the suffix array. The former solutions thus retain excellent compressed space, but are practically slow to construct and query. The latter approaches, instead, sacrifice space efficiency but are typically faster; for example, the suffix array takes much more space than the text itself for commonly used alphabets, like ASCII or DNA, but it is very fast to construct and query. Methods. In this paper, we show that efficient text indexing can be achieved using just a small extra space on top of the original text, provided that the query patterns are sufficiently long. More specifically, we develop a new indexing paradigm in which a sketch of a query pattern is first matched against a sketch of the text. Once candidate matches are retrieved, they are verified using the original text. This paradigm is thus universal in the sense that it allows us to use any solution to index the sketched text, like a suffix array, FM-index, or r-index. Results. We explore both the theory and the practice of this universal framework. With an extensive experimental analysis, we show that, surprisingly, universal indexes can be constructed much faster than their unsketched counterparts and take a fraction of the space, as a direct consequence of (i) having a lower bound on the length of patterns and (ii) working in sketch space. Furthermore, these data structures have the potential of retaining or even improving query time, because matching against the sketched text is faster and verifying candidates can be theoretically done in constant time per occurrence (or, in practice, by short and cache-friendly scans of the text). 
Finally, we discuss some important applications of this novel indexing paradigm to computational biology. We hypothesize that such indexes will be particularly effective when the queries are sufficiently long, and so we demonstrate applications in long-read mapping.
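
The sketch-then-verify paradigm described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes window minimizers under lexicographic order as the sketching scheme, uses a naive list scan in place of a real index (suffix array, FM-index, or r-index) over the sketched text, and the names `minimizer_sketch` and `find_occurrences` and the parameters `k` and `w` are hypothetical.

```python
def minimizer_sketch(s, k, w):
    """Return [(position, kmer)] for the distinct window minimizers of s, left to right."""
    kmers = [s[i:i + k] for i in range(len(s) - k + 1)]
    sketch = []
    for start in range(len(kmers) - w + 1):
        # Smallest k-mer of the window in lexicographic order; ties go to the leftmost.
        pos = min(range(start, start + w), key=lambda i: (kmers[i], i))
        # Minimizer positions never decrease, so comparing with the last entry deduplicates.
        if not sketch or sketch[-1][0] != pos:
            sketch.append((pos, kmers[pos]))
    return sketch


def find_occurrences(text, pattern, k=3, w=4):
    """Match the pattern sketch against the text sketch, then verify in the text."""
    if len(pattern) < k + w - 1:
        raise ValueError("pattern is shorter than one minimizer window")
    text_sketch = minimizer_sketch(text, k, w)
    pat_sketch = minimizer_sketch(pattern, k, w)
    text_keys = [kmer for _, kmer in text_sketch]
    pat_keys = [kmer for _, kmer in pat_sketch]
    hits = []
    # Naive scan over the sketched text; a real index would replace this loop.
    for j in range(len(text_keys) - len(pat_keys) + 1):
        if text_keys[j:j + len(pat_keys)] != pat_keys:
            continue
        # Align the first minimizer of the pattern with its match in the text.
        start = text_sketch[j][0] - pat_sketch[0][0]
        # Verification step: compare the candidate against the original text.
        if start >= 0 and text[start:start + len(pattern)] == pattern:
            hits.append(start)
    return hits


print(find_occurrences("ACGTACGTTGCAACGTACGA", "ACGTACG"))
```

The length check mirrors the lower bound on pattern length mentioned in the abstract: a pattern shorter than one window of `w` consecutive `k`-mers has no minimizer and cannot be located through the sketch.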

Cite as

Lorraine A. K. Ayad, Gabriele Fici, Ragnar Groot Koerkamp, Grigorios Loukides, Rob Patro, Giulio Ermanno Pibiri, and Solon P. Pissis. U-Index: A Universal Indexing Framework for Matching Long Patterns. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{ayad_et_al:LIPIcs.SEA.2025.4,
  author =	{Ayad, Lorraine A. K. and Fici, Gabriele and Groot Koerkamp, Ragnar and Loukides, Grigorios and Patro, Rob and Pibiri, Giulio Ermanno and Pissis, Solon P.},
  title =	{{U-Index: A Universal Indexing Framework for Matching Long Patterns}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{4:1--4:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.4},
  URN =		{urn:nbn:de:0030-drops-232420},
  doi =		{10.4230/LIPIcs.SEA.2025.4},
  annote =	{Keywords: Text Indexing, Sketching, Minimizers, Hashing}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2025.65

Recognizing 2-Layer and Outer k-Planar Graphs

Authors: Yasuaki Kobayashi, Yuto Okada, and Alexander Wolff

Published in: LIPIcs, Volume 332, 41st International Symposium on Computational Geometry (SoCG 2025)

Abstract

The crossing number of a graph is the least number of crossings over all drawings of the graph in the plane. Computing the crossing number of a given graph is NP-hard, but fixed-parameter tractable (FPT) with respect to the natural parameter. Two well-known variants of the problem are 2-layer crossing minimization and circular crossing minimization, where every vertex must lie on one of two layers, namely two parallel lines, or a circle, respectively. In both cases, edges are drawn as straight-line segments. Both variants are NP-hard, but admit FPT-algorithms with respect to the natural parameter. In recent years, in the context of beyond-planar graphs, a local version of the crossing number has also received considerable attention. A graph is k-planar if it admits a drawing with at most k crossings per edge. In contrast to the crossing number, recognizing k-planar graphs is NP-hard even if k = 1 and hence not likely to be FPT with respect to the natural parameter k. In this paper, we consider the two above variants in the local setting. The k-planar graphs that admit a straight-line drawing with vertices on two layers or on a circle are called 2-layer k-planar and outer k-planar graphs, respectively. We study the parameterized complexity of the two recognition problems with respect to the natural parameter k. For k = 0, the two classes of graphs are exactly the caterpillars and outerplanar graphs, respectively, which can be recognized in linear time. Two groups of researchers independently showed that outer 1-planar graphs can also be recognized in linear time [Hong et al., Algorithmica 2015; Auer et al., Algorithmica 2016]. One group asked explicitly whether outer 2-planar graphs can be recognized in polynomial time. Our main contribution consists of XP-algorithms for recognizing 2-layer k-planar graphs and outer k-planar graphs, which implies that both recognition problems can be solved in polynomial time for every fixed k. 
We complement these results by showing that recognizing 2-layer k-planar graphs is XNLP-complete and that recognizing outer k-planar graphs is XNLP-hard. This implies that both problems are W[t]-hard for every t and that it is unlikely that they admit FPT-algorithms. On the other hand, we present an FPT-algorithm for recognizing 2-layer k-planar graphs where the order of the vertices on one layer is specified.
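
To make the local crossing measure concrete, the following minimal Python sketch counts crossings per edge in one given 2-layer straight-line drawing. It only evaluates a fixed drawing and does not solve the recognition problem studied in the paper, which asks whether some vertex order achieves at most k crossings per edge. The function name `max_crossings_per_edge` and the input encoding (two vertex orders plus a list of top-bottom edges) are assumptions made for illustration.

```python
from itertools import combinations


def max_crossings_per_edge(top_order, bottom_order, edges):
    """Largest number of crossings on any single edge of a 2-layer drawing.

    Each edge is a pair (top_vertex, bottom_vertex) drawn as a straight segment.
    """
    top = {v: i for i, v in enumerate(top_order)}
    bottom = {v: i for i, v in enumerate(bottom_order)}
    crossings = [0] * len(edges)
    for (i, (a, b)), (j, (c, d)) in combinations(enumerate(edges), 2):
        # Two edges cross exactly when their endpoints appear in opposite
        # relative order on the two layers; edges sharing an endpoint do not cross.
        if (top[a] - top[c]) * (bottom[b] - bottom[d]) < 0:
            crossings[i] += 1
            crossings[j] += 1
    return max(crossings, default=0)


# This drawing of a 4-cycle on layers (a, b) and (x, y) has one crossing,
# so it witnesses that the graph is 2-layer 1-planar.
cycle = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(max_crossings_per_edge(["a", "b"], ["x", "y"], cycle))
```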

Cite as

Yasuaki Kobayashi, Yuto Okada, and Alexander Wolff. Recognizing 2-Layer and Outer k-Planar Graphs. In 41st International Symposium on Computational Geometry (SoCG 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 332, pp. 65:1-65:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{kobayashi_et_al:LIPIcs.SoCG.2025.65,
  author =	{Kobayashi, Yasuaki and Okada, Yuto and Wolff, Alexander},
  title =	{{Recognizing 2-Layer and Outer k-Planar Graphs}},
  booktitle =	{41st International Symposium on Computational Geometry (SoCG 2025)},
  pages =	{65:1--65:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-370-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{332},
  editor =	{Aichholzer, Oswin and Wang, Haitao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2025.65},
  URN =		{urn:nbn:de:0030-drops-232170},
  doi =		{10.4230/LIPIcs.SoCG.2025.65},
  annote =	{Keywords: 2-layer k-planar graphs, outer k-planar graphs, recognition algorithms, local crossing number, bandwidth, FPT, XNLP, XP, W\lbrack t\rbrack}
}

Document

DOI: 10.4230/LIPIcs.FORC.2025.3

Private Estimation When Data and Privacy Demands Are Correlated

Authors: Syomantak Chaudhuri and Thomas A. Courtade

Published in: LIPIcs, Volume 329, 6th Symposium on Foundations of Responsible Computing (FORC 2025)

Abstract

Differential Privacy (DP) is the current gold standard for ensuring privacy for statistical queries. Estimation problems under DP constraints appearing in the literature have largely focused on providing equal privacy to all users. We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data, both subject to heterogeneous privacy constraints. Each user, contributing a sample to the dataset, is allowed to have a different privacy demand. The dataset itself is assumed to be worst-case, and we study both problems under two different formulations: first, where privacy demands and data may be correlated, and second, where correlations are weakened by random permutation of the dataset. We establish theoretical performance guarantees for our proposed algorithms, under both PAC error and mean-squared error. These performance guarantees translate to minimax optimality in several instances, and experiments confirm superior performance of our algorithms over other baseline techniques.

Cite as

Syomantak Chaudhuri and Thomas A. Courtade. Private Estimation When Data and Privacy Demands Are Correlated. In 6th Symposium on Foundations of Responsible Computing (FORC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 329, pp. 3:1-3:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{chaudhuri_et_al:LIPIcs.FORC.2025.3,
  author =	{Chaudhuri, Syomantak and Courtade, Thomas A.},
  title =	{{Private Estimation When Data and Privacy Demands Are Correlated}},
  booktitle =	{6th Symposium on Foundations of Responsible Computing (FORC 2025)},
  pages =	{3:1--3:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-367-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{329},
  editor =	{Bun, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2025.3},
  URN =		{urn:nbn:de:0030-drops-231305},
  doi =		{10.4230/LIPIcs.FORC.2025.3},
  annote =	{Keywords: Differential Privacy, Personalized Privacy, Heterogeneous Privacy, Correlations in Privacy}
}

Document

Research

DOI: 10.4230/TGDK.3.1.2

Talking Wikidata: Communication Patterns and Their Impact on Community Engagement in Collaborative Knowledge Graphs

Authors: Elisavet Koutsiana, Ioannis Reklos, Kholoud Saad Alghamdi, Nitisha Jain, Albert Meroño-Peñuela, and Elena Simperl

Published in: TGDK, Volume 3, Issue 1 (2025). Transactions on Graph Data and Knowledge, Volume 3, Issue 1

Abstract

We study collaboration patterns of Wikidata, one of the world's largest open-source collaborative knowledge graph (KG) communities. Collaborative KG communities play a key role in structuring machine-readable knowledge to support AI systems like conversational agents. However, these communities face challenges related to long-term member engagement, as a small subset of contributors is often responsible for the majority of contributions and decision-making. While prior research has explored contributors' roles and lifespans, discussions within collaborative KG communities remain understudied. To fill this gap, we investigated the behavioural patterns of contributors and factors affecting their communication and participation. We analysed all the discussions on Wikidata using a mixed-methods approach, including statistical tests, network analysis, and text and graph embedding representations. Our findings reveal that the interactions between Wikidata editors form a small-world network, resilient to dropouts and inclusive, where both the network topology and discussion content influence the continuity of conversations. Furthermore, the account age of Wikidata members and their conversations are significant factors in their long-term engagement with the project. Our observations and recommendations can benefit the Wikidata and semantic web communities, providing guidance on how to improve collaborative environments for sustainability, growth, and quality.

Cite as

Elisavet Koutsiana, Ioannis Reklos, Kholoud Saad Alghamdi, Nitisha Jain, Albert Meroño-Peñuela, and Elena Simperl. Talking Wikidata: Communication Patterns and Their Impact on Community Engagement in Collaborative Knowledge Graphs. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 1, pp. 2:1-2:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@Article{koutsiana_et_al:TGDK.3.1.2,
  author =	{Koutsiana, Elisavet and Reklos, Ioannis and Alghamdi, Kholoud Saad and Jain, Nitisha and Mero\~{n}o-Pe\~{n}uela, Albert and Simperl, Elena},
  title =	{{Talking Wikidata: Communication Patterns and Their Impact on Community Engagement in Collaborative Knowledge Graphs}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{2:1--2:27},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.1.2},
  URN =		{urn:nbn:de:0030-drops-230114},
  doi =		{10.4230/TGDK.3.1.2},
  annote =	{Keywords: collaborative knowledge graph, network analysis, graph embeddings, text embeddings}
}
