65 Search Results for "Li, Yan"


Document
Survey
Resilience in Knowledge Graph Embeddings

Authors: Arnab Sharma, N'Dah Jean Kouagou, and Axel-Cyrille Ngonga Ngomo

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2 (2025)


Abstract
In recent years, knowledge graphs have gained interest and witnessed widespread applications in various domains, such as information retrieval, question answering, and recommendation systems. Large-scale knowledge graphs have demonstrated their utility in effectively representing structured knowledge. To further facilitate the application of machine learning techniques, knowledge graph embedding (KGE) models have been developed. Such models transform entities and relationships within knowledge graphs into vectors. However, these embedding models often face challenges related to noise, missing information, distribution shift, and adversarial attacks. These challenges can lead to sub-optimal embeddings and incorrect inferences, thereby negatively impacting downstream applications. While the existing literature has so far focused on adversarial attacks on KGE models, the challenges related to the other critical aspects remain largely unexplored. In this paper, we first give a unified definition of resilience, encompassing several factors such as generalization, in-distribution generalization, distribution adaptation, and robustness. After formalizing these concepts for machine learning in general, we define them in the context of knowledge graphs. To identify the gaps in existing work on resilience in the context of knowledge graphs, we perform a systematic survey that takes into account all of the aspects mentioned above. Our survey shows that most existing works focus on a single aspect of resilience, namely robustness. After categorizing these works by their respective aspects of resilience, we discuss open challenges and future research directions.

Cite as

Arnab Sharma, N'Dah Jean Kouagou, and Axel-Cyrille Ngonga Ngomo. Resilience in Knowledge Graph Embeddings. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2, pp. 1:1-1:38, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@Article{sharma_et_al:TGDK.3.2.1,
  author =	{Sharma, Arnab and Kouagou, N'Dah Jean and Ngomo, Axel-Cyrille Ngonga},
  title =	{{Resilience in Knowledge Graph Embeddings}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{1:1--1:38},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{2},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.2.1},
  URN =		{urn:nbn:de:0030-drops-248117},
  doi =		{10.4230/TGDK.3.2.1},
  annote =	{Keywords: Knowledge graphs, Resilience, Robustness}
}
Document
Research
GraphRAG on Technical Documents - Impact of Knowledge Graph Schema

Authors: Henri Scaffidi, Melinda Hodkiewicz, Caitlin Woods, and Nicole Roocke

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2 (2025)


Abstract
Retrieval Augmented Generation (RAG) is seeing rapid adoption in industry to enable employees to query information captured in their organisation's proprietary data. In this work, we test the impact of domain-relevant knowledge graph schemas on the results of Microsoft's GraphRAG pipeline. Our approach aims to address the poor quality of GraphRAG responses on technical reports rich in domain-specific terms. The use case involves technical reports about geology, chemistry, and mineral processing published by the Minerals Research Institute of Western Australia (MRIWA). Four schema settings are considered: a simple five-class minerals-domain schema developed by domain experts, an expanded minerals-domain schema, the schema auto-generated by Microsoft GraphRAG, and a schema-less GraphRAG. These are compared to a conventional baseline RAG. Performance is evaluated using a scoring approach that accounts for the mix of correct, incorrect, additional, and missing content in RAG responses. The results show that the simple five-class minerals-domain schema extracts approximately 10% more entities from the MRIWA reports than the other schema options. Additionally, both the five-class and the expanded eight-class minerals-domain schemas produce the most factually correct answers and the fewest hallucinations. We attribute this to the minerals-specific schemas extracting more relevant, domain-specific information during the Indexing stage. As a result, the Query stage's context window includes more high-value content, which contributes to the observed improvement in answer quality compared to the other pipelines. In contrast, pipelines with fewer domain-related entities in the KG retrieve less valuable information, leaving more room for irrelevant content in the context window. Baseline RAG responses were typically shorter, less complete, and contained more hallucinations than those of our GraphRAG pipelines. We provide a complete set of resources at https://github.com/nlp-tlp/GraphRAG-on-Minerals-Domain/tree/main, including links to the MRIWA reports, a set of questions (from simple to challenging) with domain-expert-curated answers, schemas, and evaluations of the pipelines.
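
For illustration only: the paper's exact scoring rubric lives in the linked repository, but a score that trades correct content off against incorrect, additional, and missing content could, under assumed penalty weights, look like the following sketch (function name and weights are hypothetical, not the paper's):

# Hypothetical sketch, not the paper's rubric: score an answer from
# counts of correct, incorrect, additional (off-topic), and missing
# statements. Penalty weights w_inc, w_add, w_mis are assumptions.
def rag_answer_score(correct, incorrect, additional, missing,
                     w_inc=1.0, w_add=0.5, w_mis=0.5):
    penalty = w_inc * incorrect + w_add * additional + w_mis * missing
    total = correct + penalty
    return correct / total if total else 0.0

print(rag_answer_score(correct=8, incorrect=1, additional=2, missing=1))  # ~0.76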

Cite as

Henri Scaffidi, Melinda Hodkiewicz, Caitlin Woods, and Nicole Roocke. GraphRAG on Technical Documents - Impact of Knowledge Graph Schema. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2, pp. 3:1-3:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@Article{scaffidi_et_al:TGDK.3.2.3,
  author =	{Scaffidi, Henri and Hodkiewicz, Melinda and Woods, Caitlin and Roocke, Nicole},
  title =	{{GraphRAG on Technical Documents - Impact of Knowledge Graph Schema}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{3:1--3:24},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{2},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.2.3},
  URN =		{urn:nbn:de:0030-drops-248131},
  doi =		{10.4230/TGDK.3.2.3},
  annote =	{Keywords: RAG, minerals, local search, global search, entity extraction, competency questions}
}
Document
Temporal Ensemble Logic for Integrative Representation of the Entirety of Clinical Trials

Authors: Xiaojin Li, Yan Huang, Rashmie Abeysinghe, Zenan Sun, Hongyu Chen, Pengze Li, Xing He, Shiqiang Tao, Cui Tao, Jiang Bian, Licong Cui, and Guo-Qiang Zhang

Published in: LIPIcs, Volume 355, 32nd International Symposium on Temporal Representation and Reasoning (TIME 2025)


Abstract
Clinical trials are typically specified with protocols that define eligibility criteria, treatment regimens, follow-up schedules, and outcome assessments. Temporality is a hallmark of all clinical trials, reflected within and across trial components, with complex dependencies unfolding across multiple time points. Despite their importance, clinical trial protocols are described in free-text format, limiting their semantic precision and the ability to support automated reasoning, leverage data across studies and sites, or simulate trial execution under varying assumptions using Real-World Data. This paper introduces a formalized representation of clinical trials using Temporal Ensemble Logic (TEL). TEL incorporates metricized modal operators, such as "always until t" (□_t) and "possibly until t" (◇_t), where t is a time-length parameter, to offer a logical framework for capturing phenotypes in biomedicine. TEL is syntactically more expressive than classical linear temporal logic (LTL) while maintaining the simplicity of its semantic structures. These attributes of TEL are exploited in this paper to formally represent not only individual clinical trial components, but also the timing and sequential dependencies of these components as a whole. Modeling strategies and demonstration case studies are provided to show that TEL can represent the entirety of clinical trials, thereby providing a formal logical framework for representing the intricate temporal dependencies in trial structure specification. Since clinical trials are a cornerstone of evidence-based medicine, serving as the scientific basis for evaluating the safety, efficacy, and comparative effectiveness of therapeutic interventions, the results reported here can serve as a stepping stone toward scalable, consistent, and reproducible representation and simulation of clinical trials across all disease domains.
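
For intuition, a common reading of such metricized operators over a trace σ at time s is sketched below in LaTeX. This is a hedged paraphrase of the abstract's informal glosses ("always until t", "possibly until t"), not the paper's official semantics, which may differ in details such as interval endpoints:

\sigma, s \models \Box_t\,\varphi \iff \forall s' \in [s, s+t]:\ \sigma, s' \models \varphi
\sigma, s \models \Diamond_t\,\varphi \iff \exists s' \in [s, s+t]:\ \sigma, s' \models \varphi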

Cite as

Xiaojin Li, Yan Huang, Rashmie Abeysinghe, Zenan Sun, Hongyu Chen, Pengze Li, Xing He, Shiqiang Tao, Cui Tao, Jiang Bian, Licong Cui, and Guo-Qiang Zhang. Temporal Ensemble Logic for Integrative Representation of the Entirety of Clinical Trials. In 32nd International Symposium on Temporal Representation and Reasoning (TIME 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 355, pp. 13:1-13:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{li_et_al:LIPIcs.TIME.2025.13,
  author =	{Li, Xiaojin and Huang, Yan and Abeysinghe, Rashmie and Sun, Zenan and Chen, Hongyu and Li, Pengze and He, Xing and Tao, Shiqiang and Tao, Cui and Bian, Jiang and Cui, Licong and Zhang, Guo-Qiang},
  title =	{{Temporal Ensemble Logic for Integrative Representation of the Entirety of Clinical Trials}},
  booktitle =	{32nd International Symposium on Temporal Representation and Reasoning (TIME 2025)},
  pages =	{13:1--13:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-401-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{355},
  editor =	{Vidal, Thierry and Wa{\l}\k{e}ga, Przemys{\l}aw Andrzej},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.TIME.2025.13},
  URN =		{urn:nbn:de:0030-drops-244595},
  doi =		{10.4230/LIPIcs.TIME.2025.13},
  annote =	{Keywords: Temporal ensemble logic, Clinical trials, Logic-based modeling}
}
Document
RANDOM
Low-Degree Polynomials Are Good Extractors

Authors: Omar Alrabiah, Jesse Goodman, Jonathan Mosheiff, and João Ribeiro

Published in: LIPIcs, Volume 353, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)


Abstract
We prove that random low-degree polynomials (over 𝔽₂) are unbiased, in an extremely general sense. That is, we show that random low-degree polynomials are good randomness extractors for a wide class of distributions. Prior to our work, such results were only known for the small families of (1) uniform sources, (2) affine sources, and (3) local sources. We significantly generalize these results, and prove the following. 1) Low-degree polynomials extract from small families. We show that a random low-degree polynomial is a good low-error extractor for any small family of sources. In particular, we improve the positive result of Alrabiah, Chattopadhyay, Goodman, Li, and Ribeiro (ICALP 2022) for local sources, and give new results for polynomial and variety sources via a single unified approach. 2) Low-degree polynomials extract from sumset sources. We show that a random low-degree polynomial is a good extractor for sumset sources, which are the most general large family of sources (capturing independent sources, interleaved sources, small-space sources, and more). Formally, for any even d, we show that a random degree-d polynomial is an ε-error extractor for n-bit sumset sources with min-entropy k = O(d(n/ε²)^{2/d}). This is nearly tight in the polynomial error regime. Our results on sumset extractors imply new complexity separations for linear ROBPs, and the tools that go into their proofs may be of independent interest. The two main tools we use are a new structural result on sumset-punctured Reed-Muller codes, paired with a novel type of reduction between extractors. Using the new structural result, we obtain new limits on the power of sumset extractors, strengthening and generalizing the impossibility results of Chattopadhyay, Goodman, and Gurumukhani (ITCS 2024).
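
For reference, the standard notions the abstract builds on can be stated as follows (a sketch in LaTeX; the paper may use slightly different normalizations):

H_\infty(X) = -\log_2 \max_x \Pr[X = x]
\qquad\text{and}\qquad
f \text{ is an } \varepsilon\text{-error extractor for a family } \mathcal{X}
\iff \bigl|\Pr[f(X) = 1] - \tfrac{1}{2}\bigr| \le \varepsilon \ \text{ for all } X \in \mathcal{X}.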

Cite as

Omar Alrabiah, Jesse Goodman, Jonathan Mosheiff, and João Ribeiro. Low-Degree Polynomials Are Good Extractors. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 353, pp. 38:1-38:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{alrabiah_et_al:LIPIcs.APPROX/RANDOM.2025.38,
  author =	{Alrabiah, Omar and Goodman, Jesse and Mosheiff, Jonathan and Ribeiro, Jo\~{a}o},
  title =	{{Low-Degree Polynomials Are Good Extractors}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)},
  pages =	{38:1--38:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-397-3},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{353},
  editor =	{Ene, Alina and Chattopadhyay, Eshan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2025.38},
  URN =		{urn:nbn:de:0030-drops-244048},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2025.38},
  annote =	{Keywords: randomness extractors, low-degree polynomials, local sources, polynomial sources, variety sources, sumset sources, sumset extractors, Reed-Muller codes, lower bounds}
}
Document
List Decoding Quotient Reed-Muller Codes

Authors: Omri Gotlib, Tali Kaufman, and Shachar Lovett

Published in: LIPIcs, Volume 339, 40th Computational Complexity Conference (CCC 2025)


Abstract
Reed-Muller codes consist of evaluations of n-variate polynomials over a finite field 𝔽 with degree at most d. Much like every linear code, Reed-Muller codes can be characterized by constraints, where a codeword is valid if and only if it satisfies all degree-d constraints. For a subset X̃ ⊆ 𝔽ⁿ, we introduce the notion of the X̃-quotient Reed-Muller code. A function F : X̃ → 𝔽 is a valid codeword in the quotient code if it satisfies all the constraints of degree-d polynomials lying in X̃. This gives rise to a novel phenomenon: a quotient codeword may have many extensions to original codewords. This weakens the connection between original codewords and quotient codewords, which introduces a richer range of behaviors along with substantial new challenges. Our goal is to answer the following question: which properties of X̃ imply that the quotient code inherits its distance and list-decoding radius from the original code? We address this question using techniques developed by Bhowmick and Lovett [Abhishek Bhowmick and Shachar Lovett, 2014], identifying key properties of 𝔽ⁿ used in their proof and extending them to general subsets X̃ ⊆ 𝔽ⁿ. By introducing a new tool, we overcome the novel challenge in analyzing the quotient code that arises from the weak connection between original and quotient codewords. This enables us to apply known results from additive combinatorics and algebraic geometry [David Kazhdan and Tamar Ziegler, 2018; David Kazhdan and Tamar Ziegler, 2019; Amichai Lampert and Tamar Ziegler, 2021] to show that when X̃ is a high rank variety, X̃-quotient Reed-Muller codes inherit the distance and list-decoding parameters from the original Reed-Muller codes.
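
As background, the evaluation-code view of Reed-Muller codes that the abstract quotients is (in LaTeX):

\mathrm{RM}_{\mathbb{F}}(n, d) = \bigl\{\, (p(x))_{x \in \mathbb{F}^n} \;:\; p \in \mathbb{F}[x_1, \dots, x_n],\ \deg p \le d \,\bigr\},

and, per the abstract, the X̃-quotient code consists of those F : X̃ → 𝔽 satisfying every degree-d constraint supported inside X̃.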

Cite as

Omri Gotlib, Tali Kaufman, and Shachar Lovett. List Decoding Quotient Reed-Muller Codes. In 40th Computational Complexity Conference (CCC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 339, pp. 1:1-1:44, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{gotlib_et_al:LIPIcs.CCC.2025.1,
  author =	{Gotlib, Omri and Kaufman, Tali and Lovett, Shachar},
  title =	{{List Decoding Quotient Reed-Muller Codes}},
  booktitle =	{40th Computational Complexity Conference (CCC 2025)},
  pages =	{1:1--1:44},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-379-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{339},
  editor =	{Srinivasan, Srikanth},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2025.1},
  URN =		{urn:nbn:de:0030-drops-236957},
  doi =		{10.4230/LIPIcs.CCC.2025.1},
  annote =	{Keywords: Reed-Muller Codes, Quotient Code, Quotient Reed-Muller Code, List Decoding, High Rank Variety, High-Order Fourier Analysis, Error-Correcting Codes}
}
Document
Pangenome Graph Indexing via the Multidollar-BWT

Authors: Davide Cozzi, Brian Riccardi, Luca Denti, Simone Ciccolella, Kunihiko Sadakane, and Paola Bonizzoni

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)


Abstract
Indexing pangenome graphs is a major algorithmic challenge in computational pangenomics, a recent and active research field that seeks to use graphs as representations of multiple genomes. Since these graphs are constructed from whole genome sequences of a species population, they can become very large, making indexing one of the most challenging problems. In this paper, we propose gindex, a novel indexing approach to solve the Graph Pattern Matching Problem based on the multidollar-BWT. Specifically, gindex aims to find all occurrences of a pattern in a sequence-labeled graph by overcoming two main limitations of GCSA2, one of the most widely used graph indexes: handling queries of arbitrary length and scaling to large graphs without pruning any complex regions. Moreover, we show how a smart preprocessing step can optimize the use of the multidollar-BWT to skip small redundant sub-patterns and enhance gindex's querying capabilities. We demonstrate the effectiveness of our approach by comparing it to GCSA2 in terms of index construction and query time, using different preprocessing modes on three pangenome graphs: one built from Drosophila genomes and two produced by the Human Pangenome Reference Consortium. The results show that gindex can scale to human pangenome graphs - which GCSA2 cannot index even with large amounts of RAM - with acceptable memory and time requirements. Moreover, gindex achieves fast query times, although not as fast as GCSA2, which may produce false positives.
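
As background for readers unfamiliar with BWT-based indexing: the string (not graph) version of BWT pattern matching is the textbook FM-index backward search sketched below. gindex generalizes this style of search to sequence-labeled graphs via the multidollar-BWT, which this sketch does not attempt to reproduce.

# Textbook FM-index backward search on a plain string BWT (naive rank
# queries for brevity). Returns the number of occurrences of `pattern`.
def backward_search(bwt, pattern):
    # C[c]: number of characters in the text strictly smaller than c
    C = {c: sum(x < c for x in bwt) for c in set(bwt)}
    occ = lambda c, i: bwt[:i].count(c)  # rank of c in bwt[:i]
    lo, hi = 0, len(bwt)                 # suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + occ(c, lo), C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

assert backward_search("annb$aa", "ana") == 2  # "annb$aa" is the BWT of "banana$"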

Cite as

Davide Cozzi, Brian Riccardi, Luca Denti, Simone Ciccolella, Kunihiko Sadakane, and Paola Bonizzoni. Pangenome Graph Indexing via the Multidollar-BWT. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{cozzi_et_al:LIPIcs.SEA.2025.13,
  author =	{Cozzi, Davide and Riccardi, Brian and Denti, Luca and Ciccolella, Simone and Sadakane, Kunihiko and Bonizzoni, Paola},
  title =	{{Pangenome Graph Indexing via the Multidollar-BWT}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.13},
  URN =		{urn:nbn:de:0030-drops-232515},
  doi =		{10.4230/LIPIcs.SEA.2025.13},
  annote =	{Keywords: Multidollar-BWT, Graph Index, Graph Pattern Matching, Pangenome Graph}
}
Document
SimdMinimizers: Computing Random Minimizers, fast

Authors: Ragnar Groot Koerkamp and Igor Martayan

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)


Abstract
Motivation. Because of the rapidly growing amount of sequencing data, computing sketches of large textual datasets has become an essential preprocessing task. These sketches are typically much smaller than the input sequences, but preserve sufficient information for downstream analysis. Minimizers are an especially popular sketching technique, used in a wide variety of applications. They sample at least one out of every w consecutive k-mers. As DNA sequencers are getting more accurate, some applications can afford to use a larger w and hence sparser and smaller sketches. And as sketches get smaller, their analysis becomes faster, so the time spent sketching the full-sized input becomes more of a bottleneck. Methods. Our library simd-minimizers implements a random minimizer algorithm using SIMD instructions. It supports both AVX2 and NEON architectures. Its main novelty is two-fold. First, it splits the input into 8 chunks that are streamed over in parallel through all steps of the algorithm. This is enabled by using the completely deterministic two-stacks sliding window minimum algorithm, which seems not to have been used before for finding minimizers. Results. Our library is up to 6.8× faster than a scalar implementation of the rescan method for small w = 5, and 3.4× faster for larger w = 19. Computing canonical minimizers is less than 50% slower than computing forward minimizers, and over 15× faster than the existing implementation in the minimizer-iter crate. Our library finds all (canonical) minimizers of a 3.2 Gbp human genome in 5.2 (resp. 6.7) seconds.
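
The two-stacks sliding-window-minimum algorithm the abstract credits is a classic queue-from-two-stacks construction. The sketch below (plain Python, not the library's vectorized Rust) shows why it is "completely deterministic": every element is pushed and popped exactly once, and the control flow depends only on positions in the stream, never on the values, unlike a monotonic deque whose pop count is data-dependent.

# Sliding-window minimum via two stacks. Each stack entry carries the
# minimum of everything at or below it, so the window minimum is the
# min of the two stack tops; all operations are O(1) amortized.
class MinQueue:
    def __init__(self):
        self.front, self.back = [], []  # stacks of (value, running_min)

    def push(self, v):
        m = v if not self.back else min(v, self.back[-1][1])
        self.back.append((v, m))

    def pop(self):
        if not self.front:              # refill front by reversing back
            while self.back:
                v, _ = self.back.pop()
                m = v if not self.front else min(v, self.front[-1][1])
                self.front.append((v, m))
        self.front.pop()

    def min(self):
        return min(s[-1][1] for s in (self.front, self.back) if s)

def sliding_window_minima(xs, w):
    q, out = MinQueue(), []
    for i, x in enumerate(xs):
        q.push(x)
        if i >= w - 1:
            out.append(q.min())
            q.pop()
    return out

assert sliding_window_minima([4, 2, 5, 1, 3], 3) == [2, 1, 1]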

Cite as

Ragnar Groot Koerkamp and Igor Martayan. SimdMinimizers: Computing Random Minimizers, fast. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{grootkoerkamp_et_al:LIPIcs.SEA.2025.20,
  author =	{Groot Koerkamp, Ragnar and Martayan, Igor},
  title =	{{SimdMinimizers: Computing Random Minimizers, fast}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.20},
  URN =		{urn:nbn:de:0030-drops-232581},
  doi =		{10.4230/LIPIcs.SEA.2025.20},
  annote =	{Keywords: Minimizers, Randomized algorithms, Sketching, Hashing}
}
Document
A Comparative Study of Compressed, Learned, and Traditional Indexing Methods for Integer Data

Authors: Lorenzo Bellomo, Giuseppe Cianci, Luca de Rosa, Paolo Ferragina, and Mattia Odorisio

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)


Abstract
The rapid evolution of learned data structures has revolutionized database indexing, particularly for sorted integer datasets. While learned indexes excel in static scenarios due to their low memory footprint, reduced storage requirements, and fast lookup times, benchmarks like SOSD and TLI have largely overlooked compressed indexes and SIMD-based implementations of traditional indexes. This paper addresses this gap by introducing a comprehensive benchmarking framework that (i) evaluates traditional, learned, and compressed indexes across 12 datasets (real and synthetic) of varying types and sizes; (ii) integrates state-of-the-art SIMD-enhanced B-Tree variants; and (iii) measures critical performance metrics such as memory usage, construction time, and lookup efficiency. Our findings reveal that while learned indexes minimize memory usage, a feature useful when internal memory constraints are mandatory, SIMD-enhanced B-Trees consistently achieve superior lookup times with comparable extra space. On the other hand, compressed indexes like the LA-vector and Elias-Fano provide very effective compression of the indexed data at the cost of 2×-3× slower access. Another contribution of this paper is a publicly available benchmarking framework (composed of code and datasets) that makes our experiments reproducible and extensible to other indexes and datasets.
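
As background on one of the compressed representations mentioned above: Elias-Fano encodes a sorted integer sequence by storing low bits verbatim and high parts in unary. A minimal sketch follows (encode plus full decode; real implementations, including those benchmarked here, add rank/select structures for random access):

import math

def ef_encode(xs, u):
    # xs: sorted integers in [0, u). l low bits are stored verbatim; the
    # high part of the i-th value sets bit position (x >> l) + i.
    n = len(xs)
    l = max(0, int(math.log2(u / n))) if n else 0
    lows = [x & ((1 << l) - 1) for x in xs]
    highs = [0] * (n + (u >> l) + 1)
    for i, x in enumerate(xs):
        highs[(x >> l) + i] = 1
    return l, lows, highs

def ef_decode(l, lows, highs):
    xs, i = [], 0
    for pos, bit in enumerate(highs):
        if bit:
            xs.append(((pos - i) << l) | lows[i])
            i += 1
    return xs

xs = [3, 4, 7, 13, 14, 15, 21, 43]
assert ef_decode(*ef_encode(xs, 44)) == xs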

Cite as

Lorenzo Bellomo, Giuseppe Cianci, Luca de Rosa, Paolo Ferragina, and Mattia Odorisio. A Comparative Study of Compressed, Learned, and Traditional Indexing Methods for Integer Data. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 5:1-5:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{bellomo_et_al:LIPIcs.SEA.2025.5,
  author =	{Bellomo, Lorenzo and Cianci, Giuseppe and de Rosa, Luca and Ferragina, Paolo and Odorisio, Mattia},
  title =	{{A Comparative Study of Compressed, Learned, and Traditional Indexing Methods for Integer Data}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{5:1--5:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.5},
  URN =		{urn:nbn:de:0030-drops-232439},
  doi =		{10.4230/LIPIcs.SEA.2025.5},
  annote =	{Keywords: indexing data structures, compression, algorithm engineering, benchmark}
}
Document
Continuous Map Matching to Paths Under Travel Time Constraints

Authors: Yannick Bosch and Sabine Storandt

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)


Abstract
In this paper, we study the problem of map matching with travel time constraints. Given a sequence of k spatio-temporal measurements and an embedded path graph with travel time costs, the goal is to snap each measurement to a close-by location in the graph, such that consecutive locations can be reached from one another along the path within the timestamp difference of the respective measurements. This problem arises in public transit data processing as well as in map matching of movement trajectories to general graphs. We show that the classical approach for this problem, which relies on selecting a finite set of candidate locations in the graph for each measurement, cannot guarantee to find a consistent solution. We propose a new algorithm that can deal with an infinite set of candidate locations per measurement. We prove that our algorithm always detects a consistent map matching path (if one exists). Despite the enlarged candidate set, we also demonstrate that our algorithm has superior running time in theory and practice. For a path graph with n nodes, we show that our algorithm runs in 𝒪(k² n log(nk)) and under mild assumptions in 𝒪(k n^λ + n log³ n) for λ ≈ 0.695. This is a significant improvement over the baseline, which runs in 𝒪(k n²) and which might not even identify a correct solution. The performance of our algorithm hinges on an efficient segment-circle intersection data structure. We describe how to design and implement such a data structure for our application. In the experimental evaluation, we demonstrate the usefulness of our novel algorithm on a diverse set of generated measurements as well as GTFS data.
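
The consistency condition at the heart of the problem statement is easy to state in code. The sketch below only verifies a candidate solution (travel_time is a hypothetical helper giving the travel time between two locations along the path graph) and says nothing about the difficulty of finding one, which is what the paper's algorithm addresses:

# Check that each snapped location is reachable from its predecessor
# within the timestamp difference of the respective measurements.
def is_consistent(snapped, timestamps, travel_time):
    return all(
        travel_time(snapped[i], snapped[i + 1]) <= timestamps[i + 1] - timestamps[i]
        for i in range(len(snapped) - 1)
    )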

Cite as

Yannick Bosch and Sabine Storandt. Continuous Map Matching to Paths Under Travel Time Constraints. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 7:1-7:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{bosch_et_al:LIPIcs.SEA.2025.7,
  author =	{Bosch, Yannick and Storandt, Sabine},
  title =	{{Continuous Map Matching to Paths Under Travel Time Constraints}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{7:1--7:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.7},
  URN =		{urn:nbn:de:0030-drops-232457},
  doi =		{10.4230/LIPIcs.SEA.2025.7},
  annote =	{Keywords: Map matching, Travel time, Segment-circle intersection data structure}
}
Document
LoRaHART: Hardware-Aware Real-Time Scheduling for LoRa

Authors: Soumya Ranjan Sahoo, Amalinda Gamage, Niraj Kumar, and Arvind Easwaran

Published in: LIPIcs, Volume 335, 37th Euromicro Conference on Real-Time Systems (ECRTS 2025)


Abstract
Time-sensitive data acquisition is critical for many Low-Power Wide-Area Network (LPWAN) applications, such as healthcare monitoring and the industrial Internet of Things. Among the available LPWAN technologies, LoRa (Long Range) has emerged as a leading choice, offering kilometer-scale communication with minimal power consumption and enabling high-density deployments across large areas. However, the conventional ALOHA-based Medium Access Control (MAC) in LoRa is not designed to support real-time communication over large-scale networks. This paper introduces LoRaHART, a novel approach that overcomes two critical, under-explored limitations of Commercial Off-The-Shelf (COTS) LoRa gateways that impact real-time performance: gateways have limited capacity for demodulating parallel transmissions, and their antennas can either transmit or receive at any given instant, but not both. LoRaHART incorporates a hardware-aware super-frame structure, comprising both Time Division Multiple Access (TDMA) slots and opportunistic retransmissions using Carrier Sense Multiple Access (CSMA), designed to mitigate these constraints. We use a partial packing and makespan minimization algorithm to schedule periodic real-time transmissions efficiently within the TDMA slots, and we develop a probabilistic node contention model for CSMA retransmissions, providing analytical guarantees for deadline satisfaction under ideal channel conditions. Our evaluation of LoRaHART on a 40-node LoRa testbed demonstrates significant improvements over existing solutions in practice, achieving an average Packet Reception Ratio of 98% and 45% higher airtime utilization than the best-performing baseline.
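
The paper's specific partial packing and makespan minimization algorithm is not reproduced here. As generic background on makespan minimization, the classic longest-processing-time (LPT) greedy assigns each job to the currently least-loaded machine and is a (4/3 - 1/(3m))-approximation:

import heapq

def lpt_makespan(durations, num_slots):
    # Greedily place the longest remaining transmission into the
    # currently least-loaded slot (min-heap of slot loads).
    loads = [0.0] * num_slots
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + d)
    return max(loads)

assert lpt_makespan([3, 3, 2, 2, 2], 2) == 7  # the optimum here is 6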

Cite as

Soumya Ranjan Sahoo, Amalinda Gamage, Niraj Kumar, and Arvind Easwaran. LoRaHART: Hardware-Aware Real-Time Scheduling for LoRa. In 37th Euromicro Conference on Real-Time Systems (ECRTS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 335, pp. 10:1-10:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{sahoo_et_al:LIPIcs.ECRTS.2025.10,
  author =	{Sahoo, Soumya Ranjan and Gamage, Amalinda and Kumar, Niraj and Easwaran, Arvind},
  title =	{{LoRaHART: Hardware-Aware Real-Time Scheduling for LoRa}},
  booktitle =	{37th Euromicro Conference on Real-Time Systems (ECRTS 2025)},
  pages =	{10:1--10:28},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-377-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{335},
  editor =	{Mancuso, Renato},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2025.10},
  URN =		{urn:nbn:de:0030-drops-235880},
  doi =		{10.4230/LIPIcs.ECRTS.2025.10},
  annote =	{Keywords: LoRa, LPWAN, Real-time Scheduling, Hardware Constraints}
}
Document
Research
CoaKG: A Contextualized Knowledge Graph Approach for Exploratory Search and Decision Making

Authors: Veronica dos Santos, Daniel Schwabe, Altigran Soares da Silva, and Sérgio Lifschitz

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 1 (2025)


Abstract
In decision-making scenarios, an information need arises when a decision-maker lacks the knowledge required to make a decision. Users may take the initiative to fill this gap through exploratory search approaches that use Knowledge Graphs (KGs) as information sources, but their queries can be incomplete, inaccurate, and ambiguous. Although KGs have great potential for exploratory search, they are incomplete by nature. Moreover, for both crowd-sourced KGs and KGs constructed by integrating several information sources of varying quality to be consumed effectively, a Trust Layer is needed. Our research aims to enrich KGs and make them queryable in ways that support context-aware exploration in decision-making scenarios. We propose a layered architecture for Context Augmented Knowledge Graph-based Decision Support Systems with a Knowledge Layer that operates under a Dual Open World Assumption (DOWA). Under DOWA, the evaluation of the truthfulness of information obtained from KGs depends on the context of its claims and the tasks carried out or intended (purpose). The Knowledge Layer comprises a Context Augmented KG (CoaKG) and a CoaKG Query Engine. The CoaKG contains contextual mappings to identify explicit context and rules to infer implicit context. The CoaKG Query Engine is designed as a query-answering approach that retrieves all contextualized answers from the CoaKG. A Proof of Concept (PoC) based on Wikidata was developed to evaluate the effectiveness of the Knowledge Layer.

Cite as

Veronica dos Santos, Daniel Schwabe, Altigran Soares da Silva, and Sérgio Lifschitz. CoaKG: A Contextualized Knowledge Graph Approach for Exploratory Search and Decision Making. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 1, pp. 4:1-4:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@Article{dossantos_et_al:TGDK.3.1.4,
  author =	{dos Santos, Veronica and Schwabe, Daniel and da Silva, Altigran Soares and Lifschitz, S\'{e}rgio},
  title =	{{CoaKG: A Contextualized Knowledge Graph Approach for Exploratory Search and Decision Making}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{4:1--4:27},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.1.4},
  URN =		{urn:nbn:de:0030-drops-236685},
  doi =		{10.4230/TGDK.3.1.4},
  annote =	{Keywords: Knowledge Graphs, Context Search, Decision Support}
}
Document
Track A: Algorithms, Complexity and Games
q-Partitioning Valuations: Exploring the Space Between Subadditive and Fractionally Subadditive Valuations

Authors: Kiril Bangachev and S. Matthew Weinberg

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)


Abstract
For a set M of m elements, we define a decreasing chain of classes of normalized monotone-increasing valuation functions from 2^M to ℝ_{≥ 0}, parameterized by an integer q ∈ [2,m]. For a given q, we refer to the class as q-partitioning. A valuation function is subadditive if and only if it is 2-partitioning, and fractionally subadditive if and only if it is m-partitioning. Thus, our chain establishes an interpolation between subadditive and fractionally subadditive valuations. We show that this interpolation is smooth (q-partitioning valuations are "nearly" (q-1)-partitioning in a precise sense, Theorem 6), interpretable (the definition arises by analyzing the core of a cost-sharing game, à la the Bondareva-Shapley Theorem for fractionally subadditive valuations, Section 3.1), and non-trivial (the class of q-partitioning valuations is distinct for all q, Proposition 3). For domains where provable separations exist between subadditive and fractionally subadditive, we interpolate the stronger guarantees achievable for fractionally subadditive valuations to all q ∈ {2,…, m}. Two highlights are the following: 1) An Ω ((log log q)/(log log m))-competitive posted price mechanism for q-partitioning valuations. Note that this matches asymptotically the state-of-the-art for both subadditive (q = 2) [Paul Dütting et al., 2020], and fractionally subadditive (q = m) [Feldman et al., 2015]. 2) Two upper-tail concentration inequalities on 1-Lipschitz, q-partitioning valuations over independent items. One extends the state-of-the-art for q = m to q < m, the other improves the state-of-the-art for q = 2 for q > 2. Our concentration inequalities imply several corollaries that interpolate between subadditive and fractionally subadditive, for example: 𝔼[v(S)] ≤ (1 + 1/log q)Median[v(S)] + O(log q). To prove this, we develop a new isoperimetric inequality using Talagrand’s method of control by q points, which may be of independent interest. We also discuss other probabilistic inequalities and game-theoretic applications of q-partitioning valuations, and connections to subadditive MPH-k valuations [Tomer Ezra et al., 2019].
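
For reference, the two endpoint classes of the chain have standard definitions (in LaTeX; the q-partitioning definition itself is the paper's contribution and is not restated here). A normalized monotone v : 2^M → ℝ_{≥0} is

\text{subadditive:}\quad v(S \cup T) \le v(S) + v(T) \ \ \forall S, T \subseteq M,
\qquad
\text{fractionally subadditive:}\quad v(T) \le \sum_{S} \alpha_S\, v(S)
\ \text{ whenever } \alpha_S \ge 0 \text{ and } \sum_{S \ni i} \alpha_S \ge 1 \ \forall i \in T.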

Cite as

Kiril Bangachev and S. Matthew Weinberg. q-Partitioning Valuations: Exploring the Space Between Subadditive and Fractionally Subadditive Valuations. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 18:1-18:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{bangachev_et_al:LIPIcs.ICALP.2025.18,
  author =	{Bangachev, Kiril and Weinberg, S. Matthew},
  title =	{{q-Partitioning Valuations: Exploring the Space Between Subadditive and Fractionally Subadditive Valuations}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{18:1--18:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.18},
  URN =		{urn:nbn:de:0030-drops-233956},
  doi =		{10.4230/LIPIcs.ICALP.2025.18},
  annote =	{Keywords: Subadditive Functions, Fractionally Subadditive Functions, Posted Price Mechanisms, Concentration Inequalities}
}
Document
Track A: Algorithms, Complexity and Games
Even Faster Algorithm for the Chamfer Distance

Authors: Ying Feng and Piotr Indyk

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)


Abstract
For two d-dimensional point sets A, B of size up to n, the Chamfer distance from A to B is defined as CH(A,B) = ∑_{a ∈ A} min_{b ∈ B} ‖a-b‖. The Chamfer distance is a widely used measure for quantifying dissimilarity between sets of points, with many applications in machine learning and computer vision. A recent work of Bakshi et al. (NeurIPS'23) gave the first near-linear time (1+ε)-approximate algorithm, with a running time of 𝒪(nd log(n)/ε²). In this paper we improve the running time further, to 𝒪(nd (log log n + log(1/ε))/ε²). When ε is a constant, this significantly reduces the gap between the upper bound and the trivial Ω(dn) lower bound, from 𝒪(log n) to 𝒪(log log n).
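
A naive reference implementation of the definition above makes the baseline explicit; it runs in O(|A| |B| d) time, exactly the quadratic cost the approximation algorithms avoid:

import math

def chamfer(A, B):
    # CH(A, B) = sum over a in A of the distance to its nearest b in B
    return sum(min(math.dist(a, b) for b in B) for a in A)

assert chamfer([(0, 0), (1, 0)], [(0, 0)]) == 1.0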

Cite as

Ying Feng and Piotr Indyk. Even Faster Algorithm for the Chamfer Distance. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 76:1-76:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{feng_et_al:LIPIcs.ICALP.2025.76,
  author =	{Feng, Ying and Indyk, Piotr},
  title =	{{Even Faster Algorithm for the Chamfer Distance}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{76:1--76:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.76},
  URN =		{urn:nbn:de:0030-drops-234531},
  doi =		{10.4230/LIPIcs.ICALP.2025.76},
  annote =	{Keywords: Chamfer distance}
}
Document
Track A: Algorithms, Complexity and Games
Faster Semi-Streaming Matchings via Alternating Trees

Authors: Slobodan Mitrović, Anish Mukherjee, Piotr Sankowski, and Wen-Horng Sheu

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)


Abstract
We design a deterministic algorithm for the (1+ε)-approximate maximum matching problem. Our primary result demonstrates that this problem can be solved in O(ε^{-6}) semi-streaming passes, improving upon the O(ε^{-19}) pass-complexity algorithm by [Fischer, Mitrović, and Uitto, STOC'22]. This contributes substantially toward resolving Open question 2 from [Assadi, SOSA'24]. Leveraging the framework introduced in [FMU'22], our algorithm achieves an analogous round complexity speed-up for computing a (1+ε)-approximate maximum matching in both the Massively Parallel Computation (MPC) and CONGEST models. The data structures maintained by our algorithm are formulated using blossom notation and represented through alternating trees. This approach enables a simplified correctness analysis by treating specific components as if operating on bipartite graphs, effectively circumventing certain technical intricacies present in prior work.

Cite as

Slobodan Mitrović, Anish Mukherjee, Piotr Sankowski, and Wen-Horng Sheu. Faster Semi-Streaming Matchings via Alternating Trees. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 119:1-119:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{mitrovic_et_al:LIPIcs.ICALP.2025.119,
  author =	{Mitrovi\'{c}, Slobodan and Mukherjee, Anish and Sankowski, Piotr and Sheu, Wen-Horng},
  title =	{{Faster Semi-Streaming Matchings via Alternating Trees}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{119:1--119:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.119},
  URN =		{urn:nbn:de:0030-drops-234965},
  doi =		{10.4230/LIPIcs.ICALP.2025.119},
  annote =	{Keywords: streaming algorithms, approximation algorithms, maximum matching}
}
Document
Track B: Automata, Logic, Semantics, and Theory of Programming
Bayesian Inference in Quantum Programs

Authors: Christina Gehnen, Dominique Unruh, and Joost-Pieter Katoen

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)


Abstract
Conditioning is a key feature in probabilistic programming to enable modeling the influence of data (also known as observations) to the probability distribution described by such programs. Determining the posterior distribution is also known as Bayesian inference. This paper equips a quantum while-language with conditioning, defines its denotational and operational semantics over infinite-dimensional Hilbert spaces, and shows their equivalence. We provide sufficient conditions for the existence of weakest (liberal) precondition-transformers and derive inductive characterizations of these transformers. It is shown how w(l)p-transformers can be used to assess the effect of Bayesian inference on (possibly diverging) quantum programs.

Cite as

Christina Gehnen, Dominique Unruh, and Joost-Pieter Katoen. Bayesian Inference in Quantum Programs. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 157:1-157:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{gehnen_et_al:LIPIcs.ICALP.2025.157,
  author =	{Gehnen, Christina and Unruh, Dominique and Katoen, Joost-Pieter},
  title =	{{Bayesian Inference in Quantum Programs}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{157:1--157:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.157},
  URN =		{urn:nbn:de:0030-drops-235345},
  doi =		{10.4230/LIPIcs.ICALP.2025.157},
  annote =	{Keywords: Quantum Program Logics, Weakest Preconditions, Bayesian Inference, Program Semantics}
}
  • Refine by Type
  • 65 Document/PDF
  • 51 Document/HTML

  • Refine by Publication Year
  • 40 2025
  • 11 2024
  • 9 2023
  • 1 2022
  • 2 2018

  • Refine by Author
  • 3 Biswas, Russa
  • 3 de Melo, Gerard
  • 2 Allen, Bradley P.
  • 2 Carmel, Amir
  • 2 Chen, Jiaoyan

  • Refine by Series/Journal
  • 41 LIPIcs
  • 1 OASIcs
  • 3 LITES
  • 19 TGDK
  • 1 DagSemProc

  • Refine by Classification
  • 5 Theory of computation → Pseudorandomness and derandomization
  • 4 Computing methodologies → Knowledge representation and reasoning
  • 4 Information systems → Graph-based database models
  • 3 Computing methodologies → Machine learning
  • 3 Computing methodologies → Natural language processing

  • Refine by Keyword
  • 3 Clustering
  • 3 Knowledge Graphs
  • 3 Large Language Models
  • 2 Affine
  • 2 Explainable AI
