15 Search Results for "Wang, Pengfei"


Document
Fast Kd-Trees for the Kullback-Leibler Divergence and Other Decomposable Bregman Divergences

Authors: Tuyen Pham and Hubert Wagner

Published in: LIPIcs, Volume 349, 19th International Symposium on Algorithms and Data Structures (WADS 2025)


Abstract
The contributions of the paper span theoretical and implementational results. First, we prove that Kd-trees can be extended to ℝ^d with the distance measured by an arbitrary Bregman divergence. Perhaps surprisingly, this shows that the triangle inequality is not necessary for correct pruning in Kd-trees. Second, we offer an efficient algorithm and C++ implementation for nearest neighbour search for decomposable Bregman divergences. The implementation supports the Kullback-Leibler divergence (relative entropy) which is a popular distance between probability vectors and is commonly used in statistics and machine learning. This is a step toward broadening the usage of computational geometry algorithms. Our benchmarks show that our implementation efficiently handles both exact and approximate nearest neighbour queries. Compared to a linear search, we achieve two orders of magnitude speedup for practical scenarios in dimension up to 100. Our solution is simpler and more efficient than competing methods.

Cite as

Tuyen Pham and Hubert Wagner. Fast Kd-Trees for the Kullback-Leibler Divergence and Other Decomposable Bregman Divergences. In 19th International Symposium on Algorithms and Data Structures (WADS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 349, pp. 45:1-45:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{pham_et_al:LIPIcs.WADS.2025.45,
  author =	{Pham, Tuyen and Wagner, Hubert},
  title =	{{Fast Kd-Trees for the Kullback-Leibler Divergence and Other Decomposable Bregman Divergences}},
  booktitle =	{19th International Symposium on Algorithms and Data Structures (WADS 2025)},
  pages =	{45:1--45:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-398-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{349},
  editor =	{Morin, Pat and Oh, Eunjin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WADS.2025.45},
  URN =		{urn:nbn:de:0030-drops-242766},
  doi =		{10.4230/LIPIcs.WADS.2025.45},
  annote =	{Keywords: Kd-tree, k-d tree, nearest neighbour search, Bregman divergence, decomposable Bregman divergence, KL divergence, relative entropy, cross entropy, Shannon’s entropy}
}
Document
Temporal Explorability Games

Authors: Pete Austin, Sougata Bose, Nicolas Mazzocchi, and Patrick Totzke

Published in: LIPIcs, Volume 348, 36th International Conference on Concurrency Theory (CONCUR 2025)


Abstract
Temporal graphs extend ordinary graphs with discrete time that affects the availability of edges. We consider solving games played on temporal graphs where one player aims to explore the graph, i.e., visit all vertices. The complexity depends majorly on two factors: the presence of an adversary and how edge availability is specified. We demonstrate that on static graphs, where edges are always available, solving explorability games is just as hard as solving reachability games. In contrast, on temporal graphs, the complexity of explorability coincides with generalized reachability (NP-complete for one-player and PSPACE-complete for two player games). We show that if temporal graphs are given symbolically, even one-player reachability (and thus explorability and generalized reachability) games are PSPACE-hard. For one player, all these are also solvable in PSPACE and for two players, they are in PSPACE, EXP and EXP, respectively.

Cite as

Pete Austin, Sougata Bose, Nicolas Mazzocchi, and Patrick Totzke. Temporal Explorability Games. In 36th International Conference on Concurrency Theory (CONCUR 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 348, pp. 7:1-7:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{austin_et_al:LIPIcs.CONCUR.2025.7,
  author =	{Austin, Pete and Bose, Sougata and Mazzocchi, Nicolas and Totzke, Patrick},
  title =	{{Temporal Explorability Games}},
  booktitle =	{36th International Conference on Concurrency Theory (CONCUR 2025)},
  pages =	{7:1--7:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-389-8},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{348},
  editor =	{Bouyer, Patricia and van de Pol, Jaco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2025.7},
  URN =		{urn:nbn:de:0030-drops-239575},
  doi =		{10.4230/LIPIcs.CONCUR.2025.7},
  annote =	{Keywords: Temporal Graphs, Explorability, Reachability, Games}
}
Document
FM-Adaptive: A Practical Data-Aware FM-Index

Authors: Hongwei Huo, Zongtao He, Pengfei Liu, and Jeffrey Scott Vitter

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
The FM-index provides an important solution for efficient retrieval and search in textual big data. Its variants have been widely used in many fields including information retrieval, genome analysis, and web searching. In this paper, we propose improvements via a new compressed representation of the wavelet tree of the Burrows-Wheeler transform of the input text, which incorporates the gap γ-encoding. Our theoretical analysis shows that the new index, called FM-Adaptive, achieves asymptotic space optimality within a factor of 2 in the leading term, but it has a better compression and faster retrieval in practice than the competitive optimal compression boosting used in previous FM-indexes. We present a practical improved locate algorithm that provides substantially faster locating time based upon memoization, which takes advantage of the overlapping subproblems property. We design the lookup table for accelerated decoding to support fast pattern matching in a text. Extensive experiments demonstrate that FM-Adaptive provides faster query performance, often by a considerable amount, and/or comparable or better compression than other state-of-the-art FM-index methods.

Cite as

Hongwei Huo, Zongtao He, Pengfei Liu, and Jeffrey Scott Vitter. FM-Adaptive: A Practical Data-Aware FM-Index. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 5:1-5:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{huo_et_al:OASIcs.Manzini.5,
  author =	{Huo, Hongwei and He, Zongtao and Liu, Pengfei and Vitter, Jeffrey Scott},
  title =	{{FM-Adaptive: A Practical Data-Aware FM-Index}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{5:1--5:23},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.5},
  URN =		{urn:nbn:de:0030-drops-239139},
  doi =		{10.4230/OASIcs.Manzini.5},
  annote =	{Keywords: Text indexing, Burrows-Wheeler transform, Compressed wavelet trees, Entropy-compressed, Compressed data structures}
}
Document
Algorithms for Computing Very Large BWTs: a Short Survey

Authors: Diego Díaz-Domínguez, Lavinia Egidi, Veronica Guerrini, Felipe A. Louza, and Giovanna Rosone

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
The Burrows-Wheeler Transform (BWT) is a fundamental string transformation that, although initially introduced for data compression, has been extensively utilized across various domains, including text indexing and pattern matching within large datasets. Although the BWT construction is linear, the constants make the task impractical for large datasets, and as highlighted by Ferragina et al. [Paolo Ferragina et al., 2012], "to use it, one must first build it!". Thus, the construction of the BWT remains a significant challenge. For these reasons, during the past three decades there has been a succession of new algorithms for its construction using techniques that work in external memory or that use text compression. In this survey, we revise some of the most important advancements and tools presented in the past years for computing large BWTs exploiting external memory or text compression approaches without using additional information about the data.

Cite as

Diego Díaz-Domínguez, Lavinia Egidi, Veronica Guerrini, Felipe A. Louza, and Giovanna Rosone. Algorithms for Computing Very Large BWTs: a Short Survey. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 7:1-7:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{diazdominguez_et_al:OASIcs.Manzini.7,
  author =	{D{\'\i}az-Dom{\'\i}nguez, Diego and Egidi, Lavinia and Guerrini, Veronica and Louza, Felipe A. and Rosone, Giovanna},
  title =	{{Algorithms for Computing Very Large BWTs: a Short Survey}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{7:1--7:28},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.7},
  URN =		{urn:nbn:de:0030-drops-239151},
  doi =		{10.4230/OASIcs.Manzini.7},
  annote =	{Keywords: Burrows-Wheeler transform, Extended Burrows-Wheeler transform, external memory, text compression, longest common prefix}
}
Document
Bridging Language Models and Symbolic Solvers via the Model Context Protocol

Authors: Stefan Szeider

Published in: LIPIcs, Volume 341, 28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025)


Abstract
This paper presents the MCP Solver, a system that bridges large language models with symbolic solvers through the Model Context Protocol (MCP). The system includes a server and a client component. The server provides an interface to constraint programming (via MiniZinc Python), propositional satisfiability and maximum satisfiability (both via PySAT), and SAT modulo Theories (via Python Z3). The client contains an agent that connects to the server via MCP and uses a language model to autonomously translate problem statements (given in English) into encodings through an incremental editing process and runs the solver. Our experiments demonstrate that this neurosymbolic integration effectively combines the natural language understanding of language models with robust solving capabilities across multiple solving paradigms.

Cite as

Stefan Szeider. Bridging Language Models and Symbolic Solvers via the Model Context Protocol. In 28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 341, pp. 30:1-30:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{szeider:LIPIcs.SAT.2025.30,
  author =	{Szeider, Stefan},
  title =	{{Bridging Language Models and Symbolic Solvers via the Model Context Protocol}},
  booktitle =	{28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025)},
  pages =	{30:1--30:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-381-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{341},
  editor =	{Berg, Jeremias and Nordstr\"{o}m, Jakob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SAT.2025.30},
  URN =		{urn:nbn:de:0030-drops-237649},
  doi =		{10.4230/LIPIcs.SAT.2025.30},
  annote =	{Keywords: Large Language Models, Agents, Constraint Programming, Satisfiability Solvers, Maximum Satisfiability, SAT Modulo Theories, Model Context Protocol}
}
Document
From Prediction to Precision: Leveraging LLMs for Equitable and Data-Driven Writing Placement in Developmental Education

Authors: Miguel Da Corte and Jorge Baptista

Published in: OASIcs, Volume 135, 14th Symposium on Languages, Applications and Technologies (SLATE 2025)


Abstract
Accurate text classification and placement remain challenges in U.S. higher education, with traditional automated systems like Accuplacer functioning as "black-box" models with limited assessment transparency. This study evaluates Large Language Models (LLMs) as complementary placement tools by comparing their classification performance against a human-rated gold standard and Accuplacer. A 450-essay corpus was classified using Claude, Gemini, GPT-3.5-turbo, and GPT-4o across four prompting strategies: Zero-shot, Few-shot, Enhanced, and Enhanced+ (definitions with examples). Two classification approaches were tested: (i) a 1-step, 3 class classification task, distinguishing DevEd Level 1, DevEd Level 2, and College-level texts in one single run; and (ii) a 2-step classification task, first separating College vs. Non-College texts before further classifying Non-College texts into DevEd sublevels. The results show that structured prompt refinement improves the precision of LLMs' classification, with Claude Enhanced + achieving 62.22% precision (1 step) and Gemini Enhanced + reaching 69.33% (2 step), both surpassing Accuplacer (58.22%). Gemini and Claude also demonstrated strong correlation with human ratings, with Claude achieving the highest Pearson scores (ρ = 0.75; 1-step, ρ = 0.73; 2-step) vs. Accuplacer (ρ = 0.67). While LLMs show promise for DevEd placement, their precision remains a work in progress, highlighting the need for further refinement and safeguards to ensure ethical and equitable placement.

Cite as

Miguel Da Corte and Jorge Baptista. From Prediction to Precision: Leveraging LLMs for Equitable and Data-Driven Writing Placement in Developmental Education. In 14th Symposium on Languages, Applications and Technologies (SLATE 2025). Open Access Series in Informatics (OASIcs), Volume 135, pp. 1:1-1:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{dacorte_et_al:OASIcs.SLATE.2025.1,
  author =	{Da Corte, Miguel and Baptista, Jorge},
  title =	{{From Prediction to Precision: Leveraging LLMs for Equitable and Data-Driven Writing Placement in Developmental Education}},
  booktitle =	{14th Symposium on Languages, Applications and Technologies (SLATE 2025)},
  pages =	{1:1--1:18},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-387-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{135},
  editor =	{Baptista, Jorge and Barateiro, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2025.1},
  URN =		{urn:nbn:de:0030-drops-236817},
  doi =		{10.4230/OASIcs.SLATE.2025.1},
  annote =	{Keywords: Large Language Models (LLMs), Developmental Education (DevEd), writing assessment, text classification, English writing proficiency}
}
Document
Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees

Authors: Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H.M.N. Dilum Bandara, Shiping Chen, Jianjun Zhao, and Yulei Sui

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)


Abstract
The vulnerability of neural networks to adversarial perturbations has necessitated formal verification techniques that can rigorously certify the quality of neural networks. As the state-of-the-art, branch-and-bound (BaB) is a "divide-and-conquer" strategy that applies off-the-shelf verifiers to sub-problems for which they perform better. While BaB can identify the sub-problems that are necessary to be split, it explores the space of these sub-problems in a naive "first-come-first-served" manner, thereby suffering from an issue of inefficiency to reach a verification conclusion. To bridge this gap, we introduce an order over different sub-problems produced by BaB, concerning with their different likelihoods of containing counterexamples. Based on this order, we propose a novel verification framework Oliva that explores the sub-problem space by prioritizing those sub-problems that are more likely to find counterexamples, in order to efficiently reach the conclusion of the verification. Even if no counterexample can be found in any sub-problem, it only changes the order of visiting different sub-problems and so will not lead to a performance degradation. Specifically, Oliva has two variants, including Oliva^GR, a greedy strategy that always prioritizes the sub-problems that are more likely to find counterexamples, and Oliva^SA, a balanced strategy inspired by simulated annealing that gradually shifts from exploration to exploitation to locate the globally optimal sub-problems. We experimentally evaluate the performance of Oliva on 690 verification problems spanning over 5 models with datasets MNIST and CIFAR-10. Compared to the state-of-the-art approaches, we demonstrate the speedup of Oliva for up to 25× in MNIST, and up to 80× in CIFAR-10.

Cite as

Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H.M.N. Dilum Bandara, Shiping Chen, Jianjun Zhao, and Yulei Sui. Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 36:1-36:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{zhang_et_al:LIPIcs.ECOOP.2025.36,
  author =	{Zhang, Guanqin and Fukuda, Kota and Zhang, Zhenya and Bandara, H.M.N. Dilum and Chen, Shiping and Zhao, Jianjun and Sui, Yulei},
  title =	{{Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{36:1--36:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.36},
  URN =		{urn:nbn:de:0030-drops-233281},
  doi =		{10.4230/LIPIcs.ECOOP.2025.36},
  annote =	{Keywords: neural network verification, branch and bound, counterexample potentiality, simulated annealing, stochastic optimization}
}
Document
Profile-Guided Field Externalization in an Ahead-Of-Time Compiler

Authors: Sebastian Kloibhofer, Lukas Makor, Peter Hofer, David Leopoldseder, and Hanspeter Mössenböck

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)


Abstract
Field externalization is a technique to reduce the footprint of objects by removing fields that most frequently contain zero or null. While researchers have developed ways to bring this optimization into the Java world, these have been limited to research compilers or virtual machines for embedded systems. In this work, we present a novel field externalization technique that uses information from static analysis and profiling to determine externalizable fields. During compilation, we remove those fields and define companion classes. These are used in case of non-default-value writes to the externalized fields. Our approach also correctly handles synchronization to prevent issues in multithreaded environments. We integrated our approach into the modern Java ahead-of-time compiler GraalVM Native Image. We conducted an evaluation on a diverse set of benchmarks that includes standard and microservice-based benchmarks. For standard benchmarks, our approach reduces the total allocated bytes by 2.76% and the maximum resident set size (max-RSS) by 2.55%. For microservice benchmarks, we achieved a reduction of 6.88% for normalized allocated bytes and 2.45% for max-RSS. We computed these improvements via the geometric mean. The median reductions are 1.46% (alloc. bytes) and 0.22% (max-RSS) in standard benchmarks, as well as 3.63% (alloc. bytes) and 0.20% (max-RSS) in microservice benchmarks.

Cite as

Sebastian Kloibhofer, Lukas Makor, Peter Hofer, David Leopoldseder, and Hanspeter Mössenböck. Profile-Guided Field Externalization in an Ahead-Of-Time Compiler. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 19:1-19:32, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{kloibhofer_et_al:LIPIcs.ECOOP.2025.19,
  author =	{Kloibhofer, Sebastian and Makor, Lukas and Hofer, Peter and Leopoldseder, David and M\"{o}ssenb\"{o}ck, Hanspeter},
  title =	{{Profile-Guided Field Externalization in an Ahead-Of-Time Compiler}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{19:1--19:32},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.19},
  URN =		{urn:nbn:de:0030-drops-233121},
  doi =		{10.4230/LIPIcs.ECOOP.2025.19},
  annote =	{Keywords: compilation, instrumentation, profiling, fields, externalization, memory footprint reduction, memory footprint optimization}
}
Document
FL-RMQ: A Learned Approach to Range Minimum Queries

Authors: Paolo Ferragina and Filippo Lari

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)


Abstract
We address the problem of designing and implementing a data structure for the Range Minimum Query problem. We show a surprising connection between this classical problem and the geometry of a properly defined set of points in the Cartesian plane. Building on this insight, we hinge upon a well-known result in Computational Geometry to introduce the first RMQ solution that exploits (i.e., learns) the distribution of such 2D-points via proper error-bounded linear approximations. Because of these features, we name the resulting data structure: Fully-Learned RMQ, shortly FL-RMQ. We prove theoretical bounds for its space usage and query time, covering both worst-case scenarios and average-case performance for uniformly distributed inputs. These bounds compare favorably with the ones achievable by the best-known indexing solutions (i.e., the ones that allow access to the indexed array), especially when the input data follow some geometric regularities that we characterize in the paper, thus providing principled evidence of FL-RMQ being a novel data-aware solution to the RMQ problem. We corroborate our theoretical findings with a wide set of experiments showing that FL-RMQ offers more robust space-time trade-offs than the other known practical indexing solutions on both artificial and real-world datasets. We believe that our novel approach to the RMQ problem is noteworthy not only for its interesting space-time trade-offs, but also because it is flexible enough to be applied easily to the encoding variant of RMQ (i.e., the one that does not allow access to the indexed array), and moreover, because it paves the way to research opportunities on possibly other problems.

Cite as

Paolo Ferragina and Filippo Lari. FL-RMQ: A Learned Approach to Range Minimum Queries. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 7:1-7:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{ferragina_et_al:LIPIcs.CPM.2025.7,
  author =	{Ferragina, Paolo and Lari, Filippo},
  title =	{{FL-RMQ: A Learned Approach to Range Minimum Queries}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{7:1--7:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.7},
  URN =		{urn:nbn:de:0030-drops-231014},
  doi =		{10.4230/LIPIcs.CPM.2025.7},
  annote =	{Keywords: Range-Minimum query, Learned data structures, Compact data structures, Experimental results}
}
Document
Model Ensembling for Constrained Optimization

Authors: Ira Globus Harris, Varun Gupta, Michael Kearns, and Aaron Roth

Published in: LIPIcs, Volume 329, 6th Symposium on Foundations of Responsible Computing (FORC 2025)


Abstract
Many instances of decision making under objective uncertainty can be decomposed into two steps: predicting the objective function and then optimizing for the best feasible action under the estimate of the objective vector. We study the problem of ensembling models for optimization of uncertain linear objectives under arbitrary constraints. We imagine we are given a collection of predictive models mapping a feature space to multi-dimensional real-valued predictions, which form the coefficients of a linear objective that we would like to optimize. We give two ensembling methods that can provably result in transparent decisions that strictly improve on all initial policies. The first method operates in the "white box" setting in which we have access to the underlying prediction models and the second in the "black box" setting in which we only have access to the induced decisions (in the downstream optimization problem) of the constituent models, but not their underlying point predictions. They are transparent or trustworthy in the sense that the user can reliably predict long-term ensemble rewards even if the instance by instance predictions are imperfect.

Cite as

Ira Globus Harris, Varun Gupta, Michael Kearns, and Aaron Roth. Model Ensembling for Constrained Optimization. In 6th Symposium on Foundations of Responsible Computing (FORC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 329, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{globusharris_et_al:LIPIcs.FORC.2025.14,
  author =	{Globus Harris, Ira and Gupta, Varun and Kearns, Michael and Roth, Aaron},
  title =	{{Model Ensembling for Constrained Optimization}},
  booktitle =	{6th Symposium on Foundations of Responsible Computing (FORC 2025)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-367-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{329},
  editor =	{Bun, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2025.14},
  URN =		{urn:nbn:de:0030-drops-231412},
  doi =		{10.4230/LIPIcs.FORC.2025.14},
  annote =	{Keywords: model ensembling, trustworthy AI, decision-making under uncertainty}
}
Document
Repairing Databases over Metric Spaces with Coincidence Constraints

Authors: Youri Kaminsky, Benny Kimelfeld, Ester Livshits, Felix Naumann, and David Wajc

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)


Abstract
Datasets often contain values that naturally reside in a metric space: numbers, strings, geographical locations, machine-learned embeddings in a vector space, and so on. We study the computational complexity of repairing inconsistent databases that violate integrity constraints, where the database values belong to an underlying metric space. The goal is to update the database values to retain consistency while minimizing the total distance between the original values and the repaired ones. We consider what we refer to as coincidence constraints, which include unary key constraints, inclusion constraints, foreign keys, and generally any restriction on the relationship between the numbers of cells of different labels (attributes) coinciding in a single value, for a fixed attribute set. We begin by showing that the problem is APX-hard for general metric spaces. We then present an algorithm solving the problem optimally for tree metrics, which generalize both the line metric (i.e., where repaired values are numbers) and the discrete metric (i.e., where we simply count the number of changed values). Combining our algorithm for tree metrics and a classic result on probabilistic tree embeddings, we design a (high probability) logarithmic-ratio approximation for general metrics. We also study the variant of the problem where we limit the allowed change of each individual value. In this variant, it is already NP-complete to decide the existence of any legal repair for a general metric, and we present a polynomial-time repairing algorithm for the case of a line metric.

Cite as

Youri Kaminsky, Benny Kimelfeld, Ester Livshits, Felix Naumann, and David Wajc. Repairing Databases over Metric Spaces with Coincidence Constraints. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{kaminsky_et_al:LIPIcs.ICDT.2025.14,
  author =	{Kaminsky, Youri and Kimelfeld, Benny and Livshits, Ester and Naumann, Felix and Wajc, David},
  title =	{{Repairing Databases over Metric Spaces with Coincidence Constraints}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.14},
  URN =		{urn:nbn:de:0030-drops-229554},
  doi =		{10.4230/LIPIcs.ICDT.2025.14},
  annote =	{Keywords: Database repairs, metric spaces, coincidence constraints, inclusion constraints, foreign-key constraints}
}
Document
Resource Paper
MELArt: A Multimodal Entity Linking Dataset for Art

Authors: Alejandro Sierra-Múnera, Linh Le, Gianluca Demartini, and Ralf Krestel

Published in: TGDK, Volume 2, Issue 2 (2024): Special Issue on Resources for Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 2, Issue 2


Abstract
Traditional named entity linking (NEL) tools have largely employed a general-domain approach, spanning across various entity types such as persons, organizations, locations, and events in a multitude of contexts. While multimodal entity linking datasets exist (e.g., disambiguation of person names with the help of photographs), there is a need to develop domain-specific resources that represent the unique challenges present in domains like cultural heritage (e.g., stylistic changes through time, diversity of social and political context). To address this gap, our work presents a novel multimodal entity linking benchmark dataset for the art domain together with a comprehensive experimental evaluation of existing NEL methods on this new dataset. The dataset encapsulates various entities unique to the art domain. During the dataset creation process, we also adopt manual human evaluation, providing high-quality labels for our dataset. We introduce an automated process that facilitates the generation of this art dataset, harnessing data from multiple sources (Artpedia, Wikidata and Wikimedia Commons) to ensure its reliability and comprehensiveness. Furthermore, our paper delineates best practices for the integration of art datasets, and presents a detailed performance analysis of general-domain entity linking systems, when applied to domain-specific datasets. Through our research, we aim to address the lack of datasets for NEL in the art domain, providing resources for the development of new, more nuanced, and contextually rich entity linking methods in the realm of art and cultural heritage.

Cite as

Alejandro Sierra-Múnera, Linh Le, Gianluca Demartini, and Ralf Krestel. MELArt: A Multimodal Entity Linking Dataset for Art. In Special Issue on Resources for Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 2, pp. 8:1-8:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@Article{sierramunera_et_al:TGDK.2.2.8,
  author =	{Sierra-M\'{u}nera, Alejandro and Le, Linh and Demartini, Gianluca and Krestel, Ralf},
  title =	{{MELArt: A Multimodal Entity Linking Dataset for Art}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{8:1--8:22},
  ISSN =	{2942-7517},
  year =	{2024},
  volume =	{2},
  number =	{2},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.2.2.8},
  URN =		{urn:nbn:de:0030-drops-225921},
  doi =		{10.4230/TGDK.2.2.8},
  annote =	{Keywords: A Multimodal Entity Linking Dataset, Named Entity Linking, Art Domain, Wikidata, Wikimedia, Artpedia}
}
Document
Vision
Knowledge Engineering Using Large Language Models

Authors: Bradley P. Allen, Lise Stork, and Paul Groth

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1


Abstract
Knowledge engineering is a discipline that focuses on the creation and maintenance of processes that generate and apply knowledge. Traditionally, knowledge engineering approaches have focused on knowledge expressed in formal languages. The emergence of large language models and their capabilities to effectively work with natural language, in its broadest sense, raises questions about the foundations and practice of knowledge engineering. Here, we outline the potential role of LLMs in knowledge engineering, identifying two central directions: 1) creating hybrid neuro-symbolic knowledge systems; and 2) enabling knowledge engineering in natural language. Additionally, we formulate key open research questions to tackle these directions.

Cite as

Bradley P. Allen, Lise Stork, and Paul Groth. Knowledge Engineering Using Large Language Models. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 3:1-3:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@Article{allen_et_al:TGDK.1.1.3,
  author =	{Allen, Bradley P. and Stork, Lise and Groth, Paul},
  title =	{{Knowledge Engineering Using Large Language Models}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{3:1--3:19},
  ISSN =	{2942-7517},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.3},
  URN =		{urn:nbn:de:0030-drops-194777},
  doi =		{10.4230/TGDK.1.1.3},
  annote =	{Keywords: knowledge engineering, large language models}
}
Document
Track A: Algorithms, Complexity and Games
Convergence of the Number of Period Sets in Strings

Authors: Eric Rivals, Michelle Sweering, and Pengfei Wang

Published in: LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)


Abstract
Consider words of length n. The set of all periods of a word of length n is a subset of {0,1,2,…,n-1}. However, not every subset of {0,1,2,…,n-1} is a valid set of periods. In a seminal paper in 1981, Guibas and Odlyzko proposed to encode the set of periods of a word into a binary string of length n, called an autocorrelation, where a one at position i denotes that i is a period. They considered the question of recognizing a valid period set, and also studied the number of valid period sets for strings of length n, denoted κ_n. They conjectured that ln(κ_n) asymptotically converges to a constant times ln²(n). Although improved lower bounds for ln(κ_n)/ln²(n) were proposed in 2001, the question of a tight upper bound has remained open since Guibas and Odlyzko’s paper. Here, we exhibit an upper bound for this fraction, which implies its convergence and settles this longstanding conjecture. Moreover, we extend our result to find similar bounds for the number of correlations: a generalization of autocorrelations which encodes the overlaps between two strings.
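
The autocorrelation encoding described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `autocorrelation` and `period_set` are hypothetical, and it assumes the convention stated above that periods range over {0,1,…,n-1}, so 0 always counts as a period.

```python
def autocorrelation(word: str) -> str:
    """Return the autocorrelation of `word` as a binary string of length n.

    Position i holds '1' exactly when i is a period of the word, i.e. when
    the word matches itself after being shifted by i positions.
    """
    n = len(word)
    bits = []
    for i in range(n):
        # Shift by i: the suffix starting at i must equal the prefix of length n - i.
        is_period = word[i:] == word[: n - i]
        bits.append("1" if is_period else "0")
    return "".join(bits)


def period_set(word: str) -> set[int]:
    """Return the set of periods encoded by the autocorrelation of `word`."""
    return {i for i, bit in enumerate(autocorrelation(word)) if bit == "1"}


if __name__ == "__main__":
    for w in ("abab", "aaaa", "abcab"):
        print(w, autocorrelation(w), sorted(period_set(w)))
```

For the word "abcab", for instance, only the shifts 0 and 3 make the word overlap itself, so its period set is {0, 3}.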

Cite as

Eric Rivals, Michelle Sweering, and Pengfei Wang. Convergence of the Number of Period Sets in Strings. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 100:1-100:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@InProceedings{rivals_et_al:LIPIcs.ICALP.2023.100,
  author =	{Rivals, Eric and Sweering, Michelle and Wang, Pengfei},
  title =	{{Convergence of the Number of Period Sets in Strings}},
  booktitle =	{50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)},
  pages =	{100:1--100:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-278-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{261},
  editor =	{Etessami, Kousha and Feige, Uriel and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.100},
  URN =		{urn:nbn:de:0030-drops-181527},
  doi =		{10.4230/LIPIcs.ICALP.2023.100},
  annote =	{Keywords: Autocorrelation, period, border, combinatorics, correlation, periodicity, upper bound, asymptotic convergence}
}
Document
Micro- and Macroscopic Road Traffic Analysis using Drone Image Data

Authors: Friedrich Kruber, Eduardo Sánchez Morales, Robin Egolf, Jonas Wurst, Samarjit Chakraborty, and Michael Botsch

Published in: LITES, Volume 8, Issue 1 (2022): Special Issue on Embedded Systems for Computer Vision. Leibniz Transactions on Embedded Systems, Volume 8, Issue 1


Abstract
Current developments in drone technology, together with machine-learning-based image processing, open new possibilities for various applications. Thus, the market volume is expected to grow rapidly over the next years. The goal of this paper is to demonstrate the capabilities and limitations of drone-based image data processing for the purpose of road traffic analysis. In the first part, a method for generating microscopic traffic data is proposed. More precisely, the state of vehicles and the resulting trajectories are estimated. The method is validated by conducting experiments with reference sensors and is shown to achieve precise vehicle state estimation results. It is also shown how the computational effort can be reduced by incorporating the tracking information into a neural network. A discussion of current limitations supplements the findings. By collecting a large number of vehicle trajectories, macroscopic statistics, such as traffic flow and density, can be obtained from the data. In the second part, a publicly available drone-based data set is analyzed to evaluate its suitability for macroscopic traffic modeling. The results show that the method is well suited for gaining detailed information about macroscopic statistics, such as traffic-flow-dependent time headway or lane change occurrences. In conclusion, this paper presents methods to exploit the remarkable opportunities of drone-based image processing for joint macro- and microscopic traffic analysis.

Cite as

Friedrich Kruber, Eduardo Sánchez Morales, Robin Egolf, Jonas Wurst, Samarjit Chakraborty, and Michael Botsch. Micro- and Macroscopic Road Traffic Analysis using Drone Image Data. In LITES, Volume 8, Issue 1 (2022): Special Issue on Embedded Systems for Computer Vision. Leibniz Transactions on Embedded Systems, Volume 8, Issue 1, pp. 02:1-02:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@Article{kruber_et_al:LITES.8.1.2,
  author =	{Kruber, Friedrich and S\'{a}nchez Morales, Eduardo and Egolf, Robin and Wurst, Jonas and Chakraborty, Samarjit and Botsch, Michael},
  title =	{{Micro- and Macroscopic Road Traffic Analysis using Drone Image Data}},
  journal =	{Leibniz Transactions on Embedded Systems},
  pages =	{02:1--02:27},
  ISSN =	{2199-2002},
  year =	{2022},
  volume =	{8},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LITES.8.1.2},
  URN =		{urn:nbn:de:0030-drops-192898},
  doi =		{10.4230/LITES.8.1.2},
  annote =	{Keywords: traffic data analysis, trajectory data, drone image data}
}