DROPS

Document

DOI: 10.4230/LIPIcs.SoCG.2026.39

Locality Sensitive Hashing in Hyperbolic Space

Authors: Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, and Cheng Xin

Published in: LIPIcs, Volume 367, 42nd International Symposium on Computational Geometry (SoCG 2026)

Abstract

For a metric space (X, d), a family ℋ of locality sensitive hash functions is called (r, cr, p₁, p₂) sensitive if a randomly chosen function h ∈ ℋ has probability at least p₁ (at most p₂) to map any a, b ∈ X in the same hash bucket if d(a, b) ≤ r (or d(a, b) ≥ cr). Locality Sensitive Hashing (LSH) is one of the most popular techniques for approximate nearest-neighbor search in high-dimensional spaces, and has been studied extensively for Hamming, Euclidean, and spherical geometries. An (r, cr, p₁, p₂)-sensitive hash function enables approximate nearest neighbor search (i.e., returning a point within distance cr from a query q if there exists a point within distance r from q) with space O(n^{1+ρ}) and query time O(n^ρ) where ρ = (log 1/p₁)/(log 1/p₂). But LSH for hyperbolic spaces ℍ^d remains largely unexplored. In this work, we present the first LSH construction native to hyperbolic space. For the hyperbolic plane (d = 2), we show a construction achieving ρ ≤ 1/c, based on the hyperplane rounding scheme. For general hyperbolic spaces (d ≥ 3), we use dimension reduction from ℍ^d to ℍ² and the 2D hyperbolic LSH to get ρ ≤ 1.59/c. On the lower bound side, we show that the lower bound on ρ of Euclidean LSH extends to the hyperbolic setting via local isometry, therefore giving ρ ≥ 1/c².

Cite as

Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, and Cheng Xin. Locality Sensitive Hashing in Hyperbolic Space. In 42nd International Symposium on Computational Geometry (SoCG 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 367, pp. 39:1-39:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{deng_et_al:LIPIcs.SoCG.2026.39,
  author =	{Deng, Chengyuan and Gao, Jie and Lu, Kevin and Luo, Feng and Xin, Cheng},
  title =	{{Locality Sensitive Hashing in Hyperbolic Space}},
  booktitle =	{42nd International Symposium on Computational Geometry (SoCG 2026)},
  pages =	{39:1--39:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-418-5},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{367},
  editor =	{Ahn, Hee-Kap and Hoffmann, Michael and Nayyeri, Amir},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2026.39},
  URN =		{urn:nbn:de:0030-drops-258454},
  doi =		{10.4230/LIPIcs.SoCG.2026.39},
  annote =	{Keywords: Locality Sensitive Hashing, Hyperbolic Geometry, Dimension Reduction, Approximate Nearest Neighbor Search}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2026.78

Mapping Chemical Space: Topological Data Analysis of Chemical Latent Space with Mapper

Authors: Dhruv Meduri, Chuan-Shen Hu, Cong Shen, Kelin Xia, and Bei Wang

Published in: LIPIcs, Volume 367, 42nd International Symposium on Computational Geometry (SoCG 2026)

Abstract

The vast chemical space, encompassing virtually innumerable molecules and materials, presents both immense opportunities and significant challenges. The design and discovery of novel drugs and functional materials may be viewed as a search within this space; however, the sheer scale of potential candidates renders exhaustive exploration infeasible. To address this, we introduce Chemical Mapper, a framework that integrates topological data analysis with deep learning to enable the visual exploration and analysis of chemical latent spaces. At its core, Chemical Mapper employs mapper, a widely used tool in topological data analysis, to investigate the organizational principles of chemical latent spaces defined by molecular representations learned by geometric deep learning models. In doing so, Chemical Mapper not only highlights groups of molecular representations but also uncovers the relationships among them through linkages and branching structures. Our results show that Chemical Mapper reveals intrinsic patterns associated with molecular scaffolds, functional groups, and chemical properties, as well as the structural and functional evolutions of the molecules.

Cite as

Dhruv Meduri, Chuan-Shen Hu, Cong Shen, Kelin Xia, and Bei Wang. Mapping Chemical Space: Topological Data Analysis of Chemical Latent Space with Mapper. In 42nd International Symposium on Computational Geometry (SoCG 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 367, pp. 78:1-78:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{meduri_et_al:LIPIcs.SoCG.2026.78,
  author =	{Meduri, Dhruv and Hu, Chuan-Shen and Shen, Cong and Xia, Kelin and Wang, Bei},
  title =	{{Mapping Chemical Space: Topological Data Analysis of Chemical Latent Space with Mapper}},
  booktitle =	{42nd International Symposium on Computational Geometry (SoCG 2026)},
  pages =	{78:1--78:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-418-5},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{367},
  editor =	{Ahn, Hee-Kap and Hoffmann, Michael and Nayyeri, Amir},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2026.78},
  URN =		{urn:nbn:de:0030-drops-258854},
  doi =		{10.4230/LIPIcs.SoCG.2026.78},
  annote =	{Keywords: Practice of computational topology, topological data analysis, applications in chemistry, mapper algorithm, high-dimensional data analysis, chemical spaces, geometric deep learning, latent space geometry}
}

@InProceedings{meduri_et_al:LIPIcs.SoCG.2026.78,
  author =	{Meduri, Dhruv and Hu, Chuan-Shen and Shen, Cong and Xia, Kelin and Wang, Bei},
  title =	{{Mapping Chemical Space: Topological Data Analysis of Chemical Latent Space with Mapper}},
  booktitle =	{42nd International Symposium on Computational Geometry (SoCG 2026)},
  pages =	{78:1--78:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-418-5},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{367},
  editor =	{Ahn, Hee-Kap and Hoffmann, Michael and Nayyeri, Amir},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2026.78},
  URN =		{urn:nbn:de:0030-drops-258854},
  doi =		{10.4230/LIPIcs.SoCG.2026.78},
  annote =	{Keywords: Practice of computational topology, topological data analysis, applications in chemistry, mapper algorithm, high-dimensional data analysis, chemical spaces, geometric deep learning, latent space geometry}
}

Document

DOI: 10.4230/LIPIcs.WABI.2025.17

An Efficient Data Structure and Algorithm for Long-Match Query in Run-Length Compressed BWT

Authors: Ahsan Sanaullah, Degui Zhi, and Shaojie Zhang

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)

Abstract

String matching problems in bioinformatics are typically for finding exact substring matches between a query and a reference text. Previous formulations often focus on maximum exact matches (MEMs). However, multiple occurrences of substrings of the query in the text that are long enough but not maximal may not be captured by MEMs. Such long matches can be informative, especially when the text is a collection of similar sequences such as genomes. In this paper, we describe a new type of match between a pattern and a text that aren't necessarily maximal in the query, but still contain useful matching information: locally maximal exact matches (LEMs). There are usually a large amount of LEMs, so we only consider those above some length threshold ℒ. These are referred to as long LEMs. The purpose of long LEMs is to capture substring matches between a query and a text that are not necessarily maximal in the pattern but still long enough to be important. Therefore efficient long LEMs finding algorithms are desired for these datasets. However, these datasets are too large to query on traditional string indexes. Fortunately, these datasets are very repetitive. Recently, compressed string indexes that take advantage of the redundancy in the data but retain efficient querying capability have been proposed as a solution. We therefore give an efficient algorithm for computing all the long LEMs of a query and a text in a BWT runs compressed string index. We describe an O(m+occ) expected time algorithm that relies on an O(r) words space string index for outputting all long LEMs of a pattern with respect to a text given the matching statistics of the pattern with respect to the text. Here m is the length of the query, occ is the number of long LEMs outputted, and r is the number of runs in the BWT of the text. The O(r) space string index we describe relies on an adaptation of the move data structure by Nishimoto and Tabei. We are able to support LCP[i] queries in constant time given SA[i]. In other words, we answer PLCP[i] queries in constant time. These PLCP queries enable the efficient long LEM query. Long LEMs may provide useful similarity information between a pattern and a text that MEMs may ignore. This information is particularly useful in pangenome and biobank scale haplotype panel contexts.

Cite as

Ahsan Sanaullah, Degui Zhi, and Shaojie Zhang. An Efficient Data Structure and Algorithm for Long-Match Query in Run-Length Compressed BWT. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 17:1-17:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{sanaullah_et_al:LIPIcs.WABI.2025.17,
  author =	{Sanaullah, Ahsan and Zhi, Degui and Zhang, Shaojie},
  title =	{{An Efficient Data Structure and Algorithm for Long-Match Query in Run-Length Compressed BWT}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{17:1--17:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.17},
  URN =		{urn:nbn:de:0030-drops-239433},
  doi =		{10.4230/LIPIcs.WABI.2025.17},
  annote =	{Keywords: BWT, LEM, Long LEM, MEM, Run Length Compressed BWT, Move Data Structure, Pangenome}
}

Document

DOI: 10.4230/LIPIcs.SAT.2025.32

Efficient Certified Reasoning for Binarized Neural Networks

Authors: Jiong Yang, Yong Kiam Tan, Mate Soos, Magnus O. Myreen, and Kuldeep S. Meel

Published in: LIPIcs, Volume 341, 28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025)

Abstract

Neural networks have emerged as essential components in safety-critical applications - these use cases demand complex, yet trustworthy computations. Binarized Neural Networks (BNNs) are a type of neural network where each neuron is constrained to a Boolean value; they are particularly well-suited for safety-critical tasks because they retain much of the computational capacities of full-scale (floating-point or quantized) deep neural networks, but remain compatible with satisfiability solvers for qualitative verification and with model counters for quantitative reasoning. However, existing methods for BNN analysis suffer from either limited scalability or susceptibility to soundness errors, which hinders their applicability in real-world scenarios. In this work, we present a scalable and trustworthy approach for both qualitative and quantitative verification of BNNs. Our approach introduces a native representation of BNN constraints in a custom-designed solver for qualitative reasoning, and in an approximate model counter for quantitative reasoning. We further develop specialized proof generation and checking pipelines with native support for BNN constraint reasoning, ensuring trustworthiness for all of our verification results. Empirical evaluations on a BNN robustness verification benchmark suite demonstrate that our certified solving approach achieves a 9× speedup over prior certified CNF and PB-based approaches, and our certified counting approach achieves a 218× speedup over the existing CNF-based baseline. In terms of coverage, our pipeline produces fully certified results for 99% and 86% of the qualitative and quantitative reasoning queries on BNNs, respectively. This is in sharp contrast to the best existing baselines which can fully certify only 62% and 4% of the queries, respectively.

Cite as

Jiong Yang, Yong Kiam Tan, Mate Soos, Magnus O. Myreen, and Kuldeep S. Meel. Efficient Certified Reasoning for Binarized Neural Networks. In 28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 341, pp. 32:1-32:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{yang_et_al:LIPIcs.SAT.2025.32,
  author =	{Yang, Jiong and Tan, Yong Kiam and Soos, Mate and Myreen, Magnus O. and Meel, Kuldeep S.},
  title =	{{Efficient Certified Reasoning for Binarized Neural Networks}},
  booktitle =	{28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025)},
  pages =	{32:1--32:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-381-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{341},
  editor =	{Berg, Jeremias and Nordstr\"{o}m, Jakob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SAT.2025.32},
  URN =		{urn:nbn:de:0030-drops-237665},
  doi =		{10.4230/LIPIcs.SAT.2025.32},
  annote =	{Keywords: Neural network verification, proof certification, SAT solving, approximate model counting}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2025.36

Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees

Authors: Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H.M.N. Dilum Bandara, Shiping Chen, Jianjun Zhao, and Yulei Sui

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)

Abstract

The vulnerability of neural networks to adversarial perturbations has necessitated formal verification techniques that can rigorously certify the quality of neural networks. As the state-of-the-art, branch-and-bound (BaB) is a "divide-and-conquer" strategy that applies off-the-shelf verifiers to sub-problems for which they perform better. While BaB can identify the sub-problems that are necessary to be split, it explores the space of these sub-problems in a naive "first-come-first-served" manner, thereby suffering from an issue of inefficiency to reach a verification conclusion. To bridge this gap, we introduce an order over different sub-problems produced by BaB, concerning with their different likelihoods of containing counterexamples. Based on this order, we propose a novel verification framework Oliva that explores the sub-problem space by prioritizing those sub-problems that are more likely to find counterexamples, in order to efficiently reach the conclusion of the verification. Even if no counterexample can be found in any sub-problem, it only changes the order of visiting different sub-problems and so will not lead to a performance degradation. Specifically, Oliva has two variants, including Oliva^GR, a greedy strategy that always prioritizes the sub-problems that are more likely to find counterexamples, and Oliva^SA, a balanced strategy inspired by simulated annealing that gradually shifts from exploration to exploitation to locate the globally optimal sub-problems. We experimentally evaluate the performance of Oliva on 690 verification problems spanning over 5 models with datasets MNIST and CIFAR-10. Compared to the state-of-the-art approaches, we demonstrate the speedup of Oliva for up to 25× in MNIST, and up to 80× in CIFAR-10.

Cite as

Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H.M.N. Dilum Bandara, Shiping Chen, Jianjun Zhao, and Yulei Sui. Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 36:1-36:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{zhang_et_al:LIPIcs.ECOOP.2025.36,
  author =	{Zhang, Guanqin and Fukuda, Kota and Zhang, Zhenya and Bandara, H.M.N. Dilum and Chen, Shiping and Zhao, Jianjun and Sui, Yulei},
  title =	{{Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{36:1--36:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.36},
  URN =		{urn:nbn:de:0030-drops-233281},
  doi =		{10.4230/LIPIcs.ECOOP.2025.36},
  annote =	{Keywords: neural network verification, branch and bound, counterexample potentiality, simulated annealing, stochastic optimization}
}

@InProceedings{zhang_et_al:LIPIcs.ECOOP.2025.36,
  author =	{Zhang, Guanqin and Fukuda, Kota and Zhang, Zhenya and Bandara, H.M.N. Dilum and Chen, Shiping and Zhao, Jianjun and Sui, Yulei},
  title =	{{Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{36:1--36:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.36},
  URN =		{urn:nbn:de:0030-drops-233281},
  doi =		{10.4230/LIPIcs.ECOOP.2025.36},
  annote =	{Keywords: neural network verification, branch and bound, counterexample potentiality, simulated annealing, stochastic optimization}
}

Document

DOI: 10.4230/OASIcs.NG-RES.2025.2

Low-Latency Real-Time Applications on Heterogeneous MPSoCs

Authors: Nicolas Coppik, Pascal Becker, and Marcus Ritter

Published in: OASIcs, Volume 128, Sixth Workshop on Next Generation Real-Time Embedded Systems (NG-RES 2025)

Abstract

Heterogeneous Multi-Processor Systems-on-Chip (MPSoCs) that combine multiple, heterogeneous processing units are becoming increasingly popular for a wide range of applications, including industrial applications, where complex real-time applications can benefit from the performance and flexibility they offer. However, deploying real-time applications with low latency requirements across multiple processing units on such MPSoCs remains a challenging problem, particularly when communication between processors is required on a time-critical path. Existing solutions generally rely on the presence of at least one full-featured, general-purpose operating system on the device, and do not cater to the requirements of distributed, low-latency real-time applications. In this paper, we investigate the performance, with a focus on latency, of different options for communication between CPUs, including inter-processor interrupts and shared memory communication via different memories, on the popular Xilinx Zynq UltraScale+ platform and propose a novel solution for communication between heterogeneous processing units that relies only on the availability of shared memory. Our solution is capable of achieving sub-microsecond latencies for signaling and the transfer of small amounts of data between processing units, making it suitable for deploying distributed, low-latency real-time applications.

Cite as

Nicolas Coppik, Pascal Becker, and Marcus Ritter. Low-Latency Real-Time Applications on Heterogeneous MPSoCs. In Sixth Workshop on Next Generation Real-Time Embedded Systems (NG-RES 2025). Open Access Series in Informatics (OASIcs), Volume 128, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{coppik_et_al:OASIcs.NG-RES.2025.2,
  author =	{Coppik, Nicolas and Becker, Pascal and Ritter, Marcus},
  title =	{{Low-Latency Real-Time Applications on Heterogeneous MPSoCs}},
  booktitle =	{Sixth Workshop on Next Generation Real-Time Embedded Systems (NG-RES 2025)},
  pages =	{2:1--2:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-366-9},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{128},
  editor =	{Yomsi, Patrick Meumeu and Wildermann, Stefan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.NG-RES.2025.2},
  URN =		{urn:nbn:de:0030-drops-229883},
  doi =		{10.4230/OASIcs.NG-RES.2025.2},
  annote =	{Keywords: real-time systems, heterogeneous systems, latency, inter-core communication}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.14

Repairing Databases over Metric Spaces with Coincidence Constraints

Authors: Youri Kaminsky, Benny Kimelfeld, Ester Livshits, Felix Naumann, and David Wajc

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

Datasets often contain values that naturally reside in a metric space: numbers, strings, geographical locations, machine-learned embeddings in a vector space, and so on. We study the computational complexity of repairing inconsistent databases that violate integrity constraints, where the database values belong to an underlying metric space. The goal is to update the database values to retain consistency while minimizing the total distance between the original values and the repaired ones. We consider what we refer to as coincidence constraints, which include unary key constraints, inclusion constraints, foreign keys, and generally any restriction on the relationship between the numbers of cells of different labels (attributes) coinciding in a single value, for a fixed attribute set. We begin by showing that the problem is APX-hard for general metric spaces. We then present an algorithm solving the problem optimally for tree metrics, which generalize both the line metric (i.e., where repaired values are numbers) and the discrete metric (i.e., where we simply count the number of changed values). Combining our algorithm for tree metrics and a classic result on probabilistic tree embeddings, we design a (high probability) logarithmic-ratio approximation for general metrics. We also study the variant of the problem where we limit the allowed change of each individual value. In this variant, it is already NP-complete to decide the existence of any legal repair for a general metric, and we present a polynomial-time repairing algorithm for the case of a line metric.

Cite as

Youri Kaminsky, Benny Kimelfeld, Ester Livshits, Felix Naumann, and David Wajc. Repairing Databases over Metric Spaces with Coincidence Constraints. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kaminsky_et_al:LIPIcs.ICDT.2025.14,
  author =	{Kaminsky, Youri and Kimelfeld, Benny and Livshits, Ester and Naumann, Felix and Wajc, David},
  title =	{{Repairing Databases over Metric Spaces with Coincidence Constraints}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.14},
  URN =		{urn:nbn:de:0030-drops-229554},
  doi =		{10.4230/LIPIcs.ICDT.2025.14},
  annote =	{Keywords: Database repairs, metric spaces, coincidence constraints, inclusion constraints, foreign-key constraints}
}

Document

Position

DOI: 10.4230/TGDK.1.1.2

Large Language Models and Knowledge Graphs: Opportunities and Challenges

Authors: Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, Stefan Dietze, Hajira Jabeen, Janna Omeliyanenko, Wen Zhang, Matteo Lissandrini, Russa Biswas, Gerard de Melo, Angela Bonifati, Edlira Vakaj, Mauro Dragoni, and Damien Graux

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

Large Language Models (LLMs) have taken Knowledge Representation - and the world - by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges.

Cite as

Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, Stefan Dietze, Hajira Jabeen, Janna Omeliyanenko, Wen Zhang, Matteo Lissandrini, Russa Biswas, Gerard de Melo, Angela Bonifati, Edlira Vakaj, Mauro Dragoni, and Damien Graux. Large Language Models and Knowledge Graphs: Opportunities and Challenges. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 2:1-2:38, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{pan_et_al:TGDK.1.1.2,
  author =	{Pan, Jeff Z. and Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Chen, Jiaoyan and Dietze, Stefan and Jabeen, Hajira and Omeliyanenko, Janna and Zhang, Wen and Lissandrini, Matteo and Biswas, Russa and de Melo, Gerard and Bonifati, Angela and Vakaj, Edlira and Dragoni, Mauro and Graux, Damien},
  title =	{{Large Language Models and Knowledge Graphs: Opportunities and Challenges}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{2:1--2:38},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.2},
  URN =		{urn:nbn:de:0030-drops-194766},
  doi =		{10.4230/TGDK.1.1.2},
  annote =	{Keywords: Large Language Models, Pre-trained Language Models, Knowledge Graphs, Ontology, Retrieval Augmented Language Models}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.4

Robust Phoneme Recognition with Little Data

Authors: Christopher Dane Shulby, Martha Dais Ferreira, Rodrigo F. de Mello, and Sandra Maria Aluisio

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

A common belief in the community is that deep learning requires large datasets to be effective. We show that with careful parameter selection, deep feature extraction can be applied even to small datasets.We also explore exactly how much data is necessary to guarantee learning by convergence analysis and calculating the shattering coefficient for the algorithms used. Another problem is that state-of-the-art results are rarely reproducible because they use proprietary datasets, pretrained networks and/or weight initializations from other larger networks. We present a two-fold novelty for this situation where a carefully designed CNN architecture, together with a knowledge-driven classifier achieves nearly state-of-the-art phoneme recognition results with absolutely no pretraining or external weight initialization. We also beat the best replication study of the state of the art with a 28% FER. More importantly, we are able to achieve transparent, reproducible frame-level accuracy and, additionally, perform a convergence analysis to show the generalization capacity of the model providing statistical evidence that our results are not obtained by chance. Furthermore, we show how algorithms with strong learning guarantees can not only benefit from raw data extraction but contribute with more robust results.

Cite as

Christopher Dane Shulby, Martha Dais Ferreira, Rodrigo F. de Mello, and Sandra Maria Aluisio. Robust Phoneme Recognition with Little Data. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 4:1-4:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{shulby_et_al:OASIcs.SLATE.2019.4,
  author =	{Shulby, Christopher Dane and Ferreira, Martha Dais and de Mello, Rodrigo F. and Aluisio, Sandra Maria},
  title =	{{Robust Phoneme Recognition with Little Data}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{4:1--4:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.4},
  URN =		{urn:nbn:de:0030-drops-108715},
  doi =		{10.4230/OASIcs.SLATE.2019.4},
  annote =	{Keywords: feature extraction, acoustic modeling, phoneme recognition, statistical learning theory}
}

@InProceedings{shulby_et_al:OASIcs.SLATE.2019.4,
  author =	{Shulby, Christopher Dane and Ferreira, Martha Dais and de Mello, Rodrigo F. and Aluisio, Sandra Maria},
  title =	{{Robust Phoneme Recognition with Little Data}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{4:1--4:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.4},
  URN =		{urn:nbn:de:0030-drops-108715},
  doi =		{10.4230/OASIcs.SLATE.2019.4},
  annote =	{Keywords: feature extraction, acoustic modeling, phoneme recognition, statistical learning theory}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2019.6

A Formal Framework for Probabilistic Unclean Databases

Authors: Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

Most theoretical frameworks that focus on data errors and inconsistencies follow logic-based reasoning. Yet, practical data cleaning tools need to incorporate statistical reasoning to be effective in real-world data cleaning tasks. Motivated by empirical successes, we propose a formal framework for unclean databases, where two types of statistical knowledge are incorporated: The first represents a belief of how intended (clean) data is generated, and the second represents a belief of how noise is introduced in the actual observed database. To capture this noisy channel model, we introduce the concept of a Probabilistic Unclean Database (PUD), a triple that consists of a probabilistic database that we call the intention, a probabilistic data transformator that we call the realization and captures how noise is introduced, and an observed unclean database that we call the observation. We define three computational problems in the PUD framework: cleaning (infer the most probable intended database, given a PUD), probabilistic query answering (compute the probability of an answer tuple over the unclean observed database), and learning (estimate the most likely intention and realization models of a PUD, given examples as training data). We illustrate the PUD framework on concrete representations of the intention and realization, show that they generalize traditional concepts of repairs such as cardinality and value repairs, draw connections to consistent query answering, and prove tractability results. We further show that parameters can be learned in some practical instantiations, and in fact, prove that under certain conditions we can learn a PUD directly from a single dirty database without any need for clean examples.

Cite as

Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas. A Formal Framework for Probabilistic Unclean Databases. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 6:1-6:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{desa_et_al:LIPIcs.ICDT.2019.6,
  author =	{De Sa, Christopher and Ilyas, Ihab F. and Kimelfeld, Benny and R\'{e}, Christopher and Rekatsinas, Theodoros},
  title =	{{A Formal Framework for Probabilistic Unclean Databases}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{6:1--6:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.6},
  URN =		{urn:nbn:de:0030-drops-103083},
  doi =		{10.4230/LIPIcs.ICDT.2019.6},
  annote =	{Keywords: Unclean databases, data cleaning, probabilistic databases, noisy channel}
}

10 Search Results for "De Sa, Christopher"

Locality Sensitive Hashing in Hyperbolic Space

Abstract

Cite as

Mapping Chemical Space: Topological Data Analysis of Chemical Latent Space with Mapper

Abstract

Cite as

An Efficient Data Structure and Algorithm for Long-Match Query in Run-Length Compressed BWT

Abstract

Cite as

Efficient Certified Reasoning for Binarized Neural Networks

Abstract

Cite as

Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees

Abstract

Cite as

Low-Latency Real-Time Applications on Heterogeneous MPSoCs

Abstract

Cite as

Repairing Databases over Metric Spaces with Coincidence Constraints

Abstract

Cite as

Large Language Models and Knowledge Graphs: Opportunities and Challenges

Abstract

Cite as

Robust Phoneme Recognition with Little Data

Abstract

Cite as

A Formal Framework for Probabilistic Unclean Databases

Abstract

Cite as

Thanks for your feedback!

Could not send message