DROPS

Document

Research

Mining Inter-Document Argument Structures in Scientific Papers for an Argument Web

Authors: Florian Ruosch, Cristina Sarasua, and Abraham Bernstein

Published in: TGDK, Volume 3, Issue 3 (2025). Transactions on Graph Data and Knowledge, Volume 3, Issue 3

Abstract

In Argument Mining, predicting argumentative relations between texts (or spans) remains one of the most challenging aspects, even more so in the cross-document setting. This paper makes three key contributions to advance research in this domain. We first extend an existing dataset, the Sci-Arg corpus, by annotating it with explicit inter-document argumentative relations, thereby allowing arguments to be distributed over several documents forming an Argument Web; these new annotations are published using Semantic Web technologies (RDF, OWL). Second, we explore and evaluate three automated approaches for predicting these inter-document argumentative relations, establishing critical baselines on the new dataset. We find that a simple classifier based on discourse indicators with access to context outperforms neural methods. Third, we conduct a comparative analysis of these approaches for both intra- and inter-document settings, identifying statistically significant differences in results that indicate the necessity of distinguishing between these two scenarios. Our findings highlight significant challenges in this complex domain and open crucial avenues for future research on the Argument Web of Science, particularly for those interested in leveraging Semantic Web technologies and knowledge graphs to understand scholarly discourse. With this, we provide the first stepping stones in the form of a benchmark dataset, three baseline methods, and an initial analysis for a systematic exploration of this field relevant to the Web of Data and Science.

Cite as

Florian Ruosch, Cristina Sarasua, and Abraham Bernstein. Mining Inter-Document Argument Structures in Scientific Papers for an Argument Web. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 3, pp. 4:1-4:33, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@Article{ruosch_et_al:TGDK.3.3.4,
  author =	{Ruosch, Florian and Sarasua, Cristina and Bernstein, Abraham},
  title =	{{Mining Inter-Document Argument Structures in Scientific Papers for an Argument Web}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{4:1--4:33},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{3},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.3.4},
  URN =		{urn:nbn:de:0030-drops-252159},
  doi =		{10.4230/TGDK.3.3.4},
  annote =	{Keywords: Argument Mining, Large Language Models, Knowledge Graphs, Link Prediction}
}

Document

DOI: 10.4230/OASIcs.ATMOS.2025.15

Exact and Heuristic Dynamic Taxi Sharing with Transfers Using Shortest-Path Speedup Techniques

Authors: Johannes Breitling and Moritz Laupichler

Published in: OASIcs, Volume 137, 25th Symposium on Algorithmic Approaches for Transportation Modelling, Optimization, and Systems (ATMOS 2025)

Abstract

We introduce a first-of-its-kind efficient, exact algorithm for the dynamic taxi-sharing problem with single-transfer journeys, i.e., a dispatcher that assigns traveler requests to a fleet of shared taxi-like vehicles allowing transfers between vehicles. We extend an existing no-transfer solution by collecting all viable pickup and dropoff vehicles for a request and computing the optimal transfer point for every pair of vehicles. We analyze underlying shortest-path problems and employ state-of-the-art routing algorithms to compute distances on-the-fly, which serves as the basis of dispatching requests with exact and up-to-date travel time information. We utilize constraints on existing routes, pruning techniques for transfer points, and both instruction- and thread-level parallelism to speed up the computation of the best assignment for every traveler. In addition to the exact variant, we propose a tunable heuristic approach that sacrifices solution quality in favor of improved running time. We evaluate our algorithm on a large road network with realistic input sets (up to 150000 requests). We demonstrate the effectiveness of our speedup techniques and the heuristic. We show first results on the benefits of transfers for taxi sharing on dense request sets, proving that our algorithm is well suited for the analysis of taxi sharing with transfers on large input instances.

Cite as

Johannes Breitling and Moritz Laupichler. Exact and Heuristic Dynamic Taxi Sharing with Transfers Using Shortest-Path Speedup Techniques. In 25th Symposium on Algorithmic Approaches for Transportation Modelling, Optimization, and Systems (ATMOS 2025). Open Access Series in Informatics (OASIcs), Volume 137, pp. 15:1-15:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{breitling_et_al:OASIcs.ATMOS.2025.15,
  author =	{Breitling, Johannes and Laupichler, Moritz},
  title =	{{Exact and Heuristic Dynamic Taxi Sharing with Transfers Using Shortest-Path Speedup Techniques}},
  booktitle =	{25th Symposium on Algorithmic Approaches for Transportation Modelling, Optimization, and Systems (ATMOS 2025)},
  pages =	{15:1--15:22},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-404-8},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{137},
  editor =	{Sauer, Jonas and Schmidt, Marie},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.ATMOS.2025.15},
  URN =		{urn:nbn:de:0030-drops-247718},
  doi =		{10.4230/OASIcs.ATMOS.2025.15},
  annote =	{Keywords: Dynamic taxi sharing, ride pooling, dial-a-ride problem, transfers, route planning}
}

Document

Survey

DOI: 10.4230/TGDK.3.2.1

Resilience in Knowledge Graph Embeddings

Authors: Arnab Sharma, N'Dah Jean Kouagou, and Axel-Cyrille Ngonga Ngomo

Published in: TGDK, Volume 3, Issue 2 (2025). Transactions on Graph Data and Knowledge, Volume 3, Issue 2

Abstract

In recent years, knowledge graphs have gained interest and witnessed widespread applications in various domains, such as information retrieval, question-answering, recommendation systems, amongst others. Large-scale knowledge graphs to this end have demonstrated their utility in effectively representing structured knowledge. To further facilitate the application of machine learning techniques, knowledge graph embedding models have been developed. Such models can transform entities and relationships within knowledge graphs into vectors. However, these embedding models often face challenges related to noise, missing information, distribution shift, adversarial attacks, etc. This can lead to sub-optimal embeddings and incorrect inferences, thereby negatively impacting downstream applications. While the existing literature has focused so far on adversarial attacks on KGE models, the challenges related to the other critical aspects remain unexplored. In this paper, we, first of all, give a unified definition of resilience, encompassing several factors such as generalisation, in-distribution generalization, distribution adaption, and robustness. After formalizing these concepts for machine learning in general, we define them in the context of knowledge graphs. To find the gap in the existing works on resilience in the context of knowledge graphs, we perform a systematic survey, taking into account all these aspects mentioned previously. Our survey results show that most of the existing works focus on a specific aspect of resilience, namely robustness. After categorizing such works based on their respective aspects of resilience, we discuss the challenges and future research directions.

Cite as

Arnab Sharma, N'Dah Jean Kouagou, and Axel-Cyrille Ngonga Ngomo. Resilience in Knowledge Graph Embeddings. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2, pp. 1:1-1:38, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@Article{sharma_et_al:TGDK.3.2.1,
  author =	{Sharma, Arnab and Kouagou, N'Dah Jean and Ngomo, Axel-Cyrille Ngonga},
  title =	{{Resilience in Knowledge Graph Embeddings}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{1:1--1:38},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{2},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.2.1},
  URN =		{urn:nbn:de:0030-drops-248117},
  doi =		{10.4230/TGDK.3.2.1},
  annote =	{Keywords: Knowledge graphs, Resilience, Robustness}
}

Document

DOI: 10.4230/OASIcs.SpaceCHI.2025.15

Integrating Human-In-The-Loop AI to Tackle Space Communication Delay Challenges

Authors: Nikos Mavrakis, Effie Lai-Chong Law, and Hubert P. H. Shum

Published in: OASIcs, Volume 130, Advancing Human-Computer Interaction for Space Exploration (SpaceCHI 2025)

Abstract

Deep space missions face significant communication delays that disrupt both operational workflows and psychological support for crew members. Unlike low Earth orbit operations, delays ranging from several minutes to nearly an hour make real-time communication with mission control infeasible, forcing crews to act with greater independence under uncertain conditions. This position paper examines how human-in-the-loop AI, digital twins, and edge AI can be integrated to mitigate these delays while maintaining astronaut autonomy and engagement. We argue that human-in-the-loop AI enables decision-making processes that are responsive to local context while remaining adaptable to changing mission demands. Digital twins offer real-time simulation and predictive modelling capabilities, allowing astronauts to explore options and troubleshoot without waiting for ground input. Edge AI brings computation closer to data sources, enabling low-latency inference onboard spacecraft for time-critical decisions. These ideas are explored through two use cases: using deepfakes to support emotionally resonant communication with loved ones, and applying visual-language models for onboard fault diagnosis and adaptive task replanning. We conclude with reflections on system design challenges under constrained and high-stakes conditions.

Cite as

Nikos Mavrakis, Effie Lai-Chong Law, and Hubert P. H. Shum. Integrating Human-In-The-Loop AI to Tackle Space Communication Delay Challenges. In Advancing Human-Computer Interaction for Space Exploration (SpaceCHI 2025). Open Access Series in Informatics (OASIcs), Volume 130, pp. 15:1-15:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{mavrakis_et_al:OASIcs.SpaceCHI.2025.15,
  author =	{Mavrakis, Nikos and Law, Effie Lai-Chong and Shum, Hubert P. H.},
  title =	{{Integrating Human-In-The-Loop AI to Tackle Space Communication Delay Challenges}},
  booktitle =	{Advancing Human-Computer Interaction for Space Exploration (SpaceCHI 2025)},
  pages =	{15:1--15:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-384-3},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{130},
  editor =	{Bensch, Leonie and Nilsson, Tommy and Nisser, Martin and Pataranutaporn, Pat and Schmidt, Albrecht and Sumini, Valentina},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SpaceCHI.2025.15},
  URN =		{urn:nbn:de:0030-drops-240051},
  doi =		{10.4230/OASIcs.SpaceCHI.2025.15},
  annote =	{Keywords: Human-in-the-loop AI, communication delays, human spaceflight}
}

Document

DOI: 10.4230/OASIcs.ICPEC.2025.9

Are We There Yet? On Security Vulnerabilities Produced by Open Source Generative AI Models and Its Implications for Security Education

Authors: Maria Camila Santos Galeano, Tiago Espinha Gasiba, Sathwik Amburi, and Maria Pinto-Albuquerque

Published in: OASIcs, Volume 133, 6th International Computer Programming Education Conference (ICPEC 2025)

Abstract

With the increasing integration of large language models (LLMs) into software development and programming education, concerns have emerged about the security of AI-generated code. This study investigates the security of three open source code generation models. Codestral, DeepSeek R1, and LLaMA 3.3 70B using structured prompts in Python, C, and Java. Some prompts were designed to explicitly trigger known vulnerability patterns, such as unsanitized input handling or unsafe memory operations, in order to assess how each model responds to security-sensitive tasks. The findings reveal recurring issues, including command execution vulnerabilities, insecure memory handling, and insufficient input validation. In response, we propose a set of recommendations for integrating secure prompt design and code auditing practices into developer training. These guidelines aim to help future developers generate safer code and better identify flaws in GenAI-generated output. This work offers an initial analysis of the limitations of GenAI-assisted code generation and provides actionable strategies to support the more secure and responsible use of these tools in professional and educational contexts.

Cite as

Maria Camila Santos Galeano, Tiago Espinha Gasiba, Sathwik Amburi, and Maria Pinto-Albuquerque. Are We There Yet? On Security Vulnerabilities Produced by Open Source Generative AI Models and Its Implications for Security Education. In 6th International Computer Programming Education Conference (ICPEC 2025). Open Access Series in Informatics (OASIcs), Volume 133, pp. 9:1-9:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{santosgaleano_et_al:OASIcs.ICPEC.2025.9,
  author =	{Santos Galeano, Maria Camila and Espinha Gasiba, Tiago and Amburi, Sathwik and Pinto-Albuquerque, Maria},
  title =	{{Are We There Yet? On Security Vulnerabilities Produced by Open Source Generative AI Models and Its Implications for Security Education}},
  booktitle =	{6th International Computer Programming Education Conference (ICPEC 2025)},
  pages =	{9:1--9:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-393-5},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{133},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Portela, Filipe and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.ICPEC.2025.9},
  URN =		{urn:nbn:de:0030-drops-240395},
  doi =		{10.4230/OASIcs.ICPEC.2025.9},
  annote =	{Keywords: Generative AI, Code Security, Programming Education, Prompt Engineering, Secure Coding, Static Analysis}
}

@InProceedings{santosgaleano_et_al:OASIcs.ICPEC.2025.9,
  author =	{Santos Galeano, Maria Camila and Espinha Gasiba, Tiago and Amburi, Sathwik and Pinto-Albuquerque, Maria},
  title =	{{Are We There Yet? On Security Vulnerabilities Produced by Open Source Generative AI Models and Its Implications for Security Education}},
  booktitle =	{6th International Computer Programming Education Conference (ICPEC 2025)},
  pages =	{9:1--9:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-393-5},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{133},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Portela, Filipe and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.ICPEC.2025.9},
  URN =		{urn:nbn:de:0030-drops-240395},
  doi =		{10.4230/OASIcs.ICPEC.2025.9},
  annote =	{Keywords: Generative AI, Code Security, Programming Education, Prompt Engineering, Secure Coding, Static Analysis}
}

Document

DOI: 10.4230/LIPIcs.GIScience.2025.12

Large Multi-Modal Model Cartographic Map Comprehension for Textual Locality Georeferencing

Authors: Kalana Wijegunarathna, Kristin Stock, and Christopher B. Jones

Published in: LIPIcs, Volume 346, 13th International Conference on Geographic Information Science (GIScience 2025)

Abstract

Millions of biological sample records collected in the last few centuries archived in natural history collections are un-georeferenced. Georeferencing complex locality descriptions associated with these collection samples is a highly labour-intensive task collection agencies struggle with. None of the existing automated methods exploit maps that are an essential tool for georeferencing complex relations. We present preliminary experiments and results of a novel method that exploits multi-modal capabilities of recent Large Multi-Modal Models (LMM). This method enables the model to visually contextualize spatial relations it reads in the locality description. We use a grid-based approach to adapt these auto-regressive models for this task in a zero-shot setting. Our experiments conducted on a small manually annotated dataset show impressive results for our approach (∼1 km Average distance error) compared to uni-modal georeferencing with Large Language Models and existing georeferencing tools. The paper also discusses the findings of the experiments in light of an LMM’s ability to comprehend fine-grained maps. Motivated by these results, a practical framework is proposed to integrate this method into a georeferencing workflow.

Cite as

Kalana Wijegunarathna, Kristin Stock, and Christopher B. Jones. Large Multi-Modal Model Cartographic Map Comprehension for Textual Locality Georeferencing. In 13th International Conference on Geographic Information Science (GIScience 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 346, pp. 12:1-12:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{wijegunarathna_et_al:LIPIcs.GIScience.2025.12,
  author =	{Wijegunarathna, Kalana and Stock, Kristin and Jones, Christopher B.},
  title =	{{Large Multi-Modal Model Cartographic Map Comprehension for Textual Locality Georeferencing}},
  booktitle =	{13th International Conference on Geographic Information Science (GIScience 2025)},
  pages =	{12:1--12:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-378-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{346},
  editor =	{Sila-Nowicka, Katarzyna and Moore, Antoni and O'Sullivan, David and Adams, Benjamin and Gahegan, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.GIScience.2025.12},
  URN =		{urn:nbn:de:0030-drops-238412},
  doi =		{10.4230/LIPIcs.GIScience.2025.12},
  annote =	{Keywords: Large Multi-Modal Models, Large Language Models, LLM, Georeferencing, Natural History collections}
}

@InProceedings{wijegunarathna_et_al:LIPIcs.GIScience.2025.12,
  author =	{Wijegunarathna, Kalana and Stock, Kristin and Jones, Christopher B.},
  title =	{{Large Multi-Modal Model Cartographic Map Comprehension for Textual Locality Georeferencing}},
  booktitle =	{13th International Conference on Geographic Information Science (GIScience 2025)},
  pages =	{12:1--12:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-378-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{346},
  editor =	{Sila-Nowicka, Katarzyna and Moore, Antoni and O'Sullivan, David and Adams, Benjamin and Gahegan, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.GIScience.2025.12},
  URN =		{urn:nbn:de:0030-drops-238412},
  doi =		{10.4230/LIPIcs.GIScience.2025.12},
  annote =	{Keywords: Large Multi-Modal Models, Large Language Models, LLM, Georeferencing, Natural History collections}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2025.13

FuzzFlesh: Randomised Testing of Decompilers via Control Flow Graph-Based Program Generation

Authors: Amber Gorzynski and Alastair F. Donaldson

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)

Abstract

Decompilation is the process of translating compiled code into high-level code. Control flow recovery is a challenging part of the process. "Misdecompilations" can occur, whereby the decompiled code does not accurately represent the semantics of the compiled code, despite it being syntactically valid. This is problematic because it can mislead users who are trying to reason about the program. We present CFG-based program generation: a novel approach to randomised testing that aims to improve the control flow recovery of decompilers. CFG-based program generation involves randomly generating control flow graphs (CFGs) and paths through each graph. Inspired by prior work in the domain of GPU computing, (CFG, path) pairs are "fleshed" into test programs. Each program is decompiled and recompiled. The test oracle verifies whether the actual runtime path through the graph matches the expected path. Any difference in the execution paths after recompilation indicates a possible misdecompilation. A key benefit of this approach is that it is largely independent of the source and target languages in question because it is focused on control flow. The approach is therefore applicable to numerous decompilation settings. The trade-off resulting from the focus on control flow is that misdecompilation bugs that do not relate to control flow (e.g. bugs that involve specific arithmetic operations) are out of scope. We have implemented this approach in FuzzFlesh, an open-source randomised testing tool. FuzzFlesh can be easily configured to target a variety of low-level languages and decompiler toolchains because most of the CFG and path generation process is language-independent. At present, FuzzFlesh supports testing decompilation of Java bytecode, .NET assembly and x86 machine code. In addition to program generation, FuzzFlesh also includes an automated test-case reducer that operates on the CFG rather than the low-level program, which means that it can be applied to any of the target languages. We present a large experimental campaign applying FuzzFlesh to a variety of decompilers, leading to the discovery of 12 previously-unknown bugs across two language formats, six of which have been fixed. We present experiments comparing our generic FuzzFlesh tool to two state-of-the-art decompiler testing tools targeted at specific languages. As expected, the coverage our generic FuzzFlesh tool achieves on a given decompiler is lower than the coverage achieved by a tool specifically designed for the input format of that decompiler. However, due to its focus on control flow, FuzzFlesh is able to cover sections of control flow recovery code that the targeted tools cannot reach, and identify control flow related bugs that the targeted tools miss.

Cite as

Amber Gorzynski and Alastair F. Donaldson. FuzzFlesh: Randomised Testing of Decompilers via Control Flow Graph-Based Program Generation. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 13:1-13:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{gorzynski_et_al:LIPIcs.ECOOP.2025.13,
  author =	{Gorzynski, Amber and Donaldson, Alastair F.},
  title =	{{FuzzFlesh: Randomised Testing of Decompilers via Control Flow Graph-Based Program Generation}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{13:1--13:26},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.13},
  URN =		{urn:nbn:de:0030-drops-233062},
  doi =		{10.4230/LIPIcs.ECOOP.2025.13},
  annote =	{Keywords: Decompiler, Reverse Engineering, Control Flow, Software Testing, Fuzzing}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2025.6

Detecting Functionality-Specific Vulnerabilities via Retrieving Individual Functionality-Equivalent APIs in Open-Source Repositories

Authors: Tianyu Chen, Zeyu Wang, Lin Li, Ding Li, Zongyang Li, Xiaoning Chang, Pan Bian, Guangtai Liang, Qianxiang Wang, and Tao Xie

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)

Abstract

Functionality-specific vulnerabilities, which mainly occur in Application Programming Interfaces (APIs) with specific functionalities, are crucial for software developers to detect and avoid. When detecting individual functionality-specific vulnerabilities, the existing two categories of approaches are ineffective because they consider only the API bodies and are unable to handle diverse implementations of functionality-equivalent APIs. To effectively detect functionality-specific vulnerabilities, we propose APISS, the first approach to utilize API doc strings and signatures instead of API bodies. APISS first retrieves functionality-equivalent APIs for APIs with existing vulnerabilities and then migrates Proof-of-Concepts (PoCs) of the existing vulnerabilities for newly detected vulnerable APIs. To retrieve functionality-equivalent APIs, we leverage a Large Language Model for API embedding to improve the accuracy and address the effectiveness and scalability issues suffered by the existing approaches. To migrate PoCs of the existing vulnerabilities for newly detected vulnerable APIs, we design a semi-automatic schema to substantially reduce manual costs. We conduct a comprehensive evaluation to empirically compare APISS with four state-of-the-art approaches of detecting vulnerabilities and two state-of-the-art approaches of retrieving functionality-equivalent APIs. The evaluation subjects include 180 widely used Java repositories using 10 existing vulnerabilities, along with their PoCs. The results show that APISS effectively retrieves functionality-equivalent APIs, achieving a Top-1 Accuracy of 0.81 while the best of the baselines under comparison achieves only 0.55. APISS is highly efficient: the manual costs are within 10 minutes per vulnerability and the end-to-end runtime overhead of testing one candidate API is less than 2 hours. APISS detects 179 new vulnerabilities and receives 60 new CVE IDs, bringing high value to security practice.

Cite as

Tianyu Chen, Zeyu Wang, Lin Li, Ding Li, Zongyang Li, Xiaoning Chang, Pan Bian, Guangtai Liang, Qianxiang Wang, and Tao Xie. Detecting Functionality-Specific Vulnerabilities via Retrieving Individual Functionality-Equivalent APIs in Open-Source Repositories. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 6:1-6:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ECOOP.2025.6,
  author =	{Chen, Tianyu and Wang, Zeyu and Li, Lin and Li, Ding and Li, Zongyang and Chang, Xiaoning and Bian, Pan and Liang, Guangtai and Wang, Qianxiang and Xie, Tao},
  title =	{{Detecting Functionality-Specific Vulnerabilities via Retrieving Individual Functionality-Equivalent APIs in Open-Source Repositories}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{6:1--6:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.6},
  URN =		{urn:nbn:de:0030-drops-232999},
  doi =		{10.4230/LIPIcs.ECOOP.2025.6},
  annote =	{Keywords: Application Security, Vulnerability Detection, Large Language Model}
}

@InProceedings{chen_et_al:LIPIcs.ECOOP.2025.6,
  author =	{Chen, Tianyu and Wang, Zeyu and Li, Lin and Li, Ding and Li, Zongyang and Chang, Xiaoning and Bian, Pan and Liang, Guangtai and Wang, Qianxiang and Xie, Tao},
  title =	{{Detecting Functionality-Specific Vulnerabilities via Retrieving Individual Functionality-Equivalent APIs in Open-Source Repositories}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{6:1--6:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.6},
  URN =		{urn:nbn:de:0030-drops-232999},
  doi =		{10.4230/LIPIcs.ECOOP.2025.6},
  annote =	{Keywords: Application Security, Vulnerability Detection, Large Language Model}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2025.58

Tracking the Persistence of Harmonic Chains: Barcode and Stability

Authors: Tao Hou, Salman Parsa, and Bei Wang

Published in: LIPIcs, Volume 332, 41st International Symposium on Computational Geometry (SoCG 2025)

Abstract

The persistence barcode is a topological descriptor of data that plays a fundamental role in topological data analysis. Given a filtration of data, the persistence barcode tracks the evolution of its homology groups. In this paper, we introduce a new type of barcode, called the harmonic chain barcode, which tracks the evolution of harmonic chains. In addition, we show that the harmonic chain barcode is stable. Given a filtration of a simplicial complex of size m, we present an algorithm to compute its harmonic chain barcode in O(m³) time. Consequently, the harmonic chain barcode can enrich the family of topological descriptors in applications where a persistence barcode is applicable, such as feature vectorization and machine learning.

Cite as

Tao Hou, Salman Parsa, and Bei Wang. Tracking the Persistence of Harmonic Chains: Barcode and Stability. In 41st International Symposium on Computational Geometry (SoCG 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 332, pp. 58:1-58:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{hou_et_al:LIPIcs.SoCG.2025.58,
  author =	{Hou, Tao and Parsa, Salman and Wang, Bei},
  title =	{{Tracking the Persistence of Harmonic Chains: Barcode and Stability}},
  booktitle =	{41st International Symposium on Computational Geometry (SoCG 2025)},
  pages =	{58:1--58:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-370-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{332},
  editor =	{Aichholzer, Oswin and Wang, Haitao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2025.58},
  URN =		{urn:nbn:de:0030-drops-232100},
  doi =		{10.4230/LIPIcs.SoCG.2025.58},
  annote =	{Keywords: Persistent homology, harmonic chains, topological data analysis}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2025.40

Apex Representatives

Authors: Tamal K. Dey, Tao Hou, and Dmitriy Morozov

Published in: LIPIcs, Volume 332, 41st International Symposium on Computational Geometry (SoCG 2025)

Abstract

Given a zigzag filtration, we want to find its barcode representatives, i.e., a compatible choice of bases for the homology groups that diagonalize the linear maps in the zigzag. To achieve this, we convert the input zigzag to a levelset zigzag of a real-valued function. This function generates a Mayer-Vietoris pyramid of spaces, which generates an infinite strip of homology groups. We call the origins of indecomposable (diamond) summands of this strip their apexes and give an algorithm to find representative cycles in these apexes from ordinary persistence computation. The resulting representatives map back to the levelset zigzag and thus yield barcode representatives for the input zigzag. Our algorithm for lifting a p-dimensional cycle from ordinary persistence to an apex representative takes O(p ⋅ m log m) time. From this we can recover zigzag representatives in time O(log m + C), where C is the size of the output.

Cite as

Tamal K. Dey, Tao Hou, and Dmitriy Morozov. Apex Representatives. In 41st International Symposium on Computational Geometry (SoCG 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 332, pp. 40:1-40:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{dey_et_al:LIPIcs.SoCG.2025.40,
  author =	{Dey, Tamal K. and Hou, Tao and Morozov, Dmitriy},
  title =	{{Apex Representatives}},
  booktitle =	{41st International Symposium on Computational Geometry (SoCG 2025)},
  pages =	{40:1--40:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-370-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{332},
  editor =	{Aichholzer, Oswin and Wang, Haitao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2025.40},
  URN =		{urn:nbn:de:0030-drops-231927},
  doi =		{10.4230/LIPIcs.SoCG.2025.40},
  annote =	{Keywords: zigzag persistent homology, Mayer-Vietoris pyramid, cycle representatives}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2025.68

Persistent (Co)Homology in Matrix Multiplication Time

Authors: Dmitriy Morozov and Primoz Skraba

Published in: LIPIcs, Volume 332, 41st International Symposium on Computational Geometry (SoCG 2025)

Abstract

Most algorithms for computing persistent homology do so by tracking cycles that represent homology classes. There are many choices of such cycles, and specific choices have found different uses in applications. Although it is known that persistence diagrams can be computed in matrix multiplication time for the more general case of zigzag persistent homology [Milosavljević et al., 2011], it is not clear how to extract cycle representatives, especially if specific representatives are desired. In this paper, we provide the same matrix multiplication bound for computing representatives for the two choices common in applications in the case of ordinary persistent (co)homology. We first provide a fast version of the reduction algorithm, which is simpler than the algorithm in [Milosavljević et al., 2011], but returns a different set of representatives than the standard algorithm [Edelsbrunner et al., 2002]. We then give a fast version of a variant called the row algorithm [De Silva et al., 2011], which returns the same representatives as the standard algorithm.

Cite as

Dmitriy Morozov and Primoz Skraba. Persistent (Co)Homology in Matrix Multiplication Time. In 41st International Symposium on Computational Geometry (SoCG 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 332, pp. 68:1-68:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{morozov_et_al:LIPIcs.SoCG.2025.68,
  author =	{Morozov, Dmitriy and Skraba, Primoz},
  title =	{{Persistent (Co)Homology in Matrix Multiplication Time}},
  booktitle =	{41st International Symposium on Computational Geometry (SoCG 2025)},
  pages =	{68:1--68:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-370-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{332},
  editor =	{Aichholzer, Oswin and Wang, Haitao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2025.68},
  URN =		{urn:nbn:de:0030-drops-232204},
  doi =		{10.4230/LIPIcs.SoCG.2025.68},
  annote =	{Keywords: persistent homology, matrix multiplication, cycle representatives}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2024.49

Computing Zigzag Vineyard Efficiently Including Expansions and Contractions

Authors: Tamal K. Dey and Tao Hou

Published in: LIPIcs, Volume 293, 40th International Symposium on Computational Geometry (SoCG 2024)

Abstract

Vines and vineyard connecting a stack of persistence diagrams have been introduced in the non-zigzag setting by Cohen-Steiner et al. [Cohen-Steiner et al., 2006]. We consider computing these vines over changing filtrations for zigzag persistence while incorporating two more operations: expansions and contractions in addition to the transpositions considered in the non-zigzag setting. Although expansions and contractions can be implemented in quadratic time in the non-zigzag case by utilizing the linear-time transpositions, it is not obvious how they can be carried out under the zigzag framework with the same complexity. While transpositions alone can be easily conducted in linear time using the recent FastZigzag algorithm [Tamal K. Dey and Tao Hou, 2022], expansions and contractions pose difficulty in breaking the barrier of cubic complexity [Dey and Hou, 2022]. Our main result is that, the half-way constructed up-down filtration in the FastZigzag algorithm indeed can be used to achieve linear time complexity for transpositions and quadratic time complexity for expansions and contractions, matching the time complexity of all corresponding operations in the non-zigzag case.

Cite as

Tamal K. Dey and Tao Hou. Computing Zigzag Vineyard Efficiently Including Expansions and Contractions. In 40th International Symposium on Computational Geometry (SoCG 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 293, pp. 49:1-49:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{dey_et_al:LIPIcs.SoCG.2024.49,
  author =	{Dey, Tamal K. and Hou, Tao},
  title =	{{Computing Zigzag Vineyard Efficiently Including Expansions and Contractions}},
  booktitle =	{40th International Symposium on Computational Geometry (SoCG 2024)},
  pages =	{49:1--49:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-316-4},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{293},
  editor =	{Mulzer, Wolfgang and Phillips, Jeff M.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2024.49},
  URN =		{urn:nbn:de:0030-drops-199942},
  doi =		{10.4230/LIPIcs.SoCG.2024.49},
  annote =	{Keywords: zigzag persistence, vines and vineyard, update operations}
}

Document

Survey

DOI: 10.4230/TGDK.2.1.3

Semantic Web: Past, Present, and Future

Authors: Ansgar Scherp, Gerd Groener, Petr Škoda, Katja Hose, and Maria-Esther Vidal

Published in: TGDK, Volume 2, Issue 1 (2024): Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge, Volume 2, Issue 1

Abstract

Ever since the vision was formulated, the Semantic Web has inspired many generations of innovations. Semantic technologies have been used to share vast amounts of information on the Web, enhance them with semantics to give them meaning, and enable inference and reasoning on them. Throughout the years, semantic technologies, and in particular knowledge graphs, have been used in search engines, data integration, enterprise settings, and machine learning. In this paper, we recap the classical concepts and foundations of the Semantic Web as well as modern and recent concepts and applications, building upon these foundations. The classical topics we cover include knowledge representation, creating and validating knowledge on the Web, reasoning and linking, and distributed querying. We enhance this classical view of the so-called "Semantic Web Layer Cake" with an update of recent concepts that include provenance, security and trust, as well as a discussion of practical impacts from industry-led contributions. We conclude with an outlook on the future directions of the Semantic Web. This is a living document. If you like to contribute, please contact the first author and visit: https://github.com/ascherp/semantic-web-primer

Cite as

Ansgar Scherp, Gerd Groener, Petr Škoda, Katja Hose, and Maria-Esther Vidal. Semantic Web: Past, Present, and Future. In Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 1, pp. 3:1-3:37, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@Article{scherp_et_al:TGDK.2.1.3,
  author =	{Scherp, Ansgar and Groener, Gerd and \v{S}koda, Petr and Hose, Katja and Vidal, Maria-Esther},
  title =	{{Semantic Web: Past, Present, and Future}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{3:1--3:37},
  ISSN =	{2942-7517},
  year =	{2024},
  volume =	{2},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.2.1.3},
  URN =		{urn:nbn:de:0030-drops-198607},
  doi =		{10.4230/TGDK.2.1.3},
  annote =	{Keywords: Linked Open Data, Semantic Web Graphs, Knowledge Graphs}
}

Document

Vision

DOI: 10.4230/TGDK.1.1.10

Multilingual Knowledge Graphs and Low-Resource Languages: A Review

Authors: Lucie-Aimée Kaffee, Russa Biswas, C. Maria Keet, Edlira Kalemi Vakaj, and Gerard de Melo

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

There is a lack of multilingual data to support applications in a large number of languages, especially for low-resource languages. Knowledge graphs (KG) could contribute to closing the gap of language support by providing easily accessible, machine-readable, multilingual linked data, which can be reused across applications. In this paper, we provide an overview of work in the domain of multilingual KGs with a focus on low-resource languages. We review the current state of multilingual KGs along with the different aspects that are crucial for creating KGs with language coverage in mind. Special consideration is given to challenges particular to low-resource languages in KGs. We further provide an overview of applications that yield multilingual KG information as well as downstream applications reusing such multilingual data. Finally, we explore open problems regarding multilingual KGs with a focus on low-resource languages.

Cite as

Lucie-Aimée Kaffee, Russa Biswas, C. Maria Keet, Edlira Kalemi Vakaj, and Gerard de Melo. Multilingual Knowledge Graphs and Low-Resource Languages: A Review. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 10:1-10:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{kaffee_et_al:TGDK.1.1.10,
  author =	{Kaffee, Lucie-Aim\'{e}e and Biswas, Russa and Keet, C. Maria and Vakaj, Edlira Kalemi and de Melo, Gerard},
  title =	{{Multilingual Knowledge Graphs and Low-Resource Languages: A Review}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{10:1--10:19},
  ISSN =	{2942-7517},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.10},
  URN =		{urn:nbn:de:0030-drops-194845},
  doi =		{10.4230/TGDK.1.1.10},
  annote =	{Keywords: knowledge graphs, multilingual, low-resource languages, review}
}

Document

Vision

DOI: 10.4230/TGDK.1.1.8

Machine Learning and Knowledge Graphs: Existing Gaps and Future Research Challenges

Authors: Claudia d'Amato, Louis Mahon, Pierre Monnin, and Giorgos Stamou

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

The graph model is nowadays largely adopted to model a wide range of knowledge and data, spanning from social networks to knowledge graphs (KGs), representing a successful paradigm of how symbolic and transparent AI can scale on the World Wide Web. However, due to their unprecedented volume, they are generally tackled by Machine Learning (ML) and mostly numeric based methods such as graph embedding models (KGE) and deep neural networks (DNNs). The latter methods have been proved lately very efficient, leading the current AI spring. In this vision paper, we introduce some of the main existing methods for combining KGs and ML, divided into two categories: those using ML to improve KGs, and those using KGs to improve results on ML tasks. From this introduction, we highlight research gaps and perspectives that we deem promising and currently under-explored for the involved research communities, spanning from KG support for LLM prompting, integration of KG semantics in ML models to symbol-based methods, interpretability of ML models, and the need for improved benchmark datasets. In our opinion, such perspectives are stepping stones in an ultimate view of KGs as central assets for neuro-symbolic and explainable AI.

Cite as

Claudia d'Amato, Louis Mahon, Pierre Monnin, and Giorgos Stamou. Machine Learning and Knowledge Graphs: Existing Gaps and Future Research Challenges. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 8:1-8:35, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{damato_et_al:TGDK.1.1.8,
  author =	{d'Amato, Claudia and Mahon, Louis and Monnin, Pierre and Stamou, Giorgos},
  title =	{{Machine Learning and Knowledge Graphs: Existing Gaps and Future Research Challenges}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{8:1--8:35},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.8},
  URN =		{urn:nbn:de:0030-drops-194824},
  doi =		{10.4230/TGDK.1.1.8},
  annote =	{Keywords: Graph-based Learning, Knowledge Graph Embeddings, Large Language Models, Explainable AI, Knowledge Graph Completion \& Curation}
}

19 Search Results for "Hou, Tao"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message