DROPS

Document

DOI: 10.4230/OASIcs.ICPEC.2025.4

Can Open Large Language Models Catch Vulnerabilities?

Authors: Diogo Gaspar Lopes, Tiago Espinha Gasiba, Sathwik Amburi, and Maria Pinto-Albuquerque

Published in: OASIcs, Volume 133, 6th International Computer Programming Education Conference (ICPEC 2025)

Abstract

As Large Language Models (LLMs) become increasingly integrated into secure software development workflows, a critical question remains unanswered: can these models not only detect insecure code but also reliably classify vulnerabilities according to standardized taxonomies? In this work, we conduct a systematic evaluation of three state-of-the-art LLMs - Llama3, Codestral, and Deepseek R1 - using a carefully filtered subset of the Big-Vul dataset annotated with eight representative Common Weakness Enumeration categories. Adopting a closed-world classification setup, we assess each model’s performance in both identifying the presence of vulnerabilities and mapping them to the correct CWE label. Our findings reveal a sharp contrast between high detection rates and markedly poor classification accuracy, with frequent overgeneralization and misclassification. Moreover, we analyze model-specific biases and common failure modes, shedding light on the limitations of current LLMs in performing fine-grained security reasoning.These insights are especially relevant in educational contexts, where LLMs are being adopted as learning aids despite their limitations. A nuanced understanding of their behaviour is essential to prevent the propagation of misconceptions among students. Our results expose key challenges that must be addressed before LLMs can be reliably deployed in security-sensitive environments.

Cite as

Diogo Gaspar Lopes, Tiago Espinha Gasiba, Sathwik Amburi, and Maria Pinto-Albuquerque. Can Open Large Language Models Catch Vulnerabilities?. In 6th International Computer Programming Education Conference (ICPEC 2025). Open Access Series in Informatics (OASIcs), Volume 133, pp. 4:1-4:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{gasparlopes_et_al:OASIcs.ICPEC.2025.4,
  author =	{Gaspar Lopes, Diogo and Espinha Gasiba, Tiago and Amburi, Sathwik and Pinto-Albuquerque, Maria},
  title =	{{Can Open Large Language Models Catch Vulnerabilities?}},
  booktitle =	{6th International Computer Programming Education Conference (ICPEC 2025)},
  pages =	{4:1--4:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-393-5},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{133},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Portela, Filipe and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.ICPEC.2025.4},
  URN =		{urn:nbn:de:0030-drops-240340},
  doi =		{10.4230/OASIcs.ICPEC.2025.4},
  annote =	{Keywords: Large Language Models (LLMs), Secure Coding, CWE Classification, Machine Learning, Software Vulnerability Detection, Artificial Intelligence, Code Analysis, Big-Vul Dataset}
}

@InProceedings{gasparlopes_et_al:OASIcs.ICPEC.2025.4,
  author =	{Gaspar Lopes, Diogo and Espinha Gasiba, Tiago and Amburi, Sathwik and Pinto-Albuquerque, Maria},
  title =	{{Can Open Large Language Models Catch Vulnerabilities?}},
  booktitle =	{6th International Computer Programming Education Conference (ICPEC 2025)},
  pages =	{4:1--4:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-393-5},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{133},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Portela, Filipe and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.ICPEC.2025.4},
  URN =		{urn:nbn:de:0030-drops-240340},
  doi =		{10.4230/OASIcs.ICPEC.2025.4},
  annote =	{Keywords: Large Language Models (LLMs), Secure Coding, CWE Classification, Machine Learning, Software Vulnerability Detection, Artificial Intelligence, Code Analysis, Big-Vul Dataset}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2025.6

Detecting Functionality-Specific Vulnerabilities via Retrieving Individual Functionality-Equivalent APIs in Open-Source Repositories

Authors: Tianyu Chen, Zeyu Wang, Lin Li, Ding Li, Zongyang Li, Xiaoning Chang, Pan Bian, Guangtai Liang, Qianxiang Wang, and Tao Xie

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)

Abstract

Functionality-specific vulnerabilities, which mainly occur in Application Programming Interfaces (APIs) with specific functionalities, are crucial for software developers to detect and avoid. When detecting individual functionality-specific vulnerabilities, the existing two categories of approaches are ineffective because they consider only the API bodies and are unable to handle diverse implementations of functionality-equivalent APIs. To effectively detect functionality-specific vulnerabilities, we propose APISS, the first approach to utilize API doc strings and signatures instead of API bodies. APISS first retrieves functionality-equivalent APIs for APIs with existing vulnerabilities and then migrates Proof-of-Concepts (PoCs) of the existing vulnerabilities for newly detected vulnerable APIs. To retrieve functionality-equivalent APIs, we leverage a Large Language Model for API embedding to improve the accuracy and address the effectiveness and scalability issues suffered by the existing approaches. To migrate PoCs of the existing vulnerabilities for newly detected vulnerable APIs, we design a semi-automatic schema to substantially reduce manual costs. We conduct a comprehensive evaluation to empirically compare APISS with four state-of-the-art approaches of detecting vulnerabilities and two state-of-the-art approaches of retrieving functionality-equivalent APIs. The evaluation subjects include 180 widely used Java repositories using 10 existing vulnerabilities, along with their PoCs. The results show that APISS effectively retrieves functionality-equivalent APIs, achieving a Top-1 Accuracy of 0.81 while the best of the baselines under comparison achieves only 0.55. APISS is highly efficient: the manual costs are within 10 minutes per vulnerability and the end-to-end runtime overhead of testing one candidate API is less than 2 hours. APISS detects 179 new vulnerabilities and receives 60 new CVE IDs, bringing high value to security practice.

Cite as

Tianyu Chen, Zeyu Wang, Lin Li, Ding Li, Zongyang Li, Xiaoning Chang, Pan Bian, Guangtai Liang, Qianxiang Wang, and Tao Xie. Detecting Functionality-Specific Vulnerabilities via Retrieving Individual Functionality-Equivalent APIs in Open-Source Repositories. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 6:1-6:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ECOOP.2025.6,
  author =	{Chen, Tianyu and Wang, Zeyu and Li, Lin and Li, Ding and Li, Zongyang and Chang, Xiaoning and Bian, Pan and Liang, Guangtai and Wang, Qianxiang and Xie, Tao},
  title =	{{Detecting Functionality-Specific Vulnerabilities via Retrieving Individual Functionality-Equivalent APIs in Open-Source Repositories}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{6:1--6:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.6},
  URN =		{urn:nbn:de:0030-drops-232999},
  doi =		{10.4230/LIPIcs.ECOOP.2025.6},
  annote =	{Keywords: Application Security, Vulnerability Detection, Large Language Model}
}

@InProceedings{chen_et_al:LIPIcs.ECOOP.2025.6,
  author =	{Chen, Tianyu and Wang, Zeyu and Li, Lin and Li, Ding and Li, Zongyang and Chang, Xiaoning and Bian, Pan and Liang, Guangtai and Wang, Qianxiang and Xie, Tao},
  title =	{{Detecting Functionality-Specific Vulnerabilities via Retrieving Individual Functionality-Equivalent APIs in Open-Source Repositories}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{6:1--6:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.6},
  URN =		{urn:nbn:de:0030-drops-232999},
  doi =		{10.4230/LIPIcs.ECOOP.2025.6},
  annote =	{Keywords: Application Security, Vulnerability Detection, Large Language Model}
}

Document

Track B: Automata, Logic, Semantics, and Theory of Programming

DOI: 10.4230/LIPIcs.ICALP.2019.103

Monadic Decomposability of Regular Relations (Track B: Automata, Logic, Semantics, and Theory of Programming)

Authors: Pablo Barceló, Chih-Duo Hong, Xuan-Bach Le, Anthony W. Lin, and Reino Niskanen

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)

Abstract

Monadic decomposibility - the ability to determine whether a formula in a given logical theory can be decomposed into a boolean combination of monadic formulas - is a powerful tool for devising a decision procedure for a given logical theory. In this paper, we revisit a classical decision problem in automata theory: given a regular (a.k.a. synchronized rational) relation, determine whether it is recognizable, i.e., it has a monadic decomposition (that is, a representation as a boolean combination of cartesian products of regular languages). Regular relations are expressive formalisms which, using an appropriate string encoding, can capture relations definable in Presburger Arithmetic. In fact, their expressive power coincide with relations definable in a universal automatic structure; equivalently, those definable by finite set interpretations in WS1S (Weak Second Order Theory of One Successor). Determining whether a regular relation admits a recognizable relation was known to be decidable (and in exponential time for binary relations), but its precise complexity still hitherto remains open. Our main contribution is to fully settle the complexity of this decision problem by developing new techniques employing infinite Ramsey theory. The complexity for DFA (resp. NFA) representations of regular relations is shown to be NLOGSPACE-complete (resp. PSPACE-complete).

Cite as

Pablo Barceló, Chih-Duo Hong, Xuan-Bach Le, Anthony W. Lin, and Reino Niskanen. Monadic Decomposability of Regular Relations (Track B: Automata, Logic, Semantics, and Theory of Programming). In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 103:1-103:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{barcelo_et_al:LIPIcs.ICALP.2019.103,
  author =	{Barcel\'{o}, Pablo and Hong, Chih-Duo and Le, Xuan-Bach and Lin, Anthony W. and Niskanen, Reino},
  title =	{{Monadic Decomposability of Regular Relations}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{103:1--103:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.103},
  URN =		{urn:nbn:de:0030-drops-106790},
  doi =		{10.4230/LIPIcs.ICALP.2019.103},
  annote =	{Keywords: Transducers, Automata, Synchronized Rational Relations, Ramsey Theory, Variable Independence, Automatic Structures}
}

@InProceedings{barcelo_et_al:LIPIcs.ICALP.2019.103,
  author =	{Barcel\'{o}, Pablo and Hong, Chih-Duo and Le, Xuan-Bach and Lin, Anthony W. and Niskanen, Reino},
  title =	{{Monadic Decomposability of Regular Relations}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{103:1--103:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.103},
  URN =		{urn:nbn:de:0030-drops-106790},
  doi =		{10.4230/LIPIcs.ICALP.2019.103},
  annote =	{Keywords: Transducers, Automata, Synchronized Rational Relations, Ramsey Theory, Variable Independence, Automatic Structures}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2016.19

Decidability and Complexity of Tree Share Formulas

Authors: Xuan Bach Le, Aquinas Hobor, and Anthony W. Lin

Published in: LIPIcs, Volume 65, 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016)

Abstract

Fractional share models are used to reason about how multiple actors share ownership of resources. We examine the decidability and complexity of reasoning over the "tree share" model of Dockins et al. using first-order logic, or fragments thereof. We pinpoint a connection between the basic operations on trees union, intersection, and complement and countable atomless Boolean algebras, allowing us to obtain decidability with the precise complexity of both first-order and existential theories over the tree share model with the aforementioned operations. We establish a connection between the multiplication operation on trees and the theory of word equations, allowing us to derive the decidability of its existential theory and the undecidability of its full first-order theory. We prove that the full first-order theory over the model with both the Boolean operations and the restricted multiplication operation (with constants on the right hand side) is decidable via an embedding to tree-automatic structures.

Cite as

Xuan Bach Le, Aquinas Hobor, and Anthony W. Lin. Decidability and Complexity of Tree Share Formulas. In 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 65, pp. 19:1-19:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{le_et_al:LIPIcs.FSTTCS.2016.19,
  author =	{Le, Xuan Bach and Hobor, Aquinas and Lin, Anthony W.},
  title =	{{Decidability and Complexity of Tree Share Formulas}},
  booktitle =	{36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016)},
  pages =	{19:1--19:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-027-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{65},
  editor =	{Lal, Akash and Akshay, S. and Saurabh, Saket and Sen, Sandeep},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2016.19},
  URN =		{urn:nbn:de:0030-drops-68544},
  doi =		{10.4230/LIPIcs.FSTTCS.2016.19},
  annote =	{Keywords: Fractional Share Models, Resource Accounting, Countable Atomless Boolean Algebras, Word Equations, Tree Automatic Structures}
}

4 Search Results for "Le, Xuan-Bach"

Can Open Large Language Models Catch Vulnerabilities?

Abstract

Cite as

Detecting Functionality-Specific Vulnerabilities via Retrieving Individual Functionality-Equivalent APIs in Open-Source Repositories

Abstract

Cite as

Monadic Decomposability of Regular Relations (Track B: Automata, Logic, Semantics, and Theory of Programming)

Abstract

Cite as

Decidability and Complexity of Tree Share Formulas

Abstract

Cite as

Thanks for your feedback!

Could not send message