Using Machine Learning for Vulnerability Detection and Classification

Baptista, Tiago; Oliveira, Nuno; Henriques, Pedro Rangel

doi:10.4230/OASIcs.SLATE.2021.14

File

PDF

OASIcs.SLATE.2021.14.pdf

Filesize: 1.2 MB
14 pages

Document Identifiers

DOI: 10.4230/OASIcs.SLATE.2021.14
URN: urn:nbn:de:0030-drops-144315

Subject Classification

ACM Subject Classification

Security and privacy → Vulnerability scanners
Computing methodologies → Machine learning

Keywords

Vulnerability Detection
Source Code Analysis
Machine Learning

Abstract

The work described in this paper aims to develop a machine-learning-based tool that automatically identifies vulnerabilities in programs (high-level source code) using an abstract syntax tree representation. It is based on FastScan and uses the code2seq approach. FastScan is a recently developed system capable of detecting vulnerabilities in source code using machine learning techniques. However, FastScan is not able to identify the vulnerability type. The main goal of the presented work is to go further and develop a method to identify specific types of vulnerabilities. As will be shown, this goal is achieved by optimizing the model’s hyperparameters, changing the method of preprocessing the input data, and developing an architecture that brings together multiple models to predict different specific vulnerabilities. The preliminary results obtained from the training stage are very promising. According to the performed tests, the best F1 score obtained is 93%, with a precision of 90% and an accuracy of 85%, for a model trained to predict vulnerabilities of the injection type.
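
The idea of bringing together multiple models, each predicting one specific vulnerability type, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The detector functions, the "xss" label, and the 0.5 threshold are hypothetical stand-ins chosen for illustration, and simple keyword checks take the place of trained code2seq-style models.

```python
from typing import Callable, Dict, List

# A detector maps a source snippet to a score in [0, 1] for ONE vulnerability
# type. In a real system each detector would be a separately trained model;
# the keyword checks below are hypothetical placeholders, not real models.
Detector = Callable[[str], float]


def injection_detector(code: str) -> float:
    # Placeholder heuristic: string concatenation passed into a query call.
    return 0.9 if "executeQuery(" in code and "+" in code else 0.1


def xss_detector(code: str) -> float:
    # Placeholder heuristic: request parameter echoed straight to the output.
    return 0.8 if "getParameter(" in code and "println(" in code else 0.05


def classify(code: str, detectors: Dict[str, Detector],
             threshold: float = 0.5) -> List[str]:
    """Run every per-type detector and return the types scoring at or above the threshold."""
    return sorted(name for name, detect in detectors.items()
                  if detect(code) >= threshold)


DETECTORS: Dict[str, Detector] = {
    "injection": injection_detector,
    "xss": xss_detector,
}

if __name__ == "__main__":
    snippet = 'stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);'
    print(classify(snippet, DETECTORS))
```

Keeping one binary detector per type means a new vulnerability type can be added by registering another entry in the dictionary, without retraining the existing detectors. Whether the paper combines its models in exactly this way is an assumption.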

Cite As

Tiago Baptista, Nuno Oliveira, and Pedro Rangel Henriques. Using Machine Learning for Vulnerability Detection and Classification. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 14:1-14:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/OASIcs.SLATE.2021.14

Author Details

Tiago Baptista

Centro Algoritmi, Departamento de Informática, University of Minho, Braga, Portugal

Nuno Oliveira

Checkmarx, Braga, Portugal

Pedro Rangel Henriques

Centro Algoritmi, Departamento de Informática, University of Minho, Braga, Portugal

Acknowledgements

Special thanks to Search-ON2: Revitalization of HPC infrastructure of UMinho, (NORTE-07-0162-FEDER-000086), co-funded by the North Portugal Regional Operational Programme (ON.2-O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF).

References

Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. code2seq: Generating sequences from structured representations of code. arXiv preprint, 2018. URL: http://arxiv.org/abs/1808.01400.
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. code2vec: Learning distributed representations of code. Proceedings of the ACM on Programming Languages, 3(POPL):1-29, 2019.
Philip K Chan and Richard P Lippmann. Machine learning for computer security. Journal of Machine Learning Research, 7(Dec):2669-2672, 2006.
Brian Chess and Gary McGraw. Static analysis for security. IEEE security & privacy, 2(6):76-79, 2004.
Brian Chess and Jacob West. Secure programming with static analysis. Pearson Education, 2007.
Crispin Cowan, Calton Pu, Dave Maier, Jonathan Walpole, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle, Qian Zhang, and Heather Hinton. Stackguard: Automatic adaptive detection and prevention of buffer-overflow attacks. In USENIX security symposium, volume 98, pages 63-78. San Antonio, TX, 1998.
Mark Dowd, John McDonald, and Justin Schuh. The art of software security assessment: Identifying and preventing software vulnerabilities. Pearson Education, 2006.
Wes Felter, Alexandre Ferreira, Ram Rajamony, and Juan Rubio. An updated performance comparison of virtual machines and linux containers. In 2015 IEEE international symposium on performance analysis of systems and software (ISPASS), pages 171-172. IEEE, 2015.
Samuel Gonçalves Ferreira. Vulnerabilities fast scan - tackling sast performance issues with machine learning. Master’s thesis, University of Minho, 2019.
Rahma Mahmood and Qusay H Mahmoud. Evaluation of static analysis tools for finding vulnerabilities in Java and C/C++ source code. arXiv preprint, 2018. URL: http://arxiv.org/abs/1805.09040.
Marco Pistoia, Satish Chandra, Stephen J Fink, and Eran Yahav. A survey of static analysis methods for identifying security vulnerabilities in software systems. IBM Systems Journal, 46(2):265-288, 2007.
Robert W. Shirey. Internet security glossary, version 2. RFC, 4949:1-365, 2007. URL: https://doi.org/10.17487/RFC4949.
Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. arXiv preprint, 2012. URL: http://arxiv.org/abs/1206.2944.
