Search Results

Documents authored by Klauke, Jonas


Document
Java Bytecode Normalization for Code Similarity Analysis

Authors: Stefan Schott, Serena Elisa Ponta, Wolfram Fischer, Jonas Klauke, and Eric Bodden

Published in: LIPIcs, Volume 313, 38th European Conference on Object-Oriented Programming (ECOOP 2024)


Abstract
Analyzing the similarity of two code fragments has many applications, including code clone, vulnerability and plagiarism detection. Most existing approaches for similarity analysis work on source code. However, in scenarios like plagiarism detection, copyright violation detection or Software Bill of Materials creation source code is often not available and thus similarity analysis has to be performed on binary formats. Java bytecode is a binary format executable by the Java Virtual Machine and obtained from the compilation of Java source code. Performing similarity detection on bytecode is challenging because different compilers can compile the same source code to syntactically vastly different bytecode. In this work we assess to what extent one can nonetheless enable similarity detection by bytecode normalization, a procedure to transform Java bytecode into a representation that is identical for the same original source code, irrespective of the Java compiler and Java version used during compilation. Our manual study revealed 16 classes of compilation differences that various compilation environments may induce. Based on these findings, we implemented bytecode normalization in a tool jNorm. It uses Jimple as intermediate representation, applies common code optimizations and transforms all classes of compilation difference to a normalized form, thus achieving a representation of the bytecode that is identical despite different compilation environments. Our evaluation, performed on more than 300 popular Java projects, shows that solely the act of incrementing a compiler version may cause differences in 46% of all resulting bytecode files. By applying bytecode normalization, one can remove more than 99% of these differences, thus acting as a crucial enabler for subsequent applications of bytecode similarity analysis.

Cite as

Stefan Schott, Serena Elisa Ponta, Wolfram Fischer, Jonas Klauke, and Eric Bodden. Java Bytecode Normalization for Code Similarity Analysis. In 38th European Conference on Object-Oriented Programming (ECOOP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 313, pp. 37:1-37:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{schott_et_al:LIPIcs.ECOOP.2024.37,
  author =	{Schott, Stefan and Ponta, Serena Elisa and Fischer, Wolfram and Klauke, Jonas and Bodden, Eric},
  title =	{{Java Bytecode Normalization for Code Similarity Analysis}},
  booktitle =	{38th European Conference on Object-Oriented Programming (ECOOP 2024)},
  pages =	{37:1--37:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-341-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{313},
  editor =	{Aldrich, Jonathan and Salvaneschi, Guido},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.37},
  URN =		{urn:nbn:de:0030-drops-208865},
  doi =		{10.4230/LIPIcs.ECOOP.2024.37},
  annote =	{Keywords: Bytecode, Java Compiler, Code Similarity Analysis}
}
Document
Artifact
Java Bytecode Normalization for Code Similarity Analysis (Artifact)

Authors: Stefan Schott, Serena Elisa Ponta, Wolfram Fischer, Jonas Klauke, and Eric Bodden

Published in: DARTS, Volume 10, Issue 2, Special Issue of the 38th European Conference on Object-Oriented Programming (ECOOP 2024)


Abstract
This artifact supports the claim that different Java compilation environments can produce significantly different bytecode and that bytecode normalization applied via our tool jNorm heavily decreases the amount of compilation differences and helps boost the performance of subsequent code similarity analysis. Our artifact provides the source code of the tool jNorm and all scripts needed to reproduce the results we presented in our evaluation. Furthermore, it contains a study on the usage of different Java compilers and target levels within popular open-source projects, which showcases that the JDK compiler is by far the most relevant compiler in practice.

Cite as

Stefan Schott, Serena Elisa Ponta, Wolfram Fischer, Jonas Klauke, and Eric Bodden. Java Bytecode Normalization for Code Similarity Analysis (Artifact). In Special Issue of the 38th European Conference on Object-Oriented Programming (ECOOP 2024). Dagstuhl Artifacts Series (DARTS), Volume 10, Issue 2, pp. 20:1-20:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@Article{schott_et_al:DARTS.10.2.20,
  author =	{Schott, Stefan and Ponta, Serena Elisa and Fischer, Wolfram and Klauke, Jonas and Bodden, Eric},
  title =	{{Java Bytecode Normalization for Code Similarity Analysis (Artifact)}},
  pages =	{20:1--20:3},
  journal =	{Dagstuhl Artifacts Series},
  ISBN =	{978-3-95977-342-3},
  ISSN =	{2509-8195},
  year =	{2024},
  volume =	{10},
  number =	{2},
  editor =	{Schott, Stefan and Ponta, Serena Elisa and Fischer, Wolfram and Klauke, Jonas and Bodden, Eric},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DARTS.10.2.20},
  URN =		{urn:nbn:de:0030-drops-209189},
  doi =		{10.4230/DARTS.10.2.20},
  annote =	{Keywords: Bytecode, Java Compiler, Code Similarity Analysis}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail