Dagstuhl Seminar Proceedings, Volume 6301

Document

06301 Abstracts Collection – Duplication, Redundancy, and Similarity in Software

Authors: Rainer Koschke, Andrew Walenstein, and Ettore Merlo

Abstract

From 23.07.06 to 26.07.06, the Dagstuhl Seminar 06301 "Duplication, Redundancy, and Similarity in Software" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Cite as

Rainer Koschke, Andrew Walenstein, and Ettore Merlo. 06301 Abstracts Collection – Duplication, Redundancy, and Similarity in Software. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{koschke_et_al:DagSemProc.06301.1,
  author =	{Koschke, Rainer and Walenstein, Andrew and Merlo, Ettore},
  title =	{{06301 Abstracts Collection – Duplication, Redundancy, and Similarity in Software}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--12},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.1},
  URN =		{urn:nbn:de:0030-drops-9729},
  doi =		{10.4230/DagSemProc.06301.1},
  annote =	{Keywords: Software clones, code redundancy, clone detection, redundancy removal, software refactoring, software}
}

Document

DOI: 10.4230/DagSemProc.06301.2

06301 Summary – Duplication, Redundancy, and Similarity in Software

Authors: Andrew Walenstein, Rainer Koschke, and Ettore Merlo

Abstract

This paper summarizes the proceedings and outcomes of the Dagstuhl Seminar 06301. The purpose of the seminar was to bring together a broad selection of experts on duplication, redundancy, and similarity in software in order to synthesize a comprehensive understanding of the topic area, appreciate the diversity in the topic, and critically evaluate current knowledge. The structure of the seminar was specifically formulated to evoke such a synthesis and evaluation. We report here the success of this seminar and summarize its results, most of which are records of the working groups charged with discussing the topics of interest.

Cite as

Andrew Walenstein, Rainer Koschke, and Ettore Merlo. 06301 Summary – Duplication, Redundancy, and Similarity in Software. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{walenstein_et_al:DagSemProc.06301.2,
  author =	{Walenstein, Andrew and Koschke, Rainer and Merlo, Ettore},
  title =	{{06301 Summary – Duplication, Redundancy, and Similarity in Software}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--8},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.2},
  URN =		{urn:nbn:de:0030-drops-9717},
  doi =		{10.4230/DagSemProc.06301.2},
  annote =	{Keywords: Duplication, redundancy, similarity, code clone, clone detector, refactor, code smells, software evolution, program development, visualization, softwa}
}

Document

DOI: 10.4230/DagSemProc.06301.3

06301 Working Session Summary: Presentation and Visualization of Redundant Code

Authors: Andrew Walenstein, James R. Cordy, William S. Evans, Ahmed Hassan, Toshihiro Kamiya, Cory Kapser, and Ettore Merlo

Abstract

This report summarizes the proceedings of a workshop discussion session on the presentation and visualization of aspects relating to duplicated, copied, or cloned code. The main outcomes of the working session were: (a) a realization that two researchers had independently generated very similar methods for browsing and visualizing clone "clusters," and (b) a list of questions for visualization, particularly in relation to how the "proximity" of clones may relate to interest in the clone.

Cite as

Andrew Walenstein, James R. Cordy, William S. Evans, Ahmed Hassan, Toshihiro Kamiya, Cory Kapser, and Ettore Merlo. 06301 Working Session Summary: Presentation and Visualization of Redundant Code. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{walenstein_et_al:DagSemProc.06301.3,
  author =	{Walenstein, Andrew and Cordy, James R. and Evans, William S. and Hassan, Ahmed and Kamiya, Toshihiro and Kapser, Cory and Merlo, Ettore},
  title =	{{06301 Working Session Summary: Presentation and Visualization of Redundant Code}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--5},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.3},
  URN =		{urn:nbn:de:0030-drops-9661},
  doi =		{10.4230/DagSemProc.06301.3},
  annote =	{Keywords: Code clone, clone visualization, presentation, software visualization}
}

Document

DOI: 10.4230/DagSemProc.06301.4

Allowing Overlapping Boundaries in Source Code using a Search Based Approach to Concept Binding

Authors: Kiarash Mahdavi, Nicolas Gold, Zheng Li, and Mark Harman

Abstract

One approach to supporting program comprehension involves binding concepts to source code. Previously proposed approaches to concept binding have enforced nonoverlapping boundaries. However, real-world programs may contain overlapping concepts. This paper presents techniques to allow boundary overlap in the binding of concepts to source code. In order to allow boundaries to overlap, the concept binding problem is reformulated as a search problem. It is shown that the search space of overlapping concept bindings is exponentially large, indicating the suitability of sampling-based search algorithms. Hill climbing and genetic algorithms are introduced for sampling the space. The paper reports on experiments that apply these algorithms to 21 COBOL II programs taken from the commercial financial services sector. The results show that the genetic algorithm produces significantly better solutions than both the hill climber and random search.
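The search formulation described in the abstract can be illustrated with a small hill climber. This is only a sketch under stated assumptions: the abstract does not give the paper's fitness function or move operators, so the integer scoring in `fitness` (reward indicator hits inside a segment, charge for segment length), the one-line boundary moves, and all names below are hypothetical. Each concept is bound to one `(start, end)` line range, and ranges of different concepts are allowed to overlap.

```python
def fitness(binding, hits, hit_weight=5, line_cost=2):
    """Hypothetical fitness: reward indicator hits inside each segment,
    charge for every line the segment covers."""
    score = 0
    for concept, (start, end) in binding.items():
        inside = sum(1 for line in hits[concept] if start <= line <= end)
        score += hit_weight * inside - line_cost * (end - start + 1)
    return score


def hill_climb(binding, hits, n_lines, max_steps=1000):
    """Steepest-ascent hill climbing over segment boundaries.

    binding maps concept -> (start, end); segments may overlap freely.
    """
    binding = dict(binding)
    for _ in range(max_steps):
        best, best_score = None, fitness(binding, hits)
        for concept, (start, end) in binding.items():
            # Neighbours: move one boundary of one segment by one line.
            for new in ((start - 1, end), (start + 1, end),
                        (start, end - 1), (start, end + 1)):
                if 0 <= new[0] <= new[1] < n_lines:
                    candidate = {**binding, concept: new}
                    score = fitness(candidate, hits)
                    if score > best_score:
                        best, best_score = candidate, score
        if best is None:
            return binding  # local optimum: no neighbour improves the score
        binding = best
    return binding
```

Starting from segments that span a whole ten-line program, with hypothetical indicator hits for `read` on lines 2 to 4 and for `validate` on lines 4 to 6, the climber shrinks both segments until they share line 4. A hill climber of this kind stops at the first local optimum it reaches, which is one reason a population-based search may find better solutions.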

Cite as

Kiarash Mahdavi, Nicolas Gold, Zheng Li, and Mark Harman. Allowing Overlapping Boundaries in Source Code using a Search Based Approach to Concept Binding. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{mahdavi_et_al:DagSemProc.06301.4,
  author =	{Mahdavi, Kiarash and Gold, Nicolas and Li, Zheng and Harman, Mark},
  title =	{{Allowing Overlapping Boundaries in Source Code using a Search Based Approach to Concept Binding}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--10},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.4},
  URN =		{urn:nbn:de:0030-drops-9616},
  doi =		{10.4230/DagSemProc.06301.4},
  annote =	{Keywords: Concept Assignment, Slicing, Clustering, Heuristic Algorithms}
}

Document

DOI: 10.4230/DagSemProc.06301.5

Clone Detector Use Questions: A List of Desirable Empirical Studies

Authors: Thomas R. Dean, Massimiliano Di Penta, Kostas Kontogiannis, and Andrew Walenstein

Abstract

Code "clones" are similar segments of code that are frequently introduced by "scavenging" existing code, that is, reusing code by copying it and adapting it for a new use. In order to scavenge the code, the developer must be aware of it already, or must find it. Little is known about how tools - particularly search tools - impact the clone construction process, or about how developers use them for this purpose. This paper lists five outstanding research questions in this area and proposes sketches of designs for five empirical studies that might be conducted to help shed light on those questions.

Cite as

Thomas R. Dean, Massimiliano Di Penta, Kostas Kontogiannis, and Andrew Walenstein. Clone Detector Use Questions: A List of Desirable Empirical Studies. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{dean_et_al:DagSemProc.06301.5,
  author =	{Dean, Thomas R. and Di Penta, Massimiliano and Kontogiannis, Kostas and Walenstein, Andrew},
  title =	{{Clone Detector Use Questions:  A List of Desirable Empirical Studies}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--5},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.5},
  URN =		{urn:nbn:de:0030-drops-9695},
  doi =		{10.4230/DagSemProc.06301.5},
  annote =	{Keywords: Code clone, clone detector, code search, reuse, code scavenging, empirical study}
}

Document

DOI: 10.4230/DagSemProc.06301.6

Code Clones: Reconsidering Terminology

Authors: Andrew Walenstein

Abstract

This report discusses terminology choices and considerations relating to copied or redundant code within software systems, i.e., relating to "code clones." Inadequacies of existing terminology are raised and alternative terms are discussed.

Cite as

Andrew Walenstein. Code Clones: Reconsidering Terminology. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-7, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{walenstein:DagSemProc.06301.6,
  author =	{Walenstein, Andrew},
  title =	{{Code Clones:  Reconsidering Terminology}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--7},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.6},
  URN =		{urn:nbn:de:0030-drops-9654},
  doi =		{10.4230/DagSemProc.06301.6},
  annote =	{Keywords: Code clone, exact clone, near clone, clone types, accidental clone, duplicate, copy, redundant}
}

Document

DOI: 10.4230/DagSemProc.06301.7

Detection of Plagiarism in University Projects Using Metrics-based Spectral Similarity

Authors: Ettore Merlo

Abstract

An original method of spectral similarity analysis for plagiarism detection in university projects is presented. The approach is based on a clone detection tool called CLAN that performs metrics-based similarity analysis of source code fragments. Definitions and algorithms for spectral similarity analysis are presented and discussed. Experiments performed on university projects are presented. Experimental results include the distribution of similarity in C and C++ projects. Analysis of spectral similarity distribution identifies the most similar pairs of projects that can be considered as candidates for plagiarism.

Cite as

Ettore Merlo. Detection of Plagiarism in University Projects Using Metrics-based Spectral Similarity. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{merlo:DagSemProc.06301.7,
  author =	{Merlo, Ettore},
  title =	{{Detection of Plagiarism in University Projects Using Metrics-based Spectral Similarity}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--10},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.7},
  URN =		{urn:nbn:de:0030-drops-9864},
  doi =		{10.4230/DagSemProc.06301.7},
  annote =	{Keywords: Plagiarism detection, software comparison, clone detection, spectral analysis, code metrics}
}

Document

DOI: 10.4230/DagSemProc.06301.8

Generic modelling of code clones

Authors: Simon Giesecke

Abstract

Code clones, i.e. instances of duplicated code, can be found in many software systems. They adversely affect the software systems’ quality, in particular their maintainability and comprehensibility. Thus, this aspect is particularly important to consider in software maintenance and reengineering. Many different algorithms for detecting code clones have been developed. For various reasons, it is difficult to compare the results of different algorithms. Most notable among these reasons is that there is no conceptual model that allows the description of code clones determined by different algorithms. Rather, each algorithm uses its own concept of code clones, which is rarely made explicit. To overcome these problems, we have developed a generic model for describing clones. The model is generic in that it is independent of the programming language examined and of the clone detection algorithm used. It is flexible enough to facilitate various granularities of artifacts employed for selection and comparison, including inexact clones. The model allows separation of concerns between clone detection, description and management, which reduces the effort for the implementation of tools supporting these activities. On the basis of the model, we have implemented a prototype tool supporting these activities, tightly integrated into the Eclipse environment.

Cite as

Simon Giesecke. Generic modelling of code clones. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{giesecke:DagSemProc.06301.8,
  author =	{Giesecke, Simon},
  title =	{{Generic modelling of code clones}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--23},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.8},
  URN =		{urn:nbn:de:0030-drops-9608},
  doi =		{10.4230/DagSemProc.06301.8},
  annote =	{Keywords: Code clones, clone detection, reference model}
}

Document

DOI: 10.4230/DagSemProc.06301.9

Managing Known Clones: Issues and Open Questions

Authors: Kostas Kontogiannis

Abstract

Many software systems contain cloned code, i.e., segments of code that are highly similar to each other, typically because one has been copied from the other, and then possibly modified. In some contexts, clones are of interest because they are targets for refactoring. This paper summarizes the results of a working session in which the problems of merely managing clones that are already known to exist were discussed. Six key issues in the space are briefly reviewed, and open questions raised in the working session are listed.

Cite as

Kostas Kontogiannis. Managing Known Clones: Issues and Open Questions. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{kontogiannis:DagSemProc.06301.9,
  author =	{Kontogiannis, Kostas},
  title =	{{Managing Known Clones:  Issues and Open Questions}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--5},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.9},
  URN =		{urn:nbn:de:0030-drops-9679},
  doi =		{10.4230/DagSemProc.06301.9},
  annote =	{Keywords: Code clone, software evolution, change management, code visualization, redundancy, metamodels, software management environments}
}

Document

DOI: 10.4230/DagSemProc.06301.10

Program Compression

Authors: William S. Evans

Abstract

The talk focused on a grammar-based technique for identifying redundancy in program code and taking advantage of that redundancy to reduce the memory required to store and execute the program. The idea is to start with a simple context-free grammar that represents all valid basic blocks of any program. We represent a program by the parse trees (i.e. derivations) of its basic blocks using the grammar. We then modify the grammar, by considering sample programs, so that idioms of the language have shorter derivations in the modified grammar. Since each derivation represents a basic block, we can interpret the resulting set of derivations much as we would interpret the original program. We need only expand the grammar rules indicated by the derivation to produce a sequence of original program instructions to execute. The result is a program representation that is approximately 40% of the original program size and is interpretable by a very modest-sized interpreter.
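The idea of giving idioms shorter derivations, then expanding rules again at interpretation time, can be shown with a toy. This is a rough sketch and not the technique from the talk: it swaps in a simple pair-replacement scheme (in the style of Re-Pair or byte-pair encoding) over flat instruction lists instead of modifying a context-free grammar over parse trees. The instruction names, the rule names `R0`, `R1`, ... and all function names are hypothetical, and instruction names are assumed never to collide with rule names.

```python
from collections import Counter


def apply_rule(block, name, pair):
    """Replace each non-overlapping occurrence of pair in block with name."""
    out, i = [], 0
    while i < len(block):
        if tuple(block[i:i + 2]) == pair:
            out.append(name)
            i += 2
        else:
            out.append(block[i])
            i += 1
    return out


def learn_rules(samples, n_rules):
    """Learn up to n_rules rules from sample basic blocks.

    Each rule names the adjacent symbol pair that is currently most frequent,
    so common idioms end up with short encodings.
    """
    blocks = [list(block) for block in samples]
    rules = {}
    for i in range(n_rules):
        pairs = Counter()
        for block in blocks:
            pairs.update(zip(block, block[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats any more
        name = f"R{i}"
        rules[name] = pair
        blocks = [apply_rule(block, name, pair) for block in blocks]
    return rules


def compress(block, rules):
    """Encode one basic block by applying the rules in learning order."""
    block = list(block)
    for name, pair in rules.items():
        block = apply_rule(block, name, pair)
    return block


def expand(block, rules):
    """Recover the original instructions, as an interpreter would before executing."""
    out = []
    for symbol in block:
        if symbol in rules:
            out.extend(expand(rules[symbol], rules))
        else:
            out.append(symbol)
    return out
```

With rules learned from sample blocks in which `load, add, store` recurs, a new block containing that idiom compresses to a single rule symbol plus the remaining instructions, and `expand` restores the original sequence exactly.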

Cite as

William S. Evans. Program Compression. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{evans:DagSemProc.06301.10,
  author =	{Evans, William S.},
  title =	{{Program Compression}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--10},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.10},
  URN =		{urn:nbn:de:0030-drops-9635},
  doi =		{10.4230/DagSemProc.06301.10},
  annote =	{Keywords: Program compression, clone detection, bytecode interpretation, variable-to-fixed length codes, context-free grammars}
}

Document

DOI: 10.4230/DagSemProc.06301.11

Similarity in Programs

Authors: Andrew Walenstein, Mohammad El-Ramly, James R. Cordy, William S. Evans, Kiarash Mahdavi, Markus Pizka, Ganesan Ramalingam, and Jürgen Wolff von Gudenberg

Abstract

An overview of the concept of program similarity is presented. It divides similarity into two types - syntactic and semantic - and provides a review of eight categories of methods that may be used to measure program similarity. A summary of some applications of these methods is included. The paper is intended to be a starting point for a more comprehensive analysis of the subject of similarity in programs, which is critical to understand if progress is to be made in fields such as clone detection.
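As one concrete instance of a syntactic measure, the Levenshtein distance listed among the keywords can be computed over token sequences. The sketch below is a standard dynamic-programming formulation, not code from the paper; the normalization in `similarity` and the example tokens are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on any two sequences, e.g. strings or lists of program tokens.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical sequences, 0.0 for maximal distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

Comparing the token lists for `x = a + b` and `y = a + b` gives a distance of 1, since only the renamed identifier differs. A purely syntactic measure of this kind says nothing about whether two fragments compute the same result, which is where semantic similarity comes in.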

Cite as

Andrew Walenstein, Mohammad El-Ramly, James R. Cordy, William S. Evans, Kiarash Mahdavi, Markus Pizka, Ganesan Ramalingam, and Jürgen Wolff von Gudenberg. Similarity in Programs. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{walenstein_et_al:DagSemProc.06301.11,
  author =	{Walenstein, Andrew and El-Ramly, Mohammad and Cordy, James R. and Evans, William S. and Mahdavi, Kiarash and Pizka, Markus and Ramalingam, Ganesan and von Gudenberg, J\"{u}rgen Wolff},
  title =	{{Similarity in Programs}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--8},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.11},
  URN =		{urn:nbn:de:0030-drops-9681},
  doi =		{10.4230/DagSemProc.06301.11},
  annote =	{Keywords: Computer programs, similarity, code clone, software comparison, program metrics, Levenshtein distance, parameterized difference, feature space, shared}
}

Document

DOI: 10.4230/DagSemProc.06301.12

Subjectivity in Clone Judgment: Can We Ever Agree?

Authors: Cory Kapser, Paul Anderson, Michael Godfrey, Rainer Koschke, Matthias Rieger, Filip van Rysselberghe, and Peter Weißgerber

Abstract

An objective definition of what a code clone is currently eludes the field. A small study was performed at an international workshop to elicit judgments and discussions from world experts regarding what characteristics define a code clone. Fewer than half of the clone candidates judged had 80% agreement amongst the judges. Judges appeared to differ primarily in their criteria for judgment rather than their interpretation of the clone candidates. In subsequent open discussion the judges provided several reasons for their judgments. The study casts additional doubt on the reliability of experimental results in the field when the full criterion for clone judgment is not spelled out.
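A figure such as "80% agreement" can be computed in several ways, and the abstract does not say which measure the study used. The sketch below assumes the simplest one, the share of judges who side with the majority verdict on a candidate; the function names and the yes/no vote encoding are hypothetical.

```python
def agreement(votes):
    """Share of judges siding with the majority verdict on one candidate.

    votes is a list of booleans, True meaning "this is a clone".
    """
    yes = sum(votes)
    return max(yes, len(votes) - yes) / len(votes)


def share_with_agreement(judgments, threshold=0.8):
    """Fraction of candidates whose agreement reaches the threshold."""
    strong = [name for name, votes in judgments.items()
              if agreement(votes) >= threshold]
    return len(strong) / len(judgments)
```

For a made-up panel of five judges, a candidate with four "yes" votes reaches 0.8 agreement, while a three-to-two split reaches only 0.6. Chance-corrected statistics are stricter alternatives to this raw percentage.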

Cite as

Cory Kapser, Paul Anderson, Michael Godfrey, Rainer Koschke, Matthias Rieger, Filip van Rysselberghe, and Peter Weißgerber. Subjectivity in Clone Judgment: Can We Ever Agree? In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{kapser_et_al:DagSemProc.06301.12,
  author =	{Kapser, Cory and Anderson, Paul and Godfrey, Michael and Koschke, Rainer and Rieger, Matthias and van Rysselberghe, Filip and Wei{\ss}gerber, Peter},
  title =	{{Subjectivity in Clone Judgment:  Can We Ever Agree?}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--5},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.12},
  URN =		{urn:nbn:de:0030-drops-9701},
  doi =		{10.4230/DagSemProc.06301.12},
  annote =	{Keywords: Code clone, study, inter-rater agreement, ill-defined problem}
}

Document

DOI: 10.4230/DagSemProc.06301.13

Survey of Research on Software Clones

Authors: Rainer Koschke

Abstract

This report summarizes my overview talk on software clone detection research. It first discusses the notion of software redundancy, cloning, duplication, and similarity. Then, it describes various categorizations of clone types, empirical studies on the root causes of cloning, current opinions and wisdom on the consequences of cloning, empirical studies on the evolution of clones, ways to remove, to avoid, and to detect them, empirical evaluations of existing automatic clone detector performance (such as recall, precision, time and space consumption) and their fitness for a particular purpose, benchmarks for clone detector evaluations, presentation issues, and, last but not least, applications of clone detection in other related fields. After each summary of a subarea, I list open research questions.

Cite as

Rainer Koschke. Survey of Research on Software Clones. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{koschke:DagSemProc.06301.13,
  author =	{Koschke, Rainer},
  title =	{{Survey of Research on Software Clones}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--24},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.13},
  URN =		{urn:nbn:de:0030-drops-9625},
  doi =		{10.4230/DagSemProc.06301.13},
  annote =	{Keywords: Software redundancy, code clone, software evolution, clone detector, empirical evaluation}
}

Document

DOI: 10.4230/DagSemProc.06301.14

The Software Similarity Problem in Malware Analysis

Authors: Andrew Walenstein and Arun Lakhotia

Abstract

In software engineering contexts, software may be compared for similarity in order to detect duplicate code that indicates poor design and to reconstruct evolution history. Malicious software, being nothing other than a particular type of software, can also be compared for similarity in order to detect commonalities and evolution history. This paper provides a brief introduction to the issue of measuring similarity between malicious programs, and how evolution is known to occur in the area. It then uses this review to try to draw lines that connect research in software engineering (e.g., on "clone detection") to problems in anti-malware research.

Cite as

Andrew Walenstein and Arun Lakhotia. The Software Similarity Problem in Malware Analysis. In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{walenstein_et_al:DagSemProc.06301.14,
  author =	{Walenstein, Andrew and Lakhotia, Arun},
  title =	{{The Software Similarity Problem in Malware Analysis}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--10},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.14},
  URN =		{urn:nbn:de:0030-drops-9640},
  doi =		{10.4230/DagSemProc.06301.14},
  annote =	{Keywords: Software, software evolution, commonality, program similarity, code clones, code smells, malicious software, malware, worms, Trojans, viruses, spyware}
}
