Search Results

Documents authored by Muroya, Stefanie


Document
The Fault in Our Stars: Designing Reproducible Large-scale Code Analysis Experiments

Authors: Petr Maj, Stefanie Muroya, Konrad Siek, Luca Di Grazia, and Jan Vitek

Published in: LIPIcs, Volume 313, 38th European Conference on Object-Oriented Programming (ECOOP 2024)


Abstract
Large-scale software repositories are a source of insights for software engineering. They offer an unmatched window into the software development process at scale. Their sheer number and size holds the promise of broadly applicable results. At the same time, that very size presents practical challenges for scaling tools and algorithms to millions of projects. A reasonable approach is to limit studies to representative samples of the population of interest. Broadly applicable conclusions can then be obtained by generalizing to the entire population. The contribution of this paper is a standardized experimental design methodology for choosing the inputs of studies working with large-scale repositories. We advocate for a methodology that clearly lays out what the population of interest is, how to sample it, and that fosters reproducibility. Along the way, we discourage researchers from using extrinsic attributes of projects such as stars, that measure some unclear notion of popularity.

Cite as

Petr Maj, Stefanie Muroya, Konrad Siek, Luca Di Grazia, and Jan Vitek. The Fault in Our Stars: Designing Reproducible Large-scale Code Analysis Experiments. In 38th European Conference on Object-Oriented Programming (ECOOP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 313, pp. 27:1-27:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{maj_et_al:LIPIcs.ECOOP.2024.27,
  author =	{Maj, Petr and Muroya, Stefanie and Siek, Konrad and Di Grazia, Luca and Vitek, Jan},
  title =	{{The Fault in Our Stars: Designing Reproducible Large-scale Code Analysis Experiments}},
  booktitle =	{38th European Conference on Object-Oriented Programming (ECOOP 2024)},
  pages =	{27:1--27:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-341-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{313},
  editor =	{Aldrich, Jonathan and Salvaneschi, Guido},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.27},
  URN =		{urn:nbn:de:0030-drops-208769},
  doi =		{10.4230/LIPIcs.ECOOP.2024.27},
  annote =	{Keywords: software, mining code repositories, source code analysis}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail