, Maria João Varanda Pereira, Pedro Rangel Henriques
Creative Commons Attribution 4.0 International license
In light of specific development needs, it is common to apply different technologies concurrently to build complex applications. Given that lowering risks, costs, and other negative factors while improving their positive counterparts is paramount to a better development environment, it becomes relevant to find out which technologies work best for each intended purpose in a project. To reach these findings, it is necessary to analyse the technologies applied in these projects and how they interconnect and relate to each other. The theory behind Programming Cocktails (meaning the set of programming technologies - Ingredients - that are used to develop complex systems) can support these analyses. However, due to the sheer amount of data required to construct and analyse these Cocktails, obtaining them manually becomes unsustainable. From the desire to accelerate this process comes the need for a tool that automates the data collection and its conversion into an appropriate format for analysis. As such, the project proposed in this paper revolves around the development of a web-scraping application that can generate Cocktail Identity Cards (CICs) from source code repositories hosted on GitHub. These CICs contain the Ingredients (programming languages, libraries and frameworks) used in the corresponding GitHub repository and follow the ontology previously established in a larger research project to model each Programming Cocktail. This paper presents a survey of current Source Version Control Systems (SVCSs) and web-scraping technologies, an overview of Programming Cocktails and their current foundations, and the design of a tool that can automate the gathering of CICs from GitHub repositories.
@InProceedings{loureiro_et_al:OASIcs.SLATE.2025.13,
author = {Loureiro, Jo\~{a}o and Costa Neto, Alvaro and Pereira, Maria Jo\~{a}o Varanda and Henriques, Pedro Rangel},
title = {{Mining GitHub Software Repositories to Look for Programming Language Cocktails}},
booktitle = {14th Symposium on Languages, Applications and Technologies (SLATE 2025)},
pages = {13:1--13:16},
series = {Open Access Series in Informatics (OASIcs)},
ISBN = {978-3-95977-387-4},
ISSN = {2190-6807},
year = {2025},
volume = {135},
editor = {Baptista, Jorge and Barateiro, Jos\'{e}},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2025.13},
URN = {urn:nbn:de:0030-drops-236933},
doi = {10.4230/OASIcs.SLATE.2025.13},
annote = {Keywords: Software Repository Mining, Source Version Control, GitHub Scraping, Programming Cocktails}
}