Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations

Authors: David Rodrigues, António L. Lopes, Fernando Batista



Author Details

David Rodrigues
  • Iscte - University Institute of Lisbon, Portugal
António L. Lopes
  • Instituto de Telecomunicações, Iscte - University Institute of Lisbon, Portugal
Fernando Batista
  • Iscte - University Institute of Lisbon, Portugal
  • INESC-ID Lisbon, Portugal

Cite As

David Rodrigues, António L. Lopes, and Fernando Batista. Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 5:1-5:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Abstract

The number of citations a research paper receives is a crucial metric for both researchers and institutions. However, since each citation database maintains its own list of sources, finding all the citations of a given paper can be a challenge, and missing citations may go uncounted towards a paper's total citation count. To address this issue, we present an automated approach for finding missing citations that leverages multiple indexing databases. In this research, Web of Science (WoS) serves as a case study and OpenAlex is used as a reference point for comparison. For a given paper, we first identify all citing papers found in both research databases. Then, for each citing paper, we check whether it is indexed in WoS but not listed in WoS as a citing paper, in order to determine whether it constitutes a missing citation. In our experiments, from a set of 1539 papers indexed by WoS, we found 696 missing citations. This outcome demonstrates the effectiveness of our approach and reveals that WoS does not always consider the full list of citing papers of a given publication, even when those citing papers are indexed by WoS. We also found that WoS is more likely to miss information for more recent publications. These findings provide relevant insights into this indexing database. They also motivate extending our study to other research databases, such as Scopus and Google Scholar, in order to improve the matching and querying algorithms, reduce false positives, and provide a more comprehensive and accurate view of a paper's citations.
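The check described in the abstract reduces to a set operation over three collections of citing-paper identifiers. The following is a minimal, hypothetical Python sketch of that idea; the function and variable names are illustrative and not taken from the paper, and real identifiers would typically be DOIs or database record IDs:

```python
def find_missing_citations(openalex_citing, wos_citing, wos_indexed):
    """Return candidate missing citations: papers that OpenAlex reports
    as citing the target paper, that WoS has indexed, but that WoS does
    not list among the target paper's citing papers."""
    return (set(openalex_citing) & set(wos_indexed)) - set(wos_citing)


# Toy example with made-up identifiers.
openalex_citing = {"10.1/a", "10.1/b", "10.1/c"}  # citing papers per OpenAlex
wos_indexed = {"10.1/a", "10.1/b", "10.1/target"}  # papers indexed by WoS
wos_citing = {"10.1/a"}  # citing papers per WoS

missing = find_missing_citations(openalex_citing, wos_citing, wos_indexed)
print(sorted(missing))  # "10.1/b" is indexed by WoS but not counted as citing
```

In practice the hard part is the matching step (resolving whether two records from different databases refer to the same paper), which the paper addresses with matching and querying algorithms; the sketch assumes identifiers are already aligned.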

Subject Classification

ACM Subject Classification
  • Applied computing → Publishing
  • General and reference → Verification
  • Information systems → Digital libraries and archives
  • Information systems → Enterprise applications
  • Applied computing → Digital libraries and archives
  • Information systems → Data cleaning

Keywords and phrases
  • Research Databases
  • Citations
  • Citation Databases
  • Web of Science
  • OpenAlex



References

  1. Robert A. Buchanan. Accuracy of cited references: The role of citation databases. College & Research Libraries, 67(4):292-303, 2006. URL: https://doi.org/10.5860/crl.67.4.292.
  2. Alessia Cioffi, Sara Coppini, Arcangelo Massari, Arianna Moretti, Silvio Peroni, Cristian Santini, and Nooshin Shahidzadeh Asadi. Identifying and correcting invalid citations due to DOI errors in Crossref data. Scientometrics, 127:3593-3612, 2022. URL: https://doi.org/10.1007/s11192-022-04367-w.
  3. Fiorenzo Franceschini, Domenico Maisano, and Luca Mastrogiacomo. A novel approach for estimating the omitted-citation rate of bibliometric databases with an application to the field of bibliometrics. Journal of the American Society for Information Science and Technology, 64(10):2149-2156, 2013. URL: https://doi.org/10.1002/asi.22898.
  4. Fiorenzo Franceschini, Domenico Maisano, and Luca Mastrogiacomo. Errors in DOI indexing by bibliometric databases. Scientometrics, 102:2181-2186, 2015. URL: https://doi.org/10.1007/s11192-014-1503-4.
  5. Miguel A. García-Pérez. Accuracy and completeness of publication and citation records in the Web of Science, PsycINFO, and Google Scholar: A case study for the computation of h indices in psychology. Journal of the American Society for Information Science and Technology, 61(10):2070-2085, 2010. URL: https://doi.org/10.1002/asi.21372.
  6. Erwin Krauskopf. Missing documents in Scopus: the case of the journal Enfermeria Nefrologica. Scientometrics, 119:543-547, 2019. URL: https://doi.org/10.1007/s11192-019-03040-z.
  7. Henk F. Moed, Judit Bar-Ilan, and Gali Halevi. A new methodology for comparing Google Scholar and Scopus. Journal of Informetrics, 10(2):533-551, 2016. URL: https://doi.org/10.1016/j.joi.2016.04.017.
  8. Jason Priem, Heather Piwowar, and Richard Orr. OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts, 2022. URL: https://doi.org/10.48550/ARXIV.2205.01833.
  9. Misha Teplitskiy, Eamon Duede, Michael Menietti, and Karim R. Lakhani. How status of research papers affects the way they are read and cited. Research Policy, 51(4):104484, 2022. URL: https://doi.org/10.1016/j.respol.2022.104484.
  10. Nees Jan van Eck and Ludo Waltman. Accuracy of citation data in Web of Science and Scopus, 2019. URL: https://doi.org/10.48550/ARXIV.1906.07011.