Opening Digitized Newspapers Corpora: Europeana’s Full-Text Data Interoperability Case

Authors: Nuno Freire, Antoine Isaac, Twan Goosen, Daan Broeder, Hugo Manguinhas, Valentine Charles


Author Details

Nuno Freire
  • INESC-ID, Lisbon, Portugal
Antoine Isaac
  • Europeana Foundation, The Hague, The Netherlands
  • Vrije Universiteit Amsterdam, The Netherlands
Twan Goosen
  • CLARIN ERIC, Utrecht, The Netherlands
Daan Broeder
  • KNAW Humanities Cluster, Amsterdam, The Netherlands
Hugo Manguinhas
  • Europeana Foundation, The Hague, The Netherlands
Valentine Charles
  • Europeana Foundation, The Hague, The Netherlands

Cite As

Nuno Freire, Antoine Isaac, Twan Goosen, Daan Broeder, Hugo Manguinhas, and Valentine Charles. Opening Digitized Newspapers Corpora: Europeana’s Full-Text Data Interoperability Case. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 22:1-22:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Abstract

Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other scientific domains in the Digital Humanities. Retrieving newspaper content effectively from metadata alone is nearly impossible, which makes retrieval based on (digitized) full text particularly relevant. Europeana, Europe’s Digital Library, is in a position to provide access to large newspaper collections with full-text resources. Full-text corpora are also relevant to Europeana’s objective of promoting the use of cultural heritage resources within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspaper full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a "full-text profile" for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.

Subject Classification

ACM Subject Classification
  • Applied computing → Annotation
  • Applied computing → Document metadata
  • Applied computing → Digital libraries and archives
Keywords
  • Metadata
  • Full-text
  • Interoperability
  • Data aggregation
  • Cultural Heritage
  • Research Infrastructures



