A Proposal for a Two-Way Journey on Validating Locations in Unstructured and Structured Data

Authors: Ilkcan Keles, Omar Qawasmeh, Tabea Tietz, Ludovica Marinucci, Roberto Reda, Marieke van Erp

Author Details

Ilkcan Keles
  • Aalborg University, Dept. of Computer Science, Denmark
Omar Qawasmeh
  • Univ. Lyon, CNRS, Lab. Hubert Curien UMR 5516, F-42023 Saint-Étienne, France
Tabea Tietz
  • FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany
  • Karlsruhe Institute of Technology, Germany
Ludovica Marinucci
  • Semantic Technology Laboratory (STLab), Istituto di Scienze e Tecnologie della Cognizione-Consiglio Nazionale delle Ricerche (ISTC-CNR), Rome, Italy
Roberto Reda
  • Department of Computer Science and Engineering, University of Bologna, Italy
Marieke van Erp
  • KNAW Humanities Cluster, DHLab, The Netherlands


This work was made possible by the ISWS summer school (http://stlab.istc.cnr.it/isws/wordpress/) in Bertinoro, July 2018. The authors would like to thank the Summer School directors, Valentina Presutti and Harald Sack, as well as the tutors, the organizing team, and the fellow students, in particular Amanda Pacini de Moura, Amr Azzam, and Amina Annane, for their suggestions and input.

Cite As

Ilkcan Keles, Omar Qawasmeh, Tabea Tietz, Ludovica Marinucci, Roberto Reda, and Marieke van Erp. A Proposal for a Two-Way Journey on Validating Locations in Unstructured and Structured Data. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 13:1-13:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


The Web of Data has grown explosively over the past few years, and as with any dataset, there are bound to be invalid statements in the data, as well as gaps. Natural Language Processing (NLP) is gaining interest as a way to fill gaps in data by transforming (unstructured) text into structured data. However, there is currently a fundamental mismatch in approaches between Linked Data and NLP, as the latter is often based on statistical methods and the former on explicitly modelling knowledge. Nevertheless, these fields can strengthen each other by joining forces. In this position paper, we argue that using Linked Data to validate the output of an NLP system, and using textual data to validate Linked Open Data (LOD) cloud statements, is a promising research avenue. We illustrate our proposal with a proof of concept on a corpus of historical travel stories.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Natural language processing
Keywords
  • data validity
  • natural language processing
  • linked data



