A Proposal for a Two-Way Journey on Validating Locations in Unstructured and Structured Data

Keles, Ilkcan; Qawasmeh, Omar; Tietz, Tabea; Marinucci, Ludovica; Reda, Roberto; van Erp, Marieke

doi:10.4230/OASIcs.LDK.2019.13

Abstract

The Web of Data has grown explosively over the past few years, and as with any dataset, there are bound to be invalid statements in the data, as well as gaps. Natural Language Processing (NLP) is gaining interest as a way to fill these gaps by transforming (unstructured) text into structured data. However, there is currently a fundamental mismatch in approaches between Linked Data and NLP: the latter is often based on statistical methods, the former on explicitly modelling knowledge. Despite this mismatch, the two fields can strengthen each other by joining forces. In this position paper, we argue that using Linked Data to validate the output of an NLP system, and using textual data to validate Linked Open Data (LOD) cloud statements, is a promising research avenue. We illustrate our proposal with a proof of concept on a corpus of historical travel stories.
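The two directions of validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' proof of concept: the gazetteer, the `ex:` identifiers, the approximate coordinates, the 1000 km threshold, the example sentences, and both function names are assumptions made purely for illustration. The first function uses structured data to question an entity link produced by an NLP system; the second uses text to gather weak evidence for a structured statement.

```python
from math import asin, cos, radians, sin, sqrt

# Toy stand-in for a linked-data gazetteer. Identifiers and approximate
# coordinates are illustrative assumptions, not data from the paper.
GAZETTEER = {
    "ex:Rome_Italy": (41.90, 12.50),
    "ex:Rome_Georgia_US": (34.26, -85.16),
    "ex:Naples_Italy": (40.85, 14.27),
    "ex:Florence_Italy": (43.77, 11.25),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def flag_implausible_links(links, max_km=1000.0):
    """Structured data validating NLP output (hypothetical heuristic).

    `links` maps a text mention to the gazetteer entry an entity linker chose.
    A mention is flagged when its linked place lies farther than `max_km`
    from every other place linked in the same text.
    """
    flagged = []
    for mention, uri in links.items():
        others = [u for m, u in links.items() if m != mention]
        if others and all(
            haversine_km(GAZETTEER[uri], GAZETTEER[o]) > max_km for o in others
        ):
            flagged.append(mention)
    return flagged


def text_support(subject_label, object_label, documents):
    """Text validating structured data (hypothetical heuristic).

    Counts the documents that mention both labels of a statement, which is
    only weak co-occurrence evidence, not proof that the statement holds.
    """
    s, o = subject_label.lower(), object_label.lower()
    return sum(1 for d in documents if s in d.lower() and o in d.lower())


# Invented example input: "Rome" has been linked to the wrong entity.
links = {
    "Rome": "ex:Rome_Georgia_US",
    "Naples": "ex:Naples_Italy",
    "Florence": "ex:Florence_Italy",
}
suspicious = flag_implausible_links(links)

# Invented sentences standing in for a travel-writing corpus.
docs = ["We reached Naples, the finest bay in Italy.", "Florence was cold."]
support = text_support("Naples", "Italy", docs)
```

A real system would replace the dictionary with queries against a knowledge base and the substring test with proper entity recognition; the sketch only shows how each side can act as a check on the other.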

Cite As

Ilkcan Keles, Omar Qawasmeh, Tabea Tietz, Ludovica Marinucci, Roberto Reda, and Marieke van Erp. A Proposal for a Two-Way Journey on Validating Locations in Unstructured and Structured Data. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 13:1-13:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/OASIcs.LDK.2019.13

Author Details

Ilkcan Keles

Aalborg University, Dept. of Computer Science, Denmark

Omar Qawasmeh

Univ. Lyon, CNRS, Lab. Hubert Curien UMR 5516, F-42023 Saint-Étienne, France

Tabea Tietz

FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany
Karlsruhe Institute of Technology, Germany

Ludovica Marinucci

Semantic Technology Laboratory (STLab), Istituto di Scienze e Tecnologie della Cognizione-Consiglio Nazionale delle Ricerche (ISTC-CNR), Rome, Italy

Roberto Reda

Department of Computer Science and Engineering, University of Bologna, Italy

Marieke van Erp

KNAW Humanities Cluster, DHLab, The Netherlands

Acknowledgements

This work was made possible by the ISWS summer school (http://stlab.istc.cnr.it/isws/wordpress/) in Bertinoro, July 2018. The authors would like to thank the Summer School directors, Valentina Presutti and Harald Sack, as well as the tutors, the organizing team and the fellow students, in particular Amanda Pacini de Moura, Amr Azzam and Amina Annane, for their suggestions and input.

