Annotation of Fine-Grained Geographical Entities in German Texts

Moreno-Schneider, Julián; Plakidis, Melina; Rehm, Georg

doi:10.4230/OASIcs.LDK.2021.11

Abstract

We are creating a corpus, crawled from the internet, on the Berlin district of Moabit, primarily meant for training NER systems in German and English. Typical NER corpora and the corresponding systems distinguish persons, organisations and locations, but do not distinguish different types of location entities. For our tourism-inspired use case, we need fine-grained annotations for toponyms. In this paper, we outline the fine-grained classification of geographical entities and the resulting annotations, and we present preliminary results on automatically tagging toponyms in a small, bootstrapped gold corpus.

Cite As

Julián Moreno-Schneider, Melina Plakidis, and Georg Rehm. Annotation of Fine-Grained Geographical Entities in German Texts. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 11:1-11:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/OASIcs.LDK.2021.11

Author Details

Julián Moreno-Schneider

DFKI GmbH, Berlin, Germany

Melina Plakidis

DFKI GmbH, Berlin, Germany

Georg Rehm

DFKI GmbH, Berlin, Germany

Funding

The research presented in this paper is funded by the German Federal Ministry of Education and Research (BMBF) through the project QURATOR (http://qurator.ai) (Unternehmen Region, Wachstumskern, grant no. 03WKDA1A).

Supplementary Materials

Collection (Collection of documents about Moabit district annotated with Geographical Entities) https://gitlab.com/jmschnei/Moabit-Collection

