Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar

Chiarcos, Christian; Fäth, Christian

doi:10.4230/OASIcs.LDK.2019.9

Abstract

This paper describes the application of annotation engineering techniques for the construction of a corpus for Role and Reference Grammar (RRG). RRG is a semantics-oriented formalism for natural language syntax popular in comparative linguistics and linguistic typology, and predominantly applied to the description of non-European languages which are less-resourced in terms of natural language processing. Because of its cross-linguistic applicability and its conjoint treatment of syntax and semantics, RRG also represents a promising framework for research challenges within natural language processing. At the moment, however, these have not been explored, as no RRG corpus data is publicly available. While RRG annotations cannot be easily derived from any single treebank in existence, we suggest that they can be reliably inferred from the intersection of syntactic and semantic annotations as represented by, for example, the Universal Dependencies (UD) and PropBank (PB), and we demonstrate this for the English Web Treebank, a 250,000-token corpus of various genres of English internet text. The resulting corpus is a gold corpus for future experiments in natural language processing in the sense that it is built on existing annotations which have been created manually.
A technical challenge in this context is to align UD and PB annotations, to integrate them in a coherent manner, and to distribute and to combine their information on RRG constituent and operator projections. For this purpose, we describe a framework for flexible and scalable annotation engineering based on unconstrained graph transformations of sentence graphs by means of SPARQL Update.
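
The integration step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the framework described in the paper applies SPARQL Update to RDF sentence graphs, whereas the snippet below is a plain-Python stand-in that rewrites a set of triples in the same match-then-delete-then-insert order. The toy sentence, the property names (form, head, deprel, core_argument_of, role) and the rewrite rule itself are assumptions made for illustration and are not taken from the paper's RRG mapping.

```python
# Hypothetical toy data for one sentence, tokens keyed by position.
# UD layer: token id -> (form, head id, dependency relation)
ud = {
    1: ("She", 2, "nsubj"),
    2: ("reads", 0, "root"),
    3: ("books", 2, "obj"),
}
# PB layer: predicate token id -> {argument token id: role label}
pb = {2: {1: "ARG0", 3: "ARG1"}}


def build_graph(ud, pb):
    """Merge both annotation layers into one set of (s, p, o) triples."""
    triples = set()
    for tid, (form, head, rel) in ud.items():
        triples.add((tid, "form", form))
        triples.add((tid, "head", head))
        triples.add((tid, "deprel", rel))
    for pred, args in pb.items():
        for arg, role in args.items():
            triples.add((arg, role, pred))
    return triples


def apply_rule(triples):
    """Illustrative rewrite rule (an assumption, not the paper's mapping).

    WHERE  ?arg ARGn ?pred . ?arg head ?pred
    DELETE ?arg ARGn ?pred
    INSERT ?arg core_argument_of ?pred . ?arg role ARGn
    """
    to_delete, to_insert = set(), set()
    # Match against the unmodified graph first, then delete, then insert.
    for s, p, o in triples:
        is_numbered_role = p.startswith("ARG") and p[3:].isdigit()
        if is_numbered_role and (s, "head", o) in triples:
            to_delete.add((s, p, o))
            to_insert.add((s, "core_argument_of", o))
            to_insert.add((s, "role", p))
    return (triples - to_delete) | to_insert


graph = apply_rule(build_graph(ud, pb))
for triple in sorted(graph, key=str):
    print(triple)
```

The rule fires only where the syntactic and the semantic layer agree, that is, where a token carries a numbered PropBank role for the same token that is its UD head. This mirrors, in a very reduced form, the idea of inferring structure from the intersection of both annotation layers.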

Cite As

Christian Chiarcos and Christian Fäth. Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 9:1-9:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/OASIcs.LDK.2019.9

Author Details

Christian Chiarcos

Applied Computational Linguistics Lab, Goethe University Frankfurt, Germany

Christian Fäth

Applied Computational Linguistics Lab, Goethe University Frankfurt, Germany

Funding

The research described in this paper was conducted in the context of the project Linked Open Dictionaries (LiODi, 2015-2020), funded by the German Ministry for Education and Research (BMBF), as well as the project Specialised Information Service Linguistics (Fachinformationsdienst Linguistik, funding period 2017-2019) funded by the German Research Foundation (DFG).

Supplementary Materials

The software described in this paper is available under the Apache 2.0 license from https://github.com/acoli-repo/RRG. This includes build scripts for the data. We aim to provide the data under the same license as the annotations it is derived from (CC-BY-SA), but we are still in the process of copyright clearance for the original text.
