Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar

Authors: Christian Chiarcos, Christian Fäth



Author Details

Christian Chiarcos
  • Applied Computational Linguistics Lab, Goethe University Frankfurt, Germany
Christian Fäth
  • Applied Computational Linguistics Lab, Goethe University Frankfurt, Germany

Cite As

Christian Chiarcos and Christian Fäth. Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 9:1-9:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/OASIcs.LDK.2019.9

Abstract

This paper describes the application of annotation engineering techniques to the construction of a corpus for Role and Reference Grammar (RRG). RRG is a semantics-oriented formalism for natural language syntax that is popular in comparative linguistics and linguistic typology and is predominantly applied to the description of non-European languages, which are less-resourced in terms of natural language processing. Because of its cross-linguistic applicability and its joint treatment of syntax and semantics, RRG is also a promising framework for research challenges in natural language processing. So far, however, this potential has not been explored because no RRG corpus data is publicly available. While RRG annotations cannot easily be derived from any single existing treebank, we suggest that they can be reliably inferred from the intersection of syntactic and semantic annotations, as represented by, for example, Universal Dependencies (UD) and PropBank (PB). We demonstrate this for the English Web Treebank, a 250,000-token corpus of English internet text from various genres. The resulting corpus is a gold corpus for future experiments in natural language processing in the sense that it is built on existing, manually created annotations. A technical challenge in this context is to align UD and PB annotations, to integrate them coherently, and to distribute and combine their information across the RRG constituent and operator projections. For this purpose, we describe a framework for scalable annotation engineering based on flexible, unconstrained transformations of sentence graphs by means of SPARQL Update.
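
The idea of deriving new annotations by transforming a sentence graph can be illustrated with a minimal sketch. It is not the authors' rule set and does not use SPARQL Update itself: it is a pure-Python stand-in that mimics the shape of an `INSERT { ... } WHERE { ... }` update over a set of triples. The predicate names (`ud:nsubj`, `pb:ARG0`, `rrg:macrorole`), the token identifiers, and the mapping from ARG0 to an "ACTOR" label are all assumptions made for illustration.

```python
from typing import Set, Tuple

# A sentence graph as a set of (subject, predicate, object) triples.
# All predicate names below are hypothetical, chosen for illustration only.
Triple = Tuple[str, str, str]


def apply_rule(graph: Set[Triple]) -> Set[Triple]:
    """Add an illustrative RRG-style label where UD and PB annotations agree.

    Mirrors the shape of a SPARQL 'INSERT { ... } WHERE { ... }' update:
    match a pattern, then add new triples; existing triples are kept.
    """
    inserted: Set[Triple] = set()
    for dep, pred, head in graph:
        # WHERE: ?dep ud:nsubj ?head . ?dep pb:ARG0 ?head .
        if pred == "ud:nsubj" and (dep, "pb:ARG0", head) in graph:
            # INSERT: ?dep rrg:macrorole "ACTOR"  (assumed mapping)
            inserted.add((dep, "rrg:macrorole", "ACTOR"))
    return graph | inserted


# Toy sentence "She left" with one UD edge and one PropBank edge.
sentence: Set[Triple] = {
    ("tok1", "form", "She"),
    ("tok2", "form", "left"),
    ("tok1", "ud:nsubj", "tok2"),
    ("tok1", "pb:ARG0", "tok2"),
}

result = apply_rule(sentence)
```

The rule fires only when the syntactic and the semantic annotation point to the same head, which is one simple reading of inferring a label from the "intersection" of the two layers. In an RDF setting, the same pattern would presumably be written as a SPARQL Update query over the sentence graph.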

Subject Classification

ACM Subject Classification
  • Computing methodologies → Language resources
  • Information systems → Semantic web description languages
  • Computing methodologies → Natural language processing
  • Computing methodologies → Lexical semantics
Keywords
  • Role and Reference Grammar
  • NLP
  • Corpus
  • Semantic Web
  • LLOD
  • Syntax
  • Semantics
