Universal Dependencies for Multilingual Open Information Extraction

Authors: Massinissa Atmani, Mathieu Lafourcade





Author Details

Massinissa Atmani
  • LIRMM, University of Montpellier, 860 rue de St Priest, 34095 Montpellier, France
  • Amaris Research Unit, 25 boulevard Eugène Deruelle, 69003 Lyon, France
  • massinissa.atmani@etu.umontpellier.fr
Mathieu Lafourcade
  • LIRMM, University of Montpellier, 860 rue de St Priest, 34095 Montpellier, France

Cite As

Massinissa Atmani and Mathieu Lafourcade. Universal Dependencies for Multilingual Open Information Extraction. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 24:1-24:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/OASIcs.LDK.2021.24

Abstract

In this paper, we present our approach to multilingual Open Information Extraction (OpenIE). Our sequence-labeling approach builds only on Universal Dependencies representations to capture the regularities of OpenIE and to perform cross-lingual, multilingual OpenIE. We propose a new two-stage pipeline model for sequence labeling that first identifies all the arguments of a relation and only then classifies each one according to its most likely label. This paper also introduces a new evaluation benchmark for French. Experimental evaluation shows that our approach achieves the best results on the available benchmarks (English, French, Spanish, and Portuguese).
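
The two-stage idea in the abstract (first find argument spans, then label each span) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the rule-based functions `toy_is_argument` and `toy_score`, the label names `ARG0` and `ARG1`, and the example sentence are hypothetical stand-ins for models that would be learned from Universal Dependencies features.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Token:
    form: str    # surface form
    upos: str    # Universal POS tag
    deprel: str  # Universal Dependencies relation to the head


Span = Tuple[int, int]  # [start, end) token offsets


def identify_spans(tokens: Sequence[Token],
                   is_argument: Callable[[Token], bool]) -> List[Span]:
    """Stage 1: group contiguous tokens flagged as argument material."""
    spans: List[Span] = []
    start = None
    for i, tok in enumerate(tokens):
        if is_argument(tok):
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


def classify_spans(tokens: Sequence[Token], spans: List[Span],
                   score: Callable[[Sequence[Token]], Dict[str, float]]
                   ) -> List[Tuple[Span, str]]:
    """Stage 2: give each span its highest-scoring label."""
    labelled = []
    for start, end in spans:
        scores = score(tokens[start:end])
        labelled.append(((start, end), max(scores, key=scores.get)))
    return labelled


# Hypothetical stand-ins for learned models (assumptions, not from the paper).
def toy_is_argument(tok: Token) -> bool:
    # Treat everything except the predicate and punctuation as argument material.
    return tok.deprel != "root" and tok.upos != "PUNCT"


def toy_score(span: Sequence[Token]) -> Dict[str, float]:
    deprels = {tok.deprel for tok in span}
    return {
        "ARG0": 1.0 if "nsubj" in deprels else 0.0,
        "ARG1": 1.0 if deprels & {"obj", "obl"} else 0.0,
    }


# Hand-annotated example sentence (hypothetical, not taken from a benchmark).
sentence = [
    Token("Marie", "PROPN", "nsubj"),
    Token("Curie", "PROPN", "flat"),
    Token("discovered", "VERB", "root"),
    Token("polonium", "NOUN", "obj"),
]

spans = identify_spans(sentence, toy_is_argument)
result = classify_spans(sentence, spans, toy_score)
for (start, end), label in result:
    print(label, " ".join(tok.form for tok in sentence[start:end]))
```

Separating the stages means the span detector never has to choose a label, and the classifier only ever sees complete candidate spans. That is the property this sketch is meant to show.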

Subject Classification

ACM Subject Classification
  • Computing methodologies → Information extraction
Keywords
  • Natural Language Processing
  • Information Extraction
  • Machine Learning

