Towards Learning Terminological Concept Systems from Multilingual Natural Language Text

Wachowiak, Lennart; Lang, Christian; Heinisch, Barbara; Gromann, Dagmar

doi:10.4230/OASIcs.LDK.2021.22

Abstract

Terminological Concept Systems (TCS) provide a means of organizing, structuring, and representing domain-specific multilingual information, and they are important for ensuring terminological consistency in many tasks, such as translation and cross-border communication. While several approaches to (semi-)automatic term extraction exist, learning the relations between terms remains largely underexplored. We propose an automated method to extract terms and relations across natural languages and specialized domains. To this end, we adapt pretrained multilingual neural language models and evaluate them on standard term extraction datasets, where they achieve the best results, and on a combination of standard relation extraction datasets, where they achieve competitive results. Code and dataset are publicly available.

