Question Answering For Toxicological Information Extraction

Ferreira, Bruno Carlos Luís; Gonçalo Oliveira, Hugo; Amaro, Hugo; Laranjeiro, Ângela; Silva, Catarina

doi:10.4230/OASIcs.SLATE.2022.3

Abstract

Working with large amounts of text data has become hectic and time-consuming. To reduce human effort and costs, and to make the process more efficient, companies and organizations resort to intelligent algorithms that automate and assist manual work. This problem is also present in the toxicological analysis of chemical substances, where information must be searched for across multiple documents. We therefore propose an approach that relies on Question Answering to acquire information from unstructured data, in our case English PDF documents describing the physicochemical and toxicological properties of chemical substances. Experimental results show that the approach is promising and applicable in a business scenario, especially if its output is further revised by humans.

Cite As

Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, and Catarina Silva. Question Answering For Toxicological Information Extraction. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/OASIcs.SLATE.2022.3

Author Details

Bruno Carlos Luís Ferreira

DEI, CISUC, University of Coimbra, Portugal

Hugo Gonçalo Oliveira

DEI, CISUC, University of Coimbra, Portugal

Hugo Amaro

LIS, Instituto Pedro Nunes, Portugal

Ângela Laranjeiro

Cosmedesk, Coimbra, Portugal

Catarina Silva

DEI, CISUC, University of Coimbra, Portugal

Funding

This work was partially funded by: the project SafetyDesk: Smart Toxicological Analysis of Chemical Substances (CENTRO-01-0247-FEDER-113485), co-financed by the European Regional Development Fund (FEDER), through Portugal 2020 (PT2020), and by the Regional Operational Programme Centro 2020; and national funds through the FCT - Foundation for Science and Technology, I.P., within the scope of the project CISUC - UID/CEC/00326/2020 and by the European Social Fund, through the Regional Operational Program Centro 2020.

