Question Answering For Toxicological Information Extraction

Authors: Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, Catarina Silva

File

OASIcs.SLATE.2022.3.pdf
  • Filesize: 0.66 MB
  • 10 pages

Author Details

Bruno Carlos Luís Ferreira
  • DEI, CISUC, University of Coimbra, Portugal
Hugo Gonçalo Oliveira
  • DEI, CISUC, University of Coimbra, Portugal
Hugo Amaro
  • LIS, Instituto Pedro Nunes, Portugal
Ângela Laranjeiro
  • Cosmedesk, Coimbra, Portugal
Catarina Silva
  • DEI, CISUC, University of Coimbra, Portugal

Cite As

Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, and Catarina Silva. Question Answering For Toxicological Information Extraction. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/OASIcs.SLATE.2022.3

Abstract

Working with large amounts of text data is laborious and time-consuming. To reduce human effort and costs and to make the process more efficient, companies and organizations resort to intelligent algorithms that automate and assist manual work. This problem also arises in the toxicological analysis of chemical substances, where information must be gathered from multiple documents. We therefore propose an approach that relies on Question Answering to acquire information from unstructured data, in our case English PDF documents describing the physicochemical and toxicological properties of chemical substances. Experimental results are promising and suggest that the approach is applicable in a business scenario, especially if its output is further revised by humans.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Information extraction
Keywords
  • Information Extraction
  • Question Answering
  • Transformers
  • Toxicological Analysis
