Bootstrapping a Data-Set and Model for Question-Answering in Portuguese (Short Paper)

Carvalho, Nuno Ramos; Simões, Alberto; Almeida, José João

doi:10.4230/OASIcs.SLATE.2021.18

Abstract

Question answering systems are mainly concerned with fulfilling an information query written in natural language, given a collection of documents with relevant information. They are key elements in many popular applications, such as personal assistants, chat-bots, or even FAQ-based online support systems.
This paper describes exploratory work carried out to build a state-of-the-art model for question-answering tasks in the Portuguese language, based on deep neural networks. We also describe the automatic construction of a data-set for training and testing the model.
The final model is not trained on any specific topic or context and is able to handle generic documents, achieving 50% accuracy on the testing data-set. While the results are not exceptional, this work can support further development in the area, as both the data-set and model are publicly available.

Cite As

Nuno Ramos Carvalho, Alberto Simões, and José João Almeida. Bootstrapping a Data-Set and Model for Question-Answering in Portuguese (Short Paper). In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 18:1-18:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/OASIcs.SLATE.2021.18

Author Details

Nuno Ramos Carvalho

Rua A 350 2E, 4810-217 Guimarães, Portugal

Alberto Simões

2Ai, School of Technology, IPCA, Barcelos, Portugal

José João Almeida

Centro Algoritmi, Departamento de Informática, University of Minho, Braga, Portugal

Funding

This project was partly funded through the European Regional Development Fund (FEDER), by Portuguese national funds (PIDDAC), through the FCT - Fundação para a Ciência e Tecnologia and FCT/MCTES under the scope of the projects "UIDB/05549/2020" and "UIDB/00319/2020".

Supplementary Materials

Software (Source Code) https://github.com/nunorc/qaptnet

