Bootstrapping a Data-Set and Model for Question-Answering in Portuguese (Short Paper)

Carvalho, Nuno Ramos; Simões, Alberto; Almeida, José João

doi:10.4230/OASIcs.SLATE.2021.18

Author Details

Nuno Ramos Carvalho

Rua A 350 2E, 4810-217 Guimarães, Portugal

Alberto Simões

2Ai, School of Technology, IPCA, Barcelos, Portugal

José João Almeida

Centro Algoritmi, Departamento de Informática, University of Minho, Braga, Portugal

Cite As

Nuno Ramos Carvalho, Alberto Simões, and José João Almeida. Bootstrapping a Data-Set and Model for Question-Answering in Portuguese (Short Paper). In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 18:1-18:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/OASIcs.SLATE.2021.18

Abstract

Question answering systems are mainly concerned with fulfilling an information query written in natural language, given a collection of documents with relevant information. They are key elements in many popular applications, such as personal assistants, chat-bots, and FAQ-based online support systems. This paper describes exploratory work carried out to develop a state-of-the-art model, based on deep neural networks, for question-answering tasks in the Portuguese language. We also describe the automatic construction of a data-set for training and testing the model. The final model is not trained on any specific topic or context and is able to handle generic documents, achieving 50% accuracy on the testing data-set. While the results are not exceptional, this work can support further development in the area, as both the data-set and model are publicly available.

Subject Classification

ACM Subject Classification

Computing methodologies → Discourse, dialogue and pragmatics

Keywords

Portuguese language
question answering
deep learning

