Bootstrapping a Data-Set and Model for Question-Answering in Portuguese (Short Paper)

Authors Nuno Ramos Carvalho, Alberto Simões, José João Almeida


  • Filesize: 469 kB
  • 5 pages

Author Details

Nuno Ramos Carvalho
  • Rua A 350 2E, 4810-217 Guimarães, Portugal
Alberto Simões
  • 2Ai, School of Technology, IPCA, Barcelos, Portugal
José João Almeida
  • Centro Algoritmi, Departamento de Informática, University of Minho, Braga, Portugal

Cite As

Nuno Ramos Carvalho, Alberto Simões, and José João Almeida. Bootstrapping a Data-Set and Model for Question-Answering in Portuguese (Short Paper). In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 18:1-18:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Abstract

Question answering systems are mainly concerned with fulfilling an information query written in natural language, given a collection of documents with relevant information. They are key elements in many popular applications, such as personal assistants, chat-bots, and FAQ-based online support systems. This paper describes exploratory work to build a state-of-the-art question-answering model for the Portuguese language, based on deep neural networks. We also describe the automatic construction of a data-set for training and testing the model. The final model is not trained on any specific topic or context and can handle generic documents, achieving 50% accuracy on the testing data-set. While the results are not exceptional, this work can support further development in the area, as both the data-set and the model are publicly available.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Discourse, dialogue and pragmatics
Keywords
  • Portuguese language
  • question answering
  • deep learning


