Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs

Authors: Hugo Gonçalo Oliveira, Sara Inácio, Catarina Silva



  • Filesize: 0.52 MB
  • 11 pages


Author Details

Hugo Gonçalo Oliveira
  • CISUC, DEI, University of Coimbra, Portugal
Sara Inácio
  • CISUC, DEI, University of Coimbra, Portugal
Catarina Silva
  • CISUC, DEI, University of Coimbra, Portugal

Cite As

Hugo Gonçalo Oliveira, Sara Inácio, and Catarina Silva. Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 19:1-19:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Abstract

Following the current interest in developing automatic question answering systems, we analyse alternative approaches for finding suitable answers in a list of Portuguese Frequently Asked Questions (FAQs). These approaches rely on different technologies, some more established and others more recent, and all are easily adapted to new lists of FAQs in new domains. We analyse the effort required to configure them, the accuracy of their answers, and the time they take to produce those answers. We conclude that traditional Information Retrieval (IR) can be a solution for smaller lists of FAQs, but that approaches based on deep neural networks for sentence encoding are at least as reliable and less dependent on the number and complexity of the FAQs. We also contribute a small dataset of Portuguese FAQs in the telecommunications domain, which was used in our experiments.
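The traditional IR option mentioned above can be illustrated with a minimal sketch of FAQ matching: the user query is compared with every stored FAQ question, and the answers of the best-scoring entries are returned. This is not the authors' setup. The abstract does not name the IR engine used, so the TF-IDF weighting, the `TfidfFaqMatcher` class and the sample Portuguese FAQs below are hypothetical illustrations only (the FAQs are not taken from the paper's dataset).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of word characters; \w is Unicode-aware in
    # Python 3, so accented Portuguese letters (ç, ã, é, ...) are preserved.
    return re.findall(r"\w+", text.lower())


class TfidfFaqMatcher:
    """Hypothetical IR baseline: ranks FAQ entries by TF-IDF cosine
    similarity between a user query and each FAQ question."""

    def __init__(self, faqs):
        # faqs: list of (question, answer) pairs
        self.faqs = faqs
        docs = [tokenize(q) for q, _ in faqs]
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        # Smoothed IDF, so terms present in every FAQ still get a small weight
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, tokens):
        tf = Counter(tokens)
        # Terms never seen in the FAQ questions get no weight
        vec = {t: c * self.idf[t] for t, c in tf.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # L2-normalise so the dot product below is a cosine similarity
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def rank(self, query, k=3):
        q = self._vectorize(tokenize(query))
        scores = [
            (sum(w * doc.get(t, 0.0) for t, w in q.items()), i)
            for i, doc in enumerate(self.vectors)
        ]
        # Highest score first; ties keep the original FAQ order
        scores.sort(key=lambda s: (-s[0], s[1]))
        return [(self.faqs[i][0], self.faqs[i][1], score) for score, i in scores[:k]]


# Hypothetical sample FAQs, for illustration only
faqs = [
    ("Como posso alterar o meu tarifário?", "Pode alterar o tarifário na área de cliente."),
    ("Como ativo o roaming?", "O roaming pode ser ativado na aplicação móvel."),
    ("Qual é o prazo de fidelização?", "O prazo de fidelização é indicado no contrato."),
]
matcher = TfidfFaqMatcher(faqs)
best = matcher.rank("quero ativar o roaming", k=1)[0]
print(best[0])  # the roaming FAQ, the only one sharing "roaming" with the query
```

Because this baseline compares surface tokens, a query only scores well when it shares words with a stored question: here *ativar* and *ativo* are treated as unrelated. A sentence-encoding approach would instead represent queries and questions as dense vectors from a pretrained model, which is what can let it match paraphrases that share few or no words.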

Subject Classification

ACM Subject Classification
  • Computing methodologies → Natural language processing

Keywords
  • Natural Language Processing
  • Portuguese
  • Question Answering
  • FAQs
  • Information Retrieval
  • Sentence Encoding
  • Transformers



