Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs

Gonçalo Oliveira, Hugo; Inácio, Sara; Silva, Catarina

doi:10.4230/OASIcs.SLATE.2022.19

Author Details

Hugo Gonçalo Oliveira

CISUC, DEI, University of Coimbra, Portugal

Sara Inácio

CISUC, DEI, University of Coimbra, Portugal

Catarina Silva

CISUC, DEI, University of Coimbra, Portugal

Cite As

Hugo Gonçalo Oliveira, Sara Inácio, and Catarina Silva. Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 19:1-19:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/OASIcs.SLATE.2022.19

Abstract

Following the current interest in developing automatic question answering systems, we analyse alternative approaches for finding suitable answers from a list of Frequently Asked Questions (FAQs) in Portuguese. These rely on different technologies, some more established and others more recent, and are all easily adaptable to new lists of FAQs in new domains. We analyse the effort required for their configuration, the accuracy of their answers, and the time they take to return such answers. We conclude that traditional Information Retrieval (IR) can be a solution for smaller lists of FAQs, but approaches based on deep neural networks for sentence encoding are at least as reliable and less dependent on the number and complexity of the FAQs. We also contribute a small dataset of Portuguese FAQs in the domain of telecommunications, which was used in our experiments.
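
To make the traditional IR option concrete, the following is a minimal sketch of FAQ matching with TF-IDF weights and cosine similarity, using only the Python standard library. It is an illustration under stated assumptions, not the configuration evaluated in the paper: the three FAQ entries are invented toy examples (not the telecommunications dataset), and the function names `tokenize`, `build_index`, `cosine` and `best_match` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy FAQs (question, answer); NOT the dataset from the paper.
FAQS = [
    ("Como posso consultar a minha fatura?",
     "Pode consultar a fatura na área de cliente."),
    ("Como altero o meu tarifário?",
     "Pode alterar o tarifário na loja ou na área de cliente."),
    ("O que fazer se ficar sem internet?",
     "Reinicie o router e contacte o apoio se o problema persistir."),
]


def tokenize(text):
    # Lowercase and keep runs of letters, including accented Portuguese ones.
    return re.findall(r"[^\W\d_]+", text.lower())


def build_index(questions):
    # Term counts per FAQ question, then smoothed IDF over the question list.
    docs = [Counter(tokenize(q)) for q in questions]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, faqs, idf, vectors):
    # Terms unseen in the FAQ questions are ignored when weighting the query.
    counts = Counter(tokenize(query))
    qvec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(qvec, v) for v in vectors]
    best = max(range(len(faqs)), key=scores.__getitem__)
    return faqs[best], scores[best]


IDF, VECTORS = build_index([question for question, _ in FAQS])

if __name__ == "__main__":
    (question, answer), score = best_match(
        "Onde vejo a fatura deste mês?", FAQS, IDF, VECTORS)
    print(question, "->", answer, f"(score={score:.2f})")
```

A sentence-encoding approach would keep the same ranking loop but replace the sparse TF-IDF vectors with dense embeddings from a pretrained encoder, so that matches no longer depend on exact word overlap between the user question and the FAQ.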

Subject Classification

ACM Subject Classification

Computing methodologies → Natural language processing

Keywords

Natural Language Processing
Portuguese
Question Answering
FAQs
Information Retrieval
Sentence Encoding
Transformers
