Using Lucene for Developing a Question-Answering Agent in Portuguese

Gonçalo Oliveira, Hugo; Filipe, Ricardo; Rodrigues, Ricardo; Alves, Ana

doi:10.4230/OASIcs.SLATE.2019.2

Abstract

Given the limitations of available platforms for creating conversational agents, and given that a question-answering agent suffices in many scenarios, we use the Information Retrieval library Lucene to develop such an agent for Portuguese. The described solution answers natural language questions based on an indexed list of FAQs, so adapting it to a different domain is a matter of changing the underlying list. Different configurations of this solution, mostly at the language analysis level, resulted in different search strategies, which were tested on answering questions about economic activity in Portugal. Besides comparing the search strategies, we concluded that combining their results with a voting method leads to better answers.
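The abstract says only that the results of the different search strategies are combined with a voting method. As an illustration, the following sketch assumes a Borda count, one common voting scheme, applied to the ranked lists of FAQ ids that each strategy returns. It is not presented as the authors' implementation. Each strategy would in practice be a Lucene search with a differently configured analyzer. Here the rankings are plain Python lists, so the example does not depend on Lucene. The function name `borda_combine`, the strategy names and the FAQ ids are all hypothetical.

```python
from collections import defaultdict


def borda_combine(rankings, depth=None):
    """Fuse several ranked lists of FAQ ids with a Borda count (assumed scheme).

    Each ranking lists FAQ ids, best first. With n = depth (or the length of
    the longest ranking), the id at 0-based position i earns n - i points;
    an id that a strategy did not return earns 0 points from that strategy.
    Ties are broken by the best position seen in any ranking, then by id.
    """
    if not rankings:
        return []
    n = depth if depth is not None else max(len(r) for r in rankings)
    scores = defaultdict(int)
    best_pos = {}
    for ranking in rankings:
        for i, faq_id in enumerate(ranking[:n]):
            scores[faq_id] += n - i
            best_pos[faq_id] = min(best_pos.get(faq_id, i), i)
    return sorted(scores, key=lambda f: (-scores[f], best_pos[f], f))


if __name__ == "__main__":
    # Hypothetical top-3 results of three search strategies for one question.
    results = {
        "tokens": ["faq3", "faq1", "faq7"],
        "stemmed": ["faq1", "faq3", "faq2"],
        "synonyms": ["faq1", "faq7", "faq3"],
    }
    fused = borda_combine(list(results.values()))
    print(fused)  # the agent would answer with the FAQ at fused[0]
```

With these inputs, `faq1` collects 2 + 3 + 3 = 8 points and `faq3` collects 3 + 2 + 1 = 6, so the fused ranking is `['faq1', 'faq3', 'faq7', 'faq2']`. A FAQ that several strategies place near the top thus wins even when no single strategy ranks it first everywhere, which is the intuition behind voting across strategies.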
