Using Lucene for Developing a Question-Answering Agent in Portuguese

Authors: Hugo Gonçalo Oliveira, Ricardo Filipe, Ricardo Rodrigues, Ana Alves

  • Filesize: 470 kB
  • 14 pages

Author Details

Hugo Gonçalo Oliveira
  • CISUC, Department of Informatics Engineering, University of Coimbra, Portugal
Ricardo Filipe
  • ISEC, Polytechnic Institute of Coimbra, Portugal
Ricardo Rodrigues
  • CISUC, University of Coimbra, Portugal
  • ESEC, Polytechnic Institute of Coimbra, Portugal
Ana Alves
  • CISUC, University of Coimbra, Portugal
  • ISEC, Polytechnic Institute of Coimbra, Portugal

Cite As

Hugo Gonçalo Oliveira, Ricardo Filipe, Ricardo Rodrigues, and Ana Alves. Using Lucene for Developing a Question-Answering Agent in Portuguese. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Abstract

Given the limitations of available platforms for creating conversational agents, and given that a question-answering agent suffices in many scenarios, we take advantage of the Information Retrieval library Lucene to develop such an agent for Portuguese. The solution described answers natural language questions based on an indexed list of FAQs. Adapting it to a different domain is a matter of changing the underlying list. Different configurations of this solution, mostly at the language analysis level, resulted in different search strategies, which were tested for answering questions about economic activity in Portugal. In addition to comparing the different search strategies, we concluded that combining the results of different strategies with a voting method leads to better answers.
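To illustrate the idea of combining search strategies by voting, here is a minimal sketch that merges several ranked lists of FAQ identifiers with a Borda-style count. The abstract does not say which voting method was used, so the Borda scoring is an assumption. The function name `borda_combine`, the FAQ identifiers, and the strategy labels in the comments are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def borda_combine(rankings, top_k=3):
    """Merge ranked lists of FAQ ids with a Borda-style count (assumed method).

    Each ranking lists FAQ ids from best to worst, as one search strategy
    might return them. An id at 0-based position p in a ranking of length n
    earns n - p points; ids missing from a ranking earn nothing from it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, faq_id in enumerate(ranking):
            scores[faq_id] += n - position
    # Highest total first; ties are broken by id so the result is deterministic.
    ordered = sorted(scores, key=lambda faq_id: (-scores[faq_id], faq_id))
    return ordered[:top_k]


# Hypothetical rankings produced by three strategies for the same question.
strategy_rankings = [
    ["faq_12", "faq_07", "faq_03"],  # e.g. a strategy without stemming
    ["faq_07", "faq_12", "faq_21"],  # e.g. a strategy with stemming
    ["faq_07", "faq_03", "faq_12"],  # e.g. a strategy with another analyzer
]

combined = borda_combine(strategy_rankings)
print(combined)
```

In this toy input, `faq_07` ranks first because two of the three strategies place it at the top, even though the first strategy prefers `faq_12`. In a real system the rankings would come from Lucene searches run with different analyzer configurations.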

Subject Classification

ACM Subject Classification
  • Information systems → Search interfaces
  • Information systems → Question answering
  • Computing methodologies → Natural language processing

Keywords
  • information retrieval
  • question answering
  • natural language interface
  • natural language processing
  • natural language understanding



