Distinguishing Different Classes of Utterances - the UC-PT Corpus

Authors: Mariana Gaspar Fernandes, Cátia Dias, Luísa Coheur


Author Details

Mariana Gaspar Fernandes
  • INESC-ID, Lisboa, Portugal
  • Instituto Superior Técnico, Universidade de Lisboa, Portugal
Cátia Dias
  • INESC-ID, Lisboa, Portugal
  • Instituto Superior Técnico, Universidade de Lisboa, Portugal
Luísa Coheur
  • INESC-ID, Lisboa, Portugal
  • Instituto Superior Técnico, Universidade de Lisboa, Portugal

Cite As

Mariana Gaspar Fernandes, Cátia Dias, and Luísa Coheur. Distinguishing Different Classes of Utterances - the UC-PT Corpus. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 14:1-14:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Abstract

Conversational bots are used in many scenarios; we can find them acting as museum guides or providing customer support, for instance. These bots base their answers on specific information related to their domain of expertise, but each user request also carries general information that, when properly identified, could help the agent decide what to answer. For example, the bot's search for a response will probably differ depending on whether the user is asking a question or uttering a statement. In this paper we present three corpora for the Portuguese language, collectively the UC-PT corpus, that can help conversational bots distinguish: a) questions from non-questions; b) yes-no questions from other types of questions; and c) personal from non-personal questions. With this information, the agent can decide, for instance, not to answer, to redirect the question to a persona chatbot, or to answer with a simple "yes", "no" or "maybe". In addition, we benchmark the classification process on these corpora. The corpora will be made publicly available.
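To make the decision step concrete, the following is a minimal sketch of how an agent might combine the three distinctions into a single routing choice. It is not the authors' implementation: the names `Action`, `UtteranceLabels` and `route`, the order in which the checks are applied, and the mapping of non-questions to "not answering" are all assumptions made for illustration. The three boolean labels are assumed to come from classifiers trained on the corresponding corpora.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical actions, loosely following the examples in the abstract."""
    NO_ANSWER = "no_answer"
    PERSONA_CHATBOT = "persona_chatbot"
    YES_NO_REPLY = "yes_no_reply"
    DOMAIN_SEARCH = "domain_search"


@dataclass(frozen=True)
class UtteranceLabels:
    """Outputs of three binary classifiers (assumed to exist elsewhere)."""
    is_question: bool   # a) question vs. non-question
    is_yes_no: bool     # b) yes-no question vs. other question types
    is_personal: bool   # c) personal vs. non-personal question


def route(labels: UtteranceLabels) -> Action:
    """Pick an action from the labels; the priority order is an assumption."""
    if not labels.is_question:
        # Assumption: statements are simply not answered.
        return Action.NO_ANSWER
    if labels.is_personal:
        # Personal questions are redirected to a persona chatbot.
        return Action.PERSONA_CHATBOT
    if labels.is_yes_no:
        # Yes-no questions can receive a short "yes" / "no" / "maybe".
        return Action.YES_NO_REPLY
    # Everything else falls through to the bot's domain-specific search.
    return Action.DOMAIN_SEARCH


if __name__ == "__main__":
    example = UtteranceLabels(is_question=True, is_yes_no=True, is_personal=False)
    print(route(example).value)
```

A different agent could reasonably reorder these checks, for example by letting a persona chatbot handle personal yes-no questions differently; the sketch only shows that the three labels are enough to drive such a decision.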

Subject Classification

ACM Subject Classification
  • Information systems → Question answering
  • Computing methodologies → Language resources
  • Social and professional topics → Computer and information systems training
  • Applied computing → Annotation
  • Computing methodologies → Supervised learning
  • Information systems → Information extraction

Keywords
  • Corpora
  • Questions
  • Conversational Agents
  • Portuguese Language


