Distinguishing Different Classes of Utterances - the UC-PT Corpus

Fernandes, Mariana Gaspar; Dias, Cátia; Coheur, Luísa

doi:10.4230/OASIcs.SLATE.2019.14

ACM Subject Classification

Information systems → Question answering
Computing methodologies → Language resources
Social and professional topics → Computer and information systems training
Applied computing → Annotation
Computing methodologies → Supervised learning
Information systems → Information extraction

Keywords

Corpora
Questions
Conversational Agents
Portuguese Language

Abstract

Conversational bots are used in many scenarios; we can find them acting as museum guides or providing customer support, for instance. These bots base their answers on specific information related to their domain of expertise, but each user request also carries general information that, when properly identified, could help the agent decide how to answer. For example, the bot’s search for a response will probably differ depending on whether the user is asking a question or uttering a statement. In this paper we present three corpora for the Portuguese language, collectively the UC-PT corpus, that can be used to help conversational bots distinguish: a) questions from non-questions; b) yes-no questions from other types of questions; and c) personal from non-personal questions. With this information, the agent can decide, for instance, not to answer, to redirect the question to a persona chatbot, or to answer it with a simple "yes", "no" or "maybe". In addition, we benchmark the classification process on these corpora. These corpora will be made publicly available.
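
As a rough illustration of how the three distinctions could feed an agent's decision, the following minimal sketch routes an utterance once the three binary labels are known. The names `Action`, `UtteranceLabels` and `choose_action` are hypothetical, and the mapping from labels to actions (including the order in which the labels are checked) is an assumption made for this example; the abstract only lists the possible actions and does not state which label triggers which one.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Possible agent reactions (names are hypothetical)."""
    NO_ANSWER = "no_answer"              # stay silent
    PERSONA_CHATBOT = "persona_chatbot"  # hand over to a persona chatbot
    SHORT_REPLY = "short_reply"          # reply "yes", "no" or "maybe"
    DOMAIN_SEARCH = "domain_search"      # regular domain-specific answer search


@dataclass(frozen=True)
class UtteranceLabels:
    """Outputs of three binary classifiers, one per corpus (assumed)."""
    is_question: bool
    is_yes_no: bool
    is_personal: bool


def choose_action(labels: UtteranceLabels) -> Action:
    """Map the three labels to an action; this mapping is an assumption."""
    if not labels.is_question:
        return Action.NO_ANSWER
    if labels.is_personal:
        return Action.PERSONA_CHATBOT
    if labels.is_yes_no:
        return Action.SHORT_REPLY
    return Action.DOMAIN_SEARCH
```

Under these assumptions, a personal yes-no question is sent to the persona chatbot because the personal check runs first; a real agent might order or combine the checks differently.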

Cite As

Mariana Gaspar Fernandes, Cátia Dias, and Luísa Coheur. Distinguishing Different Classes of Utterances - the UC-PT Corpus. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 14:1-14:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/OASIcs.SLATE.2019.14

Author Details

Mariana Gaspar Fernandes

INESC-ID, Lisboa, Portugal
Instituto Superior Técnico, Universidade de Lisboa, Portugal

Cátia Dias

INESC-ID, Lisboa, Portugal
Instituto Superior Técnico, Universidade de Lisboa, Portugal

Luísa Coheur

INESC-ID, Lisboa, Portugal
Instituto Superior Técnico, Universidade de Lisboa, Portugal
