Classification of Public Administration Complaints

Authors Francisco Caldeira, Luís Nunes, Ricardo Ribeiro




File

OASIcs.SLATE.2022.9.pdf
  • Filesize: 0.64 MB
  • 12 pages

Author Details

Francisco Caldeira
  • Iscte, University Institute of Lisbon, Portugal
Luís Nunes
  • Iscte, University Institute of Lisbon, Portugal
  • ISTAR, Lisbon, Portugal
Ricardo Ribeiro
  • Iscte, University Institute of Lisbon, Portugal
  • INESC-ID Lisbon, Portugal

Cite As

Francisco Caldeira, Luís Nunes, and Ricardo Ribeiro. Classification of Public Administration Complaints. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 9:1-9:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/OASIcs.SLATE.2022.9

Abstract

Complaint management is a problem faced by many organizations. It is vital to an organization's image with its customers and highly dependent on human resources. This work tackles part of the problem by classifying complaint summaries with machine learning models, so that complaints can be redirected more accurately to the appropriate responders. The main challenges of this task are that training datasets are often small and highly imbalanced, which can have a large impact on the performance of classification models. The dataset analyzed in this work suffers from both problems: it is relatively small and its labels occur in very different proportions. Two techniques are analyzed: merging classes to increase the number of examples in the resulting class, and generating new artificial examples for some classes via translation into other languages. The classification models explored were k-NN, SVM, Naïve Bayes, boosting, and Deep Learning approaches, including transformers. The paper concludes that, although the classes with little representation are, as expected, hard to classify, the techniques explored improved performance, especially for the classes with few examples. SVM and BERT-based models outperformed their peers.
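The two imbalance-handling techniques named in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The frequency threshold used to pick which classes to merge, the merged label name `MERGED`, and the round-trip (back-)translation scheme through pivot languages are all assumptions. `translate` is a hypothetical caller-supplied function standing in for whatever machine translation system is actually used, and the toy labels are invented for illustration.

```python
from collections import Counter
from typing import AbstractSet, Callable, Dict, List, Sequence, Tuple

# Hypothetical MT interface (assumption): translate(text, source_lang, target_lang) -> text
Translator = Callable[[str, str, str], str]


def rare_class_mapping(labels: Sequence[str], min_count: int,
                       merged_label: str = "MERGED") -> Dict[str, str]:
    """Map each label with fewer than `min_count` examples to `merged_label`;
    labels at or above the threshold map to themselves."""
    counts = Counter(labels)
    return {lab: (merged_label if n < min_count else lab)
            for lab, n in counts.items()}


def merge_classes(labels: Sequence[str], mapping: Dict[str, str]) -> List[str]:
    """Relabel the dataset; labels absent from `mapping` are left unchanged."""
    return [mapping.get(lab, lab) for lab in labels]


def augment_by_translation(texts: Sequence[str], labels: Sequence[str],
                           target_classes: AbstractSet[str],
                           translate: Translator,
                           source_lang: str = "pt",
                           pivot_langs: Sequence[str] = ("en", "es"),
                           ) -> Tuple[List[str], List[str]]:
    """Append round-trip translations (source -> pivot -> source) of every
    example whose label is in `target_classes`, one per pivot language.
    Round trips that return the original text unchanged are skipped."""
    new_texts, new_labels = list(texts), list(labels)
    for text, lab in zip(texts, labels):
        if lab not in target_classes:
            continue
        for pivot in pivot_langs:
            back = translate(translate(text, source_lang, pivot),
                             pivot, source_lang)
            if back != text:
                new_texts.append(back)
                new_labels.append(lab)
    return new_texts, new_labels


if __name__ == "__main__":
    # Toy labels (hypothetical, not from the paper's dataset).
    labels = ["roads"] * 5 + ["noise"] * 2 + ["parks"]
    mapping = rare_class_mapping(labels, min_count=3)
    print(Counter(merge_classes(labels, mapping)))

    # Stand-in translator: leaves text unchanged when translating back to
    # Portuguese, otherwise just tags it with the target language.
    def fake_translate(text: str, src: str, tgt: str) -> str:
        return text if tgt == "pt" else f"{text} [{tgt}]"

    texts, labs = augment_by_translation(["buraco na estrada", "barulho a noite"],
                                         ["roads", "noise"], {"noise"},
                                         fake_translate)
    print(list(zip(texts, labs)))
```

Because `merge_classes` takes an explicit mapping, a hand-written mapping that groups semantically related complaint categories could replace the frequency-based one. Augmentation of this kind is normally applied only to the training split, so that translated copies of test examples do not leak into training.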

Subject Classification

ACM Subject Classification
  • Information systems → Clustering and classification
Keywords
  • Text Classification
  • Natural Language Processing
  • Deep Learning
  • BERT
