
Less is more in incident categorization (Short Paper)

Authors Sara Silva, Ricardo Ribeiro, Rubén Pereira



Author Details

Sara Silva
  • Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal
Ricardo Ribeiro
  • INESC-ID Lisboa, Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal
Rubén Pereira
  • Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal

Cite As

Sara Silva, Ricardo Ribeiro, and Rubén Pereira. Less is more in incident categorization (Short Paper). In 7th Symposium on Languages, Applications and Technologies (SLATE 2018). Open Access Series in Informatics (OASIcs), Volume 62, pp. 17:1-17:7, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)


Abstract

The IT incident management process requires correct categorization so that incident tickets are assigned to the right resolution group and the system becomes operational again as quickly as possible, with minimal impact on the business and its customers. In this work, we introduce automatic text classification, demonstrating the application of several natural language processing techniques and analyzing the impact of each one on a dataset of real incident tickets. The techniques that we explore in pre-processing the text that describes an incident are the following: tokenization, stemming, stop-word removal, named-entity recognition, and TF×IDF-based document representation. Finally, to build the model and observe the results after applying these techniques, we use two machine learning algorithms: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). Two important findings result from this study: a short description of an incident leads to better categorization than a full description; and pre-processing has little impact on incident categorization, mainly due to the specific vocabulary used in this type of text.
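As a rough illustration of the kind of pipeline the abstract describes, the following is a minimal sketch using only the Python standard library. It covers tokenization, stop-word removal, a TF×IDF document representation, and a cosine-similarity K-Nearest Neighbor classifier. It is not the authors' implementation: the tickets, the resolution group labels, the stop-word list, and all function names are hypothetical, the IDF formula is one common smoothed variant chosen here as an assumption, and stemming, named-entity recognition, and the SVM classifier are left out to keep the sketch short.

```python
import math
import re
from collections import Counter

# Hypothetical toy tickets (short description, resolution group).
# These are illustrative only and do not come from the paper's dataset.
TRAIN = [
    ("password reset for email account", "accounts"),
    ("cannot login password expired", "accounts"),
    ("printer not printing paper jam", "hardware"),
    ("laptop screen broken", "hardware"),
]

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"for", "the", "a", "an", "not", "is", "to", "my", "on", "and"}


def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stop-words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def fit_idf(token_lists):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(text, train_vectors, train_labels, idf, k=1):
    """Return the majority label among the k most similar training tickets."""
    query = tfidf(tokenize(text), idf)
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Fit the representation on the training tickets, then classify a new one.
train_tokens = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(train_tokens)
train_vectors = [tfidf(tokens, idf) for tokens in train_tokens]
train_labels = [label for _, label in TRAIN]

print(knn_predict("forgot my password", train_vectors, train_labels, idf))
```

To compare short and full descriptions in the spirit of the study, one could fit the same functions once on each text field and compare the accuracy on held-out tickets; that comparison is not shown here.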

Subject Classification

ACM Subject Classification
  • Computing methodologies → Natural language processing

Keywords
  • machine learning
  • automated incident categorization
  • SVM
  • incident management
  • natural language



