Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments

Authors: Bernardo Cunha Matos, Raquel Bento Santos, Paula Carvalho, Ricardo Ribeiro, Fernando Batista



Author Details

Bernardo Cunha Matos
  • INESC-ID Lisbon, Portugal
  • Instituto Superior Técnico, Lisbon, Portugal
Raquel Bento Santos
  • INESC-ID Lisbon, Portugal
  • Instituto Superior Técnico, Lisbon, Portugal
Paula Carvalho
  • INESC-ID Lisbon, Portugal
Ricardo Ribeiro
  • INESC-ID Lisbon, Portugal
  • Iscte - University Institute of Lisbon, Portugal
Fernando Batista
  • INESC-ID Lisbon, Portugal
  • Iscte - University Institute of Lisbon, Portugal

Cite As

Bernardo Cunha Matos, Raquel Bento Santos, Paula Carvalho, Ricardo Ribeiro, and Fernando Batista. Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 10:1-10:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/OASIcs.SLATE.2022.10

Abstract

Online Hate Speech (OHS) has been growing dramatically on social media, which has motivated researchers to develop a variety of methods for its automated detection. However, the detection of OHS in Portuguese remains understudied. To fill this gap, we explored different models that have proved successful in the literature for this task. In particular, we explored transfer learning approaches based on existing BERT-like pre-trained models. The experiments were based on CO-HATE, a corpus of YouTube comments posted by the Portuguese online community and manually labeled by different annotators. Among other categories, the comments were labeled regarding the presence of hate speech and its type, specifically overt and covert hate speech. We assessed the impact of using annotations from different annotators on the performance of such models, and we analyzed the impact of distinguishing overt from covert hate speech. The results show the importance of considering the annotator's profile in the development of hate speech detection models. Regarding the hate speech type, the results do not allow us to draw any conclusion on which type is easier to detect. Finally, we show that pre-processing does not seem to have a significant impact on performance for this specific task.
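To illustrate the kind of transfer learning approach described in the abstract, the following is a minimal sketch of fine-tuning a BERT-like pre-trained model for binary hate speech classification with the Hugging Face Transformers library. It is not the authors' actual pipeline: the checkpoint (a Portuguese BERT model), the placeholder data, the column names, and the hyperparameters are illustrative assumptions; in practice, the CO-HATE comments and labels would replace the toy examples.

# Minimal fine-tuning sketch (illustrative, not the paper's exact setup):
# binary hate speech classification with a BERT-like Portuguese checkpoint.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "neuralmind/bert-base-portuguese-cased"  # assumed checkpoint

# Placeholder comments with binary labels (0 = non-hateful, 1 = hateful);
# in practice these would come from the CO-HATE corpus.
data = {
    "text": ["comentário de exemplo um", "comentário de exemplo dois"],
    "label": [0, 1],
}
dataset = Dataset.from_dict(data).train_test_split(test_size=0.5)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Truncate and pad each comment to a fixed maximum length.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# Add a classification head on top of the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

training_args = TrainingArguments(
    output_dir="hate_speech_model",   # illustrative output directory
    num_train_epochs=3,               # assumed hyperparameters
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()

Pre-processing variants (e.g., stripping URLs or user mentions before tokenization) could be compared in such a setup by mapping a cleaning function over the text column before tokenization.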

Subject Classification

ACM Subject Classification
  • Computing methodologies → Transfer learning
  • Social and professional topics → Hate speech
  • Computing methodologies → Supervised learning
  • Computing methodologies → Machine learning approaches
  • Information systems → Clustering and classification
Keywords
  • Hate Speech
  • Text Classification
  • Transfer Learning
  • Supervised Learning
  • Deep Learning
