A Review and Cluster Analysis of German Polarity Resources for Sentiment Analysis

Kern, Bettina M. J.; Baumann, Andreas; Kolb, Thomas E.; Sekanina, Katharina; Hofmann, Klaus; Wissik, Tanja; Neidhardt, Julia

doi:10.4230/OASIcs.LDK.2021.37

Abstract

The domain of German polarity dictionaries is heterogeneous, with many small dictionaries created for different purposes and built with different methods. This paper aims to map out the landscape of freely available German polarity dictionaries by clustering them to uncover similarities and shared features. We find that, although most dictionaries seem to agree in their assessment of a word’s sentiment, subsets of them form groups of interrelated dictionaries. These dependencies are in most cases a direct reflection of how these dictionaries were designed and compiled. As a consequence, we argue that sentiment evaluation should be based on multiple and diverse sentiment resources in order to avoid error propagation and amplification of potential biases.
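The idea of grouping dictionaries by how similarly they rate shared words can be illustrated with a minimal sketch. The abstract does not specify the clustering procedure, so everything below is an assumption made for illustration: the three lexicons and their scores are toy data, `agreement` is a hypothetical similarity measure (share of common words with the same polarity sign), and `cluster` is a simple single-linkage grouping with an arbitrary threshold, not the method used in the paper.

```python
from itertools import combinations

# Toy, made-up lexicons (hypothetical names and scores, not real dictionary data).
lexicons = {
    "dict_a": {"gut": 1.0, "schlecht": -1.0, "schön": 0.8, "krieg": -0.9},
    "dict_b": {"gut": 0.7, "schlecht": -0.6, "schön": 0.9, "krieg": -1.0},
    "dict_c": {"gut": 0.5, "schlecht": -0.4, "schön": -0.2, "krieg": 0.3},
}


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def agreement(lex1, lex2):
    """Share of words present in both lexicons that get the same polarity sign."""
    shared = lex1.keys() & lex2.keys()
    if not shared:
        return 0.0
    same = sum(sign(lex1[w]) == sign(lex2[w]) for w in shared)
    return same / len(shared)


def cluster(lexicons, threshold):
    """Single-linkage grouping (assumed for illustration): merge two clusters
    whenever any cross-cluster pair of lexicons agrees at least `threshold`."""
    clusters = [{name} for name in lexicons]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if any(
                agreement(lexicons[a], lexicons[b]) >= threshold
                for a in c1
                for b in c2
            ):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan over the updated cluster list
    return sorted(sorted(c) for c in clusters)


groups = cluster(lexicons, threshold=0.9)
print(groups)
```

In this toy setup `dict_a` and `dict_b` end up in one group because they agree on every shared word, while `dict_c` stays apart. A group like this would be a hint that its members are not independent sources, which is the situation the abstract warns about when it recommends relying on multiple and diverse resources.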

