Supporting the Annotation Experience Through CorEx and Word Mover’s Distance

Author Stefania Pecòre


Author Details

Stefania Pecòre
  • School of Electrical Engineering and Computer Science, University of Ottawa, Canada


We thank MITACS and SafeToNet Canada for their generous funding. We also thank the University of Ottawa and the project supervisor, Professor Diana Inkpen, for their support.

Cite As

Stefania Pecòre. Supporting the Annotation Experience Through CorEx and Word Mover’s Distance. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 12:1-12:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Online communities can be used to promote destructive behaviours, as in pro-Eating Disorder (ED) communities. Research needs annotated data to study these phenomena. Even though many platforms have already moderated this type of content, Twitter has not, and it can still be used for research purposes. In this paper, we unveiled emojis, words, and uncommon linguistic patterns within the ED Twitter community by using the Correlation Explanation (CorEx) algorithm on unstructured and non-annotated data to retrieve the topics. We then annotated the dataset following these topics. Finally, we analysed the use of CorEx and Word Mover’s Distance to automatically retrieve similar new sentences and augment the annotated dataset.
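
The retrieval idea in the abstract, ranking unlabelled sentences by their distance to an already annotated one, can be illustrated with a small sketch. This is not the paper's implementation: it computes the relaxed lower bound on Word Mover's Distance described by Kusner et al. rather than the full optimal-transport distance, and the two-dimensional word vectors, the `TOY_VECTORS` table, and all function names are hypothetical, chosen only to keep the example self-contained.

```python
import math
from collections import Counter

# Hypothetical 2-d word vectors, invented for illustration only.
# Real use would load pretrained embeddings (e.g. word2vec or fastText).
TOY_VECTORS = {
    "skip": (0.0, 1.0),
    "miss": (0.1, 0.9),
    "meal": (1.0, 0.0),
    "dinner": (0.9, 0.1),
    "love": (-1.0, -1.0),
    "music": (-0.8, -1.2),
}


def nbow(tokens):
    """Normalised bag-of-words: token -> share of the sentence's mass.

    Tokens without a vector are ignored.
    """
    counts = Counter(t for t in tokens if t in TOY_VECTORS)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def one_sided_cost(src, dst):
    """Move each source word's whole mass to its nearest destination word."""
    return sum(
        weight * min(math.dist(TOY_VECTORS[w], TOY_VECTORS[v]) for v in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b):
    """Lower bound on WMD: the larger of the two one-sided relaxations."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    if not a or not b:
        return float("inf")  # nothing comparable in one of the sentences
    return max(one_sided_cost(a, b), one_sided_cost(b, a))


def rank_candidates(seed, candidates):
    """Sort unlabelled candidate sentences by distance to an annotated seed."""
    return sorted(candidates, key=lambda c: relaxed_wmd(seed, c))


if __name__ == "__main__":
    seed = ["skip", "meal"]
    pool = [["love", "music"], ["miss", "dinner"]]
    for sentence in rank_candidates(seed, pool):
        print(sentence, round(relaxed_wmd(seed, sentence), 3))
```

In this toy setting the candidate sharing no words with the seed but lying close to it in the embedding space is ranked first, which is the property that makes such a distance useful for proposing new sentences to an annotator.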

Subject Classification

ACM Subject Classification
  • Applied computing → Document management and text processing
  • Applied computing → Annotation

Keywords
  • topic retrieval
  • annotation
  • eating disorders
  • natural language processing



