Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets

Authors: Omnia Zayed, John P. McCrae, Paul Buitelaar



File

OASIcs.LDK.2019.10.pdf
  • Filesize: 0.64 MB
  • 17 pages

Author Details

Omnia Zayed
  • Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, IDA Business Park, Lower Dangan, Galway, Ireland
John P. McCrae
  • Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, IDA Business Park, Lower Dangan, Galway, Ireland
Paul Buitelaar
  • Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, IDA Business Park, Lower Dangan, Galway, Ireland

Cite As

Omnia Zayed, John P. McCrae, and Paul Buitelaar. Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 10:1-10:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/OASIcs.LDK.2019.10

Abstract

Metaphor is one of the most important elements of human communication, especially in informal settings such as social media. A number of datasets have been created for metaphor identification; however, the task has proven difficult due to the nebulous nature of metaphoricity. In this paper, we present a crowd-sourcing approach for creating a metaphor identification dataset that rapidly achieves large coverage of the different usages of metaphor in a given corpus while maintaining high accuracy. We validate this methodology by creating a set of 2,500 manually annotated tweets in English, for which we achieve inter-annotator agreement scores over 0.8, higher than other reported results that did not limit the task. The methodology uses an existing metaphor classifier to assist in identifying and selecting the examples for annotation, in a way that reduces the cognitive load on annotators and enables quick and accurate annotation. We selected a corpus of both general-language tweets and political tweets relating to Brexit, and we compare the resulting corpus across these two domains. As a result of this work, we have published the first dataset of tweets annotated for metaphors, which we believe will be invaluable for the development, training and evaluation of approaches for metaphor identification in tweets.
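
The abstract reports inter-annotator agreement scores over 0.8 but does not name the statistic. As an illustration only, the sketch below assumes a multi-rater measure such as Fleiss' kappa; the function name, the input layout (one row per item, one column per category) and the toy counts are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j.

    Assumes every item was labelled by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two annotators")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_cats = len(counts[0])
    total = n_items * n_raters

    # Share of all assignments that went to each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_cats)]

    # Observed agreement per item: fraction of annotator pairs that agree.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Agreement expected if annotators labelled at random with p_cat.
    expected = sum(p * p for p in p_cat)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (observed - expected) / (1.0 - expected)


# Toy example (hypothetical data): 4 candidate expressions, 3 annotators,
# categories are [metaphor, not metaphor].
toy_counts = [[3, 0], [2, 1], [0, 3], [1, 2]]
toy_kappa = fleiss_kappa(toy_counts)
```

A value near 1 indicates near-perfect agreement and a value near 0 indicates agreement no better than chance, so a score above 0.8 sits at the high end of the scale.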

Subject Classification

ACM Subject Classification
  • Computing methodologies → Natural language processing
  • Computing methodologies → Language resources
Keywords
  • metaphor
  • identification
  • tweets
  • dataset
  • annotation
  • crowd-sourcing
