Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets

Zayed, Omnia; McCrae, John P.; Buitelaar, Paul

doi:10.4230/OASIcs.LDK.2019.10

Author Details

Omnia Zayed

Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, IDA Business Park, Lower Dangan, Galway, Ireland

John P. McCrae

Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, IDA Business Park, Lower Dangan, Galway, Ireland

Paul Buitelaar

Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, IDA Business Park, Lower Dangan, Galway, Ireland

Cite As

Omnia Zayed, John P. McCrae, and Paul Buitelaar. Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/OASIcs.LDK.2019.10

Abstract

Metaphor is one of the most important elements of human communication, especially in informal settings such as social media. A number of datasets have been created for metaphor identification; however, the task has proven difficult due to the nebulous nature of metaphoricity. In this paper, we present a crowd-sourcing approach for creating a metaphor identification dataset that rapidly achieves broad coverage of the different usages of metaphor in a given corpus while maintaining high accuracy. We validate this methodology by creating a set of 2,500 manually annotated English tweets, for which we achieve inter-annotator agreement scores over 0.8, higher than other reported results that did not limit the task. The methodology uses an existing metaphor classifier to assist in identifying and selecting examples for annotation, which reduces the cognitive load on annotators and enables quick and accurate annotation. We selected a corpus of both general-language tweets and political tweets relating to Brexit, and we compare the resulting corpus across these two domains. As a result of this work, we have published the first dataset of tweets annotated for metaphors, which we believe will be invaluable for the development, training and evaluation of approaches to metaphor identification in tweets.
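
The abstract reports inter-annotator agreement above 0.8 but does not name the statistic. Purely as an illustration, and assuming a multi-rater measure such as Fleiss' kappa, the sketch below shows how such a score could be computed from per-tweet label counts. The function name, the two-category layout (metaphor / not metaphor) and the toy data are hypothetical and are not taken from the paper or its dataset.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table of label counts.

    counts[i][j] is the number of annotators who assigned item i
    to category j. Every item must have the same number of ratings.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two ratings per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement per item, then averaged over all items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        # All ratings fall in one category; agreement is trivially perfect.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 4 tweets, 5 annotators each,
# columns are [metaphor, not metaphor].
toy_counts = [[4, 1], [1, 4], [5, 0], [0, 5]]
kappa = fleiss_kappa(toy_counts)
```

On this toy table the observed agreement is 0.8 and the chance agreement is 0.5, giving a kappa of about 0.6; scores above 0.8, as reported in the abstract, are conventionally read as very strong agreement.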

Subject Classification

ACM Subject Classification

Computing methodologies → Natural language processing
Computing methodologies → Language resources

Keywords

metaphor
identification
tweets
dataset
annotation
crowd-sourcing

