Search Results

Documents authored by Zayed, Omnia


Document
Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets

Authors: Omnia Zayed, John P. McCrae, and Paul Buitelaar

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
Metaphor is one of the most important elements of human communication, especially in informal settings such as social media. There have been a number of datasets created for metaphor identification, however, this task has proven difficult due to the nebulous nature of metaphoricity. In this paper, we present a crowd-sourcing approach for the creation of a dataset for metaphor identification, that is able to rapidly achieve large coverage over the different usages of metaphor in a given corpus while maintaining high accuracy. We validate this methodology by creating a set of 2,500 manually annotated tweets in English, for which we achieve inter-annotator agreement scores over 0.8, which is higher than other reported results that did not limit the task. This methodology is based on the use of an existing classifier for metaphor in order to assist in the identification and the selection of the examples for annotation, in a way that reduces the cognitive load for annotators and enables quick and accurate annotation. We selected a corpus of both general language tweets and political tweets relating to Brexit and we compare the resulting corpus on these two domains. As a result of this work, we have published the first dataset of tweets annotated for metaphors, which we believe will be invaluable for the development, training and evaluation of approaches for metaphor identification in tweets.

Cite as

Omnia Zayed, John P. McCrae, and Paul Buitelaar. Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{zayed_et_al:OASIcs.LDK.2019.10,
  author =	{Zayed, Omnia and McCrae, John P. and Buitelaar, Paul},
  title =	{{Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{10:1--10:17},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.10},
  URN =		{urn:nbn:de:0030-drops-103740},
  doi =		{10.4230/OASIcs.LDK.2019.10},
  annote =	{Keywords: metaphor, identification, tweets, dataset, annotation, crowd-sourcing}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail