Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study

Inel, Oana; Aroyo, Lora

doi:10.4230/OASIcs.LDK.2019.12

Author Details

Oana Inel

Delft University of Technology, The Netherlands
Vrije Universiteit Amsterdam, The Netherlands

Lora Aroyo

Google Research, New York, US

Cite As

Oana Inel and Lora Aroyo. Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 12:1-12:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/OASIcs.LDK.2019.12

Abstract

Event detection remains a difficult task because events are complex and ambiguous entities. On the one hand, inter-annotator agreement among experts annotating events is low, despite the multitude of existing annotation guidelines and their numerous revisions. On the other hand, event extraction systems achieve lower F1-scores on events than on other types of entities such as people or locations. In this paper we study the consistency and completeness of expert-annotated datasets for events and time expressions, and we propose a data-agnostic methodology for validating such datasets along these two dimensions. Furthermore, we combine the power of crowds and machines to correct and extend expert-annotated datasets of events. We show the benefit of using crowd-annotated events to train and evaluate a state-of-the-art event extraction system: the crowd-annotated events increase the performance of the system by at least 5.3%.
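
The F1-score mentioned in the abstract is the harmonic mean of precision and recall. The sketch below is only an illustration of that standard formula, not the evaluation setup used in the paper: it assumes exact-match scoring over event spans, and the span representation, the function name `precision_recall_f1`, and the toy data are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span representation: (document id, start offset, end offset).
Span = Tuple[str, int, int]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between gold and predicted spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (invented for illustration): 4 gold events, 3 predictions, 2 matches.
gold = {("doc1", 0, 5), ("doc1", 10, 18), ("doc2", 3, 9), ("doc2", 20, 27)}
predicted = {("doc1", 0, 5), ("doc2", 3, 9), ("doc2", 30, 35)}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Under exact-match scoring, an event missing from the gold annotations counts against a system that correctly detects it, which is one reason the completeness of the reference dataset affects measured performance.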

Subject Classification

ACM Subject Classification

Information systems → Crowdsourcing
Human-centered computing → Empirical studies in HCI
Computing methodologies → Machine learning

Keywords

Crowdsourcing
Human-in-the-Loop
Event Extraction
Time Extraction
