Animacy Detection in Stories

Karsdorp, Folgert; van der Meulen, Marten; Meder, Theo; van den Bosch, Antal

doi:10.4230/OASIcs.CMN.2015.82

Abstract

This paper presents a linguistically uninformed computational model for animacy classification. The model makes use of word n-grams in combination with lower-dimensional word embedding representations that are learned from a web-scale corpus. We compare the model to a number of linguistically informed models that use features such as dependency tags and show that it achieves competitive results. We apply our animacy classifier to a large collection of Dutch folktales to obtain a list of all characters in the stories. We then draw a semantic map of all automatically extracted characters, which provides a unique entry point to the collection.
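The abstract does not spell out the pipeline, so the Python sketch below is only an illustration under stated assumptions. Each token is represented by sparse word n-gram indicators (the word itself, its immediate neighbors, and the preceding bigram) combined with the dimensions of a word embedding. A logistic regression classifier, trained here with plain stochastic gradient descent, then outputs a probability that the token is animate. The window size, the choice of classifier, the toy two-dimensional embeddings, the training sentences, and all function names (`ngram_features`, `featurize`, `train`, `prob_animate`, `extract_characters`) are hypothetical. A real system would use embeddings learned from a web-scale corpus and training labels from annotated stories.

```python
import math
from collections import defaultdict

# Hypothetical toy embeddings standing in for vectors learned from a
# web-scale corpus; real embeddings have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "king": (1.0, 0.0), "witch": (0.9, 0.2), "wolf": (0.8, 0.1),
    "castle": (0.0, 1.0), "table": (0.2, 0.9), "apple": (0.1, 0.8),
    "queen": (0.95, 0.05), "stone": (0.05, 0.95),
}


def ngram_features(tokens, i, window=1):
    """Binary word n-gram features for the token at position i (no parsing)."""
    word = tokens[i].lower()
    feats = {f"w={word}": 1.0}
    for w in tokens[max(0, i - window):i]:
        feats[f"left={w.lower()}"] = 1.0
    for w in tokens[i + 1:i + 1 + window]:
        feats[f"right={w.lower()}"] = 1.0
    if i > 0:  # bigram of the preceding word and the target word
        feats[f"bigram={tokens[i - 1].lower()}_{word}"] = 1.0
    return feats


def featurize(tokens, i, embeddings):
    """Combine sparse n-gram indicators with dense embedding dimensions."""
    feats = ngram_features(tokens, i)
    vec = embeddings.get(tokens[i].lower())
    if vec is not None:  # out-of-vocabulary words keep only n-gram features
        feats.update({f"emb{k}": v for k, v in enumerate(vec)})
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(examples, epochs=50, lr=0.5):
    """Unregularized logistic regression fit by SGD on sparse feature dicts."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for feats, label in examples:
            p = sigmoid(bias + sum(weights[f] * v for f, v in feats.items()))
            step = lr * (label - p)  # gradient step on the log-likelihood
            bias += step
            for f, v in feats.items():
                weights[f] += step * v
    return weights, bias


def prob_animate(model, feats):
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) * v for f, v in feats.items()))


def extract_characters(sentences, model, embeddings, threshold=0.5):
    """Collect candidate words the classifier labels as animate.

    Only words that have an embedding are scored: a crude, assumed stand-in
    for the noun filter a real pipeline would apply.
    """
    chars = set()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok.lower() not in embeddings:
                continue
            if prob_animate(model, featurize(tokens, i, embeddings)) > threshold:
                chars.add(tok.lower())
    return chars


# Hypothetical training data: (tokens, target index, 1 = animate / 0 = inanimate).
TRAIN = [
    ("the king walked home".split(), 1, 1),
    ("the table was broken".split(), 1, 0),
    ("the witch laughed loudly".split(), 1, 1),
    ("the castle stood empty".split(), 1, 0),
    ("the wolf said nothing".split(), 1, 1),
    ("the apple fell down".split(), 1, 0),
]
examples = [(featurize(toks, i, TOY_EMBEDDINGS), y) for toks, i, y in TRAIN]
model = train(examples)

stories = [["the", "queen", "slept"], ["the", "stone", "rolled", "away"]]
characters = extract_characters(stories, model, TOY_EMBEDDINGS)
print(sorted(characters))
```

In this toy setup the unseen words *queen* and *stone* carry no learned n-gram weight, so their classification rests almost entirely on the embedding dimensions. That illustrates one plausible reason for pairing sparse n-grams with dense embeddings: the embeddings let the classifier generalize to words that never occurred in the annotated training data.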
