Having access to a large set of stories is a necessary first step for robust and wide-ranging computational narrative modeling; happily, language data - including stories - are increasingly available in electronic form. Unhappily, the process of automatically separating stories from other forms of written discourse is not straightforward, and this has resulted in a data collection bottleneck. Therefore, researchers have sought to develop reliable, robust automatic algorithms for identifying story text mixed in with non-story text. In this paper we report on the reimplementation and experimental comparison of the two approaches to this task: Gordon's unigram classifier and Corman's semantic triplet classifier. We cross-analyze their performance on both Gordon's and Corman's corpora, discuss similarities, differences, and gaps in the classifiers' performance, and point the way forward to improving both approaches.
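To make the unigram baseline concrete, the sketch below shows a minimal bag-of-words story/non-story classifier. It is an illustration only: the library choice (scikit-learn), the logistic regression model, and the toy labeled passages are assumptions for demonstration, not Gordon's actual features, model, or training data.

```python
# Minimal sketch of a unigram (bag-of-words) story/non-story classifier.
# Illustrative assumptions: scikit-learn pipeline, logistic regression,
# and hypothetical toy passages; not the paper's actual implementation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy examples standing in for labeled story / non-story text.
texts = [
    "Once upon a time a girl walked into the dark forest and met a wolf.",
    "He packed his bags, said goodbye, and boarded the last train home.",
    "The quarterly report shows revenue increased by three percent.",
    "Preheat the oven to 350 degrees and mix the flour with the sugar.",
]
labels = [1, 1, 0, 0]  # 1 = story, 0 = non-story

# Unigram (single-word) count features feeding a linear classifier.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 1)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["The knight rode out at dawn to rescue the prisoner."]))
```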