Comparing Extant Story Classifiers: Results & New Directions

Authors: Joshua D. Eisenberg, W. Victor H. Yarlott, Mark A. Finlayson



  • Filesize: 403 kB
  • 10 pages



Cite As

Joshua D. Eisenberg, W. Victor H. Yarlott, and Mark A. Finlayson. Comparing Extant Story Classifiers: Results & New Directions. In 7th Workshop on Computational Models of Narrative (CMN 2016). Open Access Series in Informatics (OASIcs), Volume 53, pp. 6:1-6:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Having access to a large set of stories is a necessary first step for robust and wide-ranging computational narrative modeling; happily, language data (including stories) are increasingly available in electronic form. Unhappily, the process of automatically separating stories from other forms of written discourse is not straightforward, and this has resulted in a data collection bottleneck. Researchers have therefore sought to develop reliable, robust automatic algorithms for identifying story text mixed with other, non-story text. In this paper we report on the reimplementation and experimental comparison of the two extant approaches to this task: Gordon's unigram classifier and Corman's semantic triplet classifier. We cross-analyze their performance on both Gordon's and Corman's corpora, discuss similarities, differences, and gaps in the performance of these classifiers, and point the way toward improving their approaches.
  • Story Detection
  • Machine Learning
  • Natural Language Processing
  • Perceptron Learning
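The abstract names Gordon's unigram classifier and the keywords list perceptron learning. As a rough illustration only, the sketch below shows what a unigram (bag-of-words) story/non-story classifier trained with a plain Rosenblatt-style perceptron might look like, together with the story-class precision, recall, and F1 one would report when training on one corpus and testing on another. It is not the authors' reimplementation, and it makes no claim about the exact learner, tokenization, or features used by Gordon or Corman; the names `unigram_features`, `UnigramPerceptron`, and `precision_recall_f1`, the regex tokenizer, the epoch count, and the toy sentences are all hypothetical.

```python
# Illustrative sketch only: names, tokenizer, and toy data are hypothetical,
# not the authors' code and not drawn from Gordon's or Corman's corpora.
import re
from collections import Counter


def unigram_features(text):
    """Hypothetical tokenizer: lowercase word unigrams mapped to their counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class UnigramPerceptron:
    """Binary perceptron over sparse unigram counts (1 = story, 0 = non-story)."""

    def __init__(self, epochs=10):
        self.epochs = epochs
        self.weights = Counter()  # unseen words read as weight 0
        self.bias = 0.0

    def _score(self, feats):
        return self.bias + sum(self.weights[w] * c for w, c in feats.items())

    def predict(self, text):
        return 1 if self._score(unigram_features(text)) > 0 else 0

    def fit(self, texts, labels):
        data = [(unigram_features(t), y) for t, y in zip(texts, labels)]
        for _ in range(self.epochs):
            for feats, y in data:
                pred = 1 if self._score(feats) > 0 else 0
                if pred != y:
                    # Classic perceptron update: shift weights toward the gold label.
                    step = 1 if y == 1 else -1
                    for w, c in feats.items():
                        self.weights[w] += step * c
                    self.bias += step
        return self


def precision_recall_f1(gold, pred):
    """Precision, recall, and F1 for the story class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy usage: train on one tiny "corpus", evaluate on another.
train_texts = [
    "yesterday i went to the store and then i met my old friend",
    "last summer we drove to the lake and my brother fell in",
    "click here to buy cheap shoes now",
    "the following list contains product specifications and prices",
]
train_labels = [1, 1, 0, 0]
clf = UnigramPerceptron(epochs=5).fit(train_texts, train_labels)

test_texts = ["i went to the lake with my friend", "buy cheap shoes here"]
test_labels = [1, 0]
preds = [clf.predict(t) for t in test_texts]
print(preds, precision_recall_f1(test_labels, preds))  # prints [1, 0] (1.0, 1.0, 1.0)
```

A perceptron is used here simply because it is the smallest learner that makes the idea concrete; a cross-corpus comparison of the kind the abstract describes would swap in each classifier's actual features and learner and compare the resulting scores.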




References

  1. Kevin Burton, Akshay Java, and Ian Soboroff. The ICWSM 2009 Spinn3r dataset. In Proceedings of the Third Annual Conference on Weblogs and Social Media (ICWSM 2009), San Jose, CA, 2009.
  2. B. Ceran, R. Karad, S. Corman, and H. Davulcu. A hybrid model and memory based story classifier. In The Third Workshop on Computational Models of Narrative (CMN), Istanbul, Turkey, 2012.
  3. Betul Ceran, Ravi Karad, Ajay Mandvekar, Steven R. Corman, and Hasan Davulcu. A semantic triplet based story classifier. In Advances in Social Networks Analysis and Mining (ASONAM), 2012 IEEE/ACM International Conference on, pages 573-580. IEEE, 2012.
  4. James Clarke, Vivek Srikumar, Mark Sammons, and Dan Roth. An NLP curator (or: How I learned to stop worrying and love NLP pipelines). In LREC, pages 3276-3283, 2012.
  5. Mark Dredze, Koby Crammer, and Fernando Pereira. Confidence-weighted linear classification. In Proceedings of the 25th International Conference on Machine Learning, pages 264-271. ACM, 2008.
  6. Jenny Rose Finkel, Trond Grenager, and Christopher Manning. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 363-370. Association for Computational Linguistics, 2005.
  7. Mark A. Finlayson. The story workbench: An extensible semi-automatic text annotation tool. In Intelligent Narrative Technologies, 2011.
  8. Edward Morgan Forster. Aspects of the Novel. RosettaBooks, 2010.
  9. Andrew Gordon and Reid Swanson. Identifying personal stories in millions of weblog entries. In Third International Conference on Weblogs and Social Media, Data Challenge Workshop, San Jose, CA, 2009.
  10. S. Sathiya Keerthi and Chih-Jen Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15(7):1667-1689, 2003.
  11. Paul Kingsbury and Martha Palmer. PropBank: the next level of treebank. In Proceedings of Treebanks and Lexical Theories, volume 3. Citeseer, 2003.
  12. Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations), pages 55-60, 2014.
  13. Hwee Tou Ng, Wei Boon Goh, and Kok Leong Low. Feature selection, perceptron learning, and a usability case study for text categorization. In ACM SIGIR Forum, volume 31, pages 67-73. ACM, 1997.
  14. Vasin Punyakanok, Dan Roth, and Wen-tau Yih. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2):257-287, 2008.
  15. Lev Ratinov and Dan Roth. Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pages 147-155. Association for Computational Linguistics, 2009.
  16. Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.
  17. Gerard Salton, Anita Wong, and Chung-Shu Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613-620, 1975.
  18. Beatrice Santorini. Part-of-speech tagging guidelines for the Penn Treebank Project (3rd revision). Technical report, University of Pennsylvania, 1990.
  19. Karin Kipper Schuler. VerbNet: A broad-coverage, comprehensive verb lexicon. PhD thesis, University of Pennsylvania, 2005.