The Secret to Popular Chinese Web Novels: A Corpus-Driven Study

Authors: Yi-Ju Lin, Shu-Kai Hsieh

Author Details

Yi-Ju Lin
  • Graduate Institute of Linguistics, National Taiwan University, Taiwan
Shu-Kai Hsieh
  • Graduate Institute of Linguistics, National Taiwan University, Taiwan

Cite As

Yi-Ju Lin and Shu-Kai Hsieh. The Secret to Popular Chinese Web Novels: A Corpus-Driven Study. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 24:1-24:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Abstract

What is the secret to writing popular novels? The question has intrigued researchers from various fields. The goal of this study is to identify the linguistic features of several popular web novels and to examine how these textual features and the overall tone interact with the genre and themes of each novel. Apart from writing style, non-textual information may also reveal details behind the success of web novels. Since web fiction has become a major industry, with top writers making millions of dollars and their stories adapted into published books, determining the essential elements of "publishable" novels is important. The present study further examines how non-textual information, namely the numbers of hits, shares, favorites, and comments, may contribute to several features of the most popular published and unpublished web novels. Findings reveal that the keywords, function words, and lexical diversity of a novel are highly related to its genre and writing style, while dialogue proportion reflects the narrative voice of the story. In addition, relatively short sentences are found in these novels. The data also reveal that the numbers of favorites and comments are significant predictors of the numbers of shares and hits of unpublished web novels, respectively; however, the numbers of hits and shares of published web novels are less predictable.
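The abstract names lexical diversity, dialogue proportion, and sentence length as textual features but does not say how they are measured. The following is a minimal Python sketch of one plausible way to compute them, not the authors' method. It assumes lexical diversity is a simple type-token ratio, that dialogue is text enclosed in 「…」 or “…” quotation marks, and that the text has already been word-segmented by some external tool. The function name `text_features` and the returned keys are hypothetical.

```python
import re


def text_features(text, tokens):
    """Compute three illustrative features for one novel.

    text   -- the raw text as a single string
    tokens -- the same text already segmented into words (assumed input;
              Chinese needs a word segmenter, which is not shown here)
    """
    # Lexical diversity as type-token ratio: distinct words / total words.
    # This is an assumed measure; it is sensitive to text length.
    ttr = len(set(tokens)) / len(tokens) if tokens else 0.0

    # Dialogue proportion: share of characters inside quotation marks.
    # Only 「…」 and “…” are recognised in this sketch.
    quoted = re.findall(r"「[^」]*」|“[^”]*”", text)
    dialogue_chars = sum(len(q) for q in quoted)
    dialogue_proportion = dialogue_chars / len(text) if text else 0.0

    # Mean sentence length in characters, splitting on terminal punctuation.
    # The split is crude: a closing quote stays attached to the next sentence.
    sentences = [s for s in re.split(r"[。！？!?]+", text) if s.strip()]
    mean_sentence_length = (
        sum(len(s) for s in sentences) / len(sentences) if sentences else 0.0
    )

    return {
        "type_token_ratio": ttr,
        "dialogue_proportion": dialogue_proportion,
        "mean_sentence_length": mean_sentence_length,
    }


if __name__ == "__main__":
    sample_text = "他說：「你好。」他笑了。"
    sample_tokens = ["他", "說", "你好", "他", "笑", "了"]
    print(text_features(sample_text, sample_tokens))
```

Because the type-token ratio falls as texts get longer, comparing novels of different lengths would call for a length-normalised measure or fixed-size samples.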

Subject Classification

ACM Subject Classification
  • General and reference → Empirical studies
  • General and reference

Keywords
  • Popular Chinese Web Novels
  • NLP techniques
  • Sentiment Analysis
  • Publication of Web novels



