A Workbench for Corpus Linguistic Discourse Analysis

Authors: Julia Krasselt, Matthias Fluor, Klaus Rothenhäusler, Philipp Dreesen


Author Details

Julia Krasselt
  • Zurich University of Applied Sciences, Switzerland
Matthias Fluor
  • Zurich University of Applied Sciences, Switzerland
Klaus Rothenhäusler
  • Zurich University of Applied Sciences, Switzerland
Philipp Dreesen
  • Zurich University of Applied Sciences, Switzerland

Cite As

Julia Krasselt, Matthias Fluor, Klaus Rothenhäusler, and Philipp Dreesen. A Workbench for Corpus Linguistic Discourse Analysis. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 26:1-26:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Abstract

In this paper, we introduce the Swiss-AL workbench, an online tool for corpus linguistic discourse analysis. The workbench enables the analysis of Swiss-AL, a multilingual Swiss web corpus with sources from media, politics, industry, science, and civil society. It differs from other corpus analysis tools in three respects: (1) easy access and a tidy interface, (2) a focus on visualizations, and (3) a wide range of analysis options, from classic corpus linguistic methods (e.g., collocation analysis) to more recent NLP approaches (topic modeling and word embeddings). It is designed for researchers from various disciplines, practitioners, and students.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Language resources
  • Computing methodologies → Discourse, dialogue and pragmatics

Keywords
  • corpus analysis software
  • discourse analysis
  • data visualization



