A Workbench for Corpus Linguistic Discourse Analysis

Krasselt, Julia; Fluor, Matthias; Rothenhäusler, Klaus; Dreesen, Philipp

doi:10.4230/OASIcs.LDK.2021.26

Abstract

In this paper, we introduce the Swiss-AL workbench, an online tool for corpus linguistic discourse analysis. The workbench enables the analysis of Swiss-AL, a multilingual Swiss web corpus with sources from media, politics, industry, science, and civil society. The workbench differs from other corpus analysis tools in three respects: (1) easy access and a tidy interface, (2) a focus on visualizations, and (3) a wide range of analysis options, from classic corpus linguistic methods (e.g., collocation analysis) to more recent NLP approaches (topic modeling and word embeddings). It is designed for researchers from various disciplines, practitioners, and students.