Linking Discourse Marker Inventories

Authors: Christian Chiarcos, Maxim Ionov

PDF


  • Filesize: 1.11 MB
  • 15 pages

Author Details

Christian Chiarcos
  • Applied Computational Linguistics Lab, Goethe University Frankfurt, Germany
Maxim Ionov
  • Applied Computational Linguistics Lab, Goethe University Frankfurt, Germany

Cite As

Christian Chiarcos and Maxim Ionov. Linking Discourse Marker Inventories. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 40:1-40:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Abstract

The paper describes the first comprehensive edition of machine-readable discourse marker lexicons. Discourse markers such as and, because, but, though or thereafter are essential communicative signals in human conversation, as they indicate how an utterance relates to its communicative context. Because much of this information is implicit, or is expressed differently in different languages, discourse parsing, context-adequate natural language generation and machine translation are considered particularly challenging aspects of Natural Language Processing. Providing this data in a machine-readable, standard-compliant form will thus facilitate these tasks. Moreover, it makes it possible to apply translation inference techniques to this group of lexical resources, which has so far been largely neglected in the context of Linguistic Linked (Open) Data.
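To make the idea of translation inference over discourse marker lexicons more concrete, here is a minimal Python sketch that pairs markers across two languages by comparing the discourse relation senses they can signal. It is an illustration under stated assumptions, not the method of the paper: the toy inventories GERMAN and ENGLISH, the PDTB-style sense labels assigned to each marker, the Jaccard scoring and the 0.5 threshold are all hypothetical choices made for this example.

```python
from itertools import product

# Hypothetical toy inventories: lemma -> set of PDTB-style relation senses.
# Entries and sense assignments are illustrative only, not taken from the paper.
GERMAN: dict[str, set[str]] = {
    "weil": {"Contingency.Cause.Reason"},
    "aber": {"Comparison.Contrast", "Comparison.Concession"},
    "danach": {"Temporal.Asynchronous"},
}
ENGLISH: dict[str, set[str]] = {
    "because": {"Contingency.Cause.Reason"},
    "but": {"Comparison.Contrast", "Comparison.Concession"},
    "though": {"Comparison.Concession"},
    "thereafter": {"Temporal.Asynchronous"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two sense sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_translations(
    src: dict[str, set[str]],
    tgt: dict[str, set[str]],
    threshold: float = 0.5,
) -> list[tuple[str, str, float]]:
    """Propose (source, target, score) pairs whose sense sets overlap enough.

    Shared relation senses act as the pivot (a hypothetical choice for this
    sketch); pairs scoring below `threshold` are discarded.
    """
    candidates = []
    for (s, s_senses), (t, t_senses) in product(src.items(), tgt.items()):
        score = jaccard(s_senses, t_senses)
        if score >= threshold:
            candidates.append((s, t, round(score, 2)))
    # Best-scoring pairs first; ties broken alphabetically for stable output.
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


if __name__ == "__main__":
    for src, tgt, score in infer_translations(GERMAN, ENGLISH):
        print(f"{src} -> {tgt} ({score})")
```

On this toy data, aber is paired with both but and though, because its sense set covers contrast as well as concession, while though only shares the concession sense. In a realistic setting the sense sets would come from the linked lexicons themselves rather than from hand-written dictionaries.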

Subject Classification

ACM Subject Classification
  • Computing methodologies → Discourse, dialogue and pragmatics
  • Information systems → Graph-based database models

Keywords
  • discourse processing
  • discourse markers
  • linked data
  • OntoLex
  • OLiA



