Search Results

Documents authored by Durrell, Martin


Document
GerManC - Towards a Methodology for Constructing and Annotating Historical Corpora

Authors: Astrid Ensslin, Martin Durrell, and Paul Bennett

Published in: Dagstuhl Seminar Proceedings, Volume 6491, Digital Historical Corpora- Architecture, Annotation, and Retrieval (2007)


Abstract
Our paper focuses on the one hand on the challenges posed by the structural variability, flexibility and ambiguity found in historical corpora and evaluates methods of dealing with them on the other. We are currently engaged in a project which aims to compile a representative corpus of German for the period 1650-1800. Looking at exemplary data from the first stage of this project (1650-1700), which consists of newspaper texts from this period, we first aim from the perspective of corpus linguistics to identify the problems associated with the morphological, syntactical and graphemic peculiarities that are characteristic of that particular stage. Specific phenomena which significantly complicate automatic tagging, lemmatisation and parsing include, for instance, "abperlende" (Admoni 1980; Demske-Neumann 1990), i.e. complex and often asyndetic syntax; non-syntactic, prosodic, virgulated punctuation (Demske et al. 2004; cf. Stolt 1990), inflectional variability (e.g. Admoni 1990; Besch & Wegera 1987), as well as partly unsystematic and almost experimental allomorphic and allographic (Kettmann, 1992) diversity. Secondly, we outline a methodology which is intended to facilitate the construction and annotation of such corpora which antedate linguistic standardisation. This is informed by "conventional" and innovative tagging techniques and tools, which are evaluated in terms of utility and accuracy. Finally, we attempt to evaluate the degree to which annotation tools for specialist corpora of this kind can be developed which will substitute for manual or semi-automated annotation.

Cite as

Astrid Ensslin, Martin Durrell, and Paul Bennett. GerManC - Towards a Methodology for Constructing and Annotating Historical Corpora. In Digital Historical Corpora- Architecture, Annotation, and Retrieval. Dagstuhl Seminar Proceedings, Volume 6491, pp. 1-2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)


Copy BibTex To Clipboard

@InProceedings{ensslin_et_al:DagSemProc.06491.7,
  author =	{Ensslin, Astrid and Durrell, Martin and Bennett, Paul},
  title =	{{GerManC - Towards a Methodology for Constructing and Annotating Historical Corpora}},
  booktitle =	{Digital Historical Corpora- Architecture, Annotation, and Retrieval},
  pages =	{1--2},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6491},
  editor =	{Lou Burnard and Milena Dobreva and Norbert Fuhr and Anke L\"{u}deling},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06491.7},
  URN =		{urn:nbn:de:0030-drops-10467},
  doi =		{10.4230/DagSemProc.06491.7},
  annote =	{Keywords: Early Modern German; newspaper corpus; GerManC; variation; annotation; tagging}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail