License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/DagSemProc.06491.16
URN: urn:nbn:de:0030-drops-10478

Erjavec, Tomaž

TEI and Microsoft: a marriage made in...

06491.ErjavecTomaz.Paper.1047.pdf (1 MB)


In several ongoing projects we faced the dilemma of how to reconcile our goal of delivering historical documents encoded to a standard with the fact that the actual editing and annotation were performed by researchers and students who had no knowledge of XML and TEI and, for the most part, no interest in learning them. The solution we developed consists of allowing the annotators to use familiar and flexible editors, such as Microsoft Word (for structural annotation of documents) and Excel (for word-level linguistic annotation), and automatically converting these files into TEI. Given the unconstrained nature of such editors, this sounds like a recipe for disaster. But the solution crucially depends on a dedicated Web service, to which the annotators can upload their files; these are then immediately converted to XML/TEI and from it back to a visual format, either HTML or Excel XML, and presented to the annotators. The annotators thus get immediate feedback about the quality of their encoding in the source and can correct errors before they accumulate; the responsibility for the correct encoding rests with the annotators, rather than with the developers of the conversion procedure. The paper describes the web service and details its use in three projects. The main conclusions are that the proposed solution is appropriate for shallow encodings, but that it still requires detailed annotation guidelines.
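
The round trip described in the abstract (source file to TEI, then back to a visual format with feedback) can be illustrated with a minimal sketch. It is not the paper's implementation: the style names, the two mapping tables, and the functions `to_tei`, `to_html`, and `round_trip` are all hypothetical, and the input is assumed to be already reduced to (style, text) pairs rather than a real Word file.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from editor paragraph styles to TEI element names.
STYLE_TO_TEI = {"Title": "head", "Normal": "p", "Quote": "quote"}
# Hypothetical mapping from TEI element names to HTML tags for the preview.
TEI_TO_HTML = {"head": "h1", "p": "p", "quote": "blockquote"}


def to_tei(paragraphs):
    """Convert (style, text) pairs to a TEI-like <body>, collecting problems."""
    body = ET.Element("body")
    problems = []
    for number, (style, text) in enumerate(paragraphs, start=1):
        tag = STYLE_TO_TEI.get(style)
        if tag is None:
            # Unknown style: report it to the annotator rather than guess.
            problems.append(f"paragraph {number}: unknown style {style!r}")
            continue
        ET.SubElement(body, tag).text = text
    return body, problems


def to_html(body):
    """Render the TEI-like tree back to HTML so the annotator can inspect it."""
    root = ET.Element("div")
    for element in body:
        ET.SubElement(root, TEI_TO_HTML[element.tag]).text = element.text
    return ET.tostring(root, encoding="unicode")


def round_trip(paragraphs):
    """Return the TEI string, the HTML preview, and the list of problems."""
    body, problems = to_tei(paragraphs)
    return ET.tostring(body, encoding="unicode"), to_html(body), problems
```

In this sketch a misspelled style name is dropped from the output and listed in the problem report, so the annotator sees the error in the preview and fixes it in the source file, which mirrors the division of responsibility the abstract describes.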

BibTeX - Entry

  author =	{Erjavec, Toma\v{z}},
  title =	{{TEI and Microsoft: a marriage made in...}},
  booktitle =	{Digital Historical Corpora- Architecture, Annotation, and Retrieval},
  pages =	{1--19},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6491},
  editor =	{Lou Burnard and Milena Dobreva and Norbert Fuhr and Anke L\"{u}deling},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-10478},
  doi =		{10.4230/DagSemProc.06491.16},
  annote =	{Keywords: Text encoding, manual annotation, open standards, XML, Microsoft}

Keywords: Text encoding, manual annotation, open standards, XML, Microsoft
Collection: 06491 - Digital Historical Corpora- Architecture, Annotation, and Retrieval
Issue Date: 2007
Date of publication: 13.06.2007
