TEI and Microsoft: a marriage made in...

Author Tomaž Erjavec

  • Filesize: 1.4 MB
  • 19 pages

Cite As

Tomaž Erjavec. TEI and Microsoft: a marriage made in.... In Digital Historical Corpora - Architecture, Annotation, and Retrieval. Dagstuhl Seminar Proceedings, Volume 6491, pp. 1-19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)


In several ongoing projects we were faced with the dilemma of how to reconcile our goal of delivering standardly encoded historical documents with having the actual editing and annotation performed by researchers and students who had no knowledge of XML and TEI and, for the most part, no interest in learning them. The solution we developed allows the annotators to use familiar and flexible editors, such as Microsoft Word (for the structural annotation of documents) and Excel (for word-level linguistic annotation), and automatically converts their files into TEI. Given the unconstrained nature of such editors, this sounds like a recipe for disaster. However, the solution crucially depends on a dedicated Web service to which the annotators can upload their files; these are immediately converted to XML/TEI and from it back to a visual format, either HTML or Excel XML, and presented to the annotators. The annotators thus get immediate feedback about the quality of the encoding in their source files and can correct errors before they accumulate; moreover, the responsibility for correct encoding rests with the annotators rather than with the developers of the conversion procedure. The paper describes the Web service and details its use in three projects. The main conclusions are that the proposed solution is appropriate for shallow encodings, but that it nevertheless requires detailed annotation guidelines.
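The round trip described above (spreadsheet rows converted to TEI, then TEI rendered back to a visual format for checking) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's conversion procedure: it assumes each spreadsheet row holds a token, its lemma and a morphosyntactic tag; the mapping of the last two onto the `lemma` and `msd` attributes of a TEI `<w>` element is an assumption; the TEI namespace is omitted; and the function names `rows_to_tei`, `find_errors` and `tei_to_html` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical spreadsheet row layout: (token, lemma, morphosyntactic tag).
Row = tuple[str, str, str]


def rows_to_tei(rows: list[Row]) -> ET.Element:
    """Convert annotated rows into a TEI-like <s> element of <w> tokens.

    The TEI namespace is omitted for brevity (assumption)."""
    s = ET.Element("s")
    for token, lemma, tag in rows:
        w = ET.SubElement(s, "w")
        if lemma:  # empty cells become missing attributes, not empty ones
            w.set("lemma", lemma)
        if tag:
            w.set("msd", tag)  # assumed mapping of the tag column to @msd
        w.text = token
    return s


def find_errors(s: ET.Element) -> list[int]:
    """Return positions of tokens that lack a lemma or a tag."""
    return [i for i, w in enumerate(s.iter("w"))
            if not (w.get("lemma") and w.get("msd"))]


def tei_to_html(s: ET.Element) -> str:
    """Render the TEI back to an HTML table, flagging incomplete tokens."""
    bad = set(find_errors(s))
    rows = []
    for i, w in enumerate(s.iter("w")):
        css = "error" if i in bad else "ok"
        # Missing values are shown as "?" so the annotator can spot them.
        cells = "".join(
            f"<td>{escape(v or '?')}</td>"
            for v in (w.text, w.get("lemma"), w.get("msd"))
        )
        rows.append(f'<tr class="{css}">{cells}</tr>')
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    sent = rows_to_tei([("The", "the", "DT"), ("cats", "", "NNS")])
    print(ET.tostring(sent, encoding="unicode"))
    print(tei_to_html(sent))
```

Rendering incomplete tokens as flagged rows, rather than rejecting the upload, mirrors the idea in the abstract: the annotator sees the effect of the encoding at once and fixes the source file, so errors do not accumulate.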
Keywords:
  • Text encoding
  • manual annotation
  • open standards
  • XML
  • Microsoft

