When quoting this document, please refer to the following
URN: urn:nbn:de:0030-drops-10478
URL: http://drops.dagstuhl.de/opus/volltexte/2007/1047/


Erjavec, Tomaž

TEI and Microsoft: a marriage made in...

pdf-format:
Document 1.pdf (1,438 KB)


Abstract

In several ongoing projects we faced the dilemma of how to reconcile our goal of delivering historical documents in a standard encoding with the fact that the actual editing and annotation was performed by researchers and students who had no knowledge of XML and TEI and, for the most part, no interest in learning them. The solution we developed consists of allowing the annotators to use familiar and flexible editors, such as Microsoft Word (for structural annotation of documents) and Excel (for word-level linguistic annotation), and automatically converting their files into TEI. Given the unconstrained nature of such editors, this sounds like a recipe for disaster. The solution, however, crucially depends on a dedicated Web service to which the annotators can upload their files; these are immediately converted to XML/TEI and from there back to a visual format, either HTML or Excel XML, which is presented to the annotators. The annotators thus get immediate feedback about the quality of their encoding in the source and can correct errors before they accumulate, so the responsibility for correct encoding rests with the annotators rather than with the developers of the conversion procedure. The paper describes the Web service and details its use in three projects. The main conclusions are that the proposed solution is appropriate for shallow encodings, and that it still requires detailed annotation guidelines.

BibTeX - Entry

@InProceedings{erjavec:DSP:2007:1047,
  author =	{Toma{\v{z}} Erjavec},
  title =	{TEI and Microsoft: a marriage made in...},
  booktitle =	{Digital Historical Corpora- Architecture, Annotation, and Retrieval},
  year =	{2007},
  editor =	{Lou Burnard and Milena Dobreva and Norbert Fuhr and Anke L{\"u}deling},
  number =	{06491},
  series =	{Dagstuhl Seminar Proceedings},
  ISSN =	{1862-4405},
  publisher =	{Internationales Begegnungs- und Forschungszentrum f{\"u}r Informatik (IBFI), Schloss Dagstuhl, Germany},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2007/1047},
  annote =	{Keywords: Text encoding, manual annotation, open standards, XML, Microsoft}
}

Keywords: Text encoding, manual annotation, open standards, XML, Microsoft
Seminar: 06491 - Digital Historical Corpora- Architecture, Annotation, and Retrieval
Issue Date: 2007
Date of publication: 13.06.2007


Published by LZI