
Documents authored by Erjavec, Tomaž


Document
TEI and Microsoft: a marriage made in...

Authors: Tomaž Erjavec

Published in: Dagstuhl Seminar Proceedings, Volume 6491, Digital Historical Corpora- Architecture, Annotation, and Retrieval (2007)


Abstract
In several ongoing projects we were faced with the dilemma of how to reconcile our goal of delivering standardly encoded historical documents with having the actual editing and annotation performed by researchers and students who had no knowledge of XML and TEI and, for the most part, no interest in learning them. The solution we developed lets the annotators use familiar and flexible editors, such as Microsoft Word (for structural annotation of documents) and Excel (for word-level linguistic annotation), and automatically converts their files into TEI. Given the unconstrained nature of such editors, this sounds like a recipe for disaster. However, the solution crucially depends on a dedicated Web service to which the annotators can upload their files; these are immediately converted to XML/TEI and from it back to a visual format, either HTML or Excel XML, which is then presented to the annotators. The annotators thus get immediate feedback about the quality of the encoding in their source files and can correct errors before they accumulate, and the responsibility for correct encoding rests with the annotators rather than with the developers of the conversion procedure. The paper describes the Web service and details its use in three projects. The main conclusions are that the proposed solution is appropriate for shallow encodings, but that it nevertheless requires detailed annotation guidelines.

Cite as

Tomaž Erjavec. TEI and Microsoft: a marriage made in.... In Digital Historical Corpora- Architecture, Annotation, and Retrieval. Dagstuhl Seminar Proceedings, Volume 6491, pp. 1-19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)



@InProceedings{erjavec:DagSemProc.06491.16,
  author =	{Erjavec, Toma\v{z}},
  title =	{{TEI and Microsoft: a marriage made in...}},
  booktitle =	{Digital Historical Corpora- Architecture, Annotation, and Retrieval},
  pages =	{1--19},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6491},
  editor =	{Lou Burnard and Milena Dobreva and Norbert Fuhr and Anke L\"{u}deling},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06491.16},
  URN =		{urn:nbn:de:0030-drops-10478},
  doi =		{10.4230/DagSemProc.06491.16},
  annote =	{Keywords: Text encoding, manual annotation, open standards, XML, Microsoft}
}