Searching in text databases with non-standard orthography

Author Thomas Pilz




Cite As

Thomas Pilz. Searching in text databases with non-standard orthography. In Digital Historical Corpora - Architecture, Annotation, and Retrieval. Dagstuhl Seminar Proceedings, Volume 6491, pp. 1-2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007) https://doi.org/10.4230/DagSemProc.06491.14

Abstract

In this paper we present research results of the recent project "Rule based search in text data bases with non-standard orthography". Numerous steps are involved in getting from a facsimile to a searchable text document. This paper focuses on techniques that improve retrieval results on historical texts with non-standard spellings. Historical documents, especially those set in blackletter fonts, are prone to recognition errors. Adequate preparation of the image sources prior to OCR can reduce the number of misinterpreted characters. Furthermore, placing a search engine with categorized distance measures between the user interface and the text database can improve retrieval results. Specific metrics cover problems in optical character recognition, transcription, and historical spelling variation. With a synoptic view interface, users can remain completely unaware of the methods applied to their queries.
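As a rough illustration of what a categorized distance measure might look like, the sketch below implements an edit distance in which a substitution costs less when it matches a known category of variation. It is not the project's implementation: the character pairs, the weights, and the names `OCR_CONFUSIONS`, `SPELLING_VARIANTS`, `weighted_distance`, and `search` are all assumptions made for this example.

```python
# Hypothetical category tables: the pairs and weights below are illustrative
# assumptions, not rules taken from the paper.
OCR_CONFUSIONS = {("f", "s"), ("u", "n"), ("c", "e")}     # look-alike glyphs
SPELLING_VARIANTS = {("y", "i"), ("v", "u"), ("c", "k")}  # historical spellings


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing character a with b, reduced for known categories."""
    if a == b:
        return 0.0
    if (a, b) in OCR_CONFUSIONS or (b, a) in OCR_CONFUSIONS:
        return 0.25
    if (a, b) in SPELLING_VARIANTS or (b, a) in SPELLING_VARIANTS:
        return 0.5
    return 1.0


def weighted_distance(source: str, target: str) -> float:
    """Levenshtein-style distance with category-dependent substitution costs.

    Insertions and deletions always cost 1.0.
    """
    prev = [float(j) for j in range(len(target) + 1)]
    for i, a in enumerate(source, 1):
        cur = [float(i)]
        for j, b in enumerate(target, 1):
            cur.append(min(
                prev[j] + 1.0,                        # delete a
                cur[j - 1] + 1.0,                     # insert b
                prev[j - 1] + substitution_cost(a, b),
            ))
        prev = cur
    return prev[-1]


def search(query: str, vocabulary: list[str], max_distance: float = 1.0):
    """Return (distance, word) pairs within max_distance, closest first."""
    scored = ((weighted_distance(query, word), word) for word in vocabulary)
    return sorted((d, w) for d, w in scored if d <= max_distance)
```

Under these assumed weights, a query for "sein" against the word list `["fein", "seyn", "stein", "haus"]` would rank "fein" (a look-alike glyph) ahead of "seyn" (a spelling variant) and "stein" (one insertion), and would drop "haus". Keeping the categories in separate tables lets each source of variation be weighted or switched off independently.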

Keywords
  • Rule based search
  • Optical character recognition
  • spelling variation
  • edit distance
