Documents authored by de Rijke, Maarten


Document
A Cross-Language Approach to Historic Document Retrieval

Authors: Jaap Kamps, Marijn Koolen, Frans Adriaans, and Maarten de Rijke

Published in: Dagstuhl Seminar Proceedings, Volume 6491, Digital Historical Corpora - Architecture, Annotation, and Retrieval (2007)


Abstract
Our cultural heritage, as preserved in libraries, archives and museums, is made up of documents written many centuries ago. Large-scale digitization initiatives, like DigiCULT, make these documents available to non-expert users through digital libraries and vertical search engines. For a user, querying a historic document collection may be a disappointing experience. Natural languages evolve over time, changing in pronunciation and spelling, and new words are introduced continuously, while older words may disappear from everyday use. For these reasons, queries involving modern words may not be very effective for retrieving documents that contain many historic terms. Although reading a 300-year-old document might not be problematic because the words are still recognizable, the changes in vocabulary and spelling can make it difficult to use a search engine to find relevant documents. To illustrate this, consider the following example from our collection of 17th-century Dutch law texts. Looking for information on the tasks of a lawyer (modern Dutch: "advocaat") in these texts, the modern spelling will not lead you to documents containing the 17th-century Dutch spelling variant "advocaet". Since spelling rules were not introduced until the 19th century, 17th-century Dutch spelling is inconsistent. Because spelling was based mainly on pronunciation, words were often spelled in several different ways, which poses a problem for standard retrieval engines. We therefore define Historic Document Retrieval (HDR) as the retrieval of relevant historic documents for a modern query. Our approach to this problem is to treat the historic and modern languages as different languages, and use cross-language information retrieval (CLIR) techniques to translate one language into the other.

Cite as

Jaap Kamps, Marijn Koolen, Frans Adriaans, and Maarten de Rijke. A Cross-Language Approach to Historic Document Retrieval. In Digital Historical Corpora - Architecture, Annotation, and Retrieval. Dagstuhl Seminar Proceedings, Volume 6491, pp. 1-2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)


@InProceedings{kamps_et_al:DagSemProc.06491.3,
  author =	{Kamps, Jaap and Koolen, Marijn and Adriaans, Frans and de Rijke, Maarten},
  title =	{{A Cross-Language Approach to Historic Document Retrieval}},
  booktitle =	{Digital Historical Corpora - Architecture, Annotation, and Retrieval},
  pages =	{1--2},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6491},
  editor =	{Lou Burnard and Milena Dobreva and Norbert Fuhr and Anke L{\"u}deling},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06491.3},
  URN =		{urn:nbn:de:0030-drops-10497},
  doi =		{10.4230/DagSemProc.06491.3},
  annote =	{Keywords: Historic Documents, Information Retrieval, Spelling variation, Modernizing Spelling, 17th Century Dutch}
}
Document
Towards Task-Based Temporal Extraction and Recognition

Authors: David Ahn, Sisay Fissaha Adafre, and Maarten de Rijke

Published in: Dagstuhl Seminar Proceedings, Volume 5151, Annotating, Extracting and Reasoning about Time and Events (2005)


Abstract
We seek to improve the robustness and portability of temporal information extraction systems by incorporating data-driven techniques. We present two sets of experiments pointing us in this direction. The first shows that machine-learning-based recognition of temporal expressions not only achieves high accuracy on its own but can also improve rule-based normalization. The second makes use of a staged normalization architecture to experiment with machine-learned classifiers for certain disambiguation sub-tasks within the normalization task.

Cite as

David Ahn, Sisay Fissaha Adafre, and Maarten de Rijke. Towards Task-Based Temporal Extraction and Recognition. In Annotating, Extracting and Reasoning about Time and Events. Dagstuhl Seminar Proceedings, Volume 5151, pp. 1-16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2005)


@InProceedings{ahn_et_al:DagSemProc.05151.12,
  author =	{Ahn, David and Fissaha Adafre, Sisay and de Rijke, Maarten},
  title =	{{Towards Task-Based Temporal Extraction and Recognition}},
  booktitle =	{Annotating, Extracting and Reasoning about Time and Events},
  pages =	{1--16},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2005},
  volume =	{5151},
  editor =	{Graham Katz and James Pustejovsky and Frank Schilder},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.05151.12},
  URN =		{urn:nbn:de:0030-drops-3150},
  doi =		{10.4230/DagSemProc.05151.12},
  annote =	{Keywords: Information extraction, natural language, temporal reasoning, text mining}
}