LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese

Authors: Ricardo Rodrigues, Hugo Gonçalo Oliveira, Paulo Gomes




File

OASIcs.SLATE.2014.267.pdf
  • Filesize: 446 kB
  • 8 pages

Cite As

Ricardo Rodrigues, Hugo Gonçalo Oliveira, and Paulo Gomes. LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 267-274, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)
https://doi.org/10.4230/OASIcs.SLATE.2014.267

Abstract

Although lemmatization is a common subtask in many natural language processing tasks, few truly cross-platform lemmatization tools are available for Portuguese, particularly ones that can be integrated into projects developed in Java. To address this gap, we developed a lemmatizer, initially for our own use, and have since decided to make it publicly available. The lemmatizer, presented in this document, achieves an overall accuracy above 98% when evaluated against a manually revised corpus.

Keywords
  • lemmatization
  • normalization
  • rules
  • lexicon
