Translation-Based Dictionary Alignment for Under-Resourced Bantu Languages

Authors: Thomas Eckart, Sonja Bosch, Dirk Goldhahn, Uwe Quasthoff, Bettina Klimek

Author Details

Thomas Eckart
  • Natural Language Processing Group, University of Leipzig, Germany
Sonja Bosch
  • Department of African Languages, University of South Africa, Pretoria, South Africa
Dirk Goldhahn
  • Natural Language Processing Group, University of Leipzig, Germany
Uwe Quasthoff
  • Natural Language Processing Group, University of Leipzig, Germany
Bettina Klimek
  • Institute of Computer Science, University of Leipzig, Germany

Cite As

Thomas Eckart, Sonja Bosch, Dirk Goldhahn, Uwe Quasthoff, and Bettina Klimek. Translation-Based Dictionary Alignment for Under-Resourced Bantu Languages. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 17:1-17:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Abstract


Despite a large number of active speakers, most Bantu languages can be considered under- or less-resourced. This applies especially to lexicographical data, which is highly unsatisfactory in size, quality, and consistency of format and provided information. The problem is not limited to the amount and quality of data in monolingual dictionaries; the dictionaries also lack the interconnections needed to form a network. Current endeavours to promote the use of Bantu languages in primary and secondary education in countries like South Africa show the urgent need for high-quality digital dictionaries. This contribution describes a prototypical implementation for aligning Xhosa, Zimbabwean Ndebele and Kalanga dictionaries based on their English translations, using simple string matching techniques and WordNet URIs. The RDF-based representation of the data using the Bantu Language Model (BLM) and partial references to the established WordNet dataset supported this process significantly.
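
The translation-based alignment idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `normalize` and `align`, the normalization rules, and the toy entries are all assumptions made for illustration, and the entries are not taken from the dictionaries used in the paper. The sketch pairs lemmas from two dictionaries whenever their English translations match after simple string normalization.

```python
from collections import defaultdict


def normalize(translation: str) -> str:
    """Lowercase, collapse whitespace, and drop one leading article or 'to'.

    These rules are an illustrative assumption, not the paper's procedure.
    """
    words = translation.lower().split()
    if len(words) > 1 and words[0] in {"a", "an", "the", "to"}:
        words = words[1:]
    return " ".join(words)


def align(dict_a, dict_b):
    """Return sorted (lemma_a, lemma_b, shared_translation) triples."""
    # Index the second dictionary by normalized English translation.
    index = defaultdict(set)
    for lemma, translations in dict_b.items():
        for translation in translations:
            index[normalize(translation)].add(lemma)

    # Look up every normalized translation of the first dictionary.
    pairs = set()
    for lemma, translations in dict_a.items():
        for translation in translations:
            key = normalize(translation)
            for other in index.get(key, ()):
                pairs.add((lemma, other, key))
    return sorted(pairs)


# Hypothetical toy entries (lemma -> English translations).
xhosa = {"inja": ["dog"], "umntu": ["a person", "human being"], "indlu": ["house"]}
ndebele = {"inja": ["Dog"], "umuntu": ["person"], "umuzi": ["homestead"]}

alignments = align(xhosa, ndebele)
for triple in alignments:
    print(triple)
```

Exact matching on normalized strings is deliberately simple. It misses synonyms ("house" versus "homestead") and conflates homonyms, which is where shared WordNet URIs could serve as a less ambiguous join key than the raw translation string.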

Subject Classification

ACM Subject Classification
  • Information systems → Resource Description Framework (RDF)
  • Computing methodologies → Phonology / morphology
  • Information systems → Dictionaries

Keywords
  • Cross-language dictionary alignment
  • Bantu languages
  • translation
  • linguistic linked data
  • under-resourced languages


