A Method for Proper Noun Extraction in Kurdish

Author Hossein Hassani



Cite As

Hossein Hassani. A Method for Proper Noun Extraction in Kurdish. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 19:1-19:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/OASIcs.SLATE.2017.19

Abstract

This paper proposes a method for identifying proper nouns in Kurdish texts. Kurdish proper nouns are not capitalized, and many also serve in other part-of-speech roles, which creates a broad ambiguity that Kurdish proper noun recognition applications must address. Kurdish is also among the less-resourced languages. We developed an application based on an architecture that includes a number of name lists, a set of rules, and a set of processes, and that recognizes Kurdish person names. This can help advance the study of Information Retrieval (IR) in Kurdish and can also be used in Kurdish machine translation. We conducted several experiments, which showed that the precision of the method is more than 95%, the recall is between 40% and 80%, and the F-measure ranges from close to 60% to more than 80%. The low recall arose because our name lists were not exhaustive enough to cover the vast majority of Kurdish names.
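
The relation between the reported precision, recall, and F-measure figures can be checked with a short calculation. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` and the input values are illustrative: 0.95 stands in for "more than 95%" precision, and 0.40 and 0.80 are the two ends of the quoted recall range, not individual results from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Assumed inputs: precision fixed at 0.95, recall at the two ends of the
# range quoted in the abstract (hypothetical points, not reported runs).
low = f_measure(0.95, 0.40)
high = f_measure(0.95, 0.80)
print(f"F1 at recall 0.40: {low:.3f}")   # 0.563
print(f"F1 at recall 0.80: {high:.3f}")  # 0.869
```

Under these assumptions the two values fall near the ends of the F-measure range given in the abstract, and they show how a high precision cannot compensate for a low recall in a harmonic mean.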
Keywords
  • Proper Noun Recognition
  • Named Entity Recognition
  • Information Extraction
  • Natural Language Processing
  • Kurdish
