TatWordNet: A Linguistic Linked Open Data-Integrated WordNet Resource for Tatar

Authors: Alexander Kirillovich, Marat Shaekhov, Alfiya Galieva, Olga Nevzorova, Dmitry Ilvovsky, Natalia Loukachevitch




File

OASIcs.LDK.2021.16.pdf
  • Filesize: 0.65 MB
  • 12 pages


Author Details

Alexander Kirillovich
  • Kazan Federal University, Kazan, Russia
  • Higher School of Economics, Moscow, Russia
Marat Shaekhov
  • Kazan Federal University, Kazan, Russia
Alfiya Galieva
  • Kazan Federal University, Kazan, Russia
Olga Nevzorova
  • Kazan Federal University, Kazan, Russia
  • Higher School of Economics, Moscow, Russia
Dmitry Ilvovsky
  • Kazan Federal University, Kazan, Russia
  • Higher School of Economics, Moscow, Russia
Natalia Loukachevitch
  • Moscow State University, Moscow, Russia
  • Kazan Federal University, Kazan, Russia
  • Higher School of Economics, Moscow, Russia

Cite As

Alexander Kirillovich, Marat Shaekhov, Alfiya Galieva, Olga Nevzorova, Dmitry Ilvovsky, and Natalia Loukachevitch. TatWordNet: A Linguistic Linked Open Data-Integrated WordNet Resource for Tatar. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 16:1-16:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/OASIcs.LDK.2021.16

Abstract

We present the first release of TatWordNet (http://wordnet.tatar), a wordnet resource for Tatar. TatWordNet has been constructed by combining the expand and merge approaches. Its synsets have been compiled by: (i) automatic conversion of the concepts of TatThes, a socio-political Tatar thesaurus; (ii) semi-automatic translation of synsets of RuWordNet, a wordnet resource for Russian, followed by manual verification and correction; (iii) manual translation of base RuWordNet synsets; and (iv) manual translation of all hypernyms of the previously translated RuWordNet synsets. The current version of TatWordNet contains 18,583 synsets, 36,540 lexical entries and 49,525 senses. The resource has been published to the Linguistic Linked Open Data cloud and interlinked with the Global WordNet Grid.
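
To make the relationship between synsets, lexical entries and senses concrete, here is a minimal Python sketch. It is not the authors' pipeline. It only illustrates the general expand idea (translate the synsets of an existing wordnet and reuse its hypernym structure) on hypothetical toy data. The names `Synset`, `expand` and `count`, the placeholder lemmas, and the simplification that a lexical entry is a distinct lemma string (ignoring part of speech) are all assumptions made for illustration. Only the three totals at the end come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    synset_id: str
    lemmas: list[str]
    # IDs of hypernym synsets (hypothetical, simplified representation)
    hypernyms: list[str] = field(default_factory=list)


def expand(source: dict[str, Synset],
           translations: dict[str, list[str]]) -> dict[str, Synset]:
    """Build target synsets by translating source synsets.

    The source hierarchy is reused: a hypernym link is kept only if the
    hypernym synset has been translated as well.
    """
    target = {}
    for sid, syn in source.items():
        if sid not in translations:
            continue  # untranslated synsets are skipped
        kept = [h for h in syn.hypernyms if h in translations]
        target[sid] = Synset(sid, list(translations[sid]), kept)
    return target


def count(wordnet: dict[str, Synset]) -> tuple[int, int, int]:
    """Return (synsets, lexical entries, senses).

    A sense is one (lemma, synset) pair; a lexical entry is approximated
    here as one distinct lemma string.
    """
    synsets = len(wordnet)
    senses = sum(len(s.lemmas) for s in wordnet.values())
    entries = len({lemma for s in wordnet.values() for lemma in s.lemmas})
    return synsets, entries, senses


# Hypothetical toy source wordnet and translations (placeholder strings only).
source = {
    "s1": Synset("s1", ["ru-a"], ["s2"]),
    "s2": Synset("s2", ["ru-b", "ru-c"]),
}
translations = {"s1": ["tt-a", "tt-b"], "s2": ["tt-b"]}
target = expand(source, translations)

# Totals reported in the abstract, and ratios derived from them.
SYNSETS, LEXICAL_ENTRIES, SENSES = 18_583, 36_540, 49_525
senses_per_synset = SENSES / SYNSETS
senses_per_entry = SENSES / LEXICAL_ENTRIES
```

In the toy data the lemma `tt-b` appears in two synsets, so it counts as one lexical entry but two senses. This is why the reported number of senses exceeds the number of lexical entries. From the reported totals, the derived ratios are about 2.67 senses per synset and about 1.36 senses per lexical entry.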

Subject Classification

ACM Subject Classification
  • Computing methodologies → Language resources
Keywords
  • Linguistic Linked Open Data
  • WordNet
  • Thesaurus
  • Tatar language

