2 Search Results for "Novák, Attila"


Document
A Pseudonymization Prototype for Hungarian

Authors: Attila Novák and Borbála Novák

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)


Abstract
In this paper, we present a pseudonymization prototype for Hungarian, an agglutinating language with complex morphology, implemented as a web service. The service provides the following functions: entity identification and extraction; automatic generation and selection of replacement candidates; and automatic, consistent replacement and reinflection of entities in the final pseudonymized document. The named entity recognition model applied handles person names well and performs decently on other entity types. However, ID-like entities need to be handled separately to achieve proper performance; this is not done in the current prototype version. For automatic replacement candidate generation, a simple entity embedding model is used. We discuss the performance and limitations of the prototype in detail.
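The abstract lists consistent replacement as one of the service's functions: the same entity should receive the same pseudonym everywhere in a document. The following minimal Python sketch covers that step only, under several assumptions:

- Entity spans are taken to come from an upstream NER step.
- A fixed, hypothetical candidate pool for each entity type stands in for the embedding-based candidate generation.
- Replacements are not reinflected, although the prototype does reinflect them for Hungarian.

The names `pseudonymize`, `Span` and `candidates` are illustrative and are not taken from the prototype.

```python
from typing import Dict, List, Tuple

# Hypothetical span format: (start, end, entity_type) as character offsets.
Span = Tuple[int, int, str]


def pseudonymize(text: str, spans: List[Span],
                 candidates: Dict[str, List[str]]) -> str:
    """Replace entities so that identical (surface form, type) pairs
    always get the same pseudonym.

    Assumptions (not from the prototype): spans are non-overlapping and
    produced by an upstream NER model; `candidates` is a fixed pool per
    entity type standing in for embedding-based candidate generation;
    no morphological reinflection is performed.
    """
    mapping: Dict[Tuple[str, str], str] = {}  # original entity -> pseudonym
    used: Dict[str, int] = {}                 # next unused candidate per type
    pieces: List[str] = []
    cursor = 0
    for start, end, etype in sorted(spans):
        key = (text[start:end], etype)
        if key not in mapping:
            pool = candidates[etype]
            idx = used.get(etype, 0)
            if idx >= len(pool):
                raise ValueError(f"no unused replacement left for type {etype}")
            mapping[key] = pool[idx]
            used[etype] = idx + 1
        pieces.append(text[cursor:start])  # text between entities, unchanged
        pieces.append(mapping[key])        # consistent replacement
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Usage: for `"Kovács Anna met Szabó Péter. Later Kovács Anna left."`, mark both occurrences of `Kovács Anna` and the single `Szabó Péter` as `PER`, and supply the pool `["Nagy Éva", "Tóth Gábor"]`. The result is `"Nagy Éva met Tóth Gábor. Later Nagy Éva left."`

The dictionary keyed by surface form and type is what keeps the replacements consistent. A real system would also need to link inflected or abbreviated variants of the same entity, which this sketch does not attempt.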

Cite as

Attila Novák and Borbála Novák. A Pseudonymization Prototype for Hungarian. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


@InProceedings{novak_et_al:OASIcs.SLATE.2023.3,
  author =	{Nov\'{a}k, Attila and Nov\'{a}k, Borb\'{a}la},
  title =	{{A Pseudonymization Prototype for Hungarian}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.3},
  URN =		{urn:nbn:de:0030-drops-185177},
  doi =		{10.4230/OASIcs.SLATE.2023.3},
  annote =	{Keywords: named entity recognition, morphological reinflection, pseudonymization, entity embedding model}
}

Document
Combining Language Independent Part-of-Speech Tagging Tools

Authors: György Orosz, László János Laki, Attila Novák, and Borbála Siklósi

Published in: OASIcs, Volume 29, 2nd Symposium on Languages, Applications and Technologies (2013)


Abstract
Part-of-speech tagging is a fundamental task of natural language processing. For languages with a very rich agglutinating morphology, generic PoS tagging algorithms do not yield very high accuracy because of data sparseness. Integrating a morphological analyzer can solve this problem efficiently, but it is a resource-intensive solution. In this paper we show a method of combining language-independent statistical PoS-tagging solutions, including a statistical machine translation tool, to effectively boost tagging accuracy. Our experiments show that, using the same training set, our combination of language-independent tools yields an accuracy that approaches that of a language-dependent system with an integrated morphological analyzer.
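The abstract does not say how the taggers' outputs are combined. As a generic illustration only, the sketch below merges per-token tags from several taggers by simple majority voting. This is a standard ensemble baseline and is not necessarily the method used in the paper. The function name `majority_vote` and the tie-breaking rule (prefer the tagger listed first) are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(tag_sequences: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tags from several taggers by majority voting.

    Generic ensemble baseline, not the paper's method. Ties go to the
    tagger listed first (a hypothetical priority order, e.g. by
    development-set accuracy).
    """
    if not tag_sequences:
        raise ValueError("need at least one tagger output")
    length = len(tag_sequences[0])
    if any(len(seq) != length for seq in tag_sequences):
        raise ValueError("all taggers must tag the same tokens")
    combined: List[str] = []
    for i in range(length):
        votes = [seq[i] for seq in tag_sequences]  # one vote per tagger
        counts = Counter(votes)
        best = max(counts.values())
        # first tag, in tagger priority order, that reaches the top count
        combined.append(next(tag for tag in votes if counts[tag] == best))
    return combined
```

Usage: suppose three taggers label the Hungarian phrase "a kutya ugat" ("the dog barks") as follows:

- Tagger 1: `DET NOUN VERB`
- Tagger 2: `DET ADJ VERB`
- Tagger 3: `DET NOUN NOUN`

The combined output is `DET NOUN VERB`, because each token keeps the tag most taggers agree on. Simple voting needs at least three diverse taggers to help. A learned combiner that weighs taggers by context is a common refinement.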

Cite as

György Orosz, László János Laki, Attila Novák, and Borbála Siklósi. Combining Language Independent Part-of-Speech Tagging Tools. In 2nd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 29, pp. 249-257, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{orosz_et_al:OASIcs.SLATE.2013.249,
  author =	{Orosz, Gy\"{o}rgy and Laki, L\'{a}szl\'{o} J\'{a}nos and Nov\'{a}k, Attila and Sikl\'{o}si, Borb\'{a}la},
  title =	{{Combining Language Independent Part-of-Speech Tagging Tools}},
  booktitle =	{2nd Symposium on Languages, Applications and Technologies},
  pages =	{249--257},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-52-1},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{29},
  editor =	{Leal, Jos\'{e} Paulo and Rocha, Ricardo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2013.249},
  URN =		{urn:nbn:de:0030-drops-40441},
  doi =		{10.4230/OASIcs.SLATE.2013.249},
  annote =	{Keywords: part-of-speech tagging, combination, agglutinative languages, machine learning, machine translation}
}