A Pseudonymization Prototype for Hungarian

Authors: Attila Novák, Borbála Novák

Attila Novák
  • Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Borbála Novák
  • Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary

Attila Novák and Borbála Novák. A Pseudonymization Prototype for Hungarian. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/OASIcs.SLATE.2023.3


In this paper, we present a pseudonymization prototype for Hungarian, an agglutinating language with complex morphology, implemented as a web service. The service provides the following functions: entity identification and extraction; automatic generation and selection of replacement candidates; and automatic, consistent replacement and reinflection of entities in the final pseudonymized document. The named entity recognition model applied handles names of persons well and performs decently on other entity types. However, ID-like entities need to be handled separately to achieve proper performance; the current prototype version does not handle them. For automatic replacement candidate generation, a simple entity embedding model is used. We discuss the performance and limitations of the prototype in detail.
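
The consistent-replacement step can be illustrated with a minimal sketch. This is not the prototype's implementation: the `Entity` record, the `pseudonymize` function, and the candidate pools are hypothetical names introduced here, entity spans are assumed to come from an upstream recognizer, and the entity embedding model and morphological reinflection are left out entirely. The example text is English so that no reinflection is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one recognized mention (not the prototype's data model)."""
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    label: str   # entity type, e.g. "PER" or "LOC"
    lemma: str   # canonical form shared by all mentions of the same entity


def pseudonymize(text, entities, candidates):
    """Replace every mention, reusing one replacement per (label, lemma) pair.

    `candidates` maps an entity label to a list of replacement strings.
    Reinflection of the replacement is deliberately omitted in this sketch.
    """
    mapping = {}   # (label, lemma) -> chosen replacement
    used = {}      # label -> number of candidates handed out so far
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        key = (ent.label, ent.lemma)
        if key not in mapping:
            pool = candidates[ent.label]
            index = used.get(ent.label, 0)
            mapping[key] = pool[index % len(pool)]  # wraps if the pool is exhausted
            used[ent.label] = index + 1
        pieces.append(text[cursor:ent.start])  # untouched text before the mention
        pieces.append(mapping[key])            # the consistent replacement
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


text = "Anna met Béla in Szeged. Anna left early."
second_anna = text.index("Anna", 1)
entities = [
    Entity(0, 4, "PER", "Anna"),
    Entity(text.index("Béla"), text.index("Béla") + 4, "PER", "Béla"),
    Entity(text.index("Szeged"), text.index("Szeged") + 6, "LOC", "Szeged"),
    Entity(second_anna, second_anna + 4, "PER", "Anna"),
]
candidates = {"PER": ["Kata", "Imre"], "LOC": ["Győr"]}

result, mapping = pseudonymize(text, entities, candidates)
print(result)
```

Keying the mapping on the lemma rather than the surface form is what would let differently inflected mentions of one entity share a replacement; in Hungarian, each replacement would additionally have to be reinflected to match the case and suffixes of the original mention.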

  • Computing methodologies → Natural language processing
  • named entity recognition
  • morphological reinflection
  • pseudonymization
  • entity embedding model


