Building a Dictionary using XML Technology

Authors: Alberto Simões, José João Almeida, Ana Salgado


Cite As

Alberto Simões, José João Almeida, and Ana Salgado. Building a Dictionary using XML Technology. In 5th Symposium on Languages, Applications and Technologies (SLATE'16). Open Access Series in Informatics (OASIcs), Volume 51, pp. 14:1-14:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


In this article we describe the workflow implemented to convert a dictionary saved as a PDF file into an XML document and subsequently import it into an XML-aware database, together with the process used to edit, add and delete entries. The conversion was challenging given the format of the PDF file and the fine-grained detail of the XML schema that was used, so an iterative filtering approach was adopted. To store the dictionary we decided to use an XML-aware database (eXist-DB), which stores each dictionary entry as a separate resource. It can be queried using a web interface developed in XQuery. The lexicographers can edit entries using the oXygen XML editor, reading and storing them directly in the database. To guarantee incremental backups, a mechanism was defined to import the XML database into a Git repository. Finally, a couple of programs were created to prepare regular reports on the dictionary revision process, as well as to back it up in a Git repository.
  • XML databases
  • dictionaries
  • XQuery
  • PDF files
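
The idea of keeping each dictionary entry as a separate resource can be illustrated with a minimal Python sketch. It is not the authors' implementation: the element names (`dictionary`, `entry`, `form`, `sense`), the `id` attribute and the function `split_entries` are hypothetical, and the schema described in the article is far more fine-grained. The sketch only shows how a single XML document might be split into one file per entry, a layout that suits both a per-entry database collection and incremental backups in a Git repository.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# Hypothetical sample data; the real dictionary schema is not shown in the source.
SAMPLE = """<dictionary>
  <entry id="casa"><form>casa</form><sense>house</sense></entry>
  <entry id="gato"><form>gato</form><sense>cat</sense></entry>
</dictionary>"""


def split_entries(xml_text: str, out_dir: Path) -> List[Path]:
    """Write each <entry> element to its own file, named after its id attribute."""
    root = ET.fromstring(xml_text)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for entry in root.iter("entry"):
        # Drop the whitespace that followed the entry in the parent document,
        # so each output file contains the entry element alone.
        entry.tail = None
        path = out_dir / f"{entry.get('id')}.xml"
        ET.ElementTree(entry).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in split_entries(SAMPLE, Path(tmp)):
            print(path.name)
```

With one small file per entry, a change made by a lexicographer touches a single file, so a version-control diff shows exactly which entries were revised.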



