3 Search Results for "Almeida, J. João"


Document
OCRticle - a Structure-Aware OCR Application

Authors: Sofia G. Rodrigues dos Santos and J. João Dias de Almeida

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)


Abstract
While there are currently many applications and websites capable of performing Optical Character Recognition (OCR), none of the widely available options offer structured OCR, i.e., OCR that maintains the text’s original structure. For example, if a document has a title, after performing OCR on it, the title should have a different formatting, in order to distinguish it from the rest of the text. This paper covers the topic of structure-aware OCR, first by describing the current state of OCR tools, then by showcasing a prototype tool capable of retaining the structure of articles scanned from an image.

Cite as

Sofia G. Rodrigues dos Santos and J. João Dias de Almeida. OCRticle - a Structure-Aware OCR Application. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


@InProceedings{rodriguesdossantos_et_al:OASIcs.SLATE.2023.8,
  author =	{Rodrigues dos Santos, Sofia G. and Dias de Almeida, J. Jo\~{a}o},
  title =	{{OCRticle - a Structure-Aware OCR Application}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{8:1--8:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.8},
  URN =		{urn:nbn:de:0030-drops-185220},
  doi =		{10.4230/OASIcs.SLATE.2023.8},
  annote =	{Keywords: OCR, Optical Character Recognition, Data Structure, Data Parsing, Document Structure}
}

Document
Reasoning with Portuguese Word Embeddings

Authors: Luís Filipe Cunha, J. João Almeida, and Alberto Simões

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)


Abstract
Representing words with semantic distributions to create ML models is a widely used technique to perform Natural Language Processing tasks. In this paper, we trained word embedding models with different types of Portuguese corpora, analyzing the influence of the models' parameterization, the corpora size, and domain. Then we validated each model with the classical evaluation methods available: four-word analogies and measurement of the similarity of pairs of words. In addition to these methods, we proposed new alternative techniques to validate word embedding models, presenting new resources for this purpose. Finally, we discussed the obtained results and argued about some limitations of the word embedding models' evaluation methods.

Cite as

Luís Filipe Cunha, J. João Almeida, and Alberto Simões. Reasoning with Portuguese Word Embeddings. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{cunha_et_al:OASIcs.SLATE.2022.17,
  author =	{Cunha, Lu{\'\i}s Filipe and Almeida, J. Jo\~{a}o and Sim\~{o}es, Alberto},
  title =	{{Reasoning with Portuguese Word Embeddings}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{17:1--17:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.17},
  URN =		{urn:nbn:de:0030-drops-167636},
  doi =		{10.4230/OASIcs.SLATE.2022.17},
  annote =	{Keywords: Word Embeddings, Word2Vec, Evaluation Methods}
}

Document
Hunting Ancestors: A Unified Approach for Discovering Genealogical Information

Authors: José João Almeida and Rui Castro Mendes

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)


Abstract
This paper presents a unified approach for discovering genealogical information. It presents a framework for storing information concerning ancestors, locations, dates and documents. It also intends to provide a framework that is able to perform inference concerning dates by using constraints and for handling relations, locations and sources. The DSL presented also aims to help users store information from heterogeneous sources along with the evidence contained therein.

Cite as

José João Almeida and Rui Castro Mendes. Hunting Ancestors: A Unified Approach for Discovering Genealogical Information. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 22:1-22:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{almeida_et_al:OASIcs.SLATE.2019.22,
  author =	{Almeida, Jos\'{e} Jo\~{a}o and Mendes, Rui Castro},
  title =	{{Hunting Ancestors: A Unified Approach for Discovering Genealogical Information}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{22:1--22:6},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.22},
  URN =		{urn:nbn:de:0030-drops-108890},
  doi =		{10.4230/OASIcs.SLATE.2019.22},
  annote =	{Keywords: Genealogy, Domain Specific Language, Temporal Constraints}
}