3 Search Results for "Cunha, Luís"

Document

DOI: 10.4230/OASIcs.SLATE.2022.17

Reasoning with Portuguese Word Embeddings

Authors: Luís Filipe Cunha, J. João Almeida, and Alberto Simões

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Representing words with semantic distributions to create ML models is a widely used technique to perform Natural Language processing tasks. In this paper, we trained word embedding models with different types of Portuguese corpora, analyzing the influence of the models' parameterization, the corpora size, and domain. Then we validated each model with the classical evaluation methods available: four words analogies and measurement of the similarity of pairs of words. In addition to these methods, we proposed new alternative techniques to validate word embedding models, presenting new resources for this purpose. Finally, we discussed the obtained results and argued about some limitations of the word embedding models' evaluation methods.

Cite as

Luís Filipe Cunha, J. João Almeida, and Alberto Simões. Reasoning with Portuguese Word Embeddings. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{cunha_et_al:OASIcs.SLATE.2022.17,
  author =	{Cunha, Lu{\'\i}s Filipe and Almeida, J. Jo\~{a}o and Sim\~{o}es, Alberto},
  title =	{{Reasoning with Portuguese Word Embeddings}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{17:1--17:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.17},
  URN =		{urn:nbn:de:0030-drops-167636},
  doi =		{10.4230/OASIcs.SLATE.2022.17},
  annote =	{Keywords: Word Embeddings, Word2Vec, Evaluation Methods}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.8

NER in Archival Finding Aids

Authors: Luís Filipe Costa Cunha and José Carlos Ramalho

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

At the moment, the vast majority of Portuguese archives with an online presence use a software solution to manage their finding aids: e.g. Digitarq or Archeevo. Most of these finding aids are written in natural language without any annotation that would enable a machine to identify named entities, geographical locations or even some dates. That would allow the machine to create smart browsing tools on top of those record contents like entity linking and record linking. In this work we have created a set of datasets to train Machine Learning algorithms to find those named entities and geographical locations. After training several algorithms we tested them in several datasets and registered their precision and accuracy. These results enabled us to achieve some conclusions about what kind of precision we can achieve with this approach in this context and what to do with the results: do we have enough precision and accuracy to create toponymic and anthroponomic indexes for archival finding aids? Is this approach suitable in this context? These are some of the questions we intend to answer along this paper.

Cite as

Luís Filipe Costa Cunha and José Carlos Ramalho. NER in Archival Finding Aids. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{costacunha_et_al:OASIcs.SLATE.2021.8,
  author =	{Costa Cunha, Lu{\'\i}s Filipe and Ramalho, Jos\'{e} Carlos},
  title =	{{NER in Archival Finding Aids}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{8:1--8:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.8},
  URN =		{urn:nbn:de:0030-drops-144257},
  doi =		{10.4230/OASIcs.SLATE.2021.8},
  annote =	{Keywords: Named Entity Recognition, Archival Descriptions, Machine Learning, Deep Learning}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.19

Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings

Authors: Luís Cunha, Simone Dantas, Travis Gagie, Roland Wittler, Luis Kowada, and Jens Stoye

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

Important papers have appeared recently on the problem of indexing binary strings for jumbled pattern matching, and further lowering the time bounds in terms of the input size would now be a breakthrough with broad implications. We can still make progress on the problem, however, by considering other natural parameters. Badkobeh et al. (IPL, 2013) and Amir et al. (TCS, 2016) gave algorithms that index a binary string in O(n + r^2 log r) time, where n is the length and r is the number of runs, and Giaquinta and Grabowski (IPL, 2013) gave one that runs in O(n + r^2) time. In this paper we propose a new and very simple algorithm that also runs in O(n + r^2) time and can be extended either so that the index returns the position of a match (if there is one), or so that the algorithm uses only O(n) bits of space instead of O(n) words.

Cite as

Luís Cunha, Simone Dantas, Travis Gagie, Roland Wittler, Luis Kowada, and Jens Stoye. Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 19:1-19:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{cunha_et_al:LIPIcs.CPM.2017.19,
  author =	{Cunha, Lu{\'\i}s and Dantas, Simone and Gagie, Travis and Wittler, Roland and Kowada, Luis and Stoye, Jens},
  title =	{{Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{19:1--19:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.19},
  URN =		{urn:nbn:de:0030-drops-73418},
  doi =		{10.4230/LIPIcs.CPM.2017.19},
  annote =	{Keywords: string algorithms, indexing, jumbled pattern matching, run-length encoding}
}

Refine by Author
1 Almeida, J. João
1 Costa Cunha, Luís Filipe
1 Cunha, Luís
1 Cunha, Luís Filipe
1 Dantas, Simone
Show More...

Refine by Classification
2 Computing methodologies → Machine learning
2 Computing methodologies → Natural language processing
1 Computing methodologies → Maximum entropy modeling
1 Computing methodologies → Neural networks
1 Information systems → Digital libraries and archives

Refine by Keyword
1 Archival Descriptions
1 Deep Learning
1 Evaluation Methods
1 Machine Learning
1 Named Entity Recognition
Show More...

Refine by Type
3 document

Refine by Publication Year
1 2017
1 2021
1 2022

3 Search Results for "Cunha, Luís"

Reasoning with Portuguese Word Embeddings

Abstract

Cite as

NER in Archival Finding Aids

Abstract

Cite as

Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings

Abstract

Cite as

Thanks for your feedback!

Could not send message