DROPS

Document

DOI: 10.4230/OASIcs.SLATE.2022.17

Reasoning with Portuguese Word Embeddings

Authors: Luís Filipe Cunha, J. João Almeida, and Alberto Simões

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Representing words with semantic distributions to create ML models is a widely used technique to perform Natural Language processing tasks. In this paper, we trained word embedding models with different types of Portuguese corpora, analyzing the influence of the models' parameterization, the corpora size, and domain. Then we validated each model with the classical evaluation methods available: four words analogies and measurement of the similarity of pairs of words. In addition to these methods, we proposed new alternative techniques to validate word embedding models, presenting new resources for this purpose. Finally, we discussed the obtained results and argued about some limitations of the word embedding models' evaluation methods.

Cite as

Luís Filipe Cunha, J. João Almeida, and Alberto Simões. Reasoning with Portuguese Word Embeddings. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{cunha_et_al:OASIcs.SLATE.2022.17,
  author =	{Cunha, Lu{\'\i}s Filipe and Almeida, J. Jo\~{a}o and Sim\~{o}es, Alberto},
  title =	{{Reasoning with Portuguese Word Embeddings}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{17:1--17:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.17},
  URN =		{urn:nbn:de:0030-drops-167636},
  doi =		{10.4230/OASIcs.SLATE.2022.17},
  annote =	{Keywords: Word Embeddings, Word2Vec, Evaluation Methods}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.6

DataGen: JSON/XML Dataset Generator

Authors: Filipa Alves dos Santos, Hugo André Coelho Cardoso, João da Cunha e Costa, Válter Ferreira Picas Carvalho, and José Carlos Ramalho

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

In this document we describe the steps towards DataGen implementation. DataGen is a versatile and powerful tool that allows for quick prototyping and testing of software applications, since currently too few solutions offer both the complexity and scalability necessary to generate adequate datasets in order to feed a data API or a more complex APP enabling those applications testing with appropriate data volume and data complexity. DataGen core is a Domain Specific Language (DSL) that was created to specify datasets. This language suffered several updates: repeating fields (with no limit), fuzzy fields (statistically generated), lists, highorder functions over lists, custom made transformation functions. The final result is a complex algebra that allows the generation of very complex datasets coping with very complex requirements. Throughout the paper we will give several examples of the possibilities. After generating a dataset DataGen gives the user the possibility to generate a RESTFull data API with that dataset, creating a running prototype. This solution has already been used in real life cases, described with more detail throughout the paper, in which it was able to create the intended datasets successfully. These allowed the application’s performance to be tested and for the right adjustments to be made. The tool is currently being deployed for general use.

Cite as

Filipa Alves dos Santos, Hugo André Coelho Cardoso, João da Cunha e Costa, Válter Ferreira Picas Carvalho, and José Carlos Ramalho. DataGen: JSON/XML Dataset Generator. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{santos_et_al:OASIcs.SLATE.2021.6,
  author =	{Santos, Filipa Alves dos and Cardoso, Hugo Andr\'{e} Coelho and da Cunha e Costa, Jo\~{a}o and Carvalho, V\'{a}lter Ferreira Picas and Ramalho, Jos\'{e} Carlos},
  title =	{{DataGen: JSON/XML Dataset Generator}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{6:1--6:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.6},
  URN =		{urn:nbn:de:0030-drops-144239},
  doi =		{10.4230/OASIcs.SLATE.2021.6},
  annote =	{Keywords: JSON, XML, Data Generation, Open Source, REST API, Strapi, JavaScript, Node.js, Vue.js, Scalability, Fault Tolerance, Dataset, DSL, PEG.js, MongoDB}
}

@InProceedings{santos_et_al:OASIcs.SLATE.2021.6,
  author =	{Santos, Filipa Alves dos and Cardoso, Hugo Andr\'{e} Coelho and da Cunha e Costa, Jo\~{a}o and Carvalho, V\'{a}lter Ferreira Picas and Ramalho, Jos\'{e} Carlos},
  title =	{{DataGen: JSON/XML Dataset Generator}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{6:1--6:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.6},
  URN =		{urn:nbn:de:0030-drops-144239},
  doi =		{10.4230/OASIcs.SLATE.2021.6},
  annote =	{Keywords: JSON, XML, Data Generation, Open Source, REST API, Strapi, JavaScript, Node.js, Vue.js, Scalability, Fault Tolerance, Dataset, DSL, PEG.js, MongoDB}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.8

NER in Archival Finding Aids

Authors: Luís Filipe Costa Cunha and José Carlos Ramalho

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

At the moment, the vast majority of Portuguese archives with an online presence use a software solution to manage their finding aids: e.g. Digitarq or Archeevo. Most of these finding aids are written in natural language without any annotation that would enable a machine to identify named entities, geographical locations or even some dates. That would allow the machine to create smart browsing tools on top of those record contents like entity linking and record linking. In this work we have created a set of datasets to train Machine Learning algorithms to find those named entities and geographical locations. After training several algorithms we tested them in several datasets and registered their precision and accuracy. These results enabled us to achieve some conclusions about what kind of precision we can achieve with this approach in this context and what to do with the results: do we have enough precision and accuracy to create toponymic and anthroponomic indexes for archival finding aids? Is this approach suitable in this context? These are some of the questions we intend to answer along this paper.

Cite as

Luís Filipe Costa Cunha and José Carlos Ramalho. NER in Archival Finding Aids. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{costacunha_et_al:OASIcs.SLATE.2021.8,
  author =	{Costa Cunha, Lu{\'\i}s Filipe and Ramalho, Jos\'{e} Carlos},
  title =	{{NER in Archival Finding Aids}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{8:1--8:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.8},
  URN =		{urn:nbn:de:0030-drops-144257},
  doi =		{10.4230/OASIcs.SLATE.2021.8},
  annote =	{Keywords: Named Entity Recognition, Archival Descriptions, Machine Learning, Deep Learning}
}

Document

DOI: 10.4230/OASIcs.ICPEC.2020.2

Challenges and Solutions from an Embedded Programming Bootcamp

Authors: J. Pedro Amaro, Jorge Barreiros, Fernanda Coutinho, João Durães, Frederico Santos, Ana Alves, Marco Silva, and João Cunha

Published in: OASIcs, Volume 81, First International Computer Programming Education Conference (ICPEC 2020)

Abstract

Due to the proliferation of IT companies developing web and mobile applications, computer programmers are in such high demand that universities can’t satisfy it with newly graduated students. In response, some organisations started to create coding bootcamps, providing intensive full-time courses focused on unemployed people or individuals seeking for a career change. There is, however, a different set of skills that is becoming increasingly required, but is not addressed by those courses: embedded programming. In fact, the Internet of Things is connecting every device to the internet, thus making knowledge on hardware and C/C++ programming very relevant skills. A group of computer science and electrical engineering university teachers, in collaboration with several industry stakeholders, have promoted an embedded systems programming course in C and C++. This course is based on an intensive project-based approach comprising 6 months of daylong classes followed by 9 months of paid internships. After two editions, thirty embedded programmers, with no relevant previous programming experience, have been placed with the partners’ working force. In this paper, the course organisation and pedagogical methodologies are described. Problems, challenges and adopted solutions are presented and analysed. We conclude that in spite of the intense rhythm and demanding nature of the subject matter, it is possible to find the structure and solutions that keep students engaged and motivated throughout the course, allowing them to gain the required competences and successfully transition into a new career path.

Cite as

J. Pedro Amaro, Jorge Barreiros, Fernanda Coutinho, João Durães, Frederico Santos, Ana Alves, Marco Silva, and João Cunha. Challenges and Solutions from an Embedded Programming Bootcamp. In First International Computer Programming Education Conference (ICPEC 2020). Open Access Series in Informatics (OASIcs), Volume 81, pp. 2:1-2:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{amaro_et_al:OASIcs.ICPEC.2020.2,
  author =	{Amaro, J. Pedro and Barreiros, Jorge and Coutinho, Fernanda and Dur\~{a}es, Jo\~{a}o and Santos, Frederico and Alves, Ana and Silva, Marco and Cunha, Jo\~{a}o},
  title =	{{Challenges and Solutions from an Embedded Programming Bootcamp}},
  booktitle =	{First International Computer Programming Education Conference (ICPEC 2020)},
  pages =	{2:1--2:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-153-5},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{81},
  editor =	{Queir\'{o}s, Ricardo and Portela, Filipe and Pinto, M\'{a}rio and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.ICPEC.2020.2},
  URN =		{urn:nbn:de:0030-drops-122896},
  doi =		{10.4230/OASIcs.ICPEC.2020.2},
  annote =	{Keywords: Coding Bootcamp, Embedded Programming, Career Change}
}

@InProceedings{amaro_et_al:OASIcs.ICPEC.2020.2,
  author =	{Amaro, J. Pedro and Barreiros, Jorge and Coutinho, Fernanda and Dur\~{a}es, Jo\~{a}o and Santos, Frederico and Alves, Ana and Silva, Marco and Cunha, Jo\~{a}o},
  title =	{{Challenges and Solutions from an Embedded Programming Bootcamp}},
  booktitle =	{First International Computer Programming Education Conference (ICPEC 2020)},
  pages =	{2:1--2:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-153-5},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{81},
  editor =	{Queir\'{o}s, Ricardo and Portela, Filipe and Pinto, M\'{a}rio and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.ICPEC.2020.2},
  URN =		{urn:nbn:de:0030-drops-122896},
  doi =		{10.4230/OASIcs.ICPEC.2020.2},
  annote =	{Keywords: Coding Bootcamp, Embedded Programming, Career Change}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.19

Quarmic: A Data-Driven Web Development Framework

Authors: Pedro Miguel Pereira Cunha and José Paulo Leal

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Quarmic is a web framework for rapid prototyping of web applications. Its main goal is to facilitate the development of web applications by providing a high level of abstraction that hides Web communication complexities. This framework allows developers to build scalable applications capable of handling data communication in different models, data persistence and authentication, requiring them just to use simple annotations. Quarmic’s approach consists of the replication of the shared object among clients and server in order to communicate through its methods execution. Where the annotations, namely decorators, are used to indicate the concern (model or view) that each method addresses and to implement the framework’s inversion of control. By indicating the method concern, it enables the separation of its execution across the clients (responsible for the view) and the server (responsible for the model) which facilitates the state management and code maintenance.

Cite as

Pedro Miguel Pereira Cunha and José Paulo Leal. Quarmic: A Data-Driven Web Development Framework. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 19:1-19:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{cunha_et_al:OASIcs.SLATE.2019.19,
  author =	{Cunha, Pedro Miguel Pereira and Leal, Jos\'{e} Paulo},
  title =	{{Quarmic: A Data-Driven Web Development Framework}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{19:1--19:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.19},
  URN =		{urn:nbn:de:0030-drops-108869},
  doi =		{10.4230/OASIcs.SLATE.2019.19},
  annote =	{Keywords: web development, framework, data-driven}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.19

Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings

Authors: Luís Cunha, Simone Dantas, Travis Gagie, Roland Wittler, Luis Kowada, and Jens Stoye

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

Important papers have appeared recently on the problem of indexing binary strings for jumbled pattern matching, and further lowering the time bounds in terms of the input size would now be a breakthrough with broad implications. We can still make progress on the problem, however, by considering other natural parameters. Badkobeh et al. (IPL, 2013) and Amir et al. (TCS, 2016) gave algorithms that index a binary string in O(n + r^2 log r) time, where n is the length and r is the number of runs, and Giaquinta and Grabowski (IPL, 2013) gave one that runs in O(n + r^2) time. In this paper we propose a new and very simple algorithm that also runs in O(n + r^2) time and can be extended either so that the index returns the position of a match (if there is one), or so that the algorithm uses only O(n) bits of space instead of O(n) words.

Cite as

Luís Cunha, Simone Dantas, Travis Gagie, Roland Wittler, Luis Kowada, and Jens Stoye. Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 19:1-19:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{cunha_et_al:LIPIcs.CPM.2017.19,
  author =	{Cunha, Lu{\'\i}s and Dantas, Simone and Gagie, Travis and Wittler, Roland and Kowada, Luis and Stoye, Jens},
  title =	{{Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{19:1--19:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.19},
  URN =		{urn:nbn:de:0030-drops-73418},
  doi =		{10.4230/LIPIcs.CPM.2017.19},
  annote =	{Keywords: string algorithms, indexing, jumbled pattern matching, run-length encoding}
}

6 Search Results for "Cunha, Lu�s"

Reasoning with Portuguese Word Embeddings

Abstract

Cite as

DataGen: JSON/XML Dataset Generator

Abstract

Cite as

NER in Archival Finding Aids

Abstract

Cite as

Challenges and Solutions from an Embedded Programming Bootcamp

Abstract

Cite as

Quarmic: A Data-Driven Web Development Framework

Abstract

Cite as

Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings

Abstract

Cite as

Thanks for your feedback!

Could not send message