DROPS

Document

DOI: 10.4230/OASIcs.SLATE.2022.2

Automatic Classification of Portuguese Proverbs

Authors: Jorge Baptista and Sónia Reis

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

In this paper, natural language processing (NLP) and machine learning methods and tools are applied to the task of topic (thematic or semantic) classification of Portuguese proverbs. This is a difficult task since proverbs are usually very short sentences. Such classification should allow an easier selection of the most relevant proverbs for a given situation, considering their context in discourse or within a text. For that, we used, on the one hand, a collection of over 32,000 proverbial expressions organized "thematically" into a large set of previously attributed topics (over 2,200) and, on the other hand, the Orange data mining toolkit, along with the NLP and machine learning tools it provides. Since the classification provided in the collection of proverbs is, for the most part, based only on a keyword in the body of the proverbs, two experiments were set up to determine the feasibility of the task with a modicum of effort and the most promising applicable configurations. Different sample sizes, 100 and 50 proverbs randomly selected per topic, corresponding to Scenario 1 and 2, respectively, were contrasted; several preprocessing strategies were explored, and different data representation methods were tested against several learning algorithms. Results show that Neural Networks is the best-performing model, achieving the best classification accuracy of 70% and 61% in Scenario 1 and 2, respectively. Some of the inaccurate classification cases seem to indicate that the machine learning approach can sometimes do a better job than a human classifier, especially considering the manual attribution of the topics by the collection’s author, the sheer number of topics involved, and the very unbalanced distribution of proverbs per topic. Based on the results achieved, the paper presents some proposals for future work to cope with such difficulties.

Cite as

Jorge Baptista and Sónia Reis. Automatic Classification of Portuguese Proverbs. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 2:1-2:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{baptista_et_al:OASIcs.SLATE.2022.2,
  author =	{Baptista, Jorge and Reis, S\'{o}nia},
  title =	{{Automatic Classification of Portuguese Proverbs}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{2:1--2:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.2},
  URN =		{urn:nbn:de:0030-drops-167480},
  doi =		{10.4230/OASIcs.SLATE.2022.2},
  annote =	{Keywords: Portuguese Proverbs, Automatic Topic Classification, Machine Learning}
}

Document

DOI: 10.4230/OASIcs.LDK.2021.19

Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review

Authors: Maraim Masoud, Bianca Pereira, John McCrae, and Paul Buitelaar

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)

Abstract

Knowledge graphs have been shown to be an important data structure for many applications, including chatbot development, data integration, and semantic search. In the enterprise domain, such graphs need to be constructed based on both structured (e.g. databases) and unstructured (e.g. textual) internal data sources, preferably using automatic approaches due to the costs associated with manual construction of knowledge graphs. However, despite the growing body of research that leverages both structured and textual data sources in the context of automatic knowledge graph construction, the research community has centered on either one type of source or the other. In this paper, we conduct a preliminary literature review to investigate approaches that can be used for the integration of textual and structured data sources in the process of automatic knowledge graph construction. We highlight the solutions currently available for use within enterprises and point out areas that would benefit from further research.

Cite as

Maraim Masoud, Bianca Pereira, John McCrae, and Paul Buitelaar. Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 19:1-19:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{masoud_et_al:OASIcs.LDK.2021.19,
  author =	{Masoud, Maraim and Pereira, Bianca and McCrae, John and Buitelaar, Paul},
  title =	{{Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{19:1--19:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.19},
  URN =		{urn:nbn:de:0030-drops-145556},
  doi =		{10.4230/OASIcs.LDK.2021.19},
  annote =	{Keywords: Knowledge Graph Construction, Enterprise Knowledge Graph}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.3

Major Minors - Ontological Representation of Minorities by Newspapers

Authors: Paulo Jorge Pereira Martins, Leandro José Abreu Dias Costa, and José Carlos Ramalho

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

The stigma associated with certain minorities has changed throughout the years, yet there’s no central data repository that enables a concrete tracking of this representation. Articles published in renowned newspapers, mainly digital newspapers, are a way of determining the public perception of this subject, be it through the media representation (text and photo illustrations) or user comments. The present paper seeks to showcase a project that attempts to address that shortage of data by providing a repository in the form of an ontology: RDF triplestores composing a semantic database (W3C standards for Semantic Web). This open-source project aims to be a research tool for mapping and studying the representation of minority groups in a Portuguese journalistic context over the course of two decades.

Cite as

Paulo Jorge Pereira Martins, Leandro José Abreu Dias Costa, and José Carlos Ramalho. Major Minors - Ontological Representation of Minorities by Newspapers. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{martins_et_al:OASIcs.SLATE.2021.3,
  author =	{Martins, Paulo Jorge Pereira and Costa, Leandro Jos\'{e} Abreu Dias and Ramalho, Jos\'{e} Carlos},
  title =	{{Major Minors - Ontological Representation of Minorities by Newspapers}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{3:1--3:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.3},
  URN =		{urn:nbn:de:0030-drops-144201},
  doi =		{10.4230/OASIcs.SLATE.2021.3},
  annote =	{Keywords: RDF, OWL, Ontologies, Knowledge Representation, Minorities}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2017.4

LTZVisor: TrustZone is the Key

Authors: Sandro Pinto, Jorge Pereira, Tiago Gomes, Adriano Tavares, and Jorge Cabral

Published in: LIPIcs, Volume 76, 29th Euromicro Conference on Real-Time Systems (ECRTS 2017)

Abstract

Virtualization technology is becoming more and more widespread in the embedded systems arena, driven by the upward trend for integrating multiple environments into the same hardware platform. The penalties incurred by standard software-based virtualization, together with the strict timing requirements imposed by real-time virtualization, are pushing research towards hardware-assisted solutions. Among existing commercial off-the-shelf (COTS) technologies, ARM TrustZone promises to be a game-changer for virtualization, despite this technology still being seen with a lot of obscurity and scepticism. In this paper we present a Lightweight TrustZone-assisted Hypervisor (LTZVisor) as a tool to understand, evaluate and discuss the benefits and limitations of using TrustZone hardware to assist virtualization. We demonstrate how TrustZone can be adequately exploited to meet real-time needs, while presenting a low performance cost on running unmodified rich operating systems. While ARM continues to spread TrustZone technology from the applications processors to the smallest of microcontrollers, it is undeniable that this technology is gaining increasing relevance. Our intent is to encourage research and drive the next generation of TrustZone-assisted virtualization solutions.

Cite as

Sandro Pinto, Jorge Pereira, Tiago Gomes, Adriano Tavares, and Jorge Cabral. LTZVisor: TrustZone is the Key. In 29th Euromicro Conference on Real-Time Systems (ECRTS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 76, pp. 4:1-4:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

@InProceedings{pinto_et_al:LIPIcs.ECRTS.2017.4,
  author =	{Pinto, Sandro and Pereira, Jorge and Gomes, Tiago and Tavares, Adriano and Cabral, Jorge},
  title =	{{LTZVisor: TrustZone is the Key}},
  booktitle =	{29th Euromicro Conference on Real-Time Systems (ECRTS 2017)},
  pages =	{4:1--4:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-037-8},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{76},
  editor =	{Bertogna, Marko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2017.4},
  URN =		{urn:nbn:de:0030-drops-71535},
  doi =		{10.4230/LIPIcs.ECRTS.2017.4},
  annote =	{Keywords: hypervisor, virtualization, TrustZone, space and time partitioning, real-time, embedded systems}
}

Document

DOI: 10.4230/OASIcs.SLATE.2014.225

Automatic Identification of Whole-Part Relations in Portuguese

Authors: Ilia Markov, Nuno Mamede, and Jorge Baptista

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)

Abstract

In this paper, we improve the extraction of semantic relations between textual elements as it is currently performed by STRING, a hybrid statistical and rule-based Natural Language Processing chain for Portuguese, by targeting whole-part relations (meronymy), that is, a semantic relation between an entity that is perceived as a constituent part of another entity, or a member of a set. In this case, we focus on the type of meronymy involving human entities and body-part nouns.

Cite as

Ilia Markov, Nuno Mamede, and Jorge Baptista. Automatic Identification of Whole-Part Relations in Portuguese. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 225-232, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

@InProceedings{markov_et_al:OASIcs.SLATE.2014.225,
  author =	{Markov, Ilia and Mamede, Nuno and Baptista, Jorge},
  title =	{{Automatic Identification of Whole-Part Relations in Portuguese}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{225--232},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.225},
  URN =		{urn:nbn:de:0030-drops-45723},
  doi =		{10.4230/OASIcs.SLATE.2014.225},
  annote =	{Keywords: whole-part relation, meronymy, body-part noun, disease noun, Portuguese}
}

Document

DOI: 10.4230/OASIcs.SLATE.2014.235

Automatic Detection of Proverbs and their Variants

Authors: Amanda P. Rassi, Jorge Baptista, and Oto Vale

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)

Abstract

This article presents the task of automatic detection of proverbs in Brazilian Portuguese, from the intersection of the regular syntactic structure of proverbs and their core elements. We created finite-state automata that enabled us to look for these word combinations in running texts. The rationale behind this method is that, although proverbs may have a normal sentence structure and often a very commonly used lexicon, their specific word combinations may enable us to identify them and their variants irrespective of the syntactic or structural changes the proverb may undergo. The goal of this task is to gather the largest number of proverbs and their variants. The results showed a precision of 60.15%.

Cite as

Amanda P. Rassi, Jorge Baptista, and Oto Vale. Automatic Detection of Proverbs and their Variants. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 235-249, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

@InProceedings{rassi_et_al:OASIcs.SLATE.2014.235,
  author =	{Rassi, Amanda P. and Baptista, Jorge and Vale, Oto},
  title =	{{Automatic Detection of Proverbs and their Variants}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{235--249},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.235},
  URN =		{urn:nbn:de:0030-drops-45738},
  doi =		{10.4230/OASIcs.SLATE.2014.235},
  annote =	{Keywords: Brazilian Portuguese, proverbs, syntactic structure, core element, variation}
}

6 Search Results for "Pereira, Jorge"
