Search Results

Documents authored by Baptista, Jorge


Document
Automatic Classification of Portuguese Proverbs

Authors: Jorge Baptista and Sónia Reis

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)


Abstract
In this paper, natural language processing (NLP) and machine learning methods and tools are applied to the task of topic (thematic or semantic) classification of Portuguese proverbs. This is a difficult task since proverbs are usually very short sentences. Such classification should allow an easier selection of the most relevant proverbs for a given situation, considering their context in discourse or within a text. For that, we used, on the one hand, a collection of +32,000 proverbial expressions organized "thematically" into a large set of previously attributed topics (+2,200) and, on the other hand, the Orange data mining toolkit, along with the NLP and machine learning tools it provides. Since the classification provided in the collection of proverbs is, for the most part, based only on a keyword in the body of the proverbs, 2 experiments were set up, to determine the feasibility of the task with a modicum of effort and the most promising configurations applicable. Different sample sizes, 100 and 50 proverbs randomly selected per topic, corresponding to Scenario 1 and 2, respectively, were contrasted; several preprocessing strategies were explored, and different data representation methods tested against several learning algorithms. Results show that Neural Networks is the best performing model, achieving the best classification accuracy of 70% and 61%, in the two different experimental scenarios, Scenario 1 and 2, respectively. Some of the inaccurate classification cases seem to indicate that the machine learning approach can sometimes do a better job than a human classifier, especially considering the manual attribution of the topics by the collection’s author, the sheer number of topics involved, and the very unbalanced distribution of proverbs per topic. Based on the results achieved, the paper presents some proposals for future work to cope with such difficulties.

Cite as

Jorge Baptista and Sónia Reis. Automatic Classification of Portuguese Proverbs. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 2:1-2:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{baptista_et_al:OASIcs.SLATE.2022.2,
  author =	{Baptista, Jorge and Reis, S\'{o}nia},
  title =	{{Automatic Classification of Portuguese Proverbs}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{2:1--2:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.2},
  URN =		{urn:nbn:de:0030-drops-167480},
  doi =		{10.4230/OASIcs.SLATE.2022.2},
  annote =	{Keywords: Portuguese Proverbs, Automatic Topic Classification, Machine Learning}
}
Document
Syntactic Transformations in Rule-Based Parsing of Support Verb Constructions: Examples from European Portuguese

Authors: Jorge Baptista and Nuno Mamede

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)


Abstract
This paper reports ongoing work on building a rule-based grammar for (European) Portuguese, incorporating support verb constructions (SVC). The paper focuses on parsing sentences resulting from syntactic transformations of SVC, and presents a methodology to automatically generate testing examples directly from the SVC Lexicon-Grammar matrix where their linguistic properties are represented. These examples make it possible both to improve the linguistic description of these constructions and to intrinsically test the system parser, spotting unforeseen issues due to previous natural language processing steps.

Cite as

Jorge Baptista and Nuno Mamede. Syntactic Transformations in Rule-Based Parsing of Support Verb Constructions: Examples from European Portuguese. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


@InProceedings{baptista_et_al:OASIcs.SLATE.2020.11,
  author =	{Baptista, Jorge and Mamede, Nuno},
  title =	{{Syntactic Transformations in Rule-Based Parsing of Support Verb Constructions: Examples from European Portuguese}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.11},
  URN =		{urn:nbn:de:0030-drops-130245},
  doi =		{10.4230/OASIcs.SLATE.2020.11},
  annote =	{Keywords: Support verb constructions, Rule-based parsing, syntactic transformations, language resources, European Portuguese}
}
Document
Vocatives in Portuguese: Identification and Processing

Authors: Jorge Baptista and Nuno Mamede

Published in: OASIcs, Volume 56, 6th Symposium on Languages, Applications and Technologies (SLATE 2017)


Abstract
This paper describes the most salient linguistic aspects of vocative constructions in Portuguese, with special reference to its European variety. Next, the paper presents the strategy followed for implementing this linguistic knowledge in a computational grammar of Portuguese, developed for the natural language processing chain STRING and using the XIP rule-based parser. Very precise and detailed linguistic descriptions can be implemented in this way.

Cite as

Jorge Baptista and Nuno Mamede. Vocatives in Portuguese: Identification and Processing. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 22:1-22:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


@InProceedings{baptista_et_al:OASIcs.SLATE.2017.22,
  author =	{Baptista, Jorge and Mamede, Nuno},
  title =	{{Vocatives in Portuguese: Identification and Processing}},
  booktitle =	{6th Symposium on Languages, Applications and Technologies (SLATE 2017)},
  pages =	{22:1--22:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-056-9},
  ISSN =	{2190-6807},
  year =	{2017},
  volume =	{56},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Leal, Jos\'{e} Paulo and Varanda, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2017.22},
  URN =		{urn:nbn:de:0030-drops-79555},
  doi =		{10.4230/OASIcs.SLATE.2017.22},
  annote =	{Keywords: Natural Language Processing, Text analysis, Portuguese, Vocative, Parsing}
}
Document
Automatic Identification of Whole-Part Relations in Portuguese

Authors: Ilia Markov, Nuno Mamede, and Jorge Baptista

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)


Abstract
In this paper, we improve the extraction of semantic relations between textual elements as it is currently performed by STRING, a hybrid statistical and rule-based Natural Language Processing chain for Portuguese, by targeting whole-part relations (meronymy), that is, a semantic relation between an entity that is perceived as a constituent part of another entity, or a member of a set. In this case, we focus on the type of meronymy involving human entities and body-part nouns.

Cite as

Ilia Markov, Nuno Mamede, and Jorge Baptista. Automatic Identification of Whole-Part Relations in Portuguese. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 225-232, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


@InProceedings{markov_et_al:OASIcs.SLATE.2014.225,
  author =	{Markov, Ilia and Mamede, Nuno and Baptista, Jorge},
  title =	{{Automatic Identification of Whole-Part Relations in Portuguese}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{225--232},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.225},
  URN =		{urn:nbn:de:0030-drops-45723},
  doi =		{10.4230/OASIcs.SLATE.2014.225},
  annote =	{Keywords: whole-part relation, meronymy, body-part noun, disease noun, Portuguese}
}
Document
Automatic Detection of Proverbs and their Variants

Authors: Amanda P. Rassi, Jorge Baptista, and Oto Vale

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)


Abstract
This article presents the task of automatic detection of proverbs in Brazilian Portuguese, from the intersection of the regular syntactic structure of proverbs and their core elements. We created finite-state automata that enabled us to look for these word combinations in running texts. The rationale behind this method is that, although proverbs may have a normal sentence structure and often a very commonly used lexicon, their specific word combinations may enable us to identify them and their variants irrespective of the syntactic or structural changes the proverb may undergo. The goal of this task is to gather the largest number of proverbs and their variants. The results showed a precision of 60.15%.

Cite as

Amanda P. Rassi, Jorge Baptista, and Oto Vale. Automatic Detection of Proverbs and their Variants. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 235-249, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


@InProceedings{rassi_et_al:OASIcs.SLATE.2014.235,
  author =	{Rassi, Amanda P. and Baptista, Jorge and Vale, Oto},
  title =	{{Automatic Detection of Proverbs and their Variants}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{235--249},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.235},
  URN =		{urn:nbn:de:0030-drops-45738},
  doi =		{10.4230/OASIcs.SLATE.2014.235},
  annote =	{Keywords: Brazilian Portuguese, proverbs, syntactic structure, core element, variation}
}
Document
Syntactic REAP.PT: Exercises on Clitic Pronouning

Authors: Tiago Freitas, Jorge Baptista, and Nuno Mamede

Published in: OASIcs, Volume 29, 2nd Symposium on Languages, Applications and Technologies (2013)


Abstract
The emerging interdisciplinary field of Intelligent Computer Assisted Language Learning (ICALL) aims to integrate the knowledge from computational linguistics into computer-assisted language learning (CALL). REAP.PT is a project emerging from this new field, aiming to teach Portuguese in an innovative and appealing way, and adapted to each student. In this paper, we present a new improvement of the REAP.PT system, consisting in developing new, automatically generated, syntactic exercises. These exercises deal with the complex phenomenon of pronominalization, that is, the substitution of a syntactic constituent with an adequate pronominal form. Though the transformation may seem simple, it involves complex lexical, syntactical and semantic constraints. The issues on pronominalization in Portuguese make it a particularly difficult aspect of language learning for non-native speakers. On the other hand, even native speakers can often be uncertain about the correct clitic positioning, due to the complexity and interaction of competing factors governing this phenomenon. A new architecture for automatic syntactic exercise generation is proposed. It proved invaluable in easing the development of this complex exercise, and is expected to make a relevant step forward in the development of future syntactic exercises, with the potential of becoming a syntactic exercise generation framework. A pioneering feedback system with detailed and automatically generated explanations for each answer is also presented, improving the learning experience, as stated in user comments. The positive results of the expert evaluation and crowd-sourced testing demonstrated the validity of the present approach.

Cite as

Tiago Freitas, Jorge Baptista, and Nuno Mamede. Syntactic REAP.PT: Exercises on Clitic Pronouning. In 2nd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 29, pp. 271-285, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{freitas_et_al:OASIcs.SLATE.2013.271,
  author =	{Freitas, Tiago and Baptista, Jorge and Mamede, Nuno},
  title =	{{Syntactic REAP.PT: Exercises on Clitic Pronouning}},
  booktitle =	{2nd Symposium on Languages, Applications and Technologies},
  pages =	{271--285},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-52-1},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{29},
  editor =	{Leal, Jos\'{e} Paulo and Rocha, Ricardo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2013.271},
  URN =		{urn:nbn:de:0030-drops-40433},
  doi =		{10.4230/OASIcs.SLATE.2013.271},
  annote =	{Keywords: Intelligent Computer Assisted Language Learning (ICALL), Portuguese, Syntactic Exercises, Automatic Exercise Generation, Clitic Pronouning}
}