DROPS

Document

DOI: 10.4230/OASIcs.SLATE.2026.3

Cross-Language Text Readability Assessment: Leveraging Multilingual Models for Improved Performance in CEFR-Level Classification

Authors: Eugénio Ribeiro and Jorge Baptista

Published in: OASIcs, Volume 144, 15th Symposium on Languages, Applications and Technologies (SLATE 2026)

Abstract

The automatic assessment of text readability and the classification of texts by levels is essential for language education and language industries that rely on effective communication. This study explores cross-language automatic readability level classification using the levels defined by the Common European Framework of Reference for Languages (CEFR). We investigate the potential of using data in one language to improve classification performance in different languages and, thus, optimize the utilization of the limited labeled resources available for each language. We rely on a pre-trained multilingual Transformer-based language model, by fine-tuning it on annotated data in one language or in a combination of languages, and then assessing its ability to generalize even to unseen languages. In an additional scenario, we further fine-tune the models on data in the target language, to assess whether the models trained on data in different languages can capture generic information regarding text readability and then be further specialized to capture the specific characteristics of the target language. Our experiments covering the English, Dutch, and German languages revealed that direct generalization to unseen languages is challenging. However, when paired with data in the target language, multilingual data can be leveraged to capture cross-language aspects of text readability, leading to more robust and better-performing models.

Cite as

Eugénio Ribeiro and Jorge Baptista. Cross-Language Text Readability Assessment: Leveraging Multilingual Models for Improved Performance in CEFR-Level Classification. In 15th Symposium on Languages, Applications and Technologies (SLATE 2026). Open Access Series in Informatics (OASIcs), Volume 144, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{ribeiro_et_al:OASIcs.SLATE.2026.3,
  author =	{Ribeiro, Eug\'{e}nio and Baptista, Jorge},
  title =	{{Cross-Language Text Readability Assessment: Leveraging Multilingual Models for Improved Performance in CEFR-Level Classification}},
  booktitle =	{15th Symposium on Languages, Applications and Technologies (SLATE 2026)},
  pages =	{3:1--3:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-440-6},
  ISSN =	{2190-6807},
  year =	{2026},
  volume =	{144},
  editor =	{Batista, Fernando and Ribeiro, Eug\'{e}nio and Ribeiro, Ricardo and Santos, Andr\'{e} L.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2026.3},
  URN =		{urn:nbn:de:0030-drops-267016},
  doi =		{10.4230/OASIcs.SLATE.2026.3},
  annote =	{Keywords: Readability, Text Complexity, CEFR, Multilinguality}
}

Document

DOI: 10.4230/OASIcs.SLATE.2026.5

Inter-Sentential Relations in Enhanced Lexicalized Meaning Representation (E-LMR)

Authors: Jorge Baptista

Published in: OASIcs, Volume 144, 15th Symposium on Languages, Applications and Technologies (SLATE 2026)

Abstract

Abstract Meaning Representation (AMR) is a widely adopted framework for graph-based sentence-level semantics. However, its abstraction from surface form and its focus on intra-sentential structure limit its ability to capture discourse-level phenomena. Uniform Meaning Representation (UMR) addresses this limitation by introducing cross-sentential mechanisms, at the cost of increased representational complexity. We argue that inter-sentential relations can be modeled without abandoning lexical anchoring. Within Lexicalized Meaning Representation, we propose a programmatic extension grounded in three domains: (i) nominal and pronominal anaphora, (ii) temporal anchoring, and (iii) inter-sentential cohesion devices. We call this Enhanced LMR (E-LMR). We contend that these relations are not abstract add-ons, but are lexically realized and should be represented accordingly. Rather than a full annotation scheme, we introduce design principles and representation strategies illustrated with European Portuguese data. A lexically grounded approach improves interpretability, annotation consistency, and cross-linguistic robustness. We thus position E-LMR as a viable alternative within a modular semantic architecture where discourse relations are explicitly encoded while remaining tightly coupled to lexical form.

Cite as

Jorge Baptista. Inter-Sentential Relations in Enhanced Lexicalized Meaning Representation (E-LMR). In 15th Symposium on Languages, Applications and Technologies (SLATE 2026). Open Access Series in Informatics (OASIcs), Volume 144, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{baptista:OASIcs.SLATE.2026.5,
  author =	{Baptista, Jorge},
  title =	{{Inter-Sentential Relations in Enhanced Lexicalized Meaning Representation (E-LMR)}},
  booktitle =	{15th Symposium on Languages, Applications and Technologies (SLATE 2026)},
  pages =	{5:1--5:18},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-440-6},
  ISSN =	{2190-6807},
  year =	{2026},
  volume =	{144},
  editor =	{Batista, Fernando and Ribeiro, Eug\'{e}nio and Ribeiro, Ricardo and Santos, Andr\'{e} L.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2026.5},
  URN =		{urn:nbn:de:0030-drops-267030},
  doi =		{10.4230/OASIcs.SLATE.2026.5},
  annote =	{Keywords: Inter-sentential relations, Lexicalized Meaning Representation, Nominal and Pronominal Anaphora Resolution, Temporal Anaphora Resolution, Inter-Sentential Cohesion Devices, European Portuguese}
}

Document

Complete Volume

DOI: 10.4230/OASIcs.SLATE.2025

OASIcs, Volume 135, SLATE 2025, Complete Volume

Authors: Jorge Baptista and José Barateiro

Published in: OASIcs, Volume 135, 14th Symposium on Languages, Applications and Technologies (SLATE 2025)

Abstract

OASIcs, Volume 135, SLATE 2025, Complete Volume

Cite as

14th Symposium on Languages, Applications and Technologies (SLATE 2025). Open Access Series in Informatics (OASIcs), Volume 135, pp. 1-220, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@Proceedings{baptista_et_al:OASIcs.SLATE.2025,
  title =	{{OASIcs, Volume 135, SLATE 2025, Complete Volume}},
  booktitle =	{14th Symposium on Languages, Applications and Technologies (SLATE 2025)},
  pages =	{1--220},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-387-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{135},
  editor =	{Baptista, Jorge and Barateiro, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2025},
  URN =		{urn:nbn:de:0030-drops-243367},
  doi =		{10.4230/OASIcs.SLATE.2025},
  annote =	{Keywords: OASIcs, Volume 135, SLATE 2025, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/OASIcs.SLATE.2025.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Jorge Baptista and José Barateiro

Published in: OASIcs, Volume 135, 14th Symposium on Languages, Applications and Technologies (SLATE 2025)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

14th Symposium on Languages, Applications and Technologies (SLATE 2025). Open Access Series in Informatics (OASIcs), Volume 135, pp. 0:i-0:xii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{baptista_et_al:OASIcs.SLATE.2025.0,
  author =	{Baptista, Jorge and Barateiro, Jos\'{e}},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{14th Symposium on Languages, Applications and Technologies (SLATE 2025)},
  pages =	{0:i--0:xii},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-387-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{135},
  editor =	{Baptista, Jorge and Barateiro, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2025.0},
  URN =		{urn:nbn:de:0030-drops-243354},
  doi =		{10.4230/OASIcs.SLATE.2025.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/OASIcs.SLATE.2025.1

From Prediction to Precision: Leveraging LLMs for Equitable and Data-Driven Writing Placement in Developmental Education

Authors: Miguel Da Corte and Jorge Baptista

Published in: OASIcs, Volume 135, 14th Symposium on Languages, Applications and Technologies (SLATE 2025)

Abstract

Accurate text classification and placement remain challenges in U.S. higher education, with traditional automated systems like Accuplacer functioning as "black-box" models with limited assessment transparency. This study evaluates Large Language Models (LLMs) as complementary placement tools by comparing their classification performance against a human-rated gold standard and Accuplacer. A 450-essay corpus was classified using Claude, Gemini, GPT-3.5-turbo, and GPT-4o across four prompting strategies: Zero-shot, Few-shot, Enhanced, and Enhanced+ (definitions with examples). Two classification approaches were tested: (i) a 1-step, 3-class classification task, distinguishing DevEd Level 1, DevEd Level 2, and College-level texts in a single run; and (ii) a 2-step classification task, first separating College vs. Non-College texts before further classifying Non-College texts into DevEd sublevels. The results show that structured prompt refinement improves the precision of LLMs' classification, with Claude Enhanced+ achieving 62.22% precision (1-step) and Gemini Enhanced+ reaching 69.33% (2-step), both surpassing Accuplacer (58.22%). Gemini and Claude also demonstrated strong correlation with human ratings, with Claude achieving the highest Pearson scores (ρ = 0.75, 1-step; ρ = 0.73, 2-step) vs. Accuplacer (ρ = 0.67). While LLMs show promise for DevEd placement, their precision remains a work in progress, highlighting the need for further refinement and safeguards to ensure ethical and equitable placement.

Cite as

Miguel Da Corte and Jorge Baptista. From Prediction to Precision: Leveraging LLMs for Equitable and Data-Driven Writing Placement in Developmental Education. In 14th Symposium on Languages, Applications and Technologies (SLATE 2025). Open Access Series in Informatics (OASIcs), Volume 135, pp. 1:1-1:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{dacorte_et_al:OASIcs.SLATE.2025.1,
  author =	{Da Corte, Miguel and Baptista, Jorge},
  title =	{{From Prediction to Precision: Leveraging LLMs for Equitable and Data-Driven Writing Placement in Developmental Education}},
  booktitle =	{14th Symposium on Languages, Applications and Technologies (SLATE 2025)},
  pages =	{1:1--1:18},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-387-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{135},
  editor =	{Baptista, Jorge and Barateiro, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2025.1},
  URN =		{urn:nbn:de:0030-drops-236817},
  doi =		{10.4230/OASIcs.SLATE.2025.1},
  annote =	{Keywords: Large Language Models (LLMs), Developmental Education (DevEd), writing assessment, text classification, English writing proficiency}
}

Document

DOI: 10.4230/OASIcs.SLATE.2025.6

Beyond the Score: Exploring the Intersection Between Sociodemographics and Linguistic Features in English (L1) Writing Placement

Authors: Miguel Da Corte and Jorge Baptista

Published in: OASIcs, Volume 135, 14th Symposium on Languages, Applications and Technologies (SLATE 2025)

Abstract

This study examines the intersection of sociodemographic characteristics, linguistic features, and writing placement outcomes at a community college in the United States of America. It focuses on 210 anonymized writing samples from native English speakers (L1) that were automatically classified by Accuplacer and independently assessed by two trained raters. Disparities across gender and race using 40 top-ranked linguistic features selected from Coh-Metrix, CTAP, and Developmental Education-Specific (DES) sets were analyzed. Three statistical tests were used: one-way ANOVA, Tukey’s HSD, and Chi-square. ANOVA results showed racial differences in nine linguistic features, especially those tied to syntactic complexity, discourse markers, and lexical precision. Gender differences were more limited, with only one feature reaching significance (Positive Connectives, p = 0.007). Tukey’s HSD pairwise tests showed no significant gender group variation but revealed sensitivity in DES features when comparing racial groups. Chi-square analysis indicated no significant association between gender and placement outcomes but suggested a possible link between race and human-assigned levels (χ² = 9.588, p = 0.048). These findings suggest that while automated systems assess general writing skills, human-devised linguistic features and demographic insights can support more equitable placement practices for all students entering college-level programs.

Cite as

Miguel Da Corte and Jorge Baptista. Beyond the Score: Exploring the Intersection Between Sociodemographics and Linguistic Features in English (L1) Writing Placement. In 14th Symposium on Languages, Applications and Technologies (SLATE 2025). Open Access Series in Informatics (OASIcs), Volume 135, pp. 6:1-6:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{dacorte_et_al:OASIcs.SLATE.2025.6,
  author =	{Da Corte, Miguel and Baptista, Jorge},
  title =	{{Beyond the Score: Exploring the Intersection Between Sociodemographics and Linguistic Features in English (L1) Writing Placement}},
  booktitle =	{14th Symposium on Languages, Applications and Technologies (SLATE 2025)},
  pages =	{6:1--6:18},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-387-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{135},
  editor =	{Baptista, Jorge and Barateiro, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2025.6},
  URN =		{urn:nbn:de:0030-drops-236861},
  doi =		{10.4230/OASIcs.SLATE.2025.6},
  annote =	{Keywords: Developmental Education (DevEd), sociolinguistic variation, text classification, Machine Learning, placement equity}
}

Document

DOI: 10.4230/OASIcs.SLATE.2025.9

Semantic Representation of Adverbs in the Lexicalized Meaning Representation (LMR) Framework

Authors: Jorge Baptista, Izabela Müller, and Sónia Reis

Published in: OASIcs, Volume 135, 14th Symposium on Languages, Applications and Technologies (SLATE 2025)

Abstract

Semantic parsing serves as a crucial interface between natural language and formal meaning representations, enabling computational systems to capture the underlying semantic structure of linguistic expressions. This paper addresses a relatively understudied area in both linguistic theory and natural language processing: the semantic representation of adverbs. We conduct a comparative analysis of annotation guidelines and practices across two semantic representation frameworks: Lexicalized Meaning Representation (LMR), applied to the European Portuguese edition of the novella "O Principezinho" by Antoine de Saint-Exupéry (1943); and Abstract Meaning Representation (AMR), applied to the Brazilian Portuguese edition, "O Pequeno Príncipe". The study reveals significant limitations in AMR’s handling of adverbial constructions, particularly when assessed against contemporary syntactic-semantic advances in linguistic theory. Furthermore, it highlights the theoretical and practical challenges that LMR continues to face in this domain.

Cite as

Jorge Baptista, Izabela Müller, and Sónia Reis. Semantic Representation of Adverbs in the Lexicalized Meaning Representation (LMR) Framework. In 14th Symposium on Languages, Applications and Technologies (SLATE 2025). Open Access Series in Informatics (OASIcs), Volume 135, pp. 9:1-9:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{baptista_et_al:OASIcs.SLATE.2025.9,
  author =	{Baptista, Jorge and M\"{u}ller, Izabela and Reis, S\'{o}nia},
  title =	{{Semantic Representation of Adverbs in the Lexicalized Meaning Representation (LMR) Framework}},
  booktitle =	{14th Symposium on Languages, Applications and Technologies (SLATE 2025)},
  pages =	{9:1--9:18},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-387-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{135},
  editor =	{Baptista, Jorge and Barateiro, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2025.9},
  URN =		{urn:nbn:de:0030-drops-236891},
  doi =		{10.4230/OASIcs.SLATE.2025.9},
  annote =	{Keywords: Semantic representation, Adverbs, Lexicalized Meaning Representation (LMR), Abstract Meaning Representation (AMR), Annotation guidelines, European Portuguese, Brazilian Portuguese, Comparative analysis, The Little Prince, Corpus linguistics, Natural Language Processing (NLP), Multi-word expressions, Syntactic-semantic interface, Linguistic theory}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.2

Automatic Classification of Portuguese Proverbs

Authors: Jorge Baptista and Sónia Reis

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

In this paper, natural language processing (NLP) and machine learning methods and tools are applied to the task of topic (thematic or semantic) classification of Portuguese proverbs. This is a difficult task since proverbs are usually very short sentences. Such classification should allow an easier selection of the most relevant proverbs for a given situation, considering their context in discourse or within a text. For that, we used, on the one hand, a collection of +32,000 proverbial expressions organized "thematically" into a large set of previously attributed topics (+2,200) and, on the other hand, the Orange data mining toolkit, along with the NLP and machine learning tools it provides. Since the classification provided in the collection of proverbs is, for the most part, based only on a keyword in the body of the proverbs, two experiments were set up to determine the feasibility of the task with a modicum of effort and the most promising configurations applicable. Different sample sizes, 100 and 50 proverbs randomly selected per topic, corresponding to Scenario 1 and 2, respectively, were contrasted; several preprocessing strategies were explored, and different data representation methods were tested against several learning algorithms. Results show that Neural Networks is the best-performing model, achieving classification accuracies of 70% and 61% in Scenario 1 and 2, respectively. Some of the inaccurate classification cases seem to indicate that the machine learning approach can sometimes do a better job than a human classifier, especially considering the manual attribution of the topics by the collection’s author, the sheer number of topics involved, and the very unbalanced distribution of proverbs per topic. Based on the results achieved, the paper presents some proposals for future work to cope with such difficulties.

Cite as

Jorge Baptista and Sónia Reis. Automatic Classification of Portuguese Proverbs. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 2:1-2:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{baptista_et_al:OASIcs.SLATE.2022.2,
  author =	{Baptista, Jorge and Reis, S\'{o}nia},
  title =	{{Automatic Classification of Portuguese Proverbs}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{2:1--2:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.2},
  URN =		{urn:nbn:de:0030-drops-167480},
  doi =		{10.4230/OASIcs.SLATE.2022.2},
  annote =	{Keywords: Portuguese Proverbs, Automatic Topic Classification, Machine Learning}
}

Document

DOI: 10.4230/OASIcs.SLATE.2020.11

Syntactic Transformations in Rule-Based Parsing of Support Verb Constructions: Examples from European Portuguese

Authors: Jorge Baptista and Nuno Mamede

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

This paper reports ongoing work on building a rule-based grammar for (European) Portuguese, incorporating support verb constructions (SVC). The paper focuses on parsing sentences resulting from syntactic transformations of SVC, and presents a methodology to automatically generate testing examples directly from the SVC Lexicon-Grammar matrix where their linguistic properties are represented. These examples make it possible both to improve the linguistic description of these constructions and to test the system's parser intrinsically, spotting unforeseen issues due to previous natural language processing steps.

Cite as

Jorge Baptista and Nuno Mamede. Syntactic Transformations in Rule-Based Parsing of Support Verb Constructions: Examples from European Portuguese. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{baptista_et_al:OASIcs.SLATE.2020.11,
  author =	{Baptista, Jorge and Mamede, Nuno},
  title =	{{Syntactic Transformations in Rule-Based Parsing of Support Verb Constructions: Examples from European Portuguese}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.11},
  URN =		{urn:nbn:de:0030-drops-130245},
  doi =		{10.4230/OASIcs.SLATE.2020.11},
  annote =	{Keywords: Support verb constructions, Rule-based parsing, syntactic transformations, language resources, European Portuguese}
}

Document

DOI: 10.4230/OASIcs.SLATE.2017.22

Vocatives in Portuguese: Identification and Processing

Authors: Jorge Baptista and Nuno Mamede

Published in: OASIcs, Volume 56, 6th Symposium on Languages, Applications and Technologies (SLATE 2017)

Abstract

This paper describes the most salient linguistic aspects of vocative constructions in Portuguese, with special reference to its European variety. Next, the paper presents the strategy followed for implementing this linguistic knowledge in a computational grammar of Portuguese, developed for the natural language processing chain STRING and using the XIP rule-based parser. Very precise and detailed linguistic descriptions can be implemented in this way.

Cite as

Jorge Baptista and Nuno Mamede. Vocatives in Portuguese: Identification and Processing. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 22:1-22:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

@InProceedings{baptista_et_al:OASIcs.SLATE.2017.22,
  author =	{Baptista, Jorge and Mamede, Nuno},
  title =	{{Vocatives in Portuguese: Identification and Processing}},
  booktitle =	{6th Symposium on Languages, Applications and Technologies (SLATE 2017)},
  pages =	{22:1--22:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-056-9},
  ISSN =	{2190-6807},
  year =	{2017},
  volume =	{56},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Leal, Jos\'{e} Paulo and Varanda, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2017.22},
  URN =		{urn:nbn:de:0030-drops-79555},
  doi =		{10.4230/OASIcs.SLATE.2017.22},
  annote =	{Keywords: Natural Language Processing, Text analysis, Portuguese, Vocative, Parsing}
}

Document

DOI: 10.4230/OASIcs.SLATE.2014.225

Automatic Identification of Whole-Part Relations in Portuguese

Authors: Ilia Markov, Nuno Mamede, and Jorge Baptista

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)

Abstract

In this paper, we improve the extraction of semantic relations between textual elements as it is currently performed by STRING, a hybrid statistical and rule-based Natural Language Processing chain for Portuguese, by targeting whole-part relations (meronymy), that is, a semantic relation between an entity that is perceived as a constituent part of another entity, or a member of a set. In this case, we focus on the type of meronymy involving human entities and body-part nouns.

Cite as

Ilia Markov, Nuno Mamede, and Jorge Baptista. Automatic Identification of Whole-Part Relations in Portuguese. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 225-232, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

@InProceedings{markov_et_al:OASIcs.SLATE.2014.225,
  author =	{Markov, Ilia and Mamede, Nuno and Baptista, Jorge},
  title =	{{Automatic Identification of Whole-Part Relations in Portuguese}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{225--232},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.225},
  URN =		{urn:nbn:de:0030-drops-45723},
  doi =		{10.4230/OASIcs.SLATE.2014.225},
  annote =	{Keywords: whole-part relation, meronymy, body-part noun, disease noun, Portuguese}
}

Document

DOI: 10.4230/OASIcs.SLATE.2014.235

Automatic Detection of Proverbs and their Variants

Authors: Amanda P. Rassi, Jorge Baptista, and Oto Vale

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)

Abstract

This article presents the task of automatic detection of proverbs in Brazilian Portuguese, from the intersection of the regular syntactic structure of proverbs and their core elements. We created finite-state automata that enabled us to look for these word combinations in running texts. The rationale behind this method is that, although proverbs may have a normal sentence structure and often a very commonly used lexicon, their specific word combinations may enable us to identify them and their variants irrespective of the syntactic or structural changes the proverb may undergo. The goal of this task is to gather the largest number of proverbs and their variants. The results showed a precision of 60.15%.

Cite as

Amanda P. Rassi, Jorge Baptista, and Oto Vale. Automatic Detection of Proverbs and their Variants. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 235-249, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

@InProceedings{rassi_et_al:OASIcs.SLATE.2014.235,
  author =	{Rassi, Amanda P. and Baptista, Jorge and Vale, Oto},
  title =	{{Automatic Detection of Proverbs and their Variants}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{235--249},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.235},
  URN =		{urn:nbn:de:0030-drops-45738},
  doi =		{10.4230/OASIcs.SLATE.2014.235},
  annote =	{Keywords: Brazilian Portuguese, proverbs, syntactic structure, core element, variation}
}

Document

DOI: 10.4230/OASIcs.SLATE.2013.271

Syntactic REAP.PT: Exercises on Clitic Pronouning

Authors: Tiago Freitas, Jorge Baptista, and Nuno Mamede

Published in: OASIcs, Volume 29, 2nd Symposium on Languages, Applications and Technologies (2013)

Abstract

The emerging interdisciplinary field of Intelligent Computer Assisted Language Learning (ICALL) aims to integrate the knowledge from computational linguistics into computer-assisted language learning (CALL). REAP.PT is a project emerging from this new field, aiming to teach Portuguese in an innovative and appealing way, adapted to each student. In this paper, we present a new improvement of the REAP.PT system, consisting of new, automatically generated syntactic exercises. These exercises deal with the complex phenomenon of pronominalization, that is, the substitution of a syntactic constituent with an adequate pronominal form. Though the transformation may seem simple, it involves complex lexical, syntactic and semantic constraints. The issues surrounding pronominalization in Portuguese make it a particularly difficult aspect of language learning for non-native speakers. Moreover, even native speakers can often be uncertain about the correct clitic positioning, due to the complexity and interaction of competing factors governing this phenomenon. A new architecture for automatic syntactic exercise generation is proposed. It proved invaluable in easing the development of this complex exercise, and is expected to be a relevant step forward in the development of future syntactic exercises, with the potential of becoming a syntactic exercise generation framework. A pioneering feedback system with detailed and automatically generated explanations for each answer is also presented, improving the learning experience, as stated in user comments. The positive results of the expert evaluation and crowd-sourced testing demonstrated the validity of the present approach.

Cite as

Tiago Freitas, Jorge Baptista, and Nuno Mamede. Syntactic REAP.PT: Exercises on Clitic Pronouning. In 2nd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 29, pp. 271-285, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)

@InProceedings{freitas_et_al:OASIcs.SLATE.2013.271,
  author =	{Freitas, Tiago and Baptista, Jorge and Mamede, Nuno},
  title =	{{Syntactic REAP.PT: Exercises on Clitic Pronouning}},
  booktitle =	{2nd Symposium on Languages, Applications and Technologies},
  pages =	{271--285},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-52-1},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{29},
  editor =	{Leal, Jos\'{e} Paulo and Rocha, Ricardo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2013.271},
  URN =		{urn:nbn:de:0030-drops-40433},
  doi =		{10.4230/OASIcs.SLATE.2013.271},
  annote =	{Keywords: Intelligent Computer Assisted Language Learning (ICALL), Portuguese, Syntactic Exercises, Automatic Exercise Generation, Clitic Pronouning}
}

Search Results

Documents authored by Baptista, Jorge
