OASIcs, Volume 38

3rd Symposium on Languages, Applications and Technologies



Event

SLATE 2014, June 19-20, 2014, Bragança, Portugal

Editors

Maria João Varanda Pereira
José Paulo Leal
Alberto Simões

Publication Details

  • Published: 2014-06-18
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-939897-68-2
  • DBLP: db/conf/slate/slate2014

Complete Volume
OASIcs, Volume 38, SLATE'14, Complete Volume

Authors: Maria João Varanda Pereira, José Paulo Leal, and Alberto Simões


Abstract
OASIcs, Volume 38, SLATE'14, Complete Volume

Cite as

3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@Proceedings{pereira_et_al:OASIcs.SLATE.2014,
  title =	{{OASIcs, Volume 38, SLATE'14, Complete Volume}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014},
  URN =		{urn:nbn:de:0030-drops-45905},
  doi =		{10.4230/OASIcs.SLATE.2014},
  annote =	{Keywords: Programming Languages, Interoperability, Natural Language Processing}
}
Front Matter
Frontmatter, Table of Contents, Preface, Conference Organization

Authors: Maria João Varanda Pereira, José Paulo Leal, and Alberto Simões


Abstract
Frontmatter, Table of Contents, Preface, Conference Organization

Cite as

3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. i-xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{pereira_et_al:OASIcs.SLATE.2014.i,
  author =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  title =	{{Frontmatter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{i--xvi},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.i},
  URN =		{urn:nbn:de:0030-drops-45537},
  doi =		{10.4230/OASIcs.SLATE.2014.i},
  annote =	{Keywords: Frontmatter, Table of Contents, Preface, Conference Organization}
}
Invited Talk
Language-Driven Software Development (Invited Talk)

Authors: José-Luis Sierra


Abstract
Language-driven software development consists in applying computer language design and implementation techniques to build conventional software. The keynote reviews two different language-driven development approaches: domain-specific languages (DSLs) and language-oriented architectures (LOAs). The DSL approach focuses on the provision of languages specialized in different application aspects, which are used by developers, and even by domain experts, during application construction and maintenance. The LOA strategy, in turn, conceives applications themselves as coordinated collections of language processors, which can be developed using language implementation tools (parser generators, attribute grammar-based systems, etc.). The presentation of the approaches is supported by case studies from the fields of knowledge-based systems, e-Learning, semi-structured data processing, and digital humanities.

Cite as

José-Luis Sierra. Language-Driven Software Development (Invited Talk). In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 3-12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{sierra:OASIcs.SLATE.2014.3,
  author =	{Sierra, Jos\'{e}-Luis},
  title =	{{Language-Driven Software Development}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{3--12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.3},
  URN =		{urn:nbn:de:0030-drops-45542},
  doi =		{10.4230/OASIcs.SLATE.2014.3},
  annote =	{Keywords: domain-specific languages, language-oriented architectures, parser generators, attribute grammars, application domains}
}
Invited Talk
An Overview of Open Information Extraction (Invited Talk)

Authors: Pablo Gamallo


Abstract
Open Information Extraction (OIE) is a recent unsupervised strategy to extract large amounts of basic propositions (verb-based triples) from massive text corpora, scaling to Web-size document collections. We will introduce the main properties of this extraction method.
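To make the notion of a verb-based triple concrete, here is a toy sketch of our own (not Gamallo's method; the single pattern and the function name are hypothetical) that pulls a (subject, verb, object) proposition out of a simple declarative sentence. Real OIE systems rely on POS tagging and syntactic analysis rather than a fixed regular expression.

```python
import re

# Hypothetical single-pattern extractor: subject, a verb ending in
# -s or -ed, then an object, optionally followed by a full stop.
PATTERN = re.compile(
    r"^(?P<subj>\w+(?: \w+)*?) (?P<verb>\w+(?:s|ed)) (?P<obj>\w+(?: \w+)*)\.?$"
)

def extract_triples(sentence):
    """Return a list of (subject, verb, object) triples found in a
    simple declarative sentence, or [] when the pattern fails."""
    match = PATTERN.match(sentence.strip())
    if not match:
        return []
    return [(match.group("subj"), match.group("verb"), match.group("obj"))]

print(extract_triples("Lisbon hosts the conference."))
# [('Lisbon', 'hosts', 'the conference')]
```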

Cite as

Pablo Gamallo. An Overview of Open Information Extraction (Invited Talk). In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 13-16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{gamallo:OASIcs.SLATE.2014.13,
  author =	{Gamallo, Pablo},
  title =	{{An Overview of Open Information Extraction}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{13--16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.13},
  URN =		{urn:nbn:de:0030-drops-45559},
  doi =		{10.4230/OASIcs.SLATE.2014.13},
  annote =	{Keywords: information extraction, natural language processing}
}
Conclave: Writing Programs to Understand Programs

Authors: Nuno Ramos Carvalho, José João Almeida, Maria João Varanda Pereira, and Pedro Rangel Henriques


Abstract
Software maintainers are often challenged with source code changes to improve software systems, or eliminate defects, in unfamiliar programs. To undertake these tasks a sufficient understanding of the system, or at least a small part of it, is required. One of the most time-consuming tasks of this process is locating which parts of the code are responsible for some key functionality or feature. This paper introduces Conclave, an environment for software analysis that enhances program comprehension activities. Programmers use natural languages to describe and discuss the problem domain, programming languages to write source code, and markup languages to let programs talk with other programs; the system therefore has to cope with this heterogeneity of dialects and provide tools in all these areas to contribute effectively to the understanding process. The source code, the problem domain, and the side effects of running the program are represented in the system using ontologies. A combination of tools, each specialized in a different kind of language, creates mappings between the different domains. Conclave provides facilities for feature location, code search, and views of the software that ease the process of understanding the code and devising changes. The underlying feature location technique explores natural language terms used in programs (e.g. function and variable names); using textual analysis and a collection of Natural Language Processing techniques, it computes sets of synonymous terms. These sets are used to score the relatedness between program elements and search queries or problem-domain concepts, producing sorted ranks of program elements that address the search criteria or concepts, respectively.
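The scoring idea in the abstract's last sentences can be sketched as follows. This is a minimal illustration under our own assumptions (the data layout, synonym sets, and function names are hypothetical, not Conclave's API): two terms are related if equal or if they share a synonym set, and program elements are ranked by how many query terms they match.

```python
def same_concept(a, b, synsets):
    """Two terms match if they are equal or share a synonym set."""
    return a == b or any(a in s and b in s for s in synsets)

def score(element_terms, query_terms, synsets):
    """Count how many query terms relate to the element's terms."""
    return sum(
        1 for q in query_terms
        if any(same_concept(q, t, synsets) for t in element_terms)
    )

def rank(elements, query_terms, synsets):
    """Sort program elements by decreasing relatedness to the query."""
    return sorted(elements,
                  key=lambda e: score(elements[e], query_terms, synsets),
                  reverse=True)

# Terms harvested from function and variable names (made-up data).
synsets = [{"remove", "delete", "erase"}]
elements = {
    "delete_user": ["delete", "user"],
    "render_page": ["render", "page"],
}
print(rank(elements, ["remove", "user"], synsets))
# ['delete_user', 'render_page']
```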

Cite as

Nuno Ramos Carvalho, José João Almeida, Maria João Varanda Pereira, and Pedro Rangel Henriques. Conclave: Writing Programs to Understand Programs. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 19-34, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{carvalho_et_al:OASIcs.SLATE.2014.19,
  author =	{Carvalho, Nuno Ramos and Almeida, Jos\'{e} Jo\~{a}o and Pereira, Maria Jo\~{a}o Varanda and Henriques, Pedro Rangel},
  title =	{{Conclave: Writing Programs to Understand Programs}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{19--34},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.19},
  URN =		{urn:nbn:de:0030-drops-45561},
  doi =		{10.4230/OASIcs.SLATE.2014.19},
  annote =	{Keywords: software maintenance, software evolution, program comprehension, feature location, concept location, natural language processing}
}
Leveraging Program Comprehension with Concern-oriented Source Code Projections

Authors: Jaroslav Porubän and Milan Nosál


Abstract
In this paper we briefly introduce our concern-oriented source code projections, which enable looking at the same source code in multiple different ways. The objective of this paper is to discuss the projection creation process in detail and to explain the benefits of using projections to aid program comprehension. We achieve this objective through a case study that illustrates the use of projections on examples. The case study was carried out with our prototype tool, implemented as a plugin for the NetBeans IDE. We briefly introduce the tool and present an experiment that we conducted with a group of students at our university. The results of the experiment indicate that projections have a positive effect on program comprehension.

Cite as

Jaroslav Porubän and Milan Nosál. Leveraging Program Comprehension with Concern-oriented Source Code Projections. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 35-50, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{poruban_et_al:OASIcs.SLATE.2014.35,
  author =	{Porub\"{a}n, Jaroslav and Nos\'{a}l, Milan},
  title =	{{Leveraging Program Comprehension with Concern-oriented Source Code Projections}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{35--50},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.35},
  URN =		{urn:nbn:de:0030-drops-45575},
  doi =		{10.4230/OASIcs.SLATE.2014.35},
  annote =	{Keywords: concern-oriented source code projections, program comprehension, projectional editing, code projections, programming environments}
}
Comment-based Concept Location over System Dependency Graphs

Authors: Nuno Pereira, Maria João Varanda Pereira, and Pedro Rangel Henriques


Abstract
Software maintenance is one of the most expensive phases of software development, and understanding a program is one of the most important tasks of software maintenance. Before making a change to a program, software engineers need to understand it and find the location, or locations, where the changes will be made. Real applications are huge, sometimes old, and were written by other people, so it is difficult to find the location of the instructions related to a specific problem-domain concept. There are various techniques to find these locations while minimizing the time spent, but this stage of software development continues to be one of the most expensive and longest. Concept location is a crucial task for program understanding. This paper presents a project whose main objective is to explore and combine two Program Comprehension techniques: visualization of the system dependency graph and concept location over source code comments. The idea is to merge both features in order to perform concept location in system dependency graphs. Beyond locating a set of hot instructions (based on the associated comments), this approach also makes it possible to detect the related instructions (the whole method).

Cite as

Nuno Pereira, Maria João Varanda Pereira, and Pedro Rangel Henriques. Comment-based Concept Location over System Dependency Graphs. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 51-58, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{pereira_et_al:OASIcs.SLATE.2014.51,
  author =	{Pereira, Nuno and Pereira, Maria Jo\~{a}o Varanda and Henriques, Pedro Rangel},
  title =	{{Comment-based Concept Location over System Dependency Graphs}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{51--58},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.51},
  URN =		{urn:nbn:de:0030-drops-45584},
  doi =		{10.4230/OASIcs.SLATE.2014.51},
  annote =	{Keywords: program comprehension, concept location, comment analysis, system dependency graph}
}
ReCooPLa: a DSL for Coordination-based Reconfiguration of Software Architectures

Authors: Flávio Rodrigues, Nuno Oliveira, and Luís S. Barbosa


Abstract
In production environments where change is the rule rather than the exception, adaptation of software plays an important role. Such adaptations presuppose dynamic reconfiguration of the system architecture; however, it is in the static setting (design phase) that such reconfigurations must be designed and analysed, to preclude erroneous evolutions. Modern software systems, which are built from the coordinated composition of loosely-coupled software components, are naturally adaptable, and the coordination specification is usually the main reference point for inserting changes in these systems. In this paper, a domain-specific language, referred to as ReCooPLa, is proposed to design reconfigurations that change the coordination structures, so that they are analysed before being applied at run time. Moreover, a reconfiguration engine is introduced that takes conveniently translated ReCooPLa specifications and applies them to coordination structures.

Cite as

Flávio Rodrigues, Nuno Oliveira, and Luís S. Barbosa. ReCooPLa: a DSL for Coordination-based Reconfiguration of Software Architectures. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 61-76, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{rodrigues_et_al:OASIcs.SLATE.2014.61,
  author =	{Rodrigues, Fl\'{a}vio and Oliveira, Nuno and Barbosa, Lu{\'\i}s S.},
  title =	{{ReCooPLa: a DSL for Coordination-based Reconfiguration of Software Architectures}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{61--76},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.61},
  URN =		{urn:nbn:de:0030-drops-45593},
  doi =		{10.4230/OASIcs.SLATE.2014.61},
  annote =	{Keywords: domain-specific languages, architectural reconfiguration, coordination}
}
A Workflow Description Language to Orchestrate Multi-Lingual Resources

Authors: Rui Brito and José João Almeida


Abstract
Texts aligned alongside their translation, or Parallel Corpora, are a very widely used resource in Computational Linguistics. Processing these resources, however, is a very intensive, time-consuming task, which makes it a suitable case study for High Performance Computing (HPC). HPC underwent several recent changes with the evolution of Heterogeneous Platforms, where multiple devices with different architectures are able to share workload to increase performance. Several frameworks/toolkits have been under development, in various fields, to aid the programmer in extracting more performance from these platforms, either by dynamically scheduling the workload across the available resources or by exploring the opportunities for parallelism. However, there is no toolkit targeted at Computational Linguistics, more specifically, Parallel Corpora processing. Parallel Corpora processing can be a very time-consuming task, and the field could definitely use a toolkit that aids the programmer in achieving not only better performance, but also a convenient and expressive way of specifying tasks and their dependencies.

Cite as

Rui Brito and José João Almeida. A Workflow Description Language to Orchestrate Multi-Lingual Resources. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 77-83, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{brito_et_al:OASIcs.SLATE.2014.77,
  author =	{Brito, Rui and Almeida, Jos\'{e} Jo\~{a}o},
  title =	{{A Workflow Description Language to Orchestrate Multi-Lingual Resources}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{77--83},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.77},
  URN =		{urn:nbn:de:0030-drops-45609},
  doi =		{10.4230/OASIcs.SLATE.2014.77},
  annote =	{Keywords: workflow, orchestration, parallelism, domain specific languages, corpora}
}
Converting Ontologies into DSLs

Authors: João M. Sousa Fonseca, Maria João Varanda Pereira, and Pedro Rangel Henriques


Abstract
This paper presents a project whose main objective is to explore the ontological-based development of Domain Specific Languages (DSLs), more precisely, of their underlying grammar. After reviewing the basic concepts characterizing Ontologies and Domain-Specific Languages, we introduce a tool, Onto2Gra, that takes advantage of the knowledge described by the ontology and automatically generates a grammar for a DSL that allows one to discourse about the domain described by that ontology. This approach represents a rigorous method to create, in a secure and effective way, a grammar for a new specialized language restricted to a concrete domain. The usual process of creating a grammar from scratch is, like every creative action, difficult, slow and error-prone; so this proposal is, from a Grammar Engineering point of view, of the utmost importance. After the grammar generation phase, the Grammar Engineer can manipulate it to add syntactic sugar to improve the final language quality, or even to add semantic actions. The Onto2Gra project is composed of three engines. The main one is OWL2DSL, the component that converts an OWL ontology into an attribute grammar. The two additional modules are Onto2OWL, which converts ontologies written in OntoDL (a lightweight DSL to describe ontologies) into standard OWL, and DDesc2OWL, which converts domain instances written in the DSL generated by OWL2DSL into the initial OWL ontology.
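As a rough illustration of the ontology-to-grammar direction (not OWL2DSL's actual algorithm; the triple layout and the BNF-like output notation are our own assumptions), one can map each relation triple to a grammar production for the subject class:

```python
def ontology_to_productions(triples):
    """Map (SubjectClass, relation, ObjectClass) ontology triples to
    toy BNF-like productions: one nonterminal per subject class, with
    one alternative per outgoing relation."""
    prods = {}
    for subj, rel, obj in triples:
        prods.setdefault(subj, []).append(f"'{rel}' {obj}")
    return [f"{nt} : " + " | ".join(alts) + " ;" for nt, alts in prods.items()]

# Hypothetical mini-ontology about a library domain.
triples = [
    ("Library", "has", "Book"),
    ("Book", "writtenBy", "Author"),
    ("Book", "publishedBy", "Publisher"),
]
for production in ontology_to_productions(triples):
    print(production)
# Library : 'has' Book ;
# Book : 'writtenBy' Author | 'publishedBy' Publisher ;
```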

Cite as

João M. Sousa Fonseca, Maria João Varanda Pereira, and Pedro Rangel Henriques. Converting Ontologies into DSLs. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 85-92, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{fonseca_et_al:OASIcs.SLATE.2014.85,
  author =	{Fonseca, Jo\~{a}o M. Sousa and Pereira, Maria Jo\~{a}o Varanda and Henriques, Pedro Rangel},
  title =	{{Converting Ontologies into DSLs}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{85--92},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.85},
  URN =		{urn:nbn:de:0030-drops-45611},
  doi =		{10.4230/OASIcs.SLATE.2014.85},
  annote =	{Keywords: ontology, OWL, RDF, languages, DSL, grammar}
}
JSON on Mobile: is there an Efficient Parser?

Authors: Ricardo Queirós


Abstract
The two largest causes of battery consumption on mobile devices are related to display and network operations. Since most applications need to share data and communicate with remote servers, communications should be as lightweight and efficient as possible. In network communication, serialization plays a central role as the process of converting an object into a stream of bytes. One of the most popular data-interchange formats is JSON (JavaScript Object Notation). This paper presents a survey on JSON parsers in mobile scenarios. The aim of the survey is to find the most efficient JSON parser in mobile communications, characterised by a high transfer rate of small amounts of data. In the performance benchmark we compare the time required to read and write data with several popular JSON parser implementations, such as Gson, Jackson, and org.json. The results of this survey are important for others who need to select an efficient parser for mobile communication.
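The survey benchmarks Java parsers (Gson, Jackson, org.json), but the measurement itself can be sketched language-agnostically. Below is a minimal timing loop using Python's standard json module, mirroring the high-rate/small-payload mobile scenario; the payload and round count are made up for illustration, not taken from the paper.

```python
import json
import time

def benchmark(parse, payload, rounds=1000):
    """Return the wall-clock time for `rounds` parses of `payload`."""
    start = time.perf_counter()
    for _ in range(rounds):
        parse(payload)
    return time.perf_counter() - start

# Small message typical of mobile traffic (hypothetical example).
payload = '{"id": 1, "name": "sensor", "values": [1.5, 2.5]}'
elapsed = benchmark(json.loads, payload)
print(f"1000 parses took {elapsed:.4f}s")
```

The same loop, pointed at different parser functions, gives comparable read-side numbers; a symmetric loop over serialization covers the write side.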

Cite as

Ricardo Queirós. JSON on Mobile: is there an Efficient Parser? In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 93-100, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{queiros:OASIcs.SLATE.2014.93,
  author =	{Queir\'{o}s, Ricardo},
  title =	{{JSON on Mobile: is there an Efficient Parser?}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{93--100},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.93},
  URN =		{urn:nbn:de:0030-drops-45629},
  doi =		{10.4230/OASIcs.SLATE.2014.93},
  annote =	{Keywords: serialization formats, mobile communication}
}
Unfuzzying Fuzzy Parsing

Authors: Pedro Carvalho, Nuno Oliveira, and Pedro Rangel Henriques


Abstract
Traditional parsing has always been a focus of discussion in the computer science community. Numerous techniques and algorithms have been proposed over the years, but they require that input texts be correct according to a specific grammar. However, in some cases it is necessary to cope with incorrect or unpredicted inputs that raise ambiguities, making traditional parsing unsuitable. These situations led to the emergence of robust parsing theories, where fuzzy parsing gains relevance. Robust parsing comes at a price, losing precision and degrading performance, as multiple parses of the input may be necessary while looking for an optimal one. In this short paper we briefly describe the main robust parsing techniques and propose a different solution to deal with the fuzziness of input texts. It is based on automata, where states represent contexts and edges represent potential matches (of constructs of interest) inside those contexts. We expect such an approach to reduce recognition time and ambiguity, since contexts reduce the search space by defining a smaller domain for constructs of interest. Such benefits may be a great addition to the robust parsing area, with applications in program comprehension, among other research fields.
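The proposed automaton can be sketched as a table-driven scanner. This is a toy of our own with hypothetical contexts and constructs, not the paper's implementation: states are contexts, context delimiters drive transitions, constructs of interest are only recognised inside the right context, and everything else is skipped rather than rejected.

```python
# Transition table: (current context, token) -> next context.
TRANSITIONS = {
    ("top", "class"): "class_body",
    ("class_body", "}"): "top",
}
# Constructs of interest, per context (hypothetical example).
INTERESTING = {"class_body": {"def"}}

def scan(tokens):
    """Recognise constructs of interest only within their context,
    skipping unknown tokens instead of failing, as a fuzzy parser does."""
    state, found = "top", []
    for tok in tokens:
        if (state, tok) in TRANSITIONS:
            state = TRANSITIONS[(state, tok)]
        elif tok in INTERESTING.get(state, ()):
            found.append((state, tok))
    return found

print(scan(["class", "A", "def", "}", "def"]))
# [('class_body', 'def')]
```

Note how the second "def" is ignored: outside the class_body context it is not a construct of interest, which is exactly how contexts shrink the search space.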

Cite as

Pedro Carvalho, Nuno Oliveira, and Pedro Rangel Henriques. Unfuzzying Fuzzy Parsing. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 101-108, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{carvalho_et_al:OASIcs.SLATE.2014.101,
  author =	{Carvalho, Pedro and Oliveira, Nuno and Henriques, Pedro Rangel},
  title =	{{Unfuzzying Fuzzy Parsing}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{101--108},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.101},
  URN =		{urn:nbn:de:0030-drops-45637},
  doi =		{10.4230/OASIcs.SLATE.2014.101},
  annote =	{Keywords: robust parsing, fuzzy parsing, automata}
}
Contract-Java: Design by Contract in Java with Safe Error Handling

Authors: Miguel Oliveira e Silva and Pedro G. Francisco


Abstract
Design by Contract (DbC) is a programming methodology in which the meaning of program entities, such as methods and classes, is made explicit by the use of programming predicates named assertions. A false assertion is always a manifestation of an incorrect program. This simple founding idea, when properly applied, gives programmers a tool to specify, test, debug and document programs, as well as a mechanism for constructing simple, safe and sane error handling. Nevertheless, although well adapted to object-oriented programming (and other popular techniques such as unit testing), DbC still has very low practical acceptance and application. We believe that one of the main reasons is the lack of proper support for it in many programming languages currently in use (such as Java). Complete support for DbC requires not only the ability to specify assertions, but also the ability to distinguish different kinds of assertions, depending on what is being asserted; a proper integration in object-oriented programming; and, finally, a coherent connection with error handling mechanisms. It is in this last requirement that existing tools extending Java with DbC mechanisms completely fail to properly, and coherently, integrate DbC within Java programming. The dominant practices for systematically handling failures in programming languages are not DbC-based, using instead a defensive programming approach, either through normal language mechanisms (as in the C programming language) or through typed exceptions in try/catch-based exception mechanisms.
In this article, we present and justify the requirements posed on programming languages for complete support of DbC. In the context of the last requirement, error handling, defensive programming is discussed and criticized. It is shown that, unlike Eiffel's original DbC error handling, typed exceptions in try/catch-based exception mechanisms are not well adapted to the algorithmic abstraction provided by methods. Finally, a new DbC Java extension named Contract-Java is presented, and it is shown to integrate coherently with both existing Java mechanisms and DbC. We also present an innovative Contract-Java extension that automatically generates debugging information for (non-rescued) contract failures, which we believe further enhances DbC's debugging capabilities.
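Contract-Java itself extends Java; as a language-neutral illustration of the founding idea (a false assertion always signals an incorrect program), here is a minimal pre/postcondition decorator in Python. It is entirely our own sketch, not the paper's mechanism, and it does not attempt Contract-Java's integration with inheritance or exception handling.

```python
def contract(pre=None, post=None):
    """Attach precondition and postcondition assertions to a function.
    A failing assertion is a manifestation of an incorrect program."""
    def wrap(fn):
        def inner(*args):
            if pre is not None:
                assert pre(*args), f"precondition of {fn.__name__} violated"
            result = fn(*args)
            if post is not None:
                assert post(result, *args), f"postcondition of {fn.__name__} violated"
            return result
        return inner
    return wrap

@contract(pre=lambda x: x >= 0,
          post=lambda r, x: abs(r * r - x) < 1e-9)
def sqrt(x):
    return x ** 0.5

print(sqrt(9.0))
# sqrt(-1.0) would raise AssertionError: precondition of sqrt violated
```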

Cite as

Miguel Oliveira e Silva and Pedro G. Francisco. Contract-Java: Design by Contract in Java with Safe Error Handling. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 111-126, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{oliveiraesilva_et_al:OASIcs.SLATE.2014.111,
  author =	{Oliveira e Silva, Miguel and Francisco, Pedro G.},
  title =	{{Contract-Java: Design by Contract in Java with Safe Error Handling}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{111--126},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.111},
  URN =		{urn:nbn:de:0030-drops-45641},
  doi =		{10.4230/OASIcs.SLATE.2014.111},
  annote =	{Keywords: design by contract, defensive programming, exceptions, Java, contract-Java}
}
Implementing Python for DrRacket

Authors: Pedro Palma Ramos and António Menezes Leitão


Abstract
The Python programming language is becoming increasingly popular in a variety of areas, most notably among novice programmers. On the other hand, Racket and other Scheme dialects are considered excellent vehicles for introducing Computer Science concepts. This paper presents an implementation of Python for Racket and the DrRacket IDE. This allows Python programmers to use Racket libraries and vice versa, as well as to benefit from DrRacket's pedagogic features. In particular, it allows architects and designers to use Python as a front-end programming language for Rosetta, an IDE for computer-aided design whose modelling primitives are defined in Racket. Our proposed solution involves compiling Python code into equivalent Racket source code. For the runtime implementation, we present two different strategies: (1) using a foreign function interface to borrow the data types and primitives from Python's virtual machine, or (2) implementing Python's data model over Racket data types. While the first strategy is easily implemented and provides immediate support for Python's standard library and existing third-party libraries, it suffers from performance issues: it runs at least one order of magnitude slower than Python's reference implementation. The second strategy requires us to implement Python's data model in Racket and port all of Python's standard library, but it solves the former's performance issues. Furthermore, it makes interoperability between Python and Racket code easier to implement and simpler to use.

Cite as

Pedro Palma Ramos and António Menezes Leitão. Implementing Python for DrRacket. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 127-141, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{ramos_et_al:OASIcs.SLATE.2014.127,
  author =	{Ramos, Pedro Palma and Leit\~{a}o, Ant\'{o}nio Menezes},
  title =	{{Implementing Python for DrRacket}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{127--141},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.127},
  URN =		{urn:nbn:de:0030-drops-45656},
  doi =		{10.4230/OASIcs.SLATE.2014.127},
  annote =	{Keywords: Python, Racket, language implementations, compilers}
}
Document
Plagiarism Detection: A Tool Survey and Comparison

Authors: Vítor T. Martins, Daniela Fonte, Pedro Rangel Henriques, and Daniela da Cruz


Abstract
We illustrate the state of the art in software plagiarism detection tools by comparing their features and testing them against a wide range of source codes. The source codes were edited according to several types of plagiarism to show each tool's accuracy at detecting each type. The decision to focus our research on plagiarism of programming languages is twofold: on the one hand, it is a challenging case study, since programming languages impose a structured writing style; on the other hand, we are looking to integrate such a tool into an Automatic-Grading System (AGS) developed to support teachers in the context of programming courses. Besides the systematic characterisation of the underlying problem domain, the tools were surveyed with the objective of identifying the most successful approach, in order to design the intended plugin for our AGS.
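As a hint of what such tools measure, here is a minimal, hypothetical fingerprint comparison (not the algorithm of any surveyed tool): identifiers are normalised so that renaming variables, one of the common plagiarism edits, does not hide a match, and similarity is the Jaccard index over token trigrams.

```python
import re

def fingerprint(code, n=3):
    # Tokenise, then map every identifier/keyword to a placeholder so
    # that renamed variables still produce the same token stream.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    norm = ["ID" if re.match(r"[A-Za-z_]", t) else t for t in tokens]
    return {tuple(norm[i:i + n]) for i in range(len(norm) - n + 1)}

def similarity(a, b):
    fa, fb = fingerprint(a), fingerprint(b)
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0

# Renaming variables does not fool the structural comparison:
print(similarity("int x = y + 1;", "int total = count + 1;"))  # 1.0
```

Real detectors such as those surveyed use far more robust schemes (parse trees, winnowing, greedy string tiling); this sketch only shows why lexical normalisation matters.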

Cite as

Vítor T. Martins, Daniela Fonte, Pedro Rangel Henriques, and Daniela da Cruz. Plagiarism Detection: A Tool Survey and Comparison. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 143-158, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{martins_et_al:OASIcs.SLATE.2014.143,
  author =	{Martins, V{\'\i}tor T. and Fonte, Daniela and Henriques, Pedro Rangel and da Cruz, Daniela},
  title =	{{Plagiarism Detection: A Tool Survey and Comparison}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{143--158},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.143},
  URN =		{urn:nbn:de:0030-drops-45667},
  doi =		{10.4230/OASIcs.SLATE.2014.143},
  annote =	{Keywords: software, plagiarism, detection, comparison, test}
}
Document
Target Code Selection by Tilling AST with the Use of Tree Pattern Pushdown Automaton

Authors: Jan Janousek and Jaroslav Málek


Abstract
A new and simple method for target code selection by tilling an abstract syntax tree is presented. As usual, tree patterns corresponding to target machine instructions are matched in the abstract syntax tree. Matching is performed with the use of a tree pattern pushdown automaton, which accepts all tree patterns matching the abstract syntax tree in the linear postfix bar notation and represents a full index of the abstract syntax tree for tree patterns. The use of the index allows patterns to be matched quickly, in time that depends on the size of the patterns rather than on the size of the tree. The selection of a particular target instruction corresponds to a modification of the abstract syntax tree, and a corresponding incremental modification of the index is performed. A reference to a fully functional prototype is provided.
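The underlying pattern-matching idea can be illustrated with a toy sketch (nested tuples instead of the paper's postfix bar notation, and naive recursion instead of the pushdown automaton): an instruction pattern with wildcard leaves is matched against every subtree of the AST.

```python
# Trees are nested tuples (operator, child, child); "_" in a pattern is
# a wildcard matching any subtree, e.g. an operand computed elsewhere.

def matches(pattern, tree):
    if pattern == "_":
        return True
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        return (len(pattern) == len(tree)
                and all(matches(p, t) for p, t in zip(pattern, tree)))
    return pattern == tree

def tiles(pattern, tree):
    # Enumerate every subtree of the AST covered by the pattern.
    found = [tree] if matches(pattern, tree) else []
    if isinstance(tree, tuple):
        for child in tree[1:]:
            found += tiles(pattern, child)
    return found

ast_tree = ("+", ("*", "a", "b"), "c")   # a*b + c
print(tiles(("*", "_", "_"), ast_tree))  # [('*', 'a', 'b')]
```

The point of the paper's indexed automaton is precisely to avoid this sketch's full traversal: with the index, the cost depends on the pattern size, not the tree size.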

Cite as

Jan Janousek and Jaroslav Málek. Target Code Selection by Tilling AST with the Use of Tree Pattern Pushdown Automaton. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 159-165, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{janousek_et_al:OASIcs.SLATE.2014.159,
  author =	{Janousek, Jan and M\'{a}lek, Jaroslav},
  title =	{{Target Code Selection by Tilling AST with the Use of Tree Pattern Pushdown Automaton}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{159--165},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.159},
  URN =		{urn:nbn:de:0030-drops-45670},
  doi =		{10.4230/OASIcs.SLATE.2014.159},
  annote =	{Keywords: code generation, abstract syntax tree, indexing, tree pattern matching, pushdown automata}
}
Document
Assigning Polarity Automatically to the Synsets of a Wordnet-like Resource

Authors: Hugo Gonçalo Oliveira, António Paulo Santos, and Paulo Gomes


Abstract
This article describes work towards the automatic creation of a conceptual polarity lexicon for Portuguese. For this purpose, we take advantage of a polarity lexicon based on single lemmas to assign polarities to the synsets of a wordnet-like resource. We assume that each synset has the polarity of the majority of its lemmas, given by the initial lexicon. After that, polarity is propagated to other synsets through different types of semantic relations. The relation types used were selected after manual evaluation. The main result of this work is a lexicon with more than 10,000 synsets with an assigned polarity, with an accuracy of 70% or 79%, depending on the human evaluator. For Portuguese, this is the first synset-based polarity lexicon we are aware of. In addition to this contribution, the presented approach can be applied to create similar resources for other languages.
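The majority-vote step can be sketched as follows (the word-level lexicon here is an illustrative toy, not the authors' data):

```python
from collections import Counter

# Toy word-level polarity lexicon: +1 positive, -1 negative.
word_polarity = {"alegre": +1, "feliz": +1, "infeliz": -1}

def synset_polarity(synset):
    # A synset takes the polarity held by most of its known lemmas.
    votes = Counter(word_polarity[w] for w in synset if w in word_polarity)
    if not votes:
        return None  # unknown; could be filled later by propagation
    return votes.most_common(1)[0][0]

print(synset_polarity({"alegre", "feliz", "contente"}))  # 1
```

Synsets left at `None` are the ones the paper's propagation step then reaches through selected semantic relations.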

Cite as

Hugo Gonçalo Oliveira, António Paulo Santos, and Paulo Gomes. Assigning Polarity Automatically to the Synsets of a Wordnet-like Resource. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 169-184, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2014.169,
  author =	{Gon\c{c}alo Oliveira, Hugo and Santos, Ant\'{o}nio Paulo and Gomes, Paulo},
  title =	{{Assigning Polarity Automatically to the Synsets of a Wordnet-like Resource}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{169--184},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.169},
  URN =		{urn:nbn:de:0030-drops-45689},
  doi =		{10.4230/OASIcs.SLATE.2014.169},
  annote =	{Keywords: sentiment analysis, polarity, lexicon, wordnet, Portuguese}
}
Document
Detecting a Tweet’s Topic within a Large Number of Portuguese Twitter Trends

Authors: Hugo Rosa, João Paulo Carvalho, and Fernando Batista


Abstract
In this paper we address Twitter topic detection in the presence of a large number of trending topics. We use a new technique, called Twitter Topic Fuzzy Fingerprints, and compare it with two popular text classification techniques, Support Vector Machines (SVM) and k-Nearest Neighbours (kNN). Preliminary results show that it outperforms the other two techniques while being much faster, an essential feature when processing large volumes of streaming data. We focused on a data set of Portuguese-language tweets and the respective top trends as indicated by Twitter.
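For contrast, a minimal kNN baseline of the kind the paper compares against might look like this (toy data; the fuzzy fingerprint technique itself is not shown):

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def knn(train, tweet, k=3):
    bag = Counter(tweet.split())
    scored = sorted(train, reverse=True,
                    key=lambda d: cosine(Counter(d[0].split()), bag))
    top = [label for _, label in scored[:k]]
    return Counter(top).most_common(1)[0][0]

train = [("benfica porto golo", "futebol"),
         ("golo penalti arbitro", "futebol"),
         ("orcamento governo impostos", "politica")]
print(knn(train, "golo do benfica"))  # futebol
```

Scoring every tweet against every training example is what makes kNN slow on streaming data, which is the bottleneck the fuzzy fingerprint method is designed to avoid.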

Cite as

Hugo Rosa, João Paulo Carvalho, and Fernando Batista. Detecting a Tweet’s Topic within a Large Number of Portuguese Twitter Trends. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 185-199, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{rosa_et_al:OASIcs.SLATE.2014.185,
  author =	{Rosa, Hugo and Carvalho, Jo\~{a}o Paulo and Batista, Fernando},
  title =	{{Detecting a Tweet’s Topic within a Large Number of Portuguese Twitter Trends}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{185--199},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.185},
  URN =		{urn:nbn:de:0030-drops-45696},
  doi =		{10.4230/OASIcs.SLATE.2014.185},
  annote =	{Keywords: topic detection, social networks data mining, Twitter, Portuguese language}
}
Document
Multiscale Parameter Tuning of a Semantic Relatedness Algorithm

Authors: José Paulo Leal and Teresa Costa


Abstract
The research presented in this paper builds on previous work that led to the definition of a family of semantic relatedness algorithms that compute a proximity given, as input, a pair of concept labels. The algorithms depend on a semantic graph, provided as RDF data, and on a particular set of weights assigned to the properties of RDF statements (types of arcs in the RDF graph). The current research objective is to automatically tune the weights for a given graph in order to increase the proximity quality. The quality of a semantic relatedness method is usually measured against a benchmark data set: the results produced by the method are compared with those in the benchmark using Spearman's rank coefficient. This methodology works the other way round and uses this coefficient to tune the proximity weights. The tuning process is controlled by a genetic algorithm, with Spearman's rank coefficient as the fitness function. The genetic algorithm has its own set of parameters, which also need to be tuned. Bootstrapping, a statistical method for generating samples, is used in this methodology to enable a large number of repetitions of the genetic algorithm, exploring the results of alternative parameter settings. This approach raises several technical challenges due to its computational complexity, and this paper provides details on the techniques used to speed up the process. The proposed approach was validated with WordNet 2.0 and the WordSim-353 data set. Several ranges of parameter values were tested, and the obtained results are better than those of state-of-the-art methods for computing semantic relatedness using WordNet 2.0, with the advantage of not requiring any domain knowledge of the ontological graph.
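Spearman's rank coefficient, used here as the fitness function, can be computed as follows (toy data; tie handling via average ranks is omitted for brevity):

```python
def spearman(xs, ys):
    # rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = rank difference.
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Perfect rank agreement between computed proximities and a benchmark:
print(spearman([0.1, 0.5, 0.9], [1.0, 2.0, 3.0]))  # 1.0
```

Because it only compares rankings, not raw values, it is a natural fitness function: any weight setting that orders word pairs like the benchmark scores well, regardless of the proximity scale.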

Cite as

José Paulo Leal and Teresa Costa. Multiscale Parameter Tuning of a Semantic Relatedness Algorithm. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 201-213, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{leal_et_al:OASIcs.SLATE.2014.201,
  author =	{Leal, Jos\'{e} Paulo and Costa, Teresa},
  title =	{{Multiscale Parameter Tuning of a Semantic Relatedness Algorithm}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{201--213},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.201},
  URN =		{urn:nbn:de:0030-drops-45702},
  doi =		{10.4230/OASIcs.SLATE.2014.201},
  annote =	{Keywords: semantic similarity, linked data, genetic algorithms, bootstrapping, WordNet}
}
Document
Rocchio's Model Based on Vector Space Basis Change for Pseudo Relevance Feedback

Authors: Rabeb Mbarek, Mohamed Tmar, and Hawete Hattab


Abstract
Rocchio's relevance feedback model is a classic query expansion method that has been shown to be effective in boosting information retrieval performance. The main problem with this method is that the relevant and the irrelevant documents overlap in the vector space, because they often share the same terms (at least the terms of the query). With respect to the initial vector space basis (index terms), it is difficult to select terms that separate relevant from irrelevant documents. Vector Space Basis Change is used to separate relevant and irrelevant documents without any modification of the query term weights. In this paper we first study how to incorporate Vector Space Basis Change into Rocchio's model, and then propose Rocchio models based on Vector Space Basis Change, called VSBCRoc models. Experimental results on a TREC collection show that our proposed models are effective.
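For reference, the classic Rocchio update in the standard basis, which the paper modifies by changing the basis first, is q' = αq + β·mean(relevant) − γ·mean(irrelevant). A minimal sketch with dict-based sparse vectors:

```python
def rocchio(q, rel, nrel, alpha=1.0, beta=0.75, gamma=0.15):
    # q, rel[i], nrel[i] are sparse term->weight vectors (dicts).
    terms = set(q) | {t for d in rel + nrel for t in d}
    def mean(docs, t):
        return sum(d.get(t, 0.0) for d in docs) / len(docs) if docs else 0.0
    return {t: alpha * q.get(t, 0.0)
               + beta * mean(rel, t)
               - gamma * mean(nrel, t)
            for t in terms}

q = {"retrieval": 1.0}
rel = [{"retrieval": 1.0, "feedback": 1.0}]
print(rocchio(q, rel, [])["feedback"])  # 0.75
```

The expansion term "feedback" enters the query only via the relevant centroid; the paper's contribution is to rotate the basis so that this centroid and the irrelevant one overlap less.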

Cite as

Rabeb Mbarek, Mohamed Tmar, and Hawete Hattab. Rocchio's Model Based on Vector Space Basis Change for Pseudo Relevance Feedback. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 215-224, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{mbarek_et_al:OASIcs.SLATE.2014.215,
  author =	{Mbarek, Rabeb and Tmar, Mohamed and Hattab, Hawete},
  title =	{{Rocchio's Model Based on Vector Space Basis Change for Pseudo Relevance Feedback}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{215--224},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.215},
  URN =		{urn:nbn:de:0030-drops-45713},
  doi =		{10.4230/OASIcs.SLATE.2014.215},
  annote =	{Keywords: Rocchio model, vector space basis change, pseudo relevance feedback}
}
Document
Automatic Identification of Whole-Part Relations in Portuguese

Authors: Ilia Markov, Nuno Mamede, and Jorge Baptista


Abstract
In this paper, we improve the extraction of semantic relations between textual elements as currently performed by STRING, a hybrid statistical and rule-based Natural Language Processing chain for Portuguese, by targeting whole-part relations (meronymy), that is, the semantic relation between an entity perceived as a constituent part of another entity, or as a member of a set. In this case, we focus on the type of meronymy involving human entities and body-part nouns.

Cite as

Ilia Markov, Nuno Mamede, and Jorge Baptista. Automatic Identification of Whole-Part Relations in Portuguese. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 225-232, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{markov_et_al:OASIcs.SLATE.2014.225,
  author =	{Markov, Ilia and Mamede, Nuno and Baptista, Jorge},
  title =	{{Automatic Identification of Whole-Part Relations in Portuguese}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{225--232},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.225},
  URN =		{urn:nbn:de:0030-drops-45723},
  doi =		{10.4230/OASIcs.SLATE.2014.225},
  annote =	{Keywords: whole-part relation, meronymy, body-part noun, disease noun, Portuguese}
}
Document
Automatic Detection of Proverbs and their Variants

Authors: Amanda P. Rassi, Jorge Baptista, and Oto Vale


Abstract
This article presents the task of automatic detection of proverbs in Brazilian Portuguese, based on the intersection of the regular syntactic structure of proverbs and their core elements. We created finite-state automata that enabled us to look for these word combinations in running texts. The rationale behind this method is that, although proverbs may have a normal sentence structure and often a very commonly used lexicon, their specific word combinations may enable us to identify them and their variants irrespective of the syntactic or structural changes the proverb may undergo. The goal of this task is to gather the largest number of proverbs and their variants. The results showed a precision of 60.15%.
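The core-element idea can be approximated with a plain regular expression rather than the paper's finite-state automata (toy example; the proverb and its variant are illustrative):

```python
import re

def proverb_pattern(core_words):
    # Match the characteristic word combination in order, with arbitrary
    # material in between, so lexical variants of the proverb still match.
    return re.compile(r"\b" + r"\b.*?\b".join(core_words) + r"\b",
                      re.IGNORECASE)

pat = proverb_pattern(["água", "pedra", "fura"])
print(bool(pat.search("água mole em pedra dura tanto bate até que fura")))   # True
print(bool(pat.search("a água, por mole que seja, na pedra dura sempre fura")))  # True
```

A compiled automaton generalises this by handling many proverbs at once and constraining what may appear in the gaps, which keeps precision from collapsing on long texts.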

Cite as

Amanda P. Rassi, Jorge Baptista, and Oto Vale. Automatic Detection of Proverbs and their Variants. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 235-249, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{rassi_et_al:OASIcs.SLATE.2014.235,
  author =	{Rassi, Amanda P. and Baptista, Jorge and Vale, Oto},
  title =	{{Automatic Detection of Proverbs and their Variants}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{235--249},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.235},
  URN =		{urn:nbn:de:0030-drops-45738},
  doi =		{10.4230/OASIcs.SLATE.2014.235},
  annote =	{Keywords: Brazilian Portuguese, proverbs, syntactic structure, core element, variation}
}
Document
Language Identification: a Neural Network Approach

Authors: Alberto Simões, José João Almeida, and Simon D. Byers


Abstract
One of the first tasks when building a Natural Language application is detecting the language used, in order to adapt the system to that language. This task has been addressed several times; nevertheless, most of these attempts were performed a long time ago, when the amount of computer data and the available computational power were limited. In this article we analyze and explain the use of a neural network for language identification, in which features can be extracted automatically and are therefore easy to adapt to new languages. In our experiments we got some surprises, namely with the two Chinese variants, which forced us to do some language-dependent tweaking of the neural network. In the end, the network had a precision of 95%, only failing for the Portuguese language.
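A classic character-trigram scorer, of the kind that supplies features to identifiers like this one, can be sketched as follows (toy profiles built from two sentences, not the trained network):

```python
from collections import Counter

def trigrams(text):
    # Pad so word boundaries produce trigrams too.
    text = "  " + text.lower() + " "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

profiles = {
    "pt": trigrams("isto e um pequeno texto em portugues"),
    "en": trigrams("this is a small text written in english"),
}

def identify(text):
    t = trigrams(text)
    # Score = size of the multiset intersection with each profile.
    return max(profiles, key=lambda lang: sum((t & profiles[lang]).values()))

print(identify("um texto pequeno"))  # pt
```

A neural identifier replaces the hand-picked overlap score with learned weights over such n-gram features, which is what makes it easy to extend to new languages.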

Cite as

Alberto Simões, José João Almeida, and Simon D. Byers. Language Identification: a Neural Network Approach. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 251-265, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{simoes_et_al:OASIcs.SLATE.2014.251,
  author =	{Sim\~{o}es, Alberto and Almeida, Jos\'{e} Jo\~{a}o and Byers, Simon D.},
  title =	{{Language Identification: a Neural Network Approach}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{251--265},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.251},
  URN =		{urn:nbn:de:0030-drops-45749},
  doi =		{10.4230/OASIcs.SLATE.2014.251},
  annote =	{Keywords: language identification, neural networks, language models, trigrams}
}
Document
LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese

Authors: Ricardo Rodrigues, Hugo Gonçalo Oliveira, and Paulo Gomes


Abstract
Although lemmatization is a very common subtask in many natural language processing tasks, there is a lack of true cross-platform lemmatization tools specifically targeted at Portuguese, namely for integration in projects developed in Java. To address this issue, we have developed a lemmatizer, initially just for our own use, which we have decided to make publicly available. The lemmatizer, presented in this document, yields an overall accuracy of over 98% when compared against a manually revised corpus.
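A suffix-rule lemmatizer of this general kind can be sketched as follows (hypothetical rules and lexicon, not LemPORT's actual rule set):

```python
# Irregular forms are looked up first; otherwise the first (longest)
# matching suffix rule rewrites the word ending.
LEXICON = {"foi": "ser"}
RULES = [("ações", "ação"), ("es", "e"), ("s", "")]  # ordered longest-first

def lemmatize(word):
    if word in LEXICON:
        return LEXICON[word]
    for suffix, repl in RULES:
        if word.endswith(suffix):
            return word[:len(word) - len(suffix)] + repl
    return word

print([lemmatize(w) for w in ["gatos", "ações", "foi"]])
# ['gato', 'ação', 'ser']
```

The accuracy of such a tool comes from the coverage of its rule set and exception lexicon; the two toy rules above merely show the lookup-then-rewrite control flow.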

Cite as

Ricardo Rodrigues, Hugo Gonçalo Oliveira, and Paulo Gomes. LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 267-274, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2014.267,
  author =	{Rodrigues, Ricardo and Gon\c{c}alo Oliveira, Hugo and Gomes, Paulo},
  title =	{{LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{267--274},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.267},
  URN =		{urn:nbn:de:0030-drops-45753},
  doi =		{10.4230/OASIcs.SLATE.2014.267},
  annote =	{Keywords: lemmatization, normalization, rules, lexicon}
}
Document
Expanding a Database of Portuguese Tweets

Authors: Gaspar Brogueira, Fernando Batista, João Paulo Carvalho, and Helena Moniz


Abstract
This paper describes an existing database of geolocated tweets produced in Portuguese regions and proposes an approach to further expand it. The existing database covers eight consecutive days of collected tweets, totaling about 300 thousand tweets produced by about 11 thousand different users. A detailed analysis of the content of the messages suggests a predominance of young authors who use Twitter as a way of reaching their colleagues with their feelings, ideas and comments. In order to further characterize this community of young people, we propose a method for retrieving additional tweets produced by the same set of authors already in the database. Our goal is to extend the knowledge about each user of this community, making it possible to automatically characterize each user by the content he/she produces, to cluster users, and to open other possibilities in the scope of social analysis.

Cite as

Gaspar Brogueira, Fernando Batista, João Paulo Carvalho, and Helena Moniz. Expanding a Database of Portuguese Tweets. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 275-282, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{brogueira_et_al:OASIcs.SLATE.2014.275,
  author =	{Brogueira, Gaspar and Batista, Fernando and Carvalho, Jo\~{a}o Paulo and Moniz, Helena},
  title =	{{Expanding a Database of Portuguese Tweets}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{275--282},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.275},
  URN =		{urn:nbn:de:0030-drops-45763},
  doi =		{10.4230/OASIcs.SLATE.2014.275},
  annote =	{Keywords: Twitter, corpus of Portuguese tweets, Twitter API, natural language processing, text analysis}
}
Document
MLT-prealigner: a Tool for Multilingual Text Alignment

Authors: Pedro Carvalho and José João Almeida


Abstract
Parallel text alignment is a key procedure in the automated translation area. A large number of aligners have been presented over the years, but these require that the target resources be pre-prepared for alignment (either manually or automatically). It is rather common to encounter mixed-language documents, that is, documents where the same information is written in several languages (e.g., manuals of electronic devices, tourist information, PhD theses with dual-language abstracts). In this article we present MLT-prealigner, a tool aimed at helping those who need to split mixed texts in order to feed alignment tools and other related language systems.
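The pre-splitting task can be illustrated with a toy per-paragraph language tagger (illustrative stopword lists, not MLT-prealigner's models): each paragraph is tagged with its likeliest language so that monolingual chunks can be fed to an aligner.

```python
# Tiny stopword profiles; a real tool would use proper language models.
STOP = {"pt": {"de", "em", "um", "para", "com", "não"},
        "en": {"the", "of", "in", "for", "with", "not"}}

def tag_language(paragraph):
    # Pick the language whose stopwords overlap the paragraph most.
    words = set(paragraph.lower().split())
    return max(STOP, key=lambda lang: len(words & STOP[lang]))

doc = ["The manual describes the device.",
       "O manual descreve o aparelho em detalhe."]
print([(tag_language(p), p) for p in doc])
```

Grouping consecutive paragraphs with the same tag then yields the monolingual segments that downstream aligners expect.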

Cite as

Pedro Carvalho and José João Almeida. MLT-prealigner: a Tool for Multilingual Text Alignment. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 283-290, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{carvalho_et_al:OASIcs.SLATE.2014.283,
  author =	{Carvalho, Pedro and Almeida, Jos\'{e} Jo\~{a}o},
  title =	{{MLT-prealigner: a Tool for Multilingual Text Alignment}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{283--290},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.283},
  URN =		{urn:nbn:de:0030-drops-45776},
  doi =		{10.4230/OASIcs.SLATE.2014.283},
  annote =	{Keywords: parallel corpora, multilingual text alignment, language detection, Perl, automated translation}
}
