DROPS

Document

DOI: 10.4230/OASIcs.SLATE.2022.5

Synthetic Data Generation from JSON Schemas

Authors: Hugo André Coelho Cardoso and José Carlos Ramalho

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

This document describes the steps taken in the development of DataGen From Schemas. This new version of DataGen is an application that makes it possible to automatically generate representative synthetic datasets from JSON and XML schemas, in order to facilitate tasks such as the thorough testing of software applications and scientific endeavors in relevant areas, namely Data Science. This paper focuses solely on the JSON Schema component of the application. DataGen’s prior version is an online open-source application that allows the quick prototyping of datasets through its own Domain Specific Language (DSL) for specifying data models. DataGen is able to parse these models and generate synthetic datasets according to the structural and semantic restrictions stipulated, automating the whole process of data generation with values created spontaneously at runtime and/or drawn from a library of support datasets. The objective of this new product, DataGen From Schemas, is to expand DataGen’s use cases and raise the abstraction level of dataset specification, making it possible to generate synthetic datasets directly from schemas. This new platform builds upon its prior version and acts as its complement, operating jointly and sharing the same data layer, in order to assure the compatibility of both platforms and the portability of the created DSL models between them. Its purpose is to parse schema files and generate corresponding DSL models, effectively translating the JSON specification to a DataGen model, then using the original application as a middleware to generate the final datasets.

Cite as

Hugo André Coelho Cardoso and José Carlos Ramalho. Synthetic Data Generation from JSON Schemas. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 5:1-5:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{cardoso_et_al:OASIcs.SLATE.2022.5,
  author =	{Cardoso, Hugo Andr\'{e} Coelho and Ramalho, Jos\'{e} Carlos},
  title =	{{Synthetic Data Generation from JSON Schemas}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{5:1--5:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.5},
  URN =		{urn:nbn:de:0030-drops-167515},
  doi =		{10.4230/OASIcs.SLATE.2022.5},
  annote =	{Keywords: Schemas, JSON, Data Generation, Synthetic Data, DataGen, DSL, Dataset, Grammar, Randomization, Open Source, Data Science, REST API, PEG.js}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.7

EWVM, a Web Virtual Machine to Support Code Generation in Compiler Courses

Authors: Sofia Teixeira, José Carlos Ramalho, and Pedro Rangel Henriques

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

This paper describes a project whose goal is to analyze and model a complete Virtual stack Machine (VM) environment and build a Web application with a graphical interface to deploy an environment to compile and execute VM programs. The new tool offers two main features: it assembles programs written in the assembly language of the Virtual Machine and reports errors in them; and it animates the execution of the compiled code, displaying the internal state of the VM and providing an interface to control the execution step-by-step. In the paper, after discussing related concepts and works, a proposal to build such a tool, so far called EWVM, is presented along with the architecture drawn. A prototype is shown, and its impact as an educational tool is argued.

Cite as

Sofia Teixeira, José Carlos Ramalho, and Pedro Rangel Henriques. EWVM, a Web Virtual Machine to Support Code Generation in Compiler Courses. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 7:1-7:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{teixeira_et_al:OASIcs.SLATE.2022.7,
  author =	{Teixeira, Sofia and Ramalho, Jos\'{e} Carlos and Henriques, Pedro Rangel},
  title =	{{EWVM, a Web Virtual Machine to Support Code Generation in Compiler Courses}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{7:1--7:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.7},
  URN =		{urn:nbn:de:0030-drops-167535},
  doi =		{10.4230/OASIcs.SLATE.2022.7},
  annote =	{Keywords: Virtual Machine, Stack Machine, Assembler, Debugger, Compiler, Code Generation}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.3

Major Minors - Ontological Representation of Minorities by Newspapers

Authors: Paulo Jorge Pereira Martins, Leandro José Abreu Dias Costa, and José Carlos Ramalho

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

The stigma associated with certain minorities has changed throughout the years, yet there is no central data repository that enables a concrete tracking of this representation. Articles published in renowned newspapers, mainly digital ones, are a way of determining the public perception of this subject, be it through the media representation (text and photo illustrations) or user comments. The present paper seeks to showcase a project that attempts to address that shortage of data by providing a repository in the form of an ontology: RDF triplestores composing a semantic database (W3C standards for Semantic Web). This open-source project aims to be a research tool for mapping and studying the representation of minority groups in a Portuguese journalistic context over the course of two decades.

Cite as

Paulo Jorge Pereira Martins, Leandro José Abreu Dias Costa, and José Carlos Ramalho. Major Minors - Ontological Representation of Minorities by Newspapers. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{martins_et_al:OASIcs.SLATE.2021.3,
  author =	{Martins, Paulo Jorge Pereira and Costa, Leandro Jos\'{e} Abreu Dias and Ramalho, Jos\'{e} Carlos},
  title =	{{Major Minors - Ontological Representation of Minorities by Newspapers}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{3:1--3:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.3},
  URN =		{urn:nbn:de:0030-drops-144201},
  doi =		{10.4230/OASIcs.SLATE.2021.3},
  annote =	{Keywords: RDF, OWL, Ontologies, Knowledge Representation, Minorities}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.6

DataGen: JSON/XML Dataset Generator

Authors: Filipa Alves dos Santos, Hugo André Coelho Cardoso, João da Cunha e Costa, Válter Ferreira Picas Carvalho, and José Carlos Ramalho

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

In this document we describe the steps towards the implementation of DataGen. DataGen is a versatile and powerful tool that allows for quick prototyping and testing of software applications, since currently too few solutions offer both the complexity and scalability necessary to generate adequate datasets in order to feed a data API or a more complex app, enabling those applications to be tested with appropriate data volume and data complexity. DataGen's core is a Domain Specific Language (DSL) that was created to specify datasets. This language underwent several updates: repeating fields (with no limit), fuzzy fields (statistically generated), lists, higher-order functions over lists, and custom-made transformation functions. The final result is a complex algebra that allows the generation of very complex datasets coping with very complex requirements. Throughout the paper we will give several examples of the possibilities. After generating a dataset, DataGen gives the user the possibility to generate a RESTful data API with that dataset, creating a running prototype. This solution has already been used in real life cases, described in more detail throughout the paper, in which it was able to create the intended datasets successfully. These allowed the application’s performance to be tested and the right adjustments to be made. The tool is currently being deployed for general use.

Cite as

Filipa Alves dos Santos, Hugo André Coelho Cardoso, João da Cunha e Costa, Válter Ferreira Picas Carvalho, and José Carlos Ramalho. DataGen: JSON/XML Dataset Generator. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{santos_et_al:OASIcs.SLATE.2021.6,
  author =	{Santos, Filipa Alves dos and Cardoso, Hugo Andr\'{e} Coelho and da Cunha e Costa, Jo\~{a}o and Carvalho, V\'{a}lter Ferreira Picas and Ramalho, Jos\'{e} Carlos},
  title =	{{DataGen: JSON/XML Dataset Generator}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{6:1--6:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.6},
  URN =		{urn:nbn:de:0030-drops-144239},
  doi =		{10.4230/OASIcs.SLATE.2021.6},
  annote =	{Keywords: JSON, XML, Data Generation, Open Source, REST API, Strapi, JavaScript, Node.js, Vue.js, Scalability, Fault Tolerance, Dataset, DSL, PEG.js, MongoDB}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.8

NER in Archival Finding Aids

Authors: Luís Filipe Costa Cunha and José Carlos Ramalho

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

At the moment, the vast majority of Portuguese archives with an online presence use a software solution to manage their finding aids, e.g. Digitarq or Archeevo. Most of these finding aids are written in natural language without any annotation that would enable a machine to identify named entities, geographical locations or even some dates. Such annotation would allow the machine to create smart browsing tools on top of those record contents, like entity linking and record linking. In this work we have created a set of datasets to train Machine Learning algorithms to find those named entities and geographical locations. After training several algorithms we tested them on several datasets and recorded their precision and accuracy. These results enabled us to reach some conclusions about what kind of precision we can achieve with this approach in this context and what to do with the results: do we have enough precision and accuracy to create toponymic and anthroponymic indexes for archival finding aids? Is this approach suitable in this context? These are some of the questions we intend to answer throughout this paper.

Cite as

Luís Filipe Costa Cunha and José Carlos Ramalho. NER in Archival Finding Aids. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{costacunha_et_al:OASIcs.SLATE.2021.8,
  author =	{Costa Cunha, Lu{\'\i}s Filipe and Ramalho, Jos\'{e} Carlos},
  title =	{{NER in Archival Finding Aids}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{8:1--8:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.8},
  URN =		{urn:nbn:de:0030-drops-144257},
  doi =		{10.4230/OASIcs.SLATE.2021.8},
  annote =	{Keywords: Named Entity Recognition, Archival Descriptions, Machine Learning, Deep Learning}
}

Document

Short Paper

DOI: 10.4230/OASIcs.SLATE.2020.15

Open Web Ontobud: An Open Source RDF4J Frontend (Short Paper)

Authors: Francisco José Moreira Oliveira and José Carlos Ramalho

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

Nowadays, we deal with increasing volumes of data. A few years ago, data was isolated, which did not allow communication or sharing between datasets. We live in a world where everything is connected, and our data mimics this. The focus of data models has changed from a square structure like the relational model to a model centered on relations. Knowledge graphs are the new paradigm to represent and manage this new kind of information structure. Along with this new paradigm, a new kind of database emerged to support the new needs: graph databases. Although there is an increasing interest in this field, only a few native solutions are available. Most of these are commercial, and the ones that are open source have poor interfaces, which keeps them somewhat distant from end users. In this article, we introduce Ontobud, a Web application that intends to improve the interface for one of the most interesting frameworks in this area, RDF4J, and discuss its design and development. RDF4J is a Java framework for the storage and management of RDF triples. Open Web Ontobud is an open source RDF4J web frontend, created to reduce the gap between end users and the RDF4J backend. We have created a web interface that enables users with a basic knowledge of OWL and SPARQL to explore ontologies and extract information from them.

Cite as

Francisco José Moreira Oliveira and José Carlos Ramalho. Open Web Ontobud: An Open Source RDF4J Frontend (Short Paper). In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 15:1-15:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{oliveira_et_al:OASIcs.SLATE.2020.15,
  author =	{Oliveira, Francisco Jos\'{e} Moreira and Ramalho, Jos\'{e} Carlos},
  title =	{{Open Web Ontobud: An Open Source RDF4J Frontend}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{15:1--15:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.15},
  URN =		{urn:nbn:de:0030-drops-130283},
  doi =		{10.4230/OASIcs.SLATE.2020.15},
  annote =	{Keywords: RDF4J, Frontend, Open Source, Ontology, REST API, RDF, SPARQL, Graph Databases}
}

Document

Short Paper

DOI: 10.4230/OASIcs.SLATE.2020.17

SPARQLing Neo4J (Short Paper)

Authors: Ezequiel José Veloso Ferreira Moreira and José Carlos Ramalho

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

The growth experienced by the internet in the past few years has led to an increased amount of available data and of knowledge obtained from said data. However, most of this knowledge is lost due to the lack of associated semantics, which makes the task of interpreting data very hard for computers. To counter this, ontologies provide an extremely solid way to represent data and automatically derive knowledge from it. In this article we present the work being developed with the aim of storing and exploring ontologies in Neo4J. To achieve this, a web frontend was developed, integrating a SPARQL to CYPHER translator that allows users to query stored ontologies using SPARQL. This translator and its code generation are the main subject of this paper.

Cite as

Ezequiel José Veloso Ferreira Moreira and José Carlos Ramalho. SPARQLing Neo4J (Short Paper). In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 17:1-17:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{moreira_et_al:OASIcs.SLATE.2020.17,
  author =	{Moreira, Ezequiel Jos\'{e} Veloso Ferreira and Ramalho, Jos\'{e} Carlos},
  title =	{{SPARQLing Neo4J}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{17:1--17:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.17},
  URN =		{urn:nbn:de:0030-drops-130301},
  doi =		{10.4230/OASIcs.SLATE.2020.17},
  annote =	{Keywords: SPARQL, CYPHER, Graph Databases, RDF, OWL, Neo4J, GraphDB}
}

7 Search Results for "Ramalho, José Carlos"
