DROPS

Document

DOI: 10.4230/OASIcs.SLATE.2023.2

A Framework for Fostering Easier Access to Enriched Textual Information

Authors: Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

Considering the amount of information in unstructured data, suitable methods are needed to extract information from it. Most of these methods have their own output format, making it difficult and costly to merge and share this information, as there is currently no unified way of representing it. While most of these methods rely on JSON or XML, there has been a push to serialize their output into RDF-compliant formats due to their flexibility and the existing ecosystem surrounding them. In this paper we introduce a framework whose goal is to provide a serialization of enriched data into an RDF format, following FAIR principles, making it more interpretable, interoperable and shareable. We process a subset of the WikiNER dataset and showcase two examples of using this framework: one using CoNLL annotations and the other performing entity linking on an already existing graph. The result is a graph in which every connection starts from the document and finishes on tokens, keeping the original text intact while embedding the enriched data into it, in this case the CoNLL annotations and entities.

Cite as

Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim. A Framework for Fostering Easier Access to Enriched Textual Information. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{silva_et_al:OASIcs.SLATE.2023.2,
  author =	{Silva, Gabriel and Rodrigues, M\'{a}rio and Teixeira, Ant\'{o}nio and Amorim, Marlene},
  title =	{{A Framework for Fostering Easier Access to Enriched Textual Information}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{2:1--2:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.2},
  URN =		{urn:nbn:de:0030-drops-185165},
  doi =		{10.4230/OASIcs.SLATE.2023.2},
  annote =	{Keywords: Knowledge graphs, Enriched data, Natural language processing, Triplestore}
}

DOI: 10.4230/OASIcs.SLATE.2021.11

Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers

Authors: Emanuel Matos, Mário Rodrigues, Pedro Miguel, and António Teixeira

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

Named Entity Recognition (NER) is an essential step for many natural language processing tasks, including Information Extraction. Despite recent advances, particularly using deep learning techniques, the creation of accurate named entity recognizers remains a complex task, highly dependent on the availability of annotated data. To foster the existence of NER systems for new domains, it is crucial to obtain the required large volumes of annotated data with little or no manual labor. This paper proposes a system to create the annotated data automatically, by resorting to a set of existing NERs and information sources (DBpedia). The approach was tested with documents from the Tourism domain. Distinct methods were applied for deciding the final named entities and respective tags. The results show that this approach can increase the confidence in annotations and/or augment the number of categories that can be annotated. This paper also presents examples of new NERs that can be rapidly created with the obtained annotated data. The annotated data, combined with the possibility of applying both the ensemble of NER systems and the new Gazetteer-based NERs to large corpora, create the necessary conditions to explore recent state-of-the-art deep learning approaches to NER (e.g., BERT) in domains with scarce or nonexistent training data.

Cite as

Emanuel Matos, Mário Rodrigues, Pedro Miguel, and António Teixeira. Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{matos_et_al:OASIcs.SLATE.2021.11,
  author =	{Matos, Emanuel and Rodrigues, M\'{a}rio and Miguel, Pedro and Teixeira, Ant\'{o}nio},
  title =	{{Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.11},
  URN =		{urn:nbn:de:0030-drops-144286},
  doi =		{10.4230/OASIcs.SLATE.2021.11},
  annote =	{Keywords: Named Entity Recognition (NER), Automatic Annotation, Gazetteers, Tourism, Portuguese}
}

DOI: 10.4230/OASIcs.SLATE.2019.5

Towards European Portuguese Conversational Assistants for Smart Homes

Authors: Maksym Ketsmur, António Teixeira, Nuno Almeida, and Samuel Silva

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Nowadays, smart environments, such as Smart Homes, are becoming a reality, due to the access to a wide variety of smart devices at a low cost. These devices are connected to the home network and inhabitants can interact with them using smartphones, tablets and smart assistants, a feature with rising popularity. The diversity of devices, the user’s expectations regarding Smart Homes, and assistants' requirements pose several challenges. In this context, a Smart Home Assistant capable of conversation and device integration can be a valuable help to the inhabitants, not only for smart device control, but also to obtain valuable information and have a broader picture of how the house and its devices behave. This paper presents the current stage of development of one such assistant, targeting European Portuguese, which not only supports the control of home devices but also provides a potentially more natural way to access a variety of information regarding the home and its devices. The development has been carried out in the scope of the Smart Green Homes (SGH) project.

Cite as

Maksym Ketsmur, António Teixeira, Nuno Almeida, and Samuel Silva. Towards European Portuguese Conversational Assistants for Smart Homes. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 5:1-5:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{ketsmur_et_al:OASIcs.SLATE.2019.5,
  author =	{Ketsmur, Maksym and Teixeira, Ant\'{o}nio and Almeida, Nuno and Silva, Samuel},
  title =	{{Towards European Portuguese Conversational Assistants for Smart Homes}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{5:1--5:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.5},
  URN =		{urn:nbn:de:0030-drops-108725},
  doi =		{10.4230/OASIcs.SLATE.2019.5},
  annote =	{Keywords: Smart Homes, Conversational Assistants, Ontology}
}

DOI: 10.4230/OASIcs.SLATE.2017.20

Natural Transmission of Information Extraction Results to End-Users - A Proof-of-Concept Using Data-to-Text

Authors: José Casimiro Pereira, António J. S. Teixeira, Mário Rodrigues, Pedro Miguel, and Joaquim Sousa Pinto

Published in: OASIcs, Volume 56, 6th Symposium on Languages, Applications and Technologies (SLATE 2017)

Abstract

Information Extraction from natural texts has great potential in areas such as Tourism and can be of great assistance in transforming customers' comments into valuable information for Tourism operators, governments and customers. After extraction, information needs to be efficiently transmitted to end-users in a natural way. Systems should not, in general, send extracted information directly to end-users, such as hotel managers, as it can be difficult to read. Naturally, humans transmit and encode information using natural languages, such as Portuguese. The problem arising from the need for efficient and natural transmission of the information to end-users is how to encode it. The use of natural language generation (NLG) is a possible solution for producing sentences and, with them, texts. In this paper we address this with a data-to-text system, a derivation of formal NLG systems that uses data as input. The proposed system uses an aligned corpus, which was defined, collected and processed in approximately 3 weeks of work. Three different in-domain and out-of-domain corpora were used to build the language model. The effects of this approach were evaluated, and results are presented. Automatic metrics, BLEU and Meteor, were used to evaluate the different systems, comparing their values with similar systems. Results show that expanding the corpus has a major positive effect on BLEU and Meteor scores, and that the use of additional corpora (in-domain and out-of-domain) in training the language model does not result in significantly different performance. The scores obtained, combined with their comparison with other systems' performance and informal evaluation by humans of the sentences produced, give additional support for the capabilities of the translation-based approach for fast development of data-to-text for new domains.

Cite as

José Casimiro Pereira, António J. S. Teixeira, Mário Rodrigues, Pedro Miguel, and Joaquim Sousa Pinto. Natural Transmission of Information Extraction Results to End-Users - A Proof-of-Concept Using Data-to-Text. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 20:1-20:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

@InProceedings{pereira_et_al:OASIcs.SLATE.2017.20,
  author =	{Pereira, Jos\'{e} Casimiro and Teixeira, Ant\'{o}nio J. S. and Rodrigues, M\'{a}rio and Miguel, Pedro and Pinto, Joaquim Sousa},
  title =	{{Natural Transmission of Information Extraction Results to End-Users - A Proof-of-Concept Using Data-to-Text}},
  booktitle =	{6th Symposium on Languages, Applications and Technologies (SLATE 2017)},
  pages =	{20:1--20:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-056-9},
  ISSN =	{2190-6807},
  year =	{2017},
  volume =	{56},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Leal, Jos\'{e} Paulo and Varanda, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2017.20},
  URN =		{urn:nbn:de:0030-drops-79530},
  doi =		{10.4230/OASIcs.SLATE.2017.20},
  annote =	{Keywords: Data-to-Text, Natural Language Generation, Automatic Translation, opinions, Tourism, Portuguese}
}
