DROPS

Document

DOI: 10.4230/OASIcs.SLATE.2024.1

Using Embeddings to Improve Named Entity Recognition Classification with Graphs

Authors: Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim

Published in: OASIcs, Volume 120, 13th Symposium on Languages, Applications and Technologies (SLATE 2024)

Abstract

Richer information has the potential to improve the performance of NLP (Natural Language Processing) tasks such as Named Entity Recognition. A linear sequence of words can be enriched with the sentence structure, as well as their syntactic structure. However, traditional NLP methods do not take this kind of information into account. With the use of Knowledge Graphs, all this information can be represented and made use of by Graph ML (Machine Learning) techniques. Previous experiments using only graphs with their syntactic structure as input to current state-of-the-art Graph ML models failed to prove the potential of the technology. As such, in this paper the use of word embeddings is explored as an additional enrichment of the graph and, consequently, of the input to the classification models. This use of embeddings adds a layer of context that was previously missing when using only syntactic information. The proposed method was assessed using the CoNLL dataset, and results showed noticeable improvements in performance when adding embeddings. The best results with embeddings attained 94.73% accuracy, compared to 88.58% without embeddings, while metrics such as Macro-F1, Precision and Recall achieved an improvement of over 20%. We test these models with different numbers of classes to assess whether their quality degrades. Because inductive learning methods (such as GraphSAGE) are used, the resulting models can be applied in real-world scenarios, as there is no need to retrain on the whole graph to predict on new data points, as is the case with traditional Graph ML methods (for example, Graph Convolutional Networks).

Cite as

Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim. Using Embeddings to Improve Named Entity Recognition Classification with Graphs. In 13th Symposium on Languages, Applications and Technologies (SLATE 2024). Open Access Series in Informatics (OASIcs), Volume 120, pp. 1:1-1:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{silva_et_al:OASIcs.SLATE.2024.1,
  author =	{Silva, Gabriel and Rodrigues, M\'{a}rio and Teixeira, Ant\'{o}nio and Amorim, Marlene},
  title =	{{Using Embeddings to Improve Named Entity Recognition Classification with Graphs}},
  booktitle =	{13th Symposium on Languages, Applications and Technologies (SLATE 2024)},
  pages =	{1:1--1:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-321-8},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{120},
  editor =	{Rodrigues, M\'{a}rio and Leal, Jos\'{e} Paulo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2024.1},
  URN =		{urn:nbn:de:0030-drops-220722},
  doi =		{10.4230/OASIcs.SLATE.2024.1},
  annote =	{Keywords: Knowledge graphs, Enriched data, Natural language processing, Named Entity Recognition}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.2

A Framework for Fostering Easier Access to Enriched Textual Information

Authors: Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

Considering the amount of information in unstructured data, it is necessary to have suitable methods to extract information from it. Most of these methods have their own output format, making it difficult and costly to merge and share this information, as there is currently no unified way of representing it. While most of these methods rely on JSON or XML, there has been a push to serialize their output into RDF-compliant formats due to their flexibility and the existing ecosystem surrounding them. In this paper we introduce a framework whose goal is to provide a serialization of enriched data into an RDF format, following FAIR principles, making it more interpretable, interoperable and shareable. We process a subset of the WikiNER dataset and showcase two examples of using this framework: one using CoNLL annotations and the other performing entity linking on an already existing graph. The result is a graph in which every connection starts from the document and ends at the tokens, keeping the original text intact while embedding the enriched data into it, in this case the CoNLL annotations and entities.

Cite as

Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim. A Framework for Fostering Easier Access to Enriched Textual Information. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{silva_et_al:OASIcs.SLATE.2023.2,
  author =	{Silva, Gabriel and Rodrigues, M\'{a}rio and Teixeira, Ant\'{o}nio and Amorim, Marlene},
  title =	{{A Framework for Fostering Easier Access to Enriched Textual Information}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{2:1--2:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.2},
  URN =		{urn:nbn:de:0030-drops-185165},
  doi =		{10.4230/OASIcs.SLATE.2023.2},
  annote =	{Keywords: Knowledge graphs, Enriched data, Natural language processing, Triplestore}
}
