DROPS

Document

DOI: 10.4230/OASIcs.SLATE.2022.3

Question Answering For Toxicological Information Extraction

Authors: Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, and Catarina Silva

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Working with large amounts of text data has become hectic and time-consuming. In order to reduce human effort, costs, and make the process more efficient, companies and organizations resort to intelligent algorithms to automate and assist the manual work. This problem is also present in the field of toxicological analysis of chemical substances, where information needs to be searched from multiple documents. That said, we propose an approach that relies on Question Answering for acquiring information from unstructured data, in our case, English PDF documents containing information about physicochemical and toxicological properties of chemical substances. Experimental results confirm that our approach achieves promising results which can be applicable in the business scenario, especially if further revised by humans.

Cite as

Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, and Catarina Silva. Question Answering For Toxicological Information Extraction. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{ferreira_et_al:OASIcs.SLATE.2022.3,
  author =	{Ferreira, Bruno Carlos Lu{\'\i}s and Gon\c{c}alo Oliveira, Hugo and Amaro, Hugo and Laranjeiro, \^{A}ngela and Silva, Catarina},
  title =	{{Question Answering For Toxicological Information Extraction}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.3},
  URN =		{urn:nbn:de:0030-drops-167493},
  doi =		{10.4230/OASIcs.SLATE.2022.3},
  annote =	{Keywords: Information Extraction, Question Answering, Transformers, Toxicological Analysis}
}

@InProceedings{ferreira_et_al:OASIcs.SLATE.2022.3,
  author =	{Ferreira, Bruno Carlos Lu{\'\i}s and Gon\c{c}alo Oliveira, Hugo and Amaro, Hugo and Laranjeiro, \^{A}ngela and Silva, Catarina},
  title =	{{Question Answering For Toxicological Information Extraction}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.3},
  URN =		{urn:nbn:de:0030-drops-167493},
  doi =		{10.4230/OASIcs.SLATE.2022.3},
  annote =	{Keywords: Information Extraction, Question Answering, Transformers, Toxicological Analysis}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.9

Classification of Public Administration Complaints

Authors: Francisco Caldeira, Luís Nunes, and Ricardo Ribeiro

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Complaint management is a problem faced by many organizations that is both vital to customer image and highly dependent on human resources. This work attempts to tackle a part of the problem, by classifying summaries of complaints using machine learning models in order to better redirect these to the appropriate responders. The main challenges of this task is that training datasets are often small and highly imbalanced. This can can have a big impact on the performance of classification models. The dataset analyzed in this work suffers from both of these problems, being relatively small and having labels in different proportions. In this work, two different techniques are analyzed: combining classes together to increase the number of elements of the new class; and, providing new artificial examples for some classes via translation into other languages. The classification models explored were the following: k-NN, SVM, Naïve Bayes, boosting, and Deep Learning approaches, including transformers. The paper concludes that although, as expected, the classes with little representation are hard to classify, the techniques explored helped to boost the performance, especially in the classes with a low number of elements. SVM and BERT-based models outperformed their peers.

Cite as

Francisco Caldeira, Luís Nunes, and Ricardo Ribeiro. Classification of Public Administration Complaints. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 9:1-9:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{caldeira_et_al:OASIcs.SLATE.2022.9,
  author =	{Caldeira, Francisco and Nunes, Lu{\'\i}s and Ribeiro, Ricardo},
  title =	{{Classification of Public Administration Complaints}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{9:1--9:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.9},
  URN =		{urn:nbn:de:0030-drops-167555},
  doi =		{10.4230/OASIcs.SLATE.2022.9},
  annote =	{Keywords: Text Classification, Natural Language Processing, Deep Learning, BERT}
}

Document

Invited Paper

DOI: 10.4230/LIPIcs.CONCUR.2020.1

On Privacy and Accuracy in Data Releases (Invited Paper)

Authors: Mário S. Alvim, Natasha Fernandes, Annabelle McIver, and Gabriel H. Nunes

Published in: LIPIcs, Volume 171, 31st International Conference on Concurrency Theory (CONCUR 2020)

Abstract

In this paper we study the relationship between privacy and accuracy in the context of correlated datasets. We use a model of quantitative information flow to describe the the trade-off between privacy of individuals' data and and the utility of queries to that data by modelling the effectiveness of adversaries attempting to make inferences after a data release. We show that, where correlations exist in datasets, it is not possible to implement optimal noise-adding mechanisms that give the best possible accuracy or the best possible privacy in all situations. Finally we illustrate the trade-off between accuracy and privacy for local and oblivious differentially private mechanisms in terms of inference attacks on medium-scale datasets.

Cite as

Mário S. Alvim, Natasha Fernandes, Annabelle McIver, and Gabriel H. Nunes. On Privacy and Accuracy in Data Releases (Invited Paper). In 31st International Conference on Concurrency Theory (CONCUR 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 171, pp. 1:1-1:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{alvim_et_al:LIPIcs.CONCUR.2020.1,
  author =	{Alvim, M\'{a}rio S. and Fernandes, Natasha and McIver, Annabelle and Nunes, Gabriel H.},
  title =	{{On Privacy and Accuracy in Data Releases}},
  booktitle =	{31st International Conference on Concurrency Theory (CONCUR 2020)},
  pages =	{1:1--1:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-160-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{171},
  editor =	{Konnov, Igor and Kov\'{a}cs, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2020.1},
  URN =		{urn:nbn:de:0030-drops-128130},
  doi =		{10.4230/LIPIcs.CONCUR.2020.1},
  annote =	{Keywords: Privacy/utility trade-off, Quantitative Information Flow, inference attacks}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.1

Graph-of-Entity: A Model for Combined Data Representation and Retrieval

Authors: José Devezas, Carla Lopes, and Sérgio Nunes

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Managing large volumes of digital documents along with the information they contain, or are associated with, can be challenging. As systems become more intelligent, it increasingly makes sense to power retrieval through all available data, where every lead makes it easier to reach relevant documents or entities. Modern search is heavily powered by structured knowledge, but users still query using keywords or, at the very best, telegraphic natural language. As search becomes increasingly dependent on the integration of text and knowledge, novel approaches for a unified representation of combined data present the opportunity to unlock new ranking strategies. We tackle entity-oriented search using graph-based approaches for representation and retrieval. In particular, we propose the graph-of-entity, a novel approach for indexing combined data, where terms, entities and their relations are jointly represented. We compare the graph-of-entity with the graph-of-word, a text-only model, verifying that, overall, it does not yet achieve a better performance, despite obtaining a higher precision. Our assessment was based on a small subset of the INEX 2009 Wikipedia Collection, created from a sample of 10 topics and respectively judged documents. The offline evaluation we do here is complementary to its counterpart from TREC 2017 OpenSearch track, where, during our participation, we had assessed graph-of-entity in an online setting, through team-draft interleaving.

Cite as

José Devezas, Carla Lopes, and Sérgio Nunes. Graph-of-Entity: A Model for Combined Data Representation and Retrieval. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 1:1-1:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{devezas_et_al:OASIcs.SLATE.2019.1,
  author =	{Devezas, Jos\'{e} and Lopes, Carla and Nunes, S\'{e}rgio},
  title =	{{Graph-of-Entity: A Model for Combined Data Representation and Retrieval}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{1:1--1:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.1},
  URN =		{urn:nbn:de:0030-drops-108686},
  doi =		{10.4230/OASIcs.SLATE.2019.1},
  annote =	{Keywords: Entity-oriented search, graph-based models, collection-based graph}
}

Document

DOI: 10.4230/OASIcs.SLATE.2017.18

Information Extraction for Event Ranking

Authors: José Devezas and Sérgio Nunes

Published in: OASIcs, Volume 56, 6th Symposium on Languages, Applications and Technologies (SLATE 2017)

Abstract

Search engines are evolving towards richer and stronger semantic approaches, focusing on entity-oriented tasks where knowledge bases have become fundamental. In order to support semantic search, search engines are increasingly reliant on robust information extraction systems. In fact, most modern search engines are already highly dependent on a well-curated knowledge base. Nevertheless, they still lack the ability to effectively and automatically take advantage of multiple heterogeneous data sources. Central tasks include harnessing the information locked within textual content by linking mentioned entities to a knowledge base, or the integration of multiple knowledge bases to answer natural language questions. Combining text and knowledge bases is frequently used to improve search results, but it can also be used for the query-independent ranking of entities like events. In this work, we present a complete information extraction pipeline for the Portuguese language, covering all stages from data acquisition to knowledge base population. We also describe a practical application of the automatically extracted information, to support the ranking of upcoming events displayed in the landing page of an institutional search engine, where space is limited to only three relevant events. We manually annotate a dataset of news, covering event announcements from multiple faculties and organic units of the institution. We then use it to train and evaluate the named entity recognition module of the pipeline. We rank events by taking advantage of identified entities, as well as partOf relations, in order to compute an entity popularity score, as well as an entity click score based on implicit feedback from clicks from the institutional search engine. We then combine these two scores with the number of days to the event, obtaining a final ranking for the three most relevant upcoming events.

Cite as

José Devezas and Sérgio Nunes. Information Extraction for Event Ranking. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 18:1-18:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{devezas_et_al:OASIcs.SLATE.2017.18,
  author =	{Devezas, Jos\'{e} and Nunes, S\'{e}rgio},
  title =	{{Information Extraction for Event Ranking}},
  booktitle =	{6th Symposium on Languages, Applications and Technologies (SLATE 2017)},
  pages =	{18:1--18:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-056-9},
  ISSN =	{2190-6807},
  year =	{2017},
  volume =	{56},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Leal, Jos\'{e} Paulo and Varanda, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2017.18},
  URN =		{urn:nbn:de:0030-drops-79515},
  doi =		{10.4230/OASIcs.SLATE.2017.18},
  annote =	{Keywords: Named Entity Recognition, Relation Extraction, Knowledge Base Population, Entity-Based Ranking, Academic Events}
}

5 Search Results for "Nunes, S�rgio"

Question Answering For Toxicological Information Extraction

Abstract

Cite as

Classification of Public Administration Complaints

Abstract

Cite as

On Privacy and Accuracy in Data Releases (Invited Paper)

Abstract

Cite as

Graph-of-Entity: A Model for Combined Data Representation and Retrieval

Abstract

Cite as

Information Extraction for Event Ranking

Abstract

Cite as

Thanks for your feedback!

Could not send message