DOI: 10.4230/OASIcs.LDK.2021.23

Encoder-Attention-Based Automatic Term Recognition (EA-ATR)

Authors: Sampritha H. Manjunath and John P. McCrae

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)

Abstract

Automatic Term Recognition (ATR) is the task of finding terminology in raw text. It involves mining candidate terms from the text, scoring them with measures such as frequency of occurrence, filtering them by score, and ranking them. Current approaches often rely on statistics and regular expressions over part-of-speech tags to identify terms, but this is error-prone. We propose a deep learning technique to improve the identification of candidate term sequences. We use embeddings based on Bidirectional Encoder Representations from Transformers (BERT) to identify which sequences of words are terms. We treat all Wikipedia titles as the positive set and random n-grams generated from the raw text as a weak negative set. The model is trained on these positive and negative sets using the Embed, Encode, Attend and Predict (EEAP) formulation with BERT embeddings. It is then evaluated against domain-specific corpora such as GENIA (annotated biological terms) and Krapivin (scientific papers from the computer science domain).
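
The construction of the positive and weak negative sets can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (random_ngrams, build_training_set), whitespace tokenization, lowercasing, the maximum n-gram length of 4, and the one-to-one ratio of negatives to positives are all hypothetical choices that the abstract does not specify.

```python
import random


def random_ngrams(tokens, count, max_n=4, seed=0):
    """Sample `count` random n-grams (1..max_n tokens long) from a token list."""
    if not tokens:
        return []
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        n = rng.randint(1, min(max_n, len(tokens)))   # n-gram length
        start = rng.randint(0, len(tokens) - n)       # start offset
        samples.append(" ".join(tokens[start:start + n]))
    return samples


def build_training_set(titles, raw_text, negatives_per_positive=1, seed=0):
    """Label titles as positives (1) and random n-grams as weak negatives (0)."""
    # Assumption: lowercasing and whitespace tokenization stand in for
    # whatever preprocessing the real pipeline applies.
    positives = {title.lower() for title in titles}
    tokens = raw_text.lower().split()
    candidates = random_ngrams(
        tokens, negatives_per_positive * len(positives), seed=seed
    )
    # The negative set is "weak": a random n-gram may still be a real term.
    # Here we only drop candidates that exactly match a known title.
    negatives = [c for c in candidates if c not in positives]
    return [(p, 1) for p in sorted(positives)] + [(n, 0) for n in negatives]


if __name__ == "__main__":
    data = build_training_set(
        ["Machine learning", "Neural network"],
        "the neural network was trained on raw text using machine learning",
    )
    for text, label in data:
        print(label, text)
```

The resulting (text, label) pairs would then be fed to a classifier of the kind the abstract describes; the embedding, encoding, attention and prediction stages are outside the scope of this sketch.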

Cite as

Sampritha H. Manjunath and John P. McCrae. Encoder-Attention-Based Automatic Term Recognition (EA-ATR). In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 23:1-23:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{manjunath_et_al:OASIcs.LDK.2021.23,
  author =	{Manjunath, Sampritha H. and McCrae, John P.},
  title =	{{Encoder-Attention-Based Automatic Term Recognition (EA-ATR)}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{23:1--23:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.23},
  URN =		{urn:nbn:de:0030-drops-145597},
  doi =		{10.4230/OASIcs.LDK.2021.23},
  annote =	{Keywords: Automatic Term Recognition, Term Extraction, BERT, EEAP, Deep Learning for ATR}
}
