Search Results

Documents authored by McCrae, John P.


Found 2 Possible Name Variants:

McCrae, John P.

Document
Complete Volume
OASIcs, Volume 93, LDK 2021, Complete Volume

Authors: Dagmar Gromann, Gilles Sérasset, Thierry Declerck, John P. McCrae, Jorge Gracia, Julia Bosque-Gil, Fernando Bobillo, and Barbara Heinisch

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
OASIcs, Volume 93, LDK 2021, Complete Volume

Cite as

3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 1-516, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@Proceedings{gromann_et_al:OASIcs.LDK.2021,
  title =	{{OASIcs, Volume 93, LDK 2021, Complete Volume}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{1--516},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021},
  URN =		{urn:nbn:de:0030-drops-145352},
  doi =		{10.4230/OASIcs.LDK.2021},
  annote =	{Keywords: OASIcs, Volume 93, LDK 2021, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Dagmar Gromann, Gilles Sérasset, Thierry Declerck, John P. McCrae, Jorge Gracia, Julia Bosque-Gil, Fernando Bobillo, and Barbara Heinisch

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{gromann_et_al:OASIcs.LDK.2021.0,
  author =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{0:i--0:xvi},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.0},
  URN =		{urn:nbn:de:0030-drops-145364},
  doi =		{10.4230/OASIcs.LDK.2021.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Encoder-Attention-Based Automatic Term Recognition (EA-ATR)

Authors: Sampritha H. Manjunath and John P. McCrae

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
Automated Term Recognition (ATR) is the task of finding terminology from raw text. It involves designing and developing techniques for the mining of possible terms from the text and filtering these identified terms based on their scores calculated using scoring methodologies like frequency of occurrence and then ranking the terms. Current approaches often rely on statistics and regular expressions over part-of-speech tags to identify terms, but this is error-prone. We propose a deep learning technique to improve the process of identifying a possible sequence of terms. We improve the term recognition by using Bidirectional Encoder Representations from Transformers (BERT) based embeddings to identify which sequence of words is a term. This model is trained on Wikipedia titles. We assume all Wikipedia titles to be the positive set, and random n-grams generated from the raw text as a weak negative set. The positive and negative set will be trained using the Embed, Encode, Attend and Predict (EEAP) formulation using BERT as embeddings. The model will then be evaluated against different domain-specific corpora like GENIA - annotated biological terms and Krapivin - scientific papers from the computer science domain.

Cite as

Sampritha H. Manjunath and John P. McCrae. Encoder-Attention-Based Automatic Term Recognition (EA-ATR). In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 23:1-23:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{manjunath_et_al:OASIcs.LDK.2021.23,
  author =	{Manjunath, Sampritha H. and McCrae, John P.},
  title =	{{Encoder-Attention-Based Automatic Term Recognition (EA-ATR)}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{23:1--23:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.23},
  URN =		{urn:nbn:de:0030-drops-145597},
  doi =		{10.4230/OASIcs.LDK.2021.23},
  annote =	{Keywords: Automatic Term Recognition, Term Extraction, BERT, EEAP, Deep Learning for ATR}
}
Document
Complete Volume
OASIcs, Volume 70, LDK'19, Complete Volume

Authors: Maria Eskevich, Gerard de Melo, Christian Fäth, John P. McCrae, Paul Buitelaar, Christian Chiarcos, Bettina Klimek, and Milan Dojchinovski

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
OASIcs, Volume 70, LDK'19, Complete Volume

Cite as

2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@Proceedings{eskevich_et_al:OASIcs.LDK.2019,
  title =	{{OASIcs, Volume 70, LDK'19, Complete Volume}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019},
  URN =		{urn:nbn:de:0030-drops-105045},
  doi =		{10.4230/OASIcs.LDK.2019},
  annote =	{Keywords: Computing methodologies, Natural language processing, Knowledge representation and reasoning}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Maria Eskevich, Gerard de Melo, Christian Fäth, John P. McCrae, Paul Buitelaar, Christian Chiarcos, Bettina Klimek, and Milan Dojchinovski

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{eskevich_et_al:OASIcs.LDK.2019.0,
  author =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{0:i--0:xvi},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.0},
  URN =		{urn:nbn:de:0030-drops-103641},
  doi =		{10.4230/OASIcs.LDK.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

Authors: Bharathi Raja Chakravarthi, Mihael Arcan, and John P. McCrae

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
Under-resourced languages are a significant challenge for statistical approaches to machine translation, and recently it has been shown that the usage of training data from closely-related languages can improve machine translation quality of these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes taking advantage of these language similarities difficult. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into common representation i.e. the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare the difference between coarse-grained transliteration to the Latin script and fine-grained IPA transliteration. We performed experiments on the language pairs English-Tamil, English-Telugu, and English-Kannada translation task. Our results show improvements in terms of the BLEU, METEOR and chrF scores from transliteration and we find that the transliteration into the Latin script outperforms the fine-grained IPA transcription.

Cite as

Bharathi Raja Chakravarthi, Mihael Arcan, and John P. McCrae. Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{chakravarthi_et_al:OASIcs.LDK.2019.6,
  author =	{Chakravarthi, Bharathi Raja and Arcan, Mihael and McCrae, John P.},
  title =	{{Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{6:1--6:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.6},
  URN =		{urn:nbn:de:0030-drops-103700},
  doi =		{10.4230/OASIcs.LDK.2019.6},
  annote =	{Keywords: Under-resourced languages, Machine translation, Dravidian languages, Phonetic transcription, Transliteration, International Phonetic Alphabet, IPA, Multilingual machine translation, Multilingual data}
}
Document
Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets

Authors: Omnia Zayed, John P. McCrae, and Paul Buitelaar

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
Metaphor is one of the most important elements of human communication, especially in informal settings such as social media. There have been a number of datasets created for metaphor identification, however, this task has proven difficult due to the nebulous nature of metaphoricity. In this paper, we present a crowd-sourcing approach for the creation of a dataset for metaphor identification, that is able to rapidly achieve large coverage over the different usages of metaphor in a given corpus while maintaining high accuracy. We validate this methodology by creating a set of 2,500 manually annotated tweets in English, for which we achieve inter-annotator agreement scores over 0.8, which is higher than other reported results that did not limit the task. This methodology is based on the use of an existing classifier for metaphor in order to assist in the identification and the selection of the examples for annotation, in a way that reduces the cognitive load for annotators and enables quick and accurate annotation. We selected a corpus of both general language tweets and political tweets relating to Brexit and we compare the resulting corpus on these two domains. As a result of this work, we have published the first dataset of tweets annotated for metaphors, which we believe will be invaluable for the development, training and evaluation of approaches for metaphor identification in tweets.

Cite as

Omnia Zayed, John P. McCrae, and Paul Buitelaar. Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{zayed_et_al:OASIcs.LDK.2019.10,
  author =	{Zayed, Omnia and McCrae, John P. and Buitelaar, Paul},
  title =	{{Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{10:1--10:17},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.10},
  URN =		{urn:nbn:de:0030-drops-103740},
  doi =		{10.4230/OASIcs.LDK.2019.10},
  annote =	{Keywords: metaphor, identification, tweets, dataset, annotation, crowd-sourcing}
}

McCrae, John

Document
Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review

Authors: Maraim Masoud, Bianca Pereira, John McCrae, and Paul Buitelaar

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
Knowledge graphs have been shown to be an important data structure for many applications, including chatbot development, data integration, and semantic search. In the enterprise domain, such graphs need to be constructed based on both structured (e.g. databases) and unstructured (e.g. textual) internal data sources; preferentially using automatic approaches due to the costs associated with manual construction of knowledge graphs. However, despite the growing body of research that leverages both structured and textual data sources in the context of automatic knowledge graph construction, the research community has centered on either one type of source or the other. In this paper, we conduct a preliminary literature review to investigate approaches that can be used for the integration of textual and structured data sources in the process of automatic knowledge graph construction. We highlight the solutions currently available for use within enterprises and point areas that would benefit from further research.

Cite as

Maraim Masoud, Bianca Pereira, John McCrae, and Paul Buitelaar. Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 19:1-19:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{masoud_et_al:OASIcs.LDK.2021.19,
  author =	{Masoud, Maraim and Pereira, Bianca and McCrae, John and Buitelaar, Paul},
  title =	{{Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{19:1--19:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.19},
  URN =		{urn:nbn:de:0030-drops-145556},
  doi =		{10.4230/OASIcs.LDK.2021.19},
  annote =	{Keywords: Knowledge Graph Construction, Enterprise Knowledge Graph}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail