License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2018.13
URN: urn:nbn:de:0030-drops-92717
URL: https://drops.dagstuhl.de/opus/volltexte/2018/9271/


Gamallo, Pablo

Evaluation of Distributional Models with the Outlier Detection Task

pdf-format:
OASIcs-SLATE-2018-13.pdf (0.4 MB)


Abstract

In this article, we define the outlier detection task and use it to compare neural-based word embeddings with transparent count-based distributional representations. Using the English Wikipedia as the text source to train the models, we observed that embeddings outperform count-based representations when their contexts are bags of words. However, there are no sharp differences between the two models when word contexts are defined as syntactic dependencies. In general, syntax-based models tend to perform better than bag-of-words models on this specific task. Experiments carried out for Portuguese yielded similar results. The test datasets we created for the outlier detection task in English and Portuguese are released.
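The abstract does not spell out how an outlier is scored, so the following is only a minimal sketch of one common formulation of the task: given a small set of words, pick the one whose mean cosine similarity to the rest is lowest. The function names (`cosine`, `detect_outlier`) and the toy vectors are hypothetical and invented for illustration; they are not taken from the paper or its datasets, and the paper's actual scoring procedure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def detect_outlier(vectors):
    """Return the word with the lowest mean cosine similarity to the other words.

    `vectors` maps each word to its vector; the vectors could come from an
    embedding model or from a count-based representation.
    """
    def mean_similarity(word):
        others = [w for w in vectors if w != word]
        return sum(cosine(vectors[word], vectors[o]) for o in others) / len(others)

    return min(vectors, key=mean_similarity)


# Toy vectors, made up for illustration only (assumption, not real model output).
toy = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "banana": [0.85, 0.15, 0.05],
    "hammer": [0.1, 0.9, 0.3],
}

print(detect_outlier(toy))  # prints "hammer"
```

Under this sketch, the same `detect_outlier` function could be run over vectors from either kind of model, and the models compared by how often the predicted outlier matches the annotated one.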

BibTeX - Entry

@InProceedings{gamallo:OASIcs:2018:9271,
  author =	{Pablo Gamallo},
  title =	{{Evaluation of Distributional Models with the Outlier Detection Task}},
  booktitle =	{7th Symposium on Languages, Applications and Technologies (SLATE 2018)},
  pages =	{13:1--13:8},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-072-9},
  ISSN =	{2190-6807},
  year =	{2018},
  volume =	{62},
  editor =	{Pedro Rangel Henriques and Jos{\'e} Paulo Leal and Ant{\'o}nio Menezes Leit{\~a}o and Xavier G{\'o}mez Guinovart},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2018/9271},
  URN =		{urn:nbn:de:0030-drops-92717},
  doi =		{10.4230/OASIcs.SLATE.2018.13},
  annote =	{Keywords: distributional semantics, dependency analysis, outlier detection, similarity}
}

Keywords: distributional semantics, dependency analysis, outlier detection, similarity
Collection: 7th Symposium on Languages, Applications and Technologies (SLATE 2018)
Issue Date: 2018
Date of publication: 13.07.2018

