DROPS

Document

Outlier detection and ranking based on subspace clustering

Authors: Thomas Seidl, Emmanuel Müller, Ira Assent, and Uwe Steinhausen

Published in: Dagstuhl Seminar Proceedings, Volume 8421, Uncertainty Management in Information Systems (2009)

Abstract

Detecting outliers is an important task for many applications, including fraud detection and consistency validation in real-world data. Particularly in the presence of uncertain or imprecise data, similar objects regularly deviate in their attribute values. The notion of outliers must therefore be defined carefully. When outlier detection is considered a task complementary to clustering, a binary decision on whether or not an object is an outlier seems natural. For high-dimensional data, however, objects may belong to different clusters in different subspaces. More fine-grained concepts for defining outliers are therefore required. With our new OutRank approach, we address outlier detection in heterogeneous high-dimensional data and propose a novel scoring function that provides a consistent model for ranking outliers in the presence of different attribute types. Preliminary experiments demonstrate the potential for successful detection and reasonable ranking of outliers in high-dimensional data sets.

Cite as

Thomas Seidl, Emmanuel Müller, Ira Assent, and Uwe Steinhausen. Outlier detection and ranking based on subspace clustering. In Uncertainty Management in Information Systems. Dagstuhl Seminar Proceedings, Volume 8421, pp. 1-4, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

@InProceedings{seidl_et_al:DagSemProc.08421.10,
  author =	{Seidl, Thomas and M\"{u}ller, Emmanuel and Assent, Ira and Steinhausen, Uwe},
  title =	{{Outlier detection and ranking based on subspace clustering}},
  booktitle =	{Uncertainty Management in Information Systems},
  pages =	{1--4},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2009},
  volume =	{8421},
  editor =	{Christoph Koch and Birgitta K\"{o}nig-Ries and Volker Markl and Maurice van Keulen},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.08421.10},
  URN =		{urn:nbn:de:0030-drops-19344},
  doi =		{10.4230/DagSemProc.08421.10},
  annote =	{Keywords: Outlier detection, outlier ranking, subspace clustering, data mining}
}

Document

DOI: 10.4230/DagSemProc.07181.10

Subspace outlier mining in large multimedia databases

Authors: Ira Assent, Ralph Krieger, Emmanuel Müller, and Thomas Seidl

Published in: Dagstuhl Seminar Proceedings, Volume 7181, Parallel Universes and Local Patterns (2007)

Abstract

Increasingly large multimedia databases in life sciences, e-commerce, or monitoring applications cannot be browsed manually, but require automatic knowledge discovery in databases (KDD) techniques to detect novel and interesting patterns. One of the major tasks in KDD, clustering, aims at grouping similar objects into clusters, separating dissimilar objects. Density-based clustering has been shown to detect arbitrarily shaped clusters even in noisy databases. In high-dimensional databases, meaningful clusters can no longer be detected due to the "curse of dimensionality". Consequently, subspace clustering searches for clusters hidden in any subset of the set of dimensions. As the number of subspaces is exponential in the number of dimensions, traditional approaches use fixed pruning thresholds. This results in dimensionality bias, i.e., with growing dimensionality, more clusters are missed. Clustering information is very useful for applications like fraud detection, where outliers, i.e., objects which differ from all clusters, are sought. In subspace clustering, an object may be an outlier with respect to some groups, but not with respect to others, leading to possibly conflicting information. We propose a density-based unbiased subspace clustering model for outlier detection. We define outliers with respect to all maximal and non-redundant subspace clusters, taking their distance (deviation in attribute values), relevance (number of attributes covered) and support (number of objects covered) into account. We demonstrate the quality of our subspace clustering results in experiments on real-world and synthetic databases and discuss our outlier model.

Cite as

Ira Assent, Ralph Krieger, Emmanuel Müller, and Thomas Seidl. Subspace outlier mining in large multimedia databases. In Parallel Universes and Local Patterns. Dagstuhl Seminar Proceedings, Volume 7181, pp. 1-8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{assent_et_al:DagSemProc.07181.10,
  author =	{Assent, Ira and Krieger, Ralph and M\"{u}ller, Emmanuel and Seidl, Thomas},
  title =	{{Subspace outlier mining in large multimedia databases}},
  booktitle =	{Parallel Universes and Local Patterns},
  pages =	{1--8},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{7181},
  editor =	{Michael R. Berthold and Katharina Morik and Arno Siebes},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.07181.10},
  URN =		{urn:nbn:de:0030-drops-12574},
  doi =		{10.4230/DagSemProc.07181.10},
  annote =	{Keywords: Data mining, outlier detection, subspace clustering, density-based clustering}
}
