DROPS

Document

Outlier detection and ranking based on subspace clustering

Authors: Thomas Seidl, Emmanuel Müller, Ira Assent, and Uwe Steinhausen

Published in: Dagstuhl Seminar Proceedings, Volume 8421, Uncertainty Management in Information Systems (2009)

Abstract

Detecting outliers is an important task for many applications, including fraud detection and consistency validation in real-world data. Particularly in the presence of uncertain or imprecise data, similar objects regularly deviate in their attribute values. The notion of outliers thus has to be defined carefully. When outlier detection is considered a task complementary to clustering, a binary decision on whether an object is an outlier seems the obvious choice. For high-dimensional data, however, objects may belong to different clusters in different subspaces. More fine-grained concepts for defining outliers are therefore required. With our new OutRank approach, we address outlier detection in heterogeneous high-dimensional data and propose a novel scoring function that provides a consistent model for ranking outliers in the presence of different attribute types. Preliminary experiments demonstrate the potential for successful detection and reasonable ranking of outliers in high-dimensional data sets.

Cite as

Thomas Seidl, Emmanuel Müller, Ira Assent, and Uwe Steinhausen. Outlier detection and ranking based on subspace clustering. In Uncertainty Management in Information Systems. Dagstuhl Seminar Proceedings, Volume 8421, pp. 1-4, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

@InProceedings{seidl_et_al:DagSemProc.08421.10,
  author =	{Seidl, Thomas and M\"{u}ller, Emmanuel and Assent, Ira and Steinhausen, Uwe},
  title =	{{Outlier detection and ranking based on subspace clustering}},
  booktitle =	{Uncertainty Management in Information Systems},
  pages =	{1--4},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2009},
  volume =	{8421},
  editor =	{Christoph Koch and Birgitta K\"{o}nig-Ries and Volker Markl and Maurice van Keulen},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08421.10},
  URN =		{urn:nbn:de:0030-drops-19344},
  doi =		{10.4230/DagSemProc.08421.10},
  annote =	{Keywords: Outlier detection, outlier ranking, subspace clustering, data mining}
}

Document

DOI: 10.4230/DagSemProc.07181.10

Subspace outlier mining in large multimedia databases

Authors: Ira Assent, Ralph Krieger, Emmanuel Müller, and Thomas Seidl

Published in: Dagstuhl Seminar Proceedings, Volume 7181, Parallel Universes and Local Patterns (2007)

Abstract

Increasingly large multimedia databases in life sciences, e-commerce, or monitoring applications cannot be browsed manually, but require automatic knowledge discovery in databases (KDD) techniques to detect novel and interesting patterns. One of the major tasks in KDD, clustering, aims at grouping similar objects into clusters, separating dissimilar objects. Density-based clustering has been shown to detect arbitrarily shaped clusters even in noisy databases. In high-dimensional databases, meaningful clusters can no longer be detected due to the "curse of dimensionality". Consequently, subspace clustering searches for clusters hidden in any subset of the set of dimensions. As the number of subspaces is exponential in the number of dimensions, traditional approaches use fixed pruning thresholds. This results in dimensionality bias, i.e. with growing dimensionality, more clusters are missed. Clustering information is very useful for applications like fraud detection, where outliers, i.e. objects which differ from all clusters, are sought. In subspace clustering, an object may be an outlier with respect to some groups, but not with respect to others, leading to possibly conflicting information. We propose a density-based unbiased subspace clustering model for outlier detection. We define outliers with respect to all maximal and non-redundant subspace clusters, taking their distance (deviation in attribute values), relevance (number of attributes covered) and support (number of objects covered) into account. We demonstrate the quality of our subspace clustering results in experiments on real-world and synthetic databases and discuss our outlier model.

Cite as

Ira Assent, Ralph Krieger, Emmanuel Müller, and Thomas Seidl. Subspace outlier mining in large multimedia databases. In Parallel Universes and Local Patterns. Dagstuhl Seminar Proceedings, Volume 7181, pp. 1-8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{assent_et_al:DagSemProc.07181.10,
  author =	{Assent, Ira and Krieger, Ralph and M\"{u}ller, Emmanuel and Seidl, Thomas},
  title =	{{Subspace outlier mining in large multimedia databases}},
  booktitle =	{Parallel Universes and Local Patterns},
  pages =	{1--8},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{7181},
  editor =	{Michael R. Berthold and Katharina Morik and Arno Siebes},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.07181.10},
  URN =		{urn:nbn:de:0030-drops-12574},
  doi =		{10.4230/DagSemProc.07181.10},
  annote =	{Keywords: Data mining, outlier detection, subspace clustering, density-based clustering}
}

Document

DOI: 10.4230/DagSemProc.06171.6

Efficient multi-step query processing for EMD-based similarity

Authors: Ira Assent and Thomas Seidl

Published in: Dagstuhl Seminar Proceedings, Volume 6171, Content-Based Retrieval (2006)

Abstract

Similarity search in large multimedia databases requires efficient query processing based on suitable similarity models. Similarity models consist of a feature extraction step as well as a distance defined for these features, and they demand an efficient algorithm for retrieving similar objects under this model. In this work, we focus on the Earth Mover's Distance (EMD), a recently introduced similarity model which has been successfully employed in numerous applications and is reported to reflect human perceptual similarity well. As its computation is complex, the direct application of the EMD to large, high-dimensional databases is not feasible. To remedy this and allow users to benefit from the high quality of the model even in larger settings, we developed various lower bounds for the EMD to be used in index-supported multistep query processing algorithms. We prove that our algorithms are complete, thus producing no false drops. We also show that they are highly efficient, as experiments on large image databases with high-dimensional features demonstrate.
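
The multistep idea in this abstract (filter with a cheap lower bound, then refine with the exact distance) can be illustrated with a minimal sketch. It is not the authors' algorithm or any of their lower bounds: it assumes one-dimensional histograms of equal total mass with ground distance |i - j|, uses the well-known centroid-difference bound as the filter, and scans the data linearly instead of using an index. All function names are hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Exact EMD for 1-D histograms of equal total mass, ground distance |i - j|.

    In this special case the EMD equals the sum of absolute differences
    of the two cumulative histograms.
    """
    diffs = (a - b for a, b in zip(p, q))
    return sum(abs(c) for c in accumulate(diffs))


def centroid_lower_bound(p, q):
    """Cheap filter distance: difference of the mass-weighted bin positions.

    For equal-mass histograms this never exceeds emd_1d(p, q), so pruning
    on it cannot discard a true result (no false drops).
    """
    return abs(sum(i * (a - b) for i, (a, b) in enumerate(zip(p, q))))


def range_query(database, query, epsilon):
    """Filter-and-refine range query over a dict {key: histogram}.

    Returns the matching keys and the number of exact EMD computations.
    """
    results = []
    refined = 0
    for key, hist in database.items():
        # Filter step: lower bound > epsilon implies EMD > epsilon.
        if centroid_lower_bound(hist, query) > epsilon:
            continue
        # Refinement step: only surviving candidates pay for the exact EMD.
        refined += 1
        if emd_1d(hist, query) <= epsilon:
            results.append(key)
    return results, refined


database = {
    "a": [1, 0, 0, 0],
    "b": [0, 1, 0, 0],
    "c": [0, 0, 0, 1],
    "d": [0.5, 0, 0.5, 0],
}
matches, exact_computations = range_query(database, [1, 0, 0, 0], epsilon=1)
```

Because the filter distance never overestimates the exact distance, every object it discards is guaranteed to lie outside the query range; this is the completeness property the abstract refers to. How many exact computations are saved depends on how tight the chosen bound is.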

Cite as

Ira Assent and Thomas Seidl. Efficient multi-step query processing for EMD-based similarity. In Content-Based Retrieval. Dagstuhl Seminar Proceedings, Volume 6171, pp. 1-12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2006)

@InProceedings{assent_et_al:DagSemProc.06171.6,
  author =	{Assent, Ira and Seidl, Thomas},
  title =	{{Efficient multi-step query processing for EMD-based similarity}},
  booktitle =	{Content-Based Retrieval},
  pages =	{1--12},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2006},
  volume =	{6171},
  editor =	{Tim Crawford and Remco C. Veltkamp},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06171.6},
  URN =		{urn:nbn:de:0030-drops-6490},
  doi =		{10.4230/DagSemProc.06171.6},
  annote =	{Keywords: Content-based retrieval, indexing, multimedia databases, efficiency, similarity}
}
