Search Results

Documents authored by Cilibrasi, Rudi


Document
Automatic Meaning Discovery Using Google

Authors: Rudi Cilibrasi and Paul M.B. Vitanyi

Published in: Dagstuhl Seminar Proceedings, Volume 6051, Kolmogorov Complexity and Applications (2006)


Abstract
We survey a new area of parameter-free similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a distance is universal up to a certain precision for that family if it minorizes every distance in the family between every two objects in the set, up to the stated precision (we do not require the universal distance to be an element of the family). We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects. The latter may have literal embodiments like the first type, but may also be abstract like "red" or "christianity." For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according to particular features between pairs of literal objects. For the second type we consider similarity distances generated by web users corresponding to particular semantic relations between (the names for) the designated objects. For both families we give universal similarity distance measures, incorporating all particular distance measures in the family. In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms. In both cases experiments on a massive scale give evidence of the viability of the approaches.
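
As a rough illustration of the two kinds of distance the abstract mentions, the sketch below computes a compression-based distance between two byte strings and a page-count-based distance between two search terms. It uses the commonly cited forms of the Normalized Compression Distance and the Normalized Google Distance; the abstract itself does not state these formulas, so treat them as assumptions here. The choice of zlib as the compressor, the function names, and the page counts passed to `ngd` are all hypothetical and chosen only for the example.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance in its commonly cited form:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance in its commonly cited form.

    fx, fy : page counts for each term alone (hypothetical inputs)
    fxy    : page count for pages containing both terms
    n      : normalizing total number of indexed pages
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 50
    b = b"colorless green ideas sleep furiously tonight " * 50
    print("NCD(a, a):", ncd(a, a))
    print("NCD(a, b):", ncd(a, b))
    # Made-up page counts, for illustration only.
    print("NGD:", ngd(fx=1_000_000, fy=500_000, fxy=100_000, n=10**10))
```

With a real compressor the distance of an object to itself is small but not exactly zero, which is one reason the result is best read as an approximation of the idealized distance rather than as a metric in the strict sense.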

Cite as

Rudi Cilibrasi and Paul M.B. Vitanyi. Automatic Meaning Discovery Using Google. In Kolmogorov Complexity and Applications. Dagstuhl Seminar Proceedings, Volume 6051, pp. 1-23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2006)



@InProceedings{cilibrasi_et_al:DagSemProc.06051.3,
  author =	{Cilibrasi, Rudi and Vitanyi, Paul M.B.},
  title =	{{Automatic Meaning Discovery Using Google}},
  booktitle =	{Kolmogorov Complexity and Applications},
  pages =	{1--23},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2006},
  volume =	{6051},
  editor =	{Marcus Hutter and Wolfgang Merkle and Paul M.B. Vitanyi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06051.3},
  URN =		{urn:nbn:de:0030-drops-6296},
  doi =		{10.4230/DagSemProc.06051.3},
  annote =	{Keywords: Normalized Compression Distance, Clustering, Classification, Relative Semantics of Terms, Google, World-Wide-Web, Kolmogorov complexity}
}
Document
A New Quartet Tree Heuristic for Hierarchical Clustering

Authors: Rudi Cilibrasi and Paul M. B. Vitanyi

Published in: Dagstuhl Seminar Proceedings, Volume 6061, Theory of Evolutionary Algorithms (2006)


Abstract
We present a new quartet heuristic for hierarchical clustering from a given distance matrix. We determine a dendrogram (ternary tree) by a new quartet method and a fast heuristic to implement it. We do not assume that there is a true ternary tree that generated the distances and which we wish to recover as closely as possible. Our aim is to model the distance matrix as faithfully as possible by the dendrogram. Our algorithm is essentially randomized hill-climbing, using parallelized Genetic Programming, where undirected trees evolve in a random walk driven by a prescribed fitness function. Our method is capable of handling up to 60-80 objects in a matter of hours, while no existing quartet heuristic can directly compute a quartet tree of more than about 20-30 objects without running for years. The method is implemented and available as public software at www.complearn.org. We present applications in many areas like music, literature, bird-flu (H5N1) virus clustering, and automatic meaning discovery using Google.

Cite as

Rudi Cilibrasi and Paul M. B. Vitanyi. A New Quartet Tree Heuristic for Hierarchical Clustering. In Theory of Evolutionary Algorithms. Dagstuhl Seminar Proceedings, Volume 6061, pp. 1-13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2006)



@InProceedings{cilibrasi_et_al:DagSemProc.06061.4,
  author =	{Cilibrasi, Rudi and Vitanyi, Paul M. B.},
  title =	{{A New Quartet Tree Heuristic for Hierarchical Clustering}},
  booktitle =	{Theory of Evolutionary Algorithms},
  pages =	{1--13},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2006},
  volume =	{6061},
  editor =	{Dirk V. Arnold and Thomas Jansen and Michael D. Vose and Jonathan E. Rowe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06061.4},
  URN =		{urn:nbn:de:0030-drops-5985},
  doi =		{10.4230/DagSemProc.06061.4},
  annote =	{Keywords: Genetic programming, hierarchical clustering, quartet tree method}
}