Automatic Meaning Discovery Using Google

Cilibrasi, Rudi; Vitanyi, Paul M.B.

doi:10.4230/DagSemProc.06051.3

Keywords

Normalized Compression Distance
Clustering
Classification
Relative Semantics of Terms
Google
World-Wide-Web
Kolmogorov complexity

Abstract

We survey a new area of parameter-free similarity distance measures useful in data mining, pattern recognition, learning, and automatic semantics extraction. Given a family of distances on a set of objects, a distance is universal up to a certain precision for that family if it minorizes every distance in the family between every two objects in the set, up to the stated precision (we do not require the universal distance to be an element of the family). We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects. The latter may have literal embodiments like the first type, but may also be abstract, like "red" or "Christianity." For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according to particular features between pairs of literal objects. For the second type we consider similarity distances generated by web users, corresponding to particular semantic relations between (the names for) the designated objects. For both families we give universal similarity distance measures that incorporate all particular distance measures in the family. In the first case the universal distance is based on compression, and in the second case it is based on Google page counts related to search terms. In both cases, experiments on a massive scale give evidence of the viability of the approaches.
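
As a rough illustration of the two kinds of distance described in the abstract, the following Python sketch computes a compression-based distance and a page-count-based distance. The formulas are the commonly cited forms of the Normalized Compression Distance and the Normalized Google Distance; they are not stated in the abstract itself. The choice of zlib as the compressor, the function names, and the parameter names are assumptions made for illustration only.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two literal objects.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size under a real-world compressor.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance between two search terms.

    fx, fy : page counts for each term alone (hypothetical inputs)
    fxy    : page count for pages containing both terms
    n      : normalizing total, e.g. an estimate of indexed pages
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur, or never co-occur, are treated as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    if denom <= 0:
        # Degenerate case: the normalizer is not larger than the counts.
        return math.inf
    return (max(lx, ly) - lxy) / denom
```

The sketch does not query any search engine; page counts would have to be supplied by the caller. With a practical compressor such as zlib, `ncd(x, x)` is small but generally not exactly zero.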

Cite As

Rudi Cilibrasi and Paul M.B. Vitanyi. Automatic Meaning Discovery Using Google. In Kolmogorov Complexity and Applications. Dagstuhl Seminar Proceedings, Volume 6051, pp. 1-23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2006) https://doi.org/10.4230/DagSemProc.06051.3
