Analyzing and Comparing On-Line News Sources via (Two-Layer) Incremental Clustering

Cambi, Francesco; Crescenzi, Pierluigi; Pagli, Linda

doi:10.4230/LIPIcs.FUN.2016.9

Keywords

text mining
incremental clustering
on-line news

Abstract

In this paper, we analyse the contents of the web sites of two Italian press agencies and of four of the most popular Italian newspapers, in order to answer questions such as which news stories are the most relevant, what the average life of a news story is, and how different the sites are from one another. To this aim, we have developed a web-based application that collects, every hour, the articles in the main column of the six web sites, groups the articles into news stories by means of an incremental clustering algorithm, and finally allows the user to see the answers to the above questions. We have also designed and implemented a two-layer modification of the incremental clustering algorithm and carried out a preliminary experimental evaluation of this modification: it turns out that the two-layer clustering is extremely efficient in terms of running time, and it performs quite well in terms of precision and recall.
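The abstract does not spell out the clustering algorithm, so the following is only a rough illustration of the general idea of incremental clustering, not the authors' method. It is a minimal single-pass sketch assuming bag-of-words term-frequency vectors, cosine similarity, and a fixed similarity threshold. The names `IncrementalClusterer`, `threshold`, and `lifetimes` are hypothetical. The "life" of a news story is approximated here as the number of hours between the first and the last article in its cluster, which is one possible reading of that measure.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; no stemming or stop-word removal (a simplification)."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


class IncrementalClusterer:
    """Hypothetical single-pass clusterer: each new article joins the most
    similar existing cluster if the cosine similarity reaches `threshold`
    (an assumed parameter); otherwise it starts a new cluster."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.centroids = []  # summed term frequencies of each cluster
        self.members = []    # (article_id, hour) pairs of each cluster

    def add(self, article_id, text, hour):
        """Assign one article (collected at `hour`) and return its cluster index."""
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            self.centroids[best].update(vec)  # add the article's counts to the centroid
            self.members[best].append((article_id, hour))
            return best
        self.centroids.append(vec)
        self.members.append([(article_id, hour)])
        return len(self.centroids) - 1

    def lifetimes(self):
        """Hours between the first and the last article of each cluster."""
        return [max(h for _, h in m) - min(h for _, h in m) for m in self.members]


if __name__ == "__main__":
    c = IncrementalClusterer(threshold=0.3)
    c.add("a1", "Government approves new budget law", 0)
    c.add("a3", "Football team wins the championship final", 1)
    c.add("a2", "New budget law approved by the government", 2)
    print(len(c.members), c.lifetimes())  # prints: 2 [2, 0]
```

Storing summed counts as the centroid is enough here because cosine similarity ignores vector length, so the sum points in the same direction as the mean. Since articles are processed one at a time in arrival order, the result can depend on that order, a known property of single-pass schemes.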

Cite As

Francesco Cambi, Pierluigi Crescenzi, and Linda Pagli. Analyzing and Comparing On-Line News Sources via (Two-Layer) Incremental Clustering. In 8th International Conference on Fun with Algorithms (FUN 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 49, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.FUN.2016.9

Author Details

Francesco Cambi

Pierluigi Crescenzi

Linda Pagli
