Co-Bidding Graphs for Constrained Paper Clustering

Škvorc, Tadej; Lavrač, Nada; Robnik-Šikonja, Marko

doi:10.4230/OASIcs.SLATE.2016.1

Document Identifiers

DOI: 10.4230/OASIcs.SLATE.2016.1
URN: urn:nbn:de:0030-drops-60062

Cite As

Tadej Škvorc, Nada Lavrač, and Marko Robnik-Šikonja. Co-Bidding Graphs for Constrained Paper Clustering. In 5th Symposium on Languages, Applications and Technologies (SLATE'16). Open Access Series in Informatics (OASIcs), Volume 51, pp. 1:1-1:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/OASIcs.SLATE.2016.1

Abstract

The information needed to solve many important problems comes in various formats and modalities. Besides the standard tabular form, these include text and graphs. Solving such problems requires the fusion of different data sources. We demonstrate a methodology that enriches textual information with graph-based data and uses both in a novel machine learning application of clustering. The proposed solution helps in organizing academic conferences by automating one of their time-consuming tasks. Conference organizers can currently choose from a small number of software tools that manage the paper review process but offer little or no support for automated conference scheduling. We present a two-tier constrained clustering method for automatic conference scheduling that assigns paper presentations to predefined schedule slots, so that program chairs do not have to assign them manually. The method uses clustering algorithms to group papers based on the similarities between them. We use two types of similarity: text similarity (between the papers' titles and abstracts) and graph similarity based on reviewers' co-bidding information collected during the conference reviewing phase. In this way, reviewers' preferences serve as a proxy for the preferences of conference attendees. As a result of the proposed two-tier clustering process, similar papers are assigned to predefined conference schedule slots. We show that using graph-based information in addition to text-based similarity increases clustering performance. The source code of the solution is freely available.
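The idea of fusing text similarity with co-bidding similarity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measures or the clustering algorithm, so everything here is an assumption made for illustration. That includes bag-of-words cosine similarity, Jaccard overlap of bidding reviewers, the mixing weight `alpha`, the greedy slot-filling step that stands in for the two-tier constrained clustering, and all function names and toy data.

```python
import math
from collections import Counter
from itertools import combinations


def text_similarity(a, b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cobid_similarity(bids, p, q):
    """Jaccard overlap of the sets of reviewers who bid on papers p and q."""
    rp = {r for r, papers in bids.items() if p in papers}
    rq = {r for r, papers in bids.items() if q in papers}
    union = rp | rq
    return len(rp & rq) / len(union) if union else 0.0


def combined_similarity(texts, bids, alpha=0.5):
    """Weighted mix of text and co-bidding similarity for every paper pair."""
    sim = {}
    for p, q in combinations(sorted(texts), 2):
        s = (alpha * text_similarity(texts[p], texts[q])
             + (1 - alpha) * cobid_similarity(bids, p, q))
        sim[(p, q)] = sim[(q, p)] = s
    return sim


def fill_slots(papers, sim, slot_size):
    """Greedy size-constrained grouping (an assumed stand-in, not the
    paper's method): seed a slot, then add the most similar papers."""
    remaining = sorted(papers)
    slots = []
    while remaining:
        slot = [remaining.pop(0)]
        while remaining and len(slot) < slot_size:
            best = max(remaining, key=lambda c: sum(sim[(c, m)] for m in slot))
            remaining.remove(best)
            slot.append(best)
        slots.append(slot)
    return slots


# Hypothetical toy data: paper titles and the papers each reviewer bid on.
texts = {
    "P1": "graph clustering of documents",
    "P2": "document clustering with graph kernels",
    "P3": "parsing domain specific languages",
    "P4": "grammar based parsing of languages",
}
bids = {"alice": {"P1", "P2"}, "bob": {"P3", "P4"}, "carol": {"P1", "P2", "P3"}}

sim = combined_similarity(texts, bids)
slots = fill_slots(texts, sim, slot_size=2)
print(slots)
```

In this toy setting, P1 and P2 share both vocabulary and bidders, so they land in the same slot. The fixed `slot_size` plays the role of the predefined schedule-slot constraint.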

Keywords

Text mining
data fusion
scheduling
constrained clustering
conference
