Co-Bidding Graphs for Constrained Paper Clustering

Authors Tadej Škvorc, Nada Lavrač, Marko Robnik-Šikonja




File

OASIcs.SLATE.2016.1.pdf
  • Filesize: 0.68 MB
  • 13 pages



Cite As

Tadej Škvorc, Nada Lavrač, and Marko Robnik-Šikonja. Co-Bidding Graphs for Constrained Paper Clustering. In 5th Symposium on Languages, Applications and Technologies (SLATE'16). Open Access Series in Informatics (OASIcs), Volume 51, pp. 1:1-1:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/OASIcs.SLATE.2016.1

Abstract

The information needed to solve many important problems comes in various formats and modalities: besides the standard tabular form, these include text and graphs. Solving such problems requires fusing different data sources. We demonstrate a methodology that enriches textual information with graph-based data and uses both in a novel machine learning application of clustering. The proposed solution helps in organizing academic conferences by automating one of their time-consuming tasks. Conference organizers can currently choose from a small number of software tools that manage the paper review process but offer little or no support for automated conference scheduling. We present a two-tier constrained clustering method for automatic conference scheduling that assigns paper presentations to predefined schedule slots, so that program chairs do not have to assign them manually. The method uses clustering algorithms to group papers based on their similarity. We use two types of similarity: text similarity (between the papers' titles and abstracts) and graph similarity based on reviewers' co-bidding information collected during the conference reviewing phase. In this way, reviewers' preferences serve as a proxy for the preferences of conference attendees. As a result of the proposed two-tier clustering process, similar papers are assigned to predefined conference schedule slots. We show that using graph-based information in addition to text-based similarity increases clustering performance. The source code of the solution is freely available.
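
The idea of fusing text similarity with co-bidding similarity and then filling fixed-size schedule slots can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how either similarity is computed or combined, so the bag-of-words cosine measure, the Jaccard overlap of bidder sets, the linear weight `alpha`, and the greedy slot filling (a simple stand-in for the two-tier constrained clustering) are all assumptions made for illustration, as are the function names and the toy data.

```python
import math
from collections import Counter


def text_similarity(doc_a, doc_b):
    """Cosine similarity between bag-of-words term counts (assumed measure)."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cobid_similarity(bidders_a, bidders_b):
    """Jaccard overlap of the reviewer sets that bid on each paper (assumed measure)."""
    union = bidders_a | bidders_b
    return len(bidders_a & bidders_b) / len(union) if union else 0.0


def fused_similarity(papers, bids, alpha=0.5):
    """Pairwise similarity matrix: alpha * text + (1 - alpha) * co-bidding.

    The linear combination and the weight alpha are hypothetical choices.
    """
    n = len(papers)
    return [
        [
            alpha * text_similarity(papers[i], papers[j])
            + (1 - alpha) * cobid_similarity(bids[i], bids[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def assign_to_slots(sim, capacities):
    """Greedy stand-in for constrained clustering.

    Each slot is seeded with the first unassigned paper and then filled,
    up to its capacity, with the paper most similar to those already in it.
    """
    unassigned = list(range(len(sim)))
    slots = []
    for capacity in capacities:
        slot = []
        if unassigned and capacity > 0:
            slot.append(unassigned.pop(0))
        while len(slot) < capacity and unassigned:
            best = max(unassigned, key=lambda p: sum(sim[p][q] for q in slot))
            unassigned.remove(best)
            slot.append(best)
        slots.append(slot)
    return slots


# Toy data: paper texts (title plus abstract) and the reviewers who bid on each.
papers = [
    "graph clustering of networks",
    "text mining of abstracts",
    "clustering graph data",
    "mining text corpora",
]
bids = [{"r1", "r2"}, {"r3"}, {"r1"}, {"r3", "r4"}]

sim = fused_similarity(papers, bids, alpha=0.5)
slots = assign_to_slots(sim, capacities=[2, 2])
print(slots)
```

With this toy data, papers 0 and 2 share both vocabulary and a bidder, so they land in the same slot. Setting `alpha=1.0` would ignore the co-bidding information entirely, which is one way to compare text-only grouping against the fused variant.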
Keywords
  • Text mining
  • data fusion
  • scheduling
  • constrained clustering
  • conference

