Incremental Constrained Clustering by Minimal Weighted Modification

Authors Aymeric Beauchamp , Thi-Bich-Hanh Dao , Samir Loudni , Christel Vrain



PDF
Thumbnail PDF

File

LIPIcs.CP.2023.10.pdf
  • Filesize: 4.88 MB
  • 22 pages

Document Identifiers

Author Details

Aymeric Beauchamp
  • University of Orléans, INSA Centre Val de Loire, LIFO EA 4022, France
Thi-Bich-Hanh Dao
  • University of Orléans, INSA Centre Val de Loire, LIFO EA 4022, France
Samir Loudni
  • TASC (LS2N-CNRS), IMT Atlantique, France, GREYC, University of Caen Normandy, France
Christel Vrain
  • University of Orléans, INSA Centre Val de Loire, LIFO EA 4022, France

Acknowledgements

The authors want to thank the anonymous reviewers for their comments and suggestions which helped to improve this paper.

Cite AsGet BibTex

Aymeric Beauchamp, Thi-Bich-Hanh Dao, Samir Loudni, and Christel Vrain. Incremental Constrained Clustering by Minimal Weighted Modification. In 29th International Conference on Principles and Practice of Constraint Programming (CP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 280, pp. 10:1-10:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.CP.2023.10

Abstract

Clustering is a well-known task in Data Mining that aims at grouping data instances according to their similarity. It is an exploratory and unsupervised task whose results depend on many parameters, often requiring the expert to iterate several times before satisfaction. Constrained clustering has been introduced for better modeling the expectations of the expert. Nevertheless constrained clustering is not yet sufficient since it usually requires the constraints to be given before the clustering process. In this paper we address a more general problem that aims at modeling the exploratory clustering process, through a sequence of clustering modifications where expert constraints are added on the fly. We present an incremental constrained clustering framework integrating active query strategies and a Constraint Programming model to fit the expert expectations while preserving the stability of the partition, so that the expert can understand the process and apprehend its impact. Our model supports instance and group-level constraints, which can be relaxed. Experiments on reference datasets and a case study related to the analysis of satellite image time series show the relevance of our framework.

Subject Classification

ACM Subject Classification
  • Theory of computation → Constraint and logic programming
  • Computing methodologies → Semi-supervised learning settings
Keywords
  • Incremental constrained clustering
  • Constrained optimization problem
  • User feedback

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Arindam Banerjee and Joydeep Ghosh. Scalable clustering algorithms with balancing constraints. Data Min. Knowl. Discov., 13(3):365-395, 2006. Google Scholar
  2. Sugato Basu, Arindam Banerjee, and Raymond J. Mooney. Active semi-supervision for pairwise constrained clustering. In Proceedings of the Fourth SIAM International Conference on Data Mining, pages 333-344. SIAM, 2004. URL: https://doi.org/10.1137/1.9781611972740.31.
  3. Sugato Basu, Arindam Banerjee, and Raymond J. Mooney. Active Semi-Supervision for Pairwise Constrained Clustering. In ICDM, pages 333-344, 2004. Google Scholar
  4. Alessio Benavoli, Giorgio Corani, Janez Demšar, and Marco Zaffalon. Time for a change: A tutorial for comparing multiple classifiers through Bayesian analysis, 2017. Google Scholar
  5. Christian Bessière, Emmanuel Hébrard, Brahim Hnich, Zeynep Kiziltan, and Toby Walsh. Filtering Algorithms for the NValue Constraint. Constraints, 11(4):271-293, 2006. Google Scholar
  6. Mikhail Bilenko, Sugato Basu, and Raymond J. Mooney. Integrating constraints and metric learning in semi-supervised clustering. In Proceedings of the 21st International Conference on Machine Learning, pages 11-18, 2004. Google Scholar
  7. David Cohn, Rich Caruana, and Andrew Mccallum. Semi-Supervised Clustering with User Feedback. Technical report, Cornell University, 2001. URL: https://doi.org/10.1201/9781584889977.ch2.
  8. Giorgio Corani, Alessio Benavoli, Janez Demšar, Francesca Mangili, and Marco Zaffalon. Statistical comparison of classifiers through Bayesian hierarchical modelling. Machine Learning, 106(11):1817-1837, November 2017. URL: https://doi.org/10.1007/s10994-017-5641-9.
  9. Thi-Bich-Hanh Dao, Khanh-Chuong Duong, and Christel Vrain. Constrained clustering by constraint programming. Artificial Intelligence, 244:70-94, 2017. Google Scholar
  10. Ian Davidson and S. S. Ravi. Agglomerative Hierarchical Clustering with Constraints: Theoretical and Empirical Results. In Knowledge Discovery in Databases: PKDD 2005, Lecture Notes in Computer Science, pages 59-70, 2005. Google Scholar
  11. Ian Davidson, S. S. Ravi, and Martin Ester. Efficient incremental constrained clustering. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 240-249, 2007. Google Scholar
  12. Ian Davidson, S. S. Ravi, and Leonid Shamis. A SAT-based Framework for Efficient Constrained Clustering. In Proceedings of the 2010 SIAM International Conference on Data Mining, pages 94-105, 2010. Google Scholar
  13. E. B. Fowlkes and C. L. Mallows. A Method for Comparing Two Hierarchical Clusterings. Journal of the American Statistical Association, 78(383):553-569, 1983. Google Scholar
  14. Marek Gagolewski. A Framework for Benchmarking Clustering Algorithms, 2022. Google Scholar
  15. Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293-306, 1985. Google Scholar
  16. Germán González-Almagro, Daniel Peralta, Eli De Poorter, José-Ramón Cano, and Salvador García. Semi-Supervised Constrained Clustering: An In-Depth Overview, Ranked Taxonomy and Future Research Directions, 2023. Google Scholar
  17. Mathieu Guilbert, Christel Vrain, Thi-Bich-Hanh Dao, and Marcilio C. P. de Souto. Anchored Constrained Clustering Ensemble. In International Joint Conference on Neural Networks, IJCNN, 2022. Google Scholar
  18. Tias Guns. Increasing modeling language convenience with a universal n-dimensional array, cppy as python-embedded example. In Modref, volume 19, 2019. Google Scholar
  19. Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2(1):193-218, 1985. Google Scholar
  20. Chia-Tung Kuo, S. S. Ravi, Thi-Bich-Hanh Dao, Christel Vrain, and Ian Davidson. A framework for minimal clustering modification via constraint programming. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI'17, pages 1389-1395, 2017. Google Scholar
  21. Baptiste Lafabregue, Pierre Gançarski, Jonathan Weber, and Germain Forestier. Incremental constrained clustering with application to remote sensing images time series. In 2022 IEEE International Conference on Data Mining Workshops (ICDMW), 2022. Google Scholar
  22. Thomas Lampert, Baptiste Lafabregue, Thi-Bich-Hanh Dao, Nicolas Serrette, Christel Vrain, and Pierre Gancarski. Constrained Distance-Based Clustering for Satellite Image Time-Series. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(11):4606-4621, 2019. Google Scholar
  23. Eric Yi Liu, Zhaojun Zhang, and Wei Wang. Clustering with relative constraints. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pages 947-955, New York, NY, USA, August 2011. Association for Computing Machinery. URL: https://doi.org/10.1145/2020408.2020564.
  24. James MacQueen. Some Methods For Classification And Analysis Of Multivariate Observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability,, volume 1, pages 281-297, 1967. Google Scholar
  25. Pavan Kumar Mallapragada, Rong Jin, and Anil K. Jain. Active query selection for semi-supervised clustering. In 19th International Conference on Pattern Recognition, pages 1-4, 2008. Google Scholar
  26. Logan Adam Mitchell. INCREMENT - Interactive Cluster Refinement. PhD thesis, Brigham Young University, 2016. Google Scholar
  27. Nguyen-Viet-Dung Nghiem, Christel Vrain, and Thi-Bich-Hanh Dao. Knowledge integration in deep clustering. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2022, Proceedings, Part I, volume 13713 of Lecture Notes in Computer Science, pages 174-190. Springer, 2022. URL: https://doi.org/10.1007/978-3-031-26387-3_11.
  28. Nguyen-Viet-Dung Nghiem, Christel Vrain, Thi-Bich-Hanh Dao, and Ian Davidson. Constrained Clustering via Post-processing. In Discovery Science, Lecture Notes in Computer Science, pages 53-67, 2020. Google Scholar
  29. Abdelkader Ouali, Samir Loudni, Yahia Lebbah, Patrice Boizumault, Albrecht Zimmermann, and Lakhdar Loukil. Efficiently finding conceptual clustering models with integer linear programming. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI'16, pages 647-654, 2016. Google Scholar
  30. Alfred Ultsch and Jörn Lötsch. The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms. Data, 5(1):13, 2020. Google Scholar
  31. Nguyen Xuan Vinh, Julien Epps, and James Bailey. Information theoretic measures for clusterings comparison: Is a correction for chance necessary? In Proceedings of the 26th Annual International Conference on Machine Learning, 2009. Google Scholar
  32. Kiri Wagstaff and Claire Cardie. Clustering with instance-level constraints. In Proceedings of the Seventeenth International Conference on Machine Learning, ICML '00, pages 1103-1110, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc. Google Scholar
  33. Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schrödl. Constrained k-means clustering with background knowledge. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pages 577-584, 2001. Google Scholar
  34. Kiri L. Wagstaff. Value, Cost, and Sharing: Open Issues in Constrained Clustering. In Knowledge Discovery in Inductive Databases, pages 1-10, 2007. Google Scholar
  35. Xiang Wang and Ian Davidson. Flexible constrained spectral clustering. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '10, pages 563-572, 2010. Google Scholar
  36. Eric Xing, Michael Jordan, Stuart J Russell, and Andrew Ng. Distance metric learning with application to clustering with side-information. In Advances in Neural Information Processing Systems, volume 15. MIT Press, 2002. Google Scholar
  37. Sicheng Xiong, Javad Azimi, and Xiaoli Z. Fern. Active Learning of Constraints for Semi-Supervised Clustering. IEEE Trans. on Knowledge and Data Engineering, 26(1):43-54, 2014. Google Scholar
  38. Xueying Zhan, Huan Liu, Qing Li, and Antoni B. Chan. A Comparative Survey: Benchmarking for Pool-based Active Learning. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, volume 5, pages 4679-4686, August 2021. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail