On the Enumeration of Frequent High Utility Itemsets: A Symbolic AI Approach

Authors Amel Hidouri, Said Jabbour, Badran Raddaoui



PDF
Thumbnail PDF

File

LIPIcs.CP.2022.27.pdf
  • Filesize: 1.78 MB
  • 17 pages

Document Identifiers

Author Details

Amel Hidouri
  • CRIL - CNRS UMR 8188, University of Artois, France
  • LARODEC, University of Tunis, Tunisia
Said Jabbour
  • CRIL - CNRS UMR 8188, University of Artois, France
Badran Raddaoui
  • SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, France

Cite As Get BibTex

Amel Hidouri, Said Jabbour, and Badran Raddaoui. On the Enumeration of Frequent High Utility Itemsets: A Symbolic AI Approach. In 28th International Conference on Principles and Practice of Constraint Programming (CP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 235, pp. 27:1-27:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.CP.2022.27

Abstract

Mining interesting patterns from data is a core part of the data mining world. High utility mining, an active research topic in data mining, aims to discover valuable itemsets with high profit (e.g., cost, risk). However, the measure of interest of an itemset must primarily reflect not only the importance of items in terms of profit, but also their occurrence in data in order to make more crucial decisions. Some proposals are then introduced to deal with the problem of computing high utility itemsets that meet a minimum support threshold. However, in these existing proposals, all transactions in which the itemset appears are taken into account, including those in which the itemset has a low profit. So, no additional information about the overall utility of the itemset is taken into account. This paper addresses this issue by introducing a SAT-based model to efficiently find the set of all frequent high utility itemsets with the use of a minimum utility threshold applied to each transaction in which the itemset appears. More specifically, we reduce the problem of mining frequent high utility itemsets to the one of enumerating the models of a formula in propositional logic, and then we use state-of-the-art SAT solvers to solve it. Afterwards, to make our approach more efficient, we provide a decomposition technique that is particularly suitable for deriving smaller and independent sub-problems easy to resolve. Finally, an extensive experimental evaluation on various popular datasets shows that our method is fast and scale well compared to the state-of-the art algorithms.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Artificial intelligence
  • Information systems → Data mining
Keywords
  • Data Mining
  • High Utility Itemsets
  • Propositional Satisfiability

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Fadi A Aloul, Arathi Ramani, Igor Markov, and Karem Sakallah. Pbs: a backtrack-search pseudo-boolean solver and optimizer. In International Symposium on Theory and Applications of Satisfiability, pages 346-353, 2002. Google Scholar
  2. Mohamed-Bachir Belaid, Christian Bessiere, and Nadjib Lazaar. Constraint programming for mining borders of frequent itemsets. In IJCAI, pages 1064-1070, 2019. Google Scholar
  3. Abdelhamid Boudane, Saïd Jabbour, Badran Raddaoui, and Lakhdar Sais. Efficient sat-based encodings of conditional cardinality constraints. In LPAR, pages 181-195, 2018. Google Scholar
  4. Abdelhamid Boudane, Saïd Jabbour, Lakhdar Sais, and Yakoub Salhi. SAT-based data mining. Int. J. Artif. Intell. Tools, pages 1840002:1-1840002:24, 2018. Google Scholar
  5. D. Chai and A. Kuehlmann. A fast pseudo-boolean constraint solver. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pages 305-317, 2005. Google Scholar
  6. W. Cook, C.R. Coullard, and Gy. Turán. On the complexity of cutting-plane proofs. Discrete Applied Mathematics, pages 25-38, 1987. Google Scholar
  7. Martin Davis, George Logemann, and Donald Loveland. A machine program for theorem-proving. Commun. ACM, pages 394-397, 1962. Google Scholar
  8. Heidi E. Dixon and Matthew L. Ginsberg. Inference methods for a pseudo-boolean satisfiability solver. In AAAI, pages 635-640, 2002. Google Scholar
  9. Imen Ouled Dlala, Saïd Jabbour, Badran Raddaoui, and Lakhdar Sais. A parallel SAT-based framework for closed frequent itemsets mining. In CP, pages 570-587, 2018. Google Scholar
  10. Niklas Eén and Niklas Sörensson. An extensible SAT-solver. In SAT, pages 502-518, 2004. Google Scholar
  11. Niklas Eén and Niklas Sörensson. Translating pseudo-boolean constraints into sat. J. Satisf. Boolean Model. Comput., pages 1-26, 2006. Google Scholar
  12. Philippe Fournier-Viger, Jerry Chun-Wei Lin, Antonio Gomariz, Ted Gueniche, Azadeh Soltani, Zhihong Deng, and Hoang Thanh Lam. The spmf open-source data mining library version 2. In Joint European conference on machine learning and knowledge discovery in databases, pages 36-40, 2016. Google Scholar
  13. Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Vincent S. Tseng, and Philip S. Yu. A survey of utility-oriented pattern mining. IEEE Transactions on Knowledge and Data Engineering, pages 1306-1327, 2021. Google Scholar
  14. Tias Guns, Anton Dries, Siegfried Nijssen, Guido Tack, and Luc De Raedt. Miningzinc: A declarative framework for constraint-based mining. Artif. Intell., pages 6-29, 2017. Google Scholar
  15. Amel Hidouri, Said Jabbour, Badran Raddaoui, and Boutheina Ben Yaghlane. Mining closed high utility itemsets based on propositional satisfiability. DKE, page 101927, 2021. Google Scholar
  16. Saïd Jabbour, Fatima Ezzahra Mana, Imen Ouled Dlala, Badran Raddaoui, and Lakhdar Sais. On maximal frequent itemsets mining with constraints. In CP, pages 554-569, 2018. Google Scholar
  17. Saïd Jabbour, Nizar Mhadhbi, Badran Raddaoui, and Lakhdar Sais. Triangle-driven community detection in large graphs using propositional satisfiability. In AINA, pages 437-444, 2018. Google Scholar
  18. Saïd Jabbour, Nizar Mhadhbi, Badran Raddaoui, and Lakhdar Sais. Sat-based models for overlapping community detection in networks. Computing, 102(5):1275-1299, 2020. Google Scholar
  19. Saïd Jabbour, Nizar Mhadhbi, Badran Raddaoui, and Lakhdar Sais. A declarative framework for maximal k-plex enumeration problems. In AAMAS, pages 660-668, 2022. Google Scholar
  20. Saïd Jabbour, Lakhdar Sais, and Yakoub Salhi. Mining Top-k motifs with a SAT-based framework. Artif. Intell., pages 30-47, 2017. Google Scholar
  21. Daniel Le Berre and Anne Parrain. The SAT4J library, Release 2.2, System Description. Journal on Satisfiability, Boolean Modeling and Computation, pages 59-64, 2010. Google Scholar
  22. Ying Liu, Wei-keng Liao, and Alok Choudhary. A two-phase algorithm for fast discovery of high utility itemsets. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 689-695, 2005. Google Scholar
  23. Vasco Manquinho and J. Marques-Silva. On using cutting planes in pseudo-boolean optimization. Journal on Satisfiability, Boolean Modeling and Computation, 2006. Google Scholar
  24. João P. Marques-Silva and Karem A. Sakallah. Grasp: A search algorithm for propositional satisfiability. IEEE Trans. Comput., pages 506-521, 1999. Google Scholar
  25. A Morgado and J Marques-Silva. Algorithms for propositional model enumeration and counting. Technical report, Citeseer, 2005. Google Scholar
  26. A Sakthi Nathiarasan and M Manikandan. Performance oriented mining of utility frequent itemsets. In International Conference on Circuits, Communication, Control and Computing, pages 317-321, 2014. Google Scholar
  27. Vid Podpecan, Nada Lavrac, and Igor Kononenko. A fast algorithm for mining utility-frequent itemsets. Constraint-Based Mining and Learning, page 9, 2007. Google Scholar
  28. R Uday Kiran, T Yashwanth Reddy, Philippe Fournier-Viger, Masashi Toyoda, P Krishna Reddy, and Masaru Kitsuregawa. Efficiently finding high utility-frequent itemsets using cutoff and suffix utility. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 191-203. Springer, 2019. Google Scholar
  29. Jilles Vreeken and Nikolaj Tatti. Interesting patterns. In Frequent Pattern Mining, pages 105-134. Springer, 2014. Google Scholar
  30. Tianyou Wei, Bin Wang, Yuntian Zhang, Keyong Hu, Yinfeng Yao, and Hao Liu. FCHUIM: Efficient frequent and closed high-utility itemsets mining. IEEE Access, pages 109928-109939, 2020. Google Scholar
  31. Ryan Williams, Carla Gomes, and Bart Selman. On the connections between backdoors, restarts, and heavy-tailedness in combinatorial search. structure, 23(4), 2003. Google Scholar
  32. Jieh-Shan Yeh, Yu-Chiang Li, and Chin-Chen Chang. Two-phase algorithms for a novel utility-frequent mining model. In Emerging Technologies in Knowledge Discovery and Data Mining, pages 433-444, 2007. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail