On the Enumeration of Frequent High Utility Itemsets: A Symbolic AI Approach

Hidouri, Amel; Jabbour, Said; Raddaoui, Badran

doi:10.4230/LIPIcs.CP.2022.27

Subject Classification

ACM Subject Classification

Computing methodologies → Artificial intelligence
Information systems → Data mining

Keywords

Data Mining
High Utility Itemsets
Propositional Satisfiability

Abstract

Mining interesting patterns from data is a core task in data mining. High utility mining, an active research topic in the field, aims to discover valuable itemsets with high profit (e.g., cost, risk). However, to support important decisions, the measure of interest of an itemset should reflect not only the importance of its items in terms of profit, but also how often they occur in the data. Several proposals have therefore been introduced to compute high utility itemsets that also meet a minimum support threshold. These proposals, however, take into account every transaction in which the itemset appears, including those in which the itemset has a low profit, so no additional information about the overall utility of the itemset is taken into account. This paper addresses this issue by introducing a SAT-based model that efficiently finds the set of all frequent high utility itemsets, using a minimum utility threshold applied to each transaction in which the itemset appears. More specifically, we reduce the problem of mining frequent high utility itemsets to that of enumerating the models of a formula in propositional logic, and we then use state-of-the-art SAT solvers to solve it. To make our approach more efficient, we also provide a decomposition technique that is particularly suitable for deriving smaller, independent sub-problems that are easy to solve. Finally, an extensive experimental evaluation on various popular datasets shows that our method is fast and scales well compared to state-of-the-art algorithms.
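
To make the mining task concrete, the following is a minimal brute-force sketch in Python of one plausible reading of the abstract: an itemset is kept when it reaches a minimum utility in at least a minimum number of the transactions that contain it. This is not the SAT-based model of the paper, and the exact definition used there may differ. The toy database and the names `TRANSACTIONS`, `utility_in`, `frequent_high_utility_itemsets`, `min_util`, and `min_supp` are assumptions made only for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to the utility
# that item contributes in that transaction (e.g. quantity * unit profit).
TRANSACTIONS = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 4, "b": 3},
    {"a": 1, "c": 6},
    {"b": 2, "c": 2},
]


def utility_in(itemset, transaction):
    """Utility of `itemset` in one transaction, or None if it does not occur there."""
    if not all(item in transaction for item in itemset):
        return None
    return sum(transaction[item] for item in itemset)


def frequent_high_utility_itemsets(transactions, min_util, min_supp):
    """Enumerate every itemset and keep those whose utility reaches `min_util`
    in at least `min_supp` transactions (assumed definition, see lead-in)."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utils = [utility_in(itemset, t) for t in transactions]
            # Count only the transactions where the itemset occurs AND is profitable enough.
            qualifying = sum(1 for u in utils if u is not None and u >= min_util)
            if qualifying >= min_supp:
                result[itemset] = qualifying
    return result


FOUND = frequent_high_utility_itemsets(TRANSACTIONS, min_util=5, min_supp=2)
print(FOUND)
```

On this toy database the itemset {a} is rejected while its supersets {a, b} and {a, c} are kept, so under this reading the measure is not anti-monotone. The sketch enumerates all 2^n - 1 candidate itemsets and is therefore only usable on tiny inputs, which is the cost that a symbolic encoding with a SAT solver and a decomposition into sub-problems aims to avoid.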

Cite As

Amel Hidouri, Said Jabbour, and Badran Raddaoui. On the Enumeration of Frequent High Utility Itemsets: A Symbolic AI Approach. In 28th International Conference on Principles and Practice of Constraint Programming (CP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 235, pp. 27:1-27:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.CP.2022.27

Author Details

Amel Hidouri

CRIL - CNRS UMR 8188, University of Artois, France
LARODEC, University of Tunis, Tunisia

Said Jabbour

CRIL - CNRS UMR 8188, University of Artois, France

Badran Raddaoui

SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, France
