Fishing Fort: A System for Graph Analytics with ML Prediction and Logic Deduction

Fan, Wenfei; Liu, Shuhao

doi:10.4230/OASIcs.Tannen.6

Abstract

This paper reports on Fishing Fort, a graph analytics system developed in response to the following questions. What practical value can we get out of graph analytics? How can we effectively deduce that value from a real-life graph? Where can we get clean graphs to make accurate analyses possible? To answer these questions, Fishing Fort advocates unifying logic deduction and ML prediction by proposing Graph Association Rules (GARs), a class of logic rules in which ML models can be embedded as predicates. It employs GARs to deduce graph associations, enrich graphs, and clean graphs. It has been deployed in production lines and proven effective in online recommendation, drug discovery, credit risk assessment, battery manufacturing, and cybersecurity, among other applications.

