Fishing Fort: A System for Graph Analytics with ML Prediction and Logic Deduction

Authors Wenfei Fan, Shuhao Liu



File

OASIcs.Tannen.6.pdf
  • Filesize: 2.47 MB
  • 18 pages

Author Details

Wenfei Fan
  • Shenzhen Institute of Computing Sciences, China
  • University of Edinburgh, UK
  • Beihang University, Beijing, China
Shuhao Liu
  • Shenzhen Institute of Computing Sciences, China

Acknowledgements

This paper is a tribute to Professor Val Tannen. Val was a professor at UPenn when Fan was pursuing his PhD there, and has given Fan unfailing support ever since.

Cite As

Wenfei Fan and Shuhao Liu. Fishing Fort: A System for Graph Analytics with ML Prediction and Logic Deduction. In The Provenance of Elegance in Computation - Essays Dedicated to Val Tannen. Open Access Series in Informatics (OASIcs), Volume 119, pp. 6:1-6:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/OASIcs.Tannen.6

Abstract

This paper reports on Fishing Fort, a graph analytics system developed in response to the following questions. What practical value can we get out of graph analytics? How can we effectively deduce that value from a real-life graph? Where can we get clean graphs to make accurate analyses possible? To answer these questions, Fishing Fort advocates unifying logic deduction and ML prediction, and proposes Graph Association Rules (GARs), a class of logic rules in which ML models can be embedded as predicates. It employs GARs to deduce graph associations, enrich graphs, and clean graphs. It has been deployed in production lines and has proven effective in online recommendation, drug discovery, credit risk assessment, battery manufacturing, and cybersecurity, among other applications.
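
To make the idea of embedding an ML model as a predicate in a logic rule concrete, here is a minimal Python sketch. It is not the GAR formalism or the Fishing Fort implementation. The `Rule` class, the node attributes, the `may_like` edge label, and the tag-overlap score standing in for a trained model are all hypothetical, chosen only for illustration. The rule deduces an association between two nodes when every precondition holds, and one of those preconditions thresholds a model score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Node = Dict[str, object]
Predicate = Callable[[Node, Node], bool]


@dataclass
class Rule:
    """Hypothetical rule: if all preconditions hold for (x, y), assert an edge."""
    preconditions: List[Predicate]
    consequence: str  # label of the edge deduced from x to y

    def apply(self, nodes: Dict[str, Node]) -> Set[Tuple[str, str, str]]:
        deduced = set()
        for x_id, x in nodes.items():
            for y_id, y in nodes.items():
                if x_id != y_id and all(p(x, y) for p in self.preconditions):
                    deduced.add((x_id, self.consequence, y_id))
        return deduced


# Plain logic predicates over node attributes.
def is_user_and_item(x: Node, y: Node) -> bool:
    return x.get("type") == "user" and y.get("type") == "item"


def same_city(x: Node, y: Node) -> bool:
    return x.get("city") == y.get("city")


# Stand-in for a trained ML model (assumption): Jaccard overlap of tags.
def toy_model_score(x: Node, y: Node) -> float:
    a, b = set(x.get("tags", ())), set(y.get("tags", ()))
    return len(a & b) / len(a | b) if a | b else 0.0


# Wrap the model as a Boolean predicate by thresholding its score.
def ml_predicate(threshold: float) -> Predicate:
    return lambda x, y: toy_model_score(x, y) >= threshold


nodes = {
    "u1": {"type": "user", "city": "Shenzhen", "tags": ["hiking", "camping"]},
    "i1": {"type": "item", "city": "Shenzhen", "tags": ["hiking", "camping", "tent"]},
    "i2": {"type": "item", "city": "Beijing", "tags": ["hiking", "camping"]},
    "i3": {"type": "item", "city": "Shenzhen", "tags": ["piano"]},
}

rule = Rule([is_user_and_item, same_city, ml_predicate(0.5)], "may_like")
deduced = rule.apply(nodes)
print(deduced)
```

In this toy graph only the pair `u1`, `i1` satisfies both the logic conditions and the model threshold. `i2` fails the logic condition on city, and `i3` fails the model predicate, which shows how the two kinds of predicates filter jointly.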

Subject Classification

ACM Subject Classification
  • Information systems → Data mining
  • Information systems → Data management systems
  • Information systems → Information integration
Keywords
  • graph analytics
  • data cleaning
  • association analysis
